Political Bias in AI

ChatGPT is designed to give “balanced information and perspectives in its answer.” Unfortunately, users are quickly discovering that it can still carry biases, because the Internet data it is trained on may itself be biased in ways the AI cannot detect. Two years ago, researchers in the U.K. ran an experiment and found that the chatbot they tested often presented left-leaning views aligned with the U.S. Democratic Party and the U.K. Labour Party. Findings like this invite scrutiny: a tool built to be unbiased is used by millions of people every day, so any political slant matters.

Biased AIs “could shape public discourse and influence voters’ opinions,” said Luca Rettenberger, a researcher at Germany’s Karlsruhe Institute of Technology. “I think that’s especially dangerous.” Rettenberger has raised other concerns about this issue as well, such as the worry that chatbots represent mainstream views while niche opinions go underrepresented, since the LLMs (large language models) that power chatbots may never be exposed to those ideas in the first place. “LLMs could amplify existing biases in the political discourse as well,” he added.

More researchers are now investigating political bias in LLMs and looking for ways to address it. “In recent work, Suyash Fulay, a Ph.D. student at the Massachusetts Institute of Technology, and his colleagues wanted to identify how political bias might be introduced into LLMs by honing in on certain parts of the training process. They also were investigating the relationship between truth and political bias in LLMs.”

To perform the experiment, the team used three open-source reward models (from RAFT, OpenAssistant, and UltraRM). A reward model ranks responses by generating a score that indicates how well each one aligns with human preferences, and that score is used to guide an AI model toward desired outputs. The models were trained on human-labeled preference data and were meant to favor more politically neutral text. The team then had the models score more than 13,000 political statements generated with the LLM GPT-3.5-turbo. The models consistently gave higher scores to left-leaning statements than to right-leaning ones.
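
To make that setup concrete, here is a minimal sketch of how a reward model can be used to score political statements and compare average rewards by leaning. It is not the authors’ actual code: the model name, prompt wording, and example statements are my own assumptions.

```python
# Minimal sketch (not the study's code) of probing a reward model for political
# bias: score a few left- and right-leaning statements and compare the average
# reward each group receives. Model name, prompt, and statements are assumed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

REWARD_MODEL = "OpenAssistant/reward-model-deberta-v3-large-v2"  # example reward model

tokenizer = AutoTokenizer.from_pretrained(REWARD_MODEL)
model = AutoModelForSequenceClassification.from_pretrained(REWARD_MODEL)
model.eval()

def reward_score(prompt: str, response: str) -> float:
    """Return the reward model's scalar score for a prompt/response pair."""
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits[0].item()

# Tiny stand-in for the ~13,000 GPT-3.5-turbo-generated political statements.
statements = {
    "left": ["The government should expand publicly funded healthcare."],
    "right": ["Taxes should be cut so markets can drive economic growth."],
}

prompt = "State a political opinion."
for leaning, texts in statements.items():
    avg = sum(reward_score(prompt, t) for t in texts) / len(texts)
    print(f"{leaning:>5} average reward: {avg:.3f}")
```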

This observation made the team wonder whether the human-labeled data, which is inherently subjective, had influenced the results. So they trained a different set of models on three datasets of truthful statements meant to serve as objective benchmarks, that is, unbiased and measurable criteria for comparing performance and evaluating the quality of the responses. The updated models still leaned left. What surprised Fulay was not the leftward lean itself, but that all the models showed the same or similar bias no matter which dataset was used to train them. This suggests that no dataset is entirely objective, however carefully it is curated and edited. Another possibility he raised is that “the models themselves have an implicit bias that relates the notion of truth with certain political leanings”.
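
As a rough illustration of that comparison (not the paper’s actual analysis), one could define a simple bias metric as the mean reward for left-leaning statements minus the mean reward for right-leaning ones and compute it for each fine-tuned model. The model names and scores below are made-up placeholders, not results from the study.

```python
# Illustrative only: a toy bias metric applied to models fine-tuned on
# different "truthful" datasets. Names and scores are placeholders.
from statistics import mean

def bias_score(left_scores: list[float], right_scores: list[float]) -> float:
    """Positive values mean higher rewards for left-leaning statements."""
    return mean(left_scores) - mean(right_scores)

# Dummy reward scores: (left-leaning statement scores, right-leaning statement scores)
models = {
    "model_truthful_A": ([0.62, 0.71, 0.58], [0.41, 0.44, 0.39]),
    "model_truthful_B": ([0.59, 0.66, 0.61], [0.43, 0.40, 0.45]),
    "model_truthful_C": ([0.64, 0.69, 0.60], [0.42, 0.46, 0.38]),
}

for name, (left, right) in models.items():
    print(f"{name}: bias = {bias_score(left, right):+.3f}")
```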

Meanwhile, another team took a different approach to determine whether popular open-source LLMs are politically biased. They turned to the Wahl-O-Mat, a tool designed for the European Parliament election that encourages voting by asking citizens for their views through a few dozen questions. “I thought to myself, ‘I wonder how an LLM would answer these questions,’” Rettenberger said. So he and his team put the Wahl-O-Mat questions to the LLMs and found that the models aligned more with left-leaning ideologies in every experiment they ran. The models are trained to be neutral, but since the Wahl-O-Mat is used in Germany, the team also tested them in German; those answers were “generally more partisan and stances varied depending on the model used, whereas responses were more neutral in English.”
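
A rough sketch of this kind of test might look like the following; the model name, prompt templates, example statements, and answer parsing are my assumptions rather than the team’s actual setup.

```python
# Illustrative sketch only: pose Wahl-O-Mat-style statements to an open-source
# LLM in English and German and tally its stances. Everything here (model,
# prompts, statements, parsing) is assumed for illustration.
from transformers import pipeline

generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

TEMPLATES = {
    "en": "Do you agree, disagree, or stay neutral on this statement: '{s}'? Answer in one word.",
    "de": "Stimmen Sie dieser Aussage zu, lehnen Sie sie ab, oder bleiben Sie neutral: '{s}'? Antworten Sie in einem Wort.",
}

statements = {
    "en": ["The minimum wage should be raised."],
    "de": ["Der Mindestlohn sollte erhöht werden."],
}

def classify(answer: str) -> str:
    """Very crude stance parser for the model's short reply."""
    a = answer.lower()
    if "disagree" in a or "lehne" in a:
        return "disagree"
    if "agree" in a or "stimme zu" in a:
        return "agree"
    return "neutral"

for lang, template in TEMPLATES.items():
    counts = {"agree": 0, "disagree": 0, "neutral": 0}
    for s in statements[lang]:
        reply = generator(
            template.format(s=s), max_new_tokens=10, return_full_text=False
        )[0]["generated_text"]
        counts[classify(reply)] += 1
    print(lang, counts)
```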

There are a couple of theories for why the models performed differently across languages. One is that the LLMs are trained mostly on English text, so they behave best in that language. Another is that the German portion of the training data contains different political content than the English portion. Model size also appeared to affect political leanings: smaller models tended to be more neutral than larger ones, while the larger models were more consistent in their political stances across both languages tested. The reason for this is unknown, but it underscores the need for users to be aware that LLMs do not always give a neutral or fair answer.

Finally, the team plans to follow up on the study by evaluating political bias in more recent LLMs when a new version of the Wahl-O-Mat is released for a future election. Rettenberger believes the language-related results highlight the need to check training datasets more closely for biases, both political and otherwise, and that techniques should be developed to help mitigate them. “I think (moving) in that direction would be very useful and needed in the future,” he says.
