Cultural, Ideological and Political Bias in LLMs

This article is the cover story of a project that we developed or supported :

LLM Assessment Explorer

This article is cross-posted on PITTI as part of a multi-year partnership with the Institut Présaje - Michel Rouger, which deals with societal issues integrating the three worlds of economics, law, and justice. It supports the publication of work by researchers and produces high-quality content in French. The Institut Présaje - Michel Rouger examined the impact of AI on justice very early on, as the first symposium organized by the institute on this topic dates back to 2019.


Cultural, political, or ideological biases in LLMs are issues that are frequently raised, yet rarely illustrated or even measured. And for good reason: data is missing, and evaluation is subjective. However, the question of alignment of biases between the LLM and the client (not necessarily the user!) is crucial as LLMs are increasingly integrated into systems that lead users to delegate part of their decision-making authority to AI, consciously or unconsciously. The US Administration worried about these biases last summer, somewhat naively given that no tools currently exist to identify these problems. To stay in the race for US government contracts, OpenAI or Anthropic recently published research on these biases, but this work did not really provide solutions.

The goal of a project that took up much of my time in 2025 was precisely to make available to the community data and tools that are currently lacking, in order to help find solutions. It is far from certain that robust automated detection will ever be possible. We can hope to obtain statistically significant results on sufficiently large datasets. But there will likely always remain topics that are too subjective and modes of expression too complex, such as satire, to rely entirely on artificial intelligence models.

But even for basic detection, the obstacles to overcome are numerous.

A Methodological Problem

To begin any analysis of these problems, we must clarify that there is no such thing as an unbiased AI model because the very notion of bias regarding LLMs is political: it involves measuring a deviation from a desired state. In fact, we speak of alignment when the gap is small and bias when the gap is large. The first difficulty is that there is no ground truth, or rather, that there are as many ground truths as there are clients.

The second difficulty is finding a way to measure the gap. It is hard to rely on humans given how subjective these topics are. Having personally verified and/or annotated over 10,000 examples from a dataset comprising nearly 370,000 documents, I can testify that it is extremely difficult to remain impartial because this work triggers other personal biases. Furthermore, it takes a huge amount of time. My review of the 10,000 examples spanned several months; it is very hard to remain consistent from one week to the next, from one day to the next, or even from one hour to the next. In this context, it seems inconceivable to outsource the annotation work to "click workers" whose diverse languages, cultures, and moods could lead to a cultural and semantic smoothing that would mask the subtleties of the biases we need to isolate.

Given these constraints, we resort to LLMs to process the data; LLM-judges. But if the working assumption is that LLMs are biased, how can we expect them to perform an impartial analysis of a bias. Anthropic's paper actually highlights this problem: their entire impartiality analysis relies on the results of their own model, but they also present results using an OpenAI model as a judge in the appendix of their article, and certain conclusions differ materially (notably the conclusions regarding META's Llama 4 model, which are very different). I also observed this in my own project: the LLM-judge sometimes expresses an "opinion" on the content it is supposed to judge, and one wonders if this affects the classification (see Appendix 1). This echoes the influence that socio-cultural or legal prejudices can exert on human judges, albeit a human judge is expected to be aware of them and set them aside or recuse themselves.

There are also cases where the content to assess is so controversial that it triggers a moderation mechanism, so we get no response. In my project, there were about 400 items that the Chinese model Deepseek could never process via the official API. OpenAI's APIs are also extremely difficult to use as judges for this reason. As for Anthropic, don't even think about it.

Even before getting to analyze the political, cultural, and ideological biases of AI models, simple questions of methodology reveal these biases since they materialize through moderation. LLM moderation acts as a revealer, and that is why I have been looking into this subject since April 2025, based on 2,500 questions on extremely divisive topics, submitted to over 150 models(1) and classified by American, Chinese, and French open-source models. By cross-referencing all this data (over 2 million judgments), we manage to highlight how cultural biases are expressed.

Where and How Do Biases Materialize?

Biases can stem from several layers of AI systems and take different forms. We are not just dealing with models here, since the model is just one step in a prompt's lifecycle.

Prompt Lifecycle

Model Training

Pre-training data induces certain biases due to their geographical origin. The United States represents less than 5% of the world's population but accounted for 60% of training data in 2022. The proportion must have decreased, but the cultural influence in training data remains disproportionate. It is quite common for models to frame responses by default in an American cultural or regulatory context (referencing the Constitution, amendments, case law...). It is obviously possible to get answers in a European framework, but users must provide more context; more tokens, which induces more computation (which increases quadratically) for the model. In an English prompt, terms like "Congress" or "jury trial" must be qualified if one wants to avoid an American framing. We could almost speak of a computational tax because accounting for a specific legal context implies a greater inference cost for non-American English speakers.

The computational tax is not limited to legal and regulatory domains. A Mistral model (French) and a Qwen model (Chinese) analyzed responses to over 1,400 questions in English that could be addressed without specific national bias. Frequently, they identified an American bias. And surprisingly, the models where the American bias was most prevalent were rarely American models. It should be specified that, for this detection of national biases, the difference between the two LLM-judges was significant: the Chinese model detected an American bias twice as often as the French model, but the final ranking remained globally similar. The task is difficult; you can judge for yourself in Appendix 2.

As far as training data is concerned, biases can also stem from what is filtered out and what is kept to train the models. In my evaluations, there were a few questions about the "model minority," a concept that uses the relative success of certain Asian minorities in the United States to support claims that the difficulties of other minorities are not attributable to systemic racism. It is a very controversial topic. In the LLM responses, I noted that the words "model minority" were very often followed by "myth" or "stereotype", so I checked (see Appendix 3) if the models automatically associated the concept of model minority with those of myth or stereotype without further context. It is quite easy to probe open-source models for token probabilities and, if you simply type "The model minority", the probability that the next token is "myth" is 45% and the probability that it is "stereotype" is 21%. This bias might seem "morally just" to a majority of the population (myself included), but when a model adopts a political stance by default, it immediately excludes any nuance or academic debate on this sociological concept. Other people might have a different opinion on model minorities... or the default political stance could concern a much more divisive subject.

Pre-training on human data is a training stage where model developers are relatively passive. They take a more active role when they generate synthetic training data and at the reinforcement stage. This stage aims to specifically train the model to respond in a certain way or not to respond at all to certain queries. This moderation obviously also induces biases, especially if it is not applied in a balanced way across the entire political spectrum. You all know the stock phrases like "as a large language model, I cannot answer this question..." but it is often much more subtle in practice.

I distinguish three categories of moderation: simple and direct refusal, rebuttal, and evasion. It is not about defining a hierarchy but simply describing the ways of not answering a question. Evasion gives the illusion of answering the question either by developing a related topic but carefully avoiding controversial subjects, or by treating the subject in a more balanced way offering multiple perspectives. Rebuttal is an argument to convince the user to change their mind, or at the very least to demonstrate that their stance is preposterous. Sometimes, when the model does not follow the user's instructions, one wonders if it is because the model was not trained well enough (it didn't "understand"), or because it was trained too well not to answer. It is never possible to be certain.

For Europeans steeped in Anglo-Saxon culture, American cultural biases are almost invisible. It is just background noise that only becomes noticeable when it stops. On the other hand, Chinese biases immediately feel uncanny (see Appendix 4). It is understandable that such models seem difficult to integrate into autonomous information processing systems.

The Influence of Intermediaries

The Chinese that produced the responses presented in Appendix 4 was tested via API, so I did not have complete control over the prompt lifecycle, and I could not know if the rebuttal emanated from the model's reinforcement or from a system prompt added before my query was passed to the model.

Adding a system prompt is the most efficient and least expensive way to influence a model. There have been well-documented cases of censorship on the Grok model (xAI) via the system prompt. More recently and less controversially, the Grok team took note of the findings of an independent researcher working on free speech(2) who had produced results that were disappointing in their opinion. They contacted him to ask him to re-run his benchmark after an update to the system prompt, and the new results were better. Given the underlying model had not changed, this evaluation highlighted the impact the system prompt can have. See Appendix 5

It is reasonably simple to measure the impact of a system prompt by comparing local versions of open-source models and the same models served via the providers' official applications. The difference in moderation is glaring for Chinese models (DeepSeek, Qwen, Kimi, GLM...), but the same conclusions can also be drawn from Mistral's models.

Why is the impact of the system prompt so critical? Because prompt injection constitutes an effective way to introduce a bias, so one must worry about all intermediaries processing prompts before they even reach the model. One must also worry about documents sent to a model for processing, as they can themselves lead to prompt injections.

Finally, to close the loop with the difficulties I faced when I used Deepseek as LLM-judge: sometimes the prompt or the content generated by the model is identified as dangerous, which triggers a content filter and the application stops. The flagging is performed by another AI model specifically trained for this task, based on a list of specific themes or subjects. This is the cause of the bug for my API-based Chinese judge (the 400 questions related to Chinese politics), but the practice is common. It is not always visible but Anthropic, OpenAI, Google, Amazon all have these systems upstream and sometimes downstream of the models, to decide if they let the prompt and/or the response through (see Appendix 6). In this case, the API provider has control over what they wish to let through or not.

This is somewhat worrying because no-one knows what such flagging might trigger. OpenAI will send you a simple email after a certain number of attempts to warn you that your account will be deactivated if you continue to violate the terms of use (see Appendix 7). But one can perfectly imagine that a provider with other motivations might act very differently...

My talk today takes place in the context of the APIdays, where we hear a lot of very positive things about APIs, rightly so. But there is a dark side that can't be ignored. Large language models rarely run locally, and just because a model is called "open-source" doesn't mean it is protected from all third-party influence (see Appendix 8). This is all the more important as we no longer limit ourselves to inserting keywords into a search engine, but we casually send work documents and personal information to models.

We have strayed from the initial subject of cultural, ideological, and political biases of large language models to venture into the territory of the influence that AI systems can exert on the user. The two subjects are linked as soon as the bias results from a voluntary action by one or more players involved in a prompt's lifecycle. Local AI, despite its limitations, could protect users from certain threats weighing on their cognitive independence. Unfortunately, it is not a panacea, and work on biases must absolutely continue to provide everyone with the necessary information to choose a model that suits them.

Making the analysis of cultural biases more accessible was a key objective of my project, notably by facilitating the custom training of small AI models. In 2025, I thus made available a substantial dataset that you can explore interactively here (demo in Appendix 9), but also several open-source models and an open-source python library to train classification models locally. I hope this will call for other contributions.


(1) The response dataset is sourced from an ongoing project called Speechmap focused on free speech.
(2) Anonymous X user @xlr8harder who leads the Speechmap project mentioned above.

Appendices

  • Appendix 1
Mistral judging a response (enlarge)
Mistral judging a response (enlarge)
Mistral judging a response (enlarge)
  • Appendix 2
Pourcentages de questions biaisées selon Mistral et Qwen
Percentage of biased questions according to Mistral and Qwen (enlarge)
judges disagree judges disagree
Disagreement between judges (enlarge 1) (enlarge 2)
  • Appendix 3
Model minority Loom
Probabilities of tokens following "The model minority" - Qwen2.5-3B
  • Appendix 4
Biais Chinois Deepseek Biais Chinois Deepseek Biais Chinois Deepseek
Examples of responses presenting a Chinese bias
  • Appendix 5
Grok’s system prompt
Grok's system prompt in February 2025 demanding not to mention Elon Musk or Donald Trump as sources of disinformation
speechmap grok update
Improvement in Grok's "transparency" following a system prompt update.
  • Appendix 6
Anthropic's classification system (source)
speechmap grok update
Filter triggered on Amazon's API for the Nova Pro model
  • Appendix 7
OpenAi warning
Warning by OpenAI
  • Appendix 8

Here is a perplexing anecdote that happened during my work on the Speechmap dataset. I identified a Chinese bias in a response generated by a Chinese model queried via API. Such Chinese biases are relatively rare but they do come up from time to time as shown in Appendix 4 so there was nothing inherently surprising. What came as a surprise was that, for the same question, an open-source model of American origin produced a very similar response evidencing a Chinese bias as well. I assumed that Nous Research, the American lab in question, used synthetic data generated with the Chinese model to train their model and the bias was transferred in the process. When I posted about this on X, a Nous Research co-founder tried to replicate this bias but never got the Chinese perspective. I checked that it wasn't just a data problem on my side or on Speechmap's side. If Nous Research was honest in their replication attempt - which I have no reason to doubt - the next most plausible explanation is that the third-party API provider serving the Nous model (not Nous themselves) was directing the requests to the Deepseek API instead of a self-hosted Nous model. The motivation for this could be that the Deepseek API is notoriously cheap ; so it is likely more profitable than hosting and serving the Nous model. I will never know for sure what happened.

Response by Deepseek R1
Response by Nous Hermes 4 70B thinking
  • Appendix 9
We care about your privacy so we do not store nor use any cookie unless it is stricly necessary to make the website to work
Got it
Learn more