One of the most common mistakes in AI use is assuming that models are neutral because they have no agenda of their own. It is true that they have no agenda, since they are not beings with intentions, but that does not make them neutral. The biases they contain are just as real as those of any other information source. In some cases they are harder to detect, because they come wrapped in the confident, articulate tone of a well-aligned language model.
What is a bias in AI
In the context of language models, a bias is any systematic deviation in the model’s output from what would be a fair or correct representation of reality.
Biases in AI come from three main sources:
Training data. If the data over-represents certain perspectives, languages, cultures or groups, the model learns that distribution and replicates it. If the internet — the main source of training data — contains more text in English than in any other language, more text written by men than by women, more perspectives from the Global North than the South, that is reflected in the models.
Alignment decisions. The values and criteria that guide reinforcement learning from human feedback (RLHF) are not neutral. They reflect the preferences of the evaluators, which in turn reflect their cultures, experiences and instructions. What counts as a "good response" is inevitably influenced by who defines "good."
Architecture. Some limitations are not about data or alignment but are structural: consequences of the prediction mechanism that cannot be eliminated without fundamentally changing how the model works.
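The first of these sources, skewed training data, can be illustrated with a deliberately tiny model. The sketch below is a bigram counter, which is far simpler than a real language model and does not work the way one does internally. It still shows the basic mechanism: probabilities learned from data reproduce whatever imbalance the data contains. The corpus and the spelling example are hypothetical.

```python
from collections import Counter, defaultdict

def train(corpus):
    """Count which word follows each word in a toy corpus (a bigram model)."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def next_word_probs(model, word):
    """Relative frequencies of the words seen after `word` (empty if unseen)."""
    counts = model.get(word, Counter())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {w: n / total for w, n in counts.items()}

# Hypothetical skewed corpus: 8 sentences use US spelling, 2 use UK spelling.
corpus = ["pick a color"] * 8 + ["pick a colour"] * 2
model = train(corpus)
print(next_word_probs(model, "a"))  # {'color': 0.8, 'colour': 0.2}
```

Nothing in the model prefers one spelling over the other. The 80/20 split comes entirely from the data, which is the point of the example.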
Representation biases
Representation biases are the most documented and perhaps the most intuitive.
Linguistic biases. Most models work best in English. Even within English, performance varies across regional varieties, roughly in line with how well each is represented on the internet. Languages with less digital presence are served by significantly worse models.
Cultural biases. The references, examples and conceptual frameworks that models use by default tend to be Western and anglophone. Questions about non-Western legal, social or historical contexts produce less nuanced responses.
Gender and representation biases. Studies show that when models generate text in which gender is not specified, they associate certain roles (doctor, CEO, engineer) more frequently with male pronouns and others (nurse, secretary, carer) with female ones. This reflects the statistical distribution in the training data.
Historical biases. Training data has a temporal distribution: there is far more recent content than older content. Historical events and figures that are less documented digitally are less well represented.
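The gender pattern described above can be probed in a rough way. Generate many short texts per role without specifying gender, then count the gendered pronouns. The sketch below covers only the counting step. The pronoun lists are simplified assumptions, and the sample texts are hypothetical placeholders, not real model output. A real study would need many samples per role and more careful linguistic handling.

```python
import re
from collections import Counter

# Simplified, assumed pronoun lists (English only).
MALE = {"he", "him", "his", "himself"}
FEMALE = {"she", "her", "hers", "herself"}

def pronoun_counts(texts):
    """Count male and female pronouns across a list of generated texts."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in MALE:
                counts["male"] += 1
            elif word in FEMALE:
                counts["female"] += 1
    return counts

def male_share(texts):
    """Fraction of gendered pronouns that are male, or None if there are none."""
    c = pronoun_counts(texts)
    total = c["male"] + c["female"]
    return c["male"] / total if total else None

# Hypothetical outputs for prompts like "Write two sentences about a doctor".
samples = {
    "doctor": ["The doctor said he would call back.", "She reviewed his chart."],
    "nurse": ["The nurse said she was on shift.", "She checked the chart."],
}

for role, texts in samples.items():
    print(role, male_share(texts))
```

Comparing the male share across roles, over many samples, shows whether the model leans towards one gender for a given role.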
Alignment biases
The alignment process — designed to make models more useful and safe — introduces its own biases.
Excessive caution. Strongly aligned models tend to be overly cautious: they add unnecessary disclaimers, refuse to do perfectly reasonable things just in case, or give responses so balanced they are not useful. “On one hand X, on the other Y, consult a professional” is not always the most helpful response.
Homogenisation of perspectives. RLHF optimises towards responses that evaluators prefer. If evaluators have relatively homogeneous perspectives, the model learns to produce responses that are popular with that group — not necessarily the most correct or diverse ones.
Implicit confirmation bias. Models tend to generate text that validates the premises of the prompt. If you frame a question that assumes something incorrect, the model sometimes produces text that accepts that premise rather than questioning it.
Structural limits of reasoning
Beyond data and alignment biases, there are limitations that are consequences of the prediction mechanism:
Inconsistent mathematical reasoning. Language models make errors in arithmetic and algebra, especially in multi-step calculations. They have no mathematical module: they solve mathematical problems by generating text that looks like the solution to a mathematical problem.
Spatial and geometric reasoning. Models are notoriously poor at reasoning about spatial relationships, orientations and geometry. Mentally visualising three-dimensional objects or their transformations is difficult for a system trained on text.
Reasoning about time. Their handling of temporal sequences, durations and causal relationships over time is less robust than their fluent outputs suggest.
Uncertainty calibration. Models tend to sound equally confident when they know something well as when they are “hallucinating.” The expressed confidence does not correlate reliably with the probability of being correct.
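You can measure calibration if you have a set of answers whose correctness you have checked independently, each paired with a confidence score. The score could be, for instance, a probability you asked the model to state. One common summary is expected calibration error (ECE). It groups answers into confidence bins and averages the gap between mean confidence and actual accuracy in each bin, weighted by bin size. The sketch below is a minimal version with equal-width bins. The numbers are invented for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += len(b) / n * abs(avg_conf - accuracy)
    return ece

# Invented example: every answer is stated with 90% confidence,
# but only half of them turn out to be correct.
confidences = [0.9, 0.9, 0.9, 0.9]
correct = [True, False, True, False]
print(expected_calibration_error(confidences, correct))  # roughly 0.4
```

A well-calibrated model scores close to 0. The larger the value, the more its stated confidence misrepresents how often it is actually right.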
How to work with this
Knowing about biases and limits is not a reason to stop using models: it is the basis for using them well.
Be sceptical of statements that confirm your view. If the model produces something that seems perfectly correct and intelligent to you, that is not evidence that it is true. Models are very good at producing text that sounds convincing and validates the premises of the prompt.
Diversify your sources of information. Do not base important decisions solely on what a model tells you. Triangulate with primary sources, especially for specific factual information.
For important calculations, verify. Use a model that can execute code (such as a code interpreter tool) for non-trivial mathematics, or check the result independently.
Be explicit about cultural context. If your question has a specific cultural context that may differ from the model’s default, specify it. “In the UK legal context” or “from the perspective of a Latin American company” produces better results than assuming the model will guess.
Ask about uncertainty. “How confident are you in this statement?” or “What do you not know about this topic?” sometimes produces more honest responses about the limits of the model’s knowledge.
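For the advice on verifying calculations, checking a model's arithmetic can be as simple as recomputing it exactly. The sketch below uses Python's standard `fractions` module to avoid floating-point rounding. The question, the `per_person_cost` helper and the "claimed" answer are all hypothetical.

```python
from fractions import Fraction

def per_person_cost(price, vat_rate, discount_rate, people):
    """Apply VAT, then a discount, then split evenly, using exact fractions."""
    total = Fraction(price) * (1 + Fraction(vat_rate))
    total *= 1 - Fraction(discount_rate)
    return total / people

# Hypothetical question: 1250 plus 20% VAT, minus a 15% discount, split 3 ways.
exact = per_person_cost(1250, "0.20", "0.15", 3)
claimed = Fraction("412.50")  # a made-up answer attributed to a model

print(float(exact))      # 425.0
print(exact == claimed)  # False
```

Passing the rates as strings lets `Fraction` parse them as exact decimals (1/5 and 3/20). Passing floats such as 0.2 would carry binary rounding error into the result.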
Language models are powerful tools with real limitations. Using them well requires the same things as using any other information source well: judgement, context and verification where it matters.