The history of artificial intelligence is a history of extraordinary promises, deep disappointments, and unexpected resurgences. Understanding it is not an exercise in nostalgia: it is the fastest way to calibrate what is real in the present moment and what might be another cycle of optimism preceding another winter.
The first wave: symbolic optimism
AI as an academic field was born in 1956 at the Dartmouth Workshop, where John McCarthy, Marvin Minsky and other researchers gathered with the conviction that human intelligence could be described with sufficient precision for a machine to simulate it.
The dominant approach for the following two decades was symbolic AI: the idea that intelligence consisted of manipulating symbols according to logical rules. Expert systems encoded human knowledge in rules of the form “if the patient has high fever AND persistent cough AND has travelled recently, then consider diagnosis X.”
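To make the rule format concrete, here is a minimal sketch of how such a system might evaluate the rule quoted above. The names (Rule, RULES, infer) and the fact labels are hypothetical, chosen for illustration; real expert systems of the period used far richer rule languages and inference engines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    conditions: frozenset  # facts that must ALL be present (logical AND)
    conclusion: str


# Hypothetical rule base containing only the example rule from the text.
RULES = [
    Rule(
        conditions=frozenset({"high_fever", "persistent_cough", "recent_travel"}),
        conclusion="consider diagnosis X",
    ),
]


def infer(facts):
    """Return the conclusion of every rule whose conditions are all in `facts`."""
    facts = set(facts)
    return [rule.conclusion for rule in RULES if rule.conditions <= facts]


# All three conditions hold, so the rule fires.
print(infer({"high_fever", "persistent_cough", "recent_travel"}))

# One condition is missing, so nothing fires: there is no notion of a partial match.
print(infer({"high_fever", "persistent_cough"}))
```

The second call returns an empty list. A system built this way has nothing to say about any case its authors did not anticipate with an explicit rule.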
Initial results were promising. Programs like ELIZA (1966) simulated conversation; the Logic Theorist (1956) proved mathematical theorems; DENDRAL identified molecular structures. Optimism was immense. In 1965, Herbert Simon predicted that “machines will be capable, within twenty years, of doing any work a man can do.”
It did not happen.
The first winter
Symbolic AI had a fundamental problem: scaling. Writing explicit rules for simple domains worked. Writing rules for the complexity and ambiguity of natural language, visual perception, or common-sense reasoning turned out to be impossible.
Expert systems did not know what they did not know. When they encountered a situation outside their rules, they failed in spectacular and ungraceful ways. Real intelligence is tolerant of ambiguity; symbolic AI was not.
By the mid-1970s, research funding had been cut in the US and UK following a series of critical reports, among them the UK's 1973 Lighthill report, which concluded that progress was far slower than promised. This was the first “AI winter”: a period of reduced expectations, scarce funding and academic disrepute.
The second wave: machine learning
The field resurfaced in the 1980s with a radically different approach: instead of programming rules, let machines learn them from data. The idea was not new, but this was when machine learning came into its own as a discipline.
Neural networks, which had been conceived in the 1950s but had fallen out of favour, returned when the backpropagation algorithm was popularised in 1986, finally making it practical to train networks with more than one layer.
Over the following years, other approaches emerged alongside them: decision trees, support vector machines (SVMs), random forests, gradient boosting. Machine learning became an applied discipline with concrete results in spam filtering, speech recognition, recommendations and fraud detection.
The second winter and deep learning
The neural networks of the 1980s and 1990s had a problem: they did not scale well. With few layers, they could not capture complex patterns. With more layers, training became unstable — the vanishing gradient problem.
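A rough numerical sketch shows why depth made training unstable. It assumes sigmoid activations and ignores the weights entirely, which is a deliberate simplification for illustration, not a description of any particular network. During backpropagation the error signal is multiplied by one activation derivative per layer, and the sigmoid's derivative never exceeds 0.25.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x == 0


def gradient_scale(num_layers, pre_activation=0.0):
    """Factor by which the error signal shrinks after passing back through
    `num_layers` sigmoid layers (weights ignored: simplifying assumption)."""
    return sigmoid_derivative(pre_activation) ** num_layers


# Even in the best case (derivative at its maximum), the signal decays
# geometrically with depth.
for depth in (2, 5, 10, 20):
    print(depth, gradient_scale(depth))
```

Under these assumptions the signal reaching the first layers of a ten-layer network is less than a millionth of its original size, so those layers barely learn.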
In the late 1990s and early 2000s, SVMs and other methods systematically outperformed neural networks. The approach lost funding and credibility. A second winter.
What saved neural networks was the persistence of a few researchers, among them Geoff Hinton, Yann LeCun and Yoshua Bengio, and a series of technical innovations that resolved the training problems. In 2012, AlexNet won the ImageNet image recognition competition by an unprecedented margin, using a deep neural network trained on GPUs. That was the moment the field changed direction.
From 2012 to 2017, progress was dizzying: deep learning dominated image recognition, speech, machine translation and dozens of other tasks.
The third wave: transformers
In 2017, Google researchers published “Attention Is All You Need.” The paper proposed a new architecture — the transformer — based on an attention mechanism that allows the model to relate any element of a sequence to any other, regardless of the distance between them.
That was exactly what language needed. Words have semantic relationships at long distances: “the bank where I keep my money” and “the bank where I sat down” require understanding context to disambiguate, and the relevant context can be many words away. Transformers handled this well. Earlier recurrent models struggled as the distance grew.
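The core of the attention mechanism can be sketched in a few lines. This is a minimal, pure-Python version of scaled dot-product attention under simplifying assumptions: the same toy vectors serve as queries, keys and values, whereas a real transformer derives each from learned projections and runs many attention heads in parallel. The function names and the toy numbers are illustrative only.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this position against EVERY position, near or far.
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors.
        out = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy sequence of 5 positions (made-up numbers): the first and last are similar.
x = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = attention(x, x, x)
print(weights[0])  # how strongly position 0 attends to each position
```

In this toy input, position 0 attends to position 4 exactly as strongly as to itself, and more strongly than to its immediate neighbour. The weight depends on similarity, not on how far apart the positions are.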
What followed was a sequence of increasingly large and capable models: BERT (2018), GPT-2 (2019), GPT-3 (2020), and then the explosion of 2022–2024 with ChatGPT, GPT-4, Claude, Gemini and dozens of open-source models.
Scale mattered. Unlike what had happened with previous architectures, scaling transformers (more parameters, more data, more computation) kept producing improvements. The “scaling laws” published by OpenAI researchers in 2020 described a predictable power-law relationship between model size, dataset size, training compute and the model's loss. That triggered an unprecedented investment race.
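The predictable relationship is commonly written as a power law in the number of parameters N, of the form L(N) = (N_c / N)^alpha. The sketch below uses that form with placeholder constants: the values of n_c and alpha are illustrative assumptions, not fitted results from any paper, and the function name is hypothetical.

```python
def predicted_loss(num_params, n_c=1e14, alpha=0.08):
    """Power-law form L(N) = (N_c / N) ** alpha.

    n_c and alpha are illustrative placeholders, not published fits.
    """
    return (n_c / num_params) ** alpha


for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} parameters -> predicted loss {predicted_loss(n):.3f}")

# Under a power law, every 10x increase in size multiplies the loss by the
# same constant factor, 10 ** -alpha, regardless of the starting point.
ratio_small = predicted_loss(1e9) / predicted_loss(1e8)
ratio_large = predicted_loss(1e11) / predicted_loss(1e10)
print(ratio_small, ratio_large)
```

The constant ratio is what made the relationship useful for planning: given a few small training runs, one could estimate what a much larger model would achieve before paying for it.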
Why history matters
The history of AI teaches two things that are directly useful today.
First, that the field has gone through cycles of euphoria and disillusionment before, and that each time enthusiasm temporarily outpaced results. That does not mean the current moment is an illusion, but it does mean it is worth calibrating the most ambitious promises carefully.
Second, that real progress came when fundamental assumptions changed: from explicit rules to learning from data, from limited architectures to transformers. Breakthroughs in AI are not incremental — they are paradigm shifts. And when they come, they come fast.
We are at the beginning of the third wave. What characterises it is not just that models are bigger or faster than their predecessors. It is that they have crossed qualitative thresholds that make possible applications that were previously science fiction. The challenge now is not technical: it is understanding how to use well what already exists.