How we got to generative AI

From the origins to the Turing test

Interest in creating intelligent machines dates back at least two centuries, to Charles Babbage's analytical engine and the related notes of Ada Lovelace. After a period of relative torpor, it was with Alan Turing that the topic regained interest among scholars. It is no coincidence that even today the Turing test is used to evaluate the ability of a machine or program to think. Devised in 1950 and presented in one of the most cited essays in the history of computing, Computing Machinery and Intelligence, published in the journal Mind, the test proposes a simple and ingenious criterion:

a machine can be said to be "thinking" if, when it answers questions posed by an observer, the observer is unable to distinguish it from a human being.
From expert systems to machine learning

In the 1960s and 1970s, artificial intelligence developed as a branch of computer science aimed at creating logic machines capable of making correct inferences from both a syntactic and a semantic point of view. This trend culminated in the birth of expert systems: programs capable of drawing on a knowledge base and inferring new, true knowledge thanks to propositional logic and predicate calculus.

The results, however, quickly proved disappointing. Machines showed a certain rigidity and difficulty adapting to new situations, leading to a temporary decline in interest in AI.

In the years that followed, increasing computing power allowed the development of programs capable of excelling at specific tasks—sometimes better than humans. Two prime examples: Deep Blue, the IBM supercomputer that defeated world chess champion Garry Kasparov in 1997, and AlphaGo, developed by Google DeepMind, the first program capable (in 2015) of beating a professional human player at the game of Go.

However, these are specialized machines, excellent in a single area but incapable of generalizing or sustaining a natural conversation: in other words, they would not pass the Turing test.

It was during this period that neural networks returned to the fore, with the development of machine learning and, subsequently, deep learning, in an attempt to more closely mimic some aspects of human intelligence.

The Transformer's Turning Point

The real revolution came in 2017 with the publication of an article destined to change the landscape of artificial intelligence forever: Attention Is All You Need, authored by a group of Google researchers.
The article introduced a new type of neural network, the Transformer, which would become the engine of modern Large Language Models (LLMs), like the ones behind ChatGPT.

A Transformer is a program that can predict the next word based on an input (prompt) provided by the user. Similar models were subsequently developed for other areas: images, sounds, and computer code.
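This next-word mechanism can be illustrated with a deliberately tiny sketch. The probability table and the greedy decoding loop below are invented for the example and bear no relation to how a real LLM stores its knowledge; they only show the autoregressive idea of repeatedly appending the most likely continuation:

```python
# Toy illustration (not a real model): autoregressive next-word prediction.
# The probability table is invented for the example.
next_word_probs = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 1.0},
}

def generate(prompt, steps):
    """Repeatedly append the most probable next word (greedy decoding)."""
    words = prompt.split()
    for _ in range(steps):
        candidates = next_word_probs.get(words[-1])
        if not candidates:
            break  # no known continuation for the last word
        words.append(max(candidates, key=candidates.get))
    return " ".join(words)

print(generate("the", 3))  # → "the cat sat down"
```

Real models sample from the probability distribution rather than always taking the top word, which is one reason the same prompt can produce different answers.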

From static embedding to context

Although Transformers implement extremely complex and sophisticated statistical models, we can try to understand their fundamental principles, focusing on those that generate text.

The first step is to transform every word (in fact, every digital token) into a numeric vector with many dimensions.

In models like ChatGPT, each token is represented by a vector with over 12,000 dimensions: this means that each word is described by more than 12,000 numbers, each of which represents a different semantic feature.
For example, the word "king" may have a high value along the dimension associated with a concept such as "power" (e.g. 1 on a scale of 0 to 1) and significantly lower values on other dimensions such as "poverty" or "submission".

This process is called static embedding and consists of training the machine by providing it with a large number of examples: texts taken from the web, books, encyclopedias and other written material in digital format.
During training, the words are connected to each other like the nodes of a semantic network, with parameters that express the numerical strength of the bond between them. For example, "king" and "crown" will have a high correlation value, while "king" and "tail" a very low one.
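The "strength of the bond" between two embedded words is commonly measured with cosine similarity. The three-dimensional vectors below are invented for illustration (real models use thousands of dimensions), but the computation is the standard one:

```python
import math

# Invented 3-dimensional "embeddings" for illustration only.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "crown": [0.85, 0.75, 0.2],
    "tail":  [0.1, 0.05, 0.9],
}

def cosine_similarity(a, b):
    """How aligned two vectors are: 1 = same direction, 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["king"], embeddings["crown"]))  # close to 1
print(cosine_similarity(embeddings["king"], embeddings["tail"]))   # much lower
```

Words that appear in similar contexts end up with similar vectors, so "king" and "crown" score high while "king" and "tail" score low.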

Models like ChatGPT-5 probably have hundreds of billions of these parameters, although the exact figure is not public.

The mechanism of attention

At this point we have a static model of the language. However, the meaning of words changes radically depending on their position, the sentence and, more generally, the context.
This is where the attention mechanism comes in, which makes it possible to obtain a contextual embedding: a mathematical representation of the word that takes into account the context in which it appears.

In practice, the values of the vector representing a word (derived from the static embedding) are modified based on the parameters that connect it to other words in the text. Each word "attends to" the others, with different intensity, through the weights (attention scores) derived from the Q, K and V matrices. These matrices are learned during the training phase, which can be supervised or unsupervised.

The result is a set of attention scores which indicate, in terms of probability, how relevant each word is with respect to all the others.
The procedure is repeated dozens of times (up to 96 in ChatGPT models), progressively refining the representation.
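A minimal sketch of one such attention step, in pure Python with invented two-dimensional token vectors (real models use large matrices and thousands of dimensions): each query is scored against every key, the scores are turned into probabilities with a softmax, and the value vectors are mixed accordingly.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention for a tiny sequence of vectors."""
    d = len(K[0])
    out = []
    for q in Q:
        # Score each key against this query, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)  # the "attention scores" of the text
        # Mix the value vectors according to those weights.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Three 2-dimensional token vectors; the numbers are invented.
Q = K = V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention(Q, K, V))
```

Each output vector is a context-aware blend of all the input vectors, which is exactly what "contextual embedding" means in this setting.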

Finally, the result of this process is passed through a further neural network capable of grasping nonlinear correlations: the nuances of the language, we might say.

The final product of this architecture, called the encoder, is a set of contextually enriched embeddings, that is, vectors semantically enriched by context.
These are then sent to another architecture, the decoder, which applies a similar process to predict the next word, thus generating text in a coherent and fluid way.

Conclusion: Do machines really think?

Transformers are extraordinarily effective and precise, but do they really think?
Or do they simply perform statistical calculations without understanding anything about what they produce?

If by "understanding" we mean being self-aware, we can confidently say that no artificial intelligence has consciousness.

On the other hand, if we adopt a behaviorist point of view, the responses of generative AI are perfectly compatible with what, in functional terms, we call thinking.

Ultimately, we cannot rule out that even the human brain operates similarly to a generative model, albeit with emergent qualities like self-awareness.

Transformers, in fact, also exhibit emergent phenomena: they produce responses and inferences that were not explicitly foreseen during training.

However, artificial intelligence, at least for now, does not have sensory inputs comparable to human ones, and experience shows us that intelligence arises from the interaction between cognition and perception.
It also lacks emotions and feelings, essential components of human experience and thought.

In conclusion, we can say that intelligent systems do not really think, not yet at least, and they certainly do not pass the Turing test. But nothing prevents us from imagining that, just as human intelligence emerged from the organization of biological matter, a similar or even superior form of intelligence could one day emerge from a silicon substrate.

PS This text was written by me (PRF) and revised with the help of ChatGPT 5


We thank PAOLO RICCARDO FELICIOLI for his contribution
