From chess-playing robots to a future beyond our control: Prof Stefan Leijnen discusses the challenges of AI and its evolution
Stefan Leijnen’s experience spans cutting-edge AI research and public policy. As a professor in applied AI, he focuses on machine learning, generative AI, and the emergence of artificial sentience. As the lead for EU affairs for the Dutch National Growth Fund project AiNed, Stefan plays a pivotal role in defining public policy that promotes innovation while protecting citizens’ rights.
In this captivating presentation, Stefan takes us on a journey through 250 years of artificial intelligence, from a chess-playing robot in 1770 to the modern complexities of machine learning. With thought-provoking anecdotes he draws parallels between the past and the ethical challenges we face today. As the lead in EU AI policy, Stefan unpacks how AI is reshaping industries, from Netflix’s algorithms to self-driving cars, and why we need to prepare for its profound societal impacts.
Below we summarize his insights as a series of lessons and share the full video presentation and its transcript.
Prepare to rethink what you know about AI and its future.
Lesson 1. AI behaviour is underpinned by models that cannot be understood by humans
Ten years ago, Netflix asked the following question: “How do we know that the categories that we designed with our limited capacity as humans are the best predictors for your viewing preferences? We categorize people according to those labels, but those might not be the best labels. So let’s reverse engineer things: now that we have all this data, let’s decide what categories are the best predictors for our viewership.” And they did that. And so they came up with 50 dimensions or 50 labels, all generated by the computer, by AI. And 20 of them made a lot of sense: gender, age, etc. But for 30 of those 50 labels, you could not identify the category. That means that the machine uncovered a quality among people that we don’t have a word for. For Netflix this was great because it meant they now had 30 more predictors. But on the other hand, it’s a huge problem. Because now if you want to change something in those labels or you want to change something in the way that you use the model, you no longer understand what you’re dealing with.
Watch the video clip.
Lesson 2. AI’s versatility can lead to hidden – and very hard – ethical problems
Let’s say the camera of a self-driving car spots something and there’s a 99% chance that it is just a leaf blowing by, and a 1% chance that it’s a child crossing the street. Do you brake? Of course, you would brake at a 1% chance. But now let’s lower the chance to 0.1% or 0.01%. At what point do you decide to brake?
The point, of course, is that we never make that decision as humans. But with rule-based programs, you have to make that decision. So it becomes an ethical problem. And these kinds of ethical problems are much more difficult to solve than technological problems. Because who’s going to answer that? Who’s going to give you this number? It’s not the programmer. The programmer will go to the manager or to the CEO and they will go to the legal division or to the insurer or to the legislator. And nobody’s willing to provide an answer. For moral reasons (and for insurance reasons), it’s very difficult to solve this problem. Now, of course, nowadays there’s a different approach: just gather the data and calculate the probability of braking that humans have. But in doing so, you have swept the ethical challenge under the rug. But it’s still there. So don’t get fooled by those strategies.
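To make the dilemma concrete, here is a minimal sketch of what such a rule might look like in code. It is purely illustrative: the function name should_brake and the 0.1% threshold are assumptions made for this example, not values from the talk or from any real self-driving system. The point is that someone has to type a number into that first line.

```python
# Hypothetical rule-based braking decision. The threshold is an
# illustrative assumption, not a figure from the talk or a real system.
BRAKE_THRESHOLD = 0.001  # brake if P(child) is at least 0.1%


def should_brake(p_child: float, threshold: float = BRAKE_THRESHOLD) -> bool:
    """Return True when the estimated probability of a child meets the threshold."""
    if not 0.0 <= p_child <= 1.0:
        raise ValueError("p_child must be a probability between 0 and 1")
    return p_child >= threshold


# The three probabilities mentioned in the example: 1%, 0.1%, 0.01%.
for p in (0.01, 0.001, 0.0001):
    print(f"P(child) = {p}: brake = {should_brake(p)}")
```

The code itself is trivial; choosing BRAKE_THRESHOLD is not. Replacing the constant with a value estimated from human driving data changes where the number comes from, but the number is still there.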
Watch the video clip.
Lesson 3. AI’s impact is unpredictable. But its impact won’t be just technological. It will be societal, economic, and likely political
There are other systems technologies like AI. We have the computer, we have the internet, we have the steam engine and electricity. And if you think about the steam engine, when it was first discovered, nobody had a clue of the implications of this technology 10, 20 or 30 years down the line. The first steam engines were used to automate factories. So instead of people working on benches close to each other, the whole workforce was designed along the axis of this steam engine so everything would be mechanically automated. This meant a lot of changes to the workforce. It meant that work could go for hours on end, even in the evenings and on weekends. That led to a lot of societal changes. So labor forces emerged, you had unions, you had new ideologies popping up. The steam engine also became a lot smaller. You got the steam engine on railways. Railways meant completely different ways of warfare, economy, diplomacy. The world got a lot smaller. This all happened in the time span of several decades. We will see similar effects that are completely unpredictable as AI gets rolled out in the next couple of decades. Most of these effects of the steam engine were not technological. They were societal, economic, sometimes political. So it’s also good to be aware of this when it comes to AI.
Watch the video clip.
Lesson 4. The interfaces with AI will evolve in ways we do not yet anticipate
The AI that we know now is very primitive. Because what we see today in AI is a very old interface. With ChatGPT, it’s a text command prompt. When the first car was invented, it was a horseless carriage. When the first TV was invented, it was essentially radio programming with an image glued on top of it. Now, for most of you who have been following the news, you already see that the interfaces are developing very rapidly. So you’ll get voice interfaces, you’ll get a lot more personalization with AI. This is a clear trend.
Watch the video clip.
Watch the full video.
Full transcript.
The transcript has been edited for brevity and clarity.
Prof Leijnen: Does anybody recognize this robot that I have behind me? Not many people. Well, that’s not surprising because it’s a very old robot. This robot was built in the year 1770, so over 250 years ago. And this robot can play chess.
And it was not just any chess-playing robot. It was actually an excellent chess-playing robot. It won most games. And as you can imagine at that time, it was a celebrity. This robot played against Benjamin Franklin. It played against emperors and kings. It also played a game against Napoleon Bonaparte. We know what happened because there were witnesses there. In fact, Napoleon, being the smart man that he was, decided to play a move in chess that’s not allowed just to see the robot’s reaction. What the robot did is it took the piece that Napoleon moved and put it back in its original position.
Napoleon, being inventive, made the same illegal move again. Then the robot took the piece and put it beside the board as though it was no longer in the game. Napoleon tried a third time and then the robot wiped all the pieces off the board and decided that the game was over, to the amusement of the spectators. Then they set up the pieces again, played another game and Napoleon lost. Well, to me, this is really intelligence.
You might think or not think that this is artificial intelligence. If you think this is not artificial, you’re right. Because there’s this little cabinet which has magnets and strings, and there was a very small person who could fit inside this cabinet and who was very good at playing chess. And of course, he is the person who played the game. People only found out about 80 years after the fact when the plans were revealed by the son of von Kempelen.
Now, there was another person who played against this robot and his name was Charles Babbage. And Charles Babbage is the inventor of this machine, the analytical engine. And it’s considered by many to be the first computer in the world. It was able to calculate logarithms. Interestingly, Babbage played against the robot that you just saw. He also lost in 18 turns. But I like to imagine that Babbage must have been thinking how does this robot work, what’s going on inside.
As some of you may know, a computer actually beat the world champion in chess, Garry Kasparov, in 1997. So you could say this story spanning 250 years has a nice arc. Because now we do have AI that can play and win chess, which was the original point of the chess-playing robots. So we’re done. We’re actually done with AI. We have AI now. The future is here. But at the same time, we’re not done at all. Because now we have this AI and we don’t know how to use it. And we don’t know how to develop it further.
There is a nice example from Netflix, the streaming company. They collect a lot of data. They have your age, gender, postal code, maybe your income. They know things about you. And then based on those categories, they try to predict with machine learning what type of series and movies you like to watch. And this is essentially their business model. Now, 10 years ago they asked the following question: “How do we know that the categories that we designed with our limited capacity as humans are the best predictors for your viewing preferences? We categorize people according to those labels, but those might not be the best labels. So let’s reverse engineer this machine learning algorithm. And now that we have all this data, let’s decide what categories are the best predictors for your viewership.”
And they did that. And they came up with 50 dimensions or 50 labels that you could attach to a viewer, all generated by the computer, by AI. And 20 of them made a lot of sense. So you would see men here, women there. You would see an age distribution. And there were very clear preferences in viewership. Of course, not completely uniform, but you could identify the categories and you could attach a label to them.
Now for 30 of those 50 labels, you could not identify the category. For example, for one of the 30 categories, you would see people on the left side, people on the right side. On the left side, they had a strong preference for the movie “American Beauty.” And on the right side, they had a strong preference for X on the beach. And nobody had any clue what discerned the group on the left from the group on the right. So that means that there was a quality in those groups of people that we don’t have a word for. We don’t know how to understand that. Which for Netflix was great because it means they had now 30 more predictors they could use to do good predictions. But on the other hand, it’s a huge problem. Because now if you want to change something in those labels or you want to change something in the way that you use the model, you no longer understand what you’re dealing with.
And this is essentially the topic of what I am talking about today. How do you manage something you can’t comprehend? Because essentially that’s what AI is. And this is not just a problem for companies implementing AI. We all know plenty of examples of AI going wrong. And when it goes wrong, it tends to go wrong quite deeply. Like in this case, if you ask AI to provide you with an image of a salmon, the AI is not wrong. Statistically speaking, it is the most likely image of a salmon you’ll find on the internet. But of course we know that this is not what was expected. And this is not just a bug. It’s a feature of AI.
I teach AI, I teach students how to program AI systems and machine learning systems. If I ask my students to come up with a program, without using machine learning or AI, that filters out the dogs from the cakes in these kinds of images, it will be very difficult because it’s very hard to come up with a rule set that can discern A from B. At the same time, we know that for AI, machine learning, this is a very easy task. And that’s because the AI programs itself. Or in other words, AI can come up with a model that is so complex that we don’t understand how it works anymore, but it still produces the outcome that we’re looking for. In this case, a category classifier.
And that’s great because those very complex models allow us to build systems of infinite complexity. There’s no boundary to the complexity of the model. Just the data that you use, the computing power that you use, the fitness function that you use, but those things we can collect. But it’s also terrible because we don’t know how to deal with this complexity anymore. It’s beyond our human comprehension.
Now, about 12, 13 years ago, I was in Mountain View at Google. They had a self-driving car division. The head of their self-driving car division explained the following problem to us. He said: “We have to deal with all kinds of technical challenges. But what do you think our most difficult challenge is?” Now, this was a room full of engineers. So they said, “Well, the steering or locating yourself on the street or how do I do image segmentation?” He said, “No, you’re all wrong. Those are all technical problems that can be solved. There’s a much deeper underlying problem here. And that’s the problem of when do I brake?”
Let’s say the camera spots something and there’s a 99% chance that it’s a leaf blowing by. And there’s a 1% chance that it’s a child crossing the street. Do you brake? Well, of course, you would brake at a 1% chance. But now we lower the chance to 0.1% or 0.01%. At what point do you decide to brake? The point, of course, is that we never make that decision as humans. But when you program a system like that, you have to make a decision because it’s rule-based. So you have to say: if the probability is above a certain value, then I brake. So it becomes an ethical problem. And these kinds of ethical problems are much more difficult to solve than technological problems. Because who’s going to answer that? Who’s going to give you this number? It’s not the programmer. The programmer will go to the manager or to the CEO and they will go to the legal division or to the insurer or to the legislator. And nobody’s willing to provide an answer. For moral reasons, also for insurance reasons, it’s very difficult to solve this problem, he said. Now, nowadays they have a different approach. They just gather the data and they say based on this data, this is the probability of braking that humans have. And so they swept the ethical challenge under the rug. But it’s still there. Don’t get fooled by those strategies.
The examples I showed, of Netflix and Google, are from tech companies. But you see it everywhere. We also know that AI is going to play a major role in healthcare in the future. Not just in medicine, but also in caring for the elderly, for monitoring, for prevention, etc. This raises lots of ethical questions. Is this desirable? Here we see a woman who needs care. There’s no care for her. This is from the documentary “Still Alice”. And there’s this robot companion taking care of her, mental care. Is this what we want or is this not what we want? Again, it’s not a technical question. It’s a moral question.
In the last 10 to 15 years, AI has moved from the lab to society, and that will continue in the foreseeable future. ChatGPT is adopted at a much higher rate than most companies know. A large percentage of employees use ChatGPT. But if you ask company CEOs, they probably mention a lower number because many employees use these kinds of tools without the company knowing it. We know that a majority of jobs in the Netherlands will be affected by AI, either by full or partial displacement, or by AI complementing their work. And we also know that there are enormous opportunities in terms of automation. So on the one hand, it’s very difficult to manage such technology, not just its bugs, but its intrinsic properties. On the other hand, it provides enormous promises for problems that we don’t know how else to solve.
So it’s wise to take a step back and think more deeply and more long-term about the effects of this technology – before we start thinking about how to innovate and how to regulate the technology. What helps us is looking back a little bit. There are other systems technologies like AI. We have the computer, we have the internet, we have the steam engine and electricity. And if you think about the steam engine, when it was first discovered, nobody had a clue of the implications of this technology 10, 20 or 30 years down the line. The first steam engines were used to automate factories. Instead of people working on benches close to each other so they could talk, the whole workforce was designed along the axis of this steam engine so everything would be mechanically automated. This meant a lot of changes to the workforce. It meant that work could go on hour after hour, even in the evenings and on weekends, because now you have this machine, and you want to keep using it. That led to societal changes. You had labor forces, you had unions, you had new ideologies popping up. The steam engine also became a lot smaller. You got the steam engine on railways. Railways meant completely different ways of warfare, economy, diplomacy. The world got a lot smaller. This all happened in the time span of several decades, and we will see similar effects that are completely unpredictable as AI gets rolled out in the next couple of decades. Most of these effects of the steam engine were not technological. They were societal, economic, sometimes political. So it’s also good to be aware of this when it comes to AI.
A second element of AI is the speed at which it develops. I’ve been giving talks about artificial creativity for about 10 years now, and 8 to 10 years ago it was very easy for me to create a talk. I could just show people this image and then I would say, this cat does not exist, and people would be taken aback. This was the highlight of my presentation. Now I show you this image and nobody raises an eyebrow. And then two years later, I had to show this image. Again, I see no reaction from you. I don’t expect any reaction, by the way. But it shows just how fast it goes and how quickly we adopt and get used to these kinds of tools. And it also raises the question: given what was achieved in the past 8 years, where will we be 25 years from now? You can, of course, apply AI in completely new fields, similar to what was done with the steam engine: creating new materials, coming up with new inventions, new types of engineering. We already know that AI has a major role to play in creativity and in coding.
We also know that the AI that we know now is very primitive. I’ll be giving another speech in 10 years and the audience won’t be taken aback by anything. Because what we see today is AI with a very old interface. With ChatGPT, it’s the interface of the internet [i.e. a text command prompt], of the previous system technology. And that’s always been the case. When the first car was invented, it was a horseless carriage. When the first TV was invented, it was essentially radio programming with an image added to it. And now what we have with AI is an internet browser with an AI model behind it. And for most of you who have been following the news, you see that the interfaces are developing very rapidly. You’ll get voice interfaces, you’ll get a lot more personalization with AI. This is also a trend that’s very clear.
So we talked a bit about the history, we talked a bit about the pace of development, about the complexity being a feature, not a bug. We can also dive a little bit more into the technology itself. This is a graph I designed five years ago with a student of mine. It’s called the Neural Network Zoo. And what you see is from the top left, all the way to the bottom right, is the evolution of neural network architectures. Interestingly, at the bottom right, this is called the transformer architecture. Essentially, the evolution stopped and most AI that you hear about nowadays, most AI developed at Microsoft and Google and OpenAI and others are based on this transformer architecture. So there was this Cambrian explosion of architectures, and then suddenly it converged.
Until five years ago or so, AI models were proliferating. Now they’re also converging. Nowadays, we talk about OpenAI’s GPT, we talk about Google’s Gemini, we talk about Meta’s Llama, Mistral. There aren’t that many models. So not just the technology has been locking in, but the models themselves as well. So you see a huge convergence into only a very limited set of players and models. And this is of course due to the scaling laws. It becomes very difficult to play in this game. But it’s very interesting that on the one hand you have a convergence to a limited set of models in a limited set of companies. And on the other hand, you have this emergence of new functionalities coming out of these large-scale models. So they surprise us all the time, but there’s only a very limited set of models that are able to surprise us. And these developments, these trends, all inform the way that we regulate this technology.
This is currently how the European Union thinks about regulating this technology. You have four categories. (1) A minimal risk category where there’s not much or hardly any legislation. (2) A limited risk. For example, if I interact with a chatbot, I have to know I’m interacting with a chatbot and not a human. The AI has to be transparent. (3) A high risk category, where there will be all kinds of ethical checks around, let’s say toys or healthcare or anything that has a real risk for consumers or citizens or society. (4) Unacceptable risk, which is AI systems that can subconsciously influence you, do social scoring, etc. Those will all be forbidden under new legislation (the EU AI Act).
I’ll end the presentation with this final quote, because I think this is essentially where we are right now: “The real problem of humanity is the following: we have paleolithic emotions, medieval institutions and with AI, god-like technology.” (E.O. Wilson).
The 2024 Nobel Prizes in Physics and Chemistry put the spotlight on AI. While the Physics laureates, John Hopfield and Geoffrey Hinton, contributed to its theoretical foundations, two of the three Chemistry laureates – specifically, Demis Hassabis and John Jumper – were rewarded for putting it into use.
John Hopfield developed the Hopfield network in 1982, a form of recurrent artificial neural network that can store and retrieve patterns, mimicking how human memory works. It operates by processing and recognizing patterns even when presented with incomplete or distorted data. His work was significant because it helped bridge the gap between biology and computer science, showing how computational systems could simulate the way the human brain stores and retrieves information.
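For readers who want to see the idea at work, the sketch below is a minimal toy version of a Hopfield-style network in plain Python. It is an illustration under simplifying assumptions, not a reproduction of Hopfield’s original formulation: the function names train and recall, the two 8-element patterns, and the number of update sweeps are all choices made for this example. Patterns are stored with a Hebbian rule, then one pattern is recovered from a corrupted copy.

```python
def train(patterns):
    """Hebbian learning: accumulate pairwise products, keep the diagonal at zero."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, state, sweeps=5):
    """Asynchronous updates: each unit takes the sign of its weighted input."""
    s = list(state)
    for _ in range(sweeps):
        for i in range(len(s)):
            h = sum(w[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if h >= 0 else -1
    return s


# Two illustrative patterns of +1/-1 values (chosen to be orthogonal).
stored = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, 1, 1, -1, -1, -1, -1],
]
weights = train(stored)

noisy = list(stored[0])
noisy[0] = -noisy[0]  # distort the first element of the first pattern

print(recall(weights, noisy) == stored[0])  # prints True
```

The network is never told which element was corrupted. The stored patterns act as stable states, and the distorted input settles into the nearest one.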
Geoffrey Hinton co-invented the Boltzmann machine, a type of neural network that played an important role in understanding how networks can be trained to discover patterns in data. He also popularized the use of backpropagation, an algorithm for training multi-layer neural networks, which considerably improved their capacity to learn complex patterns. Hinton’s contributions ultimately led to AI systems like GPT (Generative Pre-trained Transformer), which underpins ChatGPT, and AlphaFold, the AI program that earned Demis Hassabis and John Jumper their Nobel Prize in Chemistry.
AlphaFold solved one of biology’s greatest challenges: accurately predicting the 3D structure of proteins from their amino acid sequences. This problem had stumped scientists for decades, as protein folding is essential to understanding how proteins function, which is crucial for drug discovery, disease research, and biotechnology. AlphaFold’s predictions were so accurate that they matched experimental results with near-perfect precision, revolutionizing the field of biology. This breakthrough has wide-ranging implications for medicine and has already begun to accelerate research into diseases, drug discovery, and bioengineering.
Towards AI-driven disruption of traditional business models
Beyond the world of academia and frontier research, the AI techniques developed by the 2024 laureates are permeating the business world too. For one, the capabilities to analyse, identify patterns, and make sense of vast datasets, particularly unstructured data, rely at least partially on them.
From supply chain optimization to consumer behaviour analysis, AI holds the promise of making data-driven decisions faster, and automating a growing range of tasks. Large companies have already launched initiatives to capitalize on this, with some notable successes. Witness the case of a telecom company that generated an ROI 2.5x higher than average thanks to the judicious use of AI; or the case of an energy provider that delivered savings for consumers while increasing its own revenues; or a supply chain example that minimized waste and lost sales, while reducing the need for manual intervention at store level. These cases are no exceptions. Increasingly, the deployment of advanced algorithms and data management techniques plays a central role in gaining competitive advantage.
Ultimately, AI’s ability to make sense of vast quantities of data will accelerate innovation and pave the way for new business models that will disrupt existing ones. From biotech to finance and manufacturing, the possibilities are endless, and all industries will eventually be impacted. More prosaically, the breakthroughs of the 2024 Nobel laureates herald an era when AI is not just a futuristic concept, but a key driver of competitiveness right now.
Technology and innovation expert Tom Goodwin on the merits of GenAI and how to leverage its potential.
During Rewire LIVE, we had the pleasure of hosting Tom Goodwin, a friend of Rewire and pragmatic futurist and transformation expert who advises Fortune 500 companies on emerging technologies such as GenAI. Over the past 20 years, he has studied the impact of new technology, new consumer behaviors and the changing rules of business, which makes him uniquely suited to understand the significance of GenAI today.
At the core of Tom’s thinking lies a question that all leaders should ponder: if, knowing everything you know now, you were to build your company from scratch, what would it look like? At times counter-intuitive, Tom’s insights, steeped in history, provide valuable clues to answer this question. In this article, we share a handful of them.
INSIGHT 1: Technology revolution happens in two stages. In the first stage we add to what was done before. In the second stage we rethink. That’s when the revolution really happens.
Tom’s insight is derived from the Perez Framework, developed by Carlota Perez, a scholar specialized in technology and socio-economic development. The framework – based on the analysis of all the major technological revolutions since the industrial revolution – stipulates that technological revolutions first go through an installation phase, then a deployment phase. In the installation phase, the technology comes to market and the supporting infrastructure is built. In the deployment phase, society fully adopts the technology. (The transition between the two phases is typically marked by a financial crash and a recovery.)
During the first phase, there’s a frenzy – not dissimilar to the hype that currently surrounds GenAI. Everyone jumps on the technology, everyone talks about it. However, nothing profound really changes. For the most part, the technology only adds to the existing ways of doing things. In contrast, during the second stage, people finally make sense of the technology and use it to rethink the way things are done. That’s when the value is unleashed.
Take electricity as an example. In the first stage, electricity brought the electric iron, the light, the fan, the oven. These were all things that existed before. In the second stage, truly revolutionary innovations emerged: the radio, the TV, the telephone, the microwave, the microwave dinner, factories that operate 24/7, and so on. The second stage required a completely different mindset vis-à-vis what could be done and how people would behave.
This raises the question: what will the second stage of GenAI – and more broadly AI – be? What will be the telephone, radio, microwave for AI? Tom’s assertion here is that the degree of transformation is less about how exciting the technology is and much more about how deeply you change. Better AI will be about systems that are completely rethought and deep integrations, rather than UI patches.
Watch the video clip.
INSIGHT 2: Having category expertise, knowing how to make money, having relationships, and having staff who really know what they’re doing is probably more important than technology expertise.
Across many industries, battle lines are drawn between large traditional companies that have been around for a long time and the digitally-enabled, tech-first, mobile-centric startup types. Think Airbnb vs. Marriott, Tesla vs. BMW, SpaceX vs. NASA, and so on.
The question is who’s going to win. Is it the digitally native companies that have created themselves for the modern era? Or is it the traditional companies that have been around for a long time? Put another way, is it easier to be a tech company and learn how to make money in your target industry? Or to be a big company that already knows how to make money but must now understand what a technology means and adapt accordingly?
Up until recently, the assumption was that the tech companies would win the battle. This proved true for a while: Netflix vs. Blockbuster, Apple vs. Nokia, etc. The assumption was that this would carry on. Understanding the technology was more important than understanding the category.
Tom’s observation is that in the past four years, these assumptions have been challenged. For example, traditional banks have got really good at understanding technology. Neobanks might be good at getting millennials to share the cost of a pizza, but they’re not that good at making money. So there’s this slow realisation that maybe digital-first tech companies are not going to win – because big companies are getting pretty good at change.
Taking a step back, it seems that the narrative of disrupt or die isn’t always true: a lot of the rules of business have not changed; incumbents just need to get a bit more excited about technology. Ultimately, having category expertise, knowing how to make money, having relationships, and having staff who really know what they’re doing is probably more important than tech expertise.
Watch the video clip.
INSIGHT 3: The AI craze is enabling a more flexible investment climate. This is an incentive for leaders to be bold.
Generative AI has spurred heated debates about the evolution of AI and divided experts and observers into two opposing groups: the AI cheerleaders and the sceptics. The former believe that AI is going to change everything immediately. The latter think that it’s a bubble.
History is littered with innovations that went nowhere. A handful of them however proved to be transformational – if in the long run. Only time will tell which group GenAI will join. In the meantime, there’s a growing realization that significant investment may be required to make meaningful steps with AI, hence a more flexible climate for capex – which is an incentive for leaders to be bold.
Tom’s insight reflects this situation: change is hard and expensive, and so regardless of one’s position in the debate, GenAI provides a unique window of opportunity to get the investment that you wouldn’t normally get. It is an amazing time to have an audience who normally wouldn’t listen to you.
Conclusion
These were but a handful of the many insights that Tom shared with us during Rewire LIVE. Taking a step back, it is clear that we are far from having realized the full value of GenAI – and, more broadly, AI. In the words of Tom, AI is a chance to dream really big and leave your mark on the world. It is yours for the taking.
About Tom Goodwin
Tom Goodwin is a four-time #1 “Voice in Marketing” on LinkedIn, with over 725,000 followers on the platform. He currently heads up “All We Have Is Now”, a digital business transformation consultancy, working with clients as varied as Stellantis, Merck, Bayer, and EY to rethink how they use technology.
Tom hosts “The Edge”, a TV series focusing on technology and innovation, and “My Wildest Prediction”, a podcast produced and distributed by Euronews. He has published the book “Digital Darwinism” with Kogan Page, and has spoken in over 100 cities across 45 countries.
With a 23-year career that spans creative, PR, digital and media agencies, Tom is an industry provocateur as a columnist for the Guardian, TechCrunch and Forbes, and a frequent contributor to GQ, The World Economic Forum, Ad Age, Wired, Ad Week, Inc, MediaPost and Digiday.
To find out more about Tom, visit www.tomgoodwin.co
Rewire CEO Wouter Huygen reviews the arguments for and against GenAI heralding the next industrial revolution, and how business leaders should prepare.
Is generative AI under- or overhyped? Is it all smoke and mirrors, or is it the beginning of a new industrial revolution? How should business leaders respond? Should they rush to adopt it or should they adopt a wait-and-see approach?
Finding clear-cut answers to these questions is a challenge for most. Experts in the field are equally divided between the cheerleaders and the skeptics, which adds to the apparent subjectivity of the debate.
The GenAI cheerleaders can point to the fact that performance benchmarks keep being beaten. Here the underlying assumption is the “AI Scaling Hypothesis”. That is, as long as we throw in more data and computing power, we’ll make progress. Moreover, the infrastructure required for GenAI at scale is already there: an abundance of cloud-based data and software; the ability to interact with the technology using natural language. Thus, innovation cycles have become shorter and faster.
On the other hand, GenAI skeptics make the following points: first, the limitations of GenAI are not bugs, they’re features. They’re inherent to the way the technology works. Second, GenAI lacks real world understanding. Third, LLMs demonstrate diminishing returns. In short, there are hard limits to the capabilities of GenAI.
The lessons of history indicate that while there might be some overhype around GenAI, the impact could be profound – in the long run. Leaders should therefore develop their own understanding of GenAI and use it to define their vision. Shaping the future is a long-term game that starts today.
Watch the video (full transcript below).
The transcript has been edited for clarity and length.
Generative AI: the new magic lantern?
Does anyone recognize this? If you look closely, not much has changed since. Because this is a basic slide projector. It’s the Magic Lantern, invented around 1600. But it was not only used as a slide projector. It was also used by charlatans, magicians, people entertaining audiences to create illusions. This is the origin of the saying “smoke and mirrors”. Because they used smoke and mirrors with the Magic Lantern to create live projections in the air, in the smoke. So the Magic Lantern became much more than a slide projector – actually a way of creating illusions that were by definition not real.
You could say that Artificial Intelligence is today’s Magic Lantern. We’ve all seen images of Sora, OpenAI’s video production tool. And if you look at OpenAI’s website, they claim that they’re not working on video production. They actually intend to model the physical world. That’s a very big deal if that is true. Obviously it’s not true. At least I think I’m one of the more sceptical ones. But those are the claims being made. If we can actually use these models to model the physical world, that’s a big step towards artificial general intelligence.
Is GenAI overhyped? Reviewing the arguments for and against
If AI is today’s Magic Lantern, it raises the question: where is the smoke and where are the mirrors? And people who lead organizations should ponder a few questions: How good are AI capabilities today? Is AI overhyped? What is the trajectory? Will it continue to go at this pace? Will it slow down? Re-accelerate? How should I respond? Do we need to jump on it? Do we need to wait and see? Let everybody else run the first experiments, experience the pains, and then adopt whatever works? What are the threats and what are the risks? These are common questions, but given the pace of things, they are crucial.
To answer these questions, one could look to the people who develop all this new technology. But the question is whether we can trust them. Sam Altman is looking for $7 trillion. I think the GDP of Germany is what? $4 trillion or $5 trillion. Last week Eric Schmidt, ex-Google CEO, stated on TV that AI is underhyped. He said the arrival of a non-human intelligence is a very, very big deal. Then the interviewer asked: is it here? And his answer was: it’s here, it’s coming, it’s almost here. Okay, so what is it? Is it here or is it coming? Anyway, he thinks it’s underhyped.
We need to look at the data, but even that isn’t trivial. Because if you look at generative AI, Large Language Models and how to measure their performance, it’s not easy. Because how do you determine if a response is actually accurate or not? You can’t measure it easily. In any case, we see the field progressing, and we’ve all seen the news around models beating bar exams and so on.
The key thing here is that all this progress is based on the AI scaling hypothesis, which states that as long as we throw more data and compute at it, we’ll advance. We’ll get ahead. This is the secret hypothesis that people are basing their claims on. And there are incentives for the industry to make the world believe that we’re close to artificial general intelligence. So we can’t fully trust them in my opinion, and we have to keep looking at the data. But the data tells us we’re still advancing. So what does that mean? Because current systems are anything but perfect. You must have seen ample examples. One is from Air Canada. They deployed a chatbot for their customer service, and the chatbot gave away free flights. It was a bug in the system.
That brings us to the skeptical view. What are the arguments? One is about large language modelling or generative AI in general: the flaws that we’re seeing are not bugs to be fixed. The way this technology works, by definition, has these flaws. These flaws are features, they’re not bugs. And part of that is that the models do not represent how the world works. They don’t have an understanding of the world. They just produce text in the case of a Large Language Model.
On top of that, they claim that there are diminishing returns. If you analyze the performance, for instance, of the OpenAI stuff that’s coming out, they claim that if you look at the benchmarks, it’s not really progressing that much anymore. And OpenAI hasn’t launched GPT-5, so they’re probably struggling. And all the claims are based on these scaling laws, and those scaling laws can’t go on forever. We’ve used all the data in the world, all the internet by now. So we’re probably hitting a plateau. This is the skeptical view. So on the one hand we hear all the progress and all the promises, but there are also people saying, “Look, that’s actually not the case if you really look under the hood of these systems.”
As for questions asked by organization leaders: “What do I need to do?” “How fast is this going?” Here, the predictions vary. In the Dutch Financial Times, an economist says it’s overhyped, it’s the same as always: all past technology revolutions took time and it will be the same this time. On the other hand, a recent report says this time is different: generative AI is a different type of technology and this is going to go much faster. The implication is that if you don’t stay ahead, if you don’t participate as an organization, you will be left behind soon.
The argument for generative AI is that the infrastructure is already there. It’s not like electricity, where we had to build power lines. For generative AI, the infrastructure is there. The cloud is rolled out. Software has become modular. And the technology itself is very intuitive. It’s very easy for people to interact with it because it’s based on natural language. All of those arguments are the basis for saying that this is going to go much faster. And I think some of us recognize that.
Looking ahead: how leaders should prepare
There’s a difference between adopting new tools and really changing your organization. When we think about the implications, at Rewire we try to make sense of these polarized views and form our own view of what is really happening and what it means for our clients, for our partners, and the people we work with. We have three key takeaways.
The first one is that we firmly believe that everybody needs to develop their own intuition and understanding of AI. Especially because we’re living in the smoke-and-mirrors phase. It means that it’s important for people who have the role of shaping their organization to understand the technology and develop their own compass of what it can do, to navigate change.
The second is that you need to rethink the fundamentals. You need to think about redesigning things, re-engineering things, re-imagining your organization, asking what if, rather than adopting a tool or a point solution. You must think about how your organization is going to evolve: what will it look like in five years’ time, and how do we get there?
The third is that yes, I agree with Andrew McAfee, the economist who says generative AI is different because it goes faster. To a certain extent that’s true. But not to the point where full business models and full organizations and end-to-end processes change. Because that’s still hard work, it’s transformational work that doesn’t happen overnight. So the answers are nuanced. It’s not one extreme or the other. It is a long-term game to reap the benefits of this new technology.
What Generative AI can and cannot do. And what it means for the future of business.
Meet Edmond de Belamy, the portrait painting displayed above. On Thursday October 25th in 2018, Edmond was auctioned off by Christie’s for a whopping $432,500. The signature on the bottom right of Edmond shows its artist to be:
$$\min_G \max_D \; \mathbb{E}_x[\log D(x)] + \mathbb{E}_z[\log(1 - D(G(z)))]$$
This impressive-looking formula represents a GAN (generative adversarial network): a generative AI model avant la lettre. Edmond de Belamy became the first AI-generated artwork to be auctioned off by a major auction house.
The Belamy portrait was the first time we were truly impressed by the capabilities of a GenAI model. Nowadays, its quality is nothing special. In fact, we have rapidly gotten used to image generation models producing photorealistic images, text generation models that generate e-mail texts and meeting summaries better than we could ourselves, and even LLMs that support us developers and data scientists in writing code.

The breakthrough of GenAI, heralded by the release of ChatGPT in late November 2022, has truly been amazing. It may almost lead you to believe that the age of humans is over, and the age of machines has now truly begun.
That is, until you start asking these GenAI models the hard questions. Such as:
Can you give me an odd number that does not have the letter “E” in it?

Anyone who can read could tell you GPT-4 botched that question. The correct answer would have been: no, I cannot, because such a number does not exist. Despite its ‘reasoning’, ChatGPT manages to confidently announce the number “two” has an “e” in it (note: it doesn’t), and continues to produce “two hundred one” as an example of an odd number that does not have an “e” in it (note: it does).
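The claim that no such number exists can be checked mechanically, at least for a finite range. The sketch below is a minimal illustration, not part of the original experiment: `spell` is a hypothetical helper that spells out numbers below 1,000 in American style (without "and", matching the “two hundred one” quoted above). The underlying reason is simple: every odd number ends in one, three, five, seven or nine (or in eleven, thirteen, fifteen, seventeen or nineteen), and all of those words contain an “e”.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def spell(n: int) -> str:
    """Spell out 0 <= n < 1000 in English (hypothetical helper for this sketch)."""
    if not 0 <= n < 1000:
        raise ValueError("only 0..999 supported")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    head = ONES[hundreds] + " hundred"
    return head + (" " + spell(rest) if rest else "")


# Collect every odd number below 1,000 whose spelling lacks the letter "e".
odd_without_e = [n for n in range(1, 1000, 2) if "e" not in spell(n)]

print(odd_without_e)      # []
print("e" in spell(2))    # False: "two" has no "e"
print(spell(201))         # two hundred one
print("e" in spell(201))  # True: "hundred" and "one" both contain "e"
```

The range stops at 999 only to keep the helper short; the last-word argument above covers larger numbers as well.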
Another favorite pastime of ours is to ask LLMs to play chess with us. Our typical experience: they will claim they know the rules, and proceed to happily play illegal moves halfway through the game, or conjure up ghost pieces that manage to capture your knight out of nowhere.
The internet is full of entertaining examples of ChatGPT getting tasks wrong that humans solve without a second thought. These issues are not ChatGPT-specific either. Other LLM-providers, such as Anthropic, Google, Meta, and so on, face similar challenges.
So, on the one hand, GenAI models have proven to be an impressive breakthrough. On the other hand, it is clear that they are far from infallible. This raises the question:
What is the breakthrough of GenAI really?
And more importantly:
What does this mean for someone wanting to make the most of this technology?
Dissecting the GenAI breakthrough
In the 21st century we have seen the breakthrough of machine learning ending the ‘AI winter’ of the 1990s. Machine learning is essentially computers generalizing patterns in data. Three things are needed to do that well: algorithms for identifying patterns, large amounts of data to identify the patterns in, and computation power to run the algorithms.
By the start of the 2000s, we did know a fair bit about algorithms. The first Artificial Neural Networks, which form the foundation of ‘deep learning’, were published in 1943 (McCulloch et al., 1943). Random forest algorithms, the precursors to modern tree ensemble algorithms, were first proposed in 1995 and formalized in 2001 (Breiman et al., 2001).
The rise of the digital age added the other two ingredients. We saw exponential growth in the amount of stored data worldwide. Simultaneously, computing power kept getting exponentially cheaper and more easily accessible. All of a sudden, ‘machine learning’, and therefore AI, started to fly. Apparently, tossing large amounts of data and huge amounts of computing power at a suitable algorithm can achieve a lot.
With machine learning we managed to have a go at quite a few problems that were previously considered difficult: image recognition, speech-to-text, playing chess. Some problems remained remarkably difficult, however. One of these was automated understanding and generation of human language.
This is where LLMs come in. In 2018, the same year our good fellow Edmond was created, OpenAI published a remarkable paper. They introduced a class of algorithms for language understanding called Generative Pre-trained Transformers, for short: “GPT” (Radford et al., 2018). This combined two key ideas. One that allowed them to “pre-train” language models on low-quality text as well as beautifully crafted pieces. And another that allowed these algorithms to benefit efficiently from large amounts of computing power. Essentially, they asked these models to learn how to predict the next word in a sentence, ideally enabling them to generate their own sentence if they became good enough at that.
GPT-1 was born with this paper in 2018, but OpenAI did not stop there. With their new algorithms they were able to scale GPT-1 both in model size and, just as importantly, in how much data it could be fed. GPT-3, the predecessor of the GPT-3.5 model that powered ChatGPT at its launch in 2022, reportedly drew on a training dataset of almost 500 billion text tokens (which is impressive, but not even close to the ‘entire internet’, as some enthusiasts claim), and required millions of dollars of computing power.
Despite its insane jump in performance, the fundamental task of GPT-3 remains the same as it was for GPT-1. It is a very large, very complex machine learning language model trained to predict the next word in a sentence (or predict the next ‘token’, to be more precise). What is truly remarkable, however, are the competencies that emerge from this skill, which impressed even the most sceptical.
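To make “predicting the next word” concrete, here is a deliberately tiny sketch. It is not how GPT models work internally: instead of a transformer it uses a bigram counter, i.e. it only counts which word most often follows each word. The corpus and the function names `predict_next` and `generate` are invented for illustration. What it shares with an LLM is the loop itself: predict one word, append it, and repeat.

```python
from collections import Counter, defaultdict

# Toy training text (an assumption for this sketch), split on whitespace.
corpus = (
    "the cat sat on the mat . "
    "the cat ate the fish . "
    "the dog sat on the rug ."
).split()

# "Training": count how often each word follows each other word.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1


def predict_next(word: str) -> str:
    """Return the most frequent follower of `word` in the toy corpus."""
    candidates = following.get(word)
    if not candidates:
        raise KeyError(f"no continuation seen for {word!r}")
    return candidates.most_common(1)[0][0]


def generate(start: str, max_words: int = 6) -> str:
    """Generate text by repeatedly appending the predicted next word."""
    words = [start]
    while len(words) < max_words and words[-1] != ".":
        words.append(predict_next(words[-1]))
    return " ".join(words)


print(predict_next("the"))  # cat
print(generate("the"))
```

A model this small quickly loops or produces nonsense, which is the point: the task stays the same, and scaling the model and the data is what turns it into something useful.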
Emergent competencies of Large Language Models
Predicting the next word is cool, but why are these LLMs heralded by some as the future of everything (and by others as the start of the end of humanity)?
Despite being trained as “next-word-predictors” LLMs have begun to show competencies that truly do make them game changers. To be able to predict the next word based on a prompt, they have to do three things:
- Understand: making sense of unstructured data from humans and systems;
- Synthesize: processing information to ‘connect the dots’ and formulate a single, sensible response;
- Respond: generating responses in human or machine-readable language that provide information or initiate next steps.

Let’s unpack all this by looking at an example.

In this example, we asked ChatGPT powered by GPT-4 to solve a problem that may seem easy to anyone with basic arithmetic knowledge. Let’s pause to appreciate why ChatGPT’s ability to solve the problem is remarkable regardless.
Firstly, ChatGPT understands what we are asking. It manages to grasp that this question requires logical reasoning and gives us an answer that includes argumentation and a calculation. Furthermore, it can synthesize a correct response to our request, generating essentially ‘new’ information that was not present in the original prompt: the fact that we will have 3 loaves of bread left at the end of the day. Finally, it can respond with an answer that makes sense. In this case, it’s a human language response, wrapped in some nice formatting, but ChatGPT can also be used to generate ‘computer language’ in the form of programming code.
To put this into perspective: previous systems that were able to complete such prompts would generally have been composed of many different components working together, tailored to this particular kind of problem. We’d have a text understanding algorithm that extracts the calculation to be done from the input prompt. Then a simple calculator to perform the calculation, and a ‘connecting’ algorithm that feeds the output from the first algorithm into the calculator. Finally, we’d have a text generating system that takes the output and inputs the text into a pre-defined text template. With ChatGPT, we can have one model perform this whole sequence, all by itself. Lovely.
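The component-based approach described above can be sketched in a few lines. This is an illustration under stated assumptions, not a reconstruction of any real system: the prompt wording, the numbers 12 and 9, and all function names are hypothetical, since the original prompt is not reproduced in the text. Note how brittle it is: the extraction step only recognises one hard-coded sentence pattern.

```python
import re


def extract_calculation(prompt: str) -> tuple[int, int]:
    """'Text understanding' step: pull the two quantities out of the prompt.

    Only handles the hard-coded pattern 'bakes N ... sells M'.
    """
    match = re.search(r"bakes (\d+).*?sells (\d+)", prompt)
    if match is None:
        raise ValueError("prompt does not match the expected pattern")
    return int(match.group(1)), int(match.group(2))


def calculate(baked: int, sold: int) -> int:
    """'Calculator' step: plain subtraction."""
    return baked - sold


def render(remaining: int) -> str:
    """'Text generation' step: fill a pre-defined template."""
    return f"You will have {remaining} loaves of bread left at the end of the day."


def answer(prompt: str) -> str:
    """'Connecting' step: wire the three components together."""
    baked, sold = extract_calculation(prompt)
    return render(calculate(baked, sold))


# Hypothetical prompt, chosen so the result matches the 3 loaves mentioned above.
prompt = "A bakery bakes 12 loaves in the morning and sells 9 during the day. How many are left?"
print(answer(prompt))  # You will have 3 loaves of bread left at the end of the day.
```

Rephrase the question even slightly (“we baked a dozen loaves”) and this pipeline fails, whereas a single LLM handles such variations without extra engineering.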
To put it in even more perspective: these types of problems were still difficult for older versions of ChatGPT. It shows that these emerging competencies are still improving in newer versions of state-of-the-art models.

So far, we haven’t even considered models that can do understanding, synthesis and generation across multiple types of input and outputs (text, images, sound, video, etc.). These multi-modal models have been all the rage since early 2024. The world of GenAI is truly moving quickly.
So what does it mean for businesses?
The talents of GenAI are truly impressive, making it a very versatile tool that opens up endless possibilities. It can craft beautiful e-mails, help generate original logos, or speed up software development.
Even more promising is GenAI’s ability to make sense of unstructured data at scale. Estimates put unstructured data at about 80-90% of all the data that companies possess. Extracting value from this data through traditional means is time-consuming and challenging at best. This has led to unstructured data being largely ignored for many commercial use cases. Yet GenAI can plough through it and generate outputs tailored to the desired use cases. You can uncover the needles in haystacks to fuel human decisions, or enable more traditional AI systems to learn from this data. Imagine how powerful these systems would become if you increased the amount of useful information available to them by a factor of 5 to 10: if only the structured 10-20% is usable today, unlocking the rest multiplies the usable data by 1/0.2 = 5 to 1/0.1 = 10.
Now, if you’ve been following some of the AI news lately, you’ll know we can take it even one step further and think about ‘Agentic AI’, that is, agents powered by AI. These are systems that can not only think, but actually do. In future, they will likely enable large-scale automation at first, and later complete organizational transformations.
Research into how to make this work is in full swing as of the summer of 2024. The extent to which autonomous AI agents are already feasible is a hotly debated topic. At any rate, ‘simple’ agents are now being developed that are beginning to capitalize on the AI promise. Making the most of these will require their users to carefully consider how to manage their performance, and to balance two opposing characteristics: their hallucinations (i.e. nonsensical results, which some argue are an inescapable feature of LLMs) and their effectiveness.
Against this backdrop, it won’t be long before the early adopters get a head start on the laggards. Teams and organizations that take the time to identify opportunities and capitalize on them are set to move far ahead of the competition.
Fine-tuning can worsen factual correctness in specialized application domains. We discuss the implications.
This article was originally published on LinkedIn. The writer, Wouter Huygen, is partner and CEO at Rewire.
A new paper reveals that fine-tuning is not a wonder drug for prevailing LLM hallucinations. Rather the reverse: fine-tuning can actually worsen performance when aiming to develop factual correctness in specialized application domains.
Using supervised learning on new knowledge fine-tunes hallucinations, instead of enhancing accuracy
These findings could have profound implications. What if these specialized areas are precisely where most of the value of LLM use cases lies?
The hard problem of hallucinations
The beauty of LLMs is that they are very generic and general-purpose: they contain “knowledge” on a very wide range of subjects covered in the training data. This forms the basis for the claim that the current path will get us (close) to AGI. I don’t think that is true, but that’s for another day.
Clearly, generative AI currently works only up to a point. Measuring hallucination rates is notoriously difficult, but roughly speaking the tech works well in 80% of the cases. And yes, performance depends on many factors including the prompting abilities of the user. That being said, getting rid of the remaining 20% is arguably the biggest headache of the AI industry.
A long standing question in neuroscience and philosophy is how consciousness arises in the brain. How does a bunch of molecules and electromagnetic waves produce the miracle of our conscious experience? This is referred to as the hard problem of consciousness. But what if science has the premise all wrong? What if consciousness does not arise from matter, but matter is (an illusion) formed in consciousness?
Hallucinations are the current hard problem-to-crack for AI
Similarly, hallucinations are not a side effect of generative AI technology; they are how it works. The technology is designed to dream up content, based on probabilistic relationships captured in the model parameters.
Big tech proclaims that the issue can be solved through further scaling, but experts in the field increasingly recognize we have to view it as a remaining feature, not a bug. After all, who would not be hallucinating after reading the entire internet ;-)
For the short term, the applicability of LLMs – despite their amazing feats – remains more limited than we might hope, especially in high-stakes situations and/or very specialized areas. And these might just be the areas that hold the most value (e.g. in healthcare, providing accurate diagnostic/treatment solutions).
Unless fundamental algorithmic breakthroughs come along, or scaling proves to work after all, we have to learn how to make the best of what we've got. Work with the strengths, while minimizing downside impact.
Using Fine-Tuning to develop domain specific applications
Since the beginning of the Gen AI hype, fine-tuning has been touted as one of the ways to improve performance in specific application areas. The approach is to use supervised learning on domain-specific data (e.g. proprietary company data) to fine-tune a foundational (open source) model to specialize it for a certain use case and increase factuality.
Intuitively this makes sense. The foundation model is pre-trained on generic text prediction with a very broad base of foundational knowledge. Further fine-tuning would then provide the required specialization, based on proprietary and company-specific facts.
Fine-tuning does not work well for new information
A recent paper investigates the impact of fine-tuning on new information. The authors aimed to validate the hypothesis that new knowledge can have unexpected negative impact on model performance, rather than improving it in a specific area. The outcomes are surprising, counter-intuitive at first glance, and impactful.
Fine-tuning on new knowledge works much more slowly than on existing knowledge (i.e. knowledge that was included in the pre-training data set). But most importantly, beyond a certain point of training, new knowledge deteriorates model performance on existing knowledge. In other words, incorporating specific new information in fine-tuning increases hallucinations. Worse yet, the hallucination rate grows linearly with more training on unknown content.
In intuitive terms, it seems as if the model gets confused with new information and “unlearns” existing knowledge.
Exhibit 1. Train and development accuracies as a function of the fine-tuning duration, when fine-tuning on 50% Known and 50% Unknown examples.

Source: paper from Zorik Gekhman et al.
These conclusions have serious implications for anyone aiming to develop specialized LLM use cases. Fine-tuning remains useful for strengthening model performance in known areas as well as improving the form and structure of the desired output. But using fine-tuning to increase factuality on new information does not work well and has undesirable, opposite effects.
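One practical consequence is that it helps to know, before fine-tuning, which training examples the base model already answers correctly and which are new to it. The sketch below is a simplified illustration and not the paper's procedure: it uses a single exact-match check per question, and `ask_base_model` is a hypothetical callable standing in for a real model call (here faked with a lookup table).

```python
from typing import Callable

Example = tuple[str, str]  # (question, ground-truth answer)


def split_known_unknown(
    examples: list[Example],
    ask_base_model: Callable[[str], str],
) -> tuple[list[Example], list[Example]]:
    """Split examples by whether the base model already answers them correctly."""
    known, unknown = [], []
    for question, truth in examples:
        prediction = ask_base_model(question)
        # Simplifying assumption: case-insensitive exact match counts as "known".
        if prediction.strip().lower() == truth.strip().lower():
            known.append((question, truth))
        else:
            unknown.append((question, truth))
    return known, unknown


# Stand-in for a real model: a lookup table of what the "model" already knows.
_fake_knowledge = {"Capital of France?": "Paris"}


def fake_model(question: str) -> str:
    return _fake_knowledge.get(question, "I don't know")


examples = [
    ("Capital of France?", "Paris"),
    ("Internal code name of product X?", "Bluebird"),  # invented example
]
known, unknown = split_known_unknown(examples, fake_model)
print(len(known), len(unknown))  # 1 1
```

With such a split in hand, a team could track the share of unknown examples in a fine-tuning set and monitor accuracy on the known ones as training progresses.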
The unfortunate correlation between accuracy and value
Using LLMs to build knowledge assistants is a promising use case across many fields. These use cases thrive in highly knowledge-intensive industries, allowing users to query situation-specific information on demand. This includes healthcare workers, pharmaceutical advisors, customer service, professional services, etc. Not only do these assistants increase the effectiveness and efficiency of their users, they also allow enterprise knowledge and IP to accumulate in a much more sustainable and scalable manner. They become like digital co-workers that never resign, unless you fire them.
As long as humans can be in the loop, verifying output, or when the impact of inaccurate information is low, the current LLM technology is already good enough. But in many situations, most of the value would actually come from reliability and factual correctness rather than an 80% answer that can be manually adjusted (like drafting an email).
What to do instead?
To enhance performance in specific application areas amidst existing technological constraints, companies and developers must adopt a pragmatic and empirical engineering approach. This involves employing a combination of sophisticated techniques to forge optimal solutions. Innovations like Retrieval-Augmented Generation (RAG), fine-tuning processes accounting for new versus existing knowledge, advanced context embedding, and post-processing output verification are reshaping our methodologies daily.
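To give a feel for two of these techniques, the sketch below combines a toy version of Retrieval-Augmented Generation with a simple post-processing check. Everything in it is an assumption made for illustration: the retriever ranks documents by plain word overlap rather than embeddings, `generate` is a hypothetical callable standing in for an LLM call, and the verification rule (every word of the answer must appear in the retrieved context) is far cruder than what a production system would use.

```python
import re
from typing import Callable


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the question (toy retriever)."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Put the retrieved passages in front of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


def is_supported(answer: str, context: list[str]) -> bool:
    """Post-processing check: every answer word must occur in the context."""
    context_words = set().union(*(tokenize(c) for c in context))
    return tokenize(answer) <= context_words


def answer_with_rag(
    question: str, documents: list[str], generate: Callable[[str], str]
) -> str:
    context = retrieve(question, documents)
    answer = generate(build_prompt(question, context))
    return answer if is_supported(answer, context) else "No supported answer found."


# Invented company documents and a fake LLM, for demonstration only.
documents = [
    "The warranty period for model A is 24 months.",
    "Model B ships with a 12 month warranty.",
]


def fake_llm(prompt: str) -> str:
    return "24 months"


print(answer_with_rag("What is the warranty period for model A?", documents, fake_llm))
# 24 months
```

If the stand-in model returned “36 months” instead, the check would reject the answer because “36” does not occur in the retrieved passage. The facts stay in the documents rather than in the model weights, and the output is verified before it reaches the user.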
The new insights discussed here demonstrate the importance of staying abreast of this fast-developing field to continue pushing the performance boundaries of Gen AI applications. Until new breakthroughs happen in foundation models, we have to keep learning new tricks of the trade to get the most out of today's state of the art.