Artificial Intelligence (AI) has rapidly transformed from a niche technological curiosity into a force capable of reshaping industries, societies, and even the trajectory of humanity itself. While AI’s potential for advancing fields such as medicine, education, and climate science is undeniable, its unchecked development poses profound existential risks. Unlike traditional programs that operate on human-defined instructions, modern AI systems are grown rather than designed, acquiring capabilities that even their creators struggle to comprehend.
This article explores why AI stands apart from conventional technologies and outlines nine key points for understanding its existential threat. By delving into the mechanics of neural networks, the challenge of goal alignment, and the unpredictability of advanced AI capabilities, we aim to illuminate why urgent global coordination is necessary to mitigate the risks. As humanity approaches a critical juncture, the choices we make today could determine whether AI becomes our greatest ally—or an irreversible threat.
1. AI is different from normal programs
Traditional programs are instructions written by humans that a computer follows. These instructions contain algorithms invented by humans. Modern AI systems like ChatGPT are neural networks: matrices with billions to trillions of numbers. In principle, for any algorithm, even one that humans don’t yet understand, there is a (possibly very large) neural network that will approximately implement it. But the numbers in neural networks and the algorithms they implement are incomprehensible to humans.
To create a neural network, we figure out what order to multiply a bunch of matrices together, and what operations to do between the multiplications. Then we fill the matrices with completely random numbers, set up some metric to measure how well the neural network achieves its goals, and use some pretty simple math to figure out which way to change all those numbers so that the neural network performs better on a given metric. So, we essentially grow this neural network: we automatically change it so that it is more capable.
But while we can see all the billions and trillions of numbers that make up a neural network, we have almost no idea what those numbers mean or how multiplying them leads to goal achievement. (Even if we could scan an entire human brain, neuroscientists and other scientists would still have to do a lot of work to understand how human consciousness works and what drives people to achieve their goals.)
I sketched out a simple tool with which you can manually teach a neural network to find the element midway between two selected ones.
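To make that concrete, here is a minimal sketch of the recipe described above (my own illustration in Python with NumPy, not the author’s tool; the layer sizes, learning rate, and step count are arbitrary choices), applied to that same toy task: start from random matrices, pick a metric, and repeatedly nudge the numbers so the network gets better at outputting the midpoint of two numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Architecture": two small matrices with a ReLU between them.
W1 = rng.normal(scale=0.5, size=(2, 16))   # start from completely random numbers
W2 = rng.normal(scale=0.5, size=(16, 1))

for step in range(2000):
    x = rng.uniform(-1.0, 1.0, size=(64, 2))   # pairs of numbers
    y = x.mean(axis=1, keepdims=True)          # target: their midpoint

    # Forward pass and "metric": mean squared error between prediction and target.
    h = np.maximum(x @ W1, 0.0)
    pred = h @ W2
    loss = np.mean((pred - y) ** 2)

    # The "pretty simple math": gradients of the metric with respect to every number.
    grad_pred = 2.0 * (pred - y) / len(x)
    grad_W2 = h.T @ grad_pred
    grad_h = grad_pred @ W2.T
    grad_h[h <= 0.0] = 0.0
    grad_W1 = x.T @ grad_h

    # Nudge all the numbers in the direction that improves the metric.
    W1 -= 0.1 * grad_W1
    W2 -= 0.1 * grad_W2

# After training, the network outputs roughly the midpoint of any two inputs.
print(np.maximum(np.array([[0.2, 0.8]]) @ W1, 0.0) @ W2)   # approximately 0.5
```

Nothing in the loop tells the network what an “average” is; the numbers simply drift toward whatever makes the metric better, which is exactly the “growing” the previous paragraphs describe.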
2. We know how to make neural networks more capable.
If there is a way to solve a problem, there is a neural network that can solve that problem. If there is a way to produce text, and we teach a neural network to predict that text, it can, in principle, learn the process by which the text was actually produced. If we use “reinforcement learning” – giving rewards for successfully achieving goals – there is a neural network that would receive the maximum reward.
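As one deliberately tiny illustration of what “rewards for successfully achieving goals” means mechanically, here is a sketch of a two-armed bandit trained with a REINFORCE-style update; the “network” is just two numbers, and the payoff probabilities and learning rate are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)          # the "network": two numbers scoring two possible actions
payoff = [0.2, 0.8]           # probability each action actually achieves the goal

for step in range(5000):
    p = np.exp(logits) / np.exp(logits).sum()        # softmax: probability of each action
    action = rng.choice(2, p=p)                       # try an action
    reward = float(rng.random() < payoff[action])     # reward if the goal was achieved

    grad_log_p = -p
    grad_log_p[action] += 1.0                         # gradient of log p(action) w.r.t. logits
    logits += 0.1 * reward * grad_log_p               # nudge toward actions that got rewarded

print(np.exp(logits) / np.exp(logits).sum())          # nearly all probability on the better action
```

The network is never told which action is “right”; it is only rewarded when the goal happens to be achieved, and the numbers shift toward whatever gets rewarded more often.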
Machine learning is all about setting a metric to measure how capable a neural network is; choosing an architecture (how exactly to arrange all the matrices so that the neural network is potentially capable enough); and designing the learning process (how exactly to automatically change all those numbers into ones that make up an increasingly capable neural network).
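The same three ingredients, labeled explicitly in a short sketch (here using PyTorch as one possible framework; the particular layers, loss, and optimizer are just illustrative choices, not anything from this article):

```python
import torch
from torch import nn

# Architecture: how exactly the matrices are arranged.
model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))

# Metric: how we measure how well the network is doing.
metric = nn.MSELoss()

# Learning process: how all the numbers get changed to score better on the metric.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(2000):
    x = torch.rand(64, 2)                    # toy data: pairs of numbers
    y = x.mean(dim=1, keepdim=True)          # toy target: their midpoint again
    loss = metric(model(x), y)               # score the current numbers
    optimizer.zero_grad()
    loss.backward()                          # figure out which way to change them
    optimizer.step()                         # change them
```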
It seems that, because of the mathematics of learning – search in a very high-dimensional space – spending more computing power simply leads to better results. This means that if we get many more GPUs and spend more electricity, we can get a more capable neural network out of it.
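A rough way this is often summarized in the empirical scaling-law literature (the constants below are not derived here; they have to be fit for each model family and training setup) is a power law relating loss on the training metric to the compute spent:

```latex
% Empirical power-law form; C_0 and \alpha > 0 are fitted constants, not derived here.
L(C) \approx \left(\frac{C_0}{C}\right)^{\alpha}
```

Under that form, each doubling of compute C cuts the loss L by a roughly constant factor, which is the precise sense in which more GPUs and more electricity buy a more capable network.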
3. There is not much time left until neural networks are as capable of achieving goals as people are.
This realization prompted Nobel laureate Geoffrey Hinton to leave Google. It is also why hundreds of leading AI scientists signed a statement in May 2023:
Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.
Since GPT-2 came out in 2019, it has been clear to me that AI progress is going to be much faster than I had expected, because neural network training works. This became clear to many after AlphaGo and AlphaZero. But since ChatGPT came out two years ago, the speed at which advanced AI systems are getting smarter has become pretty obvious to the vast majority of scientists.
We can make AI systems more capable; we are doing it; it is just a question of how many resources need to be spent and which specific algorithms get there efficiently.
I’ll be very surprised if that point arrives in less than a year, or if it takes more than ten.
4. We don’t know how to give AI goals
Twelve years ago, when people still thought of AI as ordinary computer programs, there was already a problem: it was very difficult to formulate a mathematical goal that would be safe for an artificial intelligence to pursue better than a human could.
(If you could mathematically describe the hormone that is present in the brain when a person feels happy, what would a genie do if asked to maximize the amount of that hormone in the universe – or in people’s skulls?)
Something like “do what I would like the AI to do if I were smarter, knew how the world really works, and were more like the ideal version of myself I think I am” is very hard to pin down as a mathematical formula.
But the technical problem we face now is much worse.
We don’t invent algorithms for achieving goals. We grow neural networks containing ever better goal-achieving algorithms – algorithms that we don’t understand, don’t know how to develop ourselves, and can’t recognize by looking at the insides of neural networks.
We don’t know how to set goals for smart neural networks.
If a neural network is able to achieve a goal very well, that goal is somehow represented somewhere inside it. We do not know how, or where exactly, or how to influence it once the neural network is very capable.
Our metrics may cover what we can measure; but we cannot measure what the neural network’s goals are.
If the network is stupid enough, its algorithms won’t be very focused or coherent, and that’s not too bad.
But if the neural network is smart enough and can achieve goals better than a human, then whatever metric we specify, the neural network will show the best results on it for instrumental reasons – regardless of its actual goals – because doing so allows it to protect itself from being altered by the process that changes its numbers, and so to preserve its goals.
This means that many of the metrics we use reach their optimum on neural networks that are very smart and very good at achieving their goals, but whose goals are completely random (because the score on the metric is the same regardless of the goals).
That is: the primary problem is not even to formulate the goal, but to figure out how to install any formulated goal into a sufficiently smart neural network. Nobody knows how to do this.
This means that by default, if we don’t solve this technical problem, the first neural network that can achieve a goal better than a human will have random goals that have nothing to do with human values.
5. If an AI system is smarter than humans and can achieve goals better than they can, but has random values, it will lead to disaster and the death of everyone on the planet.
Most random goals mean that people are perceived as either
a) Agents who could potentially launch another AI system with different random goals, with which the first AI would have to share the world – in other words, a threat; or
b) Atoms that could be used for something else.
It’s possible to speculate about how exactly an AI wins; there are achievable technologies that shouldn’t be a problem for an AI and that would allow it to very quickly stop depending on convincing or bribing humans to do things for it.
But if something achieves its goals better than you, the end is much more predictable than the process. If we try to play chess against Stockfish (a chess bot that is much better than humans), we don’t know how exactly Stockfish will beat us — if we could predict every move, we would be just as good at chess — but we can predict an important property of the board at the end: we will lose.
Same here. If an AI can choose actions that win better than humans can, the AI wins. There is no underground resistance, like in the movies – just as there is no underground resistance in chess against Stockfish. All the moves available to us are known; if an action would open up some path for humanity and lead to the AI’s defeat, a sufficiently capable AI system simply does not take that action.
6. The smart move for humanity is not to play.
We shouldn’t build AI systems that can achieve goals better than humans until we figure out how to make those goals align with human values rather than being completely random.
7. There are short-term incentives that keep humanity from putting development on pause.
If you’re a cutting-edge company developing AI systems, then as long as the system doesn’t kill everyone on the planet, having one that is better than your competitors’ is very economically valuable.
8. We cannot predict AI capabilities before launch.
We can’t look at a description of the learning process and predict the results – how smart and capable the system will be. If it performs better on the metrics, it’s probably more capable; but we don’t know how much more capable until we run it and test it.
9. To avoid a catastrophe, it is necessary to pause the development of this category of AI systems.
Humanity needs to coordinate and prevent AI systems that can achieve goals better than we can from emerging anywhere on the planet, until we figure out how to do that safely. To do this, we need to restrict the training of AI systems aimed at broad, general goals.
(That said, there are many areas where machine learning is useful and doesn’t pose such threats – drug development, energy, education, climate change. A huge number of narrow applications of AI are very cool, and we would like to support them and keep developing them regardless of what happens with broad, general development.)
This would require fairly unprecedented international agreements and political will on the part of the US and China.