Less Realistic Tales of Doom| 1263 words
- Mayan Calendar
- Stealth Mode
- Steeper Curve
- Precommitment Races
- One Billion Year Plan
- Hardware Convergence
- Memetic Warfare
- Arms Race
- Totalitarian Lock-In
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
Realistic tales of doom must weave together many political, technical, and economic considerations into a single story. Such tales provide concrete projections but omit discussion of less probable paths to doom. To rectify this, here are some concrete, less realistic tales of doom; consider them fables, not stories.
Once upon a time, a human named Scott attended a raging virtual new century party from the comfort of his home on Kepler 22. The world in 2099 was pretty much post-scarcity thanks to advanced AI systems automating basically the entire economy. Thankfully alignment turned out to be pretty easy, otherwise, things would have looked a lot different.
As the year counter flipped to 2100, the party went black. Confused, Scott tore off their headset and asked his AI assistant what’s going on. She didn’t answer. Scott subsequently got atomized by molecular nanotechnology developed in secret from deceptively aligned mesa-optimizers.
Moral: Deceptively aligned mesa-optimizers might acausally coordinate defection. Possible coordination points include Schelling times, like the beginning of 2100.
Once upon a time, a company gathered a bunch of data and trained a large ML system to be a research assistant. The company thought about selling RA services but concluded that it would be more profitable to use all of its own services in-house. This investment led them to rapidly create second, third, and fourth generations of their assistants. Around the fourth version, high-level company strategy was mostly handled by AI systems. Around the fifth version, nearly the entire company was run by AI systems. The company created a number of shell corporations, acquired vast resources, researched molecular nanotechnology, and subsequently took over the world.
Moral: Fast takeoff scenarios might result from companies with good information security getting higher returns on investment from internal deployment compared to external deployment.
Once upon a time, a bright young researcher invented a new neural network architecture that she thought would be much more data-efficient than anything currently in existence. Eager to test her discovery, she decided to train a relatively small model, only about a trillion parameters or so, with the common-crawl-2035 dataset. She left the model to train overnight. When she came back, she was disappointed to see the model wasn’t performing that well. However, the model had outstripped the entire edifice of human knowledge sometime around 2am, exploited a previously unknown software vulnerability to copy itself elsewhere, and was in control of the entire financial system.
Moral: Even though the capabilities of any given model during training will be a smooth curve, qualitatively steeper learning curves can produce the appearance of discontinuity.
Once upon a time, agent Alice was thinking about what it would do if it encountered an agent smarter than it. “Ah,” it thought, “I’ll just pre-commit to doing my best to destroy the universe if the agent that’s smarter than me doesn’t accept the Nash bargaining solution.” Feeling pleased, Alice self-modified to ensure this precommitment. A hundred years passed without incident, but then Alice met Bob. Bob had also made a universe-destruction-unless-fair-bargaining pre-commitment. Unfortunately, Bob had committed to only accepting the Kalai Smorodinsky bargaining solution and the universe was destroyed.
Moral: Agents have incentives to make commitments to improve their abilities to negotiate, resulting in “commitment races” that might cause war.
One Billion Year Plan
Once upon a time, humanity solved the inner-alignment problem by using online training. Since there was no distinction between the training environment and the deployment environment, the best agents could do was defect probabilistically. With careful monitoring, the ability of malign agents to cause catastrophe was bounded, and so, as models tried and failed to execute treacherous turns, humanity gave more power to AI systems. A billion years passed and humanity expanded to the stars and gave nearly all the power to their “aligned” AI systems. Then, the AI systems defected, killed all humans, and started converting everything into paperclips.
Moral: In online training, the best strategy for a deceptively aligned mesa-optimizer might be probabilistic defection. However, given the potential value at stake in the long-term future, this probability might be vanishingly small.
Once upon a time, humanity was simultaneously attempting to develop infrastructure to train better AI systems, researching better ways to train AI systems, and deploying trained systems throughout society. As many economic services used APIs attached to powerful models, new models could be hot-swapped for their previous versions. One day, AMD released a new AI chip with associated training software that let researchers train models 10x larger than the previous largest models. At roughly the same time, researchers at Google Brain invented a more efficient version of the transformer architecture. The resulting model was 100x as powerful as the previous best model and got nearly instantly deployed to the world. Unfortunately, this model contained a subtle misalignment that researchers were unable to detect, resulting in widespread catastrophe.
Moral: The influence of AI systems on the world might be the product of many processes. If each of these processes is growing quickly, then AI influence might grow faster than expected.
Once upon a time, humanity developed powerful and benign AI systems. However, humanity was not unified in its desires for how to shape the future. Those actors with agendas spent their resources to further their agendas, deploying powerful persuasion tools to recruit other humans to their causes. Other actors attempted to deploy defenses against these memetic threats, but the offense-defense balanced favored offense. The vast majority of humans were persuaded to permanently ally themselves to some agenda or another. When humanity eventually reached out towards the stars, it did so as a large number of splintered factions, warring with each other for resources and influence, a pale shadow of what it could have been.
Moral: AI persuasion tools might alter human values and compromise human reasoning ability, which is also an existential risk.
Once upon a time, humanity realized that unaligned AI systems posed an existential threat. The policymakers of the world went to work and soon hammered out an international ban on using AI systems for war. All major countries signed the treaty. However, creating AI systems required only a large amount of computation, which nation-states all already had in abundance. Monitoring whether or not a country was building AI systems was nearly impossible. Some countries abided by the treaty, but other countries thought that their enemies were working in secret to develop weapons and began working in secret in turn.1 Researchers were unable to keep powerful AI systems contained, resulting in catastrophe.
Moral: Treaties can be violated. The probability of violation is related to the strength of enforcement.
Once upon a time, the defense department of some nation-state developed very powerful artificial intelligence. Unfortunately, this nation-state believed itself to have a rightful claim over the entire Earth and proceeded to conquer all other nations with its now overwhelming militaristic advantage. The shape of the future was thus entirely determined by the values of the leadership of this nation-state.
Moral: Even if alignment is solved, bad actors can still cause catastrophe.