AI Visionary Eliezer Yudkowsky on the Singularity, Bayesian Brains and Closet Goblins

“Decision theorist” Eliezer Yudkowsky spells out his idiosyncratic vision of the Singularity.

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American.


I’m perpetually astonished by smart people who believe things I find preposterous. For example, geneticist and National Institutes of Health Director Francis Collins, who believes Jesus rose from the dead. Or artificial-intelligence theorist Eliezer Yudkowsky, who believes machines… Well, I should let Yudkowsky say what he believes. I interviewed him on Bloggingheads.tv in 2008, and it didn’t go well, because I assumed he was a disciple of Singularity guru Ray Kurzweil. Yudkowsky, who never attended college, is no one’s follower. He is a stubbornly original theorist of intelligence, both human and artificial. His writings (such as this essay, which helped me grok, or gave me the illusion of grokking, Bayes’s Theorem) exude the arrogance of the autodidact, edges undulled by formal education, but that’s part of his charm. Even when he’s annoying, Yudkowsky is funny, fresh, provocative. For more on his background and interests, see his personal website or the site of the Machine Intelligence Research Institute, which he co-founded. And read the following Q&A, which includes a bonus: comments from his wife Brienne.

Horgan: When someone at a party asks what you do, what do you tell her?

Yudkowsky: Depending on the venue: "I'm a decision theorist", or "I'm a cofounder of the Machine Intelligence Research Institute", or if it wasn't that kind of party, I'd talk about my fiction.


Horgan: What’s your favorite AI film and why?

Yudkowsky: AI in film is universally awful.  Ex Machina is as close to being an exception to this rule as it is realistic to ask.

Horgan: Is college overrated?

Yudkowsky: It'd be very surprising if college were underrated, given the social desirability bias of endorsing college. So far as I know, there's no reason to disbelieve the economists who say that college has mostly become a positional good, and that previous efforts to increase the volume of student loans just increased the cost of college and the burden of graduate debt.

Horgan: Why do you write fiction?

Yudkowsky: To paraphrase Wondermark, "Well, first I tried not making it, but then that didn't work."

Beyond that, nonfiction conveys knowledge and fiction conveys experience.  If you want to understand a proof of Bayes's Rule, I can use diagrams.  If I want you to feel what it is to use Bayesian reasoning, I have to write a story in which some character is doing that.

Horgan: Are you religious in any way?

Yudkowsky: No.  When you make a mistake, you need to avoid the temptation to go defensive, try to find some way in which you were a little right, look for a silver lining in the cloud.  It's much wiser to just say "Oops", admit you were not even a little right, swallow the whole bitter pill in one gulp, and get on with your life.  That's the attitude humanity should take toward religion.

Horgan: If you were King of the World, what would top your “To Do” list?

Yudkowsky: I once observed, "The libertarian test is whether, imagining that you've gained power, your first thought is of the laws you would pass, or the laws you would repeal."  I'm not an absolute libertarian, since not everything I want would be about repealing laws and softening constraints.  But when I think of a case like this, I imagine trying to get the world to a condition where some unemployed person can offer to drive you to work for 20 minutes, be paid five dollars, and then nothing else bad happens to them.  They don't have their unemployment insurance phased out, have to register for a business license, lose their Medicare, be audited, have their lawyer certify compliance with OSHA rules, or whatever.  They just have an added $5.

I'd try to get to the point where employing somebody was once again as easy as it was in 1900.  I think it can make sense nowadays to have some safety nets, but I'd try to construct every safety net such that it didn't disincentivize or add paperwork to that simple event where a person becomes part of the economy again.

I'd try to do all the things smart economists have been yelling about for a while but that almost no country ever does.  Replace investment taxes and income taxes with consumption taxes and land value tax.  Replace minimum wages with negative wage taxes.  Institute NGDP level targeting regimes at central banks and let the too-big-to-fails go hang.  Require loser-pays in patent law and put copyright back to 28 years.  Eliminate obstacles to housing construction.  Copy and paste from Singapore's healthcare setup.  Copy and paste from Estonia's e-government setup.  Try to replace committees and elaborate process regulations with specific, individual decision-makers whose decisions would be publicly documented and accountable.  Run controlled trials of different government setups and actually pay attention to the results.  I could go on for literally hours.

All this might not matter directly from the perspective of two hundred million years later.  But the goodwill generated by the resulting economic boom might stand my government in good stead when I tried to figure out what the heck to do about Artificial Intelligence.  The obvious thing, I guess, would be a Manhattan Project on an island somewhere, with pay competitive with top hedge funds, where people could collaborate on researching parts of the Artificial General Intelligence problem without the publication of their work automatically moving us closer to the end of the world.  We'd still be working to an unknown deadline, and I wouldn't feel relaxed at that point.  Unless we postulate that I have literally magical powers or an utterly unshakeable regime, I don't see how any law I could reasonably decree could delay AI timelines for very long on a planet where computers are already ubiquitous.

All of this is an impossible thought experiment in the first place, and I see roughly zero hope of it ever coming to pass in real life.

Horgan: What’s so great about Bayes’ Theorem?

Yudkowsky: For one thing, Bayes's Theorem is incredibly deep.  So it's not easy to give a brief answer to that.

I might answer that Bayes's Theorem is a kind of Second Law of Thermodynamics for cognition.  If you obtain a well-calibrated posterior belief that some proposition is 99% probable, whether that proposition is milk being available at the supermarket or global warming being anthropogenic, then you must have processed some combination of sufficiently good priors and sufficiently strong evidence.  That's not a normative demand, it's a law.  In the same way that a car can't run without dissipating entropy, you simply don't get an accurate map of the world without a process that has Bayesian structure buried somewhere inside it, even if the process doesn't explicitly represent probabilities or likelihood ratios.  You had strong-enough evidence and a good-enough prior or you wouldn't have gotten there.

On a personal level, I think the main inspiration Bayes has to offer us is just the fact that there are rules, that there are iron laws that govern whether a mode of thinking works to map reality.  Mormons are told that they'll know the truth of the Book of Mormon through feeling a burning sensation in their hearts.  Let's conservatively set the prior probability of the Book of Mormon at one to a billion (against).  We then ask about the likelihood that, assuming the Book of Mormon is false, someone would feel a burning sensation in their heart after being told to expect one.  If you understand Bayes's Rule you can see at once that the improbability of the evidence is not commensurate with the improbability of the hypothesis it's trying to lift.  You don't even have to make up numbers to see that the numbers don't add up - as Philip Tetlock found in his study of superforecasters, they often know Bayes's Rule but rarely make up specific probabilities.  On some level, it's harder to be fooled if you just realize on a gut level that there is math, that there is some math you'd do to arrive at the exact strength of the evidence and whether it sufficed to lift the prior improbability of the hypothesis.  That you can't just make stuff up and believe what you want to believe because that doesn't work. [See also “Bayes’s Rule: Guide.”]
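To make the "there is math" point concrete, here is a minimal sketch of the odds-form update Yudkowsky is gesturing at, using the one-in-a-billion prior from the paragraph above and a likelihood ratio that is purely my own illustrative assumption (the interview's point is precisely that you don't need to plug in exact numbers):

```python
# Odds-form Bayesian update: posterior odds = likelihood ratio * prior odds.
# Only the one-in-a-billion prior comes from the interview; the likelihood ratio
# below is an arbitrary illustrative assumption.

def posterior_probability(prior_odds: float, likelihood_ratio: float) -> float:
    """Return P(hypothesis | evidence) given prior odds and a likelihood ratio."""
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

prior_odds = 1e-9          # prior odds in favor: one to a billion against
likelihood_ratio = 100.0   # assumed: the evidence is 100x more likely if the hypothesis is true

print(posterior_probability(prior_odds, likelihood_ratio))  # ~1e-07: nowhere near lifting the prior
```

Even an evidence strength of 100 to 1 leaves the posterior around one in ten million, which is one way of seeing that the improbability of the evidence is not commensurate with the improbability of the hypothesis.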

Horgan: Does the Bayesian-brain hypothesis impress you?

Yudkowsky: I think some of the people in that debate may be talking past each other.  Asking whether the brain is a Bayesian algorithm is like asking whether a Honda Accord runs on a Carnot heat engine.  If you have one person who's trying to say, "Every car is a thermodynamic process that requires fuel and dissipates waste heat" and the person on the other end hears, "If you draw a diagram of a Carnot heat engine and show it to a mechanic, they should agree that it looks like the inside of a Honda Accord" then you are going to have some fireworks.

Some people will also be really excited when they open up the internal combustion engine and find the cylinders and say, "I bet this converts heat into pressure and helps drive the car forward!"  And they'll be right, but then you're going to find other people saying, "You're focusing on what's merely a single component in a much bigger library of car parts; the catalytic converter is also important and that doesn't appear anywhere on your diagram of a Carnot heat engine.  Why, sometimes we run the air conditioner, which operates in the exact opposite way of how you say a heat engine works."

I don't think it would come as much of a surprise that I think the people who adopt a superior attitude and say, "You are clearly unfamiliar with modern car repair; you need a toolbox of diverse methods to build a car engine, like spark plugs and catalytic converters, not just these thermodynamic processes you keep talking about" are missing a key level of abstraction.

But if you want to know whether the brain is literally a Bayesian engine, as opposed to doing cognitive work whose nature we can understand in a Bayesian way, then my guess is "Heck, no."  There might be a few excitingly Bayesian cylinders in that engine, but a lot more of it is going to look like weird ad-hoc seat belts and air conditioning.  None of which is going to change the fact that to correctly identify an apple based on sensory evidence, you need to do something that's ultimately interpretable as resting on an inductive prior that can learn the apple concept, and updating on evidence that distinguishes apples from nonapples.
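As a toy illustration of what "ultimately interpretable as resting on an inductive prior and updating on evidence" can mean in practice, here is a minimal naive-Bayes-style sketch for the apple example; the prior, features, and likelihoods are all invented for illustration:

```python
# Toy "apple vs. non-apple" identification as an explicit Bayesian update.
# The prior, features, and likelihoods are invented purely for illustration.

prior_apple = 0.3  # inductive prior: probability the object is an apple before looking

# Class-conditional likelihoods for two binary sensory features.
likelihoods = {
    "is_red":   {"apple": 0.7, "other": 0.2},
    "is_round": {"apple": 0.9, "other": 0.3},
}

def p_apple_given(evidence: dict) -> float:
    """Posterior P(apple | evidence) under a naive (feature-independent) model."""
    p_apple, p_other = prior_apple, 1.0 - prior_apple
    for feature, observed in evidence.items():
        la = likelihoods[feature]["apple"]
        lo = likelihoods[feature]["other"]
        p_apple *= la if observed else (1.0 - la)   # update on each piece of evidence
        p_other *= lo if observed else (1.0 - lo)
    return p_apple / (p_apple + p_other)

print(p_apple_given({"is_red": True, "is_round": True}))    # evidence raises the posterior (~0.82)
print(p_apple_given({"is_red": False, "is_round": False}))  # evidence lowers it (~0.02)
```

Whether or not any brain circuit literally computes these products, a system that reliably distinguishes apples from nonapples is doing work that can be described in this form.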

Horgan: Can you be too rational?

Yudkowsky: You can run into what we call "The Valley of Bad Rationality."  If you were previously irrational in multiple ways that balanced or canceled out, then becoming half-rational can leave you worse off than before.  Becoming incrementally more rational can make you incrementally worse off, if you choose the wrong place to invest your skill points first.

But I would not recommend to people that they obsess over that possibility too much.  In my experience, people who go around talking about cleverly choosing to be irrational strike me as, well, rather nitwits about it, to be frank.  It's hard to come up with a realistic non-contrived life situation where you know that it's a good time to be irrational and you don't already know the true answer.  I think in real life, you just tell yourself the truth as best you know it, and don't try to be clever.

On an entirely separate issue, it's possible that being an ideal Bayesian agent is ultimately incompatible with living the life best-lived from a fun-theoretic perspective.  But we're a long, long, long way from that being a bigger problem than our current self-destructiveness.

Horgan: How does your vision of the Singularity differ from that of Ray Kurzweil?

Yudkowsky:

- I don't think you can time AI with Moore's Law.  AI is a software problem.

- I don't think that humans and machines "merging" is a likely source for the first superhuman intelligences.  It took a century after the first cars before we could even begin to put a robotic exoskeleton on a horse, and a real car would still be faster than that.

- I don't expect the first strong AIs to be based on algorithms discovered by way of neuroscience any more than the first airplanes looked like birds.

- I don't think that nano-info-bio "convergence" is probable, inevitable, well-defined, or desirable.

- I think the changes between 1930 and 1970 were bigger than the changes between 1970 and 2010.

- I buy that productivity is currently stagnating in developed countries.

- I think extrapolating a Moore's Law graph of technological progress past the point where you say it predicts smarter-than-human AI is just plain weird.  Smarter-than-human AI breaks your graphs.

- Some analysts, such as Ilkka Tuomi, claim that Moore's Law broke down in the '00s.  I don't particularly disbelieve this.

- The only key technological threshold I care about is the one where AI, which is to say AI software, becomes capable of strong self-improvement.  We have no graph of progress toward this threshold and no idea where it lies (except that it should not be high above the human level because humans can do computer science), so it can't be timed by a graph, nor known to be near, nor known to be far.  (Ignorance implies a wide credibility interval, not being certain that something is far away.)

- I think outcomes are not good by default - I think outcomes can be made good, but this will require hard work that key actors may not have immediate incentives to do.  Telling people that we're on a default trajectory to great and wonderful times is false.

- I think that the "Singularity" has become a suitcase word with too many mutually incompatible meanings and details packed into it, and I've stopped using it.

Horgan: Do you think you have a shot at becoming a superintelligent cyborg?

Yudkowsky: The conjunction law of probability theory says that P(A&B) <= P(A) - the probability of both A and B happening is no greater than the probability of A alone happening.  Experimental conditions that can get humans to assign P(A&B) > P(A) for some A&B are said to exhibit the "conjunction fallacy" - for example, in 1982, experts at the International Congress on Forecasting assigned higher probability to "A Russian invasion of Poland, and a complete breakdown of diplomatic relations with the Soviet Union" than a separate group did for "A complete breakdown of diplomatic relations with the Soviet Union".  Similarly, another group assigned higher probability to "An earthquake in California causing a flood that causes over a thousand deaths" than another group assigned to "A flood causing over a thousand deaths somewhere in North America."  Even though adding on additional details necessarily makes a story less probable, it can make the story sound more plausible.  I see understanding this as a kind of Pons Asinorum of serious futurism - the distinction between carefully weighing each and every independent proposition you add to your burden, asking if you can support that detail independently of all the rest, versus making up a wonderful vivid story.
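For reference, the conjunction law follows in one line from the product rule of probability; nothing here goes beyond the standard identity quoted above:

```latex
\[
P(A \wedge B) \;=\; P(A)\,P(B \mid A) \;\le\; P(A),
\qquad \text{since } 0 \le P(B \mid A) \le 1 .
\]
```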

I mention this as context for my reply, which is, "Why the heck are you tacking on the 'cyborg' detail to that?  I don't want to be a cyborg."  You've got to be careful with tacking on extra details to things.

Horgan: Do you have a shot at immortality?

Yudkowsky: What, literal immortality?  Literal immortality seems hard.  Living significantly longer than a few trillion years requires us to be wrong about the expected fate of the expanding universe.  Living longer than, say, a googolplex years, requires us to be wrong about the basic character of physical law, not just the details.

Even if some of the wilder speculations are true and it's possible for our universe to spawn baby universes, that doesn't get us literal immortality.  To live significantly past a googolplex years without repeating yourself, you need computing structures containing more than a googol elements, and those won't fit inside a single Hubble volume.

And a googolplex is hardly infinity.  To paraphrase Martin Gardner, Graham's Number is still relatively small because most finite numbers are very much larger.  Look up the fast-growing hierarchy if you really want to have your mind blown; well, eternity is longer than that.  Only weird and frankly terrifying anthropic theories would let you live long enough to gaze, perhaps knowingly and perhaps not, upon the halting of the longest-running halting Turing machine with 100 states.

But I'm not sure that living to look upon the 100th Busy Beaver Number feels to me like it matters very much on a deep emotional level.  I have some imaginative sympathy with myself a subjective century from now.  That me will be in a position to sympathize with their future self a subjective century later.  And maybe somewhere down the line is someone who faces the prospect of their future self not existing at all, and they might be very sad about that; but I'm not sure I can imagine who that person will be.  "I want to live one more day.  Tomorrow I'll still want to live one more day.  Therefore I want to live forever, proof by induction on the positive integers."  Even my desire for merely physical-universe-containable longevity is an abstract want by induction; it's not that I can actually imagine myself a trillion years later.

Horgan: I’ve described the Singularity as an “escapist, pseudoscientific” fantasy that distracts us from climate change, war, inequality and other serious problems. Why am I wrong?

Yudkowsky: Because you're trying to forecast empirical facts by psychoanalyzing people.  This never works.

Suppose we get to the point where there's an AI smart enough to do the same kind of work that humans do in making the AI smarter; it can tweak itself, it can do computer science, it can invent new algorithms.  It can self-improve.  What happens after that - does it become even smarter, see even more improvements, and rapidly gain capability up to some very high limit?  Or does nothing much exciting happen?

It could be that, (A), self-improvements of size delta tend to make the AI sufficiently smarter that it can go back and find new potential self-improvements of size k*delta and that k is greater than 1, and this continues for a sufficiently extended regime that there's a rapid cascade of self-improvements leading up to superintelligence; what I. J. Good called the intelligence explosion.  Or it could be that, (B), k is less than one or that all regimes like this are small and don't lead up to superintelligence, or that superintelligence is impossible, and you get a fizzle instead of an explosion.  Which is true, A or B?  If you actually built an AI at some particular level of intelligence and it actually tried to do that, something would actually happen out there in the empirical real world, and that event would be determined by background facts about the landscape of algorithms and attainable improvements.
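A toy numerical sketch of the difference between regimes (A) and (B): each self-improvement of size delta buys a next improvement of size k times delta, so the cumulative gain is just a geometric series that compounds for k > 1 and levels off for k < 1. The specific numbers below are arbitrary and illustrate only the arithmetic, not any claim about real systems:

```python
# Toy model of recursive self-improvement: an improvement of size `delta` enables
# finding the next improvement of size k * delta. This only illustrates the
# geometric-series arithmetic in the text; the numbers are arbitrary.

def cumulative_improvement(k: float, delta: float = 1.0, rounds: int = 50) -> float:
    total = 0.0
    for _ in range(rounds):
        total += delta
        delta *= k   # each improvement scales the next one by k
    return total

print(cumulative_improvement(k=1.2))  # k > 1: gains compound without leveling off (case A)
print(cumulative_improvement(k=0.8))  # k < 1: gains converge toward delta/(1 - k), a "fizzle" (case B)
```

Which regime obtains is an empirical fact about the landscape of algorithms, which is the point of the paragraph above.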

You can't get solid information about that event by psychoanalyzing people.  It's exactly the sort of thing that Bayes's Theorem tells us is the equivalent of trying to run a car without fuel.  Some people will be escapist regardless of the true values on the hidden variables of computer science, so observing some people being escapist isn't strong evidence, even if it might make you feel like you want to disaffiliate with a belief or something.

There is a misapprehension, I think, of the nature of rationality, which is to think that it's rational to believe "there are no closet goblins" because belief in closet goblins is foolish, immature, outdated, the sort of thing that stupid people believe.  The true principle is that you go in your closet and look.  So that in possible universes where there are closet goblins, you end up believing in closet goblins, and in universes with no closet goblins, you end up disbelieving in closet goblins.

It's difficult but not impossible to try to sneak peeks through the crack of the closet door, to ask the question, "What would look different in the universe now if you couldn't get sustained returns on cognitive investment later, such that an AI trying to improve itself would fizzle?  What other facts should we observe in a universe like that?"

So you have people who say, for example, that we'll only be able to improve AI up to the human level because we're human ourselves, and then we won't be able to push an AI past that.  I think that if this is how the universe looks in general, then we should also observe, e.g., diminishing returns on investment in hardware and software for computer chess past the human level, which we did not in fact observe.  Also, natural selection shouldn't have been able to construct humans, and Einstein's mother must have been one heck of a physicist, etcetera.

You have people who say, for example, that it should require more and more tweaking to get smarter algorithms and that human intelligence is around the limit.  But this doesn't square up with the anthropological record of human intelligence; we can know that there were not diminishing returns to brain tweaks and mutations producing improved cognitive power.  We know this because population genetics says that mutations with very low statistical returns will not evolve to fixation at all.

And hominids definitely didn't need exponentially vaster brains than chimpanzees.  And John von Neumann didn't have a head exponentially vaster than the head of an average human.

And on a sheerly pragmatic level, human axons transmit information at around a millionth of the speed of light, and when it comes to heat dissipation, each synaptic operation in the brain consumes around a million times the minimum heat dissipation for an irreversible binary operation at 300 Kelvin, and so on.  Why think the brain's software is closer to optimal than the hardware?  Human intelligence is privileged mainly by being the least possible level of intelligence that suffices to construct a computer; if it were possible to construct a computer with less intelligence, we'd be having this conversation at that level of intelligence instead.
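The heat-dissipation comparison can be sanity-checked on the back of an envelope: the Landauer limit for an irreversible one-bit operation at 300 K is kT ln 2. The per-synaptic-event energy below is my own rough order-of-magnitude assumption (about 20 W of brain power spread over roughly 10^15 synaptic events per second), not a figure from the interview:

```python
import math

# Landauer limit: minimum heat dissipated by an irreversible one-bit operation at temperature T.
k_B = 1.380649e-23                        # Boltzmann constant, J/K
T = 300.0                                 # kelvin
landauer_joules = k_B * T * math.log(2)   # ~2.9e-21 J per bit

# Assumed, order-of-magnitude only: ~20 W of brain power spread over ~1e15
# synaptic events per second gives ~2e-14 J per synaptic event.
synaptic_event_joules = 20.0 / 1e15

print(f"Landauer limit at 300 K: {landauer_joules:.1e} J")
print(f"Assumed energy per synaptic event: {synaptic_event_joules:.1e} J")
print(f"Ratio: about {synaptic_event_joules / landauer_joules:.0e}")  # on the order of a million-fold
```

Under these rough assumptions the ratio comes out in the millions, consistent with the "around a million times" figure in the answer.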

But this is not a simple debate and for a detailed consideration I'd point people at an old informal paper of mine, "Intelligence Explosion Microeconomics", which is unfortunately probably still the best source out there.  But these are the type of questions one must ask to try to use our currently accessible evidence to reason about whether or not we'll see what's colloquially termed an "AI FOOM" - whether there's an extended regime where delta improvement in cognition, reinvested into self-optimization, yields greater than delta further improvements.

As for your question about opportunity costs:

There is a conceivable world where there is no intelligence explosion and no superintelligence.  Or where, a related but logically distinct proposition, the tricks that machine learning experts will inevitably build up for controlling infrahuman AIs carry over pretty well to the human-equivalent and superhuman regime.  Or where moral internalism is true and therefore all sufficiently advanced AIs are inevitably nice.  In conceivable worlds like that, all the work and worry of the Machine Intelligence Research Institute comes to nothing and was never necessary in the first place, representing some lost number of mosquito nets that could otherwise have been bought by the Against Malaria Foundation.

There's also a conceivable world where you work hard and fight malaria, where you work hard and keep the carbon emissions to not much worse than they are already (or use geoengineering to mitigate mistakes already made).  And then it ends up making no difference because your civilization failed to solve the AI alignment problem, and all the children you saved with those malaria nets grew up only to be killed by nanomachines in their sleep.  (Vivid detail warning!  I don't actually know what the final hours will be like and whether nanomachines will be involved.  But if we're happy to visualize what it's like to put a mosquito net over a bed, and then we refuse to ever visualize in concrete detail what it's like for our civilization to fail AI alignment, that can also lead us astray.)

I think that people who try to do thought-out philanthropy, e.g., Holden Karnofsky of GiveWell, would unhesitatingly agree that these are both conceivable worlds we prefer not to enter.  The question is just which of these two worlds is more probable as the one we should avoid.  And again, the central principle of rationality is not to disbelieve in goblins because goblins are foolish and low-prestige, or to believe in goblins because they are exciting or beautiful.  The central principle of rationality is to figure out which observational signs and logical validities can distinguish which of these two conceivable worlds is the metaphorical equivalent of believing in goblins.

I think it's the first world that's improbable and the second one that's probable.  I'm aware that in trying to convince people of that, I'm swimming uphill against a sense of eternal normality - the sense that this transient and temporary civilization of ours that has existed for only a few decades, that this species of ours that has existed for only an eyeblink of evolutionary and geological time, is all that makes sense and shall surely last forever.  But given that I do think the first conceivable world is just a fond dream, it should be clear why I don't think we should ignore a problem we'll predictably have to panic about later.  The mission of the Machine Intelligence Research Institute is to do today that research which, 30 years from now, people will desperately wish had begun 30 years earlier.

Horgan: Does your wife Brienne believe in the Singularity?

Brienne replies: "If someone asked me whether I 'believed in the singularity', I'd raise an eyebrow and ask them if they 'believed in' robotic trucking. It's kind of a weird question. I don't know a lot about what the first fleet of robotic cargo trucks will be like, or how long they'll take to completely replace contemporary ground shipping. And if there were a culturally loaded suitcase term 'robotruckism' that included a lot of specific technological claims along with whole economic and sociological paradigms, I'd be hesitant to say I 'believed in' driverless trucks. I confidently forecast that driverless ground shipping will replace contemporary human-operated ground shipping, because that's just obviously where we're headed if nothing really weird happens. Similarly, I confidently forecast an intelligence explosion. That's obviously where we're headed if nothing really weird happens. I'm less sure of the other items in the 'singularity' suitcase." (Eliezer adds: “To avoid prejudicing the result, Brienne composed her reply without seeing my other answers.  We're just well-matched.”)

Horgan: Can we create superintelligences without knowing how our brains work?

Yudkowsky: Only in the sense that you can make airplanes without knowing how a bird flies.  You don't need to be an expert in bird biology, but at the same time, it's difficult to know enough to build an airplane without realizing some high-level notion of how a bird might glide or push down air with its wings.  That's why I write about human rationality in the first place - if you push your grasp on machine intelligence past a certain point, you can't help but start having ideas about how humans could think better too.

Horgan: What would superintelligences want? Will they have anything resembling sexual desire?

Yudkowsky: Think of an enormous space of possibilities, a giant multidimensional sphere.  This is Mind Design Space, the set of possible cognitive algorithms.  Imagine that somewhere near the bottom of that sphere is a little tiny dot representing all the humans who ever lived - it's a tiny dot because all humans have basically the same brain design, with a cerebral cortex, a prefrontal cortex, a cerebellum, a thalamus, and so on.  It's conserved even relative to chimpanzee brain design.  Some of us are weird in little ways, you could say it's a spiky dot, but the spikes are on the same tiny scale as the dot itself; no matter how neuroatypical you are, you aren't running on a different cortical algorithm.

Asking "what would superintelligences want" is a Wrong Question. Superintelligences are not this weird tribe of people who live across the water with fascinating exotic customs.  "Artificial Intelligence" is just a name for the entire space of possibilities outside the tiny human dot.  With sufficient knowledge you might be able to reach into that space of possibilities and deliberately pull out an AI that wanted things that had a compact description in human wanting-language, but that wouldn't be because this is a kind of thing that those exotic superintelligence people naturally want, it would be because you managed to pinpoint one part of the design space.

When it comes to pursuing things like matter and energy, we may tentatively expect partial but not total convergence - it seems like there should be many, many possible superintelligences that would instrumentally want matter and energy in order to serve terminal preferences of tremendous variety.  But even there, everything is subject to defeat by special cases.  If you don't want to get disassembled for spare atoms, you can, if you understand the design space well enough, reach in and pull out a particular machine intelligence that doesn't want to hurt you.

So the answer to your second question about sexual desire is that if you knew exactly what you were doing and if you had solved the general problem of building AIs that stably want particular things as they self-improve and if you had solved the general problem of pinpointing an AI's utility functions at things that seem deceptively straightforward to human intuitions, and you'd solved an even harder problem of building an AI using the particular sort of architecture where 'being horny' or 'sex makes me happy' makes sense in the first place, then you could perhaps make an AI that had been told to look at humans, model what humans want, pick out the part of the model that was sexual desire, and then want and experience that thing too.

You could also, if you had a sufficiently good understanding of organic biology and aerodynamics, build an airplane that could mate with birds.

I don't think this would have been a smart thing for the Wright Brothers to try to do in the early days.  There would have been absolutely no point.

It does seem a lot wiser to figure out how to reach into the design space and pull out a special case of AI that will lack the default instrumental preference to disassemble us for spare atoms.

Horgan: I like to think superintelligent beings would be nonviolent, because they will realize that violence is stupid. Am I naive?

Yudkowsky: I think so.  As David Hume might have told you, you're making a type error by trying to apply the 'stupidity' predicate to an agent's terminal values or utility function.  Acts, choices, policies can be stupid given some set of preferences over final states of the world.  If you happen to be an agent that has meta-preferences you haven't fully computed, you might have a platform on which to stand and call particular guesses at the derived object-level preferences 'stupid'.

A paperclip maximizer is not making a computational error by having a preference order on outcomes that prefers outcomes with more paperclips in them.  It is not standing from within your own preference framework and choosing blatantly mistaken acts, nor is it standing within your meta-preference framework and making mistakes about what to prefer.  It is computing the answer to a different question than the question that you are asking when you ask, "What should I do?"  A paperclip maximizer just outputs the action leading to the greatest number of expected paperclips.

The fatal scenario is an AI that neither loves you nor hates you, because you're still made of atoms that it can use for something else.  Game theory, and issues like cooperation in the Prisoner's Dilemma, don't emerge in all possible cases.  In particular, they don't emerge when something is sufficiently more powerful than you that it can disassemble you for spare atoms whether you try to press Cooperate or Defect.  Past that threshold, either you solved the problem of making something that didn't want to hurt you, or else you've already lost.

Horgan: Will superintelligences solve the “hard problem” of consciousness?

Yudkowsky: Yes, and in retrospect the answer will look embarrassingly obvious from our perspective.

Horgan: Will superintelligences possess free will?

Yudkowsky: Yes, but they won't have the illusion of free will.

Horgan: What’s your utopia?

Yudkowsky: I refer your readers to my nonfiction Fun Theory Sequence, since I have not as yet succeeded in writing any novel set in a fun-theoretically optimal world.

Further Reading:

Bayes's Theorem: What's the Big Deal?

Are Brains Bayesian?

Can the Singularity Solve the Valentine's Day Dilemma?

Do Big New Brain Projects Make Sense When We Don't Even Know the "Neural Code"?

Two More Reasons Why Big Brain Projects Are Premature.

Artificial brains are imminent… not!

What’s the Biggest Science News? We’re Still Human, for Ill or Good.

Can We Improve Predictions? Q&A with Philip "Superforecasting" Tetlock.

Christof Koch on Free Will, the Singularity and the Quest to Crack Consciousness.

The Many Minds of Marvin Minsky (R.I.P.)