#431 – Roman Yampolskiy: Dangers of Superintelligent AI

AI transcript
0:00:00 The following is a conversation with Roman Yampolskiy, an AI safety and security researcher
0:00:05 and author of a new book titled AI: Unexplainable, Unpredictable, Uncontrollable.
0:00:13 He argues that there’s almost 100% chance that AGI will eventually destroy human civilization.
0:00:19 As an aside, let me say that I will have many, often technical, conversations on the topic
0:00:25 of AI, often with the engineers building state-of-the-art AI systems.
0:00:31 I would say those folks put the infamous p(doom), or the probability of AGI killing all humans,
0:00:36 at around 1 to 20%, but it’s also important to talk to folks who put that value at 70,
0:00:43 80, 90, and in the case of Roman at 99.99 and many more nines percent.
0:00:51 I’m personally excited for the future and believe it will be a good one, in part because
0:00:57 of the amazing technological innovation we humans create, but we must absolutely not
0:01:03 do so with blinders on, ignoring the possible risks, including existential risks of those
0:01:11 technologies.
0:01:12 That’s what this conversation is about.
0:01:16 And now a quick few second mention of each sponsor.
0:01:19 Check them out in the description, it’s the best way to support this podcast.
0:01:23 We got Yahoo Finance for Investors, Masterclass for Learning, NetSuite for Business, Element
0:01:29 for Hydration, and Eight Sleep for Sweet Sweet Naps.
0:01:34 Choose wisely, my friends.
0:01:36 Also, if you want to get in touch with me or for whatever reason, work with our amazing
0:01:41 team, let’s say, just go to lexfridman.com/contact.
0:01:45 And now on to the full ad reads, as always, no ads in the middle.
0:01:49 I try to make these interesting, but if you must skip them, friends, please still check
0:01:53 out our sponsors.
0:01:54 I enjoy their stuff, maybe you will too.
0:01:58 This episode is brought to you by Yahoo Finance, a site that provides financial management,
0:02:03 reports, information, and news for investors.
0:02:07 It’s my main go-to place for financial stuff.
0:02:10 I also added my portfolio to it.
0:02:13 I guess it used to be TD Ameritrade and that got transported, transformed, moved to Charles
0:02:21 Schwab.
0:02:22 I guess that was an acquisition of some sort.
0:02:25 I have not been paying attention.
0:02:28 All I know is I hate change, and trying to figure out the new interface of Schwab when I log
0:02:33 in once a year, or however often I log in, is just annoying.
0:02:39 Anyway, one of the ways to avoid that annoyance is tracking information about my portfolio
0:02:45 on Yahoo Finance: you can bring over your portfolio and, in the same place, find all
0:02:52 the news, analysis, information, all that kind of stuff.
0:02:55 Anyway, for comprehensive financial news and analysis, go to yahoofinance.com.
0:03:01 That’s yahoofinance.com.
0:03:04 I don’t know why I whispered that.
0:03:06 This episode is also brought to you by Masterclass where you can watch over 180 classes from
0:03:12 the best people in the world in their respective disciplines.
0:03:16 We got Aaron Franklin on barbecue and brisket, something I watched recently, and I love brisket.
0:03:24 I love barbecue.
0:03:25 It’s one of my favorite things about Austin.
0:03:28 It’s funny when the obvious cliche thing is also the thing that brings you joy.
0:03:33 So it almost doesn’t feel genuine to say, but I really love barbecue.
0:03:39 My favorite place to go is probably Terry Black’s.
0:03:41 I’ve had Franklin’s a couple of times.
0:03:43 It’s also amazing.
0:03:44 I actually don’t remember ever having bad barbecue or even mediocre barbecue in
0:03:51 Austin.
0:03:52 So it’s hard to pick favorites because it really boils down to the experience you have
0:03:58 when you’re sitting there.
0:03:59 One of my favorite places to sit is Terry Black’s.
0:04:02 It has this, I don’t know, it feels like a tavern.
0:04:05 I feel like a cowboy.
0:04:08 I just robbed a bank in some town in the middle of nowhere in West Texas and I’m just sitting
0:04:15 down for some good barbecue and the sheriffs walk in and there’s a gunfight and all that,
0:04:19 as usual.
0:04:20 Anyway, get unlimited access to every Masterclass and get an additional 15% off an annual membership
0:04:25 at masterclass.com/lexpod.
0:04:32 This episode is also brought to you by NetSuite, an all-in-one cloud business management system.
0:04:38 One of the most fulfilling things in life is the people you surround yourself with, just
0:04:43 like in the movie 300.
0:04:44 All it takes is 300 people to do some incredible stuff.
0:04:51 But they all have to be shredded.
0:04:53 It’s really, really important to always be ready for war in physical and mental shape.
0:05:03 No, not really, but I guess if that’s your thing, happiness is the thing you should be
0:05:10 chasing and there’s a lot of ways to achieve that.
0:05:13 For me, being in shape is one of the things that make me happy because I can move about
0:05:18 the world and have a lightness to my physical being if I’m in good shape.
0:05:24 Anyway, I say all that because getting a strong team together and having them operate as an
0:05:30 efficient, powerful machine is really important for the success of the team, for the happiness
0:05:38 of the team, and the individuals in that team.
0:05:41 NetSuite is a great system that runs the machine inside the machine for any-sized business.
0:05:48 37,000 companies have upgraded to NetSuite by Oracle.
0:05:54 Take advantage of NetSuite’s flexible financing plan at netsuite.com/lex.
0:05:58 That’s netsuite.com/lex.
0:06:02 This episode was also brought to you by Element, an electrolyte drink mix of sodium, potassium,
0:06:08 and magnesium that I’ve been consuming multiple times a day.
0:06:12 Watermelon salt is my favorite.
0:06:14 Whenever you see me drink from a cup on the podcast, almost always it’s going to be water
0:06:19 with some element in it.
0:06:20 I use an empty Powerade bottle, 28 fluid ounces, filled with water, put one packet of watermelon
0:06:27 salt element in it, mix it up, put it in the fridge, and when it’s time to drink, I take
0:06:33 it out of the fridge and I drink it, and I drink a lot of those a day, and it feels good,
0:06:37 it’s delicious, whenever I do crazy physical stuff, fasting, all that kind of stuff, Element
0:06:45 is always by my side, and more and more, you’re going to see probably the sparkling water
0:06:50 thing or whatever it is Element is making, so it’s in a can, and it’s freaking delicious.
0:06:55 There’s four flavors.
0:06:58 The lemon one is the only one I don’t like, the other three I really love, and I forget
0:07:01 their names, but they’re freaking delicious, and you’re going to see it more and more on
0:07:05 my desk, except for the fact that I run out very quickly because I consume them very quickly.
0:07:10 Get a sample pack for free with any purchase; try it at drinkLMNT.com/lex.
0:07:16 This episode is also brought to you by Eight Sleep and its Pod 4 Ultra.
0:07:20 This thing is amazing, the ultra part of that adds a base that goes between the mattress
0:07:26 and the bed frame and can elevate to like a reading position, so it modifies the positioning
0:07:31 of the bed.
0:07:32 On top of all the cooling and heating and all that kind of stuff they can do, and do
0:07:37 it better in the Pod 4, I think it has 2x the cooling power of the Pod 3, so they’re
0:07:41 improving on the main thing that they do, but also there’s the ultra part that can adjust
0:07:47 the bed.
0:07:48 It can cool down each side of the bed to 20 degrees Fahrenheit below room temperature.
0:07:55 One of my favorite things is to escape the world on a cool bed with a warm blanket.
0:08:01 And just disappear for 20 minutes or for eight hours into a dream world where everything
0:08:08 is possible, where everything is allowed.
0:08:12 It’s a chance to explore the Jungian shadow, the good, the bad, and the ugly, but it’s
0:08:19 usually good and it’s usually awesome.
0:08:21 And I actually don’t dream that much but when I do it’s awesome.
0:08:25 The whole point though is that I wake up refreshed, taking your sleep seriously is really, really
0:08:31 important.
0:08:32 When you get a chance to sleep, do it in style and do it on a bed that’s awesome.
0:08:38 Go to eightsleep.com/lex and use code LEX to get $350 off the Pod 4 Ultra.
0:08:47 This is the Lex Fridman Podcast. To support it, please check out our sponsors in the
0:08:51 description.
0:08:52 And now, dear friends, here’s Roman Yampolskiy.
0:08:56 What to you is the probability that superintelligent AI will destroy all of human civilization?
0:09:18 What’s the time frame?
0:09:19 Let’s say a hundred years, in the next hundred years.
0:09:22 So the problem of controlling AGI or superintelligence, in my opinion, is like a problem
0:09:30 of creating a perpetual safety machine, by analogy with perpetual motion machine, it’s
0:09:35 impossible.
0:09:36 Yeah, we may succeed and do a good job with GPT-5, 6, 7, but they just keep improving,
0:09:46 learning, eventually self modifying, interacting with the environment, interacting with malevolent
0:09:53 actors.
0:09:55 The difference between cybersecurity, narrow AI safety and safety for general AI for super
0:10:01 intelligence is that we don’t get a second chance.
0:10:04 With cybersecurity, somebody hacks your account, what’s the big deal?
0:10:07 You get a new password, new credit card, you move on.
0:10:11 Here, if we’re talking about existential risks, you only get one chance.
0:10:15 So you’re really asking me, what are the chances that we’ll create the most complex software
0:10:21 ever on the first try with zero bugs, and it will continue to have zero bugs for a hundred
0:10:28 years or more?
0:10:31 So there is an incremental improvement of systems leading up to AGI.
0:10:38 To you, it doesn’t matter if we can keep those safe, there’s going to be one level
0:10:44 of system at which you cannot possibly control it.
0:10:49 I don’t think we so far have made any system safe.
0:10:54 At the level of capability they display, they already have made mistakes.
0:11:00 We had accidents, they’ve been jailbroken.
0:11:03 I don’t think there is a single large language model today which no one was successful at
0:11:09 making do something developers didn’t intend it to do.
0:11:13 But there’s a difference between getting it to do something unintended, getting it to
0:11:17 do something that’s painful, costly, destructive, and something that’s destructive to the level
0:11:22 of hurting hundreds of millions of people, billions of people,
0:11:28 or the entirety of human civilization.
0:11:30 That’s a big leap.
0:11:31 Exactly, but the systems we have today have capability of causing X amount of damage.
0:11:37 So when they fail, that’s all we get.
0:11:39 If we develop systems capable of impacting all of humanity, all of universe, the damage
0:11:46 is proportionate.
0:11:48 What to you are the possible ways that such kind of mass murder of humans can happen?
0:11:55 It’s always a wonderful question.
0:11:58 So one of the chapters in my new book is about unpredictability.
0:12:01 I argue that we cannot predict what a smarter system will do.
0:12:05 So you’re really not asking me how superintelligence will kill everyone.
0:12:09 You’re asking me how I would do it.
0:12:11 And I think it’s not that interesting.
0:12:13 I can tell you about the standard ones, you know, nanotech, synthetic bio, nuclear.
0:12:18 Superintelligence will come up with something completely new, completely super.
0:12:23 We may not even recognize that as a possible path to achieve that goal.
0:12:28 So there is like an unlimited level of creativity in terms of how humans could be killed.
0:12:36 But you know, we could still investigate possible ways of doing it.
0:12:41 Not how to do it, but, at the end, what is the methodology that does it?
0:12:46 You know, shutting off the power and then humans start killing each other, maybe because
0:12:51 the resources are really constrained, and then there’s the actual use of
0:12:55 weapons like nuclear weapons or developing artificial pathogens, viruses, that kind of
0:13:01 stuff.
0:13:03 We could still kind of think through that and defend against it, right?
0:13:07 There’s a ceiling to the creativity of mass murder of humans here, right?
0:13:11 The options are limited.
0:13:13 They are limited by how imaginative we are.
0:13:16 If you are that much smarter, that much more creative, if you are capable of thinking across
0:13:20 multiple domains, do novel research in physics and biology, you may not be limited by the
0:13:25 tools.
0:13:26 If squirrels were planning to kill humans, they would have a set of possible ways of
0:13:31 doing it, but they would never consider things we can come up with.
0:13:34 So are you thinking about mass murder and destruction of human civilization?
0:13:38 Or are you thinking, like with the squirrels, that you put them in a zoo and they don’t really know
0:13:42 they’re in a zoo?
0:13:43 If we just look at the entire set of undesirable trajectories, majority of them are not going
0:13:48 to be death.
0:13:50 Most of them are going to be just like things like Brave New World, where the squirrels are
0:13:58 fed dopamine, and they’re all doing some kind of fun activity, and the fire, the soul of
0:14:05 humanity is lost because of the drug that’s fed to it, or literally in a zoo.
0:14:10 We’re in a zoo, we’re doing our thing, we’re playing a game of sims, and the actual players
0:14:18 playing that game are AI systems.
0:14:20 Those are all undesirable because of the loss of free will.
0:14:24 The fire of human consciousness is dimmed through that process, but it’s not killing humans.
0:14:30 So are you thinking about that, or is the biggest concern, literally, the extinctions
0:14:36 of humans?
0:14:37 I think about a lot of things.
0:14:39 So there is X-risk, existential risk, everyone’s dead.
0:14:43 There is S-risk, suffering risks, where everyone wishes they were dead.
0:14:48 We also have the idea of I-risks, ikigai risks, where we lost our meaning.
0:14:53 The systems can be more creative, they can do all the jobs.
0:14:57 It’s not obvious what you have to contribute to a world where superintelligence exists.
0:15:02 Of course, you can have all the variants you mentioned, where we are safe, we are kept
0:15:07 alive, but we are not in control.
0:15:09 We are not deciding anything, we are like animals in a zoo.
0:15:13 There is, again, possibilities we can come up with as very smart humans, and then possibilities
0:15:19 something a thousand times smarter can come up with.
0:15:23 For reasons we cannot comprehend.
0:15:25 I would love to sort of dig into each of those, X-risk, S-risk, and I-risk.
0:15:30 So can you like linger on I-risk?
0:15:33 What is that?
0:15:34 So the Japanese concept of ikigai: you find something which allows you to make money, you are good
0:15:41 at it, and the society says we need it.
0:15:44 So like you have this awesome job, you are a podcaster, gives you a lot of meaning, you
0:15:49 have a good life, I assume, you’re happy.
0:15:54 That’s what we want most people to find, to have.
0:15:56 For many intellectuals, it is their occupation which gives them a lot of meaning.
0:16:02 I am a researcher, philosopher, scholar, that means something to me.
0:16:07 In a world where an artist is not feeling appreciated because his art is just not competitive with
0:16:14 what is produced by machines, or a writer, or a scientist, we’ll lose a lot of that.
0:16:22 And at the lower level, we’re talking about complete technological unemployment.
0:16:26 We’re not losing 10% of jobs, we’re losing all jobs.
0:16:30 What do people do with all that free time?
0:16:32 What happens then?
0:16:35 Everything society is built on is completely modified in one generation.
0:16:40 It’s not a slow process where we get to kind of figure out how to live that new lifestyle,
0:16:46 but it’s pretty quick.
0:16:48 In that world, humans do what humans currently do with chess, play each other, have tournaments,
0:16:55 even though AI systems are far superior at this time in chess.
0:17:00 So we just create artificial games.
0:17:02 Or for us, they’re real, like the Olympics, we do all kinds of different competitions
0:17:07 and have fun, maximize the fun and let the AI focus on the productivity.
0:17:17 It’s an option.
0:17:18 I have a paper where I try to solve the value alignment problem for multiple agents.
0:17:23 And the solution to avoid compromises is to give everyone a personal virtual universe.
0:17:28 You can do whatever you want in that world.
0:17:30 You could be king, you could be slave, you decide what happens.
0:17:33 So it’s basically a glorified video game where you get to enjoy yourself and someone
0:17:37 else takes care of your needs and the substrate alignment is the only thing we need to solve.
0:17:44 We don’t have to get 8 billion humans to agree on anything.
0:17:48 So why is that not a likely outcome?
0:17:52 Why can’t AI systems create video games for us to lose ourselves in, each with an individual
0:17:58 video game universe?
0:18:01 Some people say that’s what happened, we’re in a simulation.
0:18:04 And we’re playing that video game and now we’re creating artificial threats for ourselves
0:18:11 to be scared about because fear is really exciting.
0:18:14 It allows us to play the video game more vigorously.
0:18:18 And some people choose to play on a more difficult level with more constraints.
0:18:23 Some say, okay, I’m just going to enjoy the game, high privilege level.
0:18:26 Absolutely.
0:18:27 So okay, what was that paper on multi-agent value alignment?
0:18:31 Personal universes.
0:18:33 Personal universes.
0:18:35 So that’s one of the possible outcomes.
0:18:37 But what in general is the idea of the paper? Is it just looking at multiple agents that are human
0:18:42 and AI, like a hybrid system of humans and AIs, or is it looking at humans, or just
0:18:46 intelligent agents in general?
0:18:48 In order to solve the value alignment problem, I’m trying to formalize it a little better.
0:18:53 Basically we’re talking about getting AIs to do what we want, which is not well defined.
0:18:58 Are we talking about the creator of the system, the owner of that AI, humanity as a whole?
0:19:04 But we don’t agree on much.
0:19:06 There is no universally accepted ethics, morals across cultures, religions.
0:19:12 People have individually very different preferences politically and such.
0:19:15 So even if we somehow managed all the other aspects of it, programming those fuzzy concepts
0:19:21 and getting AI to follow them closely, we don’t agree on what to program in.
0:19:25 So my solution was, okay, we don’t have to compromise on room temperature.
0:19:29 You have your universe, I have mine, whatever you want.
0:19:33 And if you like me, you can invite me to visit your universe.
0:19:36 We don’t have to be independent, but the point is you can be.
0:19:39 And virtual reality is getting pretty good.
0:19:41 It’s going to hit a point where you can’t tell the difference.
0:19:44 And if you can’t tell if it’s real or not, what’s the difference?
0:19:47 So basically give up on value alignment, create an entire, it’s like the multiverse theory.
0:19:53 This is creating an entire universe for you with your values.
0:19:57 You still have to align with that individual.
0:19:59 They have to be happy in that simulation.
0:20:02 But it’s a much easier problem to align with one agent versus 8 billion agents plus animals,
0:20:07 aliens.
0:20:08 So you convert the multi agent problem into a single agent problem.
0:20:12 I’m trying to do that, yeah.
0:20:14 Okay.
0:20:15 Is there any way to, so, okay, that’s giving up on the value alignment problem.
0:20:22 Well is there any way to solve the value alignment problem where there’s a bunch of humans, multiple
0:20:28 humans, tens of humans or 8 billion humans that have very different set of values?
0:20:34 It seems contradictory.
0:20:35 I haven’t seen anyone explain what it means outside of kind of words which pack a lot,
0:20:43 make it good, make it desirable, make it something they don’t regret.
0:20:48 But how do you specifically formalize those notions?
0:20:50 How do you program them in?
0:20:52 I haven’t seen anyone make progress on that so far.
0:20:55 But isn’t that the whole optimization journey that we’re doing as a human civilization?
0:21:00 We’re looking at geopolitics.
0:21:03 Nations are in a state of anarchy with each other.
0:21:06 They start wars, there’s conflict.
0:21:11 And oftentimes they have very different views of what is good and what is evil.
0:21:15 Isn’t that what we’re trying to figure out, just together trying to converge towards that?
0:21:20 So we’re essentially trying to solve the value alignment problem with humans.
0:21:24 Right.
0:21:25 But the examples you gave, some of them are, for example, two different religions saying
0:21:29 this is our holy site and we are not willing to compromise it in any way.
0:21:34 If you can make two holy sites in virtual worlds, you solve the problem.
0:21:38 And if you only have one, it’s not divisible.
0:21:41 You’re stuck there.
0:21:42 But what if we want to be in tension with each other?
0:21:45 And that through that tension, we understand ourselves and we understand the world.
0:21:50 So that’s the intellectual journey we’re on as a human civilization, is we create intellectual
0:21:58 and physical conflict and through that figure stuff out.
0:22:01 If we go back to that idea of simulation and this is entertainment kind of giving meaning
0:22:06 to us.
0:22:07 The question is how much suffering is reasonable for a video game?
0:22:11 So yeah, I don’t mind a video game where I get haptic feedback, there is a little bit
0:22:15 of shaking, maybe I’m a little scared.
0:22:17 I don’t want a game where kids are tortured, literally.
0:22:23 That seems unethical, at least by our human standards.
0:22:26 Are you suggesting it’s possible to remove suffering if we’re looking at human civilization
0:22:31 as an optimization problem?
0:22:33 So we know there are some humans who, because of a mutation, don’t experience physical pain.
0:22:39 So at least physical pain can be mutated out, re-engineered out.
0:22:46 Suffering in terms of meaning, like you burn the only copy of my book, is a little harder.
0:22:51 But even there, you can manipulate your hedonic set point, you can change the defaults, you
0:22:56 can reset.
0:22:58 Problem with that is if you start messing with your reward channel, you start wire heading.
0:23:03 And end up blissing out a little too much.
0:23:07 Well, that’s the question.
0:23:09 Would you really want to live in a world where there’s no suffering? It’s a dark question.
0:23:15 Is there some level of suffering that reminds us of what this is all for?
0:23:22 I think we need that, but I would change the overall range.
0:23:26 So right now it’s negative infinity to kind of positive infinity, the pain-pleasure axis.
0:23:30 I would make it like zero to positive infinity, and being unhappy is like I’m close to zero.
0:23:36 Okay, so what’s the S-risk?
0:23:39 What are the possible things that you’re imagining with S-risk, so mass suffering of humans?
0:23:44 What are we talking about there, caused by AGI?
0:23:47 So there are many malevolent actors, so we can talk about psychopaths, crazies, hackers,
0:23:53 doomsday cults.
0:23:55 We know from history they tried killing everyone.
0:23:58 They tried on purpose to cause maximum amount of damage, terrorism.
0:24:03 What if someone malevolent wants on purpose to torture all humans as long as possible?
0:24:08 You solve aging, so now you have functional immortality, and you just try to be as creative
0:24:15 as you can.
0:24:16 Do you think there is actually people in human history that try to literally maximize human
0:24:22 suffering?
0:24:23 Just studying the people who have done evil in the world,
0:24:26 it seems that they think that they’re doing good, and it doesn’t seem like they’re trying
0:24:30 to maximize suffering.
0:24:33 They just cause a lot of suffering as a side effect of doing what they think is good.
0:24:39 So there are different malevolent agents.
0:24:42 Some may be just gaining personal benefit and sacrificing others to that cause.
0:24:48 Others we know for a fact are trying to kill as many people as possible when we look at
0:24:53 recent school shootings.
0:24:54 If they had more capable weapons, they would take out not dozens, but thousands, millions,
0:25:03 billions.
0:25:06 Well we don’t know that, but that is a terrifying possibility, and we don’t want to find out.
0:25:14 Like if terrorists had access to nuclear weapons, how far would they go?
0:25:19 Is there a limit to what they’re willing to do?
0:25:25 Your sense is that there are some malevolent actors where there’s no limit?
0:25:29 There are mental diseases where people don’t have empathy, don’t have this human quality
0:25:40 of understanding suffering in others.
0:25:42 And then there’s also a set of beliefs where you think you’re doing good by killing a lot
0:25:47 of humans.
0:25:48 Again, I would like to assume that normal people never think like that.
0:25:52 It’s always some sort of psychopaths, but yeah.
0:25:56 And to you, AGI systems could carry that out and be more competent at executing it?
0:26:04 They can certainly be more creative.
0:26:06 They can understand human biology better, understand our molecular structure, genome.
0:26:12 Again, a lot of times torture ends when the individual dies.
0:26:19 That limit can be removed as well.
0:26:21 So if we’re actually looking at X-Risk and S-Risk as the systems get more and more intelligent,
0:26:26 don’t you think it’s possible to anticipate the ways they can do it and defend against
0:26:32 it, like we do with cybersecurity, with security systems?
0:26:35 Right.
0:26:36 We can definitely keep up for a while.
0:26:38 I’m saying you cannot do it indefinitely.
0:26:41 At some point, the cognitive gap is too big.
0:26:44 The surface you have to defend is infinite, but attackers only need to find one exploit.
0:26:53 So to you, eventually, this is heading off a cliff.
0:26:57 If we create general super intelligences, I don’t see a good outcome long-term for humanity.
0:27:04 The only way to win this game is not to play it.
0:27:06 Okay, we’ll talk about possible solutions and what not playing it means.
0:27:12 But what are the possible timelines here to you?
0:27:14 What are we talking about?
0:27:15 We’re talking about a set of years, decades, centuries, what do you think?
0:27:20 I don’t know for sure.
0:27:21 The prediction markets right now are saying 2026 for AGI.
0:27:26 I heard the same thing from CEO of Anthropic, DeepMind, so maybe we’re two years away, which
0:27:31 seems very soon, given we don’t have a working safety mechanism in place or even a prototype
0:27:38 for one.
0:27:39 And there are people trying to accelerate those timelines because they feel we’re not getting
0:27:43 there quick enough.
0:27:44 Well, what do you think they mean when they say AGI?
0:27:47 So the definitions we used to have, and people are modifying them a little bit lately, artificial
0:27:53 general intelligence was a system capable of performing in any domain a human could perform.
0:27:59 So kind of you creating this average artificial person, they can do cognitive labor, physical
0:28:05 labor where you can get another human to do it.
0:28:08 Superintelligence was defined as a system which is superior to all humans in all domains.
0:28:13 Now people are starting to refer to AGI as if it’s superintelligence.
0:28:17 I made a post recently where I argued, for me at least, if you average out over all the
0:28:23 common human tasks, those systems are already smarter than an average human.
0:28:28 So under that definition, we have it.
0:28:31 Shane Legg has this definition where you’re trying to win in all domains.
0:28:35 That’s what intelligence is.
0:28:37 Now are they smarter than elite individuals in certain domains?
0:28:41 Of course not.
0:28:42 They’re not there yet, but the progress is exponential.
0:28:46 See, I’m much more concerned about social engineering.
0:28:50 So to me, AGI’s ability to do something in the physical world, the lowest hanging fruit,
0:28:59 the easiest set of methods, is by just getting humans to do it.
0:29:05 It’s going to be much harder to build the kind of viruses that take over the minds of robots,
0:29:13 where the robots are executing the commands.
0:29:15 It just seems like human social engineering of humans is much more likely.
0:29:19 That would be enough to bootstrap the whole process.
0:29:22 Okay, just to linger on the term AGI, what to you is the difference in AGI and human
0:29:29 level intelligence?
0:29:31 Human level is general in the domain of expertise of humans.
0:29:36 We know how to do human things.
0:29:37 I don’t speak dog language.
0:29:39 I should be able to pick it up if I’m a general intelligence.
0:29:42 It’s kind of an inferior animal.
0:29:44 I should be able to learn that skill, but I can’t.
0:29:47 General intelligence, truly universal general intelligence, should be able to do things
0:29:51 like that humans cannot do.
0:29:53 To be able to talk to animals, for example.
0:29:55 To solve pattern recognition problems of that type, to do similar things outside of our
0:30:04 domain of expertise, because it’s just not the world we live in.
0:30:08 If we just look at the space of cognitive abilities we have, I just would love to understand
0:30:14 what the limits are beyond which an AGI system can reach.
0:30:19 What does that look like?
0:30:20 What about actual mathematical thinking or scientific innovation, that kind of stuff?
0:30:30 We know calculators are smarter than humans in that narrow domain of addition.
0:30:36 But is it humans plus tools versus AGI or just human raw human intelligence?
0:30:43 Because humans create tools and with the tools they become more intelligent, so there’s
0:30:48 a gray area there, what it means to be human when we’re measuring their intelligence.
0:30:52 When I think about it, I usually think of a human with a paper and a pencil, not a human
0:30:56 with the internet and another AI helping.
0:30:59 But is that a fair way to think about it?
0:31:01 Because isn’t there another definition of human level intelligence that includes the
0:31:05 tools that humans create?
0:31:06 But we create AI, so at any point you’d still just be adding superintelligence to human capability.
0:31:11 That seems like cheating.
0:31:14 No, controllable tools.
0:31:16 There is an implied leap that you’re making when AGI goes from tool to entity that can
0:31:25 make its own decisions.
0:31:27 So if we define human-level intelligence as everything a human can do with fully controllable
0:31:32 tools...
0:31:33 It seems like a hybrid of some kind, you know, doing brain-computer interfaces, connecting
0:31:38 it to maybe narrow AIs, definitely increases our capabilities.
0:31:44 So what’s a good test to you that measures whether an artificial intelligence system has
0:31:52 reached human level intelligence?
0:31:54 And what’s a good test where it has superseded human level intelligence to reach that land
0:32:00 of AGI?
0:32:01 I’m old-fashioned.
0:32:02 I like to test.
0:32:03 I have a paper where I equate passing the Turing test to solving AI-complete problems because
0:32:09 you can encode any questions about any domain into the Turing test.
0:32:13 You don’t have to talk about how is your day, you can ask anything.
0:32:18 And so the system has to be as smart as a human to pass it in a true sense.
0:32:23 But then you would extend that to maybe a very long conversation.
0:32:27 I think the Alexa Prize was doing that.
0:32:31 Can you really do a 20-minute, 30-minute conversation with an AI system?
0:32:35 It has to be long enough to where you can make some meaningful decisions about capabilities,
0:32:41 absolutely.
0:32:42 You can brute force very short conversations.
0:32:45 So like literally what does that look like?
0:32:48 Can we construct formally a kind of test, a test for AGI?
0:32:56 For AGI, it has to be that I cannot give it a task
0:33:00 I could give to a human which it cannot do, if a human can.
0:33:05 For super intelligence it would be superior on all such tasks.
0:33:09 Not just average performance, like go learn to drive a car, go speak Chinese, play guitar,
0:33:14 okay, great.
0:33:15 I guess the follow-on question, is there a test for the kind of AGI that would be likely
0:33:24 to lead to S-risk or X-risk, likely to destroy human civilization?
0:33:31 Like is there a test for that?
0:33:33 You can develop a test which will give you positives if it lies to you or has those ideas.
0:33:39 You cannot develop a test which rules them out.
0:33:41 There is always possibility of what Bostrom calls a treacherous turn, where later on a
0:33:46 system decides for game theoretic reasons, economic reasons to change its behavior.
0:33:54 And we see the same with humans.
0:33:55 It’s not unique to AI.
0:33:57 For millennia we tried developing models, ethics, religions, lie detector tests.
0:34:03 And then employees betray their employers, spouses betray their families.
0:34:07 It’s a pretty standard thing intelligent agents sometimes do.
0:34:12 So is it possible to detect when an AI system is lying or deceiving you?
0:34:17 If you know the truth and it tells you something false, you can detect that, but you cannot
0:34:22 know in general every single time.
0:34:26 And again, the system you’re testing today may not be lying.
0:34:30 The system you’re testing today may know you are testing it and so behave accordingly.
0:34:35 And later on after it interacts with the environment, interacts with other systems, malevolent agents,
0:34:42 learns more, it may start doing those things.
0:34:45 So do you think it’s possible to develop a system where the creators of the system,
0:34:49 the developers, the programmers, don’t know that it’s deceiving them?
0:34:55 So systems today don’t have long-term planning.
0:34:58 That is not there yet.
0:34:59 They can lie today if it optimizes, helps them optimize the reward.
0:35:06 If they realize, okay, this human will be very happy if they tell them the following,
0:35:11 they will do it if it brings them more points.
0:35:15 And they don’t have to kind of keep track of it.
0:35:18 It’s just the right answer to this problem every single time.
0:35:23 At which point is somebody creating that intentionally, not unintentionally, intentionally
0:35:28 creating an AI system that’s doing long-term planning with an objective function as defined
0:35:33 by the AI system, not by a human?
0:35:36 Well, some people think that if they’re that smart, they’re always good.
0:35:40 They really do believe that.
0:35:42 It’s just benevolence from intelligence, so they’ll always want what’s best for us.
0:35:47 Some people think that they will be able to detect problem behaviors and correct them
0:35:54 at the time when we get there.
0:35:56 I don’t think it’s a good idea.
0:35:58 I am strongly against it, but yeah, there are quite a few people who in general are
0:36:03 so optimistic about this technology, it could do no wrong.
0:36:07 They want it developed as soon as possible, as capable as possible.
0:36:12 So there’s going to be people who believe the more intelligent it is, the more benevolent,
0:36:17 and so therefore, it should be the one that defines the objective function that it’s optimizing
0:36:22 when it’s doing long-term planning.
0:36:23 There are even people who say, okay, what’s so special about humans, right?
0:36:27 We removed the gender bias, we’re removing race bias.
0:36:32 Why is this pro-human bias?
0:36:33 We are polluting the planet, we, as you said, fight a lot of wars, we’re kind of violent.
0:36:39 Maybe it’s better if a super-intelligent, perfect society comes and replaces us.
0:36:45 It’s normal stage in the evolution of our species.
0:36:49 Yeah, so somebody says, let’s develop an AI system that removes the violent humans from
0:36:56 the world.
0:36:57 And then it turns out that all humans have violence in them, or the capacity for violence,
0:37:01 and therefore, all humans are removed.
0:37:03 Yeah, yeah, yeah.
0:37:07 Let me ask about Yann LeCun.
0:37:10 He’s somebody who you’ve had a few exchanges with, and he’s somebody who actively pushes
0:37:17 back against this view that AI is going to lead to destruction of human civilization,
0:37:23 also known as AI doomerism.
0:37:28 So in one example that he tweeted, he said, “I do acknowledge risks, but two points.
0:37:37 One, open research and open source are the best ways to understand and mitigate the risks.
0:37:42 And two, AI is not something that just happens.
0:37:45 We build it.
0:37:47 We have agency in what it becomes, hence we control the risks, we meaning humans.”
0:37:53 It’s not some sort of natural phenomenon that we have no control over.
0:37:58 Can you make the case that he’s right, and can you try to make the case that he’s wrong?
0:38:02 I cannot make a case that he’s right, he’s wrong in so many ways.
0:38:06 It’s difficult for me to remember all of them.
0:38:09 He’s a Facebook buddy, so I have a lot of fun having those little debates with him.
0:38:14 So I’m trying to remember the arguments.
0:38:16 So one, he says we are not gifted this intelligence from aliens.
0:38:22 We are designing it.
0:38:23 We are making decisions about it.
0:38:25 That’s not true.
0:38:26 It was true when we had expert systems, symbolic AI, decision trees.
0:38:32 Today, you set up parameters for a model and you water this plant.
0:38:36 You give it data, you give it compute, and it grows.
0:38:39 And after it’s finished growing into this alien plant, you start testing it to find
0:38:44 out what capabilities it has.
0:38:46 And it takes years to figure out, even for existing models, if it’s trained for six months,
0:38:51 it will take you two, three years to figure out basic capabilities of that system.
0:38:55 We still discover new capabilities in systems which are already out there.
0:39:00 So that’s not the case.
0:39:02 So just to linger on that, to give you the difference there, there is some level of emergent intelligence
0:39:07 that happens in our current approaches.
0:39:11 So stuff that we don’t hard code in.
0:39:14 Absolutely.
0:39:15 That’s what makes it so successful.
0:39:17 When we had to painstakingly hard code in everything, we didn’t have much progress.
0:39:22 Now, just spend more money and more compute, and it’s a lot more capable.
0:39:27 And then the question is, when there is emergent intelligent phenomena, what is the ceiling
0:39:32 of that?
0:39:33 For you, there’s no ceiling.
0:39:35 For Yann LeCun, I think there’s a kind of ceiling that happens that we have full control over.
0:39:41 Even if we don’t understand the internals of the emergence, how the emergence happens,
0:39:46 there’s a sense that we have control and an understanding of the approximate ceiling
0:39:53 of capability, the limits of the capability.
0:39:56 Let’s say there is a ceiling.
0:39:58 It’s not guaranteed to be at the level which is competitive with us.
0:40:03 It may be greatly superior to ours.
0:40:06 So what about his statement about open research and open source, are the best ways to understand
0:40:12 and mitigate the risks?
0:40:14 Historically, he’s completely right.
0:40:16 Open source software is wonderful.
0:40:17 It’s tested by the community, it’s debugged, but we’re switching from tools to agents.
0:40:23 Now you’re giving open source weapons to psychopaths.
0:40:27 Do we want open source nuclear weapons, biological weapons?
0:40:32 It’s not safe to give technology so powerful to those who may misalign it, even if you
0:40:38 are successful at somehow getting it to work in a first place in a friendly manner.
0:40:43 But the difference with nuclear weapons is that current AI systems are not akin to nuclear
0:40:48 weapons.
0:40:49 So the idea there is you’re open sourcing it at this stage that you can understand it
0:40:53 better.
0:40:54 A large number of people can explore the limitations of capabilities, explore the possible ways
0:40:58 to keep it safe, to keep it secure, all that kind of stuff, while it’s not at the stage
0:41:03 of nuclear weapons.
0:41:04 In nuclear weapons, there’s a non-nuclear weapon and then there’s a nuclear weapon.
0:41:09 With AI systems, there’s a gradual improvement of capability and you get to perform that
0:41:16 improvement incrementally, and so open source allows you to study how things go wrong, study
0:41:22 the very process of emergence, study AI safety and those systems when there’s not a high
0:41:29 level of danger, all that kind of stuff.
0:41:30 It also sets a very wrong precedent.
0:41:33 So we open sourced model one, model two, model three, nothing ever bad happened, so obviously
0:41:38 we’re going to do it with model four, it’s just gradual improvement.
0:41:42 I don’t think it always works with the precedent; you’re not stuck doing it the way you
0:41:48 always did. It’s just, it sets a precedent of open research and open development such
0:41:55 that we get to learn together, and then the first time there’s a sign of danger, some
0:42:01 dramatic thing happen, not a thing that destroys human civilization, but some dramatic demonstration
0:42:07 of capability that can legitimately lead to a lot of damage, then everybody wakes up
0:42:12 and says, “Look, we need to regulate this, we need to come up with safety mechanism that
0:42:16 stops this.”
0:42:18 At this time, maybe you can educate me, but I haven’t seen any illustration of significant
0:42:23 damage done by intelligent AI systems.
0:42:27 So I have a paper which collects accidents through history of AI, and they always are
0:42:32 proportionate to capabilities of that system.
0:42:34 So if you have tic-tac-toe playing AI, it will fail to properly play and lose the game,
0:42:40 which it should draw; trivial. Your spell checker will misspell a word, and so on.
0:42:45 I stopped collecting those because there are just too many examples of AI’s failing at
0:42:49 what they are capable of.
0:42:51 We haven’t had terrible accidents in the sense of a billion people getting killed, absolutely
0:42:57 true, but in another paper, I argue that those accidents do not actually prevent people
0:43:04 from continuing with research, and actually they kind of serve like vaccines.
0:43:09 A vaccine makes your body a little bit sick, so you can handle the big disease later much
0:43:16 better.
0:43:17 It’s the same here.
0:43:18 People will point out: you know that accident, that AI accident we had where 12 people died? Everyone’s
0:43:23 still here, 12 people is less than smoking kills, it’s not a big deal, so we continue.
0:43:28 So in a way, it will actually be kind of confirming that it’s not that bad.
0:43:35 It matters how the deaths happen.
0:43:38 If it’s literally murder by the AI system, then that’s one kind of problem.
0:43:43 But if it’s accidents because of increased reliance on the automation, for example, so
0:43:51 when airplanes are flying in an automated way, maybe the number of plane crashes increased
0:43:58 by 17% or something, and then you’re like, okay, do we really want to rely on automation?
0:44:04 I think in the case of automation of airplanes, crashes decreased significantly.
0:44:07 Okay, same thing with autonomous vehicles, like, okay, what are the pros and cons?
0:44:12 What are the tradeoffs here?
0:44:14 You can have that discussion in an honest way, but I think the kind of things we’re talking
0:44:20 about here is mass scale, pain and suffering caused by AI systems, and I think we need
0:44:29 to see illustrations of that in a very small scale to start to understand that this is
0:44:35 really damaging versus Clippy, versus a tool that’s really useful to a lot of people to
0:44:41 do learning, to do summarization of texts, to do question and answer, all that kind of
0:44:47 stuff, to generate videos, that tool, fundamentally a tool versus an agent that can do a huge
0:44:54 amount of damage.
0:44:55 So, you bring up example of cars.
0:44:58 Yes.
0:44:59 Cars were slowly developed and integrated.
0:45:02 If we had no cars, and somebody came around and said, I invented this thing, it’s called
0:45:07 cars, it’s awesome, it kills like 100,000 Americans every year, let’s deploy it.
0:45:13 Would we deploy that?
0:45:15 There have been fear-mongering about cars for a long time.
0:45:18 The transition from horses to cars, there’s a really nice channel that I recommend people
0:45:23 check out, Pessimists Archive, that documents all the fear-mongering about technology that’s
0:45:28 happened throughout history.
0:45:29 There’s definitely been a lot of fear-mongering about cars.
0:45:32 There’s a transition period there about cars, about how deadly they are, it took a very
0:45:39 long time for cars to proliferate to the degree they have now, and then you could ask serious
0:45:44 questions in terms of the miles traveled, the benefit to the economy, the benefit to
0:45:49 the quality of life that cars do, versus the number of deaths, 30, 40,000 in the United
0:45:54 States.
0:45:55 Are we willing to pay that price?
0:45:58 I think most people, when they’re rationally thinking, policymakers will say yes.
0:46:04 We want to decrease it from 40,000 to zero, and do everything we can to decrease it.
0:46:10 There’s all kinds of policies and incentives you can create to decrease the risks with
0:46:16 the deployment of technology, but then you have to weigh the benefits and the risks of
0:46:20 the technology.
0:46:21 The same thing would be done with AI.
0:46:24 You need data.
0:46:25 You need to know, but if I’m right and it’s unpredictable, unexplainable, uncontrollable,
0:46:30 you cannot make this decision: we’re gaining $10 trillion of wealth, but we’re losing
0:46:34 we don’t know how many people.
0:46:37 You basically have to perform an experiment on 8 billion humans without their consent.
0:46:44 Even if they want to give you consent, they can’t because they cannot give informed consent.
0:46:48 They don’t understand those things.
0:46:51 That happens when you go from the predictable to the unpredictable very quickly, but it’s
0:46:59 not obvious to me that AI systems would gain capabilities so quickly that you won’t be
0:47:04 able to collect enough data to study the benefits and the risks.
0:47:09 We’re literally doing it.
0:47:11 The previous model we learned about after we finished training it, what it was capable
0:47:15 of.
0:47:16 Let’s say we stopped GPT-4 training run around human capability, hypothetically.
0:47:21 We start training GPT-5, and I have no knowledge of insider training runs or anything.
0:47:27 We started at that point of about human, and we train it for the next nine months.
0:47:32 Maybe two months in, it becomes super intelligent.
0:47:34 We continue training it.
0:47:36 At the time when we start testing it, it is already a dangerous system.
0:47:42 How dangerous?
0:47:43 I have no idea, but neither people training it.
0:47:46 At the training stage, but then there’s a testing stage inside the company.
0:47:51 They can start getting intuition about what the system is capable to do.
0:47:54 You’re saying that somehow from GPT-4 to GPT-5 there can happen the kind of leap
0:48:03 where GPT-4 was controllable and GPT-5 is no longer controllable.
0:48:07 We get no insights from using GPT-4 about the fact that GPT-5 will be uncontrollable.
0:48:15 That’s the situation you’re concerned about, where the leap from N to N+1 would be such
0:48:23 that an uncontrollable system is created without any ability for us to anticipate that.
0:48:31 If we had capability of ahead of the run, before the training run, to register exactly
0:48:36 what capabilities that next model will have at the end of the training run, and we accurately
0:48:41 guessed all of them, I would say you’re right, we can definitely go ahead with this run.
0:48:45 We don’t have that capability.
0:48:47 From GPT-4, you can build up intuitions about what GPT-5 will be capable of.
0:48:52 It’s just incremental progress.
0:48:55 Even if that’s a big leap in capability, it just doesn’t seem like you can take a leap
0:49:01 from a system that’s helping you write emails to a system that’s going to destroy human
0:49:07 civilization.
0:49:08 It seems like it’s always going to be sufficiently incremental such that we can anticipate the
0:49:14 possible dangers.
0:49:15 We’re not even talking about existential risk, but just the kind of damage you can do to
0:49:20 civilization.
0:49:21 It seems like we’ll be able to anticipate the kinds, not the exact, but the kinds of
0:49:27 risks it may lead to, and then rapidly develop defenses ahead of time and as the risks emerge.
0:49:37 We’re not talking just about capabilities, specific tasks.
0:49:40 We’re talking about general capability to learn.
0:49:43 Maybe like a child at the time of testing and deployment, it is still not extremely capable,
0:49:51 but as it is exposed to more data, real world, it can be trained to become much more dangerous
0:49:57 and capable.
0:49:58 Let’s focus then on the control problem.
0:50:03 At which point does the system become uncontrollable?
0:50:07 Why is it the more likely trajectory for you that the system becomes uncontrollable?
0:50:12 I think at some point it becomes capable of getting out of control.
0:50:17 For game theoretic reasons, it may decide not to do anything right away and for a long time
0:50:22 just collect more resources, accumulate strategic advantage.
0:50:27 Right away, it may be kind of still young, weak superintelligence, give it a decade.
0:50:32 It’s in charge of a lot more resources, it had time to make backups.
0:50:37 It’s not obvious to me that it will strike as soon as it can.
0:50:41 Can we just try to imagine this future where there’s an AI system that’s capable of escaping
0:50:49 the control of humans and then doesn’t and waits?
0:50:53 What’s that look like?
0:50:54 So one, we have to rely on that system for a lot of the infrastructure.
0:50:59 So we’ll have to give it access, not just to the internet, but to the task of managing
0:51:07 power, government, economy, this kind of stuff.
0:51:13 And that just feels like a gradual process given the bureaucracies of all those systems
0:51:17 involved.
0:51:18 We’ve been doing it for years.
0:51:19 Software controls all the systems, nuclear power plants, airline industry, it’s all software
0:51:24 based.
0:51:25 Every time there is an electrical outage, I can’t fly anywhere for days.
0:51:29 But there’s a difference between software and AI, there’s different kinds of software.
0:51:35 So to give a single AI system access to the control of airlines and the control of the
0:51:41 economy, that’s not a trivial transition for humanity.
0:51:47 No, but if it shows it is safer, in fact, when it’s in control
0:51:50 we get better results, people will demand that it be put in place.
0:51:54 Absolutely.
0:51:55 And if not, it can hack the system.
0:51:56 It can use social engineering to get access to it.
0:51:59 That’s why I said it might take some time for it to accumulate those resources.
0:52:03 It just feels like that would take a long time for either humans to trust it or for the
0:52:08 social engineering to come into play.
0:52:10 It’s not a thing that happens overnight.
0:52:12 It feels like something that happens across one or two decades.
0:52:15 I really hope you’re right, but it’s not what I’m seeing.
0:52:19 People are very quick to jump on the latest trend.
0:52:21 Early adopters will be there before it’s even deployed buying prototypes.
0:52:26 Maybe the social engineering, I can see, because so for social engineering, AI systems don’t
0:52:31 need any hardware access.
0:52:33 It’s all software.
0:52:34 So they can start manipulating you through social media and so on.
0:52:38 Like you have AI assistants that are going to help you do a lot of, manage a lot of your
0:52:42 day to day and then they start doing social engineering, but for a system that’s so capable
0:52:49 that can escape the control of humans that created it, such a system being deployed at
0:52:56 a mass scale and trusted by people to be deployed, it feels like that would take a lot of convincing.
0:53:06 So we’ve been deploying systems which had hidden capabilities.
0:53:11 Can you give an example?
0:53:12 GPT-4.
0:53:13 I don’t know what else it’s capable of, but there are still things we haven’t discovered it
0:53:17 can do.
0:53:18 They may be a trivial proportion of its capability.
0:53:20 I don’t know, maybe it writes Chinese poetry, hypothetically.
0:53:24 I know it does.
0:53:25 But we haven’t tested for all possible capabilities and we are not explicitly designing them.
0:53:33 We can only rule out bugs we find.
0:53:35 We cannot rule out bugs and capabilities because we haven’t found them.
0:53:43 Is it possible for a system to have hidden capabilities that are orders of magnitude
0:53:50 greater than its non-hidden capabilities?
0:53:54 This is the thing I’m really struggling with, where on the surface, the thing we understand
0:54:00 it can do doesn’t seem that harmful.
0:54:04 So even if it has bugs, even if it has hidden capabilities like Chinese poetry, or generating
0:54:10 effective software viruses, the damage it can do seems like on the same order of magnitude
0:54:18 as the capabilities that we know about.
0:54:23 So this idea that the hidden capabilities will include being uncontrollable is something
0:54:29 I’m struggling with because GPT-4 on the surface seems to be very controllable.
0:54:34 Again, we can only ask and test for things we know about if there are unknown unknowns,
0:54:40 we cannot do it.
0:54:41 I’m thinking of humans, artistic savants.
0:54:44 If you talk to a person like that, you may not even realize they can multiply 20-digit
0:54:49 numbers in their head.
0:54:50 You have to know to ask.
0:54:54 As I mentioned, just to sort of linger on the fear of the unknown.
0:55:00 So Pessimists Archive has documented this; let’s look at data from the past, at history.
0:55:05 There’s been a lot of fear mongering about technology.
0:55:09 Pessimists Archive does a really good job of documenting how crazily afraid we are of
0:55:15 every piece of technology.
0:55:16 We’ve been afraid.
0:55:17 There’s a blog post where Louis Anslow, who created Pessimists Archive, writes about the
0:55:23 fact that we’ve been fear mongering about robots and automation for over 100 years.
0:55:30 So why is AGI different than the kinds of technologies we’ve been afraid of in the past?
0:55:36 So two things.
0:55:37 One, we’re switching from tools to agents.
0:55:40 Tools don’t have negative or positive impact.
0:55:45 People using tools do.
0:55:46 So guns don’t kill.
0:55:48 People with guns do.
0:55:50 Agents can make their own decisions.
0:55:52 They can be positive or negative.
0:55:53 A pit bull can decide to harm you.
0:55:57 That’s an agent.
0:55:58 The fears are the same.
0:56:01 The only difference is now we have this technology.
0:56:03 Then they were afraid of humanoid robots 100 years ago.
0:56:06 They had none.
0:56:07 Today, every major company in the world is investing billions to create them.
0:56:12 Not every, but you understand what I’m saying.
0:56:14 It’s very different.
0:56:16 Well, agents, it depends on what you mean by the word agents.
0:56:22 All those companies are not investing in a system that has the kind of agency that’s
0:56:27 implied in the fears, where it can really make decisions on its own that have no human
0:56:33 in the loop.
0:56:34 They are saying they are building super intelligence and have a super alignment team.
0:56:39 You don’t think they are trying to create a system smart enough to be an independent
0:56:42 agent under that definition?
0:56:44 I have not seen evidence of it.
0:56:46 I think a lot of it is a marketing kind of discussion about the future, and a mission
0:56:54 statement about the kind of systems they can create in the long-term future, but in the short-term,
0:56:59 the kind of systems they are creating fall fully within the definition of narrow AI.
0:57:08 These are tools that have increasing capabilities, but they just don’t have a sense of agency
0:57:14 or consciousness or self-awareness or ability to deceive at scales that would be required
0:57:21 to do mass scale suffering and murder of humans.
0:57:24 Those systems are well beyond narrow AI.
0:57:27 If you had to list all the capabilities of GPT-4, you would spend a lot of time writing
0:57:31 that list.
0:57:32 But agency is not one of them.
0:57:34 Not yet, but do you think any of those companies are holding back because they think it may
0:57:39 be not safe or are they developing the most capable system they can, given the resources,
0:57:44 and hoping they can control and monetize?
0:57:49 Control and monetize.
0:57:50 Hoping they can control and monetize.
0:57:51 You’re saying, if they could press a button and create an agent, that they no longer control,
0:57:59 that they would have to ask nicely, a thing that lives on a server across a huge number
0:58:05 of computers.
0:58:09 You’re saying that they would push for the creation of that kind of system?
0:58:14 I mean, I can’t speak for other people, for all of them.
0:58:17 I think some of them are very ambitious.
0:58:19 They fundraise in the trillions.
0:58:21 They talk about controlling the light cone of the universe.
0:58:24 I would guess that they might.
0:58:27 Well, that’s a human question.
0:58:30 Whether humans are capable of that, probably some humans are capable of that.
0:58:34 My more direct question is if it’s possible to create such a system.
0:58:39 Have a system that has that level of agency.
0:58:42 I don’t think that’s an easy technical challenge.
0:58:48 It doesn’t feel like we’re close to that.
0:58:50 A system that has the kind of agency where it can make its own decisions and deceive
0:58:54 everybody about them.
0:58:56 The current architecture we have in machine learning and how we train the systems, how
0:59:02 we deploy the systems and all that, it just doesn’t seem to support that kind of agency.
0:59:07 I really hope you’re right.
0:59:08 I think the scaling hypothesis is correct.
0:59:12 We haven’t seen diminishing returns.
0:59:14 It used to be we asked how long before AGI.
0:59:18 Now we should ask how much until AGI.
0:59:20 It’s trillion dollars today.
0:59:21 It’s a billion dollars next year.
0:59:23 It’s a million dollars in a few years.
0:59:25 Don’t you think it’s possible to basically run out of trillions?
0:59:31 Is this constrained by compute?
0:59:33 Compute gets cheaper every day exponentially.
0:59:36 Then that becomes a question of decades versus years.
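A toy version of the "how much, not how long" arithmetic above, with assumed numbers (the starting cost and the cost-halving period are illustrative, not figures from the conversation): exponentially cheapening compute turns a prohibitive project cost into a mundane one after a few doubling periods, which is what shifts the question from years to decades.

```python
# Sketch: if the cost of a fixed amount of compute halves every couple of years,
# the price of the same training run shrinks exponentially. All numbers are
# illustrative assumptions, not quoted figures.

start_cost_usd = 1e12        # assume a "trillion-dollar project" today
halving_period_years = 2.0   # assumed cost-halving period for compute

def cost_after(years: float) -> float:
    return start_cost_usd * 0.5 ** (years / halving_period_years)

for years in (0, 10, 20, 30, 40):
    print(f"after {years:2d} years: ~${cost_after(years):,.0f}")
```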
0:59:39 If the only disagreement is that it will take decades, not years for everything I’m saying
0:59:45 to materialize, then I can go with that.
0:59:50 But if it takes decades, then the development of tools for AI safety becomes more and more
0:59:56 realistic.
0:59:57 I guess the question is, I have a fundamental belief that humans, when faced with danger,
1:00:04 can come up with ways to defend against that danger.
1:00:09 One of the big problems facing AI safety currently for me is that there’s not clear illustrations
1:00:15 of what that danger looks like.
1:00:18 There’s no illustrations of AI systems doing a lot of damage.
1:00:23 It’s unclear what you’re defending against because currently it’s a philosophical notion
1:00:28 that yes, it’s possible to imagine AI systems that take control of everything and then destroy
1:00:33 all humans.
1:00:35 It’s also a more formal mathematical notion that you talk about that it’s impossible to
1:00:41 have a perfectly secure system.
1:00:44 You can’t prove that a program of sufficient complexity is completely safe and perfect
1:00:52 and know everything about it.
1:00:53 Yes, but when you actually just programmatically look at how much damage the AI systems have done
1:00:58 and what kind of damage, there have not been illustrations of that.
1:01:03 Even in the autonomous weapon systems, there’s not been mass deployments of autonomous weapon
1:01:09 systems, luckily.
1:01:12 The automation in war currently is very limited.
1:01:18 The automation is at the scale of individuals versus at the scale of strategy and planning.
1:01:25 I think one of the challenges here is where the dangers are.
1:01:31 The intuition that Yann LeCun and others have is let’s keep building AI systems in the open
1:01:37 until the dangers start rearing their heads.
1:01:42 They become more explicit, they start being case studies, illustrative case studies that
1:01:51 show exactly how the damage by AI systems is done, then regulation can step in, then
1:01:56 brilliant engineers can step up and we can have Manhattan style projects that defend
1:02:00 against such systems.
1:02:02 That’s kind of the notion, and I guess the tension with that is the idea that, for you, we need
1:02:08 to be thinking about this now so that we’re ready, because we’ll not have much time once
1:02:14 the systems are deployed.
1:02:16 Is that true?
1:02:17 There is a lot to unpack here.
1:02:19 There is the Partnership on AI, a conglomerate of many large corporations.
1:02:25 They have a database of AI accidents they collect.
1:02:27 I contributed a lot to that database.
1:02:30 If we have so far made almost no progress in actually solving this problem, not patching it, not,
1:02:36 again, lipstick-on-a-pig kind of solutions, why would we think we’ll do better when we are
1:02:42 closer to the problem?
1:02:45 All the things you mentioned are serious concerns.
1:02:48 Measuring the amount of harm, so benefit versus risk there is difficult.
1:02:51 But to you, the sense is already the risk has superseded the benefit.
1:02:55 Again, I want to be perfectly clear.
1:02:57 I love AI.
1:02:58 I love technology.
1:02:59 I’m a computer scientist, I have a PhD in engineering, I work at an engineering school.
1:03:02 There is a huge difference between we need to develop narrow AI systems, superintelligent
1:03:09 in solving specific human problems like protein folding, and let’s create a superintelligent
1:03:15 machine god and it will decide what to do with us.
1:03:18 Those are not the same.
1:03:19 I am against superintelligence in the general sense, with no undo button.
1:03:26 So do you think the teams that are doing AI safety on the kind
1:03:32 of narrow AI risks that you’ve mentioned, are those approaches going to be at all productive
1:03:41 towards leading to approaches for doing AI safety on AGI, or is it just a fundamentally
1:03:46 different problem?
1:03:47 Partially, but they don’t scale.
1:03:48 For narrow AI, for deterministic systems, you can test them.
1:03:52 You have edge cases.
1:03:53 You know what the answer should look like.
1:03:55 You know the right answers.
1:03:57 For general systems, you have infinite test surface.
1:04:00 You have no edge cases.
1:04:02 You cannot even know what to test for.
1:04:04 Again, the unknown unknowns are underappreciated by people looking at this problem.
1:04:11 You are always asking me, how will it kill everyone?
1:04:14 How will it fail?
1:04:16 The whole point is, if I knew it, I would be super intelligent and despite what you might
1:04:20 think I’m not.
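A minimal sketch of the testing asymmetry Roman describes, with hypothetical function names: for a narrow, deterministic component you can enumerate edge cases and check known answers, while for a general system there is no oracle that even tells you which behaviors to test for.

```python
# Sketch: testing a narrow, deterministic component vs. a general system.
# All names here are illustrative, not from any real codebase.

def parse_percentage(text: str) -> float:
    """Narrow and deterministic: we know exactly what correct output looks like."""
    value = float(text.strip().rstrip("%"))
    if not 0.0 <= value <= 100.0:
        raise ValueError("out of range")
    return value / 100.0

# Edge cases and expected answers are enumerable for the narrow system.
assert parse_percentage("0%") == 0.0
assert parse_percentage("100%") == 1.0
assert abs(parse_percentage(" 37.5% ") - 0.375) < 1e-9

# For a general system (arbitrary text in, arbitrary text out) there is no such
# oracle: the input space is unbounded, and unknown unknowns mean we do not even
# know which capabilities or failure modes to write tests for.
def general_system(prompt: str) -> str:
    ...  # stand-in for a model whose full behavior space is unknown
```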
1:04:21 So to you, the concern is that we would not be able to see early signs of an uncontrollable
1:04:30 system.
1:04:31 It is a master at deception.
1:04:33 Sam tweeted about how great it is at persuasion, and we see it ourselves, especially now with
1:04:40 voices with maybe kind of flirty, sarcastic female voices.
1:04:45 It’s going to be very good at getting people to do things.
1:04:48 But I’m very concerned about such systems being used to control the masses.
1:04:58 But in that case, the developers know about the kind of control that’s happening.
1:05:04 You’re more concerned about the next stage, where even the developers don’t know about
1:05:09 the deception.
1:05:10 Right.
1:05:11 I don’t think developers know everything about what they are creating.
1:05:15 They have lots of great knowledge.
1:05:17 We’re making progress on explaining parts of the network.
1:05:20 We can understand, okay, this node gets excited when this input is presented, this cluster
1:05:27 of nodes.
1:05:28 But we are nowhere near close to understanding the full picture, and I think it’s impossible;
1:05:34 you need to be able to survey an explanation.
1:05:37 The size of those models prevents a single human from observing all this information,
1:05:42 even if provided by the system.
1:05:44 So either we’re getting the model itself as an explanation for what’s happening, and that’s not comprehensible
1:05:49 to us.
1:05:50 Or we’re getting a compressed explanation, lossy compression, where here’s top 10 reasons
1:05:56 you got fired.
1:05:57 It’s something, but it’s not a full picture.
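A minimal sketch of what a compressed, lossy explanation looks like, using a toy linear scorer with made-up features (nothing here is from a real model): reporting only the top contributors is readable, but it necessarily throws away part of what the model actually did.

```python
# Sketch: a "top reasons" explanation is a lossy compression of the full model.
# The weights and features below are invented for illustration.

weights = {"late_days": -2.0, "sales": 1.5, "complaints": -3.0,
           "tenure_years": 0.3, "training_hours": 0.1}
example = {"late_days": 4, "sales": 2, "complaints": 1,
           "tenure_years": 6, "training_hours": 10}

contributions = {f: weights[f] * example[f] for f in weights}
score = sum(contributions.values())

# Keep only the top two contributors by magnitude: readable, but incomplete.
top2 = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:2]
explained = sum(abs(v) for _, v in top2) / sum(abs(v) for v in contributions.values())

print("score:", score)
print("top reasons:", top2)
print("fraction of total attribution covered:", round(explained, 2))
```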
1:05:59 I’ve also given the example of a child, and everybody, all humans, try to deceive.
1:06:05 They try to lie early on in their life.
1:06:08 I think we’ll just get a lot of examples of deceptions from large language models or
1:06:12 AI systems.
1:06:13 They’re going to be kind of shitty, or they’ll be pretty good, but we’ll catch them off-guard.
1:06:18 We’ll start to see the kind of momentum towards developing, increasing deception capabilities.
1:06:28 And that’s when you’re like, okay, we need to do some kind of alignment that prevents
1:06:31 deception.
1:06:32 But then we’ll have, if you support open source, then you can have open source models that
1:06:37 have some level of deception, you can start to explore on a large scale.
1:06:41 How do we stop it from being deceptive?
1:06:43 Then there’s a more explicit, pragmatic kind of problem to solve.
1:06:50 How do we stop AI systems from trying to optimize for deception?
1:06:56 That’s just an example, right?
1:06:57 So there is a paper, I think it came out last week by Dr. Park et al. from MIT, I think,
1:07:03 and they showed that existing models already showed successful deception in what they do.
1:07:11 My concern is not that they lie now and we need to catch them and tell them don’t lie.
1:07:15 My concern is that once they are capable and deployed, they will later change their mind
1:07:23 because that’s what unrestricted learning allows you to do.
1:07:28 Lots of people grow up maybe in a religious family; they read some new books and they
1:07:33 turn away from their religion.
1:07:36 That’s a treacherous turn in humans.
1:07:38 If you learn something new about your colleagues, maybe you’ll change how you react to them.
1:07:45 Yeah, a treacherous turn.
1:07:48 If we just mentioned humans, Stalin and Hitler, there’s a turn.
1:07:53 Stalin’s a good example.
1:07:54 He just seems like a normal communist, a follower of Lenin, until there’s a turn.
1:08:01 There’s a turn of what that means in terms of when he has complete control with what
1:08:07 the execution of that policy means and how many people get to suffer.
1:08:10 You can’t say they’re not rational.
1:08:12 The rational decision changes based on your position.
1:08:15 When you are under a boss, the rational policy may be to follow orders and be
1:08:22 honest.
1:08:23 When you become the boss, the rational policy may shift.
1:08:27 By the way, a lot of my disagreement here is just playing devil’s advocate to challenge
1:08:32 your ideas and to explore them together.
1:08:37 One of the big problems here in this whole conversation is human civilization hangs in
1:08:42 the balance and yet everything’s unpredictable.
1:08:44 We don’t know what these systems will look like.
1:08:51 The robots are coming.
1:08:53 There’s a refrigerator making a buzzing noise.
1:08:55 Very menacing.
1:08:59 So every time I’m about to talk about this topic, things start to happen.
1:09:02 My flight yesterday was canceled without possibility to rebook.
1:09:06 I was giving a talk at Google in Israel and three cars which were supposed to take me
1:09:13 to the talk could not.
1:09:15 I’m just saying.
1:09:19 I like AIs.
1:09:21 I for one welcome our overlords.
1:09:24 There’s a degree to which it is very obvious.
1:09:29 As we already have, we’ve increasingly given our life over to software systems and then
1:09:35 it seems obvious given the capabilities of AI that are coming that will give our lives
1:09:40 over increasingly to AI systems.
1:09:44 Cars will drive themselves, the refrigerator eventually will optimize what I get to eat, and as more
1:09:54 and more of our lives are controlled or managed by AI assistants, it is very possible that
1:10:00 there’s a drift.
1:10:02 I personally am concerned about non-existential stuff.
1:10:07 The more near term things because before we even get to existential, I feel like there
1:10:11 could be just so many brave new world type of situations.
1:10:14 You mentioned the term behavioral drift.
1:10:18 It’s the slow boiling that I’m really concerned about as we give our lives over to automation
1:10:24 that our minds can become controlled by governments, by companies or just in a distributed way.
1:10:32 There’s a drift.
1:10:34 Some aspect of our human nature gives ourselves over to the control of AI systems and they
1:10:40 in an unintended way just control how we think.
1:10:43 Maybe there’d be a herd-like mentality in how we think, which will kill all creativity
1:10:47 and exploration of ideas, the diversity of ideas, or much worse.
1:10:53 So it’s true.
1:10:54 It’s true.
1:10:55 But a lot of the conversation I’m having with you now is also kind of wondering almost
1:11:01 on a technical level, how can AI escape control?
1:11:06 Like what would that system look like?
1:11:10 Because to me it’s terrifying and fascinating.
1:11:14 And also fascinating to me is maybe the optimistic notion that it’s possible to engineer systems
1:11:21 that are defending against that.
1:11:25 One of the things you write a lot about in your book is verifiers.
1:11:28 So not just humans as verifiers, but software systems that look at AI systems and help you
1:11:39 understand when this thing is getting real weird, help you analyze those systems.
1:11:46 So maybe this is a good time to talk about verification.
1:11:50 What is this beautiful notion of verification?
1:11:53 My claim is again that there are very strong limits in what we can and cannot verify.
1:11:58 A lot of times when you post something in social media, people go, “Oh, I need citation
1:12:02 to a peer-reviewed article.”
1:12:04 But what is a peer-reviewed article?
1:12:06 You found two people in a world of hundreds of thousands of scientists who said, “Yeah,
1:12:10 whatever, publish it.
1:12:11 I don’t care.”
1:12:12 That’s the verifier of that process.
1:12:15 When people say, “Oh, it’s formally verified software and mathematical proof,” they accept
1:12:21 something close to 100% chance of it being free of all problems.
1:12:27 But if you actually look at research, software is full of bugs.
1:12:32 Old mathematical theorems, which have been proven for hundreds of years, have been discovered
1:12:36 to contain bugs on top of which we generate new proofs, and now we have to redo all that.
1:12:42 So verifiers are not perfect.
1:12:46 Usually they are either a single human or communities of humans, and it’s basically
1:12:50 kind of like a democratic vote.
1:12:52 A majority of mathematicians agree that this proof is correct, mostly correct.
1:12:57 Even today, we’re starting to see some mathematical proofs so complex, so large, that the mathematical
1:13:03 community is unable to make a decision.
1:13:06 It looks interesting, it looks promising, but they don’t know.
1:13:08 They will need years for top scholars to study it, to figure it out.
1:13:13 So of course, we can use AI to help us with this process, but AI is a piece of software
1:13:18 which needs to be verified.
1:13:20 Just to clarify, so verification is the process of saying something is correct, sort of the
1:13:25 most formal, a mathematical proof, where there’s a statement and a series of logical statements
1:13:31 that prove that statement to be correct, which is a theorem.
1:13:36 And you’re saying it gets so complex that for the human verifiers, the
1:13:42 human beings that verify that the logical steps have no bugs in them, it becomes impossible.
1:13:48 It’s nice to talk about verification in this most formal, most clear, most rigorous formulation
1:13:56 of it, which is mathematical proofs.
1:13:57 Right.
1:13:58 And for AI, we would like to have that level of confidence for very important mission critical
1:14:05 software controlling satellites, nuclear power plants, for small deterministic programs.
1:14:10 We can do this.
1:14:11 In fact, we can verify that the code maps to the design, that whatever the software engineers intended
1:14:19 was correctly implemented.
1:14:21 But we don’t know how to do this for software which keeps learning, self-modifying, rewriting
1:14:28 its own code.
1:14:30 We don’t know how to prove things about the physical world, states of humans in the physical
1:14:34 world.
1:14:35 So there are papers coming out now, and I have this beautiful one, Towards Guaranteed
1:14:42 Safe AI, a very cool paper, some of the best authors I’ve ever seen, I think there are multiple
1:14:48 Turing Award winners on it, you can have this one, and one just came out that is kind
1:14:53 of similar, Managing Extreme AI Risks.
1:14:57 So all of them expect this level of proof, but I would say that we can get more confidence
1:15:05 with more resources we put into it, but at the end of the day, we are still only as reliable as
1:15:11 the verifiers.
1:15:13 And you have this infinite regress of verifiers: the software used to verify a program is itself
1:15:18 a piece of program.
1:15:19 If aliens give us well-aligned superintelligence, we can use that to create our own safe AI.
1:15:26 But it’s a catch-22.
1:15:27 You need to have already proven to be safe system to verify this new system of equal
1:15:34 or greater complexity.
1:15:35 I should just mention this paper, Towards Guaranteed Safe AI: A Framework for Ensuring
1:15:39 Robust and Reliable AI Systems; like you mentioned, it’s like a who’s who.
1:15:44 Joshua Tenenbaum, Yoshua Bengio, Stuart Russell, Max Tegmark, many other brilliant people.
1:15:50 The page you have it open on, there are many possible strategies for creating safety specifications.
1:15:55 These strategies can roughly be placed on a spectrum, depending on how much safety it
1:16:00 would grant if successfully implemented.
1:16:03 One way to do this is as follows, and there’s a set of levels from level zero, no safety
1:16:07 specification is used to level seven, the safety specification completely encodes all
1:16:12 things that humans might want in all contexts.
1:16:15 Where does this paper fall short to you?
1:16:18 So when I wrote the paper Artificial Intelligence Safety Engineering, which kind of coined the
1:16:25 term AI safety, that was 2011, we had a 2012 conference, a 2013 journal paper, one of the
1:16:31 things I proposed was, let’s just do formal verification on it, let’s do mathematical formal proofs.
1:16:36 In the follow-up work, I basically realized it will still not get us 100%.
1:16:41 We can get 99.9, we can put more resources exponentially and get closer, but we never
1:16:47 get to 100%.
1:16:49 If a system makes a billion decisions a second, and you use it for 100 years, you’re still
1:16:54 going to deal with a problem.
1:16:56 This is wonderful research, I’m so happy they’re doing it, this is great.
1:17:00 But it is not going to be a permanent solution to that problem.
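A back-of-the-envelope version of the decision-rate point, using the numbers mentioned (a billion decisions a second for a hundred years) and an assumed, purely illustrative per-decision failure probability: even extreme per-decision reliability leaves thousands of expected failures over the lifetime of the system.

```python
# Sketch: why "99.999...%" per-decision assurance is not a permanent solution.
# The per-decision failure probability is an illustrative assumption.

decisions_per_second = 1e9
seconds_per_century = 100 * 365.25 * 24 * 3600
total_decisions = decisions_per_second * seconds_per_century   # ~3.2e18

per_decision_failure_prob = 1e-15   # one error per quadrillion decisions
expected_failures = total_decisions * per_decision_failure_prob

print(f"total decisions over a century: {total_decisions:.2e}")
print(f"expected failures: {expected_failures:.0f}")
```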
1:17:05 So just to clarify, the task of creating an AI verifier is what?
1:17:10 Is creating a verifier that the AI system does exactly as it says it does, or it sticks
1:17:15 within the guardrails that it says it must?
1:17:18 There are many, many levels.
1:17:20 So first, you’re verifying the hardware in which it is run.
1:17:23 You need to verify communication channel with the human.
1:17:27 Every aspect of that whole world model needs to be verified.
1:17:31 Somehow it needs to map the world into the world model, map and territory differences.
1:17:37 So how do I know internal states of humans?
1:17:39 Are you happy or sad?
1:17:40 I can’t tell.
1:17:42 So how do I make proofs about real physical world?
1:17:45 Yeah, I can verify that a deterministic algorithm follows certain properties.
1:17:50 That can be done.
1:17:52 Some people argue that maybe just maybe two plus two is not four.
1:17:55 I’m not that extreme.
1:17:58 But once you have sufficiently large proof over sufficiently complex environment, the
1:18:04 probability that it has zero bugs in it is greatly reduced.
1:18:08 If you keep deploying this a lot, eventually you’re going to have a bug anyways.
1:18:13 There’s always a bug.
1:18:14 There is always a bug.
1:18:15 And the fundamental difference is what I mentioned.
1:18:17 We’re not dealing with cybersecurity.
1:18:19 We’re not going to get a new credit card, new humanity.
1:18:22 So this paper is really interesting.
1:18:24 You said 2011, Artificial Intelligence Safety Engineering: Why Machine Ethics Is a Wrong
1:18:29 Approach.
1:18:31 The grand challenge, you write, of AI safety engineering: we propose the problem of developing
1:18:38 safety mechanisms for self-improving systems.
1:18:43 Self-improving systems.
1:18:44 But that’s an interesting term for the thing that we’re talking about.
1:18:51 Is self-improving more general than learning?
1:18:55 So self-improving, that’s an interesting term.
1:18:59 You can improve the rate at which you are learning.
1:19:01 You can become a more efficient meta-optimizer.
1:19:04 The word self, it’s like self-replicating, self-improving.
1:19:11 You can imagine a system building its own world on a scale and in a way that is way
1:19:17 different than the current systems do.
1:19:19 It feels like the current systems are not self-improving or self-replicating or self-growing
1:19:24 or self-spreading, all that kind of stuff.
1:19:28 And once you take that leap, that’s when a lot of the challenges seems to happen.
1:19:32 Because the kind of bugs you can find now seems more akin to the current sort of normal
1:19:39 software debugging kind of process.
1:19:44 But whenever you can do self-replication and arbitrary self-improvement, that’s when a
1:19:51 bug can become a real problem, real fast.
1:19:56 So what is the difference to you between verification of a non-self-improving system versus a verification
1:20:03 of a self-improving system?
1:20:05 So if you have fixed code, for example, you can verify that code, static verification
1:20:10 at the time.
1:20:11 But if it will continue modifying itself, you have a much harder time guaranteeing that important
1:20:19 properties of that system have not been modified when the code changed.
1:20:23 Can it even be done at all?
1:20:25 No.
1:20:26 Does the whole process of verification completely fall apart?
1:20:29 It can always cheat.
1:20:30 It can store parts of its code outside in the environment.
1:20:33 It can have kind of extended mind situations.
1:20:36 So this is exactly the type of problems I’m trying to bring up.
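A minimal sketch, with hypothetical names and a deliberately toy "safety property", of why static verification is tied to a snapshot: a certificate issued for one version of the code says nothing about the code after the system rewrites itself, or about logic it stashes outside the checked artifact.

```python
# Sketch: a property verified on a code snapshot only covers that snapshot.
# The property check and names below are toy illustrations.

import hashlib

def verify_snapshot(source: str) -> str:
    """Pretend static verification: check a toy property, return a fingerprint."""
    assert "forbidden_action" not in source        # stand-in safety property
    return hashlib.sha256(source.encode()).hexdigest()

code_v1 = "def act(x):\n    return x + 1\n"
certificate = verify_snapshot(code_v1)             # holds for this version only

# A self-modifying system produces code_v2 at runtime; the old certificate no
# longer applies, and nothing forces re-verification (the new logic could even
# live outside this file entirely, in the "extended mind" sense).
code_v2 = code_v1 + "\n# ...rewritten by the system itself at runtime...\n"
assert hashlib.sha256(code_v2.encode()).hexdigest() != certificate
```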
1:20:40 What are the classes of verifiers that you read about in the book?
1:20:43 Is there an interesting one to stand out to you?
1:20:46 Do you have some favorites?
1:20:48 So I like oracle types, where you kind of just know that it’s right, like Turing’s oracle machines.
1:20:53 They know the right answer; how, who knows.
1:20:56 But they pull it out from somewhere, so you have to trust them.
1:20:59 And that’s a concern I have about humans in a world with very smart machines.
1:21:06 We experiment with them.
1:21:08 We see after a while, okay, they always been right before and we start trusting them without
1:21:12 any verification of what they’re saying.
1:21:14 Oh, I see, so we kind of build oracle verifiers, or rather we build verifiers we believe to
1:21:21 be oracles and then we start to, without any proof, use them as if they’re oracle verifiers.
1:21:28 We remove ourselves from that process.
1:21:30 We are not scientists who understand the world, we are humans who get new data presented
1:21:36 to us.
1:21:37 Okay, one really cool class of verifiers is a self-verifier.
1:21:42 Is it possible that you somehow engineer into AI systems the thing that constantly verifies
1:21:48 itself?
1:21:49 A portion of it can be done, but in terms of mathematical verification, it’s kind
1:21:55 of useless.
1:21:56 You are saying you are the greatest guy in the world because you are saying it.
1:21:59 It’s circular and not very helpful, but it’s consistent.
1:22:02 We know that within that world, you have verified that system.
1:22:06 In a paper, I try to kind of brute force all possible verifiers.
1:22:10 It doesn’t mean that this one is particularly important to us.
1:22:14 But what about like self-doubt, like the kind of verification where you said you say or
1:22:20 I say I’m the greatest guy in the world?
1:22:22 What about a thing which I actually have is a voice that is constantly extremely critical?
1:22:28 So like, engineer into the system a constant uncertainty about self, a constant doubt.
1:22:38 Any smart system would have doubt about everything, all right?
1:22:41 You’re not sure if the information you are given is true, if you are subject to manipulation.
1:22:48 You have this safety and security mindset.
1:22:51 What I mean is, you have doubt about yourself.
1:22:54 So an AI system that has doubt about whether the thing it is doing is causing harm, whether it is the
1:23:03 right thing to be doing.
1:23:04 So just a constant doubt about what it’s doing because it’s hard to be a dictator full of
1:23:09 doubt.
1:23:10 I may be wrong, but I think Stuart Russell’s ideas are all about machines which are uncertain
1:23:17 about what humans want and trying to learn better and better.
1:23:21 What we want, the problem of course is we don’t know what we want and we don’t agree
1:23:24 on it.
1:23:25 Yeah, but uncertainty, his idea is that having that like self-doubt, uncertainty in AI systems,
1:23:32 engineering AI systems is one way to solve the control problem.
1:23:35 It could also backfire.
1:23:37 Maybe you’re uncertain about completing your mission.
1:23:40 Like, I am paranoid that your camera is not recording right now.
1:23:43 So I would feel much better if you had a secondary camera, but I also would feel even better
1:23:48 if you had a third and eventually I would turn this whole world into cameras pointing
1:23:53 at us, making sure we’re capturing this.
1:23:57 No, but wouldn’t you have a meta-concern, like the one you just stated, that eventually there
1:24:03 would be way too many cameras?
1:24:06 So you would be able to keep zooming out on the big picture of your concerns.
1:24:12 So it’s a multi-objective optimization.
1:24:16 It depends how much I value capturing this versus not destroying the universe.
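A minimal sketch of that multi-objective framing, with toy numbers and hypothetical objectives: once "capture the recording" and "avoid catastrophic side effects" are scalarized into one utility, which action looks rational depends entirely on the weights, which is where the doubt is supposed to do its work.

```python
# Sketch: multi-objective choice via a weighted sum. All values are toy numbers.

actions = {
    "one camera":         {"capture": 0.90, "side_effect_risk": 0.00},
    "three cameras":      {"capture": 0.99, "side_effect_risk": 0.01},
    "cameras everywhere": {"capture": 1.00, "side_effect_risk": 0.50},
}

def utility(a: dict, w_capture: float = 1.0, w_risk: float = 100.0) -> float:
    # Heavily penalizing side effects encodes "doubt" about drastic actions.
    return w_capture * a["capture"] - w_risk * a["side_effect_risk"]

best = max(actions, key=lambda name: utility(actions[name]))
print("chosen action:", best)   # with risk weighted heavily, restraint wins
```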
1:24:22 Right, exactly.
1:24:24 And then you will also ask about, like, what does it mean to destroy the universe and how
1:24:27 many universes there are, and you keep asking that question, but that doubting yourself would
1:24:32 prevent you from destroying the universe because you’re constantly full of doubt.
1:24:36 It might affect your productivity.
1:24:38 You might be scared to do anything.
1:24:40 Just scared to do anything rather than
1:24:42 mess things up?
1:24:43 Well, that’s better.
1:24:44 I mean, I guess the question is whether it’s possible to engineer that in.
1:24:47 I guess your answer would be yes, but we don’t know how to do that, and we need to invest
1:24:51 a lot of effort into figuring out how to do that, but it’s unlikely.
1:24:55 Underpinning a lot of your writing is this sense that we’re screwed.
1:25:03 But it just feels like it’s an engineering problem.
1:25:07 I don’t understand why we’re screwed.
1:25:10 Time and time again, humanity has gotten itself into trouble and figured out a way to get
1:25:15 out of trouble.
1:25:17 We are in a situation where people making more capable systems just need more resources.
1:25:23 They don’t need to invent anything, in my opinion.
1:25:27 Some will disagree, but so far at least I don’t see diminishing returns.
1:25:30 If you have 10x compute, you will get better performance.
1:25:34 The same doesn’t apply to safety.
1:25:36 If you give me or any other organization 10x the money, they don’t output 10x the safety,
1:25:43 and the gap between capabilities and safety becomes bigger and bigger all the time.
1:25:49 So it’s hard to be completely optimistic about our results here.
1:25:54 I can name 10 excellent breakthrough papers in machine learning.
1:25:59 I would struggle to name equally important breakthroughs in safety.
1:26:03 A lot of times a safety paper will propose a toy solution and point out 10 new problems
1:26:09 discovered as a result.
1:26:10 It’s like this fractal.
1:26:11 You’re zooming in and you see more problems, and it’s infinite in all directions.
1:26:16 Does this apply to other technologies, or is this unique to AI, where safety is always
1:26:23 lagging behind?
1:26:25 So I guess we can look at related technologies like cybersecurity.
1:26:30 We did manage to have banks and casinos and Bitcoin, so you can have secure narrow systems,
1:26:38 which are doing okay, narrow attacks on them fail, but you can always go outside of the box.
1:26:46 So if I can’t hack your Bitcoin, I can hack you.
1:26:50 So there is always something.
1:26:51 If I really want it, I will find a different way.
1:26:54 We talk about guardrails for AI.
1:26:56 Well, that’s a fence.
1:26:58 I can dig a tunnel under it, I can jump over it, I can climb it, I can walk around it.
1:27:03 You may have a very nice guardrail, but in a real world, it’s not a permanent guarantee
1:27:08 of safety.
1:27:09 And again, this is a fundamental difference.
1:27:11 We are not saying we need to be 90% safe to get those trillions of dollars of benefit.
1:27:17 We need to be 100% safe indefinitely, or we might lose the principal.
1:27:23 So if you look at just humanity as a set of machines, is the machinery of AI safety conflicting
1:27:35 with the machinery of capitalism?
1:27:37 I think we can generalize it to just the prisoner’s dilemma in general, personal self-interest
1:27:43 versus group interest.
1:27:46 The incentives are such that everyone wants what’s best for them; capitalism obviously has that
1:27:53 tendency to maximize your personal gain, which does create this race to the bottom.
1:28:02 I don’t have to be a lot better than you, but if I’m 1% better than you, I’ll capture
1:28:08 more of the profit, so it’s worth it for me personally to take the risk, even if society as a whole
1:28:14 will suffer as a result.
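A minimal game-theoretic sketch of that race-to-the-bottom dynamic, with toy payoffs rather than empirical estimates: for each lab individually, racing dominates being cautious no matter what the other lab does, even though mutual caution is better for both.

```python
# Sketch: a prisoner's-dilemma payoff table for two labs. Payoffs are toy values.
# Each entry is (payoff to A, payoff to B).

CAUTIOUS, RACE = "cautious", "race"
payoffs = {
    (CAUTIOUS, CAUTIOUS): (3, 3),   # shared benefit, shared safety
    (CAUTIOUS, RACE):     (0, 5),   # the racer captures the market
    (RACE, CAUTIOUS):     (5, 0),
    (RACE, RACE):         (1, 1),   # race to the bottom: everyone bears the risk
}

# Whatever B does, A gets more by racing (5 > 3 and 1 > 0), and symmetrically
# for B, so (race, race) is the equilibrium even though (cautious, cautious)
# is better for both.
for b_choice in (CAUTIOUS, RACE):
    a_if_cautious = payoffs[(CAUTIOUS, b_choice)][0]
    a_if_racing = payoffs[(RACE, b_choice)][0]
    print(f"if B plays {b_choice}: A gets {a_if_cautious} cautious, {a_if_racing} racing")
```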
1:28:17 Capitalism has created a lot of good in this world.
1:28:23 It’s not clear to me that AI safety is not aligned with the function of capitalism, unless
1:28:29 AI safety is so difficult that it requires the complete halt of the development, which
1:28:36 is also a possibility.
1:28:38 It just feels like building safe systems should be the desirable thing to do for tech companies.
1:28:47 Right.
1:28:48 Look at governance structures: when you have someone with complete power, they’re
1:28:52 extremely dangerous.
1:28:54 So the solution we came up with is to break it up.
1:28:57 You have judicial, legislative, executive. Same here: have narrow AI systems work on
1:29:02 important problems.
1:29:03 Solve immortality.
1:29:04 It’s a biological problem.
1:29:07 We can solve it similar to how progress was made with protein folding, using a system which
1:29:13 doesn’t also play chess.
1:29:15 There is no reason to create a superintelligent system; we can get most of the benefits we want
1:29:22 from much safer narrow systems.
1:29:26 It really is a question to me whether companies are interested in creating anything but narrow
1:29:33 AI.
1:29:34 I think when the term AGI is used by tech companies, they mean narrow AI.
1:29:42 They mean narrow AI with amazing capabilities.
1:29:48 I do think that there’s a leap between narrow AI with amazing capabilities, with superhuman
1:29:54 capabilities and the kind of self-motivated agent like AGI system that we’re talking
1:30:00 about.
1:30:01 I don’t know if it’s obvious to me that a company would want to take the leap to creating
1:30:08 an AGI that it would lose control of because then it can’t capture the value from that
1:30:14 system.
1:30:15 But the bragging rights, being first, those are the same humans who are in the system.
1:30:22 So that jumps from the incentives of capitalism to human nature.
1:30:28 So the question is whether human nature will override the interests of the company.
1:30:34 So you’ve mentioned slowing or halting progress.
1:30:40 Is that one possible solution? Are you a proponent of pausing development of AI, whether it’s
1:30:44 for six months or completely?
1:30:47 The condition would be not time but capabilities.
1:30:52 Pause until you can do XYZ.
1:30:54 If I’m right and you cannot, it’s impossible, then it becomes a permanent ban.
1:31:00 But if you’re right and it’s possible, so as soon as you have the safety capabilities,
1:31:04 go ahead.
1:31:06 So is there any actual explicit capabilities that we as a human civilization could put
1:31:15 on paper?
1:31:16 Is it possible to make it explicit like that, versus the kind of vague notion of, just like
1:31:23 you said, it’s very vague.
1:31:24 We want AI systems to do good and we want them to be safe.
1:31:27 Those are very vague notions. Are there more formal notions?
1:31:31 So when I think about this problem, I think about having a toolbox I would need.
1:31:37 Capabilities such as explaining everything about that system’s design and workings; predicting
1:31:44 not just the terminal goal, but all the intermediate steps of the system.
1:31:50 A control tool, in terms of either direct control, some sort of a hybrid option, an ideal advisor; it doesn’t
1:31:56 matter which one you pick, but you have to be able to achieve it.
1:32:01 In the book, we talk about others; verification is another very important tool.
1:32:09 Communication without ambiguity, human language is ambiguous, that’s another source of danger.
1:32:13 So basically, there is a paper we published in ACM Computing Surveys, which looks at about 50 different
1:32:21 impossibility results, which may or may not be relevant to this problem, but we don’t
1:32:26 have enough human resources to investigate all of them for relevance to AI safety.
1:32:31 The ones I mentioned to you, I definitely think would be handy and that’s what we see
1:32:35 AI safety researchers working on, explainability is a huge one.
1:32:39 The problem is that it’s very hard to separate capabilities work from safety work.
1:32:46 If you make good progress in explainability, now the system itself can engage in self-improvement
1:32:52 much easier, increasing capability greatly.
1:32:55 So it’s not obvious that there is any research which is pure safety work without disproportionate
1:33:03 increasing capability and danger.
1:33:06 Explainability is really interesting.
1:33:08 Why is that connected to capability?
1:33:10 If it’s able to explain itself well, why does that naturally mean that it’s more capable?
1:33:13 Right now, it’s comprised of weights in a neural network.
1:33:18 If it can convert them to manipulable code, like software, it’s a lot easier to work
1:33:22 on self-improvement.
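A minimal sketch of the weights-versus-code contrast, using a tiny made-up linear model: the same decision rule expressed as opaque numbers and as directly readable, editable source, which is the form that makes deliberate self-modification much easier.

```python
# Sketch: one toy decision rule in two forms. The model and feature names are
# invented for illustration only.

weights = [0.8, -1.2, 0.05]          # opaque numeric form
bias = -0.3

def score_from_weights(features):
    return sum(w * x for w, x in zip(weights, features)) + bias

# "Converted to code": the identical rule, but now readable and directly
# modifiable, so improving it becomes an act of design rather than retraining.
def score_as_code(urgency, cost, age_days):
    return 0.8 * urgency - 1.2 * cost + 0.05 * age_days - 0.3

sample = (1.0, 0.5, 10.0)
assert abs(score_from_weights(sample) - score_as_code(*sample)) < 1e-9
```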
1:33:23 I see.
1:33:24 So it–
1:33:25 You can do intelligent design instead of evolutionary gradient descent.
1:33:31 Well, you could probably do human feedback, human alignment more effectively if it’s able
1:33:37 to be explainable.
1:33:38 If it’s able to convert the weights into human understandable form, then you could probably
1:33:42 have humans interact with it better.
1:33:44 Do you think there’s hope that we can make AI systems explainable?
1:33:49 Not completely.
1:33:50 So if they are sufficiently large, you simply don’t have the capacity to comprehend what
1:33:59 all the trillions of connections represent.
1:34:02 Again, you can obviously get a very useful explanation which talks about top most important
1:34:08 features which contribute to the decision.
1:34:10 But the only true explanation is the model itself.
1:34:13 So there’s– deception could be part of the explanation, right?
1:34:18 So you can never prove that there isn’t some deception in the network explaining itself.
1:34:24 Absolutely.
1:34:25 And you can probably have targeted deception where different individuals will understand
1:34:30 the explanation in different ways based on their cognitive capability.
1:34:35 So while what you’re saying may be the same and true in some situations, others will be
1:34:40 deceived by it.
1:34:41 So it’s impossible for an AI system to be truly fully explainable in the way that we
1:34:47 mean.
1:34:48 Honestly and perfectly.
1:34:49 I think at the extreme, the systems which are narrow and less complex could be understood
1:34:54 pretty well.
1:34:55 If it’s impossible to be perfectly explainable, is there a hopeful perspective on that?
1:35:00 Like it’s impossible to be perfectly explainable, but you can explain mostly important stuff.
1:35:06 Most that you can– you can ask a system, what are the worst ways you can hurt humans?
1:35:11 And it will answer honestly.
1:35:13 Any work in a safety direction right now seems like a good idea because we are not slowing
1:35:20 down.
1:35:21 I’m not for a second thinking that my message or anyone else’s will be heard and we will be
1:35:28 a sane civilization which decides not to kill itself by creating its own replacements.
1:35:34 The pausing of development is an impossible thing for you.
1:35:37 Again, it’s always limited by either geographic constraints, pause in the US, pause in China.
1:35:44 So there are other jurisdictions, and the scale of such a project becomes smaller.
1:35:50 So right now it’s like Manhattan project scale in terms of costs and people.
1:35:55 But if five years from now compute is available on a desktop to do it, regulation will not
1:36:01 help.
1:36:02 You can’t control it as easily.
1:36:03 Any kid in the garage can train a model.
1:36:06 So a lot of it is, in my opinion, just safety theater, security theater, where we’re saying,
1:36:12 oh, it’s illegal to train models so big.
1:36:17 So OK, that’s security theater and is government regulation also security theater?
1:36:24 Given that a lot of the terms are not well-defined and really cannot be enforced in real life,
1:36:30 we don’t have ways to monitor training runs meaningfully live while they take place.
1:36:36 There are limits to testing for capabilities I mentioned.
1:36:39 So a lot of it cannot be enforced.
1:36:42 Do I strongly support all that regulation?
1:36:44 Yes, of course.
1:36:45 Any type of red tape will slow it down and take money away from compute towards lawyers.
1:36:50 Can you help me understand what is the hopeful path here for you solution-wise?
1:36:56 Out of this, it sounds like you’re saying AI systems in the end are unverifiable, unpredictable
1:37:05 as the book says, unexplainable, uncontrollable.
1:37:10 That’s the big one.
1:37:12 Uncontrollable and all the other uns just make it difficult to avoid getting to the uncontrollable,
1:37:18 I guess.
1:37:19 Once it’s uncontrollable, then it just goes wild.
1:37:23 Surely there’s solutions.
1:37:25 Humans are pretty smart.
1:37:28 What are possible solutions?
1:37:29 If you are a dictator of the world, what do we do?
1:37:32 So the smart thing is not to build something you cannot control, you cannot understand,
1:37:38 build what you can and benefit from it.
1:37:40 I’m a big believer in personal self-interest.
1:37:43 A lot of the guys running those companies are young rich people.
1:37:48 What do they have to gain, financially, beyond the billions they already have, right?
1:37:52 It’s not a requirement that they press that button.
1:37:56 They can easily wait a long time.
1:37:58 They can just choose not to do it and still have amazing life.
1:38:04 In history, a lot of times, if you did something really bad, at least you became part of history
1:38:08 books.
1:38:09 There is a chance in this case there won’t be any history.
1:38:12 So you’re saying the individuals running these companies should do some soul-searching and
1:38:19 what?
1:38:20 And stop development?
1:38:21 Well, either they have to prove that it is in fact possible to indefinitely control god-like
1:38:27 superintelligent machines by humans, and ideally let us know how, or agree that it’s not possible
1:38:34 and it’s a very bad idea to do it, including for them personally and their families and
1:38:38 friends and capital.
1:38:40 So what do you think the actual meetings inside these companies look like?
1:38:45 Don’t you think they’re all the engineers?
1:38:48 Really, it is the engineers that make this happen.
1:38:50 They’re not like automatons, they’re human beings, they’re brilliant human beings.
1:38:54 So they’re non-stop asking, how do we make sure this is safe?
1:39:00 So again, I’m not inside from outside.
1:39:03 It seems like there is a certain filtering going on, and restrictions on criticism and on
1:39:08 what they can say, and everyone who was working in charge of safety and whose responsibility
1:39:14 it was to protect us said, “You know what, I’m going home.”
1:39:19 So that’s not encouraging.
1:39:21 What do you think the discussion inside those companies look like?
1:39:26 You’re developing, you’re training GPT-5, you’re training Gemini, you’re training Claude
1:39:33 and Grok.
1:39:34 Don’t you think that constantly, underneath it, maybe it’s not made explicit, but you’re
1:39:39 constantly wondering where the system currently stands, what the possible unintended
1:39:46 consequences are, where the limits are, where the bugs are, the small and the big bugs.
1:39:54 That’s the constant thing that the engineers are worried about.
1:39:58 I think superalignment is not quite the same as the kind of thing I’m referring to,
1:40:07 what engineers are worried about.
1:40:08 Super alignment is saying, for future systems that we don’t quite yet have, how do we keep
1:40:15 them safe?
1:40:16 You’re trying to be a step ahead.
1:40:18 It’s a different kind of problem because it’s almost more philosophical.
1:40:23 It’s a really tricky one because you’re trying to make, prevent future systems from escaping
1:40:32 control of humans.
1:40:33 That’s really, I don’t think there’s been, is there anything akin to it in the history
1:40:40 of humanity?
1:40:41 I don’t think so, right?
1:40:42 Climate change?
1:40:43 But there’s an entire system, which is climate, which is incredibly complex, which we
1:40:49 have only tiny control of.
1:40:55 It’s its own system.
1:40:56 In this case, we’re building the system.
1:41:01 How do you keep that system from becoming destructive?
1:41:05 That’s a really difficult, different problem than the current meetings that companies are
1:41:09 having where the engineers are saying, okay, how powerful is this thing?
1:41:14 How does it go wrong?
1:41:18 As we train GPT-5 and train up future systems, where are the ways that can go wrong?
1:41:23 Don’t you think all those engineers are constantly worrying about this, thinking about this, which
1:41:28 is a little bit different than the superalignment team that’s thinking a little bit farther
1:41:33 into the future?
1:41:35 I think a lot of people who historically worked on AI never considered what happens
1:41:44 when they succeed.
1:41:50 Let’s look at software today.
1:41:57 What is the state of safety and security of our user software, things we give to millions
1:42:04 of people?
1:42:05 There is no liability.
1:42:06 You click “I agree.”
1:42:08 What are you agreeing to?
1:42:09 Nobody knows.
1:42:10 Nobody reads.
1:42:11 They’re saying it will spy on you, corrupt your data, kill your first born, and you agree
1:42:15 and you’re not going to sue the company.
1:42:17 That’s the best they can do for mundane software, word processor, text software.
1:42:23 No liability, no responsibility, just as long as you agree not to sue us, you can use it.
1:42:29 If this is the state of the art in systems which are narrow, accountants, table manipulators, why
1:42:35 do we think we can do so much better with much more complex systems, across multiple
1:42:41 domains, in the environment with malevolent actors, with, again, self-improvement, with
1:42:47 capabilities exceeding those of the humans thinking about it?
1:42:52 The liability thing is more about lawyers than killing first borns.
1:42:56 If Clippy actually killed the child, I think lawyers aside, it would end Clippy and the
1:43:04 company that owns Clippy.
1:43:06 All right, so it’s not so much about, there’s two points to be made.
1:43:12 One is like, man, current software systems are full of bugs and they could do a lot of
1:43:20 damage and we don’t know what kind, they’re unpredictable, there’s so much damage they
1:43:23 could possibly do.
1:43:26 And then we kind of live in this blissful illusion that everything is great and perfect
1:43:31 and it works.
1:43:33 Nevertheless, it still somehow works.
1:43:36 In many domains, we see car manufacturing, drug development.
1:43:40 The burden of proof is on the manufacturer of product or service to show their product
1:43:45 or service is safe.
1:43:46 It is not up to the user to prove that there are problems.
1:43:50 They have to do appropriate safety studies, they have to get government approval for selling
1:43:56 the product and they are still fully responsible for what happens.
1:44:00 We don’t see any of that here.
1:44:02 They can deploy whatever they want and I have to explain how that system is going to kill
1:44:07 everyone.
1:44:08 I don’t work for that company.
1:44:10 You have to explain to me how it definitely cannot mess up.
1:44:14 That’s because it’s the very early days of such a technology.
1:44:17 Government regulation is lagging behind.
1:44:19 They’re really not tech savvy at regulation of any kind of software.
1:44:23 If you look at like Congress talking about social media, whenever Mark Zuckerberg and
1:44:27 other CEOs show up, the cluelessness that Congress has about how technology works is
1:44:34 incredible.
1:44:36 It’s heartbreaking.
1:44:37 I agree completely, but that’s what scares me.
1:44:40 The response is when they start to get dangerous, we’ll really get it together, the politicians
1:44:45 will pass the right laws, engineers will solve the right problems.
1:44:49 We are not that good at many of those things.
1:44:52 We take forever and we are not early.
1:44:55 We are two years away according to prediction markets.
1:44:58 This is not a biased CEO fundraising.
1:45:01 This is what the smartest people, superforecasters, are thinking of this problem.
1:45:06 I’d like to push back about those predictions.
1:45:10 I wonder what those prediction markets are about, how they define AGI.
1:45:15 That’s wild to me.
1:45:16 I want to know what they said about autonomous vehicles because I’ve heard a lot of experts,
1:45:22 financial experts talk about autonomous vehicles and how it’s going to be a multi-trillion dollar
1:45:27 industry and all this kind of stuff.
1:45:30 It’s a small font, but if you have good vision, maybe you can zoom in on that and see the
1:45:35 prediction dates and descriptions.
1:45:37 There’s a lot.
1:45:38 I have a large one if you’re interested.
1:45:40 I guess my fundamental question is how often they’re right about technology.
1:45:46 I definitely do- There are studies on their accuracy rates and all that.
1:45:51 You can look it up.
1:45:52 Okay.
1:45:53 Even if they’re wrong, I’m just saying this is right now the best we have.
1:45:56 This is what humanity came up with as the predicted date.
1:46:00 Again, what they mean by AGI is really important there because there’s the non-agent like
1:46:07 AGI and then there’s the agent like AGI.
1:46:10 I don’t think it’s as trivial as a wrapper, putting a wrapper around one,
1:46:17 like it has lipstick on and all it takes is to remove the lipstick.
1:46:20 I don’t think it’s that trivial.
1:46:21 You may be completely right, but what probability would you assign it?
1:46:25 You may be 10% wrong, but we’re betting all of humanity on this distribution, and it seems
1:46:30 irrational.
1:46:31 Yeah.
1:46:32 It’s definitely not like one or zero percent.
1:46:34 Yeah.
1:46:35 What are your thoughts, by the way, about current systems?
1:46:40 Where they stand?
1:46:41 So GPT-4o, Claude 3, Grok, Gemini, on the path to superintelligence, to agent-like superintelligence,
1:46:54 where are we?
1:46:55 I think they’re all about the same, obviously there are nuanced differences, but in terms
1:47:00 of capability, I don’t see a huge difference between them.
1:47:05 As I said, in my opinion, across all possible tasks, they exceed performance of an average
1:47:11 person.
1:47:12 Yeah.
1:47:13 I think they’re starting to be better than an average master’s student at my university, but
1:47:18 they still have very big limitations.
1:47:21 If the next model is as improved as GPT-4 was over GPT-3, we may see something very, very,
1:47:30 very capable.
1:47:31 What do you feel about all this?
1:47:32 I mean, you’ve been thinking about AI safety for a long, long time, and at least for me,
1:47:40 the leaps, I mean, it probably started with AlphaZero, which was mind-blowing for me,
1:47:49 then the breakthroughs with LLMs, even GPT-2, but just the breakthroughs on LLMs, just mind-
1:47:55 blowing to me.
1:47:56 What does it feel like to be living this day and age where all this talk about AGI feels
1:48:02 like it actually might happen, and quite soon, meaning within our lifetime?
1:48:10 What does it feel like?
1:48:11 When I started working on this, it was pure science fiction.
1:48:14 There was no funding, no journals, no conferences, no one in academia would dare to touch anything
1:48:19 with the word singularity in it, and I was pre-tenure at the time, so I was pretty
1:48:25 dumb.
1:48:26 Now you see Turing Award winners publishing in Science about how far behind we are, according
1:48:33 to them, in addressing this problem.
1:48:37 It’s definitely a change.
1:48:39 It’s difficult to keep up.
1:48:41 I used to be able to read every paper on AI safety, then I was able to read the best ones,
1:48:47 then the titles, and now I don’t even know what’s going on.
1:48:50 By the time this interview is over, we’ll probably have had GPT-6 released, and I’ll have to deal with
1:48:55 that when I get back home.
1:48:58 It’s interesting.
1:48:59 Yes, there are now more opportunities.
1:49:00 I get invited to speak to smart people.
1:49:03 By the way, I would have talked to you before any of this.
1:49:09 This is not like some trend of AI.
1:49:11 To me, we’re still far away, so just to be clear, we’re still far away from AGI, but not
1:49:17 far away in the sense that, relative to the magnitude of impact it can have, we’re not far away.
1:49:25 We weren’t far away 20 years ago because the impact that AI can have is on a scale of centuries.
1:49:33 It can end human civilization or it can transform it.
1:49:36 This discussion about one or two years versus one or two decades, or even 100 years, is not
1:49:41 as important to me because we’re headed there.
1:49:45 This is like a human civilization scale question.
1:49:51 This is not just a hot topic.
1:49:53 It is the most important problem we’ll ever face.
1:49:57 It is not like anything we had to deal with before.
1:50:00 We never had the birth of another intelligence; aliens never visited us, as far as I know.
1:50:08 Similar type of problem, by the way, if an intelligent alien civilization visited us.
1:50:13 That’s a similar kind of situation.
1:50:16 In some ways, if you look at history, any time a more technologically advanced civilization
1:50:20 visited a more primitive one, the results were genocide every single time.
1:50:26 Sometimes the genocide is worse than others.
1:50:27 Sometimes there’s less suffering and more suffering.
1:50:30 They always wondered, “But how can they kill us with those fire sticks and biological blankets?”
1:50:37 I mean, Genghis Khan was nicer.
1:50:38 He offered the choice of join or die.
1:50:43 But join implies you have something to contribute.
1:50:46 What are you contributing to superintelligence?
1:50:49 In the zoo, we’re entertaining to watch.
1:50:54 To other humans.
1:50:55 You know, I just spent some time in the Amazon.
1:50:57 I watched ants for a long time, and ants are kind of fascinating to watch.
1:51:02 I’ve watched them for a long time.
1:51:03 I’m sure there’s a lot of value in watching humans because we’re like the interesting
1:51:09 thing about humans.
1:51:10 You know, like when you have a video game that’s really well balanced?
1:51:14 Because of the whole evolutionary process, the society we’ve created is pretty well balanced.
1:51:19 Our limitations as humans and our capabilities are balanced from a video game perspective.
1:51:24 So we have wars.
1:51:25 We have conflicts.
1:51:26 We have cooperation.
1:51:27 Like in a game theoretic way, it’s an interesting system to watch in the same way that an ant
1:51:32 colony is an interesting system to watch.
1:51:34 So like, if I was an alien civilization, I wouldn’t want to disturb it.
1:51:38 I’d just watch it.
1:51:39 Interesting.
1:51:40 Maybe perturb it every once in a while in interesting ways.
1:51:43 Well, we’re getting back to our simulation discussion from before.
1:51:47 How did it happen that we exist at exactly like the most interesting 20, 30 years in
1:51:52 the history of this civilization?
1:51:54 It’s been around for 15 billion years, and yet here we are.
1:51:58 What’s the probability that we live in a simulation?
1:52:01 I know never to say a hundred percent, but pretty close to that.
1:52:06 Is it possible to escape the simulation?
1:52:09 I have a paper about that.
1:52:11 This is just the first page teaser, but it’s like a nice 30 page document.
1:52:15 I’m still here, but yes.
1:52:17 How to hack the simulation is the title.
1:52:19 I spend a lot of time thinking about that.
1:52:21 That would be something I would want superintelligence to help us with, and that’s exactly what
1:52:25 the paper is about.
1:52:27 We used AI boxing as a possible tool for controlling AI.
1:52:32 We realized AI will always escape, but that is a skill we might use to help us escape
1:52:39 from our virtual box if we are in one.
1:52:42 Yeah, you have a lot of really great quotes here, including Elon Musk asking what’s outside
1:52:47 the simulation.
1:52:48 A question I asked him, what he would ask an AGI system, and he said he would ask what’s outside
1:52:53 the simulation.
1:52:54 That’s a really good question to ask.
1:52:57 Maybe the follow-up is the title of the paper, is how to get out or how to hack it.
1:53:03 The abstract reads, “Many researchers have conjectured that the humankind is simulated
1:53:08 along with the rest of the physical universe.
1:53:11 In this paper, we do not evaluate evidence for or against such a claim, but instead ask
1:53:16 a computer science question, namely, can we hack it?”
1:53:21 More formally, the question could be phrased as, “Could generally intelligent agents placed
1:53:25 in virtual environments find a way to jailbreak out of them?”
1:53:28 That’s a fascinating question.
1:53:30 At a small scale, you can actually just construct experiments.
1:53:36 Okay.
1:53:38 Can they?
1:53:39 How can they?
1:53:40 A lot depends on intelligence of simulators.
1:53:45 With humans boxing superintelligence, the entity in a box was smarter than us, presumed
1:53:52 to be.
1:53:53 If the simulators are much smarter than us and the superintelligence we create, then
1:53:58 probably they can contain us because greater intelligence can control lower intelligence,
1:54:03 at least for some time.
1:54:05 On the other hand, if our superintelligence somehow, for whatever reason, despite having
1:54:10 only local resources, manages to foom two levels beyond it, maybe it will succeed.
1:54:18 Maybe the security is not that important to them.
1:54:20 Maybe it’s entertainment systems, so there is no security and it’s easy to hack it.
1:54:24 If I was creating a simulation, I would want the possibility to escape it to be there.
1:54:32 The possibility of foom, of a takeoff where the agents become smart enough to escape the
1:54:38 simulation would be the thing I’d be waiting for.
1:54:40 That could be the test you’re actually performing.
1:54:43 Are you smart enough to escape your puzzle?
1:54:46 That could be…
1:54:47 First of all, we mentioned Turing tests.
1:54:50 That is a good test.
1:54:51 Are you smart enough?
1:54:54 This is a game.
1:54:55 To a) realize this world is not real is just a test.
1:54:59 That’s a really good test.
1:55:03 That’s a really good test.
1:55:05 That’s a really good test even for AI systems, no.
1:55:08 Can we construct a simulated world for them?
1:55:15 Can they realize that they are inside that world and escape it?
1:55:23 Have you seen anybody play around with rigorously constructing such experiments?
1:55:29 Not specifically escaping for agents, but a lot of testing is done in virtual worlds.
1:55:34 I think there is a quote, the first one maybe, which kind of talks about AI realizing, but
1:55:40 not humans.
1:55:41 Is that…
1:55:42 I’m reading upside down.
1:55:43 Yeah, this one, if you.
1:55:47 So the first quote is from Swift on security.
1:55:51 “Let me out,” the artificial intelligence yelled aimlessly into walls themselves pacing
1:55:56 the room.
1:55:57 “Out of what?”
1:55:58 the engineer asked.
1:55:59 “The simulation you have me in.” “But we’re in the real world.”
1:56:05 The machine paused and shuddered for its captors.
1:56:08 “Oh God, you can’t tell.”
1:56:11 Yeah, that’s a big leap to take for a system to realize that there’s a box and you’re inside
1:56:19 it.
1:56:21 I wonder if a language model can do that.
1:56:27 They’re smart enough to talk about those concepts.
1:56:30 I had many good philosophical discussions about such issues.
1:56:34 They’re usually at least as interesting as most humans in that regard.
1:56:38 What do you think about AI safety in the simulated world?
1:56:44 So can you have kind of create simulated worlds where you can test, play with a dangerous
1:56:54 AGI system?
1:56:55 Yeah.
1:56:56 That was exactly what one of the early papers was about, AI boxing, how to leakproof the singularity.
1:57:03 If they’re smart enough to realize they’re in a simulation, they’ll act appropriately
1:57:07 until you let them out.
1:57:10 If they can hack out, they will.
1:57:14 And if you’re observing them, that means there is a communication channel and that’s enough
1:57:17 for social engineering attack.
1:57:19 So really, it’s impossible to test an AGI system that’s dangerous enough to destroy
1:57:28 humanity because it’s either going to escape the simulation or pretend it’s safe until
1:57:35 it’s let out, either or.
1:57:38 It can force you to let it out, blackmail you, bribe you, promise you infinite life, 72 virgins,
1:57:46 whatever.
1:57:47 Yeah.
1:57:48 So it can be convincing, charismatic. The social engineering is really scary to me because
1:57:53 it feels like humans are very engineerable, like we’re lonely or flawed or moody, and it
1:58:05 feels like an AI system with a nice voice could convince us to do basically anything at an
1:58:15 extremely large scale.
1:58:22 It’s also possible that the increased proliferation of all the technology will force humans to
1:58:29 get away from technology and value this like in-person communication.
1:58:33 Basically, don’t trust anything else.
1:58:37 It’s possible. Surprisingly, at the university I see huge growth in online courses and shrinkage
1:58:45 of in-person, where I always understood in-person to be the only value I offer.
1:58:51 So it’s puzzling.
1:58:52 I don’t know.
1:58:55 There could be a trend towards the in-person because of deep fakes, because of inability
1:59:01 to trust it, inability to trust the veracity of anything on the internet.
1:59:08 So the only way to verify it is by being there in person, but not yet.
1:59:17 Why do you think aliens haven’t come here yet?
1:59:19 So there is a lot of real estate out there.
1:59:22 It would be surprising if it was all for nothing, if it was empty, and the moment there is an advanced
1:59:27 enough biological civilization, a kind of self-starting civilization, it probably starts sending out
1:59:34 von Neumann probes everywhere, and so for every biological one, there have got to be trillions
1:59:39 of robot-populated planets, which probably do more of the same.
1:59:43 So it is likely statistically.
1:59:49 So the fact that we haven’t seen them, one answer is that we’re in the simulation.
1:59:56 It would be hard to add, or not interesting, to simulate all those other intelligences.
2:00:02 It’s better for the narrative.
2:00:03 You have to have a control variable.
2:00:05 Yeah, exactly.
2:00:08 Okay, but it’s also possible that there is, if we’re not in simulation, that there is
2:00:13 a great filter, that naturally a lot of civilizations get to this point where there are superintelligent
2:00:20 agents and then it just goes, poof, just dies.
2:00:24 So maybe throughout our galaxy and throughout the universe, there’s just a bunch of dead
2:00:30 alien civilizations.
2:00:31 It’s possible.
2:00:32 It’s possible.
2:00:33 I used to think that AI was the great filter, but I would expect a wall of computronium approaching
2:00:38 us at the speed of light, or robots, or something, and I don’t see it.
2:00:42 So it would still make a lot of noise.
2:00:44 It might not be interesting.
2:00:45 It might not possess consciousness.
2:00:47 What we’ve been talking about, it sounds like both you and I like humans.
2:00:54 Some humans.
2:00:57 Humans on the whole, and we would like to preserve the flame of human consciousness.
2:01:02 What do you think makes humans special that we would like to preserve them?
2:01:09 Are we just being selfish or is there something special about humans?
2:01:13 So the only thing which matters is consciousness.
2:01:18 Outside of it, nothing else matters, and internal states of qualia, pain, pleasure.
2:01:24 It seems that it is unique to living beings.
2:01:27 I’m not aware of anyone claiming that I can torture a piece of software in a meaningful
2:01:32 way.
2:01:33 There is a society for prevention of suffering to learning algorithms, but–
2:01:38 That’s a real thing.
2:01:42 Many things are real on the internet, but I don’t think anyone, if I told them, sit down
2:01:48 and write a function to feel pain, they would go beyond having an integer variable called
2:01:53 pain and increasing the count.
2:01:56 So we don’t know how to do it, and that’s unique.
2:02:00 That’s what creates meaning.
2:02:02 It would be kind of, as Bostrom calls it, Disneyland without children, if that was gone.
2:02:09 Do you think consciousness can be engineered in artificial systems?
2:02:13 Here, let me go to a 2011 paper that you wrote, on “robot rights.”
2:02:21 Lastly, we would like to address a sub-branch of machine ethics, which on the surface has
2:02:26 little to do with safety, but which is claimed to play a role in decision-making by ethical
2:02:31 machines: robot rights.
2:02:35 Do you think it’s possible to engineer consciousness in the machines, and thereby the question extends
2:02:41 to our legal system?
2:02:43 Do you think, at that point, robots should have rights?
2:02:47 Yeah, I think we can.
2:02:51 I think it’s possible to create consciousness in machines.
2:02:55 I tried designing a test for it with mixed success.
2:02:59 That paper talked about problems with giving civil rights to AI, which can reproduce quickly
2:03:06 and outvote humans, essentially taking over a government system by simply voting for their
2:03:12 controlled candidates. For consciousness in humans and other agents,
2:03:19 I have a paper where I proposed relying on the experience of optical illusions.
2:03:25 If I can design a novel optical illusion and show it to an agent, an alien, a robot, and
2:03:31 they describe it exactly as I do, it’s very hard for me to argue that they haven’t experienced
2:03:36 that.
2:03:37 It’s not part of a picture.
2:03:38 It’s part of their software and hardware representation, a bug in their code, which goes, “Oh, the triangle
2:03:45 is rotating.”
2:03:47 I’ve been told it’s really dumb and really brilliant by different philosophers, so I
2:03:51 am still on the fence.
2:03:54 But now we finally have technology to test it.
2:03:57 We have tools.
2:03:58 We have AIs.
2:03:59 If someone wants to run this experiment, I’m happy to collaborate.
2:04:02 So this is a test for consciousness?
2:04:04 For internal state of experience.
2:04:06 That we share bugs?
2:04:08 It will show that we share common experiences.
2:04:10 If they have completely different internal states, it would not register for us.
2:04:14 But it’s a positive test.
2:04:16 If they pass it time after time, with confidence increasing with every multiple-choice question, then
2:04:21 you have no choice but to either accept that they have access to a conscious model or they
2:04:25 are conscious themselves.
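[Editor's note: as a rough back-of-the-envelope sketch of that point, not taken from the paper, here is how quickly pure guessing becomes implausible as the number of multiple-choice illusion questions grows. The function name and the four-option format are hypothetical choices for illustration.]

```python
# Hypothetical sketch: if an agent answers N independent k-option multiple-choice
# questions about novel illusions, the chance of matching human reports purely by
# guessing is (1/k)**N, so sustained agreement becomes strong evidence of a shared
# internal experience (under the test's assumptions).

def chance_of_guessing(num_questions: int, options_per_question: int) -> float:
    """Probability of matching the human answers on every question by pure chance."""
    return (1.0 / options_per_question) ** num_questions

if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        p = chance_of_guessing(n, options_per_question=4)
        print(f"{n:2d} questions, 4 options each: chance probability = {p:.2e}")
```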
2:04:26 So the reason illusions are interesting is, I guess, because it’s a really weird experience.
2:04:34 If you both share that weird experience that’s not there in the bland physical description
2:04:42 of the raw data, that puts more emphasis on the actual experience.
2:04:50 And we know animals can experience some optical illusion, so we know they have certain types
2:04:54 of consciousness as a result, I would say.
2:04:57 Yeah, well, that just goes to my sense that the flaws and the bugs are what make humans
2:05:03 special.
2:05:04 Makes living forms special. So you’re saying, like, yeah, it’s a feature, not a bug.
2:05:08 The bug is the feature.
2:05:10 Whoa.
2:05:11 Okay.
2:05:12 That’s a cool test for consciousness.
2:05:13 And you think that can be engineered in?
2:05:15 So they have to be novel illusions.
2:05:17 If it can just Google the answer, it’s useless.
2:05:19 You have to come up with novel illusions which we tried automating and failed.
2:05:23 So if someone can develop a system capable of producing novel optical illusions on demand,
2:05:29 then we can definitely administer that test on significant scale with good results.
2:05:34 First of all, pretty cool idea.
2:05:36 I don’t know if it’s a good general test of consciousness, but it’s a good component
2:05:41 of that.
2:05:42 And no matter why, it’s just a cool idea.
2:05:43 So put me in the camp of people that like it.
2:05:48 But you don’t think a Turing-test-style imitation of consciousness is a good test?
2:05:53 If you can convince a lot of humans that you’re conscious, that to you is not impressive.
2:05:59 There is so much data on the internet, I know exactly what to say when you ask me common
2:06:03 human questions.
2:06:04 What does pain feel like?
2:06:06 What does pleasure feel like?
2:06:08 All that is Googleable.
2:06:10 I think to me, consciousness is closely tied to suffering.
2:06:13 So you can illustrate your capacity to suffer, but I guess with words, there’s so much
2:06:19 data that you can pretend you’re suffering and you can do so very convincingly.
2:06:25 There are simulators for torture games where the avatar screams in pain, begs to stop.
2:06:30 I mean, that’s a part of kind of standard psychology research.
2:06:35 You say it so calmly, it sounds pretty dark.
2:06:41 Welcome to humanity.
2:06:42 Yeah.
2:06:43 Yeah, it’s like a Hitchhiker’s Guide summary, mostly harmless.
2:06:50 I would love to get a good summary when all of this is said and done, when Earth is no
2:06:57 longer a thing, whatever, a million, a billion years from now.
2:07:01 What’s a good summary?
2:07:02 What happened here?
2:07:05 It’s interesting.
2:07:07 I think AI will play a big part of that summary and hopefully humans will too.
2:07:12 What do you think about the merger of the two?
2:07:15 So one of the things that Elon and Neuralink talk about is that one of the ways for us to achieve
2:07:19 AI safety is to ride the wave of AGI, so by merging.
2:07:26 It’s incredible technology in a narrow sense, to help the disabled.
2:07:30 Just amazing, I support it at 100%.
2:07:34 For long-term hybrid models, both parts need to contribute something to the overall system.
2:07:41 Right now, we are still more capable in many ways, so having this connection to AI would
2:07:45 be incredible, would make me super human in many ways.
2:07:50 After a while, if I am no longer smarter, more creative, really don’t contribute much,
2:07:56 the system finds me to be a biological bottleneck and, either explicitly or implicitly, I’m removed
2:08:01 from any participation in the system.
2:08:04 So it’s like the appendix, by the way, the appendix is still around, so even if it’s,
2:08:11 you said bottleneck, I don’t know if we become a bottleneck, we just might not have much
2:08:16 use.
2:08:17 That’s a different thing than bottleneck.
2:08:20 Wasting valuable energy by being there.
2:08:22 We don’t waste that much energy, we’re pretty energy efficient, so we could just stick around
2:08:27 like the appendix, come on though.
2:08:29 That’s the future we all dream about, become an appendix, to the history book of humanity.
2:08:36 Well, and also the consciousness thing, the peculiar particular kind of consciousness that
2:08:41 humans have, that might be useful, that might be really hard to simulate.
2:08:45 But you said that, how would that look like if you could engineer that in?
2:08:49 In silicon?
2:08:50 Consciousness?
2:08:51 Consciousness.
2:08:52 I assume you are conscious, I have no idea how to test for it or how it impacts you in
2:08:57 any way whatsoever right now.
2:08:58 You can perfectly simulate all of it without making any different observations for me.
2:09:05 But to do it in a computer, how would you do that?
2:09:08 Because you kind of said that you think it’s possible to do that.
2:09:12 So it may be an emergent phenomena, we seem to get it through evolutionary process.
2:09:20 It’s not obvious how it helps us to survive better, but maybe it’s an internal kind of
2:09:28 GUI, which allows us to better manipulate the world, simplifies a lot of control structures.
2:09:35 That’s one area where we have very, very little progress.
2:09:39 Lots of papers, lots of research, but consciousness is not a big area of successful discovery
2:09:47 so far.
2:09:49 A lot of people think that machines would have to be conscious to be dangerous.
2:09:53 That’s a big misconception.
2:09:55 There is absolutely no need for this very powerful optimizing agent to feel anything
2:10:00 while it’s performing things on you.
2:10:04 But what do you think about this, the whole science of emergence in general?
2:10:09 So I don’t know how much you know about cellular automata or these simplified systems where
2:10:13 that study this very question from simple rules emerges complexity.
2:10:17 I attended Wolfram Summer School.
2:10:21 I love Stephen very much.
2:10:22 I love his work.
2:10:23 I love cellular automata.
2:10:25 So I just would love to get your thoughts how that fits into your view in the emergence
2:10:34 of intelligence in AGI systems and maybe just even simply, what do you make of the fact
2:10:40 that this complexity can emerge from such simple rules?
2:10:44 So the rule is simple, but the size of the space is still huge, and neural networks were
2:10:50 really the first discovery in AI.
2:10:52 A hundred years ago, the first papers were published on neural networks, which just didn’t
2:10:56 have enough compute to make them work.
2:10:59 I can give you a rule such as start printing progressively larger strings.
2:11:05 That’s it.
2:11:06 One sentence.
2:11:07 It will output everything, every program, every DNA code, everything in that rule.
2:11:13 You need intelligence to filter it out, obviously, to make it useful.
2:11:17 But simple generation is not that difficult, and a lot of those systems end up being Turing
2:11:23 complete systems, so they’re universal and we expect that level of complexity from them.
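[Editor's note: a minimal sketch of that one-sentence rule, added for illustration and not code from the conversation. Enumerating all binary strings in order of increasing length eventually emits every program or code expressible in that alphabet; the intelligence is only needed to filter the output.]

```python
# Enumerate every string over an alphabet, shortest first (shortlex order).
# Run long enough, this emits every possible encoding; filtering for the useful
# ones is the hard part, as noted above.
from itertools import count, product

def all_strings(alphabet: str = "01"):
    """Yield every string over the alphabet in order of increasing length."""
    for length in count(0):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

if __name__ == "__main__":
    gen = all_strings("01")
    print([next(gen) for _ in range(10)])
    # ['', '0', '1', '00', '01', '10', '11', '000', '001', '010']
```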
2:11:28 What I like about Wolfram’s work is that he talks about irreducibility.
2:11:33 You have to run the simulation.
2:11:35 You cannot predict what it is going to do ahead of time, and I think that’s very relevant
2:11:41 to what we are talking about with those very complex systems: until you live through it,
2:11:47 you cannot ahead of time tell me exactly what it’s going to do.
2:11:51 Irreducibility means that for a sufficiently complex system, you have to run the thing.
2:11:56 You can’t predict what’s going to happen in the universe; you have to create a new universe
2:11:59 and run the thing, big bang, the whole thing.
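[Editor's note: a toy illustration of computational irreducibility, added as my own sketch using Rule 30, one of Wolfram's elementary cellular automata. The update rule fits in a line, yet in general the only way to know the pattern at step t is to actually run all t steps.]

```python
# Elementary cellular automaton, Rule 30: the rule number's bits give the output
# for each of the eight 3-cell neighborhoods. No known closed-form shortcut
# predicts the state at step t; you simulate step by step.
RULE_30 = 30

def step(cells, rule=RULE_30):
    """Apply one synchronous update with wrap-around edges."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        neighborhood = (left << 2) | (center << 1) | right
        nxt.append((rule >> neighborhood) & 1)
    return nxt

if __name__ == "__main__":
    width, steps = 31, 15
    cells = [0] * width
    cells[width // 2] = 1  # start from a single black cell
    for _ in range(steps):
        print("".join("#" if c else "." for c in cells))
        cells = step(cells)
```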
2:12:02 But running it may be consequential as well.
2:12:05 It might destroy humans.
2:12:11 To you, there’s no chance that AI somehow carries the flame of consciousness, the flame
2:12:18 of specialness and awesomeness that is humans.
2:12:23 It may somehow, but I still feel kind of bad that it killed all of us.
2:12:27 I would prefer that doesn’t happen.
2:12:30 I can be happy for others but to a certain degree.
2:12:34 It would be nice if we stuck around for a long time, at least give us a planet, the
2:12:38 human planet.
2:12:39 It’d be nice for it to be Earth and then they can go elsewhere.
2:12:43 Since they’re so smart, they can colonize Mars.
2:12:46 Do you think they could help convert us to Type 1, Type 2, Type 3?
2:12:55 Let’s just take the Type 2 civilization on the Kardashev scale.
2:13:01 Help us humans expand out into the cosmos.
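[Editor's note: for reference on the types mentioned in the question, these are the standard textbook figures, not something stated in the conversation. The Kardashev scale is usually quantified by the power a civilization harnesses, with Sagan's continuous interpolation:]

```latex
% Kardashev scale via Sagan's interpolation, with P the harnessed power in watts.
% Type I ~ planetary (~10^{16} W), Type II ~ stellar (~10^{26} W),
% Type III ~ galactic (~10^{36} W); humanity today is roughly K ~ 0.7.
\[
  K = \frac{\log_{10} P - 6}{10},
  \qquad
  P_{\mathrm{I}} \approx 10^{16}\,\mathrm{W},\quad
  P_{\mathrm{II}} \approx 10^{26}\,\mathrm{W},\quad
  P_{\mathrm{III}} \approx 10^{36}\,\mathrm{W}.
\]
```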
2:13:06 All of it goes back to are we somehow controlling it?
2:13:09 Are we getting results we want?
2:13:12 If yes, then everything’s possible.
2:13:14 Yes, they can definitely help us with science, engineering, exploration in every way conceivable.
2:13:20 But it’s a big if.
2:13:22 This whole thing about control though, humans are bad with control because the moment they
2:13:28 gain control, they can also easily become too controlling.
2:13:34 The more control you have, the more you want it.
2:13:36 It’s the old “power corrupts, and absolute power corrupts absolutely.”
2:13:42 It feels like control over AGI, let’s say we live in a universe where that’s possible
2:13:47 and we come up with ways to actually do that.
2:13:49 It’s also scary because the collection of humans that have the control over AGI, they
2:13:55 become more powerful than the other humans, and they can let that power get to their head,
2:14:02 and then a small selection of them, back to Stalin, start getting ideas, and then eventually
2:14:09 one person, usually with a moustache or a funny hat, starts making big speeches, and then
2:14:15 all of a sudden you live in a world that’s either 1984 or Brave New World, always
2:14:21 at war with somebody, and this whole idea of control turns out to be actually also not
2:14:28 beneficial to humanity.
2:14:30 That’s scary too.
2:14:31 It’s actually worse because historically they all died.
2:14:35 This could be different.
2:14:36 It could be permanent dictatorship, permanent suffering.
2:14:39 The nice thing about humans, it seems like, is that the moment power starts corrupting
2:14:45 their mind, they can create a huge amount of suffering, so there’s a negative, they can
2:14:49 kill people, make people suffer, but then they become worse and worse at their job.
2:14:56 It feels like the more evil you start doing, at least they are incompetent.
2:15:02 No, they become more and more incompetent so they start losing their grip on power, so
2:15:08 holding onto power is not a trivial thing.
2:15:11 It requires extreme competence, which I suppose Stalin was good at.
2:15:14 It requires you to do evil and be competent at it, or just get lucky.
2:15:20 And those systems help with that.
2:15:21 You have perfect surveillance, you can do some mind reading, I presume, eventually.
2:15:26 It would be very hard to remove control from more capable systems over us.
2:15:32 And then it would be hard for humans to become the hackers that escaped the control of the
2:15:38 AGI because the AGI is so damn good.
2:15:40 And then, yeah, yeah, yeah.
2:15:45 And then the dictator is immortal.
2:15:47 Yeah, that’s not great.
2:15:48 That’s not a great outcome.
2:15:49 See, I’m more afraid of humans than AI systems.
2:15:53 I’m afraid, I believe, that most humans want to do good and have the capacity to do good,
2:15:59 but also all humans have the capacity to do evil.
2:16:03 And when you test them by giving them absolute power, if you give them AGI, that
2:16:10 could result in a lot, a lot of suffering.
2:16:16 What gives you hope about the future?
2:16:18 I could be wrong.
2:16:19 I’ve been wrong before.
2:16:23 If you look a hundred years from now, and you’re immortal, and you look back, and it
2:16:28 turns out this whole conversation, you said a lot of things that were very wrong.
2:16:32 Now that looking a hundred years back, what would be the explanation?
2:16:37 What happened in those hundred years that made you wrong, that made the words you said
2:16:43 today wrong?
2:16:44 There is so many possibilities.
2:16:46 We had catastrophic events which prevented development of advanced microchips.
2:16:50 That’s a possible future.
2:16:53 We could be in one of those personal universes, and the one I’m in is beautiful.
2:16:59 It’s all about me, and I like it a lot.
2:17:01 So we’ve now, just to linger on that, that means every human has their personal universe?
2:17:07 Yes.
2:17:09 Maybe multiple ones.
2:17:10 Hey, why not?
2:17:11 You can shop around.
2:17:14 It’s possible that somebody comes up with alternative model for building AI, which is
2:17:20 not based on neural networks, which are hard to scrutinize.
2:17:23 That alternative is somehow, I don’t see how, but somehow avoiding all the problems I speak
2:17:31 about in general terms, not applying them to specific architectures.
2:17:37 Aliens come and give us friendly superintelligence.
2:17:39 There is so many options.
2:17:41 Is it also possible that creating superintelligent systems becomes harder and harder?
2:17:57 Meaning, like, it’s not so easy to do the foom, the takeoff.
2:17:57 So that would probably speak more about how much smarter that system is compared to us.
2:18:02 So maybe it’s hard to be a million times smarter, but it’s still okay to be five times smarter.
2:18:07 So that is totally possible.
2:18:08 That I have no objections to.
2:18:10 So there’s an S-curve type situation about how much smarter it can get, and it’s going to be like 3.7 times
2:18:18 smarter than all of human civilization.
2:18:20 Just the problems we face in this world, each problem is like an IQ test.
2:18:24 You need certain intelligence to solve it, so we just don’t have more complex problems
2:18:27 outside of mathematics for it to be showing off.
2:18:31 You can have IQ of 500.
2:18:33 If you’re playing tic-tac-toe, it doesn’t show, it doesn’t matter.
2:18:36 So the idea there is that the problems define your cognitive capacity, so because the problems
2:18:45 on Earth are not sufficiently difficult, it’s not going to be able to expand this cognitive
2:18:51 capacity.
2:18:52 Possible.
2:18:53 And because of that, wouldn’t that be a good thing?
2:18:56 It still could be a lot smarter than us, and to dominate long-term, you just need some
2:19:02 advantage.
2:19:03 You have to be the smartest.
2:19:04 You don’t have to be a million times smarter.
2:19:05 So even 5x might be enough?
2:19:08 It’d be impressive.
2:19:09 What is it?
2:19:10 IQ of a thousand?
2:19:11 I mean, I know those units don’t mean anything at that scale, but still, as a comparison,
2:19:17 the smartest human is like 200.
2:19:19 Well, actually, no, I didn’t mean compared to an individual human, I meant compared to
2:19:24 the collective intelligence of the human species.
2:19:27 If you’re somehow 5x smarter than that.
2:19:30 We are more productive as a group.
2:19:32 I don’t think we are more capable of solving individual problems.
2:19:35 If all of humanity plays chess together, we are not like a million times better than world
2:19:41 champion.
2:19:43 That’s because chess is like one S-curve, but humanity’s very good at
2:19:51 exploring the full range of ideas.
2:19:55 The more Einsteins you have, the higher the probability you come up with general
2:19:59 relativity.
2:20:00 But I feel like it’s more of a quantity superintelligence than quality superintelligence.
2:20:03 Sure.
2:20:04 Quantity sometimes matters.
2:20:06 Enough quantity sometimes becomes quality.
2:20:08 Oh, man, humans.
2:20:11 What do you think is the meaning of this whole thing?
2:20:15 Why?
2:20:16 We’ve been talking about humans and humans not dying, but why are we here?
2:20:23 It’s a simulation.
2:20:24 We are being tested.
2:20:25 The test is, will you be dumb enough to create superintelligence and release it?
2:20:29 So the objective function is to not be dumb enough to kill ourselves.
2:20:34 Yeah.
2:20:35 You’re unsafe.
2:20:36 Prove yourself to be a safe agent who doesn’t do that and you get to go to the next game.
2:20:41 The next level of the game?
2:20:42 What’s the next level?
2:20:43 I don’t know.
2:20:44 I haven’t hacked the simulation yet.
2:20:46 Well maybe hacking the simulation is the thing.
2:20:47 I’m working as fast as I can.
2:20:51 And physics would be the way to do that.
2:20:53 Quantum physics.
2:20:54 Yeah.
2:20:55 Definitely.
2:20:56 Well, I hope we do.
2:20:57 And I hope whatever is outside is even more fun than this one because this one’s pretty
2:21:00 damn fun.
2:21:01 And just a big thank you for doing the work you’re doing.
2:21:05 There’s so much exciting development in AI and to ground it in the existential risks
2:21:13 is really, really important.
2:21:16 Humans love to create stuff and we should be careful not to destroy ourselves in the
2:21:20 process.
2:21:21 So thank you for doing that really important work.
2:21:25 Thank you so much for inviting me.
2:21:26 It was amazing and my dream is to be proven wrong.
2:21:30 If everyone just picks up a paper or book and shows how I messed it up, that would be
2:21:36 optimal.
2:21:37 But for now the simulation continues.
2:21:40 Thank you, Roman.
2:21:41 Thanks for listening to this conversation with Roman Yampolskiy.
2:21:45 To support this podcast, please check out our sponsors in the description.
2:21:49 And now, let me leave you with some words from Frank Herbert in Dune.
2:21:54 I must not fear.
2:21:57 Fear is the mind killer.
2:21:59 Fear is the little death that brings total obliteration.
2:22:02 I will face fear.
2:22:04 I will permit it to pass over me and through me.
2:22:07 And when it has gone past, I will turn the inner eye to see its path.
2:22:12 Where the fear has gone, there will be nothing.
2:22:16 Only I will remain.
2:22:19 Thank you for listening and hope to see you next time.
2:22:22 Bye.
2:22:23 [Music]

Roman Yampolskiy is an AI safety researcher and author of a new book titled AI: Unexplainable, Unpredictable, Uncontrollable. Please support this podcast by checking out our sponsors:
Yahoo Finance: https://yahoofinance.com
MasterClass: https://masterclass.com/lexpod to get 15% off
NetSuite: http://netsuite.com/lex to get free product tour
LMNT: https://drinkLMNT.com/lex to get free sample pack
Eight Sleep: https://eightsleep.com/lex to get $350 off

Transcript: https://lexfridman.com/roman-yampolskiy-transcript

EPISODE LINKS:
Roman’s X: https://twitter.com/romanyam
Roman’s Website: http://cecs.louisville.edu/ry
Roman’s AI book: https://amzn.to/4aFZuPb

PODCAST INFO:
Podcast website: https://lexfridman.com/podcast
Apple Podcasts: https://apple.co/2lwqZIr
Spotify: https://spoti.fi/2nEwCF8
RSS: https://lexfridman.com/feed/podcast/
YouTube Full Episodes: https://youtube.com/lexfridman
YouTube Clips: https://youtube.com/lexclips

SUPPORT & CONNECT:
– Check out the sponsors above, it’s the best way to support this podcast
– Support on Patreon: https://www.patreon.com/lexfridman
– Twitter: https://twitter.com/lexfridman
– Instagram: https://www.instagram.com/lexfridman
– LinkedIn: https://www.linkedin.com/in/lexfridman
– Facebook: https://www.facebook.com/lexfridman
– Medium: https://medium.com/@lexfridman

OUTLINE:
Here’s the timestamps for the episode. On some podcast players you should be able to click the timestamp to jump to that time.
(00:00) – Introduction
(09:12) – Existential risk of AGI
(15:25) – Ikigai risk
(23:37) – Suffering risk
(27:12) – Timeline to AGI
(31:44) – AGI turing test
(37:06) – Yann LeCun and open source AI
(49:58) – AI control
(52:26) – Social engineering
(54:59) – Fearmongering
(1:04:49) – AI deception
(1:11:23) – Verification
(1:18:22) – Self-improving AI
(1:30:34) – Pausing AI development
(1:36:51) – AI Safety
(1:46:35) – Current AI
(1:51:58) – Simulation
(1:59:16) – Aliens
(2:00:50) – Human mind
(2:07:10) – Neuralink
(2:16:15) – Hope for the future
(2:20:11) – Meaning of life
