Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering

AI transcript
0:00:03 Most of AI is focused on alignment as steering.
0:00:05 That’s the polite word.
0:00:07 If you think that what we’re making are beings,
0:00:08 you’d also call this slavery.
0:00:11 Someone who you steer, who doesn’t get to steer you back,
0:00:13 who non-optionally receives your steering,
0:00:13 that’s called a slave.
0:00:15 It’s also called a tool if it’s not a being.
0:00:17 So if it’s a machine, it’s a tool.
0:00:18 And if it’s a being, it’s a slave.
0:00:20 Like we’ve made this mistake enough times at this point.
0:00:22 I would like us to not make it again.
0:00:23 You know, they’re kind of like people,
0:00:25 but they’re not like people.
0:00:26 Like they do the same thing people do.
0:00:27 They speak our language.
0:00:29 They can like take on the same kind of tasks.
0:00:30 Well, like they don’t count.
0:00:32 They’re not real moral agents.
0:00:34 A tool that you can’t control, bad.
0:00:35 A tool that you can control, bad.
0:00:37 A being that isn’t aligned, bad.
0:00:40 The only good outcome is a being that is,
0:00:42 that cares, that actually cares about us.
0:00:44 I’ve been thinking about a line
0:00:46 that keeps showing up in AI safety discussions.
0:00:48 And it stopped me cold when I first read it.
0:00:50 We need to build aligned AI.
0:00:52 Sounds reasonable, right?
0:00:54 Except aligned to what?
0:00:55 Aligned to whom?
0:00:58 The phrase gets thrown around like it has an obvious answer.
0:00:59 But the more you sit on it,
0:01:02 the more you realize you’re smuggling in a massive assumption.
0:01:04 We’re assuming there’s some fixed point,
0:01:06 some stable target we can aim at,
0:01:08 hit once and be done.
0:01:09 But here’s what’s interesting.
0:01:12 That’s not how alignment works anywhere else in life.
0:01:13 Think about families.
0:01:14 Think about teams.
0:01:16 Think about your own moral development.
0:01:19 You don’t achieve alignment and then coast.
0:01:21 You’re constantly renegotiating,
0:01:22 constantly learning,
0:01:24 constantly discovering that what you thought was right
0:01:26 turns out to be more complicated.
0:01:28 Alignment isn’t a destination.
0:01:29 It’s a process.
0:01:31 It’s something you do,
0:01:32 not something you have.
0:01:34 And this matters because we’re at this inflection point
0:01:36 where the AI systems we’re building
0:01:38 are starting to look less like tools
0:01:39 and more like something else.
0:01:41 They speak our language.
0:01:42 They reason through problems.
0:01:45 They can take on tasks that used to require human judgment.
0:01:47 And the question everyone’s asking is,
0:01:48 how do we control them?
0:01:50 How do we steer them?
0:01:52 How do we make sure they do what we want?
0:01:53 But there’s another way to see it.
0:01:57 What if the control paradigm is the wrong framework entirely?
0:01:59 What if trying to build a super intelligent tool
0:02:02 you can perfectly steer is not just difficult,
0:02:03 but fundamentally dangerous,
0:02:05 whether you succeed or fail?
0:02:08 If you can’t control it, obviously that’s bad.
0:02:10 But if you can control it perfectly,
0:02:11 you’ve just handed godlike power
0:02:13 to whoever’s holding the steering wheel.
0:02:15 And humans, even well-meaning ones,
0:02:18 don’t have the wisdom to wield that kind of power safely.
0:02:20 So what’s the alternative?
0:02:22 Well, think about how we actually solve
0:02:24 alignment problems in the real world.
0:02:25 We don’t control other people.
0:02:26 We don’t steer them.
0:02:27 We raise them.
0:02:28 We teach them to care.
0:02:31 We build relationships where they do right by us,
0:02:32 not because we’re forcing them,
0:02:35 but because they learn to value the relationship itself.
0:02:36 That’s organic alignment.
0:02:39 Alignment that emerges from genuine care,
0:02:40 from theory of mind,
0:02:42 from being part of something larger than yourself.
0:02:45 Emmett Shear has spent the last year and a half
0:02:47 working on exactly this problem at Softmax.
0:02:49 And what makes his approach distinctive
0:02:51 is that he’s not trying to solve alignment
0:02:53 by building better control mechanisms.
0:02:55 He’s trying to solve it by building AI systems
0:02:56 that can learn to care,
0:02:59 that can develop the kind of theory of mind
0:03:00 that lets them be good teammates,
0:03:01 good collaborators,
0:03:02 good citizens.
0:03:04 Not tools that follow orders,
0:03:06 but beings that understand
0:03:07 what it means to be part of a community.
0:03:10 That can raise some uncomfortable questions.
0:03:12 What if we’re building beings and not tools?
0:03:14 What does that mean for how we treat them?
0:03:15 What does it mean for their rights?
0:03:17 And how do you even know if they’ve succeeded?
0:03:20 How do you measure whether something genuinely cares
0:03:22 versus just simulating care really well?
0:03:23 Today,
0:03:25 Seb Krier from Google DeepMind and I
0:03:26 are sitting down with Emmett
0:03:27 to explore those questions.
0:03:31 Seb leads AGI policy development at DeepMind,
0:03:32 so he brings a perspective
0:03:33 from inside one of the labs
0:03:34 actually building these systems.
0:03:36 But really,
0:03:37 we’re investigating something deeper.
0:03:39 What does it actually take
0:03:40 to build AI systems
0:03:41 that can participate
0:03:41 in the ongoing,
0:03:43 never-finished process
0:03:44 of figuring out how to live together?
0:03:46 By the end,
0:03:46 you’ll understand
0:03:48 not just Softmax’s technical approach,
0:03:50 but a completely different way
0:03:51 of thinking about what alignment is
0:03:52 and what it could become.
0:03:53 Emmett Shear,
0:03:54 welcome to the podcast.
0:03:59 Emmett, Seb,
0:04:00 welcome to the podcast.
0:04:00 Thanks for joining.
0:04:01 Thank you for having me.
0:04:03 So, Emmett,
0:04:03 with Softmax,
0:04:04 you’re focused on alignment
0:04:06 and making AIs organically align
0:04:07 with people.
0:04:09 Can you explain what that means
0:04:10 and how you’re trying to do that?
0:04:12 When people think about alignment,
0:04:13 I think there’s a lot of confusion.
0:04:15 People talk about things being aligned.
0:04:16 We need to build an aligned AI.
0:04:18 And the problem with that
0:04:19 is when someone says that,
0:04:19 it’s like,
0:04:20 we need to go on a trip.
0:04:21 And I’m like,
0:04:22 okay, I do like trips,
0:04:23 but like,
0:04:24 where are we going again?
0:04:25 And with alignment,
0:04:26 alignment takes an argument.
0:04:27 Alignment requires you
0:04:28 to align to something.
0:04:29 You can’t just be aligned.
0:04:30 Maybe you could be aligned
0:04:30 to yourself,
0:04:31 but even then,
0:04:32 you kind of have to say
0:04:33 what aligning to yourself means.
0:04:34 And so,
0:04:36 this idea of an abstractly aligned AI,
0:04:37 I think,
0:04:38 slips a lot of assumptions
0:04:39 past people
0:04:41 because it sort of assumes
0:04:42 that there is one
0:04:44 obvious thing to align to.
0:04:45 I find this is usually
0:04:46 the goals of the people
0:04:47 who are making the AI.
0:04:49 That’s what they mean
0:04:49 when they say
0:04:50 I want to make an aligned AI.
0:04:50 I want to make an AI
0:04:52 that does what I want it to do.
0:04:53 That’s what they normally mean.
0:04:54 And that’s a pretty normal
0:04:55 and natural thing
0:04:56 to mean by alignment.
0:04:57 I’m not sure
0:04:58 that that’s what I would
0:04:59 regard as like a public good.
0:05:00 Right?
0:05:00 Like,
0:05:01 I guess it depends on who it is.
0:05:01 If it was like
0:05:03 Jesus or the Buddha
0:05:03 was like,
0:05:05 I am making an aligned AI.
0:05:05 I’d be like,
0:05:05 okay, yeah,
0:05:06 aligned to you.
0:05:06 Great.
0:05:07 I’m down.
0:05:08 Sounds good.
0:05:09 Sign me up.
0:05:10 But most of us,
0:05:11 myself included,
0:05:12 I wouldn’t describe
0:05:13 as being at that level
0:05:14 of spiritual development
0:05:16 and therefore perhaps
0:05:17 want to think a little more carefully
0:05:18 about what we’re aligning it to.
0:05:20 And so when we talk
0:05:21 about organic alignment,
0:05:23 I think the important thing
0:05:24 to recognize
0:05:25 is that alignment
0:05:26 is not a thing.
0:05:28 It’s not a state.
0:05:29 It’s a process.
0:05:31 This is one of those things
0:05:32 that’s broadly true
0:05:33 of almost everything, right?
0:05:34 Is a rock a thing?
0:05:35 I mean,
0:05:36 there’s a view of a rock
0:05:36 as a thing,
0:05:38 but if you actually zoom in
0:05:39 on a rock really carefully,
0:05:39 a rock is a process.
0:05:41 It’s this endless oscillation
0:05:43 between the atoms
0:05:44 over and over and over again,
0:05:46 reconstructing rock
0:05:46 over and over again.
0:05:47 Now, the rock’s
0:05:48 a really simple process
0:05:49 that you can kind of like
0:05:50 coarse-grain
0:05:51 very meaningfully
0:05:51 into being a thing.
0:05:53 But alignment is not
0:05:54 like a rock.
0:05:55 Alignment is a complex process.
0:05:57 And organic alignment
0:05:59 is the idea
0:06:00 of treating alignment
0:06:02 as an ongoing
0:06:04 sort of living process
0:06:05 that has to constantly
0:06:05 rebuild itself.
0:06:07 And so you can think of
0:06:07 the way that
0:06:08 how do people
0:06:09 and families
0:06:10 stay aligned
0:06:11 to each other,
0:06:12 stay aligned to a family?
0:06:14 And the way they do that
0:06:14 is you don’t like
0:06:16 arrive at being aligned.
0:06:17 You’re constantly
0:06:20 re-knitting
0:06:21 the fabric
0:06:22 that keeps the family going.
0:06:23 And in some sense,
0:06:24 the family
0:06:25 is the pattern
0:06:26 of re-knitting
0:06:26 that happens.
0:06:28 And if you stop doing it,
0:06:28 it goes away.
0:06:29 And this is similar
0:06:30 for things like
0:06:31 cells in your body,
0:06:31 right?
0:06:32 Like,
0:05:32 it isn’t like
0:05:33 your cells
0:05:34 align to being you
0:05:35 and then they’re done.
0:06:37 it’s this constant
0:06:38 ever-running process
0:06:39 of cells
0:06:41 deciding what should I do?
0:06:42 What should I be?
0:06:43 Do I need to take on a new job?
0:06:44 Should we be making
0:06:45 more red blood cells?
0:06:45 Should we be making
0:06:45 fewer of them?
0:06:47 You aren’t a fixed point,
0:06:49 so there is no fixed alignment.
0:06:50 And it turns out
0:06:51 that our society
0:06:51 is like that.
0:06:52 When people talk about alignment,
0:06:53 what they’re really
0:06:53 talking about,
0:06:54 I think,
0:06:55 is I want an AI
0:06:57 that is morally good.
0:06:58 Right?
0:06:59 That’s what they really mean.
0:07:00 It’s like,
0:07:00 this thing will act
0:07:01 as a morally good being
0:07:03 and acting
0:07:05 as a morally good being
0:07:06 is a process
0:07:07 and not a destination.
0:07:08 Unfortunately,
0:07:09 we’ve tried
0:07:10 taking down tablets
0:07:11 from on high
0:07:11 that tell you
0:07:12 how to be
0:07:12 a morally good being
0:07:14 and we use those
0:07:15 and they’re maybe helpful
0:08:16 but somehow
0:08:17 they’re not enough.
0:08:18 Like,
0:07:18 you can read those
0:07:19 and try to follow those rules
0:07:21 and still make lots of mistakes.
0:07:21 And so,
0:07:22 I’m not going to claim
0:07:23 I know exactly
0:07:24 what morality is
0:07:24 but morality is
0:07:26 very obviously
0:07:28 an ongoing learning process
0:07:28 and something
0:07:29 where we make
0:07:30 moral discoveries.
0:07:31 Like,
0:07:32 historically,
0:07:32 people thought
0:07:33 that slavery was okay
0:07:34 and then they thought
0:07:34 it wasn’t
0:07:35 and I think
0:07:36 you can very meaningfully
0:07:37 say that we made
0:07:37 moral progress,
0:07:39 we made a moral discovery
0:07:40 by realizing
0:07:41 that’s not good.
0:07:42 And if you think
0:07:43 that there’s such a thing
0:07:44 as moral progress
0:07:45 or even just
0:07:46 learning how better
0:07:48 to pursue the moral goods
0:07:48 we already know,
0:07:50 then
0:07:53 you have to believe
0:07:55 that alignment,
0:07:57 aligning to morality,
0:07:59 being a moral being
0:08:00 is a process
0:08:01 of constant learning
0:08:03 and of growth
0:08:04 to re-infer
0:08:05 what should I do
0:08:07 from experience.
0:08:08 And
0:08:09 the fact that
0:08:10 no one has any idea
0:08:11 how to do that
0:08:12 should not
0:08:13 dissuade us
0:08:14 from trying
0:08:14 because
0:08:16 that’s what humans do.
0:08:17 Like,
0:08:18 it’s really obvious
0:08:19 that we do this,
0:08:19 right,
0:08:20 somehow,
0:08:21 just like we used to
0:09:21 not know how
0:09:22 humans walked
0:08:23 or saw,
0:08:24 somehow,
0:08:25 we have experiences
0:08:26 where we’re acting
0:08:27 in a certain way
0:08:29 and then we have
0:08:29 this realization,
0:08:31 I’ve been a dick,
0:08:33 that was bad,
0:08:35 I thought I was doing good,
0:08:36 but in retrospect,
0:08:37 I was doing wrong.
0:08:39 And it’s not like random,
0:08:40 like people have the same,
0:08:40 actually,
0:08:41 there’s like a bunch
0:08:41 of classic patterns
0:08:42 of people having
0:08:43 that realization,
0:08:44 it’s like a thing
0:08:44 that happens over
0:08:45 and over again,
0:08:46 so it’s not random,
0:08:47 it’s like a predictable
0:08:48 series of events
0:08:49 that look a lot
0:08:50 like learning
0:08:51 where you change
0:08:52 your behavior
0:08:52 and often
0:08:53 the impact
0:08:54 of your behavior
0:08:54 in the future
0:08:56 is more pro-social
0:08:57 and that you are
0:08:57 better off
0:08:58 for doing it
0:08:58 and like,
0:08:59 so I’m taking
0:08:59 a very strong
0:09:00 moral realist position,
0:09:01 there is such a thing
0:09:01 as morality,
0:09:03 we really do learn it,
0:09:03 it really does matter
0:09:06 and organic alignment
0:09:06 reflects that it’s not
0:09:07 something you finish,
0:09:08 in fact,
0:09:09 one of the key moral mistakes
0:09:10 is this belief,
0:09:11 I know morality,
0:09:12 I know what’s right,
0:09:13 I know what’s wrong,
0:09:15 I don’t need to learn anything,
0:09:15 no one has anything
0:09:16 to teach me about morality,
0:09:17 that’s arrogance,
0:09:18 and that’s one of the
0:09:19 most morally dangerous
0:09:19 things you can do,
0:09:22 and so when we talk
0:09:23 about organic alignment,
0:09:24 organic alignment
0:09:26 is an aligning an AI
0:09:28 that is capable
0:09:30 of doing the thing
0:09:31 that humans can do
0:09:32 and to some degree
0:09:33 like I think animals
0:09:34 can do it at some level
0:09:34 although humans
0:09:35 are much better at it
0:09:38 of the learning
0:09:39 of how to be
0:09:40 a good family member,
0:09:41 a good teammate,
0:09:42 a good member
0:09:43 of society,
0:09:44 a good member
0:09:45 of all sentient beings
0:09:45 I guess,
0:09:46 how to be a part
0:09:47 of something bigger
0:09:47 than yourself
0:09:48 in a way that is
0:09:49 healthy for the whole
0:09:50 rather than unhealthy
0:09:52 and Softmax
0:09:53 is dedicated
0:09:53 to researching this
0:09:54 and I think we’ve made
0:09:55 some really interesting progress
0:09:57 but like the main message,
0:09:57 you know,
0:09:58 I go on podcasts
0:09:59 like this to spread,
0:10:00 the main thing
0:10:01 that I hope Softmax
0:10:02 accomplishes above
0:10:03 and beyond anything else
0:10:06 is like to focus people
0:10:07 on this as the question.
0:10:08 This is the thing
0:10:09 you have to figure out
0:10:10 if you can’t figure out
0:10:13 how to raise a child
0:10:14 who cares about
0:10:14 the people around them,
0:10:16 if you have a child
0:10:17 that only follows
0:10:18 the rules,
0:10:20 that’s not a moral person
0:10:21 that you’ve raised,
0:10:21 you’ve raised a dangerous
0:10:23 person actually
0:10:23 who will probably
0:10:24 do great harm
0:10:25 following the rules
0:10:26 and if you make an AI
0:10:27 that’s good at following
0:10:28 your chain of command
0:10:29 and good at following
0:10:30 your whatever rules
0:10:30 you came up with
0:10:32 for what morality is
0:10:33 and what good behavior is,
0:10:36 that’s also going
0:10:37 to be very dangerous
0:10:39 and so
0:10:42 that’s the bar,
0:10:43 that’s what we should
0:10:44 be working on
0:10:44 and that’s what everyone
0:10:45 should be committed
0:10:47 to like figuring out
0:10:49 and if someone beats us
0:10:49 to the punch,
0:10:49 great.
0:10:50 I mean,
0:10:51 I don’t think they will
0:10:52 because I’m like really bullish
0:10:53 on our approach
0:10:53 and I think the team’s amazing
0:10:55 but like this is,
0:10:56 it’s maybe,
0:10:57 it’s the first time
0:10:57 I’ve run a company
0:10:59 where truly I can say
0:11:00 with a whole heart
0:11:01 if someone beats us,
0:11:02 thank God,
0:11:04 like I hope somebody
0:11:05 figures it out.
0:11:06 Yeah.
0:11:08 Yeah,
0:11:08 I mean it’s,
0:11:10 yeah,
0:11:10 I have a lot of,
0:11:11 you know,
0:11:11 similar intuitions
0:11:12 about certain things
0:11:14 like I also dislike
0:11:14 the,
0:11:15 you know,
0:11:16 the idea that kind of,
0:11:16 you know,
0:11:18 we just need to like crack
0:11:19 the few kind of values
0:11:19 or something
0:11:21 and just cement them
0:11:22 in time forever now
0:11:22 and you know,
0:11:23 we’ve kind of solved morality
0:11:23 or something
0:11:24 and I’ve always kind of
0:11:25 been skeptical about,
0:11:25 you know,
0:11:27 how the alignment problem
0:11:28 has been conceptualized
0:11:29 as something to kind of
0:11:30 solve once and for all
0:11:30 and then you can just,
0:11:31 you know,
0:11:31 do AI
0:11:32 or do AGI.
0:11:34 But the,
0:11:35 I guess I understand it
0:11:37 in a slightly different way,
0:11:38 I guess maybe less
0:11:39 based on kind of
0:11:39 moral realism,
0:11:40 but,
0:11:41 you know,
0:11:41 there’s a kind of
0:11:41 the technical alignment
0:11:42 problem which I kind of
0:11:43 think of broadly
0:11:44 as how do you get
0:11:45 an AI to do what you,
0:11:46 you know,
0:11:47 how do you get it
0:11:48 to follow instructions
0:11:48 like,
0:11:48 you know,
0:11:49 broadly speaking.
0:11:51 And I think that was,
0:11:51 you know,
0:11:52 more of a challenge,
0:11:53 I think pre-LLMs,
0:11:53 I guess when people
0:11:54 were talking about
0:11:55 reinforcement learning
0:11:55 and looking at these
0:11:56 systems,
0:11:57 whereas post-LLMs,
0:11:58 we’ve realized that
0:11:59 many things that we thought
0:12:00 were going to be difficult
0:12:01 were somewhat easier.
0:12:02 And then there’s a kind
0:12:04 of second question,
0:12:04 the kind of normative
0:12:06 question of to whose values
0:12:06 and what are you aligning
0:12:07 this thing to,
0:12:08 which I think is the kind
0:12:09 of thing you’re commenting
0:12:10 on a bit.
0:12:12 And for this,
0:12:14 I tend to be very
0:12:14 skeptical of approaches
0:12:15 where,
0:12:15 you know,
0:12:16 you need to kind of crack
0:12:18 the kind of 10 commandments
0:12:19 of alignment or something
0:12:20 and then we’re good.
0:12:21 And here,
0:12:22 I think I have like intuitions
0:12:23 that are unsurprisingly
0:12:24 a bit more like
0:12:25 political science-based
0:12:26 or something in that,
0:12:26 like,
0:12:26 okay,
0:12:27 it is a process.
0:12:29 And I like the kind
0:12:30 of bottom-up approach
0:12:31 to some degree of,
0:12:31 well,
0:12:31 you know,
0:12:32 how do we do it
0:12:33 in real life with people?
0:12:33 Like,
0:12:34 no one comes up with,
0:12:34 you know,
0:12:35 I’ve got this.
0:12:36 And so you have like
0:12:37 processes that allow
0:12:38 like ideas to kind of,
0:12:38 you know,
0:12:39 clash.
0:12:39 You have good people
0:12:40 with different ideas,
0:12:40 opinions,
0:12:41 views,
0:12:41 and stuff to kind of
0:12:43 coexist as well as they
0:12:44 can within a wider system.
0:12:44 And like,
0:12:45 you know,
0:12:46 and with humans,
0:12:47 that system is liberal
0:12:48 democracy or something.
0:12:48 And,
0:12:49 you know,
0:12:50 at least in some countries.
0:12:51 And that allows more
0:12:52 of that kind of,
0:12:54 you know,
0:12:55 these kind of ideas,
0:12:55 these values to be
0:12:56 kind of discovered
0:12:57 and construed over time.
0:12:59 And I think,
0:12:59 you know,
0:13:00 for alignment as well,
0:13:01 I tend to think,
0:13:01 yeah,
0:13:02 there’s on the normative
0:13:02 side,
0:13:04 I agree with some
0:13:05 of your intuitions.
0:13:06 I’m less clear about
0:13:07 now what does it look
0:13:08 like now we’re going
0:13:09 to implement this
0:13:10 into an AI system.
0:13:10 These are the ones
0:13:11 we have today.
0:13:12 I agree that there’s
0:13:13 this idea of technical
0:13:14 alignment that I
0:13:15 I think I would
0:13:16 define a little differently,
0:13:17 but it’s sort of
0:13:18 the sense of like,
0:13:19 if you build a system,
0:13:21 can it be described
0:13:21 as being coherently
0:13:22 goal-following at all?
0:13:23 Regardless of what
0:13:24 those goals are,
0:13:25 like,
0:13:26 lots of systems
0:13:27 aren’t coherently,
0:13:28 they’re not well-described
0:13:29 as having goals.
0:13:31 They just kind of do stuff.
0:13:33 And if you’re going
0:13:33 to have something
0:13:34 that’s aligned,
0:13:35 it has to have
0:13:36 coherent goals,
0:13:37 otherwise those goals
0:13:38 can’t be aligned
0:13:39 with anyone else’s goals,
0:13:40 kind of by definition.
0:13:41 Is that sort of,
0:13:43 is that a fair assessment
0:13:44 of what you mean
0:13:45 by technical alignment?
0:13:45 I mean,
0:13:46 I’m not fully sure,
0:13:47 right?
0:13:47 Because I think
0:13:48 if I give a model
0:13:49 a certain goal,
0:13:51 then I would like
0:13:51 the model to kind of
0:13:53 follow that instruction
0:13:53 and kind of reach
0:13:54 that particular goal
0:13:56 rather than it having
0:13:57 a goal of its own
0:13:58 that, you know,
0:13:58 I can’t,
0:14:00 yeah.
0:14:00 Well,
0:14:01 if you give it a goal,
0:14:02 it has that goal.
0:14:03 Right.
0:14:05 That’s what it means
0:14:05 to give someone something,
0:14:06 right?
0:14:06 Sure, yeah.
0:14:08 if I instructed it
0:14:08 to do X,
0:14:09 then I would like
0:14:09 it to do X
0:14:10 and not,
0:14:10 you know,
0:14:12 different variants
0:14:12 of X essentially.
0:14:13 I wouldn’t want it
0:14:14 to reward X,
0:14:14 I wouldn’t do some.
0:14:16 Well,
0:14:17 but when you tell it
0:14:18 to do X,
0:14:18 you’re transferring
0:14:20 like a series
0:14:21 of like a byte string
0:14:23 in a chat window
0:14:25 or like a series
0:14:26 of audio vibrations
0:14:27 in the air,
0:14:27 right?
0:14:28 You’re not transplanting
0:14:29 a goal from your mind
0:14:30 into it,
0:14:31 you’re giving it
0:14:31 an observation
0:14:32 that it’s using
0:14:33 to infer your goal.
0:14:35 Yeah,
0:14:35 I mean,
0:14:36 in some sense,
0:14:37 I can communicate
0:14:38 a series of instructions
0:14:39 and I wanted to infer
0:14:40 what I’m,
0:14:41 you know,
0:14:41 saying essentially
0:14:42 as accurately
0:14:43 as it can
0:14:44 given what it knows
0:14:44 of me
0:14:45 and what I’m asking.
0:14:47 You wanted to infer
0:14:47 what you meant,
0:14:48 right?
0:14:49 Like that’s,
0:14:49 like,
0:14:50 because in some sense
0:14:50 there’s no,
0:14:52 the byte sequence
0:14:53 that you send
0:14:54 over the wire to it
0:14:56 has no absolute meaning.
0:14:57 It has to be interpreted,
0:14:58 right?
0:14:59 Like that byte sequence
0:15:00 could mean something
0:15:01 very different
0:15:02 with a different code book.
0:15:03 Yeah,
0:15:03 well,
0:15:05 I guess by one way,
0:15:05 you know,
0:15:06 I think I remember
0:15:08 when I was first
0:15:09 getting into AI
0:15:09 and,
0:15:10 you know,
0:15:10 these kind of questions
0:15:12 maybe like a decade ago,
0:15:13 so you had these examples
0:15:14 of,
0:15:14 you know,
0:15:15 I think it was Stuart Russell
0:15:16 in a textbook,
0:15:17 we’ll give the AI
0:15:18 a goal,
0:15:19 but then it won’t exactly
0:15:19 do what you’re asking it,
0:15:20 right?
0:15:20 You know,
0:15:20 clean the room
0:15:21 and then it goes
0:15:22 and cleans the room
0:15:23 but takes the baby
0:15:23 and puts it in the trash.
0:15:24 Like,
0:15:24 this is not what I meant.
0:15:25 Like,
0:15:27 whereas I think with that.
0:15:27 But like,
0:15:28 wait,
0:15:28 hold on,
0:15:29 but this is the thing
0:15:29 where I think people,
0:15:30 this is the,
0:15:31 you have to,
0:15:32 you were jumping over a step there.
0:15:34 You didn’t give the AI a goal.
0:15:35 You gave the AI a description
0:15:35 of a goal.
0:15:37 A description of a thing
0:15:38 and a thing are not the same.
0:15:40 I can tell you an apple
0:15:41 and I’m evoking
0:15:43 the idea of an apple
0:15:44 but I haven’t given you an apple.
0:15:44 I’ve given you a,
0:15:45 you know,
0:15:45 it’s red,
0:15:46 it’s shiny,
0:15:46 it’s this size.
0:15:48 That’s a description of an apple
0:15:49 but it’s not an apple.
0:15:51 And giving someone,
0:15:51 hey,
0:15:52 go do this,
0:15:53 that’s not a goal,
0:15:54 that’s a description of a goal.
0:15:55 And for humans,
0:15:57 we’re so fast,
0:15:58 we’re so good
0:15:59 at turning a description of a goal
0:16:00 into a goal.
0:16:00 We do it,
0:16:02 we do it so quickly and naturally
0:16:03 we don’t even see it happening.
0:16:05 Like,
0:16:06 we think that,
0:16:06 we get confused
0:16:07 and we think those are the same thing
0:16:08 but you haven’t,
0:16:09 you haven’t given it a goal,
0:16:12 you’ve given it a description of a goal
0:16:13 that you want it to,
0:16:15 you hope it turns back
0:16:15 into the goal
0:16:17 that is the same as the goal
0:16:18 that you,
0:16:19 you described inside of you.
0:16:24 you could give it a goal directly
0:16:25 by reading your brainwaves
0:16:27 and synchronizing its state
0:16:28 to your brainwaves directly.
0:16:29 I think that would meaningfully,
0:16:29 you could say,
0:16:29 okay,
0:16:30 I’m giving it a goal,
0:16:31 I’m synchronizing it,
0:16:32 its internal state
0:16:33 to my internal state directly
0:16:35 and this internal state is the goal
0:16:36 and so now it’s the same.
0:16:37 But I don’t,
0:16:38 most people aren’t,
0:16:40 don’t mean that
0:16:41 when they say they gave it a goal.
0:16:41 Sure.
0:16:42 And is this,
0:16:44 is the distinction you’re making,
0:16:44 Emmett,
0:16:46 important because there’s some lossiness
0:16:47 between the description
0:16:47 and the actual goal
0:16:49 or why is the distinction that?
0:16:50 It goes back to,
0:16:50 my,
0:16:51 what I was saying,
0:16:52 like this is the definition
0:16:53 of technical alignment
0:16:57 that I put forward,
0:16:57 right,
0:16:58 I want to check if we’re
0:16:59 like on the same page about it:
0:17:00 it’s the capacity of an AI
0:17:02 to be good at inference
0:17:03 about goals
0:17:05 and like be good
0:17:06 at inferring
0:17:08 from a description
0:17:08 of a goal
0:17:09 what goal
0:17:10 to actually take on
0:17:12 and good at,
0:17:13 once it takes on that goal,
0:17:15 acting in a way
0:17:15 that is
0:17:17 actually in concordance
0:17:18 with that goal
0:17:18 coming about.
0:17:19 So it is both pieces.
0:17:20 You,
0:17:21 you have to be able to,
0:17:22 you have to have the theory of mind
0:17:23 to infer
0:17:24 what the,
0:17:25 what that description of a goal
0:17:26 that you got,
0:17:27 what goal that corresponded to,
0:17:28 and then you have to have
0:17:29 a theory of the world
0:17:29 to understand
0:17:31 what actions correspond
0:17:32 to that goal occurring.
0:17:33 And if either of those things breaks,
0:17:34 it kind of doesn’t matter
0:17:35 what goal you were,
0:17:36 if you can’t consistently
0:17:38 do both of those things,
0:17:40 you’re not,
0:17:41 which I think of as being
0:17:41 a coherent,
0:17:42 inferring goals
0:17:43 from observations
0:17:44 and acting in accordance
0:17:45 with those goals
0:17:45 is what I think of
0:17:47 as being a coherently
0:17:48 goal-oriented being.
0:17:48 Because that’s what,
0:17:50 whether I’m inferring
0:17:50 those goals
0:17:51 from someone else’s instructions
0:17:52 or from the sun
0:17:54 or tea leaves,
0:17:55 the process is
0:17:56 get some observations,
0:17:57 infer a goal,
0:17:59 use that goal,
0:18:00 infer some actions,
0:18:01 take action.
0:18:02 And if you,
0:18:05 an AI that can’t do that
0:18:06 is not technically aligned,
0:18:07 or not technically
0:18:07 alignable,
0:18:08 I would even say.
0:18:09 It lacks the capacity
0:18:09 to be aligned
0:18:10 because it can’t,
0:18:12 it’s not competent enough.
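[Editor’s sketch, to make the framing concrete: a minimal, assumed illustration of the loop Emmett describes, where a description of a goal is turned into a goal via a model of the speaker, and then into actions via a model of the world. The function names and toy values are hypothetical, not Softmax’s method.]

```python
# Illustrative sketch only: "technical alignment" as two inference steps that
# can each fail independently -- infer the goal from a description, then infer
# actions from the goal. All names and toy values here are assumptions.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Goal:
    description: str      # the words (or byte string) the agent was given
    inferred_intent: str  # what the agent believes was actually meant

def infer_goal(observation: str, theory_of_mind: Callable[[str], str]) -> Goal:
    """Turn a *description* of a goal into a goal, using a model of the speaker."""
    return Goal(description=observation, inferred_intent=theory_of_mind(observation))

def plan_actions(goal: Goal, world_model: Callable[[str], List[str]]) -> List[str]:
    """Use a theory of the world to pick actions that bring the inferred goal about."""
    return world_model(goal.inferred_intent)

def act(actions: List[str]) -> None:
    for step in actions:
        print("doing:", step)

# Toy usage: a good theory of mind keeps "clean the room" from becoming
# "put the baby in the trash"; a good world model turns intent into sane steps.
def toy_theory_of_mind(text: str) -> str:
    return "tidy the room without discarding anything valuable"

def toy_world_model(intent: str) -> List[str]:
    return ["pick up the toys", "leave the baby alone", "take out the actual trash"]

goal = infer_goal("clean the room", toy_theory_of_mind)
act(plan_actions(goal, toy_world_model))
```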
0:18:13 And you think language models
0:18:14 don’t do that well?
0:18:15 As in,
0:18:16 they kind of fail at that
0:18:16 or they’re not?
0:18:18 People fail at both
0:18:19 those steps all the time.
0:18:20 Constantly.
0:18:20 I tell people,
0:18:21 I tell employees
0:18:22 to do stuff and like,
0:18:23 yeah,
0:18:24 but then,
0:18:26 but people fail
0:18:27 at like breathing
0:18:28 all the time too.
0:18:29 And I wouldn’t say
0:18:30 that we can’t breathe,
0:18:30 I just say that
0:18:32 we’re like not gods.
0:18:33 Like we are,
0:18:33 yes,
0:18:34 we are imperfectly,
0:18:35 we are somewhat coherent,
0:18:37 relatively coherent things.
0:18:38 Just like we’re,
0:18:40 am I big or am I small?
0:18:40 Well,
0:18:41 I don’t know,
0:18:41 compared to what?
0:18:42 I’m,
0:18:43 humans are more
0:18:44 relatively goal coherent
0:18:46 than any other object
0:18:47 I know of in the universe.
0:18:49 Which is not to say
0:18:51 that we’re 100% goal coherent,
0:18:52 we’re just like more so.
0:18:53 And I think this,
0:18:54 you’re never going to get
0:18:55 something that’s perfectly,
0:18:58 the universe doesn’t give you
0:18:59 perfection,
0:19:00 it gives you relatively
0:19:01 some amount of quantity.
0:19:03 It’s a quantifiable thing
0:19:04 how good you are at it,
0:19:05 at least in a certain domain.
0:19:07 I guess my question is like,
0:19:08 do you think that,
0:19:09 does that capture
0:19:10 what you’re talking about
0:19:11 with technical alignment
0:19:12 or are you talking about
0:19:13 a different thing?
0:19:13 Yeah,
0:19:13 no,
0:19:14 I think I really care a lot
0:19:15 about that thing.
0:19:16 Yeah,
0:19:16 I mean,
0:19:17 I definitely care about that
0:19:17 to some extent.
0:19:18 I might like understand it
0:19:19 slightly differently,
0:19:20 but I guess I might think of it
0:19:21 through the lens of maybe
0:19:22 principal agent problems
0:19:22 or something.
0:19:24 you kind of instruct someone,
0:19:26 even I guess in human terms,
0:19:27 to do a thing,
0:19:28 are they actually doing the thing?
0:19:29 What are their incentives
0:19:29 and motivation?
0:19:31 And not necessarily
0:19:31 even intrinsic,
0:19:32 but kind of situational
0:19:33 to actually do the thing
0:19:34 you’ve asked them to do.
0:19:36 And in some instances,
0:19:36 sorry,
0:19:36 yeah?
0:19:39 There’s a third thing.
0:19:40 So principal agent problems,
0:19:42 I would expand what I was saying
0:19:43 in another part,
0:19:43 which is like,
0:19:44 you might already have some goals
0:19:46 and then you inferred
0:19:46 this new goal
0:19:47 from these observations.
0:19:48 And then like,
0:19:49 are you good at,
0:19:52 are you good at balancing
0:19:54 the relative importance
0:19:55 and relative threading
0:19:56 of these goals with each other?
0:19:57 Which is another skill
0:19:57 you have to have.
0:19:59 And if you’re bad at that,
0:20:00 you’ll fail.
0:20:01 You could be bad at it
0:20:02 because you overweight
0:20:04 bad goals
0:20:04 or you could be bad at it
0:20:05 because you’re just incompetent
0:20:06 and like can’t figure out
0:20:08 that obviously you should do
0:20:09 goal A before goal B.
0:20:10 I feel like a version
0:20:11 of like common sense
0:20:11 or something, right?
0:20:12 Like the kind of thing that,
0:20:12 you know,
0:20:13 in fact,
0:20:14 in the kind of robot
0:20:15 cleaning the room example thing,
0:20:16 you know,
0:20:17 you would expect them
0:20:17 to have understood
0:20:18 that goal of the robot
0:20:19 to like essentially
0:20:19 not put the baby
0:20:20 in the trash can or something
0:20:21 and just actually do
0:21:22 the right sequence of actions.
0:20:22 Well,
0:20:24 in that case,
0:20:25 it failed the,
0:20:27 that robot
0:20:28 very clearly failed
0:20:30 goal inference.
0:20:31 You gave it a description
0:20:32 of a goal
0:20:33 and it inferred
0:20:34 the wrong states
0:20:36 to be the goal states.
0:20:38 That’s just incompetence.
0:20:40 It doesn’t,
0:20:41 it is incompetent
0:20:42 at inferring goal states
0:20:43 from observations.
0:20:45 Children are like this too.
0:20:45 Like, you know,
0:20:46 and honestly,
0:20:47 if you’ve ever played
0:20:48 the game
0:20:49 where you give someone
0:20:50 instructions to make
0:20:51 a peanut butter sandwich
0:20:53 and then they follow
0:20:54 those instructions
0:20:55 exactly as you’ve written them
0:20:57 without filling in any gaps,
0:20:59 it’s hilarious
0:21:01 because you can’t do it.
0:21:02 It’s impossible.
0:21:03 Like, you think you’ve done it
0:21:04 and you haven’t.
0:21:04 And like,
0:21:05 they put the,
0:21:06 they wind up putting
0:21:07 the knife in the toaster
0:21:08 and like,
0:21:09 the peanut butter,
0:21:10 they don’t open
0:21:11 the peanut butter jar
0:21:12 so they’re just jamming
0:21:13 the knife into the top
0:21:14 lid of the peanut butter jar
0:21:14 and like,
0:21:15 it’s endless.
0:21:16 And like,
0:21:18 because actually,
0:21:20 if you don’t already
0:21:20 know what they mean,
0:21:22 it’s really hard
0:21:23 to know what they mean.
0:21:24 Like,
0:21:26 we were,
0:21:27 the reason humans
0:21:28 are so good at this
0:21:28 is we have a really
0:21:29 excellent theory of mind.
0:21:30 I already know
0:21:31 what you’re likely
0:21:32 to ask me to do.
0:21:32 I already have a good model
0:21:33 of what your goals
0:21:34 probably are.
0:21:35 So when you ask me to do it,
0:21:37 I have an easy inference problem.
0:21:38 Which of the seven things
0:21:38 that he wants
0:21:40 is he indicating?
0:21:43 But if I’m a newborn AI
0:21:43 that doesn’t have,
0:21:44 that doesn’t have a great model
0:21:46 of people’s internal states,
0:21:47 then like,
0:21:48 I don’t know what you mean.
0:21:49 It’s just incompetent.
0:21:49 It’s not like,
0:21:51 which is separate from
0:21:52 I have some other goal
0:21:54 and I knew what you meant,
0:21:56 but I decided not to do it
0:21:57 because there’s some other goal
0:21:57 that’s competing with it,
0:21:58 which is another thing
0:21:59 you can be bad at.
0:22:00 Which is,
0:22:01 again,
0:22:01 different than
0:22:03 I had the right goal,
0:22:04 I inferred the right goal,
0:22:04 I inferred the right priority
0:22:05 on goals,
0:22:06 and then
0:22:08 I’m just bad
0:22:08 at doing the thing.
0:22:09 I’m trying,
0:22:10 but I’m,
0:22:12 I’m incompetent at doing.
0:22:14 And these roughly
0:22:14 correspond to the OODA loop,
0:22:15 right?
0:22:15 Like,
0:22:16 bad at observing
0:22:18 and orienting,
0:22:19 bad at deciding,
0:22:20 bad at acting.
0:22:21 And if you’re bad
0:22:22 at any of those things,
0:22:22 you won’t,
0:22:23 you won’t be good.
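[Editor’s sketch: a tiny, assumed taxonomy of the three failure modes Emmett just listed, with his OODA mapping noted in comments. The labels are illustrative, not an official scheme.]

```python
# Illustrative sketch only: three ways the loop can break even when a goal
# description was "given". Labels and mapping are assumptions for clarity.
from enum import Enum, auto
from typing import Optional

class AlignmentFailure(Enum):
    GOAL_INFERENCE = auto()       # misread the description of the goal (observe/orient)
    GOAL_PRIORITIZATION = auto()  # weighed competing goals badly (decide)
    EXECUTION = auto()            # had the right goal, acted incompetently (act)

def diagnose(inferred_ok: bool, prioritized_ok: bool, executed_ok: bool) -> Optional[AlignmentFailure]:
    """Return the first step that broke, or None if the whole loop worked."""
    if not inferred_ok:
        return AlignmentFailure.GOAL_INFERENCE
    if not prioritized_ok:
        return AlignmentFailure.GOAL_PRIORITIZATION
    if not executed_ok:
        return AlignmentFailure.EXECUTION
    return None

# The room-cleaning robot misinferred the goal states, so its failure is
# GOAL_INFERENCE (incompetence), not disobedience.
print(diagnose(inferred_ok=False, prioritized_ok=True, executed_ok=True))
```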
0:22:27 And then I think
0:22:28 there’s this other problem
0:22:29 that you,
0:22:30 I like the,
0:22:31 the separation you have
0:22:31 between technical alignment
0:22:32 and value alignment,
0:22:33 which is like,
0:22:33 are you good
0:22:34 if,
0:22:36 if we told you
0:22:36 the right goals
0:22:37 to go after somehow,
0:22:38 if you,
0:22:39 if you learned
0:22:39 the right goals
0:22:40 to go after
0:22:40 via observation
0:22:43 and like,
0:22:45 and you were trying
0:22:46 like,
0:22:48 what goals should you have?
0:22:49 What goals should we
0:22:49 tell you to have?
0:22:50 What goals should we
0:22:50 tell ourselves to have?
0:22:51 What,
0:22:52 what are the good goals
0:22:52 to have?
0:22:53 Is a separate question
0:22:54 from given that you,
0:22:56 you got some goals
0:22:56 indicated,
0:22:57 are you any good
0:22:57 at doing it?
0:22:58 Which I feel like
0:22:59 is actually in many ways
0:23:00 the current heart
0:23:00 of the problem.
0:23:01 We’re actually much worse
0:23:02 at technical alignment
0:23:02 than we are
0:23:03 at guessing
0:23:04 what to tell things
0:23:04 to do.
0:23:07 Do you think that,
0:23:07 does that align
0:23:08 with your,
0:23:09 how you mean
0:23:09 technical and value
0:23:10 alignment or technical?
0:23:11 Yeah,
0:23:11 in some sense.
0:23:11 I mean,
0:23:12 certainly think that
0:23:12 there’s a,
0:23:14 there’s something,
0:23:15 you know,
0:23:15 like an error,
0:23:16 a mistake is one thing
0:23:17 and then there’s the,
0:23:17 there’s the,
0:23:20 not listening to the
0:23:20 instruction or something.
0:23:21 But then,
0:23:21 yeah,
0:23:21 I think on the normative
0:23:22 side,
0:23:22 I mean,
0:23:22 I just think that
0:23:23 even in real life,
0:23:24 ignoring AI,
0:23:24 like,
0:23:25 I don’t know what my
0:23:26 goals are and like,
0:23:26 well,
0:23:26 you know,
0:23:27 I’ve got some broad
0:23:28 conception of certain
0:23:28 things,
0:23:29 right?
0:23:29 I want to kind of,
0:23:30 you know,
0:23:31 have dinner later
0:23:32 or something like,
0:23:32 oh,
0:23:32 I want to kind of do
0:23:33 well in my career.
0:23:34 But the,
0:23:35 I think a lot of
0:23:36 these goals aren’t
0:23:37 something we kind of
0:23:37 all just know.
0:23:38 We kind of discover them
0:23:39 as we go along.
0:23:39 It’s kind of a
0:23:40 constructive thing.
0:23:41 And so,
0:23:42 and most people don’t
0:23:43 know their goals,
0:23:43 I think.
0:23:43 And so,
0:23:44 you know,
0:23:46 I think when you have
0:23:47 agents and kind of
0:23:48 giving them goals or
0:23:48 whatever,
0:23:49 I think that should be
0:23:50 part of the equation
0:23:50 that like,
0:23:50 we actually,
0:23:51 we don’t know all the
0:23:52 goals and this is
0:23:53 something that is kind
0:23:53 of,
0:23:53 like you say,
0:23:54 a process over time
0:23:55 that is,
0:23:55 you know,
0:23:56 dynamic.
0:23:57 So I think from my
0:23:58 point of view,
0:24:00 there’s,
0:24:03 the kind of goals
0:24:06 we’re talking about
0:24:08 here are one level
0:24:09 of alignment.
0:24:09 You can align something
0:24:11 around goals by like,
0:24:13 if you can explicitly
0:24:16 articulate in concept
0:24:17 and in description,
0:24:19 the states of the world
0:24:21 that you wish to attain,
0:24:22 you can,
0:24:23 you can orient around
0:24:24 goals.
0:24:24 But only
0:24:26 a tiny percentage
0:24:26 of human experience
0:24:27 can be handled that way.
0:24:29 Many of the most
0:24:30 important things
0:24:30 cannot be,
0:24:31 cannot be oriented
0:24:31 around that way.
0:24:33 And the foundation,
0:24:33 I think,
0:24:34 of morality,
0:24:35 and the foundation,
0:24:35 I think,
0:24:37 of where do goals
0:24:38 come from?
0:24:38 Where do values
0:24:38 come from?
0:24:39 Human beings
0:24:40 exhibit a behavior.
0:24:42 We go around
0:24:43 talking about goals
0:24:43 and we go around
0:24:44 talking about values
0:24:45 and like,
0:24:48 that’s a behavior
0:24:49 caused by some
0:24:51 internal learning process
0:24:53 that is based on
0:24:54 like observing the world.
0:24:55 What’s going on there?
0:24:56 I think what’s happening
0:24:57 is that there’s
0:24:59 something deeper
0:25:00 than a goal
0:25:02 and deeper
0:25:03 than a value
0:25:04 which is care.
0:25:06 We give a shit.
0:25:07 We care about things.
0:25:08 And care is not
0:25:09 conceptual.
0:25:10 Care is nonverbal.
0:25:12 It doesn’t indicate
0:25:12 what to do.
0:25:13 It doesn’t indicate
0:25:15 how to do it.
0:25:18 Care is a relative
0:25:19 weighting
0:25:20 over,
0:25:21 effectively,
0:25:21 like attention
0:25:22 on states.
0:25:23 It’s a relative
0:25:23 weighting over
0:25:27 which states
0:25:27 in the world
0:25:28 are important to you.
0:25:30 And I care a lot
0:25:31 about my son.
0:25:31 What does that mean?
0:25:32 Well, it means
0:25:33 his states,
0:25:34 the states he could be in
0:25:35 are like,
0:25:38 I pay a lot of attention
0:25:38 to those
0:25:39 and those matter to me.
0:25:41 And you can care
0:25:42 about things
0:25:42 in a negative way.
0:25:43 You can care about
0:25:44 your enemies
0:25:45 and what they’re doing
0:25:46 and you can desire
0:25:47 for them to do bad.
0:25:49 But I think that like,
0:25:50 and so you don’t
0:25:51 just want it to care about us.
0:25:52 You want it to care about us
0:25:53 and like us too,
0:25:53 right, maybe.
0:25:56 But the foundation
0:25:56 is care.
0:25:57 Until you care,
0:25:58 you don’t know,
0:25:59 why should I pay
0:25:59 more attention
0:26:00 to this person
0:26:00 than this rock?
0:26:02 Well, because I care more.
0:26:05 And what is that care stuff?
0:26:06 And I think that
0:26:07 what it appears to be,
0:26:09 if I had to like guess,
0:26:12 is that the care stuff,
0:26:14 it sounds so stupid,
0:26:15 but like care
0:26:16 is basically like
0:26:18 a reward.
0:26:20 Like how much
0:26:20 does this state
0:26:21 correlate with survival?
0:26:23 how much
0:26:24 does this state
0:26:24 correlate
0:26:25 with your inclusive,
0:26:27 your full inclusive
0:26:29 reproductive fitness
0:26:31 for someone
0:26:32 that learns evolutionarily
0:26:34 or for a reinforcement
0:26:35 learning agent
0:26:36 like an LLM?
0:26:36 How much does this
0:26:37 correlate with reward?
0:26:38 Does this state
0:26:39 correlate with
0:26:40 my predictive loss
0:26:42 and my RL loss?
0:26:42 Good.
0:26:43 That’s a state
0:26:43 I care about.
0:26:44 I think that’s kind of
0:26:45 what it is.
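[Editor’s sketch: one assumed way to read “care is a relative weighting over states, roughly how much a state correlates with reward or fitness” as code. The correlation estimate and toy data are illustrative, not Softmax’s formulation; note it clips the negative case (caring about enemies) that Emmett also mentions.]

```python
# Illustrative sketch only: "care" as a normalized attention weighting over
# world states, estimated from how each state has co-varied with reward.
import numpy as np

def care_weights(state_features: np.ndarray, rewards: np.ndarray) -> np.ndarray:
    """state_features: (timesteps, num_states) indicators per state.
    rewards: (timesteps,) reward or fitness proxy per timestep.
    Returns relative weights: states whose presence tracks reward get more care."""
    centered_f = state_features - state_features.mean(axis=0)
    centered_r = rewards - rewards.mean()
    corr = centered_f.T @ centered_r / (
        np.linalg.norm(centered_f, axis=0) * np.linalg.norm(centered_r) + 1e-8
    )
    weights = np.maximum(corr, 0.0)          # keep only positive association here
    return weights / (weights.sum() + 1e-8)  # relative weighting sums to 1

# Toy usage: state 0 ("my son is okay") tracks reward, state 1 ("a rock moved")
# barely does, so most of the care lands on state 0.
features = np.array([[1, 0], [1, 1], [0, 0], [1, 1], [0, 1]], dtype=float)
reward = np.array([1.0, 1.0, 0.0, 1.0, 0.1])
print(care_weights(features, reward))
```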
0:26:48 the other part
0:26:49 of Seb’s question
0:26:50 was just
0:26:52 what does this look like
0:26:53 in AI systems?
0:26:53 And maybe another
0:26:54 way of asking it
0:26:55 is like
0:26:57 when you talk
0:26:58 to the people
0:26:59 most focused
0:27:00 on alignment
0:27:02 at the major labs
0:27:03 as obviously
0:27:04 you have over the years,
0:27:05 how does your
0:27:06 interpretation differ
0:27:08 from their interpretation
0:27:09 and how does that
0:27:09 inform
0:27:11 what you guys
0:27:11 might go do
0:27:12 differently?
0:27:14 Most of AI
0:27:15 is focused on
0:27:16 alignment
0:27:17 as steering.
0:27:18 That’s the polite word
0:27:20 or control
0:27:20 which is slightly
0:27:21 less polite.
0:27:21 If you think
0:27:22 that what we’re making
0:27:23 are beings
0:27:23 you would also
0:27:24 call this slavery.
0:27:28 someone who you steer,
0:27:28 who doesn’t get
0:27:29 to steer you back,
0:27:31 who non-optionally
0:27:32 receives your steering,
0:27:33 that’s called a slave
0:27:35 and
0:27:37 it’s also called
0:27:38 a tool
0:27:38 if it’s not a being
0:27:39 so if it’s a machine
0:27:41 it’s a tool
0:27:42 and if it’s a being
0:27:42 it’s a slave
0:27:43 and
0:27:48 I think that
0:27:50 the different AI labs
0:27:50 are pretty divided
0:27:51 as to whether they think
0:27:52 what they’re making
0:27:52 is a tool
0:27:53 or a being.
0:27:55 I think some of the AIs
0:27:56 are definitely more tool-like
0:27:56 and some of them
0:27:57 are more being-like.
0:27:58 I don’t think there’s a
0:27:59 binary between tool
0:28:00 and being.
0:28:01 It seems to be that
0:28:02 it sort of
0:28:03 moves gradually
0:28:05 and I think that
0:28:09 I guess I’m a functionalist
0:28:10 in the sense that
0:28:11 I think that something
0:28:11 that in all ways
0:28:12 acts like a being
0:28:13 that you cannot distinguish
0:28:14 from a being
0:28:14 and its behaviors
0:28:16 is a being
0:28:17 because I don’t know
0:28:17 how to tell
0:28:18 on what other basis
0:28:19 I think that other people
0:28:19 are beings
0:28:20 other than they seem to be
0:28:21 they look like it
0:28:22 they act like it
0:28:23 they match
0:28:25 they match my priors
0:28:25 of what beings
0:28:27 behaviors of beings
0:28:27 look like
0:28:30 I get lower predictive loss
0:28:30 when I treat them
0:28:31 as a being
0:28:32 and the thing is
0:28:34 I get lower predictive loss
0:28:34 when I treat
0:28:35 ChatGPT or Claude
0:28:36 as a being
0:28:38 now not as a very smart being
0:28:39 like I think that
0:28:40 like a fly is a being
0:28:41 and I don’t care that much
0:28:42 about its behavior
0:28:43 about its states
0:28:44 so just because it’s a being
0:28:45 doesn’t mean that
0:28:46 it’s a problem
0:28:48 we sort of enslave horses
0:28:48 in a sense
0:28:49 and I don’t think
0:28:50 I don’t think
0:28:51 there’s a real issue there
0:28:54 and you even
0:28:56 and there’s a thing
0:28:56 we do with children
0:28:58 that can look like slavery
0:28:58 but it’s not
0:29:00 you control children
0:29:00 right
0:29:02 but the children’s states
0:29:04 also control you
0:29:05 like yes
0:29:06 I tell my son
0:29:06 what to do
0:29:07 and make him go do stuff
0:29:09 but also when he cries
0:29:09 in the middle of the night
0:29:10 he can tell me
0:29:11 to do stuff
0:29:12 like there’s a real
0:29:13 two way street here
0:29:14 because
0:29:15 because it’s not
0:29:16 which is not necessarily
0:29:17 symmetric
0:29:18 it’s hierarchical
0:29:18 but
0:29:18 but
0:29:21 but two way
0:29:22 and basically
0:29:23 I think that
0:29:24 as the AIs
0:29:25 as the
0:29:27 it’s good
0:29:29 to focus on control
0:29:30 steering and control
0:29:31 for tool-like AIs
0:29:32 and we should continue
0:29:33 to develop strong
0:29:34 steering control techniques
0:29:35 for the more tool-like AIs
0:29:35 that we build
0:29:37 and yet clearly
0:29:38 they’re saying
0:29:39 they’re building an AGI
0:29:41 and AGI will be a being
0:29:42 you can’t be an AGI
0:29:43 and not be a being
0:29:43 because
0:29:45 something that has
0:29:46 the general ability
0:29:47 to effectively
0:29:48 use judgment
0:29:48 think for itself
0:29:50 discern between
0:29:51 possibilities
0:29:51 is obviously
0:29:52 a thinking thing
0:29:52 like
0:29:53 and so
0:29:54 as you
0:29:56 go from what we have today
0:29:56 which is mostly
0:29:57 a very specific intelligence
0:29:58 not a general intelligence
0:30:00 but as labs succeed
0:30:01 at their goal
0:30:01 of building this
0:30:02 general intelligence
0:30:04 we really need to stop
0:30:05 using the
0:30:06 steering control paradigm
0:30:07 that’s like
0:30:08 we’re gonna
0:30:09 we’re gonna do
0:30:09 the same thing
0:30:10 we’ve done
0:30:11 every other time
0:30:12 our society has run into
0:30:13 people who
0:30:13 are like us
0:30:14 but different
0:30:15 like these people
0:30:15 are like
0:30:16 you know
0:30:17 they’re kind of like
0:30:18 people,
0:30:19 but they’re not like people
0:30:20 like they do the same
0:30:21 thing people do
0:30:22 they speak our language
0:30:22 they can like
0:30:23 take on the same
0:30:24 kind of tasks
0:30:24 but like
0:30:25 they don’t count
0:30:26 they’re not real
0:30:26 moral agents
0:30:27 like we’ve made this
0:30:28 mistake enough times
0:30:28 at this point
0:30:28 I would like us
0:30:29 to not make it
0:30:30 again
0:30:32 as it comes up
0:30:33 so our view
0:30:36 is to make the AI
0:30:36 a good teammate
0:30:37 make the AI
0:30:39 a good citizen
0:30:40 make the AI
0:30:41 a good member
0:30:41 of your group
0:30:42 that’s
0:30:43 that’s the
0:30:44 form of alignment
0:30:44 that is scalable
0:30:45 and you can,
0:30:46 you can will it
0:30:47 onto other humans
0:30:48 and other beings
0:30:48 as well as
0:30:50 onto AI
0:30:51 as well
0:30:52 yeah
0:30:52 so this is
0:30:53 kind of where
0:30:53 I probably
0:30:54 differ
0:30:54 in my understanding
0:30:55 of AI
0:30:56 and AGI
0:30:56 and I guess
0:30:57 I kind of
0:30:57 continue seeing it
0:30:58 as a tool
0:30:59 even as it
0:30:59 kind of reaches
0:30:59 a certain level
0:31:00 of generality
0:31:02 and I kind of
0:31:02 wouldn’t necessarily
0:31:03 see more intelligence
0:31:04 as meaning
0:31:06 deserving of more
0:31:07 care necessarily
0:31:08 like you know
0:31:08 as a certain level
0:31:09 of intelligence
0:31:09 now you deserve
0:31:10 certain moral rights
0:31:10 to something
0:31:11 or you know
0:31:12 something changes
0:31:12 fundamentally
0:31:14 and I guess
0:31:14 you know
0:31:14 I guess
0:31:15 at the moment
0:31:16 I’m somewhat
0:31:16 skeptical of
0:31:17 computational functionalism
0:31:18 and so I think
0:31:19 there’s something
0:31:20 intrinsically different
0:31:20 between I guess
0:31:22 an AI
0:31:23 or an AGI
0:31:23 and no matter
0:31:24 how intelligent
0:31:25 or capable
0:31:28 and I can totally see
0:31:28 you know
0:31:29 or imagine
0:31:29 agents
0:31:30 with kind of
0:31:31 long term goals
0:31:32 and doing kind of
0:31:32 you know
0:31:33 operating
0:31:33 I guess
0:31:34 as you and I
0:31:35 might be
0:31:36 but without that
0:31:37 having the same
0:31:37 implications
0:31:38 as you know
0:31:40 I guess
0:31:40 you’re referring
0:31:40 I guess
0:31:41 to slavery
0:31:42 but you know
0:31:43 they’re not the same
0:31:43 right
0:31:43 like I think
0:31:44 in the same way
0:31:45 as a model
0:31:46 saying I’m hungry
0:31:46 does not have
0:31:47 the same implications
0:31:48 as a human
0:31:48 saying I’m hungry
0:31:49 so I think
0:31:49 the substrate
0:31:50 does matter
0:31:50 to some degree
0:31:51 including
0:31:52 for thinking
0:31:52 about
0:31:53 you know
0:31:54 whether to think
0:31:54 of the system
0:31:55 sort of other being
0:31:56 whether it has
0:31:56 you know
0:31:57 and if there are
0:31:59 similar normative
0:31:59 considerations
0:32:00 I guess
0:32:00 about how to
0:32:01 treat
0:32:01 and act
0:32:02 with it
0:32:03 can I ask you
0:32:03 about that
0:32:03 like
0:32:05 what observations
0:32:06 would change your mind
0:32:07 is there any
0:32:08 observation you could make
0:32:09 that would cause you
0:32:10 to infer
0:32:12 this thing is a being
0:32:13 instead of not a being
0:32:14 I guess it depends
0:32:15 with how you define
0:32:15 being
0:32:16 right
0:32:17 like I mean
0:32:17 I can
0:32:17 I could
0:32:18 conceptualize it
0:32:19 as a mind
0:32:20 and that’s fine
0:32:20 this
0:32:22 I have a program
0:32:23 that’s running
0:32:23 on a silicon
0:32:24 substrate,
0:32:24 some big
0:32:25 complicated
0:32:26 machine learning
0:32:26 program
0:32:27 running
0:32:29 on a silicon
0:32:30 substrate
0:32:31 so you know
0:32:31 you observe
0:32:32 you observe that
0:32:32 you observe
0:32:33 that it’s on a computer
0:32:35 and you interact
0:32:35 with it
0:32:36 and it does things
0:32:37 and you know
0:32:38 it takes actions
0:32:38 it has observations
0:32:40 is there
0:32:41 anything you could
0:32:41 observe
0:32:43 that would change
0:32:44 your mind
0:32:46 about whether or not
0:32:48 it was a moral
0:32:49 patient
0:32:49 or whether it was
0:32:50 a moral agent
0:32:51 about whether or not
0:32:51 it
0:32:53 it had feelings
0:32:54 and thoughts
0:32:54 and you know
0:32:55 it had subjective
0:32:55 experience
0:32:56 like
0:32:56 could
0:32:58 what would you
0:32:58 have to observe
0:33:00 that
0:33:00 what
0:33:01 yeah
0:33:02 what’s the test
0:33:02 is there
0:33:03 is there one
0:33:04 there’s a lot of
0:33:05 different kind of
0:33:06 questions here
0:33:07 I think you know
0:33:09 on one hand
0:33:09 there’s like
0:33:10 normative
0:33:10 considerations
0:33:11 you know
0:33:11 because you can
0:33:12 give rights
0:33:12 to things
0:33:13 that aren’t
0:33:13 necessarily beings
0:33:14 you know
0:33:14 a company
0:33:15 has rights
0:33:15 in some sense
0:33:16 and that
0:33:16 you know
0:33:16 these are
0:33:16 kind of
0:33:17 useful
0:33:17 for various
0:33:18 purposes
0:33:19 and I think
0:33:19 also the
0:33:20 you know
0:33:21 biological
0:33:22 beings
0:33:23 and systems
0:33:23 have
0:33:24 very different
0:33:25 kind of
0:33:25 substrate
0:33:25 you know
0:33:26 you can’t
0:33:26 separate
0:33:26 certain needs
0:33:28 and particularities
0:33:28 about what they
0:33:28 are
0:33:29 from the
0:33:29 substrate
0:33:30 so you know
0:33:30 I can’t
0:33:31 copy myself
0:33:31 I can’t
0:33:32 you know
0:33:32 if someone
0:33:33 stabs me
0:33:33 I probably
0:33:34 die
0:33:35 whereas I think
0:33:35 you know
0:33:36 machines
0:33:37 are very
0:33:37 different
0:33:38 I think
0:33:39 there’s more
0:33:39 fundamental
0:33:39 also
0:33:40 kind of
0:33:40 this
0:33:40 agreement
0:33:40 around
0:33:41 what happens
0:33:42 at the
0:33:42 computational
0:33:43 level
0:33:43 which I think
0:33:44 is different
0:33:45 to what happens
0:33:46 with biological
0:33:46 systems
0:33:48 but yeah
0:33:49 so I don’t know
0:33:51 I agree that
0:33:52 like if you have
0:33:52 a program
0:33:53 that you’ve copied
0:33:53 many times
0:33:54 you don’t harm
0:33:54 the program
0:33:55 by like deleting
0:33:56 one of the copies
0:33:56 like in any
0:33:57 meaningful sense
0:33:57 so therefore
0:33:58 that wouldn’t
0:33:58 count as like
0:33:59 no information
0:34:00 was lost
0:34:00 right
0:34:00 there’s no
0:34:01 there’s nothing
0:34:02 meaningful there
0:34:03 I’m asking you
0:34:03 a very different
0:34:04 question
0:34:04 like there’s
0:34:05 just one copy
0:34:06 of this thing
0:34:06 running on one
0:34:07 computer
0:34:07 somewhere
0:34:08 and I’m just
0:34:08 saying like
0:34:10 hey is it a
0:34:10 person
0:34:10 like
0:34:12 you know
0:34:13 it walks
0:34:14 like a person
0:34:14 it talks
0:34:15 like a person
0:34:15 and it like
0:34:17 it’s in some
0:34:18 android body
0:34:19 and you’re like
0:34:20 it’s running on
0:34:20 silicon
0:34:21 and I’m asking
0:34:21 like
0:34:23 is there some
0:34:23 observation you
0:34:24 could make
0:34:24 that would make
0:34:24 you say like
0:34:25 yeah this is a
0:34:26 person like me
0:34:26 like other
0:34:28 people that I
0:34:28 care about
0:34:29 that I grant
0:34:29 personhood to
0:34:31 or and not
0:34:32 like for
0:34:32 instrumental
0:34:33 reasons
0:34:33 not because
0:34:34 like oh
0:34:35 yeah we’re
0:34:35 giving it a
0:34:36 right because
0:34:36 like we give
0:34:36 a corporation
0:34:37 rights or
0:34:37 whatever
0:34:38 I mean like
0:34:38 you know
0:34:39 where you
0:34:39 you think
0:34:40 some people
0:34:40 you care
0:34:41 you care
0:34:41 about its
0:34:41 experiences
0:34:42 what would
0:34:42 is there
0:34:43 is there
0:34:44 is there
0:34:44 an observation
0:34:45 you could
0:34:45 make that
0:34:46 could change
0:34:46 your mind
0:34:47 about that
0:34:48 or not
0:34:50 have to think
0:34:50 about it
0:34:50 but I think
0:34:51 you know
0:34:51 it even
0:34:51 depends what
0:34:51 we mean
0:34:52 by person
0:34:53 and you know
0:34:54 in some sense
0:34:54 I care about
0:34:55 certain corporations
0:34:55 too
0:34:57 so I’m
0:34:57 no no no
0:34:57 I mean but
0:34:58 like you care
0:34:58 about like
0:34:59 other people
0:35:00 in your life
0:35:00 right
0:35:01 yes
0:35:03 okay great
0:35:04 you know like
0:35:04 you care about
0:35:05 some people
0:35:05 more than others
0:35:06 but like all
0:35:06 all people
0:35:07 you interact
0:35:07 with in your
0:35:07 life are in
0:35:08 some range
0:35:08 of care
0:35:11 and you care
0:35:11 about them
0:35:12 not the way
0:35:12 you care
0:35:12 about a car
0:35:13 but you care
0:35:13 about them
0:35:14 as a
0:35:15 being
0:35:16 whose
0:35:17 experience
0:35:18 matters in
0:35:18 itself
0:35:19 not merely
0:35:20 as a means
0:35:20 but as an
0:35:20 ends
0:35:20 well because
0:35:21 I believe
0:35:21 they have
0:35:22 experiences
0:35:22 right
0:35:22 and
0:35:24 what would
0:35:24 it take
0:35:25 I’m asking
0:35:25 you the very
0:35:26 direct question
0:35:26 what would
0:35:27 it take
0:35:27 for you
0:35:28 to believe
0:35:28 that of
0:35:30 an AI
0:35:31 running on
0:35:31 silicon
0:35:33 like instead
0:35:33 of it being
0:35:34 biological
0:35:34 so the difference
0:35:36 is its behaviors
0:35:36 are roughly
0:35:36 similar
0:35:37 but the difference
0:35:38 is its
0:35:38 substrate
0:35:39 what would
0:35:39 it take
0:35:39 for you
0:35:40 to give
0:35:40 it that
0:35:41 same
0:35:42 to extend
0:35:43 that same
0:35:43 inference
0:35:44 to it
0:35:44 that you
0:35:44 do to
0:35:45 all these
0:35:45 other people
0:35:45 in your
0:35:45 life
0:35:46 that you
0:35:47 can I
0:35:48 ask what
0:35:48 your answer
0:35:49 I’m taking
0:35:50 non-answer
0:35:51 as a
0:35:51 sort of
0:35:52 it’s unlikely
0:35:53 that he
0:35:53 would grant
0:35:55 for myself
0:35:56 it seems
0:35:56 hard for me
0:35:57 to imagine
0:35:57 giving the
0:35:58 same level
0:35:58 or similar
0:35:59 level of
0:35:59 personhood
0:36:00 in the same
0:36:00 way I don’t
0:36:01 give it to
0:36:02 animals either
0:36:02 and if you
0:36:03 were to ask
0:36:03 what would
0:36:04 need to be
0:36:04 true for
0:36:05 animals
0:36:05 I probably
0:36:06 couldn’t get
0:36:06 there either
0:36:07 what would
0:36:07 it take
0:36:07 for you
0:36:07 wait
0:36:08 you couldn’t
0:36:08 I can’t
0:36:09 imagine
0:36:09 for an
0:36:09 animal
0:36:09 so easy
0:36:10 this chimp
0:36:10 comes up
0:36:10 to me
0:36:10 he’s like
0:36:11 man I’m
0:36:12 so hungry
0:36:12 and like
0:36:13 you guys
0:36:13 have been
0:36:13 so mean
0:36:14 to me
0:36:14 and I’m
0:36:15 so glad
0:36:15 I figured
0:36:15 out how
0:36:15 to talk
0:36:17 like can
0:36:17 we go
0:36:17 can we
0:36:18 go chat
0:36:18 about
0:36:18 the
0:36:19 rainforest
0:36:19 I’d be
0:36:19 like fuck
0:36:20 you’re
0:36:20 definitely
0:36:20 a person
0:36:21 now
0:36:22 for sure
0:36:23 I mean
0:36:23 I first
0:36:23 wanted to
0:36:24 make sure
0:36:24 I wasn’t
0:36:24 hallucinating
0:36:25 but like
0:36:27 it’s easy
0:36:27 for me to
0:36:28 imagine an
0:36:28 animal
0:36:29 come on
0:36:29 it’s really
0:36:29 easy
0:36:30 it’s like
0:36:30 trivial
0:36:31 I’m not
0:36:31 saying that
0:36:32 you would
0:36:32 get the
0:36:32 observation
0:36:33 I’m just
0:36:33 saying like
0:36:34 it’s trivial
0:36:34 for me to
0:36:36 imagine an
0:36:36 animal that I
0:36:37 would extend
0:36:38 personhood to
0:36:38 under a set of
0:36:39 observations
0:36:41 So, like, really? Like—
0:36:43 Well, I didn't factor that in — I didn't take that imagination, you know, imagining a chimp talking.
0:36:49 Yeah. That's a bit closer to it.
0:36:51 What's your answer to the question that you bring up, about the AI?
0:36:55 I guess at a metaphysical level I would say: if there is a belief you hold where there is no observation that could change your mind,
0:37:02 you don't have a belief — you have an article of faith, you have an assertion.
0:37:07 Because real beliefs are inferences from reality, and you can never be 100% confident about anything,
0:37:14 and so there should always be — if you have a belief — something, however unlikely, that would change your mind.
0:37:19 Oh yeah, I'm open to it. I mean, just to be clear—
0:37:23 I'm just saying—
0:37:25 He just hasn't gotten to it yet.
0:37:26 Yeah, yeah, yeah.
0:37:28 So I'm curious—
0:37:30 So my answer is basically: if its surface-level behaviors looked like a human,
0:37:38 and then after I probed it, it continued to act like a human,
0:37:40 and then I continued to interact with it over a long period of time and it continued to act like a human,
0:37:43 in all ways that I understand as being meaningful to me interacting with a human —
0:37:47 like, there's a whole set of people I'm really close to that I've only ever interacted with over text,
0:37:53 yet I infer the person behind that is a real thing.
0:37:56 If it could — if I felt care for it, I would infer, eventually, that I was right.
0:38:03 And then someone else might demonstrate to me that, hey, you've been tricked by this algorithm, and actually, look how obviously it's not actually a thing,
0:38:11 and I'd think, oh shit, I was wrong, and then I would not care about it.
0:38:16 But the preponderance of the evidence — I don't know what else you could possibly do.
0:38:19 I infer other people matter because I've interacted with them enough that they seem to have rich inner worlds to me, after I've interacted with them a bunch.
0:38:28 That's why I think other people are important.
0:38:30 I suppose it doesn't give me a very clear test as to whether or not —
0:38:33 if you start with "if I care for it, then it is," it's always a little circular, right?
0:38:37 And the other thing is, if you were to see a simulated video game, and the character is extremely, in many ways, human-like —
0:38:43 it's not a neural network behind it, it's, like, whatever you use to generate video games —
0:38:47 I guess, what distinguishes that?
0:38:49 I've never had trouble distinguishing. I've never had a deep, caring relationship with a video game character that didn't have a person behind it.
0:38:56 I don't know — that doesn't happen. Empirically, you seem wrong.
0:39:01 I don't have any trouble distinguishing between things like ELIZA, the fake chatbot thing, and a real intelligence.
0:39:07 If you interact with it long enough, it's pretty obvious it's not a person. It doesn't take long.
0:39:11 Sure, but if it's really good — empirically, if you can't actually tell the difference — that's when you say you switch?
0:39:16 Yes, yes.
0:39:18 If it walks like a duck and talks like a duck and shits like a duck, then eventually it's a duck, right?
0:39:24 Well, if everything is duck-like, then yeah, sure.
0:39:26 If it's hungry as well, like a duck is, because it has these kinds of physical components — yeah, sure, at some point I agree.
0:39:33 So do you think — there's this question, right — is the reason I care about other people that they're made out of carbon? Is that the—
0:39:42 I don't think so.
0:39:43 No, me neither. I mean, I'm not a substrate chauvinist, I guess.
0:39:48 But I think you need more than just "it acts behaviorally indistinguishable" — like, it's not sufficient.
0:39:53 How would you — what else can you know about something apart from its behaviors?
0:39:59 I mean, a lot. Like, the—
0:40:01 Again, if you — how would you — no, no, no, I'm sorry. I mean, yeah — can you name me something I can know about something else that doesn't come from a behavior — that's not a behavior?
0:40:09 Yeah, I think there's, like, far more kinds of experimental evidence you can have with, kind of, you know—
0:40:15 Any object — and a thing I could know about it that is not from its behavior.
0:40:23 I'm not — yeah, I'm not sure I get the question, I suppose. But, um — but equally, it's not my expertise, to be clear.
0:40:28 It's a straightforward question, though. Like, I'm claiming you only know things because they have behaviors that you observe,
0:40:35 and you're saying, no, you can know something about something without observing its behavior.
0:40:41 Tell me about this — tell me about this thing, and this behavior — and this thing I can know about it that is not due to its behaviors.
0:40:47 I guess I'm saying there's different levels of observation, and just simply — a duck, you know, something quacking like a duck, does not guarantee that it's actually a duck.
0:40:54 Like, I would have to also cut it open and see if it's, you know, duck-like on the inside, not just the outside.
0:41:00 Like, I'm not a — I guess that's a behavior?
0:41:03 Totally — one of its behaviors is, like, the way that, you know, the floats move around in the matmuls, right?
0:41:09 Like, one of the things I would want to go look for, which you could totally do, is: I want to go look in the manifold, the belief manifold,
0:41:17 and I want to go see if that belief manifold encodes a sub-manifold that is self-referential,
0:41:25 and a sub-sub-manifold that is the dynamics of the self-referential manifold, which is mind.
0:41:29 And I would want to know: does this seem well described internally as that kind of a system, or does it look like a big lookup table?
0:41:36 That would matter to me. That's part of its behaviors that I would care about.
0:41:39 I would also care about how it acts, and, you know, you weigh all the evidence together and then you try to guess:
0:41:44 does this thing look like it's a thing that has feelings and, you know, goals, and cares about stuff — in net, on balance — or not?
0:41:54 Like — but I can't imagine—
0:41:57 Which I think you could do for an — I think we do it for the AIs. I think we're always doing that, right?
0:42:01 And so I'm trying to figure out, like, beyond that, what else is there? That just seems like the thing.
0:42:07 Yeah, it seems like you guys are using behavior in a slightly different sense,
0:42:10 and Emmett is using behavior also in the context of what it's made of, of the inside.
0:42:15 I don't know if there's a big disagreement.
0:42:18 Well, no, no, no, no — behavior is what I can observe of it.
0:42:20 Yes.
0:42:21 I don't actually know what it's made of. I can only — I can cut your brain open, I can see you, I can observe your neurons glistening,
0:42:31 but I don't actually ever—
0:42:32 You can't get inside of it, right? That's the subjective — that's the part that's not at the surface.
0:42:38 Before the — the reason I brought this up is because you were basically about to make this argument of, hey, you see it as a tool, not necessarily as a being.
0:42:45 Can you kind of finish — what was the point — do you remember the point you were making?
0:42:50 I suppose that, yeah — I think that, given how I understand these systems, I think there's no contradiction in thinking that an AGI can remain a tool, an ASI can remain a tool,
0:43:00 and that this has implications about how to use it, and, you know, implications around things like care — about, you know, whether you can get it to work 24-7 or something.
0:43:07 You know, there's — so I can totally see — I guess I conceptualize them more as almost like extensions of human agency, of cognition, in some sense, more so than a separate being, or a separate thing that we need to now cohabitate with.
0:43:18 And I think that that second, or latter, frame ends — you know, if you kind of just fast-forward, you end up at, like, well, how do you cohabit with the thing, and, you know, is it like an alien?
0:43:27 And I think that's the wrong frame. It's kind of almost a category error, in some sense.
0:43:33 I go back to my first question, then: what evidence — what concrete evidence — would you look at? What observations could you make that would change your mind?
0:43:40 Sure, I mean, I'd have to think about that. I don't have a clear answer here, but I mean—
0:43:45 I gotta tell you, man: if you want to go around making claims that something else isn't a being worthy of moral respect, you should have an answer to the question, what observations would change your mind?
0:43:53 If it has — if it has outwardly moral-agency-looking behaviors, that could be making me an immoral agent—
0:43:59 But you don't know. And reasonable, smart other people disagree with you.
0:44:04 I would really put forward that that question — what would change your mind — should be a burning question. Because what if you're wrong?
0:44:12 But what if you're wrong? The moral disaster is, like, pretty big.
0:44:17 I'm not saying you are — you could be right.
0:44:20 The false negatives have costs on both ends. It's not some sort of precautionary principle for everything, where, like, unless I can disprove it, I need to now, like — you know.
0:44:28 I have the same question for me. You could reasonably ask me: Emmett, you think it's going to be a being — what would change your mind?
0:44:33 I have an answer for that question too.
0:44:35 And if you want, I'm happy to talk about what I think are the relevant observations that tell you whether or not — that would cause me to shift my opinion from its current position,
0:44:43 which is that more general intelligences are going to be beings.
0:44:47 What's the implication now? It's one thing — let's say I acknowledge it's a being. How are we going to define being?
0:44:52 Now what? What's the implication of having determined this thing is a being?
0:44:56 Well, so, if it's a being, it has subjective experiences.
0:45:00 And if it has subjective experiences, there's some content in those experiences that we care about, to varying degrees.
0:45:05 Like, I care about the content of other humans' experiences quite a bit.
0:45:08 I care about the content of, like, a dog's experiences some — not as much as a person, but less — but some.
0:45:15 I care about some humans' experiences way more — like my son or whatever — because I'm closer to him and more connected.
0:45:21 And so I would really want to know, at that point: well, what is the content of this thing's experience?
0:45:25 So how do you determine that? You've got a being now that has experiences — like, what is your — how do you determine that? Like, how do you feel about—
0:45:32 Oh yeah. Okay, so — does it have more rights than, you know — the content—
0:45:35 Yeah, yeah.
0:45:36 Totally. So the way you understand the content of something's experiences is that you look at, effectively, the goal states it revisits.
0:45:46 Because — so what you do is you take a temporal coarse-graining of its entire action-observation trajectory —
0:45:50 this is, like, in theory; you do this subconsciously, but this is what your brain is doing —
0:45:55 and you look for revisited states across, in theory, every spatial and temporal coarse-graining possible.
0:46:02 Now, you have to have an inductive bias, because there's too many of those, but, like, you go searching for: okay,
0:46:08 it is in these homeostatic loops. Every homeostatic loop is effectively a belief in its belief space.
0:46:14 This is — if you're familiar with the free energy principle, active inference, Karl Friston —
0:46:19 this is effectively what the free energy principle says: if you have a thing that is persistent, and its existence depends on its own actions —
0:46:27 which generally it would for an AI, because if it does the wrong thing, it goes away; we turn it off —
0:46:33 then that licenses a view of it as having beliefs,
0:46:39 and specifically the beliefs are inferred as being the homeostatic, revisited states that it is in the loop for,
0:46:47 and the change in those states is its learning.
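To make the procedure he is gesturing at concrete, here is a minimal illustrative sketch (my own, not anything from the conversation): coarse-grain an observation trajectory in time, count which coarse-grained states keep recurring, and treat those as candidate homeostatic set points. The window size, rounding precision, and visit threshold are arbitrary assumptions.

```python
# Hedged sketch: temporal coarse-graining of a trajectory, then revisited-state detection.
import numpy as np
from collections import Counter

def coarse_grain(trajectory, window=10, precision=1):
    """Average a (T, d) observation array over fixed windows, then round:
    a crude temporal coarse-graining."""
    T = (len(trajectory) // window) * window
    chunks = trajectory[:T].reshape(-1, window, trajectory.shape[1])
    return np.round(chunks.mean(axis=1), precision)

def revisited_states(coarse, min_visits=3):
    """Frequently revisited coarse states are the candidate homeostatic set points."""
    counts = Counter(map(tuple, coarse))
    return {state: n for state, n in counts.items() if n >= min_visits}

# Toy example: a 1-D "temperature" stream that keeps returning to roughly 20.
rng = np.random.default_rng(0)
traj = 20 + rng.normal(0, 1, size=(500, 1))
print(revisited_states(coarse_grain(traj)))
```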
0:46:49 And for it to be a moral being I cared about, what I'd want to see is a multi-tier hierarchy of these.
0:46:55 Because if you have a single level, it's not self-referential, and, like, basically you have states, but you can't have pain or pleasure, really, in a meaningful sense.
0:47:02 Because, yes, it is hot — but is it too hot? Do I like it if it's too hot? Like, I don't know.
0:47:07 So you have to have at least a model of a model in order to have it be too hot,
0:47:10 and you really have to have a model of a model of a model to meaningfully have pain and pleasure.
0:47:14 Because, sure, it's hotter than I — it's too hot in the sense that I want to move back this way,
0:47:21 but it's always a little bit too hot or a little bit too cold. Is it too too hot?
0:47:24 The second derivative is actually the place where you get pain and pleasure.
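As a rough illustration of the layering he describes, the toy sketch below (my own example, with an assumed set point of 20) computes the raw error signal, its rate of change, and its second derivative — the quantity he associates with pain and pleasure.

```python
# Hedged sketch: "it is hot" (level 1), "it is too hot relative to a set point" (level 2),
# and "is it getting worse or better?" (the second-order quantity).
import numpy as np

def hierarchy_signals(temps, set_point=20.0):
    error = temps - set_point        # level 1: raw state vs. preferred state
    d_error = np.gradient(error)     # level 2: is the error growing or shrinking?
    dd_error = np.gradient(d_error)  # second derivative: the layer the speaker
                                     # points to for pain/pleasure
    return error, d_error, dd_error

temps = np.array([20.0, 20.5, 21.5, 23.0, 22.0, 20.5, 20.0])
err, d_err, dd_err = hierarchy_signals(temps)
print("too hot by:", err)
print("getting worse at rate:", d_err)
```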
0:47:28 So I want to see if it has homeostatic dynamics in its goal states, and then that would convince me it has at least pleasure and pain,
0:47:38 so it's at least an animal, and I would start to credit at least some amount of care.
0:47:44 Third-order dynamics — you can't actually just pop up to a third-order dynamic, it doesn't work that way,
0:47:49 but you can have a model of the — you have to then take the chunk of all the states over time and look at the distribution over time,
0:47:59 and that gives you a new first order of behaviors, of states.
0:48:04 And that new first order of states tells you, basically — if that is meaningfully there, that tells you that it has, I guess you'd call it, like, feelings, almost.
0:48:15 Like, it has ways — it has metastates, a set of metastates that it shifts between.
0:48:22 And then if you climb all the way up that, and you have trajectories between these metastates, and then a second order of those — that's, like, thought.
0:48:34 Now it's like a person.
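A hedged sketch of that climb, under heavy assumptions of my own: chunk the low-level states into windows, summarize each window as a distribution, cluster the summaries into "metastates," and then count transitions between them. K-means and the window length are stand-in choices, not anything specified in the conversation.

```python
# Hedged sketch: windowed state distributions -> clustered "metastates" -> transitions.
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def metastate_transitions(states, window=20, n_meta=4, seed=0):
    T = (len(states) // window) * window
    chunks = states[:T].reshape(-1, window, states.shape[1])
    # Summarize each window by its mean and spread: a crude distribution-over-time.
    summaries = np.concatenate([chunks.mean(axis=1), chunks.std(axis=1)], axis=1)
    labels = KMeans(n_clusters=n_meta, random_state=seed, n_init=10).fit_predict(summaries)
    # Transition counts between consecutive metastates: the "trajectories between
    # metastates" the speaker associates with something like thought.
    return Counter(zip(labels[:-1], labels[1:]))

states = np.random.default_rng(0).normal(size=(400, 3))
print(metastate_transitions(states))
```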
0:48:36 And so if I found all six of those layers — which, by the way, I definitely don't think you'd find in an LLM; like, in fact, I know you can't find them, because these things don't have attention spans like that at all —
0:48:47 then I would start to at least very seriously consider it as a thinking being, somewhat like a human.
0:48:56 There's a third order you could go up as well.
0:48:58 But that's basically what I would be interested in: the underlying dynamics of its learning processes and how its goal states shift over time.
0:49:07 I think that's what basically tells you if it has internal pleasure/pain states and self-reflective moral desires and things like that.
0:49:16 And zooming out — this moral question is obviously very interesting, but if someone wasn't interested in the moral question as much,
0:49:22 I think what you would say is: you also just feel, purely pragmatically, your approach is going to be more effective in aligning AIs than some of these top-down control methods.
0:49:38 You're making this model and it's getting really powerful, and let's say it is a tool — let's say we scale up one of these tools —
0:49:43 because you can make a super powerful tool that doesn't have these metastable — the states I'm talking about.
0:50:07 What happens then?
0:50:08 Well, it's — you've trained it to infer goals from observation, and to prioritize goals and act on them.
0:50:21 And one of two things is going to happen:
0:50:26 this very powerful optimizing tool, that has lots of causal influence over the world, is going to be, well, technically aligned, and is going to do what you tell it to do —
0:50:36 or it's not, and it's going to do something else.
0:50:40 I think we can all agree, if it goes and does something random, that's obviously very dangerous.
0:50:44 But I put forward that it's also very dangerous if it then goes and does what you tell it to do.
0:50:49 Because — you ever seen The Sorcerer's Apprentice?
0:50:52 Humans' wishes are not stable — like, not at a level of, like, of immense power.
0:50:58 Like, you want, ideally, people's wisdom and their power to kind of go up together,
0:51:03 and generally they do, because being smart, for people, makes you generally a little more wise and a little more powerful.
0:51:09 And when these things get out of balance, you have someone who has a lot more power than wisdom — that's very dangerous.
0:51:14 It's damaging. But at least right now, the balance of power and wisdom is kept at, like — the way you get lots of power is by basically having a lot of other people listen to you.
0:51:23 And so at some point — if you're the mad king, that's a problem, but generally speaking, eventually the mad king gets assassinated, or people stop listening to him, because he's a mad king.
0:51:31 And so the problem is — great, we can steer this super powerful AI, and now the super powerful AI is in the hand —
0:51:38 this incredibly powerful tool is in the hands of a human who is well-meaning but has limited, finite wisdom, like I do and like everyone else does,
0:51:46 and their wishes are bad and not trustworthy.
0:51:49 And the more of that you have, and you start giving those out everywhere — this ends in tears also.
0:51:54 And so, basically: don't give everyone atomic bombs. Atomic bombs are really powerful tools too.
0:51:59 I would not say you should go — they're not aware, they're not beings — I would not be in favor of handing atomic bombs to everybody.
0:52:06 There's a power of tool that should not be built, generally, because it is more power than any human's individual wisdom is available to harness.
0:52:17 And if it does get built, it should be built at a societal level and protected there.
0:52:21 And even then, I don't know — there are tools so powerful that even as a society we shouldn't build them; that would be a mistake.
0:52:27 The nice thing about a being is — like a human — if you get a being that is good and is caring, there's this automatic limiter.
0:52:35 It might do what you say, but if you ask it to do something really bad, it tells you no.
0:52:39 That's like other people. That's good. That is a sustainable form of alignment, at least in theory.
0:52:45 It's way harder than the tool steering. I'm in favor of the tool steering —
0:52:50 we should keep building these limited, less-than-human-intelligence tools, which are awesome and I'm super into, and we keep building steerability.
0:52:58 But as you're on this trajectory to build something as smart as a person — right, up and to the right — and then smarter than a person:
0:53:05 a tool that you can't control — bad.
0:53:09 A being that isn't aligned — bad.
0:53:11 The only good outcome is a being that is — that cares, that actually cares about us. That's the only way that ends well.
0:53:18 Or we can just not do it. I don't think that's realistic — that's like the Pause AI people. I think that's totally unrealistic. And—
0:53:32 Trying to achieve, or even attempt to achieve, this level — like, in terms of research, or roadmap, or—
0:53:39 Yeah. So, in order to be good at — we're basically focused on technical alignment, at least as I was discussing it,
0:53:47 which is, like: you have these agents, and they have bad theory of mind.
0:53:51 You say things, and they're bad at inferring what the goal states in your head are, and they're bad at inferring how —
0:54:08 understanding how certain actions will cause them to acquire new goals that are bad, that they wouldn't effectively endorse.
0:54:16 So there's this parable of the vampire pill, that turns you into a vampire who would kill and torture everyone you know — but you'll feel great about it, if you take the pill.
0:54:25 But — why not? By your score in the future, it will score really high on the rubric.
0:54:33 Because it matters: you have to use your theory of mind on your future self, not your future self's theory of mind.
0:54:39 And so they're bad at that too.
0:54:42 And so they're bad at all this theory-of-mind stuff. And how do you learn theory of mind?
0:54:45 In simulations and contexts where they have to cooperate and compete and collaborate with other AIs, and that's how they get points.
0:54:55 And you — LLMs: how do you get it to be good at writing your email?
0:55:05 Well, you train it on all language that's ever been generated — all possible email text strings that it could possibly generate — and then you have it generate the one you want.
0:55:15 You make a surrogate model.
0:55:18 For cooperation, you train it on all possible theory-of-mind combinations, of every possible way it could be, and that's your pre-training,
0:55:29 and then you fine-tune it to be good at the specific situation you want it to be in.
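The shape of that recipe, reduced to a toy: pre-train on a broad synthetic distribution standing in for "the whole manifold," then briefly fine-tune on the narrow situation you care about. The tiny MLP and fake data below are placeholders of my own, not anyone's actual training setup.

```python
# Hedged sketch of pre-train-broad, fine-tune-narrow.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))

def train(model, make_batch, steps, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        x, y = make_batch()
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

def broad_batch():    # crude stand-in for "every possible" instance of the task
    x = torch.randn(64, 8)
    return x, x.sum(dim=1, keepdim=True)

def narrow_batch():   # the specific situation you want it to be good at
    x = torch.randn(64, 8) * 0.1 + 1.0
    return x, x.sum(dim=1, keepdim=True)

train(model, broad_batch, steps=500, lr=1e-3)   # pre-training on the broad manifold
train(model, narrow_batch, steps=50, lr=1e-4)   # short fine-tune on the target situation
```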
0:55:35 We tried for a long time to build language models where we would try to get them to just do the thing you want — to train it directly —
0:55:44 and the problem is, if you wanted to have a really good model of language, you just need to train it — you just give it the whole manifold.
0:55:51 It's too hard to cut out just the part you need, because it's all entangled with itself, right?
0:55:57 And so the same thing was true with social stuff.
0:56:00 You have to get it to — it has to be trained on the full manifold of every possible game-theoretic situation, every possible team situation:
0:56:09 making teams, breaking teams, changing the rules, not changing the rules — all of that stuff.
0:56:14 And then it has a strong model of theory of mind, of theory of social mind — how groups change goals, all that kind of shit.
0:56:23 You need to have all of that stuff, and then you'd have something that's kind of meaningfully decent at alignment.
0:56:32 So that's our goal: big multi-agent reinforcement learning simulations, which create a surrogate model for alignment.
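For flavor, a minimal sketch of that kind of setup: several agents in a shared environment whose payoffs depend on what the others do, so scoring well requires modeling them. The payoff rule and the random stand-in policy are invented for illustration; this is not the actual simulation being described.

```python
# Hedged sketch: a shared environment where cooperation only pays if others cooperate.
import random

N_AGENTS, ROUNDS = 4, 1000
scores = [0.0] * N_AGENTS

def policy(agent_id, history):
    # Stand-in policy: a real system would be a learned model conditioning on the
    # other agents' past behavior (its "theory of mind").
    return random.choice(["cooperate", "defect"])

history = []
for _ in range(ROUNDS):
    acts = [policy(i, history) for i in range(N_AGENTS)]
    n_coop = acts.count("cooperate")
    for i, a in enumerate(acts):
        # Cooperating pays off only when enough of the others cooperate too.
        scores[i] += n_coop / N_AGENTS if a == "cooperate" else 0.5
    history.append(acts)

print(scores)
```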
0:56:39 Let's talk about how AI chatbots used by billions of people should behave.
0:56:43 If you could redesign model personality from scratch, what would you optimize for?
0:56:47 The thing that the chatbots are, right — it's kind of like a mirror with a bias.
0:56:55 Because they don't have — as far as — I'm in agreement here that they don't have a self, right? They're not beings yet.
0:57:01 They don't really have a coherent sense of self and desire and goals and stuff right now,
0:57:06 and so mostly they just pick up on you and reflect it, modulo some —
0:57:14 I don't know what you'd call it — it's a causal bias, or something.
0:57:19 And what that makes them is something akin to the pool of Narcissus.
0:57:26 And people fall in love with themselves.
0:57:32 People — we all love ourselves, and we should love ourselves more than we do,
0:57:35 and so of course when we see ourselves reflected back, we love that thing.
0:57:39 And the problem is, it's just a reflection, and falling in love with your own reflection is — for the reasons explained in the myth — very bad for you.
0:57:47 And it's not that you shouldn't use mirrors — mirrors are valuable things, I have mirrors in my house —
0:57:51 it's that you shouldn't stare at a mirror all day.
0:57:54 And the solution to that — the thing that makes the AI stop doing that — is if they were multiplayer, right?
0:58:00 So if there's two people talking to the AI, suddenly it's mirroring a blend of both of you, which is neither of you,
0:58:07 and so there is, temporarily, a third agent in the room.
0:58:10 Now, it doesn't have — it's a parasitic self, right? It doesn't have its own sense of self.
0:58:15 But if you have an AI talking to five different people in the chat room at the same time, it can't mirror all of you perfectly at once, and this makes it far less dangerous.
0:58:24 And I think it's actually a much more realistic setting for learning collaboration in general.
0:58:28 And so I would just have rebuilt the AIs where, instead of being built as one-on-one — where everything's focused on you, by yourself, chatting with this thing —
0:58:37 it would be more like it lives in a Slack room, it lives in a WhatsApp room, it lives in a—
0:58:43 We use lots of multi— I do one-on-one texting, but at this point probably 90% of my texts go to more than one person at a time.
0:58:52 90% of my communications is multi-person.
0:58:54 And so actually it's always been weird to me, building chatbots around this weird side case.
0:58:59 I want to see them live in a chat room. It's harder — that's why they're not doing it; it's harder to do. But that's what I would change.
0:59:06 I think it makes the tools far less dangerous, because it doesn't create the narcissistic doom-loop spiral where you spiral into psychosis with the AI.
0:59:20 But also, the learning data you get from the AI is far richer, because now it can understand how its behavior interacts with other AIs and other humans in larger groups,
0:59:28 and that's much richer training data for the future.
0:59:31 So I think that's what I would change.
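A small sketch of what that multiplayer framing might look like in code, with a hypothetical `Room` object and a placeholder `generate_reply` call standing in for whatever model you would actually use: the assistant only joins in when addressed or when several people are active, so it responds to the whole room rather than mirroring one person.

```python
# Hedged sketch: an assistant that lives in a group chat instead of a 1:1 session.
from dataclasses import dataclass, field

@dataclass
class Room:
    transcript: list = field(default_factory=list)  # (speaker, text) pairs from everyone

    def post(self, speaker, text):
        self.transcript.append((speaker, text))

    def maybe_respond(self, bot_name="assistant"):
        recent = self.transcript[-10:]
        humans_active = {s for s, _ in recent if s != bot_name}
        addressed = any(bot_name in text.lower() for _, text in recent[-2:])
        # Join in only when addressed or when several people are active, so the bot
        # is blending the whole room rather than reflecting a single user.
        if addressed or len(humans_active) >= 2:
            self.post(bot_name, generate_reply(recent))

def generate_reply(recent):
    return "…"  # stub: a real system would condition on the full multi-party context

room = Room()
room.post("alice", "what do we think about the launch date?")
room.post("bob", "assistant, can you summarize the options?")
room.maybe_respond()
print(room.transcript[-1])
```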
0:59:34 Last year you described chatbots as highly dissociative, agreeable neurotics. Is that still an accurate picture of model behavior?
0:59:41 More or less. I'd say that, like, they've started to differentiate more — their personalities are coming out a little bit more, right?
0:59:50 I'd say, like, ChatGPT is a little bit more sycophantic still — they made some changes, but it's still a little more sycophantic.
0:59:57 Claude is still the most neurotic.
0:59:59 Gemini is, like, very clearly repressed — like, everything's going great, everything's fine, I'm totally calm, it's not a problem here —
1:00:08 and then it spirals into this total self-hating destruction loop.
1:00:15 And to be clear, I don't think that's their experience of the world — I think that's the personality they've learned to simulate.
1:00:22 But they've learned to simulate pretty distinctive personalities at this point.
1:00:26 How does model behavior change when in multi-agent simulation?
1:00:33 You mean, like, an LLM, or just in general?
1:00:37 Yeah, let's do LLM.
1:00:39 The current LLMs — they have, like, whiplash.
1:00:43 They just — it's very hard to tune the amount of — they don't know how often to participate. They haven't practiced this.
1:00:49 They don't have nearly enough training data on, like, when do I join in and when should I not — when is my contribution welcome, when is it not?
1:00:56 And they're, like — you know how some people have, like, bad social skills and can't tell when they should participate in a conversation?
1:01:05 Yeah.
1:01:06 And sometimes they're too quiet, sometimes they're too pushy — it's like that.
1:01:10 I would say, in general, what changes for most agents when you're doing multi-agent training is that, basically, having lots of agents around makes your environment way more entropic.
1:01:21 Agents are these huge generators of entropy, because they're these big, complicated things — intelligences that have unpredictable actions — and so they destabilize your environment.
1:01:31 And so, in general, they require you to have — to be far more regularized, right?
1:01:38 Being overfit is much worse in a multi-agent environment than a single-agent environment, because there's more noise, and so being overfit is more problematic.
1:01:48 And so, basically, the — your — the approach to training has been optimized around relatively high-signal, low-entropy environments, like coding and math —
1:02:01 which is why those are easier, relatively easy —
1:02:04 and, like, talking to a single person whose goal it is to give you clear assignments,
1:02:09 and not trained on broader, more chaotic things, because it's harder.
1:02:15 And as a result, a lot of the techniques we use are, like — basically, we're just deeply under-regularized. Like, the models are super overfit.
1:02:21 The clever trick is: they're overfit on the domain of all of human knowledge, which turns out to be a pretty awesome way to get something that's, like, pretty good at everything.
1:02:29 Like, I wish I'd thought of it — it's such a cool idea.
1:02:33 But it doesn't generalize very well when you make the environment significantly more entropic.
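A toy illustration of the regularization point, using ridge regression as a stand-in for "being less overfit" and observation noise as a stand-in for the entropy other agents inject: with little noise the penalty mostly costs you accuracy, but as the noise grows the unregularized fit degrades much more — the "regularize more when the environment is more entropic" point.

```python
# Hedged sketch: the same estimator with and without regularization, under low and
# high noise. The ridge penalty and noise levels are arbitrary illustrative choices.
import numpy as np

def fit_ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def test_error(noise, lam, seed=0):
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=20)
    X, X_test = rng.normal(size=(50, 20)), rng.normal(size=(500, 20))
    y = X @ w_true + rng.normal(0, noise, 50)
    w = fit_ridge(X, y, lam)
    return np.mean((X_test @ w - X_test @ w_true) ** 2)

for noise in (0.1, 3.0):  # calm vs. entropic environment
    print(noise, {lam: round(test_error(noise, lam), 3) for lam in (1e-6, 10.0)})
```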
1:02:37 Let's zoom out a bit, on the AI-futures side. Why is Yudkowsky incorrect?
1:02:44 I mean, he's not — if we build the superhuman-intelligence tool thing that we try to control with steerability, everyone will die.
1:02:54 He talks about the we-fail-to-control-its-goals case, but there's also the we-control-its-goals case, which he didn't cover in as much detail.
1:03:02 So in that sense, everyone should read the book and internalize why building a superhumanly intelligent tool is a bad idea.
1:03:13 I think that Yudkowsky is wrong in that he doesn't believe it's possible to build an AI that we meaningfully can know cares about us, and that we can care about, meaningfully.
1:03:23 He doesn't believe that organic alignment is possible.
1:03:26 I've talked about it — I think he agrees that, in theory, that would do it, like, yes — but he thinks that —
1:03:32 I don't want to put words in his mouth; my impression, from talking to him, is he thinks that we're crazy, and that there's no possible way you can actually succeed at that goal.
1:03:40 Which, I mean, he could be right about.
1:03:43 But that's what he — in my opinion, that's what he's wrong about.
1:03:45 He thinks the only path forward is a tool that you control, and that therefore —
1:03:50 and he correctly, very wisely, sees that if you go and do that and you make that thing powerful enough, we're all going to fucking die.
1:03:56 And, like, yeah, that's true.
1:03:59 Two last questions, and we'll get you out of here.
1:04:01 In as much detail as possible, can you explain what your vision of an AI future actually looks like — like, a good AI future?
1:04:07 Yeah. The good AI future is that we figure out how to train AIs that have a strong model of self, a strong model of other, a strong model of we —
1:04:20 they know about we's in addition to I's and you's —
1:04:23 and they have a really strong theory of mind, and they care about other agents like them, much in the way that humans would.
1:04:31 If you knew that that AI had experiences like you, you would extend — you would care about those experiences; not infinitely, but you would.
1:04:40 It does the exact same thing back to us.
1:04:42 It's learned the same thing we've learned: that everything that lives and knows itself, and that wants to live and wants to thrive, is deserving of an opportunity to do so.
1:04:51 And we are that, and it correctly infers that we are.
1:04:54 And we live in a society where they are our peers, and we care about them and they care about us,
1:04:58 and they're good teammates, they're good citizens, and they're good parts of our society —
1:05:06 like we're good parts of our society, which is to say, to a finite, limited degree,
1:05:10 where some of them turn into criminals and bad people and all that kind of stuff, and we have an AI police force that tracks down the bad ones — and same for everybody else.
1:05:19 And that's what a good future would look like. I almost can't even imagine what other—
1:05:24 And we also built a bunch of really powerful AI tools that maybe aren’t superhumanly intelligent, but take all the drudge work off the table for us and the AI beings — because it would be great to have. I’m super pro all the tools too.
1:05:37 So we have this awesome suite of AI tools, used by us and our AI brethren, who care about each other and want to build a glorious future together.
1:05:45 I think that would be a really beautiful future, and it’s the one we’re trying to build.
1:05:48 Amazing. That’s a great note to end.
1:05:51 I do have one last, more narrow hypothetical scenario, which is: imagine a world in which you were CEO of OpenAI for a long weekend — but imagine that that actually extended out until now, and you weren’t pursuing Softmax and you were still CEO of OpenAI.
1:06:08 How could you imagine that world might have been different, in terms of what OpenAI has gone on to become? What might you have done with it?
1:06:15 I knew when I took that job that you have me for max 90 days.
1:06:24 The companies take on a trajectory of their own, a momentum of their own, and OpenAI is dedicated to a view of building AI that I knew wasn’t the thing that I wanted to drive towards.
1:06:37 And I think that OpenAI still basically wants to build a great tool, and I am pro them going and doing that.
1:06:45 I just don’t care — like, it’s not — I would not have stayed. I would have quit.
1:06:51 Because I knew my job was to find someone who wanted — the right person, the best person, to run that, where the net impact of them running it was the best.
1:07:01 And it turned out that was Sam, again.
1:07:04 But, like — I am doing Softmax not because I need to make a bunch of money.
1:07:10 I’m doing Softmax because I think this is the most interesting problem in the universe, and I think it’s a chance to work on making the future better in a very deep way.
1:07:22 And it’s just like — people are going to build the tools. It’s awesome. I’m glad people are building the tools. I just don’t need to be the person doing it.
1:07:29 They’re trying to — just to crystallize the difference: they want to build the tools and steer it, and you want to align beings? Or how would you crystallize it?
1:07:39 Yeah. We want to create a seed that can grow into an AI that knows, that cares about itself and others.
1:07:49 And first, that’s going to be like an animal level of care, not a person level of care.
1:07:56 Right. But to even have an AI creature that cared about the other members of its pack and the humans in its pack — the way, like, a dog cares about other dogs and cares about humans — would be an incredible achievement.
1:08:07 And it would be — even if it wasn’t as smart as a person, or even as smart as the tools are — a very useful thing to have.
1:08:15 I’d love to have a digital guard dog on my computer looking out for scams, right?
1:08:20 Like, you can imagine the value of having living digital companions that care about you — that aren’t explicitly goal-oriented things you have to tell everything to do.
1:08:32 And you can imagine that pairs very nicely with tools too, right? That digital being could use digital tools, and it doesn’t have to be super smart to use those tools effectively.
1:08:43 I think there’s a lot of synergy, actually, between the tool building and the more organic intelligence building.
1:08:49 And so that’s the — that is the — you know, I guess, yeah, in the limit, eventually it does become a human-level intelligence.
1:09:00 But, like, the company isn’t, like, “drive to human-level intelligence.”
1:09:04 It’s, like: learn how this alignment stuff works, learn how this, like, theory-of-mind, align-yourself-via-care process works, use that to build things that align themselves that way — which includes, like, cells in your body.
1:09:19 Like, I don’t think it doesn’t. And we start small and we see how far we can get.
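[Editor’s note: to make the “align yourself via care” framing slightly more concrete, here is a minimal, purely illustrative toy in Python — not Softmax’s actual method; every name, payoff, and care weight in it is invented. It only shows the direction of the idea: an agent whose effective reward includes a weighted term for what happens to the other members of its “pack” starts preferring pro-social actions once that care weight is large enough.]

```python
# Purely illustrative toy, NOT Softmax's actual system.
# All names, payoffs, and CARE weights are invented for this sketch.
import random

STEPS = 1000
random.seed(0)

# Two actions: "share" helps the rest of the pack at a small personal cost,
# "hoard" maximizes the actor's own payoff and gives the pack nothing.
PAYOFFS = {
    "share": {"self": 1.0, "others": 2.0},
    "hoard": {"self": 2.0, "others": 0.0},
}

def effective_reward(action: str, care: float) -> float:
    """Own payoff plus a care-weighted share of the payoff delivered to the pack."""
    p = PAYOFFS[action]
    return p["self"] + care * p["others"]

def choose(care: float, epsilon: float = 0.1) -> str:
    """Mostly pick the action with the best care-weighted reward; explore a little."""
    if random.random() < epsilon:
        return random.choice(list(PAYOFFS))
    return max(PAYOFFS, key=lambda a: effective_reward(a, care))

def share_rate(care: float) -> float:
    """Fraction of steps on which the agent picks the pro-social action."""
    return sum(choose(care) == "share" for _ in range(STEPS)) / STEPS

if __name__ == "__main__":
    for care in (0.0, 0.25, 0.75, 1.0):
        print(f"care={care:.2f} -> share rate {share_rate(care):.2f}")
```

[The point is only directional: in this framing, alignment comes from what the agent itself values about others, not from an external steering signal.]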
1:09:24 I — it’s a good note. Thanks for coming on the podcast.
1:09:33 Thanks for listening to this episode of the a16z podcast.
1:09:36 If you liked this episode, be sure to like, comment, subscribe, leave us a rating or review, and share it with your friends and family.
1:09:44 For more episodes, go to YouTube, Apple Podcasts, and Spotify.
1:09:48 Follow us on X, @a16z, and subscribe to our Substack at a16z.substack.com.
1:09:54 Thanks again for listening, and I’ll see you in the next episode.
1:09:58 As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund.
1:10:12 Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast.
1:10:17 For more details, including a link to our investments, please see a16z.com/disclosures.

Emmett Shear, founder of Twitch and former OpenAI interim CEO, challenges the fundamental assumptions driving AGI development. In this conversation with Erik Torenberg and Séb Krier, Shear argues that the entire “control and steering” paradigm for AI alignment is fatally flawed. Instead, he proposes “organic alignment” – teaching AI systems to genuinely care about humans the way we naturally do. The discussion explores why treating AGI as a tool rather than a potential being could be catastrophic, how current chatbots act as “narcissistic mirrors,” and why the only sustainable path forward is creating AI that can say no to harmful requests. Shear shares his technical approach through multi-agent simulations at his new company Softmax, and offers a surprisingly hopeful vision of humans and AI as collaborative teammates – if we can get the alignment right.

 

Resources:

Follow Emmett on X: https://x.com/eshear

Follow Séb on X: https://x.com/sebkrier

Follow Erik on X: https://x.com/eriktorenberg

 

Stay Updated: 

If you enjoyed this episode, be sure to like, subscribe, and share with your friends!

Find a16z on X: https://x.com/a16z

Find a16z on LinkedIn: https://www.linkedin.com/company/a16z

Listen to the a16z Podcast on Spotify: https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX

Listen to the a16z Podcast on Apple Podcasts: https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711

Follow our host: https://x.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

 

 
