AI transcript
0:00:01 This is an iHeart podcast.
0:00:41 If I were going to pick one paper from the past decade that had the biggest impact on the world, I would choose one called Attention is All You Need, published in 2017.
0:00:45 That paper basically invented transformer models.
0:00:53 You’ve almost certainly used a transformer model if you have used ChatGPT or Gemini or Claude or DeepSeek.
0:00:57 In fact, the T in ChatGPT stands for transformer.
0:01:08 And transformer models have turned out to be wildly useful, not just at generating language, but also at everything from generating images to predicting what proteins will look like.
0:01:16 In fact, transformers are so ubiquitous and so powerful that it’s easy to forget that some guy just thought them up.
0:01:22 But in fact, some guy did just think up transformers, and I’m talking to him today on the show.
0:01:34 I’m Jacob Goldstein, and this is What’s Your Problem, the show where I talk to people who are trying to make technological progress.
0:01:36 My guest today is Jakob Uszkoreit.
0:01:41 And just to be clear, Jakob was one of several co-authors on that transformer paper.
0:01:46 And on top of that, lots of other researchers were working on related things at the same time.
0:01:48 So a lot of people were working on this.
0:01:52 But the key idea did seem to come from Jakob.
0:01:55 Today, Jakob is the CEO of Inceptive.
0:02:00 That’s a company that he co-founded to use AI to develop new kinds of medicine.
0:02:03 And the company is particularly focused on RNA.
0:02:08 We talked about his work at Inceptive in the second part of our conversation.
0:02:12 In the first part, we talked about his work on transformer models.
0:02:21 At the time he started working on the idea for transformers—this is around a decade ago now—there were a couple of big problems with existing language models.
0:02:23 For one thing, they were slow.
0:02:29 They were, in fact, so slow that they could not even keep up with all the new training data that was becoming available.
0:02:31 A second problem?
0:02:35 They struggled with what are called long-range dependencies.
0:02:41 Basically, in language, that’s relationships between words that are far apart from each other in a sentence.
0:02:50 So to start, I asked Jakob for an example we could use to discuss these problems, and also how he came up with his big idea for how to solve them.
0:02:54 So pick a sentence that’s going to be a good object lesson for us.
0:02:59 Okay, so we could have—the frog didn’t cross the road because it was too tired.
0:03:01 Okay, so we got our sentence.
0:03:02 Yep.
0:03:11 How would the sort of big, powerful, but slow-to-train algorithm in 2015 have processed that sentence?
0:03:16 So basically, it would have walked through that sentence word by word.
0:03:19 And so it would walk through the sentence left to right.
0:03:25 The frog did not cross the road because it was too tired.
0:03:29 Which is logical, which is how I would think a system would work.
0:03:31 It’s more or less how we read, right?
0:03:34 It’s how we read, but it’s not necessarily how we understand.
0:03:35 Uh-huh.
0:03:43 That is actually one of the integral insights, I would say, for how we then went about trying to speed this all up.
0:03:44 I love that.
0:03:45 I want you to say more about it.
0:03:48 When you say it’s not how we understand, what do you mean?
0:04:00 So, on one hand, right, linearity of time forces us to almost always feel that we’re communicating language in order and just linearly.
0:04:08 It actually turns out that that’s not really how we read, not even in terms of our saccades, in terms of our eye movements.
0:04:11 We actually do jump back and forth quite a bit while reading.
0:04:12 Uh-huh.
0:04:21 And if you look at conversations, you also have highly nonlinear elements where there’s repetition, there’s reference, there’s basically different flavors of interruption.
0:04:27 But sure, by and large, right, we would say we certainly write them left to right, right?
0:04:33 So, if you write a proper text, you don’t write it as you would read it, and you also don’t write it as you would talk about it.
0:04:36 You do write it in one linear order.
0:04:47 Now, as we read this and as we understand this, we actually form groups of words that then form meaning, right?
0:04:51 So, an example of that is, you know, adjective noun, right?
0:04:54 It’s—or say, in this case, an article noun.
0:04:56 It’s not a frog, it’s the frog, right?
0:05:00 We could have also said it’s the green frog or the lazy frog.
0:05:01 Right.
0:05:02 Language has a structure, right?
0:05:07 And there are—things can modify other things, and things can modify the modifiers.
0:05:08 Exactly, exactly.
0:05:16 But the interesting thing now is that structure, as a tree-structured, clean hierarchy, only tells you half the story.
0:05:24 There’s so many exceptions where statistical dependencies, where modification actually happens at a distance.
0:05:30 So, okay, so just to bring this back to your sample sentence, the frog didn’t cross the road because it was too tired.
0:05:34 That word it is actually quite far from the word frog.
0:05:39 And if you’re an AI going from left to right, you may well get confused there, right?
0:05:43 You may think it refers to road instead of to frog.
0:05:48 So this is one of the problems you were trying to solve.
0:05:57 And then the other one you were mentioning before, which is these models were just slow, because after each word, the model just recalculates what everything means.
0:05:58 And that just takes a long time.
0:06:00 They can’t go fast enough.
0:06:00 Exactly.
0:06:07 It takes a long time, and it doesn’t play to the strengths of the computers, of the accelerators that we’re using there.
0:06:13 And when you say accelerators, I know Google has their own chips, but basically we mean GPUs now, right?
0:06:14 We mean GPUs.
0:06:17 We mean the chips that NVIDIA sells.
0:06:19 What is the nature of those particular chips?
0:06:19 Exactly.
0:06:32 So the nature of those particular chips is that instead of doing a broad variety of complex computations in sequence, they are incredibly good:
0:06:37 they excel at performing many, many, many simple computations in parallel.
0:07:06 And so what this hierarchical or semi-hierarchical nature of language enables you to do is, instead of having, so to speak, one place where you read the current word, you could now imagine you actually look at everything at the same time, and you apply many simple operations at the same time to each position in your sentence.
0:07:08 Uh-huh. So this is the big idea.
0:07:12 I just want to pause here because this is it, right? This is the breakthrough happening.
0:07:12 Yes.
0:07:20 It’s basically, what if instead of reading the sentence one word at a time from left to right, we read the whole thing all at once?
0:07:28 All at once. And now the problem is, clearly something’s got to give, right? So there’s no free lunch in that sense.
0:07:33 You have to now simplify what you can do at every position when you do this all in parallel.
0:07:34 Uh-huh.
0:07:41 But you can now afford to do this a bunch of times after another and revise it over time or over these steps.
0:07:57 And so instead of walking through the sentence from beginning to end (an average sentence in prose has like 20 words or so), instead of walking those 20 positions, what you’re doing is you’re looking at every word at the same time, but in a simpler way.
0:08:12 But now you can do that maybe five or six times, revising your understanding. And that, it turns out, is faster, way faster on GPUs. And because of this hierarchical nature of language, it’s also better.
0:08:34 So you have this idea. And as I read the little note on the paper, it was in fact your idea. I know you were working with a team, but the paper credits you with the idea. So let’s take this idea, this basic idea of look at the whole input sentence all at once a few times and apply it to our frog sentence. Give me that frog sentence again.
0:08:37 The frog did not cross the road because it was too tired.
0:08:43 Good. Tired is good because that’s unambiguous. Hot could be either one. It could be the road or the frog, right?
0:08:48 Hot could be either one, exactly, yes. In fact, hot could actually be either one.
0:08:52 And non-referential, and non-referential because it was too hot outside.
0:08:56 Outside, it could be any of three things, the weather or the frog or the road.
0:08:56 Exactly.
0:09:08 I love that. Tired solves the problem. So your model, this new way of doing things, how does it parse that sentence? What does it do?
0:09:19 So basically, let’s look at the word it and look at it in every single step of these, you know, say a handful of times repeated operation.
0:09:28 Imagine you’re looking at this word it, that’s the one that you are now trying to understand better, and you now compare it to every other word in the sentence.
0:09:39 So you compare it to the, to frog, to did, not, cross, the, road, because, was, too, and tired.
0:09:57 And initially, in the first pass already, a very simple insight the model can fairly easily learn is that it could be strongly informed by frog, by road,
0:10:06 by nothing, but not so by to or by the, or maybe only to a certain extent by was.
0:10:15 But if you want to know more about what it denotes, then it could be, you know, it could be informed by all of these.
0:10:21 And just to be clear, that sort of understanding arises because it has trained in this way on lots of data.
0:10:30 It’s encountering a new sentence after reading lots of other sentences with lots of pronouns with different possible antecedents, yeah.
0:10:31 Exactly, exactly.
0:10:43 So now, the interesting thing is that which of the two it actually refers to doesn’t depend only on what those other two words are.
0:10:48 And this is why you need these subsequent steps because, so let’s start with the first step.
0:10:56 So what now happens is that, say the model identifies frog and road could have a lot to do with the word it.
0:11:03 So now you basically copy some information from both frog and road over to it.
0:11:12 And you don’t just copy it, you kind of transform it also on the way, but you refine your understanding of it.
0:11:13 And this is all learned.
0:11:17 This is not given by rules or, you know, in any way pre-specified.
0:11:21 Right, just by training on lots of libraries.
0:11:22 Just by training, this emerges, exactly.
0:11:28 And so that sort of the meaning of it after this first step is kind of influenced by both frog and road.
0:11:30 Yes, both frog and road.
0:11:35 Okay, so now we repeat this operation again.
0:11:41 And we now know that it is unsure, or the model basically now has this kind of superposition, right?
0:11:43 It could be road, it could be frog.
0:11:46 But now, in the next step, it also looks at tired.
0:11:54 And somehow the model has learned that when it means something inanimate, that tired is not the thing.
0:12:00 And so maybe in context of tired, it is more likely to refer to frog.
0:12:10 And now you know, well, it is more likely, or now maybe the model has figured out already, maybe it needs a bit more, a few more iterations,
0:12:15 that it is most likely to refer to frog because of the presence of tired.
0:12:17 So it has solved the problem.
0:12:18 But it has solved the problem.
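A minimal sketch of the mechanism being described, in Python with NumPy: every word is compared to every other word in parallel, each word’s representation becomes a weighted mix of the others, and the simple operation is repeated a handful of times. The embeddings and projections below are random stand-ins, not learned weights, so the attention pattern it prints is arbitrary; training on lots of text is what would make it concentrate on frog and tired.

```python
import numpy as np

# The frog sentence, one made-up vector per word. In a real transformer
# the embeddings, and the projections below, are learned from data.
words = ["the", "frog", "did", "not", "cross", "the", "road",
         "because", "it", "was", "too", "tired"]
rng = np.random.default_rng(0)
d = 16                                  # toy embedding size
x = rng.normal(size=(len(words), d))

def attention_step(x):
    # Random stand-ins for the learned query/key/value projections.
    Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)       # compare every word to every word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over rows
    return weights @ v, weights         # each word: weighted mix of all

# Repeat the simple parallel operation a handful of times, revising every
# position's representation at once rather than walking word by word.
for _ in range(3):
    x, w = attention_step(x)

# Which words does "it" (position 8) draw on in the final step? With
# untrained weights the answer is arbitrary; a trained model would put
# most of the weight on "frog", helped along by "tired".
for word, weight in sorted(zip(words, w[8]), key=lambda p: -p[1])[:3]:
    print(f"{word:8s} {weight:.2f}")
```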
0:12:23 So you have this idea, you try it out.
0:12:26 There’s a detail that you mentioned that’s kind of fun, and we kind of skipped it.
0:12:31 But you mentioned that another one of the co-authors, who has also gone on to do very big things,
0:12:35 was about to leave Google when you sort of want to test this idea.
0:12:40 And the fact that he was about to leave Google was actually important to the history of this idea.
0:12:41 Tell me about that.
0:12:42 It was important.
0:12:51 So this is Illia Polosukhin. At the time that this started to gain any kind of speed,
0:12:55 Illia was managing a good chunk of my organization.
0:13:04 And the moment he really made the decision to leave the company, he had to wait, ultimately, for his co-founder
0:13:07 and for them to then actually get going together in earnest.
0:13:13 And so he had a few months where he knew, and I also knew, that he was about to leave.
0:13:20 And where, you know, the right thing would, of course, be to transition his team to another manager,
0:13:26 which we did immediately, but where he then suddenly was in a position of having nothing to lose.
0:13:33 And yet, quite some time left to play with Google’s resources and do cool stuff with interesting people.
0:13:41 And so that’s one of those moments where suddenly your appetite for risk as a researcher just spikes, right?
0:13:46 Because you have, for a few more months, you have these resources at your disposal,
0:13:49 you’ve transitioned your responsibilities.
0:13:53 At that stage, you’re just like, okay, let’s try this crazy shit.
0:14:00 And that, literally, in so many ways, was one of the integral catalysts.
0:14:07 Because that also enabled this kind of mindset of, we’re going for this now.
0:14:11 Whatever the reason, it still, you know, affects other people.
0:14:16 And so there were others who joined that collaboration really, really early on,
0:14:21 who I feel were much more excited and, as a result, much more likely to really work on this
0:14:23 and to really give it their all.
0:14:30 Because of his, you know, nothing left to lose, I’m going to go for this attitude at this point, right?
0:14:34 Was there a moment when you realized it worked?
0:14:35 There were actually a few moments.
0:14:43 And it’s interesting because, on one hand, right, it’s a very gradual thing, right?
0:14:50 And initially, actually, it took us many months to get to the point where we saw significant first signs of life,
0:14:54 of this not just being a curiosity, but really being something that would end up being competitive.
0:14:57 So there certainly was a moment when that started.
0:15:05 There was another moment when we, for the first time, had one machine translation challenge,
0:15:09 one language pair of the WMT task, as it’s called,
0:15:14 where our score, our model performed better than any other single model.
0:15:19 The point in time when I think all of us realized this is special
0:15:25 was when we not only had the best one in one of these tasks, but in multiple.
0:15:29 And we didn’t just have the best number.
0:15:32 We also, at that point, were able to establish that we’ve gotten there
0:15:37 with about 10 times less energy or training compute spend.
0:15:38 Wow.
0:15:41 So you do one-tenth the work, and you get a better result.
0:15:43 One-tenth the work, and you get a better result,
0:15:47 not just across one specific challenge, but across multiple,
0:15:50 including the hardest, or one of the harder ones, right?
0:15:56 And then, at that stage, we were still improving rapidly.
0:16:00 And then you realize, okay, this is for real.
0:16:07 Because it wasn’t that we had to squeeze those last little bits and pieces of gain out of it.
0:16:10 It was still improving fairly rapidly,
0:16:15 to the point where actually, by the time we actually published the paper,
0:16:18 we, again, reduced the compute requirements,
0:16:21 not quite by an entire order of magnitude, but almost, right?
0:16:26 So it still was getting faster and better at a pretty rapid rate.
0:16:32 So we had, in the paper, we had some results that were those roughly 10x faster on eight GPUs.
0:16:35 And what we demonstrated, in terms of quality, on those eight GPUs,
0:16:40 by the time we actually published the paper properly, we were able to do with one GPU.
0:16:45 One GPU, meaning one chip of the kind that people buy 100,000 of now?
0:16:46 To build a data center?
0:16:47 Exactly.
0:16:55 So the paper, actually, at the end, mentions other possible uses beyond language for this technology.
0:17:00 It mentions images, audio, and video, I think, explicitly.
0:17:03 How much were you thinking about that at the time?
0:17:07 Was that just like an afterthought, or were you like, hey, wait a minute, it’s not just language?
0:17:12 By the time it was actually published at a conference, not just the preprint, by December,
0:17:17 we had initial models on other modalities, on generating images.
0:17:23 We had the first, at that time, they were not performing that well yet, but they were rapidly
0:17:24 getting better.
0:17:30 We had the first prototypes, actually, of models working on genomic data, working on protein
0:17:30 structure.
0:17:32 That’s good foreshadowing.
0:17:33 Good foreshadowing as well, exactly.
0:17:40 But then we ended up, for a variety of reasons, we ended up, at first, focusing on applications
0:17:41 in computer vision.
0:17:46 The paper comes out, you’re working on these other applications, you’re presenting the paper,
0:17:48 it’s published in various forms.
0:17:50 What’s the response like?
0:18:00 It was interesting because the response built in deep learning AI circles, basically, between
0:18:07 the preprint that I think came out in, I want to say, June 2017, and then the actual publication,
0:18:13 to the extent that by the time the poster session happened at the conference, there was quite
0:18:20 a crowd at the poster, so we had to be shoved out of the hall in which the poster session
0:18:25 happened by the security, and had very hoarse voices by the end of the evening.
0:18:30 You guys were like the Beatles of the AI conference?
0:18:36 I wouldn’t say that because we weren’t the Beatles, because it was really, it was still
0:18:37 very specific.
0:18:39 You were more the cool hipster band.
0:18:40 You were the it hipster band.
0:18:42 Certainly more the cool hipster band.
0:18:46 But it was an interesting experience because there were some folks, including some greats
0:18:50 in the field, who came by and said, wow, this is cool.
0:18:55 What has happened since has been wild, it seems.
0:18:57 Wild, to say the least, yes.
0:18:59 Is it surprising to you?
0:19:02 Of course, many aspects are surprising, for sure.
0:19:12 We definitely saw pretty early on, already back in 2018, 2019, that something really exciting
0:19:13 was happening here.
0:19:17 Now, I’m still surprised by it.
0:19:25 With the advent of ChatGPT, something that didn’t go way beyond those language models that
0:19:34 we had already seen a few years before was suddenly the world’s fastest growing consumer product.
0:19:35 Ever, right?
0:19:36 I think ever.
0:19:36 Ever.
0:19:37 Yes.
0:19:42 And by the way, GPT stands for Generative Pre-trained Transformer, right?
0:19:44 Transformer is your word.
0:19:44 That’s right.
0:19:50 So there’s an interesting, I don’t know, business side to this, right?
0:19:52 Which is, you were working for Google when you came up with this.
0:19:55 Google presumably owned the idea.
0:19:56 Yep.
0:19:59 Had intellectual property around the idea.
0:20:00 Has filed many a patent.
0:20:03 Was it just a choice Google made to let everybody use it?
0:20:09 Like, when you see the fastest growing consumer product in the history of the world, not only
0:20:13 built on this idea, but using the name, like, and it’s a different company.
0:20:14 That was five years later.
0:20:15 Five years later.
0:20:17 But a patent’s good for more than five years.
0:20:18 Is that a choice?
0:20:20 Is that a strategic choice?
0:20:21 What’s going on there?
0:20:29 So the choice to do it in the first place, to publish it in the first place, is really
0:20:36 based on and rooted in a deep conviction of Google at the time, and I’m actually pretty
0:20:43 sure it still is the case, that these developments are the tide that floats all
0:20:45 boats, that lifts all boats.
0:20:45 Uh-huh.
0:20:47 Like a belief in progress.
0:20:49 A belief in progress.
0:20:49 Exactly.
0:20:50 Like a good old-fashioned belief in…
0:21:00 Now, it’s also the case that at the time, organizationally, that specific research arm was unusually separated
0:21:02 from the product organizations.
0:21:10 And the reason why Brain, or in general, the deep learning groups, were more separated was
0:21:12 in part historical.
0:21:18 Namely, that when they started out, there were no applications, and the technology was not ready
0:21:19 for being applied.
0:21:26 And so it’s completely understandable and just, you know, a consequence of organic developments
0:21:34 that when this technology suddenly is on the cusp of being incredibly impactful, you’re probably
0:21:41 still underutilizing it internally and potentially also not yet treating it in the same way as you
0:21:45 would have maybe otherwise treated previous trade secrets, for example.
0:21:52 Because it feels like this out-there research project, not like what’s going to be this consumer
0:21:53 product.
0:22:04 And to be fair, it took OpenAI, in this case, a fair amount of time to then turn this into this
0:22:04 product.
0:22:09 And most of that time, it also, from their vantage point, wasn’t a product, right?
0:22:18 So up until all the way through ChatGPT, OpenAI published all of their GPT developments, maybe
0:22:22 not all, but, you know, a very large fraction of their work on this.
0:22:24 Yeah, their early models, the whole models were open.
0:22:24 Exactly.
0:22:30 They were more true to their name, really, really also believing in the same thing.
0:22:37 And it was only really after ChatGPT and after this, to them also surprise, to a certain extent,
0:22:44 success, that they started to become more closed as well when it comes to scientific developments
0:22:44 in this space.
0:22:49 We’ll be back in just a minute.
0:23:35 Let’s talk about your company.
0:23:37 When did you decide to start Inceptive?
0:23:45 The decision took a while and was influenced by events that happened over the course of
0:23:52 about three months, two to three months in late 2020, starting with the birth of my first
0:23:53 child.
0:23:57 So when Amre was born, two things happened.
0:24:03 Number one, witnessing a pregnancy and a birth during a pandemic where there’s a pathogen that’s
0:24:04 rapidly spreading.
0:24:08 And so all of that was a pretty daunting experience.
0:24:10 And everything went great.
0:24:21 But having this new human in my arms also really made me question if I couldn’t more directly
0:24:24 affect people’s lives positively with my work.
0:24:31 And so I was at the time quite confident that indirectly it would have effect also on things
0:24:33 like medicine, biology, et cetera.
0:24:39 But I was wondering, couldn’t this happen more directly if I focused more on it?
0:24:45 The next thing that happened was that AlphaFold 2 results at CASP-14 were published.
0:24:50 CASP-14 is this biennial challenge for protein structure prediction and some other related problems.
0:24:52 This is the protein folding problem.
0:24:54 And this is the protein folding problem, exactly.
0:24:57 So machine learning solving the protein folding problem, which had been a problem for decades,
0:25:01 given a chain of amino acids to predict the 3D structure of a protein.
0:25:02 Precisely.
0:25:05 And humans failed and machine learning succeeded.
0:25:06 Just amazing.
0:25:07 Yes.
0:25:08 It’s a great example.
0:25:14 Humans failed despite the fact that we actually understand the physics fundamentally, but we
0:25:19 still couldn’t create models that were good enough using our conceptual understanding
0:25:20 of the processes involved.
0:25:23 Yeah, you would think an algorithm would work on that one, right?
0:25:26 You would just think an old school set of rules.
0:25:28 Like, we know what the molecules look like.
0:25:30 We know the laws of physics.
0:25:33 It’s amazing that we couldn’t predict it that way, right?
0:25:36 All you want to know is what shape is the protein going to be?
0:25:37 You know all of the constituent parts.
0:25:39 You know every atom in it.
0:25:43 And you still couldn’t predict it with a set of rules, but AI, machine learning, could.
0:25:45 Amazing.
0:25:45 Yes.
0:25:46 And it is amazing.
0:25:51 Actually, when you put it like this, it’s important to point out that when we say we
0:25:54 understand it, we make massive oversimplifying assumptions.
0:25:59 Because we ignore all the other players that are present when the protein folds.
0:26:04 We ignore a lot of the kinetics of it because we say we know the structure, but the truth
0:26:08 is we don’t know all the wiggling and all the shenanigans that happen on the way there,
0:26:08 right?
0:26:14 And we don’t know about, you know, chaperone proteins that are there to influence the folding.
0:26:16 We don’t know about all sorts of other aspects.
0:26:17 I’m doing the physics one.
0:26:20 I’m doing the assume a frictionless plane version of protein.
0:26:21 Precisely.
0:26:22 Which is why it didn’t work.
0:26:25 And the beauty is that deep learning doesn’t need to make this assumption.
0:26:27 AI doesn’t need to make this assumption.
0:26:28 AI just looks at data.
0:26:35 And it can look at more data than any human or even humanity eventually could look at together.
0:26:41 It’s such a good example problem to demonstrate that these models are ready for prime time in
0:26:45 this field and ready for lots of applications, not just one or two, but many.
0:26:46 And so that happens.
0:26:48 They’ll be solved, exactly.
0:26:56 And then the third thing was that these COVID mRNA vaccines came out with astonishing 90 plus
0:26:57 percent efficacy.
0:26:57 So fast also.
0:26:58 Out of the gate.
0:27:02 How fast and how good they were is still so underrated.
0:27:03 Underrated.
0:27:07 At the beginning of the pandemic, people were like, it’ll be two or three years, and if they’re
0:27:08 60 percent effective, that’ll be great.
0:27:09 Exactly.
0:27:10 And so-
0:27:11 Everybody forgets that.
0:27:11 Everybody forgets it.
0:27:16 And when you look at it, this is a molecule family that, for most of the time that we’ve known about it, since the 60s, I suppose, we’ve treated like a neglected stepchild of molecular biology.
0:27:26 Because we’ve…
0:27:26 You’re talking about RNA in general?
0:27:27 RNA.
0:27:28 RNA in general.
0:27:28 Yeah.
0:27:30 Everybody loves DNA, right?
0:27:32 DNA is the movie star.
0:27:33 Exactly.
0:27:33 Exactly, exactly.
0:27:40 Even though now, looking back, DNA is merely, you know, the place where life takes its notes,
0:27:42 maybe the hard drive and the memory.
0:27:43 It’s the book, right?
0:27:43 It’s the book, right?
0:27:44 It’s the book.
0:27:50 So, but at the end of the day, it’s this molecule family that was about to save, you know, depending
0:27:54 on the estimate, tens of millions of lives, and in rapid time.
0:28:01 So all these things hold, but we have no training data to apply anything like alpha-fold to this
0:28:02 specific molecule family.
0:28:03 No training data to speak of.
0:28:07 We had 200,000 known protein structures at the time.
0:28:12 I believe, maybe optimistically, we had maybe 1,200 known RNA structures.
0:28:18 And on top of that, it was also fairly clear that for RNA, going directly to function would
0:32:22 be much, much more important because it’s, in a certain sense, a weaker, less strongly structured
0:28:25 molecule and other aspects of the molecule might play a bigger role.
0:28:33 And then, on top of that, the attention that generative AI was receiving overall, also now
0:28:37 in the field of pharma or of medicine, was building.
0:28:45 And so I ended up finding myself in a conversation where a very, I’d say, wise, long-time mentor
0:28:52 of mine pointed out that, you know, maybe 10 years from now or so, somebody could tell my
0:28:58 daughter that there was this perfect storm where this macromolecule with no training data was
0:29:02 about to save the world and could do so much more in the direction of positively impacting
0:29:03 people’s lives.
0:29:05 We didn’t have training data.
0:29:07 It would be very expensive to create it.
0:30:11 And that despite having the technology, or technologies, that I’d been working on for
0:30:16 the last, I don’t know, 10-plus years, and the ability, because of the attention that
0:30:22 people were now giving to AI in this field, to raise quite a bit of money,
0:30:31 I, in that position, chose to stay back at my cushy dream job in big tech and not actually
0:30:35 take this opportunity to really positively impact people’s lives.
0:29:39 And that idea was not one I was willing to entertain.
0:29:44 You couldn’t just coast it out at Google and let somebody else go figure out RNA.
0:29:44 Yeah.
0:29:46 And it’s not just RNA.
0:29:49 I think RNA is a great starting point at the end of the day.
0:29:57 But building models that learn from, first of all, all the publicly available data that we
0:30:01 can possibly get our hands on, but also from data that we can reasonably effectively create
0:30:08 in our own lab, how to design molecules for specific functions, is something that now is
0:30:15 within reach and that will, in the next years and in the years to come, have completely transformational
0:30:18 impact on how we even think about what medicines are.
0:30:26 And any opportunity to speed this up, to make this happen, even just a day sooner than it could
0:30:29 have otherwise happened, is incredibly valuable, in my opinion.
0:30:36 As you’re talking about this idea that the absence of training data seems to be at the
0:30:36 center of it, right?
0:30:40 It seems to be the core problem, which makes sense, right?
0:30:44 Like, the reason language works so well is basically because of the internet.
0:30:48 I know now we’re going beyond it, but it just happened to be that there was this incredibly
0:30:52 giant set of natural language that became available.
0:30:54 We don’t have anything like that for RNA.
0:31:00 So are you, I mean, it’s kind of step one at Inceptive, creating the data?
0:31:02 Is that kind of what’s happening?
0:31:08 So step one at Inceptive is learning to use all the data, or was, I think we’ve made a lot
0:31:12 of progress in that direction, learning to use all the data that is available already, and
0:31:15 identify what other data we’re missing.
0:31:20 And then see how far we can get with just the publicly available data, and at the same
0:31:23 time scale up generating our own data.
0:31:29 And it turns out that actually, because of the nature of evolution, because of how evolution
0:31:38 isn’t actually incentivized to really explore the entire space of possibilities, it is almost
0:31:44 always a given that if you are trying to design exceptional molecules, especially ones that
0:31:52 are not, say, you know, natural formats, you are basically guaranteed to need novel training
0:31:52 data.
0:31:53 Yeah.
0:31:57 Basically, you’re saying you build RNAs that don’t exist in the world, that have therapeutic
0:32:01 uses, and there’s no, kind of definitionally, no training data for that, because they don’t
0:32:01 exist.
0:32:07 The funny thing is, we have a few of them, and so we have existence proofs of RNA molecules,
0:32:17 for example, RNA viruses, that actually exhibit incredibly complex, different functions in our cells
0:32:23 that do all sorts of things that we don’t usually like, but if we could use those, you know, for
0:32:30 good, if we could use those, you know, in ways that would actually be aimed at fighting disease
0:32:36 rather than creating them, those kinds of functions, even just a small subset of them, would really
0:32:37 transform medicine already.
0:32:38 And so we know it’s possible.
0:32:40 What are you dreaming of when you say that?
0:32:41 What are you thinking of, specifically?
0:32:41 Okay.
0:32:49 So, for example, right, one estimate is that in order for COVID to infect you, you would need
0:32:55 potentially as few as five COVID genomes inside your organism.
0:32:55 That’s already it.
0:32:57 Five viral particles?
0:32:58 Five viral particles.
0:32:58 Yeah.
0:33:00 You inhale those.
0:33:03 You wouldn’t have to inject it.
0:33:05 You wouldn’t even have to swallow it.
0:33:06 You inhaled them.
0:33:10 What if we could have a medicine that worked as well as a disease? That’s a version of your dream.
0:33:10 Exactly.
0:33:11 Exactly.
0:33:18 So, at the end of the day, right, this medicine is able to spread in your body only into certain
0:33:20 types of organs and tissues and cells.
0:33:24 It does certain things there that are really quite complex, right?
0:33:25 Changing the cells’ behavior.
0:33:26 Yeah.
0:33:31 Again, not usually in this case in favorable ways, but still in ways that wouldn’t have
0:33:36 to be modified that much in order to potentially be exactly what you would need for a complex
0:33:37 multifactorial medicine.
0:33:42 And if you could make all of that happen by just inhaling five of those molecules, then,
0:33:45 again, that would completely change how you think about medicine, right?
0:33:50 You have viruses that aren’t immediately active, but that are inactive for long periods of time
0:33:51 in your organism.
0:33:58 And only under certain conditions, say, under certain immune conditions, really start being
0:33:59 reactivated.
0:34:06 Why can’t we have medicines that work in a similar way, where you actually, not only in
0:34:11 a vaccination sense, but where you take a medicine for a genetic predisposition for a
0:34:15 certain disease, that you are able to design a medicine that you can take and that waits until
0:34:17 the disease actually starts to develop.
0:34:21 And only then, and only where that disease then starts to develop, becomes active and
0:34:22 actually fights it.
0:34:25 And potentially also then alarms the doctor through a blood test.
0:34:28 Like for cancer cells or something.
0:34:33 So you have some kind of prophylactic medicine in your body, and it is encoded in such a way
0:34:38 that it just hangs out there like herpes, to take a pathological example.
0:34:39 For example, yes.
0:34:42 And only in certain settings does it do anything.
0:34:46 And those settings are, if you see a cancer cell, destroy it.
0:34:47 Otherwise, just sit there.
0:34:48 Precisely.
0:34:53 And if you can design those also in ways where you can just make them all go away when you
0:34:58 take a, say, a completely harmless small molecule, and that’s, again, entirely feasible.
0:34:59 Sure.
0:35:01 So, I mean, you’re dreaming big.
0:35:04 These are wonderful big, you know, science fiction-y dreams, and I hope you figure them
0:35:05 out.
0:35:09 On a practical level, what’s happening at the company right now?
0:35:10 How many people work there?
0:35:10 What are they doing?
0:35:12 And what have they figured out so far?
0:35:13 We’re around 40.
0:35:17 What we’re doing is really exactly what we just talked about.
0:35:25 We’re basically scaling data generation experiments in our lab that allow us to assess a variety
0:35:32 of different functions of different, mostly RNA molecules, actually mostly mRNA molecules at the
0:35:38 moment, that are relevant to a pretty broad variety of different diseases.
0:35:45 And so, this ranges from things like infectious disease vaccines to cell therapies that can be applied
0:35:49 in oncology or against autoimmune disease.
0:35:56 We have mRNAs that we hope will eventually be effective as enzyme replacement
0:36:00 therapies for a large family of rare diseases.
0:36:02 And the list goes on.
0:36:10 And so, we’re creating, or growing, this training data set that eventually, on top of
0:36:17 foundation models that we pre-trained on all publicly available data, allows us to tune
0:36:25 those foundation models towards designing exceptional molecules for exactly those applications and many
0:36:26 more sharing similar properties.
0:36:33 So, you basically build new mRNA molecules and test them, and then you give that data to
0:36:38 your model, and presumably it tells you what to build next, or it helps you figure out what to
0:36:39 build next.
0:36:40 It’s sort of a loop in that way?
0:36:46 The models are definitely one interesting source for proposals, if you wish, for what to synthesize
0:36:48 and test next.
0:36:49 They’re not the only such source.
0:36:55 So, we basically also explore kind of in maybe less guided or heuristically guided ways.
0:36:57 But, exactly.
0:37:00 So, in some of the cases, it’s really quite iterative.
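As a schematic only: the shape of the design-test-learn loop just described. Every name below is a hypothetical placeholder for illustration, not Inceptive’s actual code or pipeline.

```python
# A schematic of the iterative loop described above: models propose
# candidate mRNA sequences, the lab synthesizes and measures them, and
# the measurements become new training data for the next round. All
# functions and methods here are hypothetical placeholders.

def design_test_learn(model, lab, heuristics, rounds=5, batch_size=96):
    dataset = []
    for _ in range(rounds):
        # Proposals come from the model, but also from less guided,
        # heuristically guided exploration, as mentioned above.
        candidates = model.propose(n=batch_size) + heuristics.propose(n=8)
        # The slow, expensive step: make the molecules and assay the
        # functions of interest in the wet lab.
        measurements = lab.synthesize_and_measure(candidates)
        dataset.extend(measurements)
        # Fine-tune the pretrained foundation model on all data so far,
        # steering what gets proposed next round.
        model.finetune(dataset)
    return model, dataset
```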
0:37:06 For some of those functions and for some of those modalities and diseases or disease targets,
0:37:12 we’re actually already at a point where our models can spit out entirely novel molecules that
0:37:18 really are unlike anything they’ve ever seen or we’ve ever seen in nature, that very consistently
0:37:24 perform quite favorably compared to pretty strong baselines by incumbents in the field.
0:37:31 When you say perform quite favorably compared to baselines by incumbents in the field, I mean,
0:37:35 does that on some level mean better than what experts would think up?
0:37:40 Better than what experts can think of and also better than more traditional machine learning
0:37:41 tools can easily produce.
0:37:48 It’s like that famous moment in the Go match when AlphaGo made some move that, like, no human
0:37:49 being ever would have thought of.
0:37:50 Move 37.
0:37:51 Yes.
0:37:59 So, I would say we’ve long passed the Move 37 in the sense that our understanding of the
0:38:06 underlying biological phenomena is so incomplete that for most of the things that we’re able
0:38:09 to design for, we don’t really understand why they happen.
0:38:10 Huh.
0:38:13 When you say we, do you mean at Inceptive or do you mean just medicine in general?
0:38:15 I would say just medicine in general.
0:38:16 Okay.
0:38:21 So, Inceptive is doing this very kind of high-level work, right?
0:38:24 I mean, building what will hopefully be the foundation.
0:38:28 What’s the right amount of time in the future to ask about?
0:38:29 When will we know if it works?
0:38:31 Do you think five years?
0:38:40 So, the general idea of using generative AI and similar techniques to generate therapeutics,
0:38:46 there are some things in clinical trials that were largely designed with AI.
0:38:55 As far as I know, we’re still, maybe now we have the first trials just now starting for
0:38:58 molecules that were truly entirely designed by AI.
0:39:01 As opposed to sort of selected from a library?
0:39:03 Selected, influenced, exactly.
0:39:06 Selected, adjusted, tuned, tweaked, et cetera, right?
0:39:10 So, that’s really still only happening just now.
0:39:10 Okay.
0:39:17 But we will see, I believe, the first success or a first success of such molecules, certainly
0:39:18 within the next five years.
0:39:21 What about more narrowly the project at Inceptive?
0:39:23 It’s a similar timeframe.
0:39:30 We should be able to get molecules into the clinic in the next few years, certainly in the
0:39:31 next handful of years.
0:39:40 Now, these will not be molecules with, where the objective that we used in their design is,
0:39:45 you know, even remotely as complex or the, you know, kind of the different functions that
0:39:52 we’re designing for are not going to be even remotely as diverse as, say, what you would find in,
0:39:55 because we used this example earlier in RNA virus.
0:39:58 These will really be more, you know, simpler.
0:40:04 Those will be molecules that don’t do things that we couldn’t possibly have done before,
0:40:11 but that do them much better in ways that are more accessible, in ways that come with less side
0:40:11 effects.
0:40:15 What biotech largely is, is they make protein drugs.
0:40:21 And so if you could make an mRNA drug where you put the mRNA into the body and the body makes the protein,
0:40:25 it wouldn’t be some crazy sleeper cell that sits in your body for 20 years or whatever.
0:40:29 But it might be a more practical alternative to today’s biotech drugs.
0:40:30 Absolutely.
0:40:35 So you’ve had a kind of crash course in biology in the last few years.
0:40:35 Yes.
0:40:41 And I’m curious, like, what is, what is something that has been particularly compelling or surprising
0:40:44 or interesting to you that you have learned about biology?
0:40:46 There are countless things.
0:40:57 The biggest one, or the common thread across many of them, is really just how effective life is
0:41:07 at finding solutions to problems that, on one hand, are incredibly robust, surprisingly robust,
0:41:16 and on the other hand, are so different from how we would design solutions to similar problems.
0:41:17 Aha.
0:41:23 That really, this comes back to this idea that we might just not be particularly well-equipped
0:41:30 in terms of cognitive capabilities to understand biology, that basically, you know, we are,
0:41:34 we would never think to do it this way.
0:41:38 And how we think to do it is oftentimes much more brittle.
0:41:40 Aha.
0:41:41 Brittle is an interesting word.
0:41:45 Less resilient, less able to persist under different conditions.
0:41:46 Exactly.
0:41:46 Exactly.
0:41:49 I mean, you know, we still haven’t built machines that can fix themselves, for one.
0:41:53 Which is fundamentally the miracle of being a human being.
0:41:54 Which is fundamentally the miracle of life.
0:41:56 I’m still here after going through all this.
0:41:56 Exactly.
0:41:57 Exactly.
0:41:57 Exactly.
0:42:00 And so, and of course, this is true across the scales, right?
0:42:05 From, you know, single cells all the way to complex organisms like ourselves.
0:42:18 And really just how many very different kinds of solutions life has found, and constantly
0:42:19 is finding.
0:42:22 And you see this all over the place.
0:42:30 And it’s both daunting, humbling, but also incredibly inspiring when it comes to applying
0:42:31 AI in this area.
0:42:37 Because again, I think that at least so far, it’s the best tool, and maybe actually the only
0:42:47 tool we have, in the face of this kind of complexity, to really design interventions that go way beyond
0:42:51 what we were able to do, or are able to do, just based on our own conceptual understanding.
0:42:57 We’ll be back in a minute with the lightning round.
0:43:43 Let’s finish with the lightning round.
0:43:50 As an inventor of the Transformer model, are there particular possible uses of it that
0:43:53 worry you slash make you sad?
0:44:03 I am quite concerned about the p(doom), doomerism, whatever you want to call it, existential-fear-
0:44:11 instilling rhetoric that is in some cases actually also promoted by people, by entities in the space.
0:44:14 So just to be clear, you’re not worried about the existential risk.
0:44:17 You’re worried about people talking about the existential risk.
0:44:27 I’m worried about the existential risk being inflated, or the perception being inflated to the extent
0:44:34 that we actually don’t look enough at some of the much more concrete and much more immediate risks.
0:44:39 I’m not going to say that the existential risk is zero, but that would be silly.
0:44:44 What is a concrete and immediate risk that is, you think, under-discussed?
0:44:52 These large-scale models are such effective tools in manipulating people in large numbers already
0:44:59 today, and it’s happening everywhere for many, many different purposes by, in some cases, benevolent,
0:45:06 and in many cases, malevolent actors that I really firmly believe we need to look much more
0:45:14 at things like enabling cryptographic certification of human-generated content, because doing that
0:45:19 with the machine-generated content is not going to work, but we definitely can cryptographically
0:45:21 certify human-generated content as such.
0:45:25 Basically, watermarking or something, some way to say, a human made this.
0:45:26 Exactly.
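In a minimal sketch, the primitive being pointed at is an ordinary digital signature: a person, or their device, signs what they produce, and anyone can verify it against their public key. The example below assumes the Python cryptography package and shows only that primitive; a real certification scheme would also need to bind keys to verified human identities and handle key distribution and revocation.

```python
# A minimal sketch of certifying content as human-made via a digital
# signature, assuming the Python "cryptography" package. Binding the
# key to a verified human identity is the hard part left out here.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The author generates a keypair once and publishes the public key.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

content = "I wrote this paragraph myself.".encode("utf-8")
signature = private_key.sign(content)   # the author signs their content

# Anyone with the public key can check the content is unmodified and
# was signed by the holder of the private key.
try:
    public_key.verify(signature, content)
    print("verified: signed by the key holder")
except InvalidSignature:
    print("not verified")
```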
0:45:31 What would you be working on if you were not working in biology, on drug development?
0:45:32 Education.
0:45:37 Using artificial intelligence to democratize access to education.
0:45:43 What have you seen that has been impressive or compelling to you in that regard?
0:45:46 There are lots of little examples so far, and really countless.
0:45:53 It’s what’s happening at the Khan Academy, there are many examples of AI applied to education
0:45:56 problems in places like China, for example.
0:46:02 You have a bunch of very compelling examples in fiction, a book I really like by a guy named
0:46:08 Neal Stephenson, The Diamond Age: Or, A Young Lady’s Illustrated Primer, that I recommend if
0:46:08 you just want to…
0:46:10 Everybody in AI talks about that.
0:46:11 Well, now they do, yeah.
0:46:13 Yeah, well, now they do.
0:46:15 You liked it before it was cool, I’m sure.
0:46:20 At one point, I thought it was really, really important to ensure that Neal Stephenson knows
0:46:27 that we are about to be able to build the primer, and so I ended up having coffee with
0:46:28 him to tell him.
0:46:29 Oh, that’s great.
0:46:36 So, at the end of the day, maybe the biggest inspiration there is my daughter.
0:46:45 She’s four and a half now, and I think she could, today, read, she can read okay, but she could
0:46:52 read, you know, grade school level if she had access to, you know, an AI tutor teaching her
0:46:53 how to read.
0:46:54 Does your daughter use AI?
0:46:57 Use, you know, AI chatbots?
0:47:05 Not directly without me, but we’ve actually used ChatGPT to implement an AI reading tutor
0:47:08 that works reasonably well.
0:47:11 I mean, we basically, you know, kind of as they call it now, vibe coding.
0:47:15 We vibe coded, and Amway wasn’t there for all of it.
0:47:17 It took some time, but she was there for some of it.
0:47:19 Oh, you vibe coded it with her?
0:47:24 Yeah, well, I mean, she was there, she, you know, she witnessed a good chunk of it, yes.
0:47:26 Although she was more interested in the image generation parts.
0:47:30 But yeah, we have a sketch of one that she quite enjoys.
0:47:35 So, that’s kind of like the extent of her at this age using AI directly.
0:47:48 Jakob Uszkoreit is the CEO and co-founder of Inceptive, and a co-author of the paper
0:47:49 Attention Is All You Need.
0:47:55 Just a quick note, this is our last episode before a break of a couple of weeks,
0:47:57 and then we’ll be back with more episodes.
0:48:01 Please email us at problem@pushkin.fm.
0:48:04 We are always looking for new guests for the show.
0:48:08 Today’s show was produced by Trina Menino and Gabriel Hunter-Chang.
0:48:13 It was edited by Alexandra Gerritsen and engineered by Sarah Brugger.
0:48:22 This is an iHeart Podcast.
0:00:41 If I were going to pick one paper from the past decade that had the biggest impact on the world, I would choose one called Attention is All You Need, published in 2017.
0:00:45 That paper basically invented transformer models.
0:00:53 You’ve almost certainly used a transformer model if you have used ChatGPT or Gemini or Claude or DeepSeek.
0:00:57 In fact, the T in ChatGPT stands for transformer.
0:01:08 And transformer models have turned out to be wildly useful, not just at generating language, but also at everything from generating images to predicting what proteins will look like.
0:01:16 In fact, transformers are so ubiquitous and so powerful that it’s easy to forget that some guy just thought them up.
0:01:22 But in fact, some guy did just think up transformers, and I’m talking to him today on the show.
0:01:34 I’m Jacob Goldstein, and this is What’s Your Problem, the show where I talk to people who are trying to make technological progress.
0:01:36 My guest today is Jakob Uskorej.
0:01:41 And just to be clear, Jakob was one of several co-authors on that transformer paper.
0:01:46 And on top of that, lots of other researchers were working on related things at the same time.
0:01:48 So a lot of people were working on this.
0:01:52 But the key idea did seem to come from Jakob.
0:01:55 Today, Jakob is the CEO of Inceptive.
0:02:00 That’s a company that he co-founded to use AI to develop new kinds of medicine.
0:02:03 And the company is particularly focused on RNA.
0:02:08 We talked about his work at Inceptive in the second part of our conversation.
0:02:12 In the first part, we talked about his work on transformer models.
0:02:21 At the time he started working on the idea for transformers—this is around a decade ago now—there were a couple of big problems with existing language models.
0:02:23 For one thing, they were slow.
0:02:29 They were, in fact, so slow that they could not even keep up with all the new training data that was becoming available.
0:02:31 A second problem?
0:02:35 They struggled with what are called long-range dependencies.
0:02:41 Basically, in language, that’s relationships between words that are far apart from each other in a sentence.
0:02:50 So to start, I asked Jakob for an example we could use to discuss these problems, and also how he came up with his big idea for how to solve them.
0:02:54 So pick a sentence that’s going to be a good object lesson for us.
0:02:59 Okay, so we could have—the frog didn’t cross the road because it was too tired.
0:03:01 Okay, so we got our sentence.
0:03:02 Yep.
0:03:11 How would the sort of big, powerful, but slow-to-train algorithm in 2015 have processed that sentence?
0:03:16 So basically, it would have walked through that sentence word by word.
0:03:19 And so it would walk through the sentence left to right.
0:03:25 The frog did not cross the road because it was too tired.
0:03:29 Which is logical, which is how I would think a system would work.
0:03:31 It’s more or less how we read, right?
0:03:34 It’s how we read, but it’s not necessarily how we understand.
0:03:35 Uh-huh.
0:03:43 That is actually one of the integral, I would say, for what we then—how we then went about trying to speed this all up.
0:03:44 I love that.
0:03:45 I want you to say more about it.
0:03:48 When you say it’s not how we understand, what do you mean?
0:04:00 So, on one hand, right, linearity of time forces us to almost always feel that we’re communicating language in order and just linearly.
0:04:08 It actually turns out that that’s not really how we read, not even in terms of our saccades, in terms of our eye movements.
0:04:11 We actually do jump back and forth quite a bit while reading.
0:04:12 Uh-huh.
0:04:21 And if you look at conversations, you also have highly nonlinear elements where there’s repetition, there’s reference, there’s basically different flavors of interruption.
0:04:27 But sure, by and large, right, we would say we certainly write them left to right, right?
0:04:33 So, if you write a proper text, you don’t write it as you would read it, and you also don’t write it as you would talk about it.
0:04:36 You do write it in one linear order.
0:04:47 Now, as we read this and as we understand this, we actually form groups of words that then form meaning, right?
0:04:51 So, an example of that is, you know, adjective noun, right?
0:04:54 It’s—or say, in this case, an article noun.
0:04:56 It’s not a frog, it’s the frog, right?
0:05:00 We could have also said it’s the green frog or the lazy frog.
0:05:01 Right.
0:05:02 Language has a structure, right?
0:05:07 And there are—things can modify other things, and things can modify the modifiers.
0:05:08 Exactly, exactly.
0:05:16 But the interesting thing now is that structure, as a tree-structured, clean hierarchy, only tells you half the story.
0:05:24 There’s so many exceptions where statistical dependencies, where modification actually happens at a distance.
0:05:30 So, okay, so just to bring this back to your sample sentence, the frog didn’t cross the road because it was too tired.
0:05:34 That word it is actually quite far from the word frog.
0:05:39 And if you’re an AI going from left to right, you may well get confused there, right?
0:05:43 You may think it refers to road instead of to frog.
0:05:48 So this is one of the problems you were trying to solve.
0:05:57 And then the other one you were mentioning before, which is these models were just slow, because after each word, the model just recalculates what everything means.
0:05:58 And that just takes a long time.
0:06:00 They can’t go fast enough.
0:06:00 Exactly.
0:06:07 It takes a long time, and it doesn’t play to the strengths of the computers, of the accelerators that we’re using there.
0:06:13 And when you say accelerators, I know Google has their own chips, but basically we mean GPUs now, right?
0:06:14 We mean GPUs.
0:06:17 We mean the chips that NVIDIA sells.
0:06:19 What is the nature of those particular chips?
0:06:19 Exactly.
0:06:32 So the nature of those particular chips is that instead of doing a broad variety of complex computations in sequence, they are incredibly good.
0:06:37 They excel at performing many, many, many simple computations in parallel.
0:07:06 And so what this hierarchical or semi-hierarchical nature of language enables you to do is instead of having, so to speak, one place where you read the current word, you could now imagine you actually read every, you look at everything at the same time, and you apply many simple operations at the same time to each position in your sentence.
0:07:08 Uh-huh. So this is the big idea.
0:07:12 I just want to pause here because this is it, right? This is the breakthrough happening.
0:07:12 Yes.
0:07:20 It’s basically, what if instead of reading the sentence one word at a time from left to right, we read the whole thing all at once?
0:07:28 All at once. And now the problem is, clearly something’s got to give, right? So there’s no free lunch in that sense.
0:07:33 You have to now simplify what you can do at every position when you do this all in parallel.
0:07:34 Uh-huh.
0:07:41 But you can now afford to do this a bunch of times after another and revise it over time or over these steps.
0:07:57 And so instead of walking through the sentence from beginning to end, rather, an average sentence has like 20 words or so, average sentence in prose, instead of walking those 20 positions, what you’re doing is you’re looking at every word at the same time, but in a simpler way.
0:08:12 But now you can do that maybe five or six times, revising your understanding. And that turns out is faster, way faster on GPUs. And because of this hierarchical nature of language, it’s also better.
0:08:34 So you have this idea. And as I read the little note on the paper, it was in fact your idea. I know you were working with a team, but the paper credits you with the idea. So let’s take this idea, this basic idea of look at the whole input sentence all at once a few times and apply it to our frog sentence. Give me that frog sentence again.
0:08:37 The frog did not cross the road because it was too tired.
0:08:43 Good. Tired is good because that’s unambiguous. Hot could be either one. It could be the road or the frog, right?
0:08:48 Hot could be either one, exactly, yes. In fact, hot could actually be either one.
0:08:52 And non-referential, and non-referential because it was too hot outside.
0:08:56 Outside, it could be any of three things, the weather or the frog or the road.
0:08:56 Exactly.
0:09:08 I love that. Tired solves the problem. So your model, this new way of doing things, how does it parse that sentence? What does it do?
0:09:19 So basically, let’s look at the word it and look at it in every single step of these, you know, say a handful of times repeated operation.
0:09:28 Imagine you’re looking at this word it, that’s the one that you are now trying to understand better, and you now compare it to every other word in the sentence.
0:09:39 So you compare it to the, to frog, to did, not, cross, the, road, because, was, too, and tired.
0:09:57 And initially, in the first pass already, a very simple insight the model can fairly easily learn is that it could be strongly informed by frog, by road,
0:10:06 or by nothing, but not so much by to or by the, or maybe only to a certain extent by was.
0:10:15 But if you want to know more about what it denotes, then it could be, you know, informed by all of these.
0:10:21 And just to be clear, that sort of understanding arises because it has trained in this way on lots of data.
0:10:30 It’s encountering a new sentence after reading lots of other sentences with lots of pronouns with different possible antecedents, yeah.
0:10:31 Exactly, exactly.
0:10:43 So now, the interesting thing is that which of the two it actually refers to doesn’t depend only on what those other two words are.
0:10:48 And this is why you need these subsequent steps because, so let’s start with the first step.
0:10:56 So what now happens is that, say the model identifies frog and road could have a lot to do with the word it.
0:11:03 So now you basically copy some information from both frog and road over to it.
0:11:12 And you don’t just copy it, you kind of transform it also on the way, but you refine your understanding of it.
0:11:13 And this is all learned.
0:11:17 This is not given by rules or, you know, in any way pre-specified.
0:11:21 Right, just by training on lots of libraries.
0:11:22 Just by training, this emerges, exactly.
0:11:28 And so that sort of the meaning of it after this first step is kind of influenced by both frog and road.
0:11:30 Yes, both frog and road.
0:11:35 Okay, so now we repeat this operation again.
0:11:41 And now the model is basically unsure; it has this kind of superposition, right?
0:11:43 It could be road, it could be frog.
0:11:46 But now, in the next step, it also looks at tired.
0:11:54 And somehow the model has learned that when it means something inanimate, that tired is not the thing.
0:12:00 And so maybe in context of tired, it is more likely to refer to frog.
0:12:10 And now you know, well, it is more likely, or now maybe the model has figured out already, maybe it needs a bit more, a few more iterations,
0:12:15 that it is most likely to refer to frog because of the presence of tired.
0:12:17 So it has solved the problem.
0:12:18 But it has solved the problem.
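For readers who want the mechanism spelled out, here is a toy sketch in Python with NumPy of the repeated attention step just described. The projection matrices are random rather than trained, so the printed weights are meaningless; what the sketch shows is the shape of the computation: every word is compared to every other word, a softmax turns those comparisons into mixing weights, transformed information is copied over, and the whole thing repeats for a handful of rounds. In a trained model, those rounds are what shift the weight for "it" toward "frog" once "tired" is taken into account.

```python
import numpy as np

words = "the frog did not cross the road because it was too tired".split()
rng = np.random.default_rng(0)
d = 16
h = rng.standard_normal((len(words), d))    # one vector per word

def attention_round(h):
    # Projections are learned in practice; random here, purely for illustration.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    scores = q @ k.T / np.sqrt(d)           # compare every word to every other word
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)       # softmax turns scores into mixing weights
    return w @ v, w                         # copy-and-transform information

for _ in range(6):                          # a handful of simple, repeated rounds
    h, w = attention_round(h)

it = words.index("it")
for word, weight in zip(words, w[it]):      # with training, frog would dominate here
    print(f"attention of 'it' on {word!r}: {weight:.2f}")
```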
0:12:23 So you have this idea, you try it out.
0:12:26 There’s a detail that you mentioned that’s kind of fun, and we kind of skipped it.
0:12:31 But you mentioned that another one of the co-authors, who has also gone on to do very big things,
0:12:35 was about to leave Google when you sort of wanted to test this idea.
0:12:40 And that fact that he was about to leave Google was actually important to the history of this idea.
0:12:41 Tell me about that.
0:12:42 It was important.
0:12:51 So this is Illia Polosukhin. At the time that this started to gain any kind of speed,
0:12:55 Illia was managing a good chunk of my organization.
0:13:04 And the moment he really made the decision to leave the company, he had to wait, ultimately, for his co-founder
0:13:07 and for them to then actually get going together in earnest.
0:13:13 And so he had a few months where he knew, and I also knew, that he was about to leave.
0:13:20 And where, you know, the right thing would, of course, be to transition his team to another manager,
0:13:26 which we did immediately, but where he then suddenly was in a position of having nothing to lose.
0:13:33 And yet, quite some time left to play with Google’s resources and do cool stuff with interesting people.
0:13:41 And so that’s one of those moments where suddenly your appetite for risk as a researcher just spikes, right?
0:13:46 Because you have, for a few more months, you have these resources at your disposal,
0:13:49 you’ve transitioned your responsibilities.
0:13:53 At that stage, you’re just like, okay, let’s try this crazy shit.
0:14:00 And that, literally, in so many ways, was one of the integral catalysts.
0:14:07 Because that also enabled this kind of mindset of, we’re going for this now.
0:14:11 Whatever the reason, it still, you know, affects other people.
0:14:16 And so there were others who joined that collaboration really, really early on,
0:14:21 who I feel were much more excited and, as a result, much more likely to really work on this
0:14:23 and to really give it their all.
0:14:30 Because of his, you know, nothing left to lose, I’m going to go for this attitude at this point, right?
0:14:34 Was there a moment when you realized it worked?
0:14:35 There were actually a few moments.
0:14:43 And it’s interesting because, on one hand, right, it’s a very gradual thing, right?
0:14:50 And initially, actually, it took us many months to get to the point where we saw significant first signs of life,
0:14:54 of this not just being a curiosity, but really being something that would end up being competitive.
0:14:57 So there certainly was a moment when that started.
0:15:05 There was another moment when we, for the first time, had one machine translation challenge,
0:15:09 one language pair of the WMT task, as it’s called,
0:15:14 where our score, our model performed better than any other single model.
0:15:19 The point in time when I think all of us realized this is special
0:15:25 was when we not only had the best one in one of these tasks, but in multiple.
0:15:29 And we didn’t just have the best number.
0:15:32 We also, at that point, were able to establish that we’ve gotten there
0:15:37 with about 10 times less energy or training compute spend.
0:15:38 Wow.
0:15:41 So you do one-tenth the work, and you get a better result.
0:15:43 One-tenth the work, and you get a better result,
0:15:47 not just across one specific challenge, but across multiple,
0:15:50 including the hardest, or one of the harder ones, right?
0:15:56 And then, at that stage, we were still improving rapidly.
0:16:00 And then you realize, okay, this is for real.
0:16:07 Because it wasn’t that we had to squeeze those last little bits and pieces of gain out of it.
0:16:10 It was still improving fairly rapidly,
0:16:15 to the point where actually, by the time we actually published the paper,
0:16:18 we, again, reduced the compute requirements,
0:16:21 not quite by an entire order of magnitude, but almost, right?
0:16:26 So it still was getting faster and better at a pretty rapid rate.
0:16:32 So we had, in the paper, we had some results that were those roughly 10x faster on eight GPUs.
0:16:35 And what we demonstrated, in terms of quality, on those eight GPUs,
0:16:40 by the time we actually published the paper properly, we were able to do with one GPU.
0:16:45 One GPU, meaning one chip of the kind that people buy 100,000 of now?
0:16:46 To build a data center?
0:16:47 Exactly.
0:16:55 So the paper, actually, at the end, mentions other possible uses beyond language for this technology.
0:17:00 It mentions images, audio, and video, I think, explicitly.
0:17:03 How much were you thinking about that at the time?
0:17:07 Was that just like an afterthought, or were you like, hey, wait a minute, it’s not just language?
0:17:12 By the time it was actually published at a conference, not just the preprint, by December,
0:17:17 we had initial models on other modalities, on generating images.
0:17:23 We had the first, at that time, they were not performing that well yet, but they were rapidly
0:17:24 getting better.
0:17:30 We had the first prototypes, actually, of models working on genomic data, working on protein
0:17:30 structure.
0:17:32 That’s good foreshadowing.
0:17:33 Good foreshadowing as well, exactly.
0:17:40 But then we ended up, for a variety of reasons, we ended up, at first, focusing on applications
0:17:41 in computer vision.
0:17:46 The paper comes out, you’re working on these other applications, you’re presenting the paper,
0:17:48 it’s published in various forms.
0:17:50 What’s the response like?
0:18:00 It was interesting because the response built in deep learning AI circles, basically, between
0:18:07 the preprint that I think came out in, I want to say, June 2017, and then the actual publication,
0:18:13 to the extent that by the time the poster session happened at the conference, there was quite
0:18:20 a crowd at the poster, so we had to be shoved out of the hall in which the poster session
0:18:25 happened by the security, and had very hoarse voices by the end of the evening.
0:18:30 You guys were like the Beatles of the AI conference?
0:18:36 I wouldn’t say that because we weren’t the Beatles, because it was really, it was still
0:18:37 very specific.
0:18:39 You were more the cool hipster band.
0:18:40 You were the it hipster band.
0:18:42 Certainly more the cool hipster band.
0:18:46 But it was an interesting experience because there were some folks, including some greats
0:18:50 in the field, who came by and said, wow, this is cool.
0:18:55 What has happened since has been wild, it seems.
0:18:57 Wild, to say the least, yes.
0:18:59 Is it surprising to you?
0:19:02 Of course, many aspects are surprising, for sure.
0:19:12 We definitely saw pretty early on, already back in 2018, 2019, that something really exciting
0:19:13 was happening here.
0:19:17 Now, I’m still surprised by it.
0:19:25 With the advent of ChatGPT, something that didn’t go way beyond those language models that
0:19:34 we had already seen a few years before was suddenly the world’s fastest growing consumer product.
0:19:35 Ever, right?
0:19:36 I think ever.
0:19:36 Ever.
0:19:37 Yes.
0:19:42 And by the way, GPT stands for Generative Pre-trained Transformer, right?
0:19:44 Transformer is your word.
0:19:44 That’s right.
0:19:50 So there’s an interesting, I don’t know, business side to this, right?
0:19:52 Which is, you were working for Google when you came up with this.
0:19:55 Google presumably owned the idea.
0:19:56 Yep.
0:19:59 Had intellectual property around the idea.
0:20:00 Has filed many a patent.
0:20:03 Was it just a choice Google made to let everybody use it?
0:20:09 Like, when you see the fastest growing consumer product in the history of the world, not only
0:20:13 built on this idea, but using the name, like, and it’s a different company.
0:20:14 That was five years later.
0:20:15 Five years later.
0:20:17 But a patent’s good for more than five years.
0:20:18 Is that a choice?
0:20:20 Is that a strategic choice?
0:20:21 What’s going on there?
0:20:29 So the choice to do it in the first place, to publish it in the first place, is really
0:20:36 based on and rooted in a deep conviction of Google at the time, and I’m actually pretty
0:20:43 sure it still is the case, that these developments are the tide
0:20:45 that lifts all boats.
0:20:45 Uh-huh.
0:20:47 Like a belief in progress.
0:20:49 A belief in progress.
0:20:49 Exactly.
0:20:50 Like a good old-fashioned belief in…
0:21:00 Now, it’s also the case that at the time, organizationally, that specific research arm was unusually separated
0:21:02 from the product organizations.
0:21:10 And the reason why Brain, or in general, the deep learning groups, were more separated was
0:21:12 in part historical.
0:21:18 Namely, that when they started out, there were no applications, and the technology was not ready
0:21:19 for being applied.
0:21:26 And so it’s completely understandable and just, you know, a consequence of organic developments
0:21:34 that when this technology suddenly is on the cusp of being incredibly impactful, you’re probably
0:21:41 still underutilizing it internally and potentially also not yet treating it in the same way as you
0:21:45 would have maybe otherwise treated previous trade secrets, for example.
0:21:52 Because it feels like this out there research project, not like what’s going to be this consumer
0:21:53 product.
0:22:04 And to be fair, it took OpenAI, in this case, a fair amount of time to then turn this into this
0:22:04 product.
0:22:09 And most of that time, it also, from their vantage point, wasn’t a product, right?
0:22:18 So up until all the way through ChatGPT, OpenAI published all of their GPT developments, maybe
0:22:22 not all, but, you know, a very large fraction of their work on this.
0:22:24 Yeah, their early models, the whole models were open.
0:22:24 Exactly.
0:22:30 They were more true to their name, really, really also believing in the same thing.
0:22:37 And it was only really after ChatGPT and after this, to them also surprise, to a certain extent,
0:22:44 success, that they started to become more closed as well when it comes to scientific developments
0:22:44 in this space.
0:22:49 We’ll be back in just a minute.
0:23:04 Run a business and not thinking about podcasting?
0:23:05 Think again.
0:23:10 More Americans listen to podcasts than ad-supported streaming music from Spotify and Pandora.
0:23:14 And as the number one podcaster, iHeart’s twice as large as the next two combined.
0:23:17 So whatever your customers listen to, they’ll hear your message.
0:23:21 Plus, only iHeart can extend your message to audiences across broadcast radio.
0:23:23 Think podcasting can help your business?
0:23:24 Think iHeart.
0:23:26 Streaming, radio, and podcasting.
0:23:29 Call 844-844-iHeart to get started.
0:23:32 That’s 844-844-iHeart.
0:23:35 Let’s talk about your company.
0:23:37 When did you decide to start Inceptive?
0:23:45 The decision took a while and was influenced by events that happened over the course of
0:23:52 about three months, two to three months in late 2020, starting with the birth of my first
0:23:53 child.
0:23:57 So when Amre was born, two things happened.
0:24:03 Number one, witnessing a pregnancy and a birth during a pandemic where there’s a pathogen that’s
0:24:04 rapidly spreading.
0:24:08 And so all of that was a pretty daunting experience.
0:24:10 And everything went great.
0:24:21 But having this new human in my arms also really made me question if I couldn’t more directly
0:24:24 affect people’s lives positively with my work.
0:24:31 And so I was at the time quite confident that indirectly it would have effect also on things
0:24:33 like medicine, biology, et cetera.
0:24:39 But I was wondering, couldn’t this happen more directly if I focused more on it?
0:24:45 The next thing that happened was that AlphaFold 2 results at CASP-14 were published.
0:24:50 CASP-14 is this biennial challenge for protein structure prediction and some other related problems.
0:24:52 This is the protein folding problem.
0:24:54 And this is the protein folding problem, exactly.
0:24:57 So machine learning solved the protein folding problem, which had been open for decades:
0:25:01 given a chain of amino acids, predict the 3D structure of the protein.
0:25:02 Precisely.
0:25:05 And humans failed and machine learning succeeded.
0:25:06 Just amazing.
0:25:07 Yes.
0:25:08 It’s a great example.
0:25:14 Humans failed despite the fact that we actually understand the physics fundamentally, but we
0:25:19 still couldn’t create models that were good enough using our conceptual understanding
0:25:20 of the processes involved.
0:25:23 Yeah, you would think an algorithm would work on that one, right?
0:25:26 You would just think an old school set of rules.
0:25:28 Like, we know what the molecules look like.
0:25:30 We know the laws of physics.
0:25:33 It’s amazing that we couldn’t predict it that way, right?
0:25:36 All you want to know is what shape is the protein going to be?
0:25:37 You know all of the constituent parts.
0:25:39 You know every atom in it.
0:25:43 And you still couldn’t predict it with a set of rules, but AI, machine learning, could.
0:25:45 Amazing.
0:25:45 Yes.
0:25:46 And it is amazing.
0:25:51 Actually, when you put it like this, it’s important to point out that when we say we
0:25:54 understand it, we make massive oversimplifying assumptions.
0:25:59 Because we ignore all the other players that are present when the protein folds.
0:26:04 We ignore a lot of the kinetics of it because we say we know the structure, but the truth
0:26:08 is we don’t know all the wiggling and all the shenanigans that happen on the way there,
0:26:08 right?
0:26:14 And we don’t know about, you know, chaperone proteins that are there to influence the folding.
0:26:16 We don’t know about all sorts of other aspects.
0:26:17 I’m doing the physics one.
0:26:20 I’m doing the assume-a-frictionless-plane version of protein folding.
0:26:21 Precisely.
0:26:22 Which is why it didn’t work.
0:26:25 And the beauty is that deep learning doesn’t need to make this assumption.
0:26:27 AI doesn’t need to make this assumption.
0:26:28 AI just looks at data.
0:26:35 And it can look at more data than any human or even humanity eventually could look at together.
0:26:41 It’s such a good example problem to demonstrate that these models are ready for prime time in
0:26:45 this field and ready for lots of applications, not just one or two, but many.
0:26:46 And so that happens.
0:26:48 That all gets solved, exactly.
0:26:56 And then the third thing was that these COVID mRNA vaccines came out with astonishing 90 plus
0:26:57 percent efficacy.
0:26:57 So fast also.
0:26:58 Out of the gate.
0:27:02 How fast and how good they were is still so underrated.
0:27:03 Underrated.
0:27:07 At the beginning of the pandemic, people were like, it’ll be two or three years, and if they’re
0:27:08 60 percent effective, that’ll be great.
0:27:09 Exactly.
0:27:10 And so-
0:27:11 Everybody forgets that.
0:27:11 Everybody forgets it.
0:27:16 And when you look at it, this is a molecule family that, for, you know, most of the
0:27:20 time that we’ve known about it, since the 60s, I suppose, we’ve treated it
0:27:24 like a neglected stepchild of molecular biology.
0:27:26 Because we’ve massed-
0:27:26 You’re talking about RNA in general?
0:27:27 RNA.
0:27:28 RNA in general.
0:27:28 Yeah.
0:27:30 Everybody loves DNA, right?
0:27:32 DNA is the movie star.
0:27:33 Exactly.
0:27:33 Exactly, exactly.
0:27:40 Even though now, looking back, DNA is merely, you know, the place where life takes its notes,
0:27:42 maybe the hard drive and the memory.
0:27:43 It’s the book, right?
0:27:43 It’s the book, right?
0:27:44 It’s the book.
0:27:50 So, but at the end of the day, it’s this molecule family that was about to save, you know, depending
0:27:54 on the estimate, tens of millions of lives, and in rapid time.
0:28:01 So all these things hold, but we have no training data to apply anything like AlphaFold to this
0:28:02 specific molecule family.
0:28:03 No training data to speak of.
0:28:07 We had 200,000 known protein structures at the time.
0:28:12 I believe, maybe optimistically, we had maybe 1,200 known RNA structures.
0:28:18 And on top of that, it was also fairly clear that for RNA, going directly to function would
0:28:22 be much, much more important because it’s, in a certain sense, a weaker, less strongly structured
0:28:25 molecule and other aspects of the molecule might play a bigger role.
0:28:33 And then, on top of that, the attention that generative AI was receiving overall, also now
0:28:37 in the field of pharma or of medicine, was building.
0:28:45 And so I ended up finding myself in a conversation where a very, I’d say, wise, long-time mentor
0:28:52 of mine pointed out that, you know, maybe 10 years from now or so, somebody could tell my
0:28:58 daughter that there was this perfect storm where this macromolecule with no training data was
0:29:02 about to save the world and could do so much more in the direction of positively impacting
0:29:03 people’s lives.
0:29:05 We didn’t have training data.
0:29:07 It would be very expensive to create it.
0:29:11 But that, despite the technologies I had been working on for
0:29:16 the last, I don’t know, 10-plus years, and the ability, because of the attention that
0:29:22 people were now giving to AI in this field, to raise quite a bit of money,
0:29:31 I, in that position, chose to stay back at my cushy dream job in big tech and not actually
0:29:35 take this opportunity to really positively impact people’s lives.
0:29:39 And that idea was not one I was willing to entertain.
0:29:44 You couldn’t just coast it out at Google and let somebody else go figure out RNA.
0:29:44 Yeah.
0:29:46 And it’s not just RNA.
0:29:49 I think RNA is a great starting point at the end of the day.
0:29:57 But building models that learn from, first of all, all the publicly available data that we
0:30:01 can possibly get our hands on, but also from data that we can reasonably effectively create
0:30:08 in our own lab, how to design molecules for specific functions, is something that now is
0:30:15 within reach and that will, in the next years and in the years to come, have completely transformational
0:30:18 impact on how we even think about what medicines are.
0:30:26 So any opportunity to speed this up, to make this happen, even just a day sooner than it could
0:30:29 have otherwise happened, is incredibly valuable, in my opinion.
0:30:36 As you’re talking about this idea that the absence of training data seems to be at the
0:30:36 center of it, right?
0:30:40 It seems to be the core problem, which makes sense, right?
0:30:44 Like, the reason language works so well is basically because of the internet.
0:30:48 I know now we’re going beyond it, but it just happened to be that there was this incredibly
0:30:52 giant set of natural language that became available.
0:30:54 We don’t have anything like that for RNA.
0:31:00 So are you, I mean, it’s kind of step one at Inceptive, creating the data?
0:31:02 Is that kind of what’s happening?
0:31:08 So step one at Inceptive is learning to use all the data, or was, I think we’ve made a lot
0:31:12 of progress in that direction, learning to use all the data that is available already, and
0:31:15 identifying what other data we’re missing.
0:31:20 And then see how far we can get with just the publicly available data, and at the same
0:31:23 time scale up generating our own data.
0:31:29 And it turns out that actually, because of the nature of evolution, because of how evolution
0:31:38 isn’t actually incentivized to really explore the entire space of possibilities, it is almost
0:31:44 always a given that if you are trying to design exceptional molecules, especially ones that
0:31:52 are not, say, you know, natural formats, you are basically guaranteed to need novel training
0:31:52 data.
0:31:53 Yeah.
0:31:57 Basically, you’re saying you build RNAs that don’t exist in the world, that have therapeutic
0:32:01 uses, and there’s no, kind of definitionally, no training data for that, because they don’t
0:32:01 exist.
0:32:07 The funny thing is, we have a few of them, and so we have existence proofs of RNA molecules,
0:32:17 for example, RNA viruses, that actually exhibit incredibly complex, different functions in our cells
0:32:23 that do all sorts of things that we don’t usually like, but if we could use those, you know, for
0:32:30 good, if we could use those, you know, in ways that would actually be aimed at fighting disease
0:32:36 rather than creating them, those kinds of functions, even just a small subset of them, would really
0:32:37 transform medicine already.
0:32:38 And so we know it’s possible.
0:32:40 What are you dreaming of when you say that?
0:32:41 What are you thinking of, specifically?
0:32:41 Okay.
0:32:49 So, for example, right, one estimate is that in order for COVID to infect you, you would need
0:32:55 potentially as few as five COVID genomes inside your organism.
0:32:55 That’s already it.
0:32:57 Five viral particles?
0:32:58 Five viral particles.
0:32:58 Yeah.
0:33:00 You inhale those.
0:33:03 You wouldn’t have to inject it.
0:33:05 You wouldn’t even have to swallow it.
0:33:06 You inhaled them.
0:33:10 What if we could have a medicine that worked as well as a disease? That’s a version of your dream.
0:33:10 Exactly.
0:33:11 Exactly.
0:33:18 So, at the end of the day, right, this medicine is able to spread in your body only into certain
0:33:20 types of organs and tissues and cells.
0:33:24 It does certain things there that are really quite complex, right?
0:33:25 Changing the cells’ behavior.
0:33:26 Yeah.
0:33:31 Again, not usually in this case in favorable ways, but still in ways that wouldn’t have
0:33:36 to be modified that much in order to potentially be exactly what you would need for a complex
0:33:37 multifactorial medicine.
0:33:42 And if you could make all of that happen by just inhaling five of those molecules, then,
0:33:45 again, that would completely change how you think about medicine, right?
0:33:50 You have viruses that aren’t immediately active, but that are inactive for long periods of time
0:33:51 in your organism.
0:33:58 And only under certain conditions, say, under certain immune conditions, really start being
0:33:59 reactivated.
0:34:06 Why can’t we have medicines that work in a similar way, not only in
0:34:11 a vaccination sense, but where, for a genetic predisposition to a
0:34:15 certain disease, you are able to design a medicine that you can take and that waits until
0:34:17 the disease actually starts to develop.
0:34:21 And only then, and only where that disease then starts to develop, does it become active and
0:34:22 actually fight it.
0:34:25 And potentially also then alarms the doctor through a blood test.
0:34:28 Like for cancer cells or something.
0:34:33 So you have some kind of prophylactic medicine in your body, and it is encoded in such a way
0:34:38 that it just hangs out there like herpes, to take a pathological example.
0:34:39 For example, yes.
0:34:42 And only in certain settings does it do anything.
0:34:46 And those settings are, if you see a cancer cell, destroy it.
0:34:47 Otherwise, just sit there.
0:34:48 Precisely.
0:34:53 And if you can design those also in ways where you can just make them all go away when you
0:34:58 take a, say, a completely harmless small molecule, and that’s, again, entirely feasible.
0:34:59 Sure.
0:35:01 So, I mean, you’re dreaming big.
0:35:04 These are wonderful big, you know, science fiction-y dreams, and I hope you figure them
0:35:05 out.
0:35:09 On a practical level, what’s happening at the company right now?
0:35:10 How many people work there?
0:35:10 What are they doing?
0:35:12 And what have they figured out so far?
0:35:13 We’re around 40.
0:35:17 What we’re doing is really exactly what we just talked about.
0:35:25 We’re basically scaling data generation experiments in our lab that allow us to assess a variety
0:35:32 of different functions of different, mostly RNA molecules, actually mostly mRNA molecules at the
0:35:38 moment, that are relevant to a pretty broad variety of different diseases.
0:35:45 And so, this ranges from things like infectious disease vaccines to cell therapies that can be applied
0:35:49 in oncology or against autoimmune disease.
0:35:56 We have mRNAs that we hope will eventually be effective in enzyme replacement as enzyme replacement
0:36:00 therapies for families of, or a large family of rare diseases.
0:36:02 And the list goes on.
0:36:10 And so, we’re creating, or growing, this training data set that eventually, on top of
0:36:17 foundation models that we pre-trained on all publicly available data, allows us to tune
0:36:25 those foundation models towards designing exceptional molecules for exactly those applications and many
0:36:26 more sharing similar properties.
0:36:33 So, you basically build new mRNA molecules and test them, and then you give that data to
0:36:38 your model, and presumably it tells you what to build next, or it helps you figure out what to
0:36:39 build next.
0:36:40 It’s sort of a loop in that way?
0:36:46 The models are definitely one interesting source for proposals, if you wish, for what to synthesize
0:36:48 and test next.
0:36:49 They’re not the only such source.
0:36:55 So, we basically also explore kind of in maybe less guided or heuristically guided ways.
0:36:57 But, exactly.
0:37:00 So, in some of the cases, it’s really quite iterative.
0:37:06 For some of those functions and for some of those modalities and diseases or disease targets,
0:37:12 we’re actually already at a point where our models can spit out entirely novel molecules that
0:37:18 really are unlike anything they’ve ever seen or we’ve ever seen in nature, that very consistently
0:37:24 perform quite favorably compared to pretty strong baselines by incumbents in the field.
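As a purely illustrative sketch, and emphatically not Inceptive’s actual pipeline, the loop described here has roughly this shape. Every name in it (propose, heuristic_designs, synthesize_and_assay, finetune) is hypothetical, the stub classes exist only to make the sketch runnable, and in reality the synthesize-and-assay step is slow, expensive wet-lab work rather than a function call.

```python
import random

class StubModel:
    """Stand-in for a pre-trained foundation model (hypothetical)."""
    def propose(self, n):
        return ["".join(random.choices("ACGU", k=30)) for _ in range(n)]
    def finetune(self, dataset):
        pass  # real fine-tuning would update model weights on new measurements

class StubLab:
    """Stand-in for wet-lab synthesis and assays (hypothetical)."""
    def heuristic_designs(self, n):
        return ["".join(random.choices("ACGU", k=30)) for _ in range(n)]
    def synthesize_and_assay(self, seq):
        return random.random()  # a real assay would measure the molecule's function

def design_loop(model, lab, rounds=5, batch_size=96):
    dataset = []
    for _ in range(rounds):
        # The model is one source of proposals; heuristic exploration is another.
        candidates = (model.propose(batch_size // 2)
                      + lab.heuristic_designs(batch_size // 2))
        # Build and measure function in the lab: the slow, expensive step.
        results = [(seq, lab.synthesize_and_assay(seq)) for seq in candidates]
        dataset.extend(results)
        # Fine-tune the foundation model on the growing set of measurements.
        model.finetune(dataset)
    return max(dataset, key=lambda r: r[1])  # best-performing design so far

best_seq, best_score = design_loop(StubModel(), StubLab())
```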
0:37:31 When you say perform quite favorably compared to baselines by incumbents in the field, I mean,
0:37:35 does that on some level mean better than what experts would think up?
0:37:40 Better than what experts can think of and also better than more traditional machine learning
0:37:41 tools can easily produce.
0:37:48 It’s like that famous moment in the Go match when AlphaGo made some move that, like, no human
0:37:49 being ever would have thought of.
0:37:50 Move 37.
0:37:51 Yes.
0:37:59 So, I would say we’ve long passed the Move 37 in the sense that our understanding of the
0:38:06 underlying biological phenomena is so incomplete that for most of the things that we’re able
0:38:09 to design for, we don’t really understand why they happen.
0:38:10 Huh.
0:38:13 When you say we, do you mean at Inceptive or do you mean just medicine in general?
0:38:15 I would say just medicine in general.
0:38:16 Okay.
0:38:21 So, Inceptive is doing this very kind of high-level work, right?
0:38:24 I mean, building what will hopefully be the foundation.
0:38:28 What’s the right amount of time in the future to ask about?
0:38:29 When will we know if it works?
0:38:31 Do you think five years?
0:38:40 So, the general idea of using generative AI and similar techniques to generate therapeutics,
0:38:46 there are some things in clinical trials that were largely designed with AI.
0:38:55 As far as I know, we’re still, maybe now we have the first trials just now starting for
0:38:58 molecules that were truly entirely designed by AI.
0:39:01 As opposed to sort of selected from a library?
0:39:03 Selected, influenced, exactly.
0:39:06 Selected, adjusted, tuned, tweaked, et cetera, right?
0:39:10 So, that’s really still only happening just now.
0:39:10 Okay.
0:39:17 But we will see, I believe, the first success or a first success of such molecules, certainly
0:39:18 within the next five years.
0:39:21 What about more narrowly the project at Inceptive?
0:39:23 It’s a similar timeframe.
0:39:30 We should be able to get molecules into the clinic in the next few years, certainly in the
0:39:31 next handful of years.
0:39:40 Now, these will not be molecules where the objective that we used in their design is,
0:39:45 you know, even remotely as complex, or where the different functions that
0:39:52 we’re designing for are even remotely as diverse as, say, what you would find in,
0:39:55 because we used this example earlier, an RNA virus.
0:39:58 These will really be, you know, simpler.
0:40:04 Those will be molecules that don’t do things that we couldn’t possibly have done before,
0:40:11 but that do them much better in ways that are more accessible, in ways that come with less side
0:40:11 effects.
0:40:15 What biotech largely is, is they make protein drugs.
0:40:21 And so if you could make an mRNA drug where you put the mRNA into the body and the body makes the protein,
0:40:25 it wouldn’t be some crazy sleeper cell that sits in your body for 20 years or whatever.
0:40:29 But it might be a more practical alternative to today’s biotech drugs.
0:40:30 Absolutely.
0:40:35 So you’ve had a kind of crash course in biology in the last few years.
0:40:35 Yes.
0:40:41 And I’m curious, like, what is, what is something that has been particularly compelling or surprising
0:40:44 or interesting to you that you have learned about biology?
0:40:46 There are countless things.
0:40:57 The biggest one or the red thread across many of them is really just how effective life is
0:41:07 at finding solutions to problems that, on one hand, are incredibly robust, surprisingly robust,
0:41:16 and on the other hand, are so different from how we would design solutions to similar problems.
0:41:17 Aha.
0:41:23 That really, this comes back to this idea that we might just not be particularly well-equipped
0:41:30 in terms of cognitive capabilities to understand biology, that basically, you know,
0:41:34 we would never think to do it this way.
0:41:38 And how we think to do it is oftentimes much more brittle.
0:41:40 Aha.
0:41:41 Brittle is an interesting word.
0:41:45 Less resilient, less able to persist under different conditions.
0:41:46 Exactly.
0:41:46 Exactly.
0:41:49 I mean, you know, we still haven’t built machines that can fix themselves, for one.
0:41:53 Which is fundamentally the miracle of being a human being.
0:41:54 Which is fundamentally the miracle of life.
0:41:56 I’m still here after going through all this.
0:41:56 Exactly.
0:41:57 Exactly.
0:41:57 Exactly.
0:42:00 And so, and of course, this is true across the scales, right?
0:42:05 From, you know, single cells all the way to complex organisms like ourselves.
0:42:18 And really just how many very different kinds of solutions life has found, and constantly
0:42:19 is finding.
0:42:22 And you see this all over the place.
0:42:30 And it’s both daunting, humbling, but also incredibly inspiring when it comes to applying
0:42:31 AI in this area.
0:42:37 Because again, I think that at least so far, it’s the best tool, and maybe actually the only
0:42:47 tool we have so far in the face of this kind of complexity, to really design interventions that go way beyond
0:42:51 what we were able to do, or are able to do, just based on our own conceptual understanding.
0:42:57 We’ll be back in a minute with the lightning round.
0:43:43 Let’s finish with the lightning round.
0:43:50 As an inventor of the Transformer model, are there particular possible uses of it that
0:43:53 worry you slash make you sad?
0:44:03 I am quite concerned about the p(doom), doomerism, whatever you want to call it, the existential-fear-
0:44:11 instilling rhetoric that is in some cases actually also promoted by people, by entities in the space.
0:44:14 So just to be clear, you’re not worried about the existential risk.
0:44:17 You’re worried about people talking about the existential risk.
0:44:27 I’m worried about the existential risk being inflated, or the perception being inflated to the extent
0:44:34 that we actually don’t look enough at some of the much more concrete and much more immediate risks.
0:44:39 I’m not going to say that the existential risk is zero; that would be silly.
0:44:44 What is a concrete and immediate risk that is, you think, under-discussed?
0:44:52 These large-scale models are such effective tools in manipulating people in large numbers already
0:44:59 today, and it’s happening everywhere for many, many different purposes by, in some cases, benevolent,
0:45:06 and in many cases, malevolent actors that I really firmly believe we need to look much more
0:45:14 at things like enabling cryptographic certification of human-generated content, because doing that
0:45:19 with the machine-generated content is not going to work, but we definitely can cryptographically
0:45:21 certify human-generated content as such.
0:45:25 Basically, watermarking or something, some way to say, a human made this.
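For the curious, here is a minimal sketch of what certifying a single piece of human-made content could look like, using the Ed25519 signatures in the widely used Python cryptography package. The genuinely hard parts of any such scheme, binding keys to real humans and distributing them at scale, are not shown here and are assumed away.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

creator_key = Ed25519PrivateKey.generate()        # held by the human creator
content = b"An essay actually written by a person."
signature = creator_key.sign(content)             # published alongside the content

# Anyone holding the creator's public key can check the claim.
public_key = creator_key.public_key()
try:
    public_key.verify(signature, content)
    print("verified: this content was signed by the creator's key")
except InvalidSignature:
    print("rejected: content was altered or not signed by this key")
```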
0:45:26 Exactly.
0:45:31 What would you be working on if you were not working in biology, on drug development?
0:45:32 Education.
0:45:37 Using artificial intelligence to democratize access to education.
0:45:43 What have you seen that has been impressive or compelling to you in that regard?
0:45:46 There are lots of little examples so far, and really countless.
0:45:53 It’s what’s happening at the Khan Academy, there are many examples of AI applied to education
0:45:56 problems in places like China, for example.
0:46:02 You have a bunch of very compelling examples in fiction, a book I really like by a guy named
0:46:08 Neal Stephenson, The Diamond Age: Or, A Young Lady’s Illustrated Primer, that I recommend if
0:46:08 you just want to…
0:46:10 Everybody in AI talks about that.
0:46:11 Well, now they do, yeah.
0:46:13 Yeah, well, now they do.
0:46:15 You liked it before it was cool, I’m sure.
0:46:20 At one point, I thought it was really, really important to ensure that Neal Stephenson knows
0:46:27 that we are about to be able to build the primer, and so I ended up having coffee with
0:46:28 him to tell him.
0:46:29 Oh, that’s great.
0:46:36 So, at the end of the day, maybe the biggest inspiration there is my daughter.
0:46:45 She’s four and a half now, and she can read okay, but I think she could, today,
0:46:52 read at, you know, a grade-school level if she had access to, you know, an AI tutor teaching her
0:46:53 how to read.
0:46:54 Does your daughter use AI?
0:46:57 Use, you know, AI chatbots?
0:47:05 Not directly without me, but we’ve actually used ChatGPT to implement an AI reading tutor
0:47:08 that works reasonably well.
0:47:11 I mean, we basically, you know, kind of as they call it now, vibe coding.
0:47:15 We vibe coded, and Amre wasn’t there for all of it.
0:47:17 It took some time, but she was there for some of it.
0:47:19 Oh, you vibe coded it with her?
0:47:24 Yeah, well, I mean, she was there, she, you know, she witnessed a good chunk of it, yes.
0:47:26 Although she was more interested in the image generation parts.
0:47:30 But yeah, we have a sketch of one that she quite enjoys.
0:47:35 So, that’s kind of like the extent of her at this age using AI directly.
0:47:48 Jakob Uszkoreit is the CEO and co-founder of Inceptive, and a co-author of the paper
0:47:49 “Attention is All You Need.”
0:47:55 Just a quick note, this is our last episode before a break of a couple of weeks,
0:47:57 and then we’ll be back with more episodes.
0:48:01 Please email us at problem@pushkin.fm.
0:48:04 We are always looking for new guests for the show.
0:48:08 Today’s show was produced by Trina Menino and Gabriel Hunter-Chang.
0:48:13 It was edited by Alexandra Gerritsen and engineered by Sarah Brugger.
0:48:22 This is an iHeart Podcast.
Jakob Uszkoreit is the CEO and co-founder of Inceptive, a biotech start-up. He’s also a co-author of “Attention is All You Need,” the paper that created transformer models. Today, transformers power chatbots like ChatGPT and Claude. They’ve also led to breakthroughs in everything from generating images to predicting the structure of proteins.
On today’s show, Jakob talks about the invention of transformer models. And he discusses how he’s using those models to try to invent new kinds of medicine, with a particular focus on RNA.
See omnystudio.com/listener for privacy information.