AI transcript
0:00:11 Hello, and welcome to the NVIDIA AI podcast.
0:00:14 I’m your host, Noah Kravitz.
0:00:15 The intersection of AI and biology is one of the most fascinating and
0:00:19 promising areas of modern technology and research.
0:00:22 My guest today is working at the leading edge of this field in his role as CTO of
0:00:26 Basecamp Research. Basecamp, who’s a member of the NVIDIA Inception Program for
0:00:30 startups, is leveraging their unprecedented knowledge of the natural world
0:00:35 to create better food, better medicines, and better products for the planet.
0:00:39 Basecamp has collected an unprecedented data set,
0:00:41 capturing orders of magnitude more diverse biological data than any public
0:00:45 resources, and they’re leveraging this data for deep learning and
0:00:48 GenAI applications. Here to shed light on what that means for his company and
0:00:52 for all of us is Phil Lorenz, Chief Technology Officer at Basecamp.
0:00:57 Phil, thanks so much for taking the time to join the podcast.
0:01:00 >> Thanks so much for having me.
0:01:01 >> How’s your trip been so far?
0:01:02 We’re recording on, I guess this is day three of GTC, so
0:01:05 you’ve been in town for a few days. How’s the conference?
0:01:08 >> It’s amazing. It’s great to see some friends who I haven’t seen in a while.
0:01:12 So that’s actually been great meeting with folks from NVIDIA that we’ve been
0:01:16 working with for a while. And yeah, I mean, lots of great networking opportunities,
0:01:20 including people that are really far outside the life science industry.
0:01:24 But it’s great to see what everyone else is doing, so really exciting, yeah.
0:01:28 >> It is, it’s nice to be back in person after several years here.
0:01:31 So let’s start with the basics.
0:01:33 Maybe you can tell us what Basecamp Research is, how you were founded, what you do.
0:01:38 >> Yeah, of course. Maybe just take a step back with respect to why we’re doing what
0:01:42 we’re doing and how we thought of this.
0:01:44 I guess if you think about the life sciences,
0:01:47 it’s probably one of the most exciting domains to apply AI to, I think.
0:01:50 And we obviously have a lot of kind of human clinical data collected
0:01:54 in the last few years and decades.
0:01:56 But when it comes to kind of life on Earth and biology as a whole, we actually
0:02:00 haven’t because there’s probably about 10 to the 26 species out there.
0:02:04 Which is a lot. And we’ve sequenced a few million.
0:02:08 And so if you kind of make that comparison in terms of what we know about life on
0:02:12 Earth, that’s about five drops of water compared to the Atlantic Ocean,
0:02:15 which is what we don’t know.
0:02:16 So if that’s the kind of place you’re starting with and
0:02:19 everything the life science industry has ever built,
0:02:22 is based on that tiny knowledge of life on Earth, that kind of slice.
0:02:26 We thought that if you want to do deep learning for the life sciences really well,
0:02:30 there’s exciting algorithms being built, exciting architectures that we can do.
0:02:34 But at the same time, we feel like, okay, there’s actually a big data problem to begin with.
0:02:39 And so to do this from first principles, what we’ve done over the last two or
0:02:44 three years is we’ve built partnerships with nature parks across five continents.
0:02:49 Including places like the Antarctic and rainforests and volcanic islands and
0:02:54 you name it.
0:02:55 And we have professional explorers that go to these places and
0:02:58 they sequence the biodiversity in these areas.
0:03:01 Lot of microbes, because that’s where the greatest diversity of life on Earth is.
0:03:06 And we do this in partnerships with these nature parks.
0:03:09 >> When you say sequence the biodiversity,
0:03:12 am I getting that right, sequence the biodiversity that they found?
0:03:15 What does that mean for kind of a lay person?
0:03:17 >> Absolutely, yeah, realizing that I’m a heavy life scientist.
0:03:20 >> I’m saying the lay person, I really mean me.
0:03:22 >> No, that’s completely fair.
0:03:23 Sequence basically means every organism has a genome, has a DNA.
0:03:27 And so sequencing the genomes of all of these unknown organisms.
0:03:32 >> Okay, and out in, just to kind of take a tangent for a second,
0:03:36 can that be done out in the field remotely?
0:03:39 Or are we at the point with gene sequencing where you can do that on site or
0:03:42 how does that work?
0:03:43 >> Yeah, so when the founders started Basecamp,
0:03:45 they actually did that on site.
0:03:47 They were on an ice cap in Iceland with a solar powered tent.
0:03:52 That’s how the company started.
0:03:53 Now, because we are doing this at scale, we’re extracting the DNA there and
0:03:57 then we’re sequencing the DNA at a much larger scale in Europe.
0:04:01 Yeah, and I think what’s exciting is, again, and this is underappreciated,
0:04:06 the vastness of unknown life on Earth.
0:04:08 We now have a database just within two years that has samples collected from
0:04:14 the Antarctic and volcanic islands and all the life that exists in these places.
0:04:20 That’s now a few orders of magnitude more diverse than all public data combined.
0:04:25 And what we’re doing on top of this is not just collect hundreds of millions of
0:04:30 new protein or DNA sequences, but also the chemical environment,
0:04:34 the geological environment, and connecting all of that information together in
0:04:39 a knowledge graph that now has about six billion relationships.
0:04:42 And so that really gives us a body of entirely new,
0:04:46 never-seen-before information.
0:04:49 >> So to go back to your analogy about drops of water in the ocean,
0:04:53 if we were at sort of five drops of water compared to the ocean of knowledge,
0:04:58 how much more knowledge have you been able to accrue in these past couple of years?
0:05:02 >> Yeah, we’re probably still a few orders of magnitude away from the Atlantic Ocean.
0:05:09 We’re trying to be maybe a cup of water getting close to that.
0:05:13 That’s kind of the goal in the next few months.
0:05:16 >> That’s a lot of progress, that’s amazing.
0:05:17 And so what do you do with the data?
0:05:19 >> There’s several things, I mean, we have a huge,
0:05:21 there’s just a huge engineering effort to just annotate all of this data,
0:05:24 to organize all of this data because that’s, it’s now bigger than most public
0:05:28 databases, and that’s actually a big undertaking.
0:05:31 The exciting application really is to see, okay,
0:05:33 there’s some architectures for deep learning and
0:05:35 biology such as structure prediction of proteins.
0:05:38 And now that we have so much more diverse data,
0:05:40 what we can do is actually leverage some of the algorithms or
0:05:43 architectures that exist and apply that to our data advantage.
0:05:46 And so one thing that we’ve now built is called BaseFold,
0:05:49 which is using a similar architecture to AlphaFold.
0:05:52 But we can actually be up to six times more accurate because we have so
0:05:55 much more diverse and additional sequence information.
0:05:58 And so that’s something that’s really exciting because there’s obviously a lot
0:06:02 of work and effort being done on using a different algorithm,
0:06:07 different methods, but actually data,
0:06:09 especially in the life sciences, makes such a big difference.
0:06:11 And so that’s something that we’re really excited to use.
0:06:14 >> I don’t know what the right way to get into this is,
0:06:15 I want to ask about what your sort of day-to-day life is as the CTO.
0:06:19 But then I’m also curious, what happens?
0:06:21 Are you working with partners across academia, industry to leverage the data,
0:06:28 to create better medicines, better foods,
0:06:30 those kinds of things we talked about in the intro?
0:06:33 So either way, talk us through kind of what you do as CTO.
0:06:37 And then maybe from there we can talk about some of the partnerships.
0:06:40 >> My role is that I do a lot of very different things at the same time.
0:06:43 The one thing I’m not doing anymore at all is coding.
0:06:46 >> [LAUGH]
0:06:47 >> I did that at the start, but that’s kind of the one thing I’m not really doing.
0:06:50 But that’s probably a good thing.
0:06:52 I think my role has kind of three, four main things.
0:06:55 The first is actually making sure that the data collection process and
0:07:00 how we enter that into our database.
0:07:02 We have a genomics team, that’s amazing.
0:07:03 And they’re doing incredible work dealing with all of this data and
0:07:07 having really high quality annotations for that.
0:07:10 That’s kind of one thing.
0:07:11 And the data engineering, how we organize all of our infrastructure,
0:07:14 the deep learning work, applying this data advantage to the most exciting
0:07:18 AI applications, and then the product team.
0:07:20 And that’s actually what you mentioned with respect to what our
0:07:24 partnerships look like with therapeutics or biotech companies.
0:07:28 Some of them, they will ask us, do you have a protein that can do this
0:07:32 function, like an enzyme that can break down plastic?
0:07:34 And then we work with them on that.
0:07:35 >> Right.
0:07:36 >> Or sometimes gene editing systems that will cure genetic diseases.
0:07:41 And we have these in our database occurring naturally.
0:07:44 But because again of our data advantage,
0:07:46 we can use generative algorithms to actually generate these assets.
0:07:50 And then they license them from us, and then they use them for
0:07:53 downstream clinical applications or whatever that might be.
0:07:56 >> I’m familiar with the basic idea of gene editing, but not much beyond that.
0:08:02 If a company comes to you and, for instance, they need an enzyme
0:08:05 that can break down plastics.
0:08:07 >> Yeah.
0:08:07 >> And let’s say, I don’t know if that occurs in the natural world or not.
0:08:11 But let’s say it doesn’t.
0:08:13 What can you do then?
0:08:14 How does that work?
0:08:15 >> So there’s kind of multiple ways in which we look at this.
0:08:20 The first one is, let’s say someone wants an enzyme to create plastic.
0:08:23 In many cases, that assumption of let’s say,
0:08:26 we’re not sure whether this exists in nature or not.
0:08:29 It’s actually that assumption is made on what we know from public data.
0:08:32 And actually, we are making new biological discoveries based on our data set all the time.
0:08:37 So that’s kind of one argument.
0:08:39 But then there’s another way of thinking about this, is that because of the data
0:08:43 advantage, when we use deep learning to optimize or
0:08:46 generate these enzymes for a specific function.
0:08:50 Because we have explored sequence space and evolution so much more,
0:08:53 we can actually understand how to get towards even these unnatural reactions.
0:08:58 Much easier in many situations.
0:09:00 >> So you said it’s been about two, a little over two years.
0:09:02 >> Yeah. >> You’ve been gathering the data sets.
0:09:04 How much has AI, the technology that’s available, the compute as well as data,
0:09:12 how much has that world changed in those couple of years relative to you thinking
0:09:19 about, well, we’re collecting this enormous amount of data and it’s fantastic.
0:09:24 And it can open so many doors.
0:09:26 But do we have enough compute?
0:09:28 Do we have the right algorithms to be able to work with it?
0:09:31 From the outside, and especially we talk a lot about generative AI these days.
0:09:35 For the mainstream, that world has exploded in that time period.
0:09:39 But for the kind of work you’re doing, has the change been as dramatic?
0:09:42 >> Absolutely, there were a couple of situations a few years ago even where we
0:09:49 had maybe not the same size of the data set, but the same foundational architecture
0:09:53 with the long genome context and all of this metadata that we collect.
0:09:58 Where I was actually thinking, oh, damn, I don’t think there’s an architecture out
0:10:01 there that can deal with our data.
0:10:03 And that is slowly starting to change.
0:10:06 And I think one of the most exciting things in kind of the deep learning applied
0:10:10 to biological tasks in the past few months and years is that what a lot of people have
0:10:14 done is think about what additional biological context can I include into my
0:10:20 language model architecture or something.
0:10:21 So not just use something from a different domain and
0:10:25 force that architecture onto a biological question, but think about how can I change
0:10:29 that model architecture in a way that represents biology much better.
0:10:33 And I think that kind of trend in the last few months is really exciting.
0:10:37 And it yields much better results as well, and that’s great.
0:10:39 >> Are you taking off the shelf models and fine training them for your,
0:10:44 I mean fine tuning, excuse me, for your own use?
0:10:47 Or are you building models from scratch?
0:10:50 How does that work?
0:10:52 >> We’re doing both.
0:10:53 So on the folding problem, we use pretty much AlphaFold’s architecture that exists
0:10:57 because I think it’s pretty good and it works.
0:11:00 And for that, it’s purely just doing a much better job because of the data that we have.
0:11:05 We’ve also built our own architectures and models, one for annotation.
0:11:10 There’s a lot of what are called functional dark matter sequences where we have a sequence,
0:11:14 but we have no idea what it does.
0:11:16 And if you have sequences from the Antarctic that have never been seen before,
0:11:19 it’s actually important for us to be able to computationally say what they do.
0:11:22 And so for that, we’ve developed some contrastive deep learning algorithms
0:11:27 to annotate them at a pretty high accuracy and we presented that at NeurIPS last year.
0:11:31 So we kind of do both.
0:11:32 It depends what we feel like is worth building something from scratch
0:11:37 versus just leveraging our data advantage.
0:11:39 But even when we built something from scratch,
0:11:41 we’re always leveraging our data advantage as well.
0:11:43 So it’s kind of, we’re doing both, whatever works.
0:11:46 >> I’m speaking with Phil Lorenz.
0:11:48 Phil is the Chief Technology Officer at Basecamp Research.
0:11:51 And we’re speaking high above the show floor here at GTC 2024,
0:11:56 our podcast recording area has a nice window view of the show floor coming to life this morning.
0:12:02 Phil, you came from an academic background.
0:12:05 You were at University of Oxford before joining Basecamp.
0:12:08 Maybe you can walk us through your journey a little bit.
0:12:11 And then we can talk a little bit about what’s the same,
0:12:13 what’s different from moving from academia into your role now.
0:12:17 >> Yeah, definitely. I’m a traditional life scientist at heart.
0:12:20 You have a lot of people in healthcare and life sciences now
0:12:23 that have a computer science background walking into that.
0:12:25 And that’s amazing. That’s incredible.
0:12:27 I’m a little bit from the kind of molecular biology, traditional background,
0:12:33 which is what I did during my undergrad.
0:12:34 And then moved towards more kind of deep learning applied to
0:12:38 genomics sequencing data for my PhD.
0:12:41 I discovered a couple of new human genes and transcription start sites
0:12:45 using deep learning during my PhD.
0:12:47 So that’s kind of what I worked on for a while.
0:12:49 One thing I always really cared about, whatever you do in the life science
0:12:53 or healthcare industry, is thinking about the problem you’re trying to solve first.
0:12:58 And then going backwards and thinking, OK, what kind of technologies, tools
0:13:02 can you use to address that problem?
0:13:04 That’s kind of always thinking about what you’re trying to do in that kind of
0:13:07 order of events. And that’s still how I think about this now.
0:13:10 Even though I wouldn’t really say I’m a traditional life scientist anymore,
0:13:14 but always thinking about what you’re trying to solve and then what do you need
0:13:17 to do in that kind of philosophy is still something I think about quite a lot.
0:13:21 Even though my PhD was quite applied, so it wasn’t too academic,
0:13:25 which is maybe a good thing.
0:13:28 But yeah, that’s kind of how I came into what I’m doing now.
0:13:31 >> Did you learn to code when you were younger, kind of out of an interest
0:13:34 in computer science and learning how to code?
0:13:37 Or was it more that, in your work in the life sciences,
0:13:40 you kind of hit a point where you realized, oh, this will be faster
0:13:43 if I learn how to write scripts.
0:13:44 >> It was almost, I started coding almost out of a necessity.
0:13:49 When I got my first kind of big data sets and at some point I was like,
0:13:54 yeah, I’m not going to use Excel for that.
0:13:56 So it was almost kind of, I wouldn’t say I was forced to, but at some point I was like,
0:14:00 well, it’s just going to make everything more efficient, faster, and so on.
0:14:03 So it was kind of great because I felt like I was coding always with a purpose
0:14:08 to do something specific for what I wanted to do with my project.
0:14:11 So in that way, I kind of always felt really motivated to go after it.
0:14:16 So that’s kind of good.
0:14:17 >> So how long have you been at Basecamp now?
0:14:19 >> Basically from the very beginning, for almost three years now.
0:14:22 >> OK.
0:14:23 And how big is your team now?
0:14:25 How has it grown out over that time?
0:14:27 >> The company as a whole is about 32 people.
0:14:30 My team is 15, but a bit less than half the company.
0:14:34 >> Can you tell a story or kind of explain a discovery along the way at Basecamp
0:14:41 that will blow our listeners’ minds?
0:14:44 Is there something about sequences from Antarctica,
0:14:48 or something undiscovered about the way ecosystems work in the desert,
0:14:54 or for that matter here in San Jose, I don’t know.
0:14:56 But something that just really sticks out is like, you know…
0:15:00 >> Yeah, I mean, one thing that I still think is a nice thing to share
0:15:05 is actually from the very beginning, the very first few days of Basecamp
0:15:10 was, I mentioned this earlier,
0:15:12 but the two founders of Basecamp, Glenn and Ollie,
0:15:15 and big kudos to them for having this kind of vision,
0:15:17 they love exploring the world.
0:15:19 They go out in the wild, they go climbing, they go up mountains, whatever.
0:15:23 I’m more fragile, I like to stay behind the screen.
0:15:26 And so at some point, a couple of months before they started Basecamp,
0:15:29 they spent over a month on an ice cap, fully off-grid in Iceland.
0:15:34 And Glenn did a lot of sequencing, DNA sequencing during his PhD.
0:15:38 So he brought these kind of mini flow cells with him,
0:15:41 these portable sequencing devices.
0:15:43 And he was just sequencing the ice cap to see what was in there.
0:15:46 You probably don’t expect much life to be happening there.
0:15:49 And so they came back with this data and they were like,
0:15:52 “Oh, Phil, you do a lot of this analysis and you do coding in your PhD.
0:15:54 Can you have a look at what’s in there?”
0:15:56 And I analyzed this data, I annotated this,
0:16:00 and something like 97% of it has never been seen before in any public database.
0:16:05 This was completely novel, had 0% similarity to anything that’s ever been seen before.
0:16:10 And that was just a random spot, Glenn did, somewhere on an ice cap.
0:16:13 We didn’t look for something new, it was just random.
0:16:16 And so from that, we just realized,
0:16:18 “Oh my God, the vastness of life on Earth is just so huge.”
0:16:21 And the opportunities lost in the life science industry by not leveraging this data more systematically,
0:16:26 that was kind of a great origin story for us to realize.
0:16:30 Like, let’s do this systematically and with deep learning in mind,
0:16:33 because a lot of the data that we have in the life sciences is us,
0:16:37 you know, having kind of all these academic endeavors sequencing here or there,
0:16:41 and fingers crossed that they’ve collected the right data, right?
0:16:43 And so we’ve kind of made this systematic and in partnership with all of these nature parks,
0:16:47 which is exciting.
0:16:49 >> Yeah, that’s amazing.
0:16:50 From the technological side, we talked a little bit before about the infrastructure,
0:16:54 the compute, the technique, sort of, I don’t want to say catching up to,
0:16:58 but sort of keeping pace with, you know, the size of your data set along the way.
0:17:04 What are some other AI machine learning related challenges that you’ve encountered at Basecamp
0:17:10 that you got past or perhaps that you’re sort of, you know, still grinding on now?
0:17:15 >> Yeah, I mean, one thing that I am most excited by that has been kind of addressed in the last few months
0:17:20 is how to deal with bigger context sizes.
0:17:23 So that’s kind of something in the language model field,
0:17:26 there’s a couple of architectures that were developed,
0:17:30 especially, I think, in Stanford last year, like Hyena and Mamba,
0:17:33 that I’m super excited by because what we’re collecting is not just protein sequences,
0:17:38 but these really long range genomic context windows, and that’s not really that common in other public data.
0:17:44 >> And so can I ask you to explain what that means?
0:17:46 >> Yes, I mean, when people sequence environmental data, not like a strain from a Petri dish,
0:17:52 but kind of these wild environmental samples where you might have 10,000 species
0:17:57 in a tiny, tiny piece of soil or something.
0:17:59 When you sequence that, usually what happens in public data is you get maybe a few thousand base pairs.
0:18:04 And if you’re lucky, there’s one gene on there.
0:18:06 What we’ve done is we’ve really tried to optimize this process in a way where we get near-full genomes,
0:18:12 so the entire genome of every single organism in that sample.
0:18:15 So hundreds of thousands of base pairs with tens of thousands of genes or whatever that might be.
0:18:20 And so with that, we’re actually understanding a lot more about the interaction of all of these genes,
0:18:25 what they do to work together and also understand more complex behavior.
0:18:28 For example, how you can use this for gene editing or therapeutic applications.
0:18:33 But modeling this with language models hasn’t really been that straightforward
0:18:37 because there weren’t really that many architectures out there to deal with that kind of information.
0:18:41 And with Hyena and Mamba, we’re now really excited that this is now possible
0:18:45 and that we have the data sets that we can apply this to.
0:18:48 So that’s something I think in terms of dealing with long context,
0:18:51 probably the most exciting development for very selfish reasons, basically.
0:18:55 But I’m super excited about that.
0:18:56 >> Very cool, very cool.
0:18:58 So you kind of hinted at this in that answer when you mentioned gene editing.
0:19:03 What are some of the applications going forward for the work that Basecamp’s doing?
0:19:09 And then I don’t know and I’m not asking you to compare to competitors or what have you.
0:19:14 But as the available data sets sampled from biology, from nature, grow,
0:19:21 as they continue to grow, what are some of the implications and applications
0:19:25 for everyday folks like me downstream from the work that you’re doing?
0:19:29 >> I think gene editing and at some point gene writing technologies
0:19:34 are going to change not just medicine but health in general.
0:19:39 We accumulate millions of mutations every day just by existing,
0:19:43 by breathing and eating, and some of them are non-significant,
0:19:46 some of them are bad, some of them may be good.
0:19:48 But for us to be able to accurately change them or write new DNA into the human genome
0:19:53 to make a change, that’s really I think kind of the next wave
0:19:58 of big therapeutic changes that we can make.
0:20:01 And the machines that can do this actually often are derived
0:20:04 from the way bacteria and viruses fight with each other.
0:20:07 So bacteria and phages, those are viruses that infect bacteria.
0:20:11 They kind of have biological warfare going against each other in the wild, in nature.
0:20:16 It’s not something you can measure in a petri dish with a sterile strain or whatever,
0:20:19 but in nature, they have warfare against each other.
0:20:22 And those machines that enact this warfare, those gene editing systems,
0:20:26 CRISPR/Cas nucleases are like one of the major headlines that came out of that.
0:20:31 There’s hundreds of millions of these systems that haven’t been discovered yet
0:20:34 and leveraging them in a way that we do, but also in a way
0:20:38 that at some point by having enough of these that we can generate them
0:20:42 or design them through language models, for example.
0:20:44 That’s a development I think is really exciting.
0:20:48 >> You know, I’m abstracting to the level that I can comprehend.
0:20:51 But the last bit that you said, I was thinking, so my kids, maybe my grandkids
0:20:56 might be able to prompt a model and edit their genes or rewrite their genes.
0:21:02 And I know it’s maybe not quite like that, but is that a future we’re headed towards?
0:21:07 >> I think, I mean, I live in Europe, so there’s always a lot of regulations to think about, maybe?
0:21:12 So I don’t know, I come from this all of it.
0:21:15 No, but joking aside, I think one of the things I can definitely imagine
0:21:19 is that if the way we monitor our DNA and our mutations
0:21:24 and the way we can address these mutations, let’s say in 20, 30 years time,
0:21:28 is something we can do in real time, I can imagine, where because of, you know,
0:21:33 sequencing in the body as we live and breathe,
0:21:36 where through some device, we detect a harmful mutation and are able to fix it within two hours.
0:21:42 Like, this sounds crazy, but I do think this is kind of where this is going in 20, 30 years.
0:21:47 And that’s kind of the science fiction scenario for therapeutics and gene technologies.
0:21:52 >> Amazing.
0:21:53 NVIDIA Inception, you’re part of it.
0:21:55 I’m not asking you to plug anything, but how’s that been?
0:21:58 And, you know, being sort of a startup on the leading edge of life sciences
0:22:05 must be, you know, in some ways, similar to other startups with similar concerns
0:22:09 around growth and funding and, you know, keeping things running and all that kind of stuff.
0:22:15 But I’m sure there’s something unique to being a startup,
0:22:19 working on, you know, discovering novel science.
0:22:22 What’s that like and what’s it been like working with Inception?
0:22:25 >> It’s been amazing.
0:22:26 We’ve been working with NVIDIA for two years, almost two years now.
0:22:30 I think it’s really exciting because a lot is happening.
0:22:33 And so, I mean, sometimes I open, you know, bioRxiv or PubMed or something.
0:22:39 It’s like, damn, can everyone please stop publishing?
0:22:41 There’s just so much happening.
0:22:42 But I actually think it’s exciting because everyone has their strengths, everyone.
0:22:47 And by having these networks of lots of life science companies and everyone has a different product,
0:22:52 they have a different strategy.
0:22:54 And so actually, some people think like, oh, are these, you know, startups all competitive with each other?
0:22:59 And in some cases, maybe, but I’m actually a lot more excited by the fact
0:23:02 that what’s really happening is we’re all growing the field.
0:23:05 We’re all growing the market.
0:23:07 And so some people offer a software, some people offer an asset.
0:23:10 Some people will offer a service, whatever it might be.
0:23:12 And so actually, the products are different, the technologies are different.
0:23:15 And so just the space growing as a whole is something that’s super exciting.
0:23:19 And NVIDIA and Inception, they’re connecting everyone and making it happen.
0:23:22 And so that’s something I’m super excited about.
0:23:24 >> That’s fantastic.
0:23:25 So you mentioned coming up with something of a traditional,
0:23:29 I hesitate to call it old school because as we’re sitting at the table,
0:23:32 I’m not going to guess, but I know you’re far younger than I am.
0:23:35 So if you’re old school, then I don’t want to think about what I am.
0:23:38 But coming up with more of a traditional life sciences background
0:23:42 and then kind of moving to a place where you moved into applying it and using technology in that way.
0:23:47 Jensen said something in the media a couple of weeks ago about giving advice to young folks,
0:23:53 you know, to focus on a domain and develop domain expertise.
0:23:57 Because, you know, the computing language of the future is just speaking, right?
0:24:02 It’s natural language.
0:24:03 And so the tools will progress so that you can leverage them in your domain.
0:24:08 >> Right, yeah.
0:24:09 >> What advice would you give to a young life scientist,
0:24:12 somebody studying biology in high school or going into undergraduate work in this age
0:24:18 where everything’s moving so fast and technology is such a big part of it?
0:24:21 >> No, I guess, yeah, I mean, I speak to a lot of kind of biologists, but also computer scientists
0:24:27 that are flying through. Basically, to the biologists,
0:24:28 my main advice is always do what you care about.
0:24:32 There’s a lot of biologists that have something they really care about,
0:24:35 but then they go into oncology because that’s where big pharma is, or whatever it is.
0:24:40 And that’s great.
0:24:41 But I actually think there’s so many areas in the life science industry
0:24:46 where the problem is not that they’re not relevant.
0:24:48 The problem is that they’re not relevant yet because by making more discoveries,
0:24:52 we’re always going to find something that’s clinically relevant.
0:24:55 Like gene editing was found through studying bacterial immunology,
0:24:58 which no one ever thought was going to be relevant therapeutically a few years later.
0:25:02 >> Right.
0:25:02 >> So I think it’s always better to do something that you’re passionate about and make it relevant
0:25:07 rather than trying to find something that oh, this is what therapeutics cares about
0:25:10 and just running after that.
0:25:12 That’s my advice to biologists. For computer scientists,
0:25:15 the main thing I find is when we have people apply to Basecamp, in interviews and so on,
0:25:21 often what I hear is people kind of saying, oh,
0:25:24 I really care about this specific type of diffusion model and I want to apply this.
0:25:28 And my counter argument often is kind of let’s discuss what we’re trying to solve first
0:25:33 and then does that help or should we think about something else?
0:25:37 Or should it be a language model or should it be, you know, regression?
0:25:40 I don’t know.
0:25:41 But basically this is often what I say when I speak to people from kind of a more technical background,
0:25:47 and this goes back to Jensen’s point about the domain: always interact well with the domain,
0:25:52 the problem you’re trying to solve.
0:25:53 And then absolutely, yeah, talk about the technology and what it’s going to identify the problem
0:25:58 with the nation first and then apply the tools.
0:26:00 Yeah, it makes good sense.
0:26:01 >> One thing I haven’t really talked about much is you probably know about things like the New York Times
0:26:06 suing OpenAI because they didn’t ask for their data.
0:26:09 So one thing that we’re doing that’s different to any other life science organization is that
0:26:13 we’re not just asking all these nature parks for consent, because the public databases never did.
0:26:18 So we don’t just ask them for consent.
0:26:19 What we’re also doing is when we license something to a partner or whatever,
0:26:24 we often do something like a revenue share with them to basically make sure that all the progress in life sciences
0:26:29 or AI that comes out of all of this data, we share with the stakeholders that originated the data.
0:26:35 So to protect biodiversity, we also have a different model where there’s kind of good data governance for it.
0:26:42 I don’t know whether, yeah, but that’s kind of one thing our team does, yeah.
0:26:45 When you started approaching the nature parks about this, were they receptive or were they confused?
0:26:52 Did they understand what you were talking about?
0:26:54 And I don’t mean to be disparaging to them, it’s unique.
0:26:57 So not to say the global south because that’s a lot of vastly different players,
0:27:02 but a majority of biodiversity is obviously in places like South America, Africa and so on.
0:27:06 They are on top of this.
0:27:07 There’s the Nagoya Protocol, which actually means you have to not just ask for consent but share benefits with them.
0:27:13 And so they are aware of this.
0:27:14 They’re almost waiting for the West to ask them and work with them.
0:27:18 So a lot of them are super on top of this.
0:27:20 Interesting, yeah.
0:27:21 We’re just the only ones.
0:27:22 We have a team that is part of the United Nations Convention on Biological Diversity.
0:27:26 So that’s a long title.
0:27:29 But they’re working with nature parks, local governments, national governments to make these deals.
0:27:34 And sometimes that takes months to have an agreement.
0:27:37 But that way we know that for every single data point in our database,
0:27:42 we don’t just have consent and permission from where that comes from.
0:27:45 But also when we see a commercial success through something, we can share some of that with them.
0:27:50 And that incentivizes obviously an even bigger data supply chain, which is exciting.
0:27:56 Phil, for listeners who want to find out more about what Basecamp Research is up to,
0:28:00 there’s a website, should they go there or where should they go?
0:28:03 Yeah, absolutely.
0:28:04 Our website, very simple, Basecamp-research.com.
0:28:08 I think we have LinkedIn, Twitter and so on as well.
0:28:10 You can email me if you want.
0:28:13 Fantastic.
0:28:14 Absolutely, yeah.
0:28:15 Great.
0:28:15 Well, thanks again for taking the time out of GTC to speak with us.
0:28:19 It goes without saying, but it’s fascinating, fascinating work you’re doing.
0:28:22 And I only understand it on the level of a couple of drops,
0:28:25 not the whole ocean, but can’t wait to see what the rest of the year holds for you and for Basecamp.
0:28:30 Thank you so much.
0:28:30 This has been great.
0:28:31 Cheers.
0:28:32 [MUSIC PLAYING]
Basecamp Research is on a mission to capture the vastness of life on Earth at an unprecedented scale. Phil Lorenz, chief technology officer at Basecamp Research, discusses using AI and biodiversity data to advance fields like medicine and environmental conservation with host Noah Kravitz in this AI Podcast episode recorded live at the NVIDIA GTC global AI conference. Lorenz explains Basecamp’s systematic collection of biodiversity data in partnership with nature parks worldwide and its use of deep learning to analyze and apply it for use cases such as protein structure prediction and gene editing. He also emphasizes the importance of ethical data governance and touches on technological advancements that will help drive the future of AI in biology.