AI transcript
0:00:16 Hello, and welcome to the NVIDIA AI podcast. I’m your host, Noah Kravitz. You’ve heard of
0:00:20 language models, video models, reasoning models, and foundational models. And here on the podcast,
0:00:25 we’ve talked a lot about healthcare-specific AI models for things like protein structure
0:00:30 prediction. Well, today, we’re exploring disease models. The Cytoreason disease model is a
0:00:35 comprehensive model of human diseases that models and compares treatments in patient groups,
0:00:41 helping researchers of all levels make data-driven decisions across the drug development life cycle.
0:00:46 That brief description doesn’t really do justice to what disease models and Cytoreason as a company
0:00:52 are all about, but our guest is here to help. Shai Shen-Orr is co-founder and chief scientist at
0:00:57 Cytoreason, and professor of systems immunology and precision medicine at the Technion,
0:01:02 the Israel Institute of Technology. Shai is here to tell us all about Cytoreason,
0:01:07 how it got started, why the technology is so important, and what they’re trying to do.
0:01:12 And we’re grateful to have you here. So, Shai, welcome, and thanks for joining the AI podcast.
0:01:17 Oh, thanks, Noah. Pleasure to be here. Thank you for inviting me to speak about my favorite subject,
0:01:17 I guess.
0:01:22 Please, and just take it right from there. I don’t even need a more pointed question. Tell us
0:01:27 about your favorite subject, maybe start with your background a little bit, and then tell us how
0:01:30 Cytoreason came to be and what it’s all about.
0:01:38 Sure. So, yeah, I’ll go back, I guess, at this point, I can think of myself as a bit of a dinosaur in
0:01:44 the space of doing computational biology, data science. I started back in the, I guess, late 20th
0:01:49 century, as they say, with the idea where basically, you know, we’re just starting, the human genome is
0:01:56 getting sequenced, and the realization that biology is making a leap from a one tube, one result
0:02:01 type field to one tube, a million results type field.
0:02:02 Okay.
0:02:08 And suddenly there’s room for, you know, what evolved to what I think now we think about is
0:02:14 data science and AI in the context of medicine and life sciences and healthcare.
0:02:21 And for me, that discovery of falling in love with biology, you can do, I was kind of doing
0:02:27 a lot of stuff around AI in the late 90s, as they say, very different space.
0:02:29 Eons ago, yeah.
0:02:35 But discovering that you can actually use the same kind of the same type of thinking, but
0:02:42 in a space such as life sciences and healthcare was to me profound and kind of changed my life
0:02:48 course, realizing that in this space, actually not only, you know, is the data interesting
0:02:54 and there’s a good, I think, humanitarian cause, but also the AI challenges are profound
0:03:00 because this data is, I often call it deep more than big, right?
0:03:05 I have, you know, a big experiment is a million measurements on a hundred people, right?
0:03:10 And so there’s way more, you know, way more features, of the P much greater than N type,
0:03:14 and that brings really interesting problems.
0:03:19 And how do you build machine learning models that actually overcome this when there’s not
0:03:23 that much of a repeat kind of information to learn from?
0:03:24 Right.
0:03:27 And that, that brings in a lot of prior knowledge and we’ll get to talking about that, I guess.
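The "P much greater than N" problem described above can be made concrete with a small sketch. It is only an illustration under assumed numbers (20 samples, up to 1,000 features, all pure noise), and the function names are hypothetical. It shows that with few samples and many features, some feature will appear to track the outcome purely by chance.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def best_noise_correlation(n_samples, n_features, seed=0):
    """Strongest |correlation| between a random outcome and pure-noise features.

    Returns the running maximum after each added feature, so the effect of
    measuring more features on the same small cohort can be read off directly.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    running_max, best = [], 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, outcome)))
        running_max.append(best)
    return running_max


if __name__ == "__main__":
    curve = best_noise_correlation(n_samples=20, n_features=1000)
    for p in (1, 10, 100, 1000):
        print(f"best |r| among first {p:>4} noise features: {curve[p - 1]:.2f}")
```

None of the features carries any signal, yet the best apparent correlation only grows as features are added, which is why a model fitted naively to such data tends to overfit.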
0:03:28 Okay.
0:03:29 Yeah.
0:03:31 So that’s kind of, I came from this systems immunology.
0:03:33 I’m a faculty member at the Technion.
0:03:38 And I realized, I guess, in the 20 years I’ve been doing this type of work, I realized that
0:03:44 as biologists we’re kind of fascinated with that one tube, one million results, I realized that we’re
0:03:52 actually in these amazing times where data is exploding, but value, insight from it does
0:03:54 not explode at the same rate, right?
0:03:59 The gap between data and insight, actually you can think about it like data is exponential,
0:04:00 insight is linear.
0:04:06 Every day, the percent of data utilized to give insight is lower.
0:04:10 And the question is, how do you overcome this?
0:04:16 And how do you develop these techniques to ultimately bridge what I call the data insight
0:04:16 gap?
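A minimal sketch of the "data is exponential, insight is linear" picture follows. The growth rates are illustrative assumptions, not figures from the conversation: data is assumed to double every year, and insight capacity to grow by a fixed half unit per year.

```python
def utilized_fraction(years, data_doubling=1.0, insight_per_year=0.5):
    """Fraction of available data turned into insight, year by year.

    Hypothetical assumptions: data starts at 1 unit and doubles every
    `data_doubling` years; insight capacity starts at 1 unit and grows
    linearly by `insight_per_year`.
    """
    fractions = []
    for t in range(years + 1):
        data = 2 ** (t / data_doubling)      # exponential growth
        insight = 1 + insight_per_year * t   # linear growth
        fractions.append(min(1.0, insight / data))
    return fractions


if __name__ == "__main__":
    for year, frac in enumerate(utilized_fraction(5)):
        print(f"year {year}: {frac:.1%} of data utilized")
```

Whatever the exact rates, a linear numerator over an exponential denominator shrinks toward zero, which is the gap being described.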
0:04:20 And whereas biologists, molecular biologists, have been investing a huge amount in
0:04:25 making these amazing tools that can measure basically now every layer of human biology
0:04:31 comprehensively, the analytical side of this and the AI solutions for this have been missing.
0:04:38 The field is still largely a manual field where you give people some data, they sit in front
0:04:42 of their computer, you know, they try to figure out, they make some value and insight from this.
0:04:45 And I figured that’s not a sustainable solution.
0:04:52 And this field needs to move to ultimately build much larger integrative solutions that bring
0:04:59 in many different angles of machine learning, AI, statistics, and so forth to ultimately bridge
0:04:59 this.
0:05:03 And it needs to be done in a way that, you know, ultimately is reproducible and productized.
0:05:05 And that’s kind of what launched Cytoreason.
0:05:13 So I, we founded Cytoreason in 2016 with the aim of basically building a pharma AI company that
0:05:16 is not a biotech company that does not develop drugs.
0:05:23 It develops an analytical platform, an integrated AI solution to bridge the data insight gap.
0:05:25 So that’s, I guess, our origin story.
0:05:33 And so then if you’re not, if Cytoreason isn’t a biotech company, do you serve biotech companies?
0:05:34 Who are some of the customers?
0:05:40 Or maybe if that’s not the right way to get into it, tell us a little bit about what Cytoreason
0:05:40 offers.
0:05:47 So I think it’s actually a great place to get into it because this data insight gap exists
0:05:48 throughout life sciences.
0:05:52 Everywhere now, biology is a big or deep data field, as I said.
0:05:58 And the question now is, you know, okay, well, you know, where is it going to matter
0:06:01 the most to bridge the gap between the data and the insight?
0:06:09 And nowhere is it more important or cost effective and I think ultimately brings the right utility
0:06:11 to humanity than to close it in drug development.
0:06:13 I don’t know if you’re familiar with the numbers.
0:06:14 They’re horrendous, right?
0:06:22 If, you know, a drug today costs $2.5 billion to develop, most of that cost is actually, it
0:06:24 needs to overcome the failures.
0:06:25 It’s a failure business.
0:06:30 Most drugs that you try to develop, even if from the first drug that you put into a human,
0:06:35 the likelihood of actually failing is 90% to ultimately not making it, right?
0:06:38 And if you’re talking about many such drugs, if you’re a pharma company and you need to be
0:06:43 developing and you have many different assets that are being developed, well, you need some
0:06:48 kind of a scalable solution to ensure that your success rate over time grows.
0:06:52 And it’s, you know, it’s a no brainer that it needs to be a data-driven solution.
0:06:56 So Cytoreason customers are some of the world’s largest pharma companies.
0:07:04 Pfizer, Sanofi are, you know, examples of companies that Cytoreason has longstanding relationships
0:07:09 with, but the same problems that happen with Pfizer, you know, who’s developing hundreds of
0:07:11 molecules in parallel, happen in a biotech.
0:07:17 So anybody who’s developing a drug, and I would even argue diagnostics and so forth, has this
0:07:21 problem of how do I make decisions that are data-driven at scale?
0:07:21 Right.
0:07:31 And so do the Cytoreason models allow researchers, pharmacists, to predict the effects of a drug
0:07:32 they’re working on?
0:07:34 How does it, how does that work?
0:07:35 Just kind of in lay terms.
0:07:36 Sure.
0:07:41 So, you know, in terms of the user base, it’s really, the Cytoreason platform
0:07:42 is an enterprise solution.
0:07:47 You know, we’re trying to address the needs of data scientists who, you know, whose work
0:07:49 because of the data insight gap keeps growing.
0:07:50 Right.
0:07:51 Right.
0:07:56 And, and as you said, you’re, you’re working with some of the biggest pharma companies and
0:07:57 institutions in the world.
0:07:59 So this is a lot of data, big gap, getting bigger.
0:08:00 Correct.
0:08:00 Correct.
0:08:06 And so what a data scientist needed to do, you know, two, three years ago in a pharma
0:08:08 company, it just keeps growing.
0:08:12 It’s like 10x, because the data keeps exploding and nobody wants to make a decision.
0:08:13 Yeah.
0:08:18 Having just suffered from the problem that they didn’t have time to take the most appropriate
0:08:21 data set and analyze it and figure out what it is that they need to do.
0:08:21 Right.
0:08:27 So it’s data scientists, it’s biologists who are not necessarily programming, though this
0:08:29 is becoming less of an issue, I think, now.
0:08:29 Yeah.
0:08:34 You can touch on that, but who are ultimately driving their particular drug programs and
0:08:40 need to make decisions around those in the context of the competition, the standards
0:08:43 of care, what other pharma companies are doing.
0:08:49 And it then goes up to, you know, heads of therapeutic areas who need to choose what is the, you know,
0:08:53 not only do I want to develop this drug, what’s the right disease to go after?
0:08:55 Well, you know, there’s many diseases.
0:09:00 They’re only going to give me so many shots on goal to fail or succeed.
0:09:02 I need to make those choices, right?
0:09:06 Some of those considerations are commercial, but many of them are scientific.
0:09:07 And that’s what Cytoreason brings to the table.
0:09:10 And it goes on and on in that space.
0:09:14 You can think about portfolio management, people who make strategic decisions.
0:09:15 Those are the user base.
0:09:21 And Cytoreason basically brings in all the world’s molecular data in humans right now, integrates
0:09:28 it into a single model that allows us to learn from this and ultimately support decisions,
0:09:31 use cases, such as how do I prioritize a target?
0:09:34 Which target or combination of targets are prioritized?
0:09:37 Which diseases should I prioritize for my next trial?
0:09:42 Or what subpopulations should we be excluding or including from the trial?
0:09:43 Because they’ll succeed.
0:09:50 So those are really expensive decisions, complicated ones.
0:09:54 And we basically try to bring a yardstick to all the science, the molecular science that’s
0:09:55 out there.
0:09:56 Right.
0:09:57 It’s a big yardstick.
0:10:00 I want to get to how you decided to start building agentic workflows.
0:10:05 And if there was something specific, specific kind of challenge, whether on the science end
0:10:09 of things or in wrangling different types of data and that kind of thing.
0:10:14 But maybe walk us through a little bit kind of how you architected, how you built Cytoreason
0:10:19 and then, you know, describe now the agentic workflows you’re using and go into why a little
0:10:20 bit.
0:10:20 Yeah.
0:10:20 Sure.
0:10:23 I mean, I think in some ways I already gave a bit of a clue.
0:10:24 Yeah.
0:10:27 Because I described the data insight gap, right?
0:10:33 So imagine you live in a field and I’ll give you an example just to make this kind of real.
0:10:37 If you’re talking about the scientific literature, my field, I mentioned in the beginning, I’m
0:10:39 a systems immunologist.
0:10:40 I work on the immune system.
0:10:44 In immunology, every two minutes, a new paper comes out.
0:10:45 Wow.
0:10:45 Okay.
0:10:47 It’s kind of humbling, right?
0:10:48 Every two minutes.
0:10:53 And even if you’re going to argue that half of them are not worth reading, you’re still going
0:10:53 to run out of time.
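The arithmetic behind "a new paper every two minutes" can be checked in a few lines. The two-minute interval and the "half are worth reading" share come from the conversation; the 30 minutes of reading time per paper is an added assumption for illustration only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def reading_backlog(minutes_between_papers=2, share_worth_reading=0.5,
                    minutes_to_read=30):
    """Return (papers/year, papers worth reading, hours needed, years of
    nonstop reading needed per calendar year)."""
    papers_per_year = MINUTES_PER_YEAR / minutes_between_papers
    worth_reading = papers_per_year * share_worth_reading
    hours_needed = worth_reading * minutes_to_read / 60
    hours_in_year = MINUTES_PER_YEAR / 60
    return papers_per_year, worth_reading, hours_needed, hours_needed / hours_in_year


if __name__ == "__main__":
    papers, worth, hours, ratio = reading_backlog()
    print(f"{papers:,.0f} papers/year, {worth:,.0f} worth reading")
    print(f"{hours:,.0f} reading hours needed, {ratio:.1f}x the hours in a year")
```

Under these assumptions, one year of worthwhile immunology papers would take about seven and a half years of round-the-clock reading.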
0:11:00 Now, similarly, if you’re talking about single cell data, RNA-seq data, gene expression,
0:11:05 proteomics, all, you know, things that, you know, our audience may be familiar with or
0:11:06 not, it doesn’t matter.
0:11:11 That data is like, while I’m sitting here working, while we’re talking, there’s data coming out
0:11:15 that could be very valuable to drive my decision.
0:11:22 So Cytoreason as a company is, by its own definition of its goal and vision, needs to
0:11:26 somehow beat this exponential growth in data, right?
0:11:33 So it immediately says, I say this to every employee at Cytoreason, you know, say 80%
0:11:36 of your time you spend on whatever your job is.
0:11:43 20% you have to spend on how do I make my job obsolete and automated, because I have the
0:11:45 next challenge to do.
0:11:51 Because that data, if we don’t, we’re going to be beaten by the avalanche of data coming
0:11:51 in.
0:11:58 So that, if you think about this as a pitch for agentic AI, you know, it doesn’t get better
0:11:58 than that.
0:12:03 I’m just, I’m seeing a commercial in my head with like agents and doctor scrubs and
0:12:08 track shoes, you know, just running as fast as they can to stay ahead of the data, just
0:12:08 piling up.
0:12:09 Right, right.
0:12:10 But yeah, that’s, yeah.
0:12:10 Exactly.
0:12:15 So you basically are constantly in a game in which you need to make it faster.
0:12:18 Just, you know, it’s actually what’s called, you know, the, in evolution.
0:12:21 And then you remember Alice in Wonderland, the Red Queen?
0:12:22 Sure.
0:12:22 Right.
0:12:27 Where she said to Alice, you have to run just to stay in place.
0:12:27 Yeah.
0:12:27 Yep.
0:12:28 Right.
0:12:32 And it’s also an evolutionary principle of how, you know, viruses and the immune system
0:12:33 combat.
0:12:34 This is another topic.
0:12:36 I can talk to you about this another time.
0:12:37 But the Red Queen effect.
0:12:45 So this need for us to continuously run is a huge driver for automation, acceleration, and
0:12:53 I would even say the cognitive meta-analysis that we as humans need to do to somehow describe
0:12:56 to a machine how we make decisions so that we can automate them.
0:12:57 Right.
0:13:01 So with that in mind, you know, I think Cytoreason almost had the thought that we need
0:13:03 agentic AI even before agentic AI was there.
0:13:03 Right.
0:13:03 Yeah.
0:13:07 And of course, when it came around, we jumped on the bandwagon.
0:13:08 Yep.
0:13:11 And so, so it starts at the earliest stages.
0:13:13 I need to bring the data in.
0:13:14 Right.
0:13:18 To bring the data in, you could go the manual route, right?
0:13:24 Which is like to have people bring it in one by one, totally unsustainable given the data
0:13:25 keeps growing.
0:13:27 A paper every four minutes if we include the bad half.
0:13:28 Yeah.
0:13:29 That’s a lot of data.
0:13:29 Yeah.
0:13:32 Or, you know, data sets and so forth.
0:13:33 Like at every molecular level.
0:13:39 So you really cannot do this manually and strive to get that level, right?
0:13:43 You can build pipelines and automate how you want to process this.
0:13:48 And as soon as you do this more and more with kind of molecular level data, you realize that
0:13:51 biology has a lot of these exceptions and outliers and so forth.
0:13:57 And so ultimately, a more appropriate solution is to teach a machine a workflow that may be
0:14:01 very complicated, where humans make decisions, but you can see it, and then start that
0:14:02 automating process.
0:14:04 Now, would it work perfectly from day one?
0:14:07 Depends on the complexity of the data that you’re going to.
0:14:12 But if you then, you know, you put a QC process that you start with manually and then
0:14:18 you make that automated as well and so forth, you can build processes that really accelerate
0:14:19 your data intake.
0:14:23 And that’s just the most obvious place where the agentic AI comes in, right?
0:14:27 It can come in other places around, you know, decision supports.
0:14:29 We’ll talk about this one.
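The intake idea described above, where clean data is accepted automatically and a QC step catches the exceptions, could be sketched roughly as follows. This is a hypothetical illustration, not Cytoreason's pipeline: the `Dataset` fields, the thresholds, and the function names are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    n_samples: int
    missing_fraction: float  # share of missing measurements, 0..1


@dataclass
class IntakeResult:
    accepted: list = field(default_factory=list)
    needs_review: list = field(default_factory=list)


def automated_qc(ds, min_samples=5, max_missing=0.2):
    """Return a list of QC problems; an empty list means the dataset passes."""
    problems = []
    if ds.n_samples < min_samples:
        problems.append("too few samples")
    if ds.missing_fraction > max_missing:
        problems.append("too many missing values")
    return problems


def intake(datasets):
    """Accept clean datasets automatically; route exceptions to a human queue."""
    result = IntakeResult()
    for ds in datasets:
        problems = automated_qc(ds)
        if problems:
            result.needs_review.append((ds.name, problems))
        else:
            result.accepted.append(ds.name)
    return result


if __name__ == "__main__":
    batch = [Dataset("study_a", 40, 0.01), Dataset("study_b", 3, 0.5)]
    print(intake(batch))
```

The design choice is that people only see the outliers. As each recurring exception is understood, it can become another automated check, which shrinks the manual queue over time.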
0:14:35 So thinking about keeping up with the data, the literature in particular, were there specific
0:14:41 techniques or, you know, there are obvious advantages we’ve talked about, just agents being
0:14:46 able to go out and do the research and grab the data kind of obviously is a, you know, a game
0:14:52 changer, but other things that you discovered about working with agents to curate and review
0:14:56 the medical literature in particular that jump out at you?
0:14:59 Well, I think it’s a wonderful question.
0:15:01 I’ll answer, I think, on two angles.
0:15:08 So first of all, just to say why in such a data-rich field somebody needs literature, which
0:15:10 you can think about it as data, right?
0:15:11 You can say, well, that’s data, right?
0:15:17 But I actually want to uniquely identify that data from other data because I would argue that
0:15:23 literature is already at a stage of knowledge and biology has a lot of data that isn’t yet
0:15:23 knowledge.
0:15:27 Cytoreason deals with all the data, but also the literature.
0:15:32 And the reason we need it is because of, well, A, people want the knowledge, right?
0:15:37 But the other side, which is more interesting, is when I describe that this is a deep data
0:15:42 field where there’s way more features than there are kind of measurements or samples and
0:15:48 so forth, the way that people make decisions, you basically cannot just stick this into, you
0:15:53 know, a machine learning model, because it’s going to basically overfit, right?
0:15:58 And the way you kind of deal with this is actually by the integration of prior data, which comes
0:16:02 from the literature, that allows you to narrow down the search space in a variety of fields.
0:16:03 Right, okay.
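One simple way to see why narrowing the search space with prior knowledge helps is through multiple-testing arithmetic. The sketch below uses a Bonferroni correction as a stand-in; the conversation does not name a specific method, and the gene names and counts are made up for illustration.

```python
def bonferroni_threshold(alpha, n_hypotheses):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_hypotheses


def narrow_by_prior(candidates, literature_supported):
    """Keep only candidates that prior knowledge already links to the disease."""
    return [c for c in candidates if c in literature_supported]


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 measured genes, 200 with literature support.
    all_genes = [f"GENE{i}" for i in range(20000)]
    supported = {f"GENE{i}" for i in range(0, 20000, 100)}
    shortlist = narrow_by_prior(all_genes, supported)

    print(len(all_genes), bonferroni_threshold(0.05, len(all_genes)))
    print(len(shortlist), bonferroni_threshold(0.05, len(shortlist)))
```

With the same small cohort, testing 200 literature-supported candidates instead of 20,000 relaxes the per-test threshold a hundredfold, so a real effect is easier to detect.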
0:16:04 It has two advantages.
0:16:08 One advantage is that you do make, it’s easier for you to make discoveries.
0:16:12 And there’s another advantage, which relates to our customer base.
0:16:17 And I think in general, on how people make decisions, large decisions in the face of uncertainty,
0:16:24 which is that they want to stand on the shoulders of giants or at least stand on some level of
0:16:25 confidence, right?
0:16:33 So being able to connect new novel discoveries, emerging phenomena and so forth that the AI
0:16:40 model produced to knowledge that I, you know, I solidly believe in and affirm is actually an important
0:16:46 thing for our customer base and for any scientist to actually make the leap, right?
0:16:50 Because it’s going to, the next stage could be, you know, it usually would be an experiment.
0:16:54 It can sometimes be a very expensive experiment.
0:16:57 And people, it’s not enough just to have a predictive model.
0:17:02 What our customers are seeking from us, and what we strive for it
0:17:04 to be, is a mechanistic model.
0:17:08 Explain to me why that prediction makes sense.
0:17:08 Right.
0:17:10 And give me trust in it.
0:17:12 And the literature brings that piece in, right?
0:17:17 So from an agentic AI perspective, that means also, for instance, as an example, confidence
0:17:21 scores on the literature are a really key thing for us, right?
0:17:23 Because this literature is complicated.
0:17:29 There’s not a huge amount of instances of any one event being, how sure are we that this
0:17:36 particular kind of description of a biological event is actually correct?
0:17:43 And that for us was a huge piece of how we’ve been pushing the LLMs within
0:17:45 Cytoreason and then the agentic AI workflows.
0:17:47 And it goes, I kind of mentioned, it goes everywhere here.
0:17:53 We need to be really sure, you know, gen AI is awesome and agentic AI, but we need to be,
0:17:54 we need to have the high quality.
0:17:57 And so we’ve been putting a lot of these guardrails, if you like.
0:17:57 Yeah.
0:17:59 Where do the confidence scores come from?
0:18:01 Are you, is CiderEase generating them?
0:18:02 Are they in the literature?
0:18:07 So you basically can come up with a variety of different techniques in which by sampling
0:18:09 the literature, right?
0:18:13 And also fitting, you can, you know, by sampling the literature, you can build that confidence
0:18:17 on one hand by putting an LLM RAG component, right?
0:18:22 So you’re actually doing retrieval augmented generation and kind of querying this to be more
0:18:24 certain about what it is that I’m looking for.
0:18:24 Right.
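As a rough sketch of building confidence "by sampling the literature," one can repeatedly draw small sets of passages, ask a judge whether each set supports a claim, and report the share of agreeing rounds. Everything here is an assumption for illustration: `keyword_judge` is a toy stand-in for a retrieval-augmented LLM call, and the scoring rule is not taken from the conversation.

```python
import random


def keyword_judge(claim, passages):
    """Hypothetical stand-in for an LLM + retrieval call.

    Says the claim is supported when most sampled passages mention every
    term of the claim. A real system would query a model here.
    """
    terms = claim.lower().split()
    hits = sum(all(t in p.lower() for t in terms) for p in passages)
    return hits * 2 > len(passages)


def sampled_confidence(claim, corpus, judge=keyword_judge,
                       n_rounds=50, k=3, seed=0):
    """Fraction of random k-passage samples on which the judge supports the claim."""
    rng = random.Random(seed)
    k = min(k, len(corpus))
    votes = sum(judge(claim, rng.sample(corpus, k)) for _ in range(n_rounds))
    return votes / n_rounds


if __name__ == "__main__":
    corpus = [
        "IL6 drives inflammation in rheumatoid arthritis.",
        "Blocking IL6 reduced inflammation markers in patients.",
        "This cohort study reports sleep quality outcomes.",
        "IL6 levels and inflammation were correlated in biopsies.",
    ]
    print(sampled_confidence("il6 inflammation", corpus))
```

A score near 1 means the claim is supported no matter which slice of the literature is drawn; a middling score flags a claim that rests on only a few passages.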
0:18:26 All of those, and there’s a variety of other techniques.
0:18:32 There’s also the kind of the, what we call biological expectations or bio-credibility in the end
0:18:33 to check ourselves on this.
0:18:35 And so that it’s a loop that keeps improving.
0:18:40 All of those are techniques that allow us to basically build the confidence that we need
0:18:41 for, for these heavy decisions.
0:18:47 On one hand, there’s the necessity to leverage AI and agentic AI to basically move forward
0:18:48 and do this on a large scale.
0:18:50 And on the other hand, to ensure the confidence is high.
0:18:53 I’m speaking with Shai Shen-Orr.
0:18:59 Shai is co-founder and chief scientist at Cytoreason, the company we’ve been talking about.
0:19:04 And he’s also professor of systems immunology and precision medicine at the Technion, Israel
0:19:05 Institute of Technology.
0:19:12 We’ve been talking about Cytoreason and just recently the importance of building trust in
0:19:13 the model’s output.
0:19:17 And, you know, it’s something that applies to generative AI in any situation.
0:19:23 But as you were saying, Shai, in biology, in precision medicine and drug discovery and
0:19:28 pharma and everything, these decisions are both, you know, literally can be life and death for
0:19:33 lots of people, but also quite expensive and involve, you know, saying go involves a lot of
0:19:35 resources being put to use, a lot of money being spent.
0:19:41 And it made me think, Shai, we’ve had a couple of conversations on the podcast with folks in
0:19:48 the protein sequence prediction and generation space and other drug discovery related spaces.
0:19:54 And I’m wondering about Cytoreason’s place in the workflow, in the researcher’s workflow or
0:19:56 the end user, whoever’s using it.
0:20:01 And, you know, when you mentioned about making these decisions and experiments being expensive,
0:20:03 I’ve talked to folks before.
0:20:11 I’ve read about folks using AI models, generative AI, to do sort of simulated experiments, right,
0:20:16 before moving to the wet lab, being able to run and kind of narrow down which ones are worth
0:20:18 the cost and the effort to do.
0:20:23 Are your customers using Cytoreason kind of in the same way, or what does a workflow look like?
0:20:28 And then out of that, I wanted to ask you if there’s a time you can share with us where
0:20:34 Cytoreason’s workflow enabled something, you know, really unique from an end user.
0:20:36 So talk a little bit about the workflow, if you could.
0:20:37 Sure.
0:20:42 So from a workflow perspective, there’s maybe two points to say.
0:20:47 And it sounds like you have been talking to some interesting people doing interesting stuff around
0:20:47 protein structures.
0:20:50 So I’ll differentiate ourselves from it.
0:20:54 So Cytoreason is a company, and this is also interesting, I think, almost from the NVIDIA
0:20:57 kind of marketplace and kind of the company.
0:21:04 Cytoreason is a company that NVIDIA invested in, and I think we stand out uniquely within those.
0:21:10 Because if I look at the healthcare flow, right, there’s the chemistry of it, right?
0:21:15 There is the biology of it, and there’s the kind of clinical side of it.
0:21:15 Okay, right.
0:21:21 And if you look where AI has been playing a big role at this point, it’s certainly been in the
0:21:24 chemistry space, chemical structure, and I would put protein structures there as well.
0:21:29 Like from small-molecule libraries to protein structures, there’s a huge amount that’s happening
0:21:36 with kind of, you know, NVIDIA GPUs and, you know, generative AI and so forth to basically
0:21:40 build those molecules, and of course, anything that gets, you know, built there, there’s a
0:21:43 simulation test, but ultimately somebody puts in experimental tests.
0:21:50 And experimental tests are usually, I would say, at early stages, they are what would be called
0:21:51 an in vitro experiment.
0:21:57 There’s no animal, there’s no human, you know, you’re just testing to see, well, okay,
0:22:03 this antibody that I just kind of simulated and generated, does it hold the properties and
0:22:05 can it be a good, you know, a good direction.
0:22:11 On the clinical side, there’s also a lot of, I think, agentic AI happening with a lot of
0:22:17 kind of shortening, say, kind of recruitment for clinical trials and so forth, right?
0:22:20 There’s a lot happening in the electronic medical record space.
0:22:22 It’s a relatively well-defined space.
0:22:27 There’s a huge amount of almost like human operations that goes there, and then I think
0:22:29 agentic AI has been playing a big role in.
0:22:34 Cytoreason is quite unique in that we’re focused on the biology side of things.
0:22:38 And biology, if you compare it to the other two, is actually the biggest unsolved problem.
0:22:39 Yeah.
0:22:45 I would say today, if I look at pharma, the two big problems from a science perspective are
0:22:51 one is that we don’t have a good understanding of the biology, and you see it in clinical trials
0:22:56 that phase two, which is the first time we test it in humans, is where the biggest failure
0:22:57 rates are.
0:22:57 Right, okay.
0:22:58 Okay, so it tells you.
0:23:02 And then the second piece is the human diversity.
0:23:08 So the biology, it can be, you know, you and I may not have, you know, we may have the same
0:23:09 indication and so forth.
0:23:12 We actually may look very different and could be for different causes.
0:23:14 We still don’t have a good understanding of this.
0:23:16 That’s where Cytoreason is playing.
0:23:22 And bringing AI there is, you know, the search space is way, way, way bigger than the chemistry.
0:23:25 And so it’s an early stage to build on.
0:23:30 But it’s clearly the biggest problem in that I think where we’ll see companies going on.
0:23:32 And certainly that’s kind of where we’ve been kind of leading.
0:23:38 And NVIDIA, you know, I think putting their trust in us has been a huge thing
0:23:39 for us.
0:23:44 So, you know, I think if I look at that space, how are the users behaving there?
0:23:46 Well, first of all, they need to explore the disease biology.
0:23:49 And then they need to think about their use cases.
0:23:55 And again, the use cases is what would be a good target to choose from, given I need to
0:23:57 have it work in this particular disease.
0:24:02 And given that I know, you know, this is, you know, this disease already has a bunch of
0:24:04 standards of care that I need to beat.
0:24:04 Right.
0:24:07 And I know that there’s people who are not responding.
0:24:11 And what is it that’s about them that maybe I can target?
0:24:16 And there’s other companies that are developing and there may be, you know, out to market before
0:24:16 me.
0:24:22 So they need to, all of these commercial questions need to come into a scientist thinking about
0:24:26 disease biology and saying, where’s the niche that I can come in?
0:24:33 And so whether it’s target prioritization or bigger than this indication, choose the next
0:24:34 clinical trial.
0:24:35 Is it happening in RA?
0:24:37 Is it happening in Crohn’s disease?
0:24:39 Is it happening in, you know, in Alzheimer’s?
0:24:42 That’s not an easy, those are the use cases.
0:24:46 And so if you think about what Cytoreason brings in, it tells you this particular target is the best
0:24:48 priority to go for this disease.
0:24:51 Here’s a bunch of mechanisms why we think this is the case.
0:24:55 And the users can go and do small tests.
0:24:58 It’s very different from the protein structure ones I mentioned before.
0:25:05 But small tests that actually validate, you know, the top hypotheses, build confidence in
0:25:09 the AI prediction, and then you go and execute on it.
0:25:12 What’s the feedback been like from your users?
0:25:14 And I’m wondering, I mean, go anywhere you want with this.
0:25:19 I’m wondering if there are certain areas that have been brought to your attention to focus in
0:25:25 on whether it’s that the users have been kind of poking at a certain area and wanting more
0:25:31 functionality, or if maybe something you didn’t expect popped up and it’s a different path to
0:25:31 look at.
0:25:31 Sure.
0:25:35 So I think in general, so it’s a very interesting question.
0:25:37 There’s a lot of these, right?
0:25:39 So in general, the users want a lot.
0:25:40 Yeah, no, right.
0:25:43 What does any user want right now?
0:25:43 Yeah.
0:25:48 But I’ll just mention a few directions that you’ll just see how they themselves struggle,
0:25:48 right?
0:25:53 So on one hand, you can think about this as I’m invested in a particular asset.
0:25:58 I just paid or I invested a huge amount of hundreds of millions of dollars to manufacture
0:25:58 a drug.
0:26:00 And what I want to do is deepen.
0:26:07 I want to study that, get every possible layer and model every possible layer here so that my
0:26:08 prediction is the best.
0:26:13 And on the other hand, orthogonal to this is you can say, and this is, you know, obviously
0:26:17 like a person who’s a program lead for that drug.
0:26:19 If you ask him, that’s what he wants to do.
0:26:25 Then you go to somebody who’s in charge of an AI strategy for a pharma company or is the
0:26:26 head of a therapeutic area.
0:26:28 And they say, well, that’s one drug.
0:26:35 I obviously, you know, it’s, I care about it, but I have a hundred drugs that I actually
0:26:40 am simultaneously developing and I need to evaluate them across tens, if not sometimes
0:26:41 hundreds of diseases.
0:26:43 We need scale, right?
0:26:50 Those two are orthogonal and you need to basically take care to do both, because the science
0:26:51 is always in the depth, right?
0:26:55 And the commercial problems are often in the breadth, right?
0:26:56 And you need to do them, right?
0:26:59 Other pieces of challenges come from new data types.
0:27:04 Biology keeps inventing, or biologists, new measurement modalities.
0:27:10 So I can model a tissue and say, here’s the mRNA in it, or I can model a tissue and say,
0:27:16 well, I’ve modeled, I took the mRNA, I’ve developed methodologies to describe that this biopsy
0:27:17 actually is made up of cells.
0:27:23 And now new technology allows me to say, well, I can tell you the geographical position of
0:27:24 every cell and how they interact.
0:27:28 So as new technologies come out, well, let’s add them into the model.
0:27:32 And they never come out with a huge amount of data in this field that’s deep data.
0:27:39 It’s like, we have 10 samples that we generated with a new technology that each file is a
0:27:40 gigabyte of data.
0:27:47 And so again, it comes to this, how do I enter a prior in so that, you know, on one end, I’m
0:27:52 aware of the fact that I only have 10 samples and the world’s population diversity is bigger
0:27:54 than 10 patients for this, right?
0:27:57 On the other hand, I use this new technology to contain.
0:28:03 And so Cytoreason’s models, from that perspective, and the word model here is deceptive in some
0:28:03 sense.
0:28:07 We develop what I would call hybrid models, right?
0:28:11 So on one hand, we have services that are deep learning and LLM and so forth.
0:28:16 And on the other hand, we have places where, you know, it would be standard traditional statistics
0:28:22 and statistical learning and rule-based because the problem, the richness of the data is so
0:28:23 big.
0:28:26 Like, you know, there’s very few places in biology today where
0:28:30 you can just stick the data into a deep learning model and you’ll get good performance, right?
0:28:33 Maybe it’s images and genetics and protein structure.
0:28:35 Everywhere else, there’s just not enough data.
0:28:37 And you need to somehow overcome these things.
0:28:43 And so what we build, our model, is ultimately an integrative framework that calls a lot of different
0:28:49 services, with many different solutions, each tailored for the different components,
0:28:50 and then integrates them.
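The small-sample point in this answer (10 samples from a new technology, combined with a prior) can be illustrated with a minimal sketch. It assumes a normal prior and normal measurement noise with known variances. The function name, the numbers, and the choice of a conjugate normal model are all hypothetical illustrations, not a description of Cytoreason's method.

```python
from statistics import mean


def shrink_to_prior(samples, prior_mean, prior_var, noise_var):
    """Posterior mean and variance of a normal mean under a normal prior.

    Assumes the per-sample noise variance is known (a simplifying assumption).
    """
    n = len(samples)
    prior_precision = 1.0 / prior_var   # how much we trust older knowledge
    data_precision = n / noise_var      # how much we trust the new samples
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (
        prior_precision * prior_mean + data_precision * mean(samples)
    )
    return post_mean, post_var


# Hypothetical readings from 10 samples measured with a new technology.
new_tech_samples = [2.1, 1.8, 2.6, 2.4, 1.9, 2.2, 2.8, 2.0, 2.3, 2.5]

# Hypothetical prior, e.g. summarised from older, larger datasets.
post_mean, post_var = shrink_to_prior(
    new_tech_samples, prior_mean=1.0, prior_var=0.25, noise_var=1.0
)
print(post_mean, post_var)
```

With only 10 samples the estimate is pulled part of the way from the sample mean toward the prior, and the posterior variance is smaller than the prior variance alone. As more samples arrive, the data term dominates and the prior matters less.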
0:28:55 What do you see as the future of biomedicine?
0:29:01 What do you see the researcher’s, the scientist’s, you know, sort of the job looking like, and specifically
0:29:06 the tools, right, in a few years, you know, whatever the timeframe is, two, three years,
0:29:10 five years, 10 years, whatever timeframe makes sense to you from what you’ve seen.
0:29:12 What do you see that role looking like?
0:29:18 And what do you see the technology component looking like in a couple of years?
0:29:23 And I’m thinking about both, you know, everything you described in the industry and balancing
0:29:27 research and science and all of these, you know, the data and everything, but also something
0:29:31 you said about what you say to your own employees when they, I don’t know if you said
0:29:36 when they start, but, you know, like you need to figure out how to automate, you know,
0:29:39 make yourself obsolete, automate what you’re doing away because there’s so much more we
0:29:40 have to do.
0:29:42 So what, yeah, where are we headed?
0:29:43 I think it’s a wonderful question.
0:29:47 Obviously, I will only claim this as my viewpoint.
0:29:48 Exactly.
0:29:49 Yeah.
0:29:54 I feel like, you know, I personally feel I’ve been blessed in that
0:29:56 I encountered biology when I did.
0:29:56 Yeah.
0:30:01 And then I ended up in what is simultaneously an infinite field, right?
0:30:09 As in, we will not solve all of biology in my lifetime, even with agentic AI and so forth.
0:30:16 And on the other hand, a field that has been ripe to start thinking in a more, I often call
0:30:21 it engineering fashion, but a rule kind of basically building principles on which you can actually
0:30:23 teach machines to help you.
0:30:29 So from my perspective, as I look at this and I, you know, I think about the job of computational
0:30:34 biologists and the job of biologists and the job of clinicians, all of which are critical
0:30:37 to ultimately bring that healthcare to patients.
0:30:43 I think all of these people have been blessed with, with now solutions that allow them to take
0:30:47 yesterday’s thing, automate it to a level they could never imagine.
0:30:53 It was like a science fiction thing and then get busy with the next cool thing that they
0:30:54 couldn’t even imagine.
0:30:55 Yeah.
0:30:55 Right.
0:30:58 And, you know, I’ll give you another example in biology.
0:31:00 You know, you keep discovering new things.
0:31:02 It’s a field of unknown unknowns.
0:31:07 Oftentimes when I bring in data scientists who never had any exposure to biology, one of the
0:31:12 things they struggle with in Cytoreason is they expect to have a gold standard, like that
0:31:13 I know what the truth is.
0:31:15 And I’m like, well, we don’t know.
0:31:20 We, you know, we have, we have, we’re, we’re continuing to see vipses.
0:31:23 We’re in an unknown, unknown space.
0:31:23 Right.
0:31:28 And so, so I think the challenges, it’s not the only field in science in which this is the
0:31:31 case, but, but I think those, those challenges are amazing.
0:31:38 And actually the necessity for us, the obligation that we have, I think, to bring in AI and machine
0:31:43 learning in, to accelerate our, our ability to actually bring cures to people.
0:31:51 I see this as an obligation and I’m not afraid of the situation of, you know, of suddenly a
0:31:54 machine doing what it is because there’s always the next thing.
0:31:57 And, and, and it’s actually why I got into this, right.
0:31:58 Is the fascination with the discovery.
0:32:04 And so I think that’s a good way of giving hope to the publisher.
0:32:04 Absolutely.
0:32:05 Yeah.
0:32:05 No, no, no.
0:32:06 Absolutely.
0:32:11 Shai, I think that was a great place to end kind of a, um, an uplifting, I don’t want to
0:32:15 say hopeful because it implies, you know, a lack of hope in other situations, but like
0:32:20 you said, there’s, there’s no end of, of hard problems and cool things to do.
0:32:24 And so, um, you know, using the tools to get the old ones done faster so we can get to the
0:32:25 new stuff.
0:32:26 It’s a great way of looking at it.
0:32:27 Usually, Shai,
0:32:32 I kind of wrap these episodes by asking the guest where listeners can go to find out
0:32:34 more about everything we’re talking about.
0:32:37 And I definitely want to do that with you, but first I understand you’ve got a
0:32:38 podcast to plug.
0:32:39 I do.
0:32:40 I do.
0:32:40 I do.
0:32:41 You play the host role.
0:32:41 Yeah.
0:32:42 Tell us about it.
0:32:43 Yeah.
0:32:48 It’s, uh, thanks for mentioning, uh, it’s called Tech on Drugs and I
0:32:54 basically interview, uh, interesting people from all walks of life, mostly scientists and
0:32:59 clinicians, I would say, who are coming up with new innovative technologies, uh, whether
0:33:04 it’s computational and sometimes they’re experimental as well that allow us to, you know, bring
0:33:05 drug development to the next stage.
0:33:10 And there’s a huge amount there about AI, uh, from all.
0:33:16 Well, like I was saying, we’ve, I, I, I’d heard of, you know, protein structure, uh, prediction
0:33:16 before.
0:33:17 Right.
0:33:19 So we’ve, we’ve talked a little bit about it on the pod.
0:33:22 So I imagine you have plenty of fertile ground to cover there.
0:33:24 Uh, Tech on Drugs, Tech on Drugs.
0:33:25 Yes.
0:33:25 On Spotify.
0:33:26 Okay.
0:33:27 And it’s available on Spotify.
0:33:29 All the regular channels.
0:33:30 Yes.
0:33:30 Fantastic.
0:33:32 So check that out as well.
0:33:36 More information about Cytoreason, the website, cytoreason.com.
0:33:39 Is there a research blog, other social channels?
0:33:41 Do you cover all that on your podcast?
0:33:41 Right.
0:33:44 So not on the podcast, but there’s a website.
0:33:46 There’s a, we’re on LinkedIn, uh, quite actively.
0:33:51 And, and, uh, so those are probably the best resources to get in touch with, uh, folks at
0:33:51 Cytoreason.
0:33:52 Great.
0:33:56 Well, Shai, uh, again, thank you so much for making the time to talk with us.
0:34:01 Like I said, it’s vital work as, as you kind of alluded to, and we both mentioned, you
0:34:05 know, coming up with ways to extend and improve people’s lives, but the energy you bring to
0:34:08 it and that sense of like, yeah, let’s get this done.
0:34:09 The next cool thing’s around the corner.
0:34:10 I think it’s awesome.
0:34:11 It’s really inspiring.
0:34:14 And for me, you know, personally, I’ll carry it, carry that with me.
0:34:19 So thanks again for taking the time, all the best of luck to, to you and your teams.
0:34:19 Thank you so much.
0:00:20 language models, video models, reasoning models, and foundational models. And here on the podcast,
0:00:25 we’ve talked a lot about healthcare-specific AI models for things like protein structure
0:00:30 prediction. Well, today, we’re exploring disease models. The Cytoreason disease model is a
0:00:35 comprehensive model of human diseases that models and compares treatments in patient groups,
0:00:41 helping researchers of all levels make data-driven decisions across the drug development life cycle.
0:00:46 That brief description doesn’t really do justice to what disease models and Cytoreason as a company
0:00:52 are all about, but our guest is here to help. Shai Shen Or is co-founder and chief scientist at
0:00:57 Cytoreason, and professor of systems immunology and precision medicine at the Technion,
0:01:02 the Israel Institute of Technology. Shai is here to tell us all about Cytoreason,
0:01:07 how it got started, why the technology is so important, and what they’re trying to do.
0:01:12 And we’re grateful to have you here. So, Shai, welcome, and thanks for joining the AI podcast.
0:01:17 Oh, thanks, Noah. Pleasure to be here. Thank you for inviting me to speak about my favorite subject,
0:01:17 I guess.
0:01:22 Please, and just take it right from there. I don’t even need a more pointed question. Tell us
0:01:27 about your favorite subject, maybe start with your background a little bit, and then tell us how
0:01:30 Cytoreason came to be and what it’s all about.
0:01:38 Sure. So, yeah, I’ll go back, I guess, at this point, I can think of myself a bit of a dinosaur in
0:01:44 the space of doing computational biology, data science. I started back in the, I guess, late 20th
0:01:49 century, as they say, with the idea where basically, you know, we’re just starting, the human genome is
0:01:56 getting sequenced, and the realization that biology is making a leap from a one tube, one result
0:02:01 type field to one tube, a million results type field.
0:02:02 Okay.
0:02:08 And suddenly there’s room for, you know, what evolved to what I think now we think about is
0:02:14 data science and AI in the context of medicine and life sciences and healthcare.
0:02:21 And for me, that discovery of falling in love with biology, you can do, I was kind of doing
0:02:27 a lot of stuff around AI in the late 90s, as they say, very different space.
0:02:29 Eons ago, yeah.
0:02:35 But discovering that you can actually use the same kind of the same type of thinking, but
0:02:42 in a space such as life sciences and healthcare was to me profound and kind of changed my life
0:02:48 course, realizing that in this space, actually not only, you know, is the data interesting
0:02:54 and there’s a good, I think, humanitarian cause, but also are the AI challengers are profound
0:03:00 because this data is, I often call it deep more than big, right?
0:03:05 I have, you know, a big experiment is a million measurements on a hundred people, right?
0:03:10 And so there’s way more, you know, way more features of P is greater, greater than N type
0:03:14 and that brings really interesting problems.
0:03:19 And how do you build machine learning models that actually overcome this when there’s not
0:03:23 that much of a repeat kind of information to, to learn form.
0:03:24 Right.
0:03:27 And that, that brings in a lot of prior knowledge and we’ll get to talking about that, I guess.
0:03:28 Okay.
0:03:29 Yeah.
0:03:31 So that’s kind of, I came from this systems immunology.
0:03:33 I’m a faculty member at the Technion.
0:03:38 And I realized, I guess, in the 20 years I’ve been doing this type of work, I realized that
0:03:44 as biology were kind of fascinated with that two, one million results, I realized that we’re
0:03:52 actually in this amazing times where data is exploding, but value, insight from it does
0:03:54 not explode in the same rate, right?
0:03:59 The gap between data and insight, actually you can think about it like data is exponential,
0:04:00 insight is linear.
0:04:06 Every day, every day percent data utilized to give insight is lower.
0:04:10 And the question is, how do you overcome this?
0:04:16 And how do you develop these techniques to ultimately bridge what I call the data insight
0:04:16 gap?
0:04:20 And, and, and whereas biologists, molecular biologists have investing a huge amount in
0:04:25 making this amazing tools that can measure basically now every layer of human biology
0:04:31 comprehensively, the analytical side of this and the AI solutions for this have been missing.
0:04:38 The field is still largely a manual field where you give people some data, they sit in front
0:04:42 of their computer, you know, they try to figure out, they make some value and insight for this.
0:04:45 And I figured that’s not a sustainable solution.
0:04:52 And this field needs to move to ultimately build much larger integrative solutions that bring
0:04:59 in many different angles of machine learning, AI statistics, and so forth to ultimately bridge
0:04:59 this.
0:05:03 And it needs to be done in a way that’s all, you know, ultimately is reproducible and productized.
0:05:05 And that’s kind of what launched Cideries.
0:05:13 So I, we founded Cideries in 2016 with the aim of basically building a pharma AI company that
0:05:16 is not a biotech company that does not develop drugs.
0:05:23 It develops an analytical platform, an integrated AI solution to bridge the data inside gap.
0:05:25 So that’s, I guess, our origin story.
0:05:33 And so then if you’re not, if Cideries isn’t a biotech company, do you serve biotech companies?
0:05:34 Who are some of the customers?
0:05:40 Or maybe if that’s not the right way to get into it, tell us a little bit about what Cideries
0:05:40 offers.
0:05:47 So I think it’s actually a great place to get into it because this data inside gap exists
0:05:48 throughout life sciences.
0:05:52 Everywhere now, biology is a big or deep data field, as I said.
0:05:58 And the question now is, you know, okay, well, you know, where is it going to be the matter
0:06:01 the most to bridge the gap between the data and the inside?
0:06:09 And nowhere is it more important or cost effective and I think ultimately brings the right utility
0:06:11 to humanity than to close it in drug development.
0:06:13 I don’t know if you’re familiar with the numbers.
0:06:14 They’re horrendous, right?
0:06:22 If, you know, a drug today costs $2.5 billion to develop, most of that cost is actually, it
0:06:24 needs to overcome the failures.
0:06:25 It’s a failure business.
0:06:30 Most drugs that you try to develop, even if from the first drug that you put into a human,
0:06:35 the likelihood of actually failing is 90% to ultimately not making it, right?
0:06:38 And if you’re talking about many subdrugs, if you’re a pharma company and you need to be
0:06:43 developing and you have many different assets that’s being developed, well, you need some
0:06:48 kind of a scalable solution to ensure that your success rate over time grows.
0:06:52 And it’s, you know, it’s a no brainer that it needs to be a data-driven solution.
0:06:56 So Cytareason customers are some of the world’s largest pharma companies.
0:07:04 Pfizer, Sanofi are, you know, examples of companies that Cytareason has longstanding relationships
0:07:09 with, but the same problems that happen with Pfizer, you know, who’s developing hundreds of
0:07:11 molecules and layers, it happens in a biotech.
0:07:17 So anybody who’s developing a drug, and I would even argue diagnostics and so forth, has this
0:07:21 problem of how do I make decisions that are data-driven at scale?
0:07:21 Right.
0:07:31 And so do the Cytareason models allow researchers, pharmacists, to predict the effects of a drug
0:07:32 they’re working on?
0:07:34 How does it, how does that work?
0:07:35 Just kind of in lay terms.
0:07:36 Sure.
0:07:41 So, so, you know, in terms of the user base for the, for, for it’s really the Cytareason platform
0:07:42 is an enterprise solution.
0:07:47 You know, we’re trying to address the needs of data scientists who, you know, whose work
0:07:49 because of the data inside gap keeps growing.
0:07:50 Right.
0:07:51 Right.
0:07:56 And, and as you said, you’re, you’re working with some of the biggest pharma companies and
0:07:57 institutions in the world.
0:07:59 So this is a lot of data, big gap, getting bigger.
0:08:00 Correct.
0:08:00 Correct.
0:08:06 And so what a data scientist needed to do, you know, two, three years ago in a pharma
0:08:08 company, it’s like, keeps growing.
0:08:12 It’s like TEDx because the data keeps exploding and nobody wants to make a decision.
0:08:13 Yeah.
0:08:18 Having just suffered from the problem that they didn’t have time to take the most appropriate
0:08:21 data set and analyze it and figure out what it is that they need to do.
0:08:21 Right.
0:08:27 So it’s data scientists, it’s biologists who are not necessarily programming, though this
0:08:29 is becoming less of an issue, I think, now.
0:08:29 Yeah.
0:08:34 You can touch on that, but who are ultimately driving their particular drug programs and
0:08:40 need to make decisions in the, around those in the context of the competition, the standard
0:08:43 of cares, what other pharma are, so we’re doing.
0:08:49 And it then goes up to, you know, heads of therapeutic areas who need to choose what is the, you know,
0:08:53 not only do I want to develop this drug, what’s the right disease to go after?
0:08:55 Well, you know, there’s many diseases.
0:09:00 They’re only going to give me so many shots on goal to fail or succeed.
0:09:02 I need to make those choices, right?
0:09:06 Some of those considerations are commercial, but many of them are scientific.
0:09:07 And that’s what Sider Reason brings to the table.
0:09:10 And it goes on and on in that space.
0:09:14 You can think about portfolio management, people who make strategic decisions.
0:09:15 Those are the user base.
0:09:21 And Sider Reason basically brings in all the world’s molecular data in humans right now, integrates
0:09:28 it into a single model that allows us to learn from this and ultimately support decisions,
0:09:31 use cases, such as how do I prioritize a target?
0:09:34 Which target or combination of targets are prioritized?
0:09:37 Which diseases are I prioritized for my next trial?
0:09:42 Or what subpopulations should we be excluding or including from the trial?
0:09:43 Because they’ll succeed.
0:09:50 So those are really expensive decisions, complicated ones.
0:09:54 And we basically try to bring a yardstick to all the science, the molecular science that’s
0:09:55 out there.
0:09:56 Right.
0:09:57 It’s a big yardstick.
0:10:00 I want to get to how you decided to start building agentic workflows.
0:10:05 And if there was something specific, specific kind of challenge, whether on the science end
0:10:09 of things or in wrangling different types of data and that kind of thing.
0:10:14 But maybe walk us through a little bit kind of how you architected, how you built Sider Reason
0:10:19 and then, you know, describe now the agentic workflows you’re using and go into why a little
0:10:20 bit.
0:10:20 Yeah.
0:10:20 Sure.
0:10:23 I mean, I think in some ways I already gave a bit of a clue.
0:10:24 Yeah.
0:10:27 Because I described the data insight gap, right?
0:10:33 So imagine you live in a field and I’ll give you an example just to make this kind of real.
0:10:37 If you’re talking about the scientific literature, my field, I mentioned in the beginning, I’m
0:10:39 a systems immunologist.
0:10:40 I work on the immune system.
0:10:44 In immunology, every two minutes, a new paper comes out.
0:10:45 Wow.
0:10:45 Okay.
0:10:47 It’s kind of humbling, right?
0:10:48 Every two minutes.
0:10:53 And even if you’re going to argue that half of them are not worth reading, you’re still going
0:10:53 to run out of time.
0:11:00 Now, similarly, if you’re talking about single cell data, RNA-seq data, gene expression,
0:11:05 proteomics, all, you know, things that, you know, our audience may be familiar with or
0:11:06 not, it doesn’t matter.
0:11:11 That data is like, while I’m sitting here working, while we’re talking, there’s data coming out
0:11:15 that could be very valuable to drive my decision.
0:11:22 So Cider Reason as a company is, by its own definition of its goal and vision, needs to
0:11:26 somehow beat this exponential growth in data, right?
0:11:33 So it immediately says, I say this to every employee at Cider Reason, you know, say 80%
0:11:36 of your time you spend on whatever your job is.
0:11:43 20% you have to spend on how do I make my job obsolete and automated so because I have the
0:11:45 next challenge to do.
0:11:51 Because that data, if we don’t, we’re going to be beaten by the avalanche of data coming
0:11:51 in.
0:11:58 So that, if you think about this as a pitch for agentic AI, you know, it doesn’t get better
0:11:58 than that.
0:12:03 I’m just, I’m seeing a commercial in my head with like agents and doctor scrubs and
0:12:08 track shoes, you know, just running as fast as they can to stay ahead of the data, just
0:12:08 piling up.
0:12:09 Right, right.
0:12:10 But yeah, that’s, yeah.
0:12:10 Exactly.
0:12:15 So you basically are constantly in a game in which you need to make it faster.
0:12:18 Just, you know, it’s actually what’s called, you know, the, in evolution.
0:12:21 And then you remember Alice in Wonderland, the Red Queen?
0:12:22 Sure.
0:12:22 Right.
0:12:27 Where she said to Alice, you have to run just to stay in place.
0:12:27 Yeah.
0:12:27 Yep.
0:12:28 Right.
0:12:32 And it’s also an evolutionary principle of how, you know, viruses in the immune system
0:12:33 combat.
0:12:34 This is another topic.
0:12:36 I can talk to you about this another time.
0:12:37 But the Red Queen effect.
0:12:45 So this need for us to continuously run is a huge driver for automation, acceleration, and
0:12:53 I would even say the cognitive meta-analysis that we as humans need to do to somehow describe
0:12:56 to a machine how we make decisions so that we can automate them.
0:12:57 Right.
0:13:01 So with that in mind, you know, I think almost Saturday reason had the thought that we need
0:13:03 Agenda KI even before Agenda KI was there.
0:13:03 Right.
0:13:03 Yeah.
0:13:07 And of course, when it came around, we jumped on the bandwagon.
0:13:08 Yep.
0:13:08 Yep.
0:13:08 Yep.
0:13:08 Yep.
0:13:11 And so, so it starts at the earliest stages.
0:13:11 So it starts at the earliest stages.
0:13:13 I need to bring the data in.
0:13:14 Right.
0:13:18 To bring the data in, you could go the manual route, right?
0:13:24 Which is like to have people bring it in one by one, totally unsustainable given the data
0:13:25 keeps growing.
0:13:27 A paper every four minutes if we include the bad half.
0:13:28 Yeah.
0:13:29 That’s a lot of data.
0:13:29 Yeah.
0:13:32 Or, you know, data sets and so forth.
0:13:33 Like at every molecular level.
0:13:39 So you really cannot do this manually and strive to get that level, right?
0:13:43 You can build pipelines and automate that you want to process this.
0:13:48 And as soon as you do this more and more with kind of molecular level data, you realize there’s,
0:13:51 in biology, has a lot of these exceptions and outliers and so forth.
0:13:57 And so ultimately, a more appropriate solution is to teach a machine a workflow that may be
0:14:01 very complicated, where humans make decisions, but you can see it, and then start that
0:14:02 automating process.
0:14:04 Now, would it work perfectly from day one?
0:14:07 Depends on the complexity of the data that you’re going to.
0:14:12 But if you then, you know, you put a QC process that you start with manually and then
0:14:18 you make that automated as well and so forth, you can build processes that really accelerate
0:14:19 your data intake.
0:14:23 And that’s just the most obvious place where the agendic AI comes in, right?
0:14:27 It can come in other places around, you know, decision supports.
0:14:29 We’ll talk about this one.
0:14:35 So thinking about keeping up with the data, the literature in particular, were there specific
0:14:41 techniques or, you know, there are obvious advantages we’ve talked about, just agents being
0:14:46 able to go out and do the research and grab the data kind of obviously is a, you know, a game
0:14:52 changer, but other things that you discovered about working with agents to curate and review
0:14:56 the medical literature in particular that jump out at you?
0:14:59 Well, I think it’s a wonderful question.
0:15:01 I’ll answer, I think, on two angles.
0:15:08 So first of all, just to say why in such a data-rich field somebody needs literature, which
0:15:10 you can think about it as data, right?
0:15:11 You can say, well, that’s data, right?
0:15:17 But I actually want to uniquely identify that data from other data because I would argue that
0:15:23 literature is already at a stage of knowledge and biology has a lot of data that isn’t yet
0:15:23 knowledge.
0:15:27 Ciderism deals with all the data, but also the literature.
0:15:32 And the reason we need it is because of, well, A, people want the knowledge, right?
0:15:37 But the other side, which is more interesting, is when I describe that this is a deep data
0:15:42 field where there’s way more features than there are kind of measurements or samples and
0:15:48 so forth, the way that people make decisions, you basically cannot just stick this into, you
0:15:53 know, a machine learning model and it’s going to be basically an overfit, right?
0:15:58 And the way you kind of deal with this is actually by the integration of prior data, which comes
0:16:02 to the literature that allows you to narrow down the search space in a variety of fields.
0:16:03 Right, okay.
0:16:04 It has two advantages.
0:16:08 One advantage is that you do make, it’s easier for you to make discoveries.
0:16:12 And there’s another advantage, which is relates to our customer base.
0:16:17 And I think in general, on how people make decisions, large decisions in the face of uncertainty,
0:16:24 which is that they want to stand on the shoulders of giants or at least stand on some level of
0:16:25 confidence, right?
0:16:33 So being able to connect new novel discoveries, emerging phenomenas and so forth that the AI
0:16:40 model produced to knowledge that I, you know, I solidly believe in a firm is actually an important
0:16:46 thing for our customer base and for any scientist to actually make the leap, right?
0:16:50 Because it’s going to, the next stage could be, you know, it usually would be an experiment.
0:16:54 It can sometimes be a very expensive experiment.
0:16:57 And people, it’s not enough just to have a predictive model.
0:17:02 People are seeking from our, our customers are seeking from us and where we strive for it
0:17:04 to be a mechanistic model.
0:17:08 Explain to me why that prediction makes sense.
0:17:08 Right.
0:17:10 And give me trust in it.
0:17:12 And the literature brings that piece in, right?
0:17:17 So from an agendic and I perspective, that means also, for instance, as an example, confidence
0:17:21 scores on the literature are a really key thing for us, right?
0:17:23 Because this literature is complicated.
0:17:29 There’s not a huge amount of instances of any one event being, how sure are we that this
0:17:36 particular kind of description of a biological event is actually correct?
0:17:43 And that for us was a huge piece of entering of how we kind of been pushing the LLMs within
0:17:45 CiderEase and then the agendic AI workflows.
0:17:47 And it goes, I kind of mentioned, it goes everywhere here.
0:17:53 We need to be really sure, you know, gen AI is awesome and agendic AI, but we need to be,
0:17:54 we need to have the high quality.
0:17:57 And so we’ve been putting a lot of these guardrails, if you like.
0:17:57 Yeah.
0:17:59 Where do the confidence scores come from?
0:18:01 Are you, is CiderEase generating them?
0:18:02 Are they in the literature?
0:18:07 So you basically can come up with a variety of different techniques in which by sampling
0:18:09 the literature, right?
0:18:13 And also fitting, you can, you know, by sampling the literature, you can build that confidence
0:18:17 on one hand by putting an LLM RAG component, right?
0:18:22 So you’re actually doing retrieval augmented generation and kind of querying this to be more
0:18:24 certain about what it is that I’m looking for.
0:18:24 Right.
0:18:26 All of those, and there’s a variety of other techniques.
0:18:32 There’s also the kind of the, what we call biological expectations or bio-credibility in the end
0:18:33 to check ourselves on this.
0:18:35 And so that it’s a loop that keeps improving.
0:18:40 All of those are techniques that allow us to basically build the confidence that we need
0:18:41 for, for these heavy decisions.
0:18:47 On one hand, to leverage the necessity to leverage an AI and a genetic guide to basically move forward
0:18:48 and do this on a large scale.
0:18:50 And on the other hand, to ensure the confidence is high.
0:18:53 I’m speaking with Shai Shenor.
0:18:59 Shai is co-founder and chief scientist at Cytoreason, the company we’ve been talking about.
0:19:04 And he’s also professor of systems immunology and precision medicine at the Technion, Israel
0:19:05 Institute of Technology.
0:19:12 We’ve been talking about Cytoreason and just recently the importance of building trust in
0:19:13 the model’s output.
0:19:17 And, you know, it’s something that applies to generative AI in any situation.
0:19:23 But as you were saying, Shai, in biology, in precision medicine and drug discovery and
0:19:28 pharma and everything, these decisions are both, you know, literally can be life and death for
0:19:33 lots of people, but also quite expensive and involve, you know, saying go involves a lot of
0:19:35 resources being put to use, a lot of money being spent.
0:19:41 And it made me think, Shai, we’ve had a couple of conversations on the podcast with folks in
0:19:48 the protein sequence prediction and generation space and other drug discovery related spaces.
0:19:54 And I’m wondering about Cytoreason’s place in the workflow, in the researcher’s workflow or
0:19:56 the end user, whoever’s using it.
0:20:01 And, you know, when you mentioned about making these decisions and experiments being expensive,
0:20:03 I’ve talked to folks before.
0:20:11 I’ve read about folks using AI models, generative AI, to do sort of simulated experiments, right,
0:20:16 before moving to the wet lab, being able to run and kind of narrow down which ones are worth
0:20:18 the cost and the effort to do.
0:20:23 Are your customers using Cytoreason kind of in a same way or what does a workflow look like?
0:20:28 And then out of that, I wanted to ask you if there’s a time you can share with us where
0:20:34 Cytoreason’s workflow enabled something, you know, really unique from an end user.
0:20:36 So let me talk a little bit about the workflow if you could.
0:20:37 Sure.
0:20:42 So from a workflow perspective, there’s maybe two points to say.
0:20:47 And it sounds like you have been talking to some interesting people doing interesting stuff around
0:20:47 protein stockers.
0:20:50 So I’ll differentiate ourselves from it.
0:20:54 So Cytoreason is a company, and this is also interesting, I think, almost from the NVIDIA
0:20:57 kind of marketplace and kind of the company.
0:21:04 Cytoreason is a company that NVIDIA invested in, and I think we stand out uniquely within those.
0:21:10 Because if I look at the healthcare flow, right, there’s the chemistry of it, right?
0:21:15 There is the biology of it, and there’s the kind of clinical side of it.
0:21:15 Okay, right.
0:21:21 And if you look where AI has been playing a big role at this point, it’s certainly been in the
0:21:24 chemistry space, chemical structure, and I would put protein structures there as well.
0:21:29 Like from small molecules libraries to protein structures, there’s a huge amount that’s happening
0:21:36 with kind of, you know, NVIDIA GPUs and, you know, generative AI and so forth to basically
0:21:40 build those molecules, and of course, anything that gets, you know, built there, there’s a
0:21:43 simulation test, but ultimately somebody puts in experimental tests.
0:21:50 And experimental tests are usually, I would say, at early stages, they are what would be called
0:21:51 an in vitro experiment.
0:21:57 There’s no animal, there’s no human, you know, you’re just texting to see, well, okay, it was
0:22:03 this antibody that I just kind of simulated when I generated, does it hold the properties and
0:22:05 can it be a good, you know, a good direction.
0:22:11 On the clinical side, there’s also a lot of, I think, agentic AI happening with a lot of
0:22:17 kind of shortening, say, kind of recruitment for clinical trials and so forth, right?
0:22:20 There’s a lot happening in the electronic medical record space.
0:22:22 It’s a relatively well-defined space.
0:22:27 There’s a huge amount of almost like human operations that goes there, and then I think
0:22:29 agentic AI has been playing a big role in.
0:22:34 Cytoreason is quite unique in that we’re focused on the biology side of things.
0:22:38 And biology, if you compare them to the two, is actually the biggest unsolved problem.
0:22:39 Yeah.
0:22:45 I would say today, if I look at pharma, the two big problems from a science perspective are
0:22:51 one is that we don’t have a good understanding of the biology, and you see it in clinical trials
0:22:56 that phase two, which is the first time we tested in humans, is where the biggest failure
0:22:57 rates are.
0:22:57 Right, okay.
0:22:58 Okay, so it tells you.
0:23:02 And then the second piece is the human diversity.
0:23:08 So the biology, it can be, you know, you and I may not have, you know, we may have the same
0:23:09 indication and so forth.
0:23:12 We actually may look very different and could be for different causes.
0:23:14 We still don’t have a good understanding of this.
0:23:16 That’s where Cytoreason is playing.
0:23:22 And bringing AI there is, you know, the search space is way, way, way bigger than in chemistry.
0:23:25 And so it’s an early stage to build on.
0:23:30 But it’s clearly the biggest problem in that I think where we’ll see companies going on.
0:23:32 And certainly that’s kind of where we’ve been kind of leading.
0:23:38 And NVIDIA kind of putting, you know, I think their trust in us has been a huge thing
0:23:39 for us.
0:23:44 So, you know, I think if I look at that space, how are the users behaving there?
0:23:46 Well, first of all, they need to explore the disease biology.
0:23:49 And then they need to think about their use cases.
0:23:55 And again, the use cases is what would be a good target to choose from, given I need to
0:23:57 have it work in this particular disease.
0:24:02 And given that I know, you know, this is, you know, this disease already has a bunch of
0:24:04 standards of care that I need to beat.
0:24:04 Right.
0:24:07 And I know that there’s people who are not responding.
0:24:11 And what is it that’s about them that maybe I can target?
0:24:16 And there’s other companies that are developing and there may be, you know, out to market before
0:24:16 me.
0:24:22 So they need to, all of these commercial questions need to come into a scientist thinking about
0:24:26 disease biology and saying, where’s the niche that I can come in?
0:24:33 And so whether it’s target prioritization or bigger than this indication, choose the next
0:24:34 clinical trial.
0:24:35 Is it happening in RA?
0:24:37 Is it happening in Crohn’s disease?
0:24:39 Is it happening in, you know, in Alzheimer’s?
0:24:42 That’s not an easy, those are the use cases.
0:24:46 And so if you think about what Cytoreason brings in, it tells you this particular target is the best
0:24:48 priority to go for this disease.
0:24:51 Here’s a bunch of mechanisms why we think this is the case.
0:24:55 And the users can go and do small tests.
0:24:58 It’s very different from the protein structure ones I mentioned before.
0:25:05 But small tests that actually validate, you know, the top hypotheses, build confidence in
0:25:09 the AI prediction, and then you go and execute on it.
0:25:12 What’s the feedback been like from your users?
0:25:14 And I’m wondering, I mean, go anywhere you want with this.
0:25:19 I’m wondering if there are certain areas that have been brought to your attention to focus in
0:25:25 on whether it’s that the users have been kind of poking at a certain area and wanting more
0:25:31 functionality, or if maybe something you didn’t expect popped up and it’s a different path to
0:25:31 look at.
0:25:31 Sure.
0:25:35 So I think in general, so it’s a very interesting question.
0:25:37 There’s a lot of these, right?
0:25:39 So in general, the users want a lot.
0:25:40 Yeah, no, right.
0:25:43 What does any user want right now?
0:25:43 Yeah.
0:25:48 But I’ll just mention a few directions that you’ll just see how they themselves struggle,
0:25:48 right?
0:25:53 So on one hand, you can think about this as I’m invested in a particular asset.
0:25:58 I just paid or I invested a huge amount of hundreds of millions of dollars to manufacture
0:25:58 a drug.
0:26:00 And what I want to do is deepen.
0:26:07 I want to study that, get every possible layer and model every possible layer here that my
0:26:08 prediction is the best.
0:26:13 And on the other hand, orthogonal to this is you can say, and this is, you know, obviously
0:26:17 like a person who’s a program lead for that drug.
0:26:19 If you ask him, that’s what he wants to do.
0:26:25 Then you go to somebody who’s in charge of an AI strategy for a pharma company or is the
0:26:26 head of a therapeutic area.
0:26:28 And they say, well, that’s one drug.
0:26:35 I obviously, you know, it’s, I care about it, but I have a hundred drugs that I actually
0:26:40 am simultaneously developing and I need to evaluate them across tens, if not sometimes
0:26:41 hundreds of diseases.
0:26:43 We need scale, right?
0:26:50 Those two are orthogonal and you need to basically care to do both, because the science
0:26:51 is always in the depth, right?
0:26:55 And the commercial problems are often in the breadth, right?
0:26:56 And you need to do them, right?
0:26:59 Other pieces of challenges come from new data types.
0:27:04 Biology keeps inventing, or biologists, new measurement modalities.
0:27:10 So I can model a tissue and say, here’s the mRNA in it, or I can model a tissue and say,
0:27:16 well, I’ve modeled, I took the mRNA, I’ve developed methodologies to describe that this biopsy
0:27:17 actually is made up of cells.
0:27:23 And now new technology allows me to say, well, I can tell you the geographical position of
0:27:24 every cell and how they interact.
0:27:28 So as new technologies come out, well, let’s add them into the model.
0:27:32 And they never come out with a huge amount of data in this field that’s deep data.
0:27:39 It’s like, we have 10 samples that we generated with a new technology that each file is a
0:27:40 gigabyte of data.
0:27:47 And so again, it comes to this, how do I enter a prior in so that, you know, on one end, I’m
0:27:52 aware of the fact that I only have 10 samples and the world’s population diversity is bigger
0:27:54 than 10 patients for this, right?
0:27:57 On the other hand, I use this new technology to contain.
0:28:03 And so studies and models from a perspective, and the word model here is deceptive in some
0:28:03 sense.
0:28:07 We develop what I would call hybrid models, right?
0:28:11 So on one hand, we have services that are deep learning and LLM and so forth.
0:28:16 And on the other hand, we have places where, you know, it would be standard traditional statistics
0:28:22 and statistical learning and rule-based because the problem, the richness of the data is so
0:28:23 big.
0:28:26 Like, you know, there’s very few places in biology today.
0:28:30 You can just stick them into a deep learning model and you’ll get good performance, right?
0:28:33 Maybe it’s images and genetics and protein structure.
0:28:35 Everywhere else, there’s just not enough data.
0:28:37 And you need to somehow overcome these things.
0:28:43 And so we build our model is ultimately an integrative framework that calls a lot of different
0:28:49 services that has many different solutions to each tailored for the different components
0:28:50 and then integrates them.
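The point above about entering a prior when a new technology has produced only 10 samples can be illustrated with a small sketch. This is not Cytoreason's method; it is a textbook normal-normal Bayesian update, and the function name `posterior_mean` and all of its parameters are hypothetical names chosen for illustration.

```python
def posterior_mean(samples, prior_mean, prior_var, noise_var):
    """Combine a small sample with prior knowledge (normal-normal model).

    Assumptions (illustrative only): measurements are normally distributed
    with known variance noise_var, and prior knowledge about the true value
    is summarized as a normal distribution with prior_mean and prior_var.
    """
    n = len(samples)
    sample_mean = sum(samples) / n
    prior_precision = 1.0 / prior_var   # how much the prior is trusted
    data_precision = n / noise_var      # grows with the number of samples
    weight = data_precision / (data_precision + prior_precision)
    # Few samples -> estimate stays near the prior; many -> near the data.
    return weight * sample_mean + (1.0 - weight) * prior_mean


if __name__ == "__main__":
    ten_samples = [2.0] * 10
    print(posterior_mean(ten_samples, prior_mean=0.0, prior_var=1.0, noise_var=10.0))
```

With only 10 samples the estimate sits between the prior and the observed mean; as more samples arrive, the data progressively outweigh the prior.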
0:28:55 What do you see the future of biomedicine?
0:29:01 What do you see the researcher, scientist, you know, sort of the job look like and specifically
0:29:06 the tools, right, in a few years, you know, whatever the timeframe is, two, three years,
0:29:10 five years, 10 years, whatever timeframe makes sense to you from what you’ve seen.
0:29:12 What do you see that role looking like?
0:29:18 And what do you see the technology component looking like in a couple of years?
0:29:23 And I’m thinking about both, you know, everything you described in the industry and balancing
0:29:27 research and science and all of these, you know, the data and everything, but also something
0:29:31 you said about what you say to your own employees when they, I don’t know if you said
0:29:36 when they start, but, you know, like you need to figure out how to automate, you know,
0:29:39 make yourself obsolete, automate what you’re doing away because there’s so much more we
0:29:40 have to do.
0:29:42 So what, yeah, where are we headed?
0:29:43 I think it’s a wonderful question.
0:29:47 Obviously, I will only claim this as my viewpoint.
0:29:48 Exactly.
0:29:49 Yeah.
0:29:54 I feel like, you know, I personally feel I’ve been, I’ve been blessed by that.
0:29:56 I encountered biology when I did.
0:29:56 Yeah.
0:30:01 And then I ended up in what is simultaneously an infinite field, right?
0:30:09 As I, we will not solve all of biology in my lifetime, even with agentic AI and so forth.
0:30:16 And on the other hand, a field that has been ripe to start thinking in a more, I often call
0:30:21 it engineering fashion, but a rule kind of basically building principles on which you can actually
0:30:23 teach machines to help you.
0:30:29 So from my perspective, as I look at this and I, you know, I think about the job of computational
0:30:34 biologists and the job of biologists and the job of clinicians, all of which are critical
0:30:37 to ultimately bring that healthcare to patients.
0:30:43 I think all of these people have been blessed with, with now solutions that allow them to take
0:30:47 yesterday’s thing, automate it to a level they could never imagine.
0:30:53 It was like a science fiction thing and then get busy with the next cool thing that they
0:30:54 couldn’t even imagine.
0:30:55 Yeah.
0:30:55 Right.
0:30:58 And, you know, I’ll give you another example in biology.
0:31:00 You know, you keep discovering new things.
0:31:02 It’s a field of unknown unknowns.
0:31:07 Oftentimes when I bring in data scientists who never had any exposure to biology, one of the
0:31:12 things they struggle with inside Cytoreason is they expect to have a gold standard, like that
0:31:13 I know what the truth is.
0:31:15 And I’m like, well, we don’t know.
0:31:20 We, you know, we have, we have, we’re, we’re continuing to see vipses.
0:31:23 We’re in an unknown, unknown space.
0:31:23 Right.
0:31:28 And so, so I think the challenges, it’s not the only field in science in which this is the
0:31:31 case, but, but I think those, those challenges are amazing.
0:31:38 And actually the necessity for us, the obligation that we have, I think, to bring in AI and machine
0:31:43 learning in, to accelerate our, our ability to actually bring cures to people.
0:31:51 I see this as an obligation and I’m not afraid of the situation of, you know, of suddenly a
0:31:54 machine doing what it is because there’s always the next thing.
0:31:57 And, and, and it’s actually why I got into this, right.
0:31:58 Is the fascination with the discovery.
0:32:04 And so I think that’s a good way of giving hope to the public.
0:32:04 Absolutely.
0:32:05 Yeah.
0:32:05 No, no, no.
0:32:06 Absolutely.
0:32:11 Shai, I think that was a great place to end kind of a, um, an uplifting, I don’t want to
0:32:15 say hopeful because it implies, you know, a lack of hope in other situations, but like
0:32:20 you said, there’s, there’s no end of, of hard problems and cool things to do.
0:32:24 And so, um, you know, using the tools to get the old ones done faster so we can get to the
0:32:25 new stuff.
0:32:26 It’s a great way of looking at it.
0:32:27 Usually, Shai,
0:32:32 I ask, I kind of wrap these episodes by asking the guest where listeners can go to find out
0:32:34 more about everything we’re talking about.
0:32:37 And I definitely want to do that with you, but first I understand you’ve got a
0:32:38 podcast to plug.
0:32:39 I do.
0:32:40 I do.
0:32:40 I do.
0:32:41 You play the host role.
0:32:41 Yeah.
0:32:42 Tell us about it.
0:32:43 Yeah.
0:32:48 It’s, uh, thanks for mentioning, uh, it’s, uh, it’s, uh, it’s called Tech on Drugs and I
0:32:54 basically interview, uh, interesting people from all walks of life and mostly scientists and
0:32:59 clinicians, I would say, who are coming up with new innovative technologies, uh, whether
0:33:04 it’s computational and sometimes they’re experimental as well that allow us to, you know, bring
0:33:05 drug development to the next stage.
0:33:10 And there’s a huge amount there about AI, uh, from all.
0:33:16 Well, like I was saying, we’ve, I, I, I’d heard of, you know, protein structure, uh, prediction
0:33:16 before.
0:33:17 Right.
0:33:19 So we’ve, we’ve talked a little bit about it on the pod.
0:33:22 So I imagine you have plenty of fertile ground to cover there.
0:33:24 Uh, Tech on Drugs, Tech on Drugs.
0:33:25 Yes.
0:33:25 On Spotify.
0:33:26 Okay.
0:33:27 And it’s available on Spotify.
0:33:29 All the regular channels.
0:33:30 Yes.
0:33:30 Fantastic.
0:33:32 So check that out as well.
0:33:36 More information about Cytoreason, the website, cytoreason.com.
0:33:39 Is there a research blog, other social channels?
0:33:41 Do you cover all that on your podcast?
0:33:41 Right.
0:33:44 So not on the podcast, but there’s a website.
0:33:46 There’s a, we’re on LinkedIn, uh, quite actively.
0:33:51 And, and, uh, so that’s probably the best resources to get in touch with, uh, folks at
0:33:51 Cytoreason.
0:33:52 Great.
0:33:56 Well, Shai, uh, again, thank you so much for making the time to talk with us.
0:34:01 Like I said, it’s vital work as, as you kind of alluded to, and we both mentioned, you
0:34:05 know, coming up with ways to extend and improve people’s lives, but the energy you bring to
0:34:08 it and that sense of like, yeah, let’s get this done.
0:34:09 The next cool thing’s around the corner.
0:34:10 I think it’s awesome.
0:34:11 It’s really inspiring.
0:34:14 And for me, you know, personally, I’ll carry it, carry that with me.
0:34:19 So thanks again for taking the time, all the best of luck to, to you and your teams.
0:34:19 Thank you so much.
0:34:20 So.
Shai Shen-Orr, co-founder and chief scientist at CytoReason and professor at the Technion, talks about the next frontier in healthcare: disease modeling. Shai shares how CytoReason bridges the gap between exploding biological data and actionable insight, powering smarter, faster drug development for leading pharma and biotech companies.
Learn more: ai-podcast.nvidia.com