Hippocratic AI’s Munjal Shah on How AI Agents Are Expanding Healthcare Capacity – Ep. 262

AI transcript
0:00:17 Hello, and welcome to the NVIDIA AI podcast from GTC 2025 in San Jose, California. I’m
0:00:22 Noah Kravitz, and I’m here with Munjal Shah, co-founder and CEO of Hippocratic AI, a startup
0:00:27 building a safety-focused LLM, large language model, for healthcare. Hippocratic recently
0:00:32 launched a healthcare AI agent app store and announced their Series B funding round, and
0:00:37 is at the forefront of the AI-powered healthcare transformation that’s happening all around
0:00:43 us. I’m excited to talk about AI and the big idea behind Hippocratic AI, the era of healthcare
0:00:48 abundance, with Munjal. So welcome, and thanks so much for taking the time to join the AI
0:00:50 podcast. I’m so excited to be here. Thank you for having me.
0:00:54 So let’s start with the basics. What is Hippocratic AI?
0:00:58 Well, you know, as you mentioned, we’re a safety-focused large language model for
0:01:04 healthcare, and we really used it to build AI clinicians. So we have agents that operate
0:01:08 and reach out to patients and call them on the phone and say, you know, check in on them
0:01:13 post-surgery, and let’s look at your incision site, and is it getting infected, and, you know,
0:01:17 do you have enough of your medications, and do you need refills? So it’s really an agent
0:01:20 that talks to patients and delivers care.
0:01:21 So when was the company founded?
0:01:23 We started it about two years ago now.
0:01:27 Okay. And you’re in production now with agents interacting with patients?
0:01:34 Yeah, as of the end of this month, we will have done about 1.85 million calls to
0:01:36 patients all over the country.
0:01:40 What’s the reaction been like from the patients to getting healthcare from an AI agent?
0:01:44 You know, people always ask that question. They’re always like, well, what did the patients
0:01:44 think?
0:01:44 Yeah, yeah.
0:01:47 The average, I’ll give it to you in numbers, and I’ll give it to you in anecdotes.
0:01:47 Okay.
0:01:50 The average patient rating is an 8.95 out of 10.
0:01:51 That’s pretty good.
0:01:56 Yeah. And second, I think there’s about 30% who are like, I don’t want to talk to an AI.
0:01:56 Right.
0:02:00 With a little bit of rebuttal, the AI goes, look, I can really help you.
0:02:04 And I don’t know when that human’s going to call you back because they don’t call you back
0:02:05 a lot of times.
0:02:05 Right.
0:02:12 Will you talk to me? It turns out about 15% will ultimately leave, but the other 85%
0:02:13 will talk to it.
0:02:13 Right.
0:02:19 And within like 30 seconds, 60 seconds, when they realize this is not your grandfather’s
0:02:24 IVR, like this truly can understand you and talk to you and is empathetic, they just talk
0:02:24 away.
0:02:25 People just talk, yeah.
0:02:30 Yeah. I mean, and think about it in this day and age, like who really listens to every
0:02:36 word you say? Like no one ever. And I think that now what you realize is this thing listens
0:02:41 to every word and responds and pays attention and gives you its undivided attention. And that’s,
0:02:43 that’s gold in the modern age.
0:02:43 Right. Right.
0:02:47 Well, for what it’s worth, I’m going to go out on a limb and say, the listeners are hanging
0:02:51 on your every word right now. So I want to ask you about this idea of the age of healthcare
0:02:52 abundance.
0:02:52 Yeah.
0:02:53 That’s a North Star?
0:02:54 Absolutely.
0:02:55 Okay. And what does it mean?
0:03:00 I think that when we think about solving our healthcare problems, all of healthcare is premised
0:03:05 on this idea of clinical scarcity. The word triage assumes you don’t have enough.
0:03:06 Right.
0:03:07 You got to decide who to take first.
0:03:07 Right.
0:03:13 Population health uses this term, risk stratification. We’ve got to help those
0:03:19 most in need. What about those almost most in need? Like, they’ll be in that need next year
0:03:20 if their condition keeps deteriorating.
0:03:21 Right.
0:03:25 Oh, we can’t help them because we have limited resources. And so I think we’ve always been
0:03:31 thinking about healthcare as saying, you know, we don’t have enough. How do we spread it around
0:03:37 instead of how do we get to a place where it’s infinitely abundant? And now, you know, we
0:03:42 do. We have these AI agents. We have an infinite supply of them. They speak every language. They
0:03:48 remember every conversation. They’re clinically safe. And they can take care of everybody at
0:03:53 all times. And I think clinical abundance is the way to solve a lot of the world’s problems
0:03:59 in healthcare. You know, imagine if everybody has a caregiver, you know, a care manager calling
0:04:03 them up and seeing how they’re doing and checking their blood pressure every single day. And I think
0:04:07 we’re beginning to enter that. And then there’s some other implications of it. You know,
0:04:11 today when there’s a heat wave, we don’t call every patient at risk at the hottest two hours
0:04:15 of the day and do a heat stroke assessment. And if they’re having issues, send them an Uber
0:04:19 to get them to a cooling center. You couldn’t do that without AI. Like you can’t get enough
0:04:23 humans together to do that every single day of a heat wave with only like five days notice.
0:04:23 Right?
0:04:23 Right, right.
0:04:24 But now you can.
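That heat-wave scenario is essentially a scheduled outreach job. Here is a rough sketch of what it could look like in code; every name and helper below is our own hypothetical scaffolding, not Hippocratic's actual API:

```python
from dataclasses import dataclass

# Hypothetical scaffolding for the heat-wave outreach described above.

@dataclass
class Patient:
    id: str
    phone: str

def find_at_risk_patients(region: str) -> list[Patient]:
    """Stub: pull the patients flagged as heat-stroke risks in a region."""
    return [Patient("p1", "+15550100"), Patient("p2", "+15550101")]

def place_agent_call(patient: Patient, script: str) -> bool:
    """Stub: an AI agent runs an assessment script; True means flagged."""
    print(f"Calling {patient.phone} with script {script!r}")
    return patient.id == "p2"  # pretend one patient reports symptoms

def book_ride(patient: Patient, destination: str) -> None:
    """Stub: dispatch a ride (the Uber in Shah's example)."""
    print(f"Ride booked for {patient.id} to {destination}")

def run_heat_wave_outreach(region: str) -> None:
    # During the hottest hours, call every at-risk patient, run a
    # heat-stroke assessment, and send anyone in trouble to a cooling center.
    for patient in find_at_risk_patients(region):
        if place_agent_call(patient, script="heat_stroke_assessment"):
            book_ride(patient, destination="nearest cooling center")

run_heat_wave_outreach("santa-clara-county")
```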
0:04:30 This may not be the right phrase to use, but I keep thinking of the phrase last mile of delivery.
0:04:37 What happens when the patient needs to be seen by a human or needs some kind of physical
0:04:39 interaction beyond the phone call?
0:04:45 Yeah, you know, where we can operate today with technology is in this virtual care area.
0:04:45 Yeah.
0:04:47 But this is where we have our human clinicians.
0:04:48 Right.
0:04:53 Right? Like everybody’s like, well, you know, how does this relate to the human clinicians? I’m like,
0:04:59 we need the human clinicians to do all of the physical care that needs to be done. And in fact,
0:05:04 by focusing the AI on these areas, we’ll free them up to do even more of that. And so I think that
0:05:08 given the technology we have today, I think that we can do the virtual part,
0:05:11 but we’ll leave the human to do the physical part.
0:05:16 Right. So Hippocratic just launched Hippocratic AI’s healthcare AI agent app store.
0:05:16 Yeah.
0:05:18 Okay. How does that work?
0:05:23 So one of the things we realized was that this is one of the powers of general intelligence versus
0:05:29 specific intelligence. Once you’ve built an LLM for healthcare, you know, then making a new use case
0:05:31 takes four minutes.
0:05:31 Right.
0:05:33 Right. You just write a different prompt.
0:05:33 Right.
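To make that concrete: with a general healthcare model, a new use case can be little more than a new prompt. A minimal sketch, assuming a hypothetical packaging format (nothing below is Hippocratic's actual template):

```python
# Two "use cases" expressed as prompts against the same underlying model.
# The templates and helper are hypothetical, for illustration only.
COLONOSCOPY_PREOP = """You are a care agent making a pre-op call for a
colonoscopy. Confirm the patient has started their bowel prep and has
fasted since the night before. Escalate anything unusual to a human nurse."""

POST_SURGERY_CHECKIN = """You are a care agent checking in after surgery.
Ask about the incision site, signs of infection, and medication refills."""

def make_use_case(name: str, prompt: str) -> dict:
    """Package a prompt as a deployable use case."""
    return {"name": name, "system_prompt": prompt}

use_cases = [
    make_use_case("colonoscopy_preop", COLONOSCOPY_PREOP),
    make_use_case("post_surgery_checkin", POST_SURGERY_CHECKIN),
]
print([u["name"] for u in use_cases])
```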
0:05:38 We’re so used to software that’s a specific intelligence, where it takes you three to six months to make a new
0:05:41 use case, that people don’t realize you can just make them quickly. I mean,
0:05:47 nobody goes to ChatGPT and says, what use cases do you support? Right. Yet people ask us that
0:05:54 question all the time. And really the realization is, well, I support probably every one you could think
0:05:55 of. Just try writing it. Right.
0:06:00 See what happens. So then we realized, oh, we can write these in four minutes. Okay. We could write
0:06:04 a ton of them, but we don’t have all the knowledge to write them. Why don’t we recruit every clinician
0:06:09 in the country to come be an author? You know, if you worked in a concussion clinic as a nurse for the
0:06:15 last 20 years, you know all kinds of little details of what to ask, in a way that maybe isn’t
0:06:20 standard protocol but is an enhancement to it. Right. Yeah. Yeah. Why don’t you ask these questions,
0:06:26 write a new script, and then put your script in our app store. We’ll validate it. We’ll run it through
0:06:32 safety testing. And then once it’s live, you’ll get paid a portion of all the revenue it
0:06:37 makes. So I’m like, leverage your intellectual property and expertise and experience over the
0:06:43 years and get paid while you sleep. And from our standpoint, we’re now crowdsourcing,
0:06:47 but only from clinicians. You actually have to send us your license number. Right. We validate that
0:06:53 you’re a licensed US clinician. And then we really say, hey, your use case can help millions of
0:06:58 patients all over the country, not just the ones you personally can treat, giving you a scale of
0:07:02 impact that you never had in your career before. So I want to ask you about inference, all the talk
0:07:08 about generative AI and LLMs the past couple of years, a lot of talk about training, the costs of training,
0:07:13 the energy of training, the data needed for training. Now we’re all talking about inference. Yeah.
0:07:18 Hippocratic AI has been talking about inference. What is inference driven AI healthcare?
0:07:23 I mean, in our case, you know, we trained our model differently than others.
0:07:29 And we’ve always been focused on inference because our runtime is the key environment. So our model is
0:07:36 actually 22 models. It’s not one model. It’s one gigantic 400B model doing the talking, with
0:07:42 19 supervising it and making sure it doesn’t say anything unsafe within this scope, this non-diagnostic
0:07:46 clinical scope. So we’re not, you know, we’re not a doctor, we’re not diagnosing, we’re not prescribing.
0:07:51 And then there’s another two deep-thinking models that take 30 seconds to a minute to double-check
0:07:56 everybody. Okay. On top of all that. Right, right. Well, that’s a lot of inference. Yep. That’s 22
0:08:03 models of inference. It’s 4.2 trillion parameters we’re running every time. Every time. Yeah. And so
0:08:12 we use up a ton. In fact, our entire instance today takes over 128 NVIDIA H100 GPUs just to load into
0:08:17 RAM. Wow. And now that can support many simultaneous conversations.
0:08:17 No, sure. Yeah, yeah.
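For what it's worth, those two figures are consistent on the back of an envelope. Assuming 16-bit weights, which is our assumption since the interview doesn't say, 4.2 trillion parameters is about 8.4 TB, and 128 H100s at 80 GB apiece hold about 10.2 TB:

```python
# Back-of-the-envelope check of the numbers above, assuming 16-bit weights.
params = 4.2e12               # 4.2 trillion parameters across the 22 models
bytes_per_param = 2           # fp16/bf16 (an assumption; precision isn't stated)
weight_tb = params * bytes_per_param / 1e12   # ~8.4 TB of raw weights

h100_gb = 80                  # memory per NVIDIA H100
cluster_tb = 128 * h100_gb / 1000             # ~10.24 TB across 128 GPUs

print(f"weights ~{weight_tb:.1f} TB vs cluster memory ~{cluster_tb:.1f} TB")
# ~8.4 TB of weights against ~10.2 TB of GPU memory: 128 H100s is roughly
# the floor just to hold the models, before KV caches and activations.
```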
0:08:22 Yeah. But even spinning up one agent takes a ton, because we use so much RAM. Right. And so
0:08:27 this is all our focus: this inference, this inference stack. I think people haven’t thought
0:08:28 about inference. Yeah.
0:08:32 They haven’t built the infrastructure for it. In fact, this is some of the conversation I’m having
0:08:37 with a lot of people, including the NVIDIA team, but also some of the hyperscalers out there and just
0:08:43 saying, hey guys, you know, I don’t want to buy your servers 24 hours a day, seven days a week, 365,
0:08:47 because I can only call patients during these hours. Right.
0:08:53 So I’d like on-demand GPUs. Oh, I’ll give you on-demand GPUs, but I’ll give them to you at five times
0:08:57 the price. I’m like, that doesn’t really help me. I need them for about six hours a day, so anything over
0:09:02 four, I’m better off having bought them all the time. But what I really need is for you to sell
0:09:06 them to me on demand. And so people are starting to come up with that technology. They’re starting to work
0:09:12 on the spin-up problem, because loading that much into RAM takes forever. So right now, if, say, this
0:09:16 one instance gets saturated, we get too many inbound calls from
0:09:21 patients at the same time, it takes like 20 to 30 minutes to spin up another instance. Yeah.
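The rent-versus-buy threshold he's pointing at is simple arithmetic: at a five-times on-demand premium, renting only wins if you need the GPUs less than 24/5 = 4.8 hours a day, which matches the "anything over four" remark. A quick sketch:

```python
# Rent-vs-buy break-even at a 5x on-demand premium.
# Reserved capacity at rate r costs 24*r per GPU-day; on-demand at 5*r for
# h hours costs 5*r*h. Renting wins while 5*h < 24, i.e. h < 4.8 hours.
premium = 5
break_even_hours = 24 / premium
print(f"break-even: {break_even_hours:.1f} hours/day")  # 4.8

hours_needed = 6  # "I need them for about six hours a day"
print("buy" if hours_needed > break_even_hours else "rent on demand")  # buy
```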
0:09:26 The whole point of an AI agent is that you don’t wait on hold. Abundance.
0:09:32 It’s not abundance 30 minutes from now. And even with many of the hyperscalers,
0:09:37 you have to email somebody to get more servers allocated to you. Like, it’s not a truly dynamic
0:09:43 thing the way it is on the non-GPU side. And so there’s a lot of new infrastructure needed
0:09:48 to really make this happen. I think the third part of this is actually something we’ve been
0:09:54 uniquely working with NVIDIA on. We have a different technology problem than a lot of the other players
0:10:00 in the LLM space. Most of them are doing text-oriented search stuff and text interactions.
0:10:06 Well, they can go another few seconds in giving you a response. Like, you know, DeepSeek’s
0:10:10 R1 takes longer to give you an answer, but it gives you a deeper
0:10:15 answer, right? Twenty or 30 seconds in a text search is no big deal if you give me the perfect
0:10:16 essay so I don’t have to do my homework. Sure.
0:10:23 But in a voice conversation, and all of ours are voice, you have a 1.5 to two-second budget
0:10:31 end to end. And so we’re really focused on latency. And so our inference isn’t a matter of trying to
0:10:36 optimize cost per token, but trying to optimize latency. And that’s a very different kind of focus.
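To make that budget concrete, here is one hypothetical way 1.5 to 2 seconds might be carved up across a single voice turn. The stage names and splits are illustrative assumptions on our part; only the overall budget comes from the interview:

```python
# A hypothetical latency budget for one voice turn (numbers illustrative;
# only the ~1.5-2 second end-to-end total comes from the interview).
budget_ms = {
    "speech_to_text": 300,    # transcribe the caller's last utterance
    "safety_checks": 400,     # supervising models screen the turn
    "llm_first_token": 500,   # main model starts generating its reply
    "text_to_speech": 300,    # synthesize and begin playing audio
}
total = sum(budget_ms.values())
assert total <= 2000, "over the conversational budget"
print(f"total: {total} ms of a 1500-2000 ms budget")
```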
0:10:41 And we do that everywhere. Like we’re working with NVIDIA on what to do on the chip level,
0:10:47 as well as the kind of additional infrastructure that NVIDIA provides. We’re also looking to do
0:10:51 that on the inference engines that we’re using. We actually had to take an open-source one and tune it a
0:10:56 different way, because all the other ones are being tuned for throughput or cost per token, not for
0:11:01 latency. And so we basically worked on lots of different elements to really get the speed we need
0:11:04 out of this. So what’s the constellation architecture?
0:11:09 Yeah. So that’s the thing I was describing. We literally have multiple models double checking
0:11:14 each other. Right. And what people don’t realize is that with a lot of the models now, they say you can
0:11:21 give a lot of input tokens to them. Just put it all in there. It’ll figure it out. And with Gemini
0:11:25 it’s like, what, a million tokens now, I think. So it’s like, oh, okay, no problem.
0:11:25 Right.
0:11:27 But it can’t reason across it all.
0:11:27 Yeah.
0:11:30 They’ll show you examples of what we’ll call needle-in-a-haystack tests, where it’ll be like, okay,
0:11:36 it’ll find that one thing. Yeah. I mean, grepping for a word is not that hard in computer science.
0:11:42 It’s like, we can find a word. But what you’re really trying to do is reason across it. So I’ll
0:11:47 give an example. If you ask your care manager, can I have ibuprofen? And they say, sure, you can have
0:11:51 ibuprofen, but don’t take too much. That’s fine, right? Because it’s an over-the-counter
0:11:55 medication. Unless you have chronic kidney disease stage three or four, then it’ll kill you.
0:12:02 Well, if you put the rules for ibuprofen and CKD into GPT-4 and then ask it, it’ll do great.
0:12:08 If you put in all the rules for all condition-specific over-the-counter medications and ask,
0:12:12 it’ll still do pretty good. It’ll start missing some sometimes, which is still not okay because
0:12:18 you could kill people, but fine. If you put in the patient’s medical history, the patient’s last
0:12:24 10 conversations with you, all of those rules for over-the-counter medication disallowance and
0:12:29 the current checklist for what you’re supposed to follow with that patient and maybe a few other
0:12:30 things and then ask it, good luck.
0:12:37 What it is, is we have an attention-span problem. But with multiple models, we have these other
0:12:41 models only focused on checking one thing at a time. So there’s an overdose engine and it listens to
0:12:45 every turn of the conversation. It’s like, are we talking about drugs? Are we talking about drugs?
0:12:49 Yes, we’re talking about drugs. Okay. And then it’s like, well, okay, did somebody just say a number
0:12:56 that’s an overdose relative to their prescription or relative to max toxicity of what you can have of
0:13:00 that drug? Okay, he did. And it may not seem that hard for pills versus two pills, but when you’re
0:13:05 talking about creams and injectables, it gets quite hard. I took a whole bunch of my testosterone cream
0:13:09 and I rubbed it on my hand. Was that an overdose? I don’t know. How much cream was in your hand?
0:13:10 What’s a little bit?
0:13:14 What’s a little bit? Was it a pea size? Was it a cherry tomato size? Was it an apple size?
0:13:20 So our LLM knows how to ask all these questions and knows how to navigate assessing whether it’s
0:13:25 actually an overdose. And if a patient shares overdose information with a
0:13:27 care manager in a clinical setting, you need to do something.
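Here is a minimal sketch of that constellation pattern, assuming our own toy names and checks (the real supervising models are far more sophisticated than these stand-ins): one model drafts each reply while narrow checkers, like the overdose engine, screen every turn and escalate when something trips.

```python
from typing import Callable

# Toy stand-ins for the supervising models. Each one checks exactly one
# thing per turn, so it never has to reason across the whole context.
def overdose_check(turn: str) -> bool:
    """Flag turns that mention a drug together with an amount."""
    mentions_drug = any(w in turn.lower() for w in ("ibuprofen", "advil", "cream"))
    mentions_amount = any(ch.isdigit() for ch in turn)
    return mentions_drug and mentions_amount

def otc_rule_check(turn: str) -> bool:
    """Flag condition-specific OTC risks, e.g. ibuprofen with kidney disease."""
    return "ibuprofen" in turn.lower() and "kidney" in turn.lower()

CHECKERS: list[Callable[[str], bool]] = [overdose_check, otc_rule_check]

def respond(primary_model: Callable[[str], str], patient_turn: str) -> str:
    draft = primary_model(patient_turn)
    # Every supervising model screens both the patient's turn and the draft.
    if any(check(patient_turn) or check(draft) for check in CHECKERS):
        return "Let me double-check that with a clinician before we continue."
    return draft

print(respond(lambda turn: "Ibuprofen is generally fine in moderation.",
              "Can I take 10 Advil for my headache?"))
```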
0:13:31 Yeah. You may have said this at the beginning, so forgive me, but how many clinicians, doctors,
0:13:34 and how many patients are you working with right now?
0:13:38 A couple of different things. So first is to test and certify the product.
0:13:45 We basically ran a quasi-output testing trial. So a lot of people say, hey, tell me what you
0:13:50 trained your LLM on so I know it’s safe. I don’t know who came up with this question. Because you
0:13:56 have things like PubMedGPT that’s trained only on PubMed, an evidence-based archive, and it still
0:14:01 gives you stuff that’s not right. Or it’ll conflate two things and give you things not right.
0:14:06 So what we realized was you got to do output testing. You got to test every output. But you can’t
0:14:09 test every output of a horizontal model. Right. There’s an infinite number of
0:14:15 permutations and combinations of GPT-4. But you can, it turns out, do that for a vertical model when you
0:14:17 roll it out one use case at a time. Right, right, right.
0:14:22 I’m doing a pre-op call for a colonoscopy. I’m going to make sure you took your bowel prep. I’m
0:14:28 going to make sure you’ve fasted the night before. I’m going to do all the steps. Okay, we hire a ton
0:14:33 of clinicians to act like patients, call it up when we first make the new use case, and mark every
0:14:37 error. And then we go back and keep improving the thing until we’ve done that. So we now have
0:14:46 6,000 U.S.-licensed clinicians who have done 309,000 clinical test calls. Wow. And so we
0:14:51 call this output testing. Right. We’ve done more output testing than anybody. Yeah. You know, we spent,
0:14:56 you know, double-digit millions of dollars basically certifying the safety of the product,
0:15:03 not by looking at its architecture or how it was trained, but looking at what it finally does in the end.
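What he's describing maps onto a fairly simple test harness. A schematic sketch under our own naming; the personas, grading, and pass threshold below are hypothetical, not Hippocratic's published protocol:

```python
import random

# Schematic output-testing loop: clinicians act as patients, every call is
# graded for errors, and the use case ships only once it passes.
def simulated_patient_call(use_case: str, persona: str) -> list[str]:
    """Stub: one test call by a clinician playing a scripted persona."""
    return [f"{use_case}: conversation with {persona}"]

def grade_call(transcript: list[str]) -> int:
    """Stub: a clinician reviews the transcript and marks every error."""
    return random.choice([0, 0, 0, 1])  # mostly clean, occasionally an error

def certify(use_case: str, personas: list[str], max_error_rate: float = 0.0) -> bool:
    errors = sum(grade_call(simulated_patient_call(use_case, p)) for p in personas)
    # Below threshold: ship. Otherwise: improve the script and test again.
    return errors / len(personas) <= max_error_rate

personas = [f"test-patient-{i}" for i in range(50)]
print("ships:", certify("colonoscopy_preop", personas))
```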
0:15:06 Yep, yep. Yeah. And if this is going to call my mother, you know, she’s 81 years old. Like,
0:15:11 I want to know what it’s going to say in the end. I’m not comforted by, oh, I trained it on this.
0:15:15 Right, right. Yep. If it still tells my mom to take 10 Advil, that’s not okay. Yeah, right. Like,
0:15:21 that’s not okay. So I think these are the benefits of really having a large clinically driven thing.
0:15:26 And I mean, who knows better how to assess an AI than the clinicians? You’ve been speaking about this a
0:15:32 little bit, but what have some of the biggest challenges been in developing
0:15:38 this inference-based system? The analogy I draw at the company is like,
0:15:45 this space is evolving so fast. We’re crossing a wood plank bridge and the planks are showing up like
0:15:50 two seconds before our foot hits the next step. And when we first started the company,
0:15:55 there was no open source, and all of a sudden open source arrived. Right. You know, there were no
0:16:02 optimized inference engines, and then all of a sudden those arrived. There was not a really good TTS,
0:16:07 and all of a sudden a great TTS, a text-to-speech engine, arrived. And so one of the things has been
0:16:12 we’ve had to redo some work. Sure. Right. Because we did it, and then this thing showed up, and we realized
0:16:15 there was a better way to do it. So you got to redo the work. And so I think one of the challenges
0:16:20 has just been keeping up with kind of how things are evolving. Yep. I think the other one has just
0:16:26 been running counter to a lot of people’s core thesis: since they’re all going after cost
0:16:30 per token, our infrastructure needs are different. But we also have a different budget, which has made it a
0:16:36 little easier, you know, because we’re offsetting a very expensive per-hour resource. Right. And so
0:16:40 that’s, that’s been that. I mean, the other stuff is all normal go-to-market stuff. Yeah. It’s new
0:16:44 technology. People want to know more about it. You know, how do you know it’s safe? That sort of thing.
0:16:47 Earlier I asked you, and you said it’s the first question people ask, you know,
0:16:53 how are the patients reacting? What about on the clinician side? Are doctors and other caregivers
0:16:58 excited to work with you? Are they worried about this kind of taking their role? What’s the
0:17:04 vibe like? I think there’s such a shortage there. Yeah. And post-pandemic, they really,
0:17:08 yeah, they realized, like, we’ve got to do something else. Right. Right. I mean,
0:17:14 if you try to get a PCP here in Santa Clara County, it’s like six months. Right. I mean,
0:17:19 if you want to see a specialist, it’s pretty bad. In fact, the other day somebody
0:17:22 was telling me, oh, come fly to New York and get your test done, because, you know,
0:17:28 you’ll get it done in a month. I’m like, really? That’s our answer? And so I think we
0:17:33 have no choice. And most people realize that, and they realize there’s an
0:17:38 opportunity. And then when you start talking about this idea of abundance, they realize there’s a
0:17:43 whole bunch of things you can do that you never could do before. And I think we’ve seen
0:17:50 it welcomed with open arms. We signed up 25 health systems, providers, and pharma clients in basically six,
0:17:56 seven months since we took the product GA about last June. Moving fast. And in healthcare,
0:18:01 you never get that many. And we’ll actually sign another three this month.
0:18:06 By next June, I bet you we’ll be at like 30 to 40. That’s a number that
0:18:13 normally a health tech startup hits at like year five or year seven. Like, you’re way down
0:18:18 the line. The need is there; there’s a lot of pain around staffing and staffing shortages.
0:18:24 Right. So what’s next for Hippocratic? What’s, uh, the rest of the year, the next couple of years,
0:18:27 whatever the timeframe is, what can you tell us about the future roadmap?
0:18:31 You know, we’re continuing to expand. I mean, I think there’s a couple of directions. So one is
0:18:37 we’re really pushing hard into the payer space. A lot of payers have
0:18:41 large teams of what are called case managers that reach out to patients and follow up and make sure
0:18:46 they’re following their proper treatment protocol, because otherwise it ends up as more expense for the
0:18:51 payers. And by payer, I mean health insurance companies. We’re doing the same for pharma. You
0:18:54 know, they’re running all these clinical trials and they’re like, look, the AI can just call and make
0:18:58 sure every day at 4 PM, when you’re supposed to take that med, you take that med, because if you don’t,
0:19:02 you mess up the trial. There are also some interesting ways to use the technology for
0:19:08 clinical trial recruitment, or even qualification. You could ask every patient,
0:19:12 you know, there’s a lot of soft factors in clinical trial qualification. And it’s like, oh, you know,
0:19:17 do you get a rash when you put the continuous glucose meter sensors on? Because we don’t want you
0:19:21 in the trial if you do, because it’s a diabetes trial and then you’re going to take it off and it’s
0:19:25 not going to work. Oh, that’s not in your health record that you get a rash. Like, very unlikely.
0:19:29 And so, you know, I could call up a whole bunch of people and ask and kind of figure
0:19:34 it out. So I think there’s some really interesting ideas like that. We’re also expanding
0:19:40 internationally. We just did our first deal in the UAE. We’re about to do another set of deals in
0:19:46 Southeast Asia. I actually thought it was mostly just the U.S. that was short
0:19:50 of clinical staff, maybe a little of Europe. The whole world is short. And you know, with the aging
0:19:55 population in so much of the planet, you know, basically we all have no choice. And so we’re
0:20:02 seeing quite a large demand all over. For listeners who’d like to know more about the company,
0:20:06 or any of the aspects of what we’re talking about, is the website the best place to go? Yeah.
0:20:12 HippocraticAI.com. We have a lot about our LLM and the architecture. We published a paper on that.
0:20:17 We recently also put out a paper on our safety testing protocol. Oh, great.
0:20:21 We actually hope it becomes a kind of standard way that everybody starts testing mission-critical
0:20:27 LLM stuff. It covers this output testing and how we did it in detail, how we
0:20:32 hired the people, how we tested the cohorts, how we compared them to human clinicians. And,
0:20:36 you know, we hope that sets a new framework for how to do that. And then you can also just
0:20:40 read about the company and our history and kind of what we’ve done and our values and
0:20:45 things like that. So. Fantastic. Well, Munjal, thank you for taking the time. I was going to
0:20:49 say to tell us about Hippocratic, but it’s a short time and the name is already out
0:20:53 there. People know. So it’s great to get an update on the work you’re doing, the approach to safety,
0:20:57 and the output testing. I think it’s really fascinating. So I appreciate you taking the
0:20:59 time to come tell us about it. Thank you for having me.
0:21:15 Thank you.

Munjal Shah, CEO of Hippocratic AI, discusses how AI agents can dramatically expand healthcare capacity and access. With nearly 1.85 million patient calls completed and an 8.95/10 patient satisfaction rating, Hippocratic’s safety-focused LLM demonstrates how optimized AI inference can handle routine patient monitoring, post-surgery check-ins, and medication management, freeing human clinicians to focus on complex care requiring physical intervention. Learn about their constellation architecture, which uses 22 models for safety validation, and how their healthcare AI agent app store enables clinicians to scale certain aspects of their expertise.

Learn more at: ai-podcast.nvidia.com
