Kaitlyn Albertoli, CEO and cofounder of Buzz Solutions, discusses how the company uses vision AI to enhance the reliability of the electric grid by quickly identifying potential issues such as broken components, encroaching vegetation, and wildlife interference from inspection data collected by drones and helicopters. This technology helps prevent outages and wildfires, ensuring the grid remains robust and safe.
Author: The AI Podcast
-
Roboflow Simplifies Computer Vision for Developers and the Enterprise – Ep. 248
AI transcript
0:00:10 [MUSIC]
0:00:13 Hello, and welcome to the NVIDIA AI podcast.
0:00:15 I’m your host, Noah Kravitz.
0:00:19 90% of the information transmitted to human brains is visual.
0:00:22 So while advances related to large language models and
0:00:26 other language processing technology have pushed the frontier of AI forward in
0:00:28 a hurry over the past few years,
0:00:32 visual information is integral for AI to interact with the physical world,
0:00:34 which is where computer vision comes in.
0:00:37 Roboflow empowers developers of all skill sets and
0:00:41 experience levels to build their own computer vision applications.
0:00:44 The company’s platform addresses the universal pain points developers face
0:00:47 when building CV models, from data management to deployment.
0:00:51 Roboflow is currently used by over 16,000 organizations and
0:00:55 half the Fortune 100, totaling over 1 million developers.
0:00:59 And they’re a member of NVIDIA’s Inception program for startups.
0:01:03 Roboflow co-founder and CEO, Joseph Nelson, is with us today to talk about
0:01:08 his company’s mission to transform industries by democratizing computer vision.
0:01:09 So let’s jump into it.
0:01:13 Joseph, welcome, and thank you for joining the NVIDIA AI podcast.
0:01:14 >> Thanks so much for having me.
0:01:19 >> I’m excited to talk CV, there’s nothing against language, love language.
0:01:22 But there’s been a lot of language stuff lately, which is great.
0:01:25 But I’m excited to hear about Roboflow.
0:01:26 So let’s jump into it.
0:01:30 Maybe you can start by talking a little bit more about your mission and
0:01:34 what democratizing computer vision means and making the world programmable.
0:01:37 >> At the highest level, as you just described,
0:01:40 the vast majority of information that humans process happens to be visual
0:01:41 information.
0:01:45 In fact, I mean, humans, we had our sense of sight before we even created language.
0:01:46 It’s how we understand the world.
0:01:50 It’s how we understand the things around us, it’s how we engage with the world.
0:01:56 And because of that, I think there’s this massive untapped potential
0:01:59 to have technology and systems have visual understanding in
0:02:04 a similar way that humans do, all across, really, we say, the universe.
0:02:07 So when we say our North Star is to make the world programmable,
0:02:09 what we really mean is that any scene and
0:02:13 anything will have software that understands it.
0:02:14 And when you have software that understands something,
0:02:16 you can improve that system.
0:02:18 You can make it be more efficient, you can make it be more entertaining,
0:02:20 you can make it be more engaging.
0:02:23 I mean, at Roboflow, we’ve seen folks build things from understanding
0:02:27 cell populations under a microscope all the way to discovering new galaxies
0:02:28 through a telescope.
0:02:31 And everything in between is where vision and video and
0:02:33 understanding comes into play.
0:02:36 So if this AI revolution is to reach its full potential,
0:02:38 it needs to make contact with the real world.
0:02:42 And it turns out the real world is one that is very visually rich and
0:02:43 needs to be understood.
0:02:47 So we build the tools, the platform, and the community to accelerate that transition.
0:02:51 So maybe you can tell us a little bit about the platform and
0:02:55 kind of within that how your mission and North Star kind of shape the way you
0:02:57 develop products and build out user experiences.
0:03:02 And I should mention a great shout out from Jensen in the CES keynote earlier
0:03:04 this year for you guys.
0:03:06 And you raised Series B late last year.
0:03:10 So I want to congratulate you on that as well before I forget.
0:03:10 >> I appreciate it.
0:03:12 Yeah, it’s a good start.
0:03:15 I mean, as you said, a million developers, but there’s 70 million developers out there.
0:03:18 There’s billions that will benefit from having visual understanding.
0:03:21 I mean, in fact, in that Jensen shout out, maybe a sentence or
0:03:25 two before, he described some of the vision partners that are fortunate to work
0:03:26 with the NVIDIA team.
0:03:30 He described that the way NVIDIA sees it is that global GDP is $100 trillion,
0:03:35 and he describes that visual understanding is like $50 trillion of that opportunity.
0:03:39 So basically half of all global GDP is predicated on these
0:03:43 operationally intensive, visually centric, autonomy-based sorts of use cases.
0:03:47 And so, the level of impact that visual understanding has had in the world so far is
0:03:50 just a fraction of what it will look like as we progress.
0:03:53 Now, in terms of like how we think about doing that, so
0:03:56 it’s really about empowering the builders and giving the certainty and
0:03:58 capability to the enterprise.
0:04:02 So for example, anyone that’s building a system for visual understanding often
0:04:07 needs to have some form of visual input, camera, video, something like this.
0:04:08 >> Sure.
0:04:10 >> You need to have a model because the model is going to help you act,
0:04:15 understand, and react to whatever maybe actual insight you want to understand.
0:04:17 And then you want to run that model some more, you want to deploy it.
0:04:21 And commonly, you even want to chain together models or
0:04:23 have a system that triggers some alerts or
0:04:26 some results based on information that it’s understanding.
0:04:29 So Roboflow provides the building blocks, the platform, and
0:04:34 the solutions so that over a million developers and half the Fortune 100
0:04:37 have what they need to deploy these tools to production.
0:04:41 And you’re doing it, as I kind of mentioned in the intro.
0:04:45 Trying to make the platform available for folks who are deep into this,
0:04:48 have been doing CV and working with machine learning for a while.
0:04:51 And then also folks who might be new to this, they can get up and
0:04:55 running and work with CV, build that into their toolkit.
0:05:00 >> Yeah, the emphasis has always been kind of on someone that wants to be a builder.
0:05:04 That definition is expanding with the capabilities of code generation,
0:05:05 prompting to apps.
0:05:08 We’ve always kind of been bent on this idea that those that
0:05:11 want to create, use, and distribute software.
0:05:14 What’s funny is that when we very first launched some of our first products,
0:05:16 ML teams initially were kind of like, I don’t know,
0:05:18 this seems pretty pedestrian, I know exactly what to do.
0:05:21 And fast forward now, and it’s like, whoa, a platform that’s fully
0:05:25 featured that has immediate access to the latest models to use on my data in
0:05:28 contexts I couldn’t even anticipate.
0:05:31 So it’s been kind of, as the platform’s become more feature rich,
0:05:35 we’ve been able to certainly enable a broader swath of users, maybe
0:05:37 domain experts in need of a capability.
0:05:41 But I think broadly speaking, the places that we see the rapid,
0:05:44 most impactful adoption in some ways is actually bringing vision to others that
0:05:46 otherwise may not have had it.
0:05:51 Like what used to be maybe like a multi-quarter PhD thesis level investment
0:05:54 now can be something that a team spins up in an afternoon.
0:05:58 And that really has this kind of demand-begets-demand paradigm.
0:06:01 I mean, for example, one of our customers, they produce electric vehicles.
0:06:05 And when you produce an EV inside their general assembly facility,
0:06:08 there’s all sorts of things that you need to make sure you do correctly as you
0:06:12 produce that vehicle, from the worker safety who are doing the work to
0:06:16 the machines that are outputting, say, when you do what’s called stamping,
0:06:19 where you take a piece of steel or aluminum and you press it into the shape
0:08:23 of the outline of the vehicle, and you can get potential tears or fissures, or
0:08:26 when you actually assemble the batteries with the correct number of screws.
0:06:30 Basically, every part of building a car is about visually
0:06:33 validating that the thing has been built correctly so that when a customer
0:06:36 drives it, they can do so without any cause for pause.
0:06:40 And just a little bit of computer vision goes a really long way in enabling
0:06:44 that company and many others to very quickly accelerate their goals.
0:06:47 In fact, this company had the goal of producing 1,000 vehicles three years ago,
0:06:50 and they barely did it at 1,012 in that year.
0:06:53 And then they scaled up to 25,000 and now 50,000.
0:06:56 And a lot of that is on the backs of having things that they know they’re
0:06:57 building correctly.
0:07:01 And so we’re really fortunate to be a part of enabling things like this.
0:07:04 So it’s kind of like you could think about it of a system that doesn’t have
0:07:07 a sense of visual understanding, adding even just a little bit of visual
0:07:11 context totally rewrites the way by which you manufacture a car.
0:07:14 And that same revolution is going to take place for
0:07:16 lots of operationally intensive processes, but
0:07:19 any kind of place that you interact with the world each day.
0:07:23 >> So kind of along those lines, what to use the phrase untapped opportunities
0:07:27 you see out there, what’s the low hanging fruit or maybe the high hanging fruit,
0:07:30 but you’re just excited about when it comes to developing and
0:07:33 deploying computer vision applications.
0:07:37 And we’ve been talking about it, but obviously talk about Roboflow’s
0:07:40 work, not just supporting developers, but kind of helping
0:07:42 builders unlock what’s next.
0:07:46 >> The amazing thing is actually the expanse of the creativity of developers
0:07:49 and engineers, it’s like if you give someone a new capability,
0:07:53 you almost can’t anticipate all the ways by which they’ll bring that capability
0:07:53 to bear.
0:07:57 So for example, I mean, we have hobbyist folks that’ll make things,
0:08:00 like the number of folks that make things that like measure the size of fish.
0:08:03 Cuz I think they’re trying to prove to their friends that they caught the biggest
0:08:06 fish, and then you separately have like government agencies that have wanted to
0:08:09 validate the size of salmon during migration patterns.
0:08:12 And so this primitive of like understanding size of fish both has
0:08:14 what seems to be fun and very serious implications.
0:08:17 Or folks that I don’t know, like a friend of mine recently was like,
0:08:21 hey, I wonder how many cars out in San Francisco are actually Waymos,
0:08:22 versus like other sorts of cars.
0:08:23 And what does that look like?
0:08:24 What does that track over time?
0:08:28 And so they had a pretty simple like Raspberry Pi camera perched on their
0:08:31 windowsill, and in an afternoon, now they have a thing that’s counting,
0:08:34 tracking, and keeping a tabulation on how many self-driving cars are making
0:08:37 their way on the road, at least sampled in front of their house each day.
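A rough sketch for the curious reader: the windowsill counter described above maps to surprisingly little code. The sketch below uses OpenCV and an off-the-shelf torchvision car detector; distinguishing Waymos from other cars would need a custom-trained model, and a real counter would add object tracking so the same car isn't counted twice, so treat this purely as an illustration of the pattern.

```python
# Rough sketch of the windowsill counter: sample frames from a camera,
# detect cars with a pretrained torchvision model, and keep a tally.
# A Waymo-vs-other classifier would be a custom-trained model; tracking
# would be needed to avoid counting the same car across frames.
import cv2
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
CAR_LABEL = 3  # "car" in the COCO label map used by torchvision detectors

def count_cars(frame_bgr, score_thresh=0.6):
    """Count car detections in a single BGR frame."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        pred = model([tensor])[0]
    keep = (pred["labels"] == CAR_LABEL) & (pred["scores"] > score_thresh)
    return int(keep.sum())

cap = cv2.VideoCapture(0)  # e.g. a Raspberry Pi camera at the window
tally = 0
for _ in range(100):       # sample 100 frames for the demo
    ok, frame = cap.read()
    if not ok:
        break
    tally += count_cars(frame)
cap.release()
print(f"car detections across sampled frames: {tally}")
```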
0:08:40 >> Right, no, I don’t wanna call anybody out, but that’s not the same person
0:08:45 who had the video of all the Waymos driving in circles in a parking lot in the
0:08:46 middle of the night in San Francisco.
0:08:47 No, okay.
0:08:49 >> It wasn’t that guy.
0:08:53 >> Yeah, but I mean, like the use cases are expansive because I don’t know,
0:08:57 like the way we got into this, right, is like we were making AR apps, actually.
0:09:00 And computer vision was critical to the augmented reality,
0:09:02 understanding the system around us.
0:09:04 And we’ve since had folks that make, you know,
0:09:08 board game understanding technology, D&D dice counters,
0:09:10 telling you the best first move you should play in Catan.
0:09:12 And so basically, like you have this like creative population of folks.
0:09:15 Or this one guy during the pandemic, you know, he’s really bored, locked inside.
0:09:19 And he thought maybe his cat needed to get some more exercise.
0:09:22 And he created this system that like attached a laser pointer to a robotic arm.
0:09:26 And with a little bit of vision, he made it so the robotic arm consistently points
0:09:28 the laser pointer 10 feet away from the cat.
0:09:31 That’s like jumping around the living room and makes this whole YouTube tutorial.
0:09:33 But then like the thing that’s really interesting, right, is that like,
0:09:37 you know what technology has arrived when like a hacker can just like build
0:09:40 something in a single sitting or maybe like in a weekend.
0:09:46 You know that what used to be this very difficult-to-access capability is now
0:09:50 broadly accessible. And that fuels like a lot of the similar sort of enterprise
0:09:53 use cases. Like we have a joke at Roboflow that like one person’s
0:09:56 hobbyist project is another person’s entire business.
0:09:59 So the low hanging fruit, frankly, is like everywhere around us.
0:10:02 Like any sort of visual feed is untapped.
0:10:05 The sorts of images that someone might be collecting and gathering.
0:10:07 I mean, the same things that gave rise to machine learning certainly apply
0:10:11 to vision, where the amount of visual input is doubling year on year and there are petabytes
0:10:14 of visual information to be extracted.
0:10:16 So it’s kind of like, if you think about it, you can do it.
0:10:21 That makes me think of an episode we did recently with a surgeon who founded
0:10:26 a surgical data collective and they were using just these, I say stacks.
0:10:28 They weren’t actually videotapes, I’m sure.
0:10:33 But all of this unwatched unused footage from surgeries to train a model
0:10:36 to help train surgeons how to do their jobs better.
0:10:40 But I did want to ask, the visual inputs that can go
0:10:43 into Roboflow, it doesn’t have to be a live camera stream.
0:10:46 You can also use, you know, archival footage and images.
0:10:47 That’s correct.
0:10:47 Yep.
0:10:48 Yep.
0:10:51 So someone may have like a backlog of a bunch of videos.
0:10:54 I mean, for example, we actually had a professional baseball team where
0:10:58 they had a history of a bunch of their videos of pitching sessions and they
0:11:03 wanted to run through and do a key point model to identify various poses and time
0:11:06 of release of the pitch and how does that impact someone’s performance over time.
0:11:09 And so they had all these videos that from the past that they wanted to run through.
0:11:12 And then pretty soon they started to do this for like minor leagues where you
0:11:15 might not have scouts, omnipresent, and you certainly don’t have broadcasts.
0:11:16 Right, right, right.
0:11:19 You just have like this like kind of low quality footage on like cameras from
0:11:22 various places and being able to produce like sports analytics out of, you know,
0:11:26 this information that’s just locked up otherwise and this unstructured visual
0:11:30 capture is now available for, in this case, building a better baseball team.
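For readers who want a feel for the pitching analysis described above, here is a minimal sketch: a pretrained torchvision keypoint (pose) model run over an archived clip, with a toy wrist-height heuristic standing in for real release-point logic. The file name and thresholds are hypothetical.

```python
# Rough sketch: run a pretrained keypoint (pose) model over an archived
# pitching clip and flag a candidate release frame. The file name is
# hypothetical and the wrist-height heuristic is purely illustrative.
import cv2
import torch
from torchvision.models.detection import keypointrcnn_resnet50_fpn

model = keypointrcnn_resnet50_fpn(weights="DEFAULT").eval()
RIGHT_WRIST = 10  # index in the 17-point COCO keypoint layout

def wrist_xy(frame_bgr):
    """Return (x, y) of the top-scoring person's right wrist, or None."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        pred = model([tensor])[0]
    if len(pred["scores"]) == 0 or pred["scores"][0] < 0.8:
        return None
    x, y, _visibility = pred["keypoints"][0][RIGHT_WRIST].tolist()
    return (x, y)

cap = cv2.VideoCapture("pitching_session.mp4")  # hypothetical archive clip
trajectory = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    trajectory.append(wrist_xy(frame))
cap.release()

# Toy heuristic: treat the frame where the wrist is highest in the image
# (smallest y) as a candidate release point for an analyst to review.
candidates = [(i, xy[1]) for i, xy in enumerate(trajectory) if xy]
if candidates:
    print("candidate release frame:", min(candidates, key=lambda c: c[1])[0])
```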
0:11:31 Yeah, that’s amazing.
0:11:36 Building community, you know, is something that is both vital to a lot
0:11:39 of companies like tech companies and developer platforms and such.
0:11:43 But it also can be a really hard thing to do, especially to build, you know,
0:11:48 an organic, genuine, robust community serving enterprise clients, let alone
0:11:54 across this seemingly endless sort of swath of industries and use cases and such.
0:11:57 You know, also pretty resource intensive.
0:11:58 So how do you balance both?
0:12:02 How is Roboflow approaching, you know, building that community that you’re
0:12:08 talking about just now with serving, you know, these I’m sure demanding in a good
0:12:10 way, but, you know, demanding enterprise clients.
0:12:13 I think the two actually go hand in hand more than many would anticipate.
0:12:14 Okay.
0:12:18 When you build a community and you build a large set of people that are
0:12:23 interested in creating and using a given platform, you actually give a company
0:12:26 leverage basically like the number of people that are building, creating and
0:12:30 sharing examples of Roboflow from a very early day made us seem probably much
0:12:33 bigger than maybe we were or are.
0:12:36 And so that gives a lot of trust to enterprises.
0:12:40 Like, you know, you want to use something that has gone through its paces and
0:12:43 battle tested something that might be like an industry standard.
0:12:47 And you don’t become an industry standard by only limiting your technology to a
0:12:49 very small swath of people.
0:12:53 You enable anyone to kind of build, learn the paradigm and create.
0:12:58 Now, you’re right that both take a different type of thoughtfulness to be
0:12:59 able to execute on.
0:13:03 So in the context of like community building and making products for developers.
0:13:06 A lot of that I think stems from, you know, as an engineer, there’s
0:13:09 products that I like using in the ways that I like to use those products.
0:13:12 And I want to enable others to be able to have a similar experience of the
0:13:12 products that we make.
0:13:16 So it’s, it’s things like providing value before asking for value, having a very
0:13:19 generous free tier, having the ability to highlight the top use cases.
0:13:22 I mean, we have a whole like research plan where if someone’s doing stuff on a
0:13:26 .edu domain, then they have increased access to GPUs.
0:13:30 Roboflow has actually given away over a million dollars of compute and GPU usage
0:13:33 for open source computer vision projects.
0:13:37 And, you know, we actually have this, it’s kind of a funny stat, but 2.1
0:13:40 research papers are published every day, citing Roboflow.
0:13:43 And those are things like people are doing all these sorts of things.
0:13:44 That’s a super cool stat, I think.
0:13:48 Yeah, I mean, it just gives you the context to like, yeah,
0:13:52 like that is someone’s maybe six or 12 month thesis that they’ve spent trying
0:13:57 to advance, and to be able to empower folks to realize what’s possible.
0:14:01 And it’s really kind of the fulfillment of like our mission at its core, like
0:14:05 the impact of visual understanding is bigger than any one company, and anything
0:14:09 that we can do to allow the world to see, expose and deploy that is important.
0:14:12 Now, on the enterprise side, what we were really talking about is building
0:14:15 a successful kind of go to market motion and making money to invest
0:14:19 further in our mission and enterprises, as you alluded, are very resource
0:14:23 intensive in terms of being able to service those needs successfully.
0:14:26 Even there, though, you actually get leveraged by seeing the sorts of
0:14:30 problems, seeing the sorts of fundamental building blocks and then productizing
0:14:33 those building blocks, you know, there have been companies that have come
0:14:37 before Roboflow who have done a great job of being very hands on with enterprise
0:14:40 customers and productizing those capabilities, companies like Pivotal or
0:14:45 Palantir or these large companies that have gone from, hey, let’s kind of do
0:14:48 like a bespoke way of making something possible and deploy it more broadly.
0:14:52 Now, we’re not fully, you know, like for like with those businesses.
0:14:56 I more give that as an example to show as someone that is building tooling
0:15:00 and capabilities, worst case is you’re giving the enterprise substantially
0:15:03 more leverage and certainly best case is there’s actually a symbiotic
0:15:06 relationship between enterprises being able to discover how to use the
0:15:10 technology, be able to find guides from the community, be able to find models
0:15:11 they want to start from.
0:15:15 I mean, Roboflow Universe, which is the open source collection of data sets
0:15:18 and models, is the largest collection of computer vision projects on the web.
0:15:22 There’s about 500 million user labeled and shared images and over 200,000
0:15:23 pre-trained models.
0:15:26 And that’s used for the community just as much as enterprise, right?
0:15:28 Like when you say enterprise, like enterprise is people.
0:15:31 And so there’s people inside those companies that are creating and building
0:15:32 some of those capabilities.
0:15:35 Now, operationalizing and ensuring that we deliver the service quality, it’s
0:15:39 just the types of teams you build and the way that you prioritize companies
0:15:39 to be successful.
0:15:42 But we’re really fortunate that, you know, we’re not writing the playbook here.
0:15:47 There’s been a lot of companies that, you know, Mongo or Elastic or Twilio or
0:15:51 lots of post IPO businesses that have shown the pathway to both building
0:15:55 really high quality products that developers and builders love to use
0:15:59 and ensuring that they’re enterprise ready and meeting the needs of high
0:16:02 scale, high complexity, high value use cases.
0:16:04 So you use the word complexities.
0:16:09 And, you know, one of the things that I hear all the time, and I’m
0:16:12 sure you more than me, from people trying to build anything is sort of how
0:16:17 do you balance, you know, creativity and coming up with ways to solve problems.
0:16:20 And particularly if you get into kind of a unique situation and you need to find
0:16:24 a creative answer with things, with not letting things get too complex.
0:16:27 And, you know, in something like computer vision, I’m sure the technical
0:16:30 complexities can, can spin up in a hurry.
0:16:33 What’s been, you know, your approach?
0:16:35 How have you found success in balancing that?
0:16:40 Complexity for a product like Roboflow is always a balance.
0:16:44 You want to offer users the capability in the advanced settings and the ability
0:16:46 to make things their own.
0:16:50 While also part of the core value proposition is simplifying.
0:16:53 And so you often think, oh man, those two things must be at odds.
0:16:55 How do you simplify something, but also serve complexity?
0:17:00 And in fact, they’re not, especially for products like Roboflow, where it is
0:17:03 for builders, you make it very easy to extend.
0:17:07 You make it very interoperable, meaning there’s open APIs and open SDKs, where
0:17:11 if there’s a certain part of the tool chain that you want to integrate with or
0:17:13 there’s a certain specific enterprise system where you want to read or write
0:17:16 data to, that’s all supported on day one.
0:17:19 And so if you try to kind of boil the ocean of being everything in the
0:17:24 platform all at once on day one, then you can find yourself in a spot where you
0:17:28 may not be able to service the needs of your customers well.
0:17:30 In fact, it’s a bit more step by step.
0:17:33 And that’s where, you know, the devil’s in the details of execution of which
0:17:36 steps you pick first, which sort of problems you best nail for your customers.
0:17:39 But philosophically, it’s really important to us that, for example, when
0:17:43 someone is building a workflow, which is, you know, the combination of a model
0:17:47 and some inputs and outputs, you might ingest maybe like an RTSP stream from
0:17:49 like a live feed of a camera.
0:17:51 Then, say the problem that
0:17:57 we’re solving is we’re an inventory company and we’re concerned about worker safety.
0:18:00 You might have a first model that’s just like constantly watching all frames to
0:18:03 see if there’s a presence of a person, a very lightweight model kind of run in
0:18:04 the edge.
0:18:07 And then maybe a second model of when there’s a person, you ask a large vision
0:18:10 language model, large VLM, Hey, is there any risks here?
0:18:11 Is there anything to consider?
0:18:13 Should we like look more closely at this?
0:18:17 And then after the VLM, you might have another specific model that’s going to
0:18:22 do validation of the type of danger that is interesting, or maybe the specific
0:18:25 area within your, your store, maybe you’re going to connect to another
0:18:26 database that exists.
0:18:29 And then like, based on that, you’re going to write some results somewhere.
0:18:32 And maybe you’re going to write that result to have insights of how frequently
0:18:37 there was a cause for concern within the process that you’re monitoring, just as
0:18:40 much as maybe you’re flagging an alert and maybe send in a text or an email or
0:18:43 writing to an enterprise system like SAP to keep track.
0:18:50 And at each step of that juncture, any one of those nodes, since it’s built for
0:18:54 us on an open source platform, which we call Inference, you can actually mix
0:18:58 and match, write your own custom Python, or call an API one way or another.
0:19:02 And so let’s imagine like a future where someone wanted the ability to write
0:19:04 to a system that we didn’t support yet, like first party.
0:19:05 You’re actually not out of luck.
0:19:09 As long as that system accepts a post request, you’re fine.
0:19:11 And so you have the ability to extend the system.
0:19:15 Yeah. And so it’s this sort of paradigm of like interoperability and making
0:19:17 it easy to use alongside other tools.
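To make the chained workflow above concrete, here is a minimal sketch of that gated pipeline shape. The RTSP URL and alert endpoint are hypothetical, and the two model stages are stubs standing in for whatever detector and VLM you deploy; Roboflow's Inference platform packages this pattern, but the sketch is generic Python rather than its actual API.

```python
# Rough sketch of the gated chain described above: watch an RTSP stream,
# run a cheap person check on every frame, escalate to a VLM only when
# someone is present, then POST any alert downstream. The URLs are
# hypothetical, and both model stages are stubs for whatever you deploy.
import cv2
import requests

RTSP_URL = "rtsp://camera.example.local/stream"       # hypothetical feed
ALERT_ENDPOINT = "https://alerts.example.local/api"   # any POST-able system

def person_present(frame):
    """Stage 1: lightweight detector watching all frames (stub)."""
    return False  # swap in a real edge model

def vlm_risk_assessment(frame):
    """Stage 2: ask a vision-language model about risks (stub)."""
    return None  # swap in a real VLM call; return a description or None

cap = cv2.VideoCapture(RTSP_URL)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if not person_present(frame):       # cheap gate runs on every frame
        continue
    risk = vlm_risk_assessment(frame)   # expensive stage runs rarely
    if risk:
        # Final stage: write the result anywhere that accepts a POST,
        # e.g. an alerting service, a database API, or an ERP system.
        requests.post(ALERT_ENDPOINT, json={"risk": risk}, timeout=5)
cap.release()
```

The design point from the conversation is the gate: because every node is just a function over frames and results, any stage can be swapped out or extended with custom Python, and the final write can target any system that accepts a POST request.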
0:19:20 And it gets back to your point around servicing builders, just as much as the
0:19:25 enterprise, I actually think those things are really closely interlinked because
0:19:29 you provide the flexibility and choice and ability to make something mine and
0:19:33 build a competency inside the company of what it is I wanted to create and deploy.
0:19:37 Right. The way you frame that makes a lot of sense and makes that link very clear.
0:19:40 We’re speaking with Joseph Nelson.
0:19:43 Joseph is the co-founder and CEO of Roboflow.
0:19:48 And as he’s been detailing Roboflow provides a platform for builders to use
0:19:50 computer vision in what they’re building.
0:19:55 Joseph, you know, I in the intro sort of alluded to all the all the advances
0:19:58 and buzz around large language models and that kind of thing over the past couple
0:20:02 of years. And I meant to ask, Roboflow was founded in 2020?
0:20:05 Roboflow Inc. was incorporated in 2020.
0:20:06 That’s right. Got it.
0:20:09 And so anyway, kind of fast forwarding to, you know, more recently, the past, I
0:20:15 don’t know, six months, year, whatever it’s been, a lot of buzz around agents,
0:20:17 the idea of agentic AI.
0:20:21 And then, you know, there was buzz, I guess, that the word multimodal was being
0:20:26 flung around kind of more frequently, at least in circles I run in for a while.
0:20:29 And then it sort of dropped off just as people, you know, there were the
0:20:33 consumer models, the Claudes and ChatGPTs and Geminis and what have you
0:20:37 in the world, just started incorporating visual capabilities, both to, you know,
0:20:43 ingest and understand and then to create visual output, voice models, you know,
0:20:45 now getting into short video clips, all that kind of stuff.
0:20:51 What’s your take on the role of multimodal AI integration when it comes to advancing
0:20:55 CV, you know, how is Roboflow kind of positioned to support this?
0:21:02 So multimodality allows an AI system to have even more context than from a
0:21:03 single modality, right?
0:21:07 So if one of our customers is monitoring an industrial process, and let’s say
0:21:12 they’re looking for potentially a leak, maybe in an oil and gas facility, that
0:21:16 leak can manifest itself as, yes, you see something, a product that’s dripping
0:21:17 out and you didn’t expect it to.
0:21:23 It also can manifest itself as you heard a noise or maybe there’s something
0:21:27 about the time dimension of the video that you’re watching as another modality
0:21:28 beyond just the individual images.
0:21:33 Right. And those additional dimensions of data allow the system that you’re
0:21:35 building to have more intelligence.
0:21:39 And so that’s why you see kind of like all these modalities crashing together.
0:21:42 And what it does is it enables our customers to have even more context.
0:21:47 The way we’ve thought about that is we’ve actually been built on and using
0:21:50 multimodality as early as 2021.
0:21:54 So in 2021, there was a model that came out from OpenAI called CLIP,
0:21:57 Contrastive Language-Image Pre-training, which introduced this idea of training
0:21:59 on 400 million image-text pairs.
0:22:03 Can we just associate some words of text with some images?
0:22:06 What this really unlocked for our customers was the ability to do semantic
0:22:10 search, like I could just describe a concept and then I can get back the images
0:22:11 from a video frame or from a given image
0:22:15 that’d be interesting for me for the purposes of building out my model.
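For the curious, CLIP-style semantic search over a handful of frames takes only a few lines with the public CLIP checkpoint on Hugging Face; a minimal sketch follows, where the frame file names and query text are hypothetical.

```python
# Rough sketch of CLIP-style semantic search: embed frames and a text
# query in the same space and rank frames by cosine similarity. Uses the
# public openai/clip-vit-base-patch32 checkpoint; the frame file names
# and the query string are hypothetical.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

paths = ["frame_001.jpg", "frame_002.jpg", "frame_003.jpg"]
images = [Image.open(p) for p in paths]

with torch.no_grad():
    img_emb = model.get_image_features(
        **processor(images=images, return_tensors="pt"))
    txt_emb = model.get_text_features(
        **processor(text=["a forklift near a person"],
                    return_tensors="pt", padding=True))

# Normalize, then rank frames by similarity to the query.
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
scores = (img_emb @ txt_emb.T).squeeze(1)

for path, score in sorted(zip(paths, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {path}")
```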
0:22:20 Increasingly, we’ve been excited by the rise of models that have more
0:22:22 multimodal capabilities on day one.
0:22:26 That comes with its own form of challenges, though, the data preparation,
0:22:30 the evaluation systems, the incorporation of those systems into the other parts
0:22:32 of the pipeline that you’re building.
0:22:36 And so where there’s opportunity to have even more intelligence, there’s also
0:22:41 challenge to incorporating that intelligence, adapting it to your context,
0:22:42 passing it to other sorts of systems.
0:22:47 And so Roboflow and being deep believers in multimodal capabilities very early
0:22:53 on have continued to make it so that users can capture, use and process other
0:22:54 modalities of data.
0:22:58 So for example, we support the ability for folks to use vision language models,
0:23:01 VLMs, in the context of problems they’re working on, which is typically like
0:23:02 an image text pair.
0:23:08 So if you’re using, you know, Qwen 2.5 VL, which came out last week, or
0:23:11 Florence-2 from Microsoft, which came out maybe about six months ago, or PaliGemma
0:23:16 2 from Google, these are all multimodal models that have very rich text
0:23:20 understandings and have visual understandings, which makes them very good at,
0:23:22 for example, document understanding.
0:23:26 Like if you just pass a document, there’s both text in the document and a
0:23:27 position in the document.
0:23:30 And so Roboflow is one of the only places, maybe the only place where you can
0:23:35 fine tune and adapt, say, Qwen VL today, which means preparing the data and running
0:23:37 it in the context of the rest of your systems.
0:23:40 And those sorts of capabilities, I think, should only increase and enable our
0:23:44 customers to get more context more quickly from the types of problems that
0:23:44 they’re solving.
0:23:44 Right.
0:23:48 So I think a lot of these things kind of like are crashing together into just
0:23:52 like AI, like amorphous AI that like has all these capabilities, like you’d expect
0:23:57 it, but as that happens, what’s important is there’s actually still unique parts
0:24:00 of visual needs, right?
0:24:03 Like visual needs require visual tooling in our opinion.
0:24:05 Like you want to see, you want to validate.
0:24:09 You need to do, you know, the famous adage of a picture being worth a thousand
0:24:11 words is extremely instructive here.
0:24:14 Like you almost can’t anticipate all the ways that the world’s going to look
0:24:17 different than how you drew it up.
0:24:21 Like self-driving cars are kind of this 101 example where, yeah, you think
0:24:24 you can drive, like you have a very simple way of describing what the world’s
0:24:25 going to look like, but I don’t know.
0:24:29 Like let’s take a very narrow part of a self-driving car, stop signs, right?
0:24:31 So stop signs look universal.
0:24:32 They’re always octagons.
0:24:35 They’re red and they’re really well-mounted on the right side of streets.
0:24:39 Well, what about a school bus where the stop sign kind of flips out when it
0:24:43 comes on, or what about a gate where like the stop sign’s mounted on a
0:24:44 gate and the gate could open and close?
0:24:48 And pretty soon you’re like, wait a second, there’s a lot of cases where a
0:24:50 stop sign isn’t really just a stop sign.
0:24:55 And seeing those cases and triaging and debugging and validating, we think
0:24:59 inherently calls for some specific needs for processing the visual information.
0:25:04 And so we’re laser focused on enabling our customers to benefit from as many
0:25:08 modalities as help them solve their problem while ensuring the visual
0:25:11 dimension in particular is best capitalized on.
0:25:12 Right.
0:25:17 And does that, and I may be showing the limits of my technical understanding here.
0:25:19 So, you know, correct me if so.
0:25:25 But does that exist as, you know, Roboflow creating these, you know, sort of, as
0:25:30 you said, amorphous all-crashed-together AI models that have this focus and these
0:25:31 sort of advanced visual capabilities?
0:25:38 Or is it more of, like, chaining a Roboflow-specific model, you know, onto other models?
0:25:43 Commonly you’re in a position where you’re chaining things together or you
0:25:46 wanted things to work in your context or you wanted to work in a compute
0:25:47 constrained environment.
0:25:51 Okay, so vision’s pretty unique in that unlike language and a lot of
0:25:55 other places where AI exists, actually vision is almost where humans are not.
0:25:58 Basically, like you want to observe parts of the world where a person isn’t present.
0:26:02 Like if you return to our example of like an oil and gas facility where
0:26:06 you’re monitoring pipelines, I mean, there’s tens of thousands of miles of pipeline
0:26:09 and you’re certainly not going to have a person stationed every hundred yards
0:26:11 along it’s just an S9 idea.
0:26:15 And so instead you could have a video theater visual understanding of maybe key
0:26:20 points where you’re most likely to have pressure changes and to monitor those
0:26:23 key points, you know, that you’re not necessarily in an internet connected
0:26:27 environment, you’re in an operationally intensive environment that even if you
0:26:30 did have internet and might not make sense to stream the video to the cloud.
0:26:33 So basically where you get to is you’re probably running something at the edge
0:26:36 because it makes sense to co-locate your compute and that’s where like a lot of
0:26:39 our customers, for example, using NVIDIA Jetsons, they’re very excited about
0:26:44 Project DIGITS that was announced at CES to make it so that you can bring these highly
0:26:48 capable models to co-locate alongside where their problem kind of exists.
0:26:50 Now, why does that matter?
0:26:54 That matters because you can’t always have the largest, most general model
0:26:56 running in those environments at real time.
0:27:00 I think this is part of, you know, a statement of like the way the world
0:27:04 looks today versus how it’ll look in 24, 36 and 48 months.
0:27:07 But I do think that over time, even as model capabilities advance and you can
0:27:10 get more and more distilled at the edge, there’s I think always going to be
0:27:14 somewhat of a lag between if I’m operating in an environment where I’m
0:27:17 fully compute unbounded, or at least comparatively unbounded, in the cloud
0:27:20 versus an environment where I am a bit more compute bounded.
0:27:25 And so that capability gap requires specialization and capability to work best
0:27:27 for that domain context problem.
0:27:31 So a lot of Roboflow users and a lot of customers and a lot of deployments tend
0:27:34 to be in environments like those, not all, but certainly some.
0:27:39 All right, shift gears here for a moment before we wrap up.
0:27:41 Joseph, you’re a multi-time founder, correct?
0:27:45 Yeah, maybe to kind of set this up, you can just kind of run through a little
0:27:48 bit your experience as an entrepreneur.
0:27:49 What was the first company you founded?
0:27:53 Well, the very first company was a T-shirt business in high school.
0:27:56 Nice. I don’t know that it was founded, there was never an LLC.
0:27:58 I don’t even know if my parents knew about it.
0:28:02 But there is that. And in university,
0:28:08 I ran a satirical newspaper and sold ads on the ad space for it, and I’ll date myself here.
0:28:11 But Uber was just rolling out to campuses at that time.
0:28:14 So I had my Uber referral code and had like free Ubers for a year for like all
0:28:16 the number of folks that discovered it.
0:28:20 I kind of joke that maybe the closest thing to a real business
0:28:24 beyond these side projects was a business that I started my last year of university
0:28:27 and ran for three years before a larger company acquired it.
0:28:29 And I went to school in Washington, D.C.
0:28:34 I had interned on Capitol Hill once upon a time and I was working at Facebook
0:28:38 my last year of university and was brought back to Capitol Hill and realized
0:28:41 that like a lot of the technical problem or a lot of the problems,
0:28:44 operational problems that could be solved with technology still existed.
0:28:48 One of those is Congress gets 80 million messages a year and interns sort through
0:28:51 that mail. And this was, you know, 2015.
0:28:55 So we said, Hey, what if we use natural language processing to accelerate
0:28:57 the rate at which Congress hears from its constituents?
0:29:01 And in doing so, we’d improve the world’s most powerful democracy’s
0:29:02 customer success center.
0:29:07 And so that grew into a business that I ran for about three years and we had a tight
0:29:10 integration with another product that was a CRM for these congressional offices
0:29:13 and that company called Fireside 21 acquired the business and rolled it out
0:29:15 to all of their customers.
0:29:19 That was a bootstrapped company, you know, nine employees at peak, a relatively
0:29:23 mission-driven thing that we wanted to build, and solve a problem that we knew
0:29:26 should be solved, which is improving the efficacy of Congress.
0:29:28 How big is Roboflow?
0:29:28 How many employees?
0:29:31 Well, I tell the team, whenever I answer that question, I start with,
0:29:33 we’ve helped a million developers so far.
0:29:37 So that’s how big we are, team-wise.
0:29:40 Team doesn’t necessarily mean, you know, it can come in any number of ways.
0:29:43 Yeah, yeah, we’re growing quickly.
0:29:43 Excellent.
0:29:49 As we’re recording this and this one’s going to get out before GTC 2025 coming
0:29:52 up in mid-March down in San Jose, as always.
0:29:54 And Joseph, Roboflow is going to be there.
0:29:55 Yeah, we’ll be there.
0:29:56 I mean, GTC has become the Super Bowl of AI.
0:29:57 Right.
0:30:01 Any hints, any teasers you can give of what you’ll be showing off?
0:30:05 We have a few announcements of some things that we’ll be releasing.
0:30:08 I can give listeners a sneak peek to a couple of them.
0:30:13 One thing that we’ve been working pretty heavily on is the ability to chain models
0:30:16 together, understand their outputs, connect to other systems.
0:30:21 And from following our customers, it turns out what we kind of built is a system
0:30:26 for building visual agents and increasingly as there’s a strong drive around
0:30:29 agentic systems, which is, you know, more than just a model.
0:30:33 It’s also memory and action and tool use and loops.
0:30:38 Users can now create and build and deploy visual agents to monitor a camera feed
0:30:42 or process a bunch of images or make sense of any sort of visual input in a
0:30:45 very streamlined, straightforward way using our open source tooling in a
0:30:46 loginless way.
0:30:50 And so that’s one area that we’re excited to show more about soon.
0:30:55 In partnership with NVIDIA and the inception program, we’re actually
0:30:59 releasing a couple of new advancements in the research field.
0:31:03 So without giving exactly what those are, I’ll give you some parameters of what
0:31:09 to expect. At CVPR in 2023, Roboflow released something called RF100, where
0:31:12 the premise is that for computer vision to realize its full potential, the models
0:31:15 need to be able to understand novel environments.
0:31:15 Right.
0:31:18 So if you think about a scene, you think about maybe people in a restaurant
0:31:21 or you think about like a given football game or something like this.
0:31:22 Yeah, yeah.
0:31:24 But the world is much bigger than just where people are.
0:31:25 Like you have like documents to understand.
0:31:27 You have aerial images.
0:31:28 You have things under microscopes.
0:31:30 You have agricultural problems.
0:31:31 You have galaxies.
0:31:36 You have digital environments. And RF100, which we released, is sampling
0:31:40 from the Roboflow Universe, a basket of a hundred datasets that allows
0:31:45 researchers to benchmark how well does my model do in novel contexts.
0:31:47 And so we released that in ’23.
0:31:52 And since then labs like Facebook, Apple, Baidu, Microsoft, and the NVIDIA Omniverse
0:31:55 team have benchmarked what is possible.
0:31:59 Now, the Roboflow Universe has grown precipitously since then, as
0:32:02 have the types of challenges that people are trying to solve with computer
0:32:07 vision. And so we’re ready to show what the next evolution of advancing
0:32:11 visual understanding and benchmarking understanding might look like at GTC.
0:32:16 And then a second thing we’ve been thinking a lot about is the advent of
0:32:21 transformers and the ability for models to have really rich pre-training
0:32:24 allows you to kind of start at the end, so to speak, with a model and its
0:32:31 understanding, but that hasn’t fully made its way, as impactfully as it can, to vision,
0:32:35 meaning like how can you use a lot of the pre-trained capabilities, and especially
0:32:37 bring them to vision models running on the edge.
0:32:41 And so we’ve been pretty excited about how do you marry the benefits of
0:32:45 pre-trained models, which allow you to generalize better, with the benefits of
0:32:46 running things real time.
0:32:51 And so actually this is where NVIDIA and Roboflow have been able to pair up
0:32:55 pretty closely on something that we’ll introduce, and I’ll leave it at that for
0:32:57 folks to see and dig in if they’re interested to learn more.
0:33:00 All right, I’m signed up.
0:33:01 I’m interested, can’t wait.
0:33:05 So you’ve done this a few times and you know, one way or another, I’m sure
0:33:09 you’ll do it again going forward and you know, scaled up and all that good stuff.
0:33:14 Lessons learned advice you can share, you know, for founders, for people out
0:33:18 there thinking about and you know, whether it’s CV related or not.
0:33:19 What does it take?
0:33:22 What goes into being, you know, being a good leader, building a business,
0:33:27 taking an idea, seeing it through to a product that, you know, serves humans
0:33:29 as well as solving a problem.
0:33:32 What wisdom can you drop here on listeners thinking about their own
0:33:34 entrepreneurial pursuits?
0:33:36 One thing that I’ll know is you said you’ll do it again.
0:33:40 I’m actually very vocal about the fact that Robofo is the last company
0:33:44 that I’ll ever need to start like the lifetime’s worth of work by itself.
0:33:48 As soon as I said it, I was like, I don’t know that.
0:33:49 He doesn’t know that.
0:33:50 And what if that comes off?
0:33:52 Like Roboflow is not going to, I was thinking about, oh, your last
0:33:55 company got acquired and so on and so forth, but that’s great.
0:33:59 I mean, that’s like in and of itself, you know, I suppose that could be
0:34:03 turned into something of a motto for aspiring entrepreneurs or what have you.
0:34:07 But that’s instructive actually for your question because I think a lot of people,
0:34:11 you know, you should think about the mission and the challenge that you’re,
0:34:13 you know, people say commonly like, oh, you’re marrying yourself to it
0:34:17 for 10 years, but I think even that is perhaps too short of a time horizon.
0:34:21 It’s: what is something, like a problem space, that you can work on
0:34:25 excitedly, where the world is different as a result of your efforts.
0:34:28 I will also note that, you know, what does it take?
0:34:29 How do you figure it out?
0:34:30 I’m still figuring it out myself.
0:34:32 There’s like new stuff to learn every single day.
0:34:37 And I can’t wait for like every two years when I look back and just sort
0:34:40 of cringe at the ways that I did things at that point in time.
0:34:44 But I think that, you know, the attributes that allow people to do well in startups,
0:34:49 whether they’re working in one, starting one, interacting with one is a deep sense
0:34:55 of grit and diligence and passion for the thing that you’re working on.
0:35:00 Like the world doesn’t change by itself and it’s also quite a malleable place.
0:35:05 And so having the wherewithal and the aptitude and the excitement and vigor
0:35:12 to shape the world the way by which one thinks is possible requires a lot of drive
0:35:16 and determination. And so, you know, it’s work with people, work in environments,
0:35:22 work on problems where, if that problem is changed with that team and the
0:35:27 result that the company you’re working with is after continues to be realized,
0:35:28 what does that world look like?
0:35:29 Does that excite you?
0:35:33 And does it give you the ability to say independently, I would want to day in
0:35:37 and day out, give it my best to ensure and realize the full potential here.
0:35:41 And when you start to think about your time that way of something that is a
0:35:46 mission and important and time that you want to enjoy with the team, with the
0:35:50 customers, with the problems to be solved, the journey becomes the destination
0:35:51 in a lot of ways.
0:35:53 And so that allows you to play infinite games.
0:35:57 It allows you to just be really focused on the key things that matter and
0:36:01 delivering customer value and making products people love to use.
0:36:03 And so I think that’s fairly universal.
0:36:06 Now, in terms of specific advice, one thing or another, there’s a funny
0:36:11 paradox of like advice needs to be adjusted to the priors of one’s situation.
0:36:14 It’s almost like the more universally useful the piece of advice is perhaps
0:36:17 like the less novel and insightful it might be.
0:36:17 Right.
0:36:21 Here I’ll note that I pretty regularly learn from those that are a few stages
0:36:23 ahead of me and aim to pay that favor forward.
0:36:27 So I’m always happy to be a resource for folks that are building or navigating
0:36:30 career decisions or thinking about what to work on and build next.
0:36:32 So I’m pretty findable online and welcome that from listeners.
0:36:33 Fantastic.
0:36:38 So let’s just go with that segue then for folks listening who want to learn
0:36:43 more about Roboflow, want to try Roboflow, want to hit you up for advice
0:36:45 on working at or with a startup.
0:36:47 Where should they go online?
0:36:50 Company sites, social medias, where can listeners go to learn more?
0:36:54 Roboflow.com is where you can sign up and build on the platform.
0:36:55 We have a careers page.
0:36:59 If you’re generally interested in startups, workatastartup.com is YC’s
0:37:02 job board and we’ve hired a lot of folks from there.
0:37:03 So that’s a great resource.
0:37:10 I’m accessible online on Twitter, or X, @JosephofIowa, and regularly share
0:37:11 a bit about what we’re working on.
0:37:13 And I’m very happy to be a resource.
0:37:16 If you’re in San Francisco and you’re listening to this, you might be surprised
0:37:18 that sometimes I’ll randomly tweet out when we’re welcoming folks to come
0:37:21 co-work out of our office on some Saturdays and Sundays.
0:37:23 So feel free to reach out.
0:37:24 Excellent.
0:37:25 Joseph Nelson, Roboflow.
0:37:27 This is a great conversation.
0:37:29 Thank you so much for taking the time.
0:37:33 And, you know, as you well articulated, the work that you and your teams are
0:37:39 doing is not only fascinating, but it applies to so much of what we do on
0:37:40 the earth, right, and beyond the earth.
0:37:45 So all the best of luck in everything that you and your growing community are doing.
0:37:46 Really appreciate it.
0:37:48 [MUSIC]
Joseph Nelson, co-founder and CEO of Roboflow, discusses how the company is making computer vision accessible to millions of developers and industries, from manufacturing to healthcare and more.
-
Telenor’s Kaaren Hilsen on Launching Norway’s First AI Factory – Episode 247
AI transcript
0:00:10 [Music]
0:00:15 Hello and welcome to the NVIDIA AI podcast. I’m your host, Noah Kravitz.
0:00:19 In late 2024, the Telenor AI factory was officially launched.
0:00:24 The AI factory is Norway’s first sustainable, sovereign, and secure
0:00:28 generative AI cloud service designed to enhance AI adoption for both internal
0:00:33 operations and external customers and to provide local AI computing
0:00:36 capabilities to the Nordic region. With us to share the story behind the
0:00:41 Telenor AI factory and to discuss the impact responsible AI is set to have on
0:00:45 the country of Norway and the Nordic region more broadly is Kaaren Hilsen.
0:00:50 Kaaren is the chief innovation officer and head of the AI factory at Telenor.
0:00:55 And she’s also set to speak at NVIDIA GTC 2025 as part of a session titled
0:01:00 Accelerating Sovereign AI Factories – Insight from Telco Case Studies.
0:01:03 Kaaren, welcome to the AI podcast and thank you so much for taking the time to
0:01:07 join us. Thanks, Noah. Wow, what an introduction.
0:01:10 Well, you know, it’s hard because I try to, you know, I want to give you your
0:01:14 credit, give you your flowers as they say, but our guests, and you’re no exception,
0:01:17 have done so much. It’s hard to cram it all in there, but it’s all genuine.
0:01:21 We’re delighted to have you on the podcast. It’s great to be here, Noah.
0:01:26 So before we get into the story of the AI factory, which I’m really excited to hear
0:01:30 from you, maybe you can start by telling us a little bit about your own background
0:01:33 and your journey into AI. And then we can talk about Telenor.
0:01:36 Yeah, I mean, the journey into AI, I mean, how do we all end up here?
0:01:42 It’s a bit of a question, but that is my journey, maybe put shortly.
0:01:47 But I’ve been working in the telco industry and with Telenor now for 25 years
0:01:51 across the globe in many different continents and sort of moving back to
0:01:55 Norway a couple of years ago. And in the last few years, I’ve been working very
0:02:01 much with innovation, with sustainability and these things. And a bit of a fun fact,
0:02:07 I actually sort of read the other day that in, was it 2023,
0:02:11 emissions from data centers have actually reached the same level as emissions
0:02:16 from the global airline industry. And that was a bit of an eye opener for me.
0:02:19 I don’t know if it’s fact or fiction, but it just sort of got me thinking.
0:02:24 And and then I see with the AI now sort of driving demand and everything,
0:02:27 the data sense is going to increase. And then I was thinking, well, okay,
0:02:32 but then we need to, how can we do this responsibly? And as I said, my journey to it,
0:02:36 I was working a lot with sustainability and how we can actually use our digital
0:02:44 infrastructure in the green shift. And then the vision of sort of democratizing AI,
0:02:49 and every nation having its own AI, sort of came in. And I thought, wouldn’t it be cool if we
0:02:53 could build a green AI factory? And it was very much sort of the
0:02:57 passion I had around sustainability and the need for AI,
0:03:02 it just beautifully came together. And I remember in the summer, sort of standing
0:03:06 on the banks of the Thames with my friends at a pub in the UK. And they said to me,
0:03:08 oh, come on, what are you doing at the moment? And I said, oh, yeah,
0:03:12 I’m building a green AI factory. And they said, is that even possible?
0:03:21 So I just love that. It’s sort of how I ended up here, but I guess it’s a passion to make ideas happen.
0:03:26 Absolutely. You were working on sustainability previously in the telco industry?
0:03:30 Yeah, I’ve had various roles, but I had a sort of passion to sort of,
0:03:36 I was working with the critical infrastructure. I’ve had various CEO roles in Sweden,
0:03:41 in Montenegro, I’ve been working in Asia. And as I said, move back to Norway for
0:03:47 private reasons and looking, what can we really sort of do now? And seeing more and more that
0:03:54 the telcos critical infrastructure has such a key role in society. And I think this was one of my
0:04:00 eye knows during COVID, I think, like many of us sit and reflect. And I think all of us became so
0:04:06 dependent on the critical infrastructure that we have and the digital infrastructure.
0:04:10 So let’s talk about the factory. Should we start on the sustainability angle?
0:04:16 Should we start on sort of the commercial and sort of benefit of the factory to the
0:04:20 community? How would you like to begin? Tell us the story of building the AI factory?
0:04:25 I mean, I always like to sort of say, the story always started with a vision. And it was there,
0:04:31 sort of top executives at Telenor were meeting the sort of top executives at NVIDIA.
0:04:35 I love this statement, that sort of every country must own the production of their own
0:04:42 intelligence. I sort of love that. And that sort of vision, then with the combination of Telenor,
0:04:48 we’re 170 years old, we’ve now been running critical infrastructure in Norway for that time.
0:04:55 That’s amazing. It’s just such a beautiful fit. And I just love this vision that we had, okay,
0:05:00 that every country needs to sort of own its own intelligence and produce it. We need to do this
0:05:06 in a responsible, sustainable way. And we need critical infrastructure to do this. So this was
0:05:11 sort of the idea was then born. And I remember executives sort of coming back to Norway and
0:05:16 sort of saying to me, Kaaren, can you make this happen? And I’m like, okay, do I have any funding?
0:05:22 Do I have... no, no, just make it happen. And it was like, okay, where do we start sort of thing?
0:05:28 And every, you know, and everybody said, just believe, believe. And this is where our journey
0:05:34 started. There’s sort of which angle do you sort of tackle it? And if anything, I was overwhelmed.
0:05:39 And I think the only thing was certain is there was so much uncertainty, so many unknowns.
0:05:45 And we very much then started the journey exploring, okay, is there a need for it in
0:05:52 Norway? What’s the level of maturity? How can we build on this? And we started then working in
0:05:58 sprints. Forgive me for interrupting, Kaaren, but when was this? How long ago? So this was, we started
0:06:02 the journey in February last year, less than a year ago. Less than a year ago. It’s amazing how fast
0:06:07 things are moving. Yeah, it was, even when I say it, do you know, I feel like I’ve been working on it a
0:06:15 lifetime, but it’s just sort of a year ago. And we started looking at the market, we
0:06:20 started looking, okay, what is our go-to-market proposition? You know, what equipment do we need
0:06:27 and what data center do we do it in? And all these sort of questions were coming up. And this is
0:06:34 quite early on we said, okay, you know, telcos are fantastic, but they maybe haven’t got the best
0:06:39 reputation for moving fast. So it’s like, how could we do it? And this is where we said, okay,
0:06:45 let’s take a bit of a different approach. And even the basic thing I faced was like, okay,
0:06:50 I want to order our first sort of cluster to put in a data center. And I didn’t have a legal entity
0:06:56 to do this with. I mean, a very silly sort of internal governance issue. So we decided, okay, let’s
0:07:02 think like a startup: we’re going to set up a new company. So we set up a new
0:07:07 legal entity. And we said, okay, we’re going to be a startup. I said, I don’t want a huge team,
0:07:12 I don’t want a huge program, which is going to sort of drag us down in corporate governance,
0:07:18 slow decision making. So we said, okay, we’re going to think like a startup, we set up then our
0:07:26 first cluster, we put together our go-to-market proposition. And then in August, we started to go
0:07:32 talk to some customers out in the market and said, this is our proposition:
0:07:37 we’ve got an offering where we can offer something sovereign, secure and sustainable, effectively.
0:07:43 What do you think? And this is sort of where I think the fun really, really started.
0:07:49 And I always say, I said to the organization as well, the decision makers, the next time I’m
0:07:55 going to come talk is when we’ve signed two customers. And I even remember standing on stage at an
0:08:00 internal town hall. And one of the leaders said to me, Kaaren, what does success look like? At the
0:08:05 end of the year, what does success look like? And I spontaneously said, yeah, we have two customers
0:08:11 in our AI factory. And there it was, okay, that was it. And I said, yeah, we’re going to have two
0:08:16 customers, one internal and one external. And it was like, okay, so that became our goal. And this
0:08:22 is where the team then were fully focused that this is what we would then work towards.
0:08:24 Was that Hive Autonomy, the first external customer?
0:08:31 So Hive, actually, we actually started off saying we wanted Telenor as our own
0:08:38 internal customer. Because we said it was very good to sort of start with that, to learn, build
0:08:43 confidence, you know, make sure we’re there before we dare go out and talk to other customers.
0:08:44 Right, right.
0:08:49 And this is where we pivoted quite early: we’d get more intel than if we just focused on one
0:08:54 customer. And again, we did a lot of this in building the AI factory for our own needs.
0:09:01 We saw this capability being fundamental for Telenor to accelerate its own AI journey.
0:09:07 Right, as well. I mean, Telenor sits on a lot of critical information and data,
0:09:13 which we need to ensure stays within Norway and is operated very, very
0:09:19 securely. So we actually built the AI factory originally for our own needs. But we saw that
0:09:25 there was also a demand outside Telenor. So we pivoted quite quickly saying we can do this in
0:09:32 parallel, get one internal customer and one external customer. So we’re working sort of with
0:09:38 the Telenor operations to say, okay, they come on board. And then as you mentioned,
0:09:41 Hive Autonomy were then the actual first external customer.
0:09:41 Right, okay.
0:09:46 We got them on board. And I think, you know, we’re learning a lot from both our customers.
0:09:52 And we have an MVP product. It’s not perfect. We’ve set some design principles for the whole
0:09:59 team, where we say speed over perfection; we always like to say that roughly right is
0:10:05 better than precisely wrong. And again, in a telco, for years we’ve been
0:10:10 building, you know, the perfect network. So we’re kind of outside our comfort zone here. And again,
0:10:16 finding the right, what we call our MVP customers, was quite critical. So we can have
0:10:19 customers that can learn and grow with us.
0:10:25 Are there currently specific use cases that you’re tackling, whether on the internal side,
0:10:29 or, you know, that Hive or perhaps other external customers are interested in? What are
0:10:31 some of the leading use cases?
0:10:35 Yeah. So I think the one thing all of them have in common, first of all, is they’re sort of very
0:10:40 concrete. They’re all solving a specific problem. It’s not just like, oh yeah, we’ll buy some GPUs,
0:10:42 and then we’ll work out what we’re going to do.
0:10:45 Right, right, right. What can this chatbot do? Let’s play around with it.
0:10:50 Yeah, exactly. Which is what I think a lot of people want to do. But as I said, we want to
0:10:57 sort of... So like Hive Autonomy, for example, I mean, they work with logistics, robotics.
0:11:03 So they are actually innovating, I’d say, a lot of industries, whether it’s ports, as I said,
0:11:10 or factories, sort of in their operations, and they have efficiency cases. So they have very
0:11:16 specific customer needs that they are trying to solve. The reason why they were
0:11:20 very interested in coming to the factory is that they’re sitting with sensitive data. So they were
0:11:27 very keen, they wanted it to be really on Norwegian soil. The Telenor brand sort of represents
0:11:32 security, you know, so that, again, really helps them. And then the sustainability
0:11:39 part is super key. And so that was sort of the combination of these three. Capgemini is also
0:11:46 a customer of ours. They are developing products doing voice-to-voice translation. And we can
0:11:51 say, yes, that can be done. But these are for sensitive dialogues. Not all dialogues can go
0:11:56 out in the cloud somewhere. These are very sort of sensitive dialogues, if you think, you know,
0:12:02 within the health sector, within the police. So not so much on-prem, but again, it’s sort of a
0:12:10 safe, secure environment. And that’s really key. And another customer is working a lot with the
0:12:17 municipalities in Norway. And again, with sort of sensitive cases that they sort of really would
0:12:25 like their data to be secured. And the sustainability part of it. And this is something that, again,
0:12:30 as I said to you in my intro, I’m very, very passionate about. And then sort of, you know,
0:12:36 while some people say AI can solve climate change. So I think, you know, with the increasing
0:12:41 number of compute power and data centers that are needed, we have to be responsible and build
0:12:49 data centers in a sustainable way. And this is also why Telenor is building, with several
0:12:56 partners, a state-of-the-art modern data center here in Oslo. So Telenor, in addition to the AI
0:13:02 factory, has also partnered with a leading power company called Hafslund, and also a renewable
0:13:08 energy investor called HitecVision, and they are actually now building a super modern data center.
0:13:14 And here it’s actually, this is not just about using renewable energy sort of coming in,
0:13:20 but is also all the excess heat will actually go into district heating and actually heat up
0:13:25 apartments and the surrounding area. So the heat coming out of the data center, you’re going to
0:13:30 capture it and redistribute it to heat homes in the area. Yes. I have very little, you know,
0:13:34 physical world building and engineering capabilities. So it’s a genuine question.
0:13:40 Is that a tricky undertaking? Or is it? Yes, it is. And this is sort of the company
0:13:46 that Telenor is partnering with taking all that on. So I will say, you know. But I think what
0:13:53 we need to provide is then sort of, again, the sort of the critical infrastructure, the connectivity,
0:13:59 the security elements of it, and then being able to say to customers, and we’re doing this in an
0:14:05 energy-efficient way. Our guest is Kaaren Hilsen. Kaaren is the chief innovation officer and head
0:14:10 of the AI factory at Telenor. And Kaaren, we’ve talked Telenor Norway,
0:14:14 Telenor Sweden, and forgive me, I should have asked this up front. How many countries,
0:14:19 how big of an area does Telenor serve? Telenor, I mean, globally has over 200 million customers.
0:14:26 We have footprints in the Nordics. So Telenor has a presence then in Norway, Sweden, Denmark,
0:14:32 and Finland. And then we have a presence in Asia. Got it. So to get back to the AI factory and thinking
0:14:38 about the use cases and such, are most of the use cases right now, and again, whether, you know,
0:14:44 actually happening or sort of in the works, are they centering around generative AI and large
0:14:51 language models? Or is it kind of, you know, other forms of machine learning and AI? What’s the buzz
0:14:56 right now in terms of, you know, the sort of current and future-looking use cases for the
0:15:02 factory? I would say that there is a lot of buzz that’s still in the exploratory phase to be very
0:15:07 open and honest. We forget that’s where we all are. We’re still so early in all of this.
0:15:14 Yeah. And I see that there is definitely different maturity levels. I mean, when we talk to customers,
0:15:20 the market, and so many are, you know, exploring, you know, some very sort of super cool and
0:15:26 forward-leaning, others are very much like, okay, again, going back to the, we’re sitting on very
0:15:33 critical data and everything. And it’s used to being in our, you know, basement or under our desks
0:15:38 and everything. But how can we sort of, we need more compute power and everything. And these are
0:15:44 the kind of dialogues that we are having with customers, sort of, how can we really, you know,
0:15:51 secure the handling of their data in a very secure way? They can trust that it sort of
0:15:58 stays on Norwegian soil, that we still then are owning the production of intelligence, as we
0:16:05 spoke about earlier. So I would say there is still, the dialogues are still very much evolving
0:16:08 around this. You’ve worked in different regions of the world and you currently do,
0:16:15 as you were just speaking to, how similar or different are not just the laws and regulations,
0:16:23 but sort of the common wisdom, the attitudes around responsible AI and data security, those
0:16:28 kinds of things. And these are obviously, you know, new and evolving topics, responsible AI
0:16:34 specifically. How similar or different do you find conversations about these things as you work in
0:16:39 different countries, different regions of the world? I would say they are very different.
0:16:46 I’d say the similarity is that, certainly around security and sovereignty, what I have seen just
0:16:52 on the journey of the last two months has really increased. We are onboarding one customer at
0:17:00 the moment into the AI factory and they sent us a list of 135 security questions. And this is sort
0:17:08 of what we forget. And again, these are becoming not just topical, but very sort of business
0:17:16 critical questions as well. And this is back to why we started the journey with the AI factory.
0:17:21 We saw that there was a bit of a hole in the market, if I could say that: being able
0:17:27 to have sovereignty, security and sustainability, all three together. And then the sort of
0:17:33 trusted Telenor brand, where people trust us, we’re reliable, you know, is the sort of
0:17:39 combination. I mean, one could say, if you just want, you know, compute capacity to build an open
0:17:45 sort of model, go ahead and do that. But there is a lot needed to really move society forward
0:17:50 as a whole. You know, whether it’s in the health sector, within defense, within the public sector, we see
0:17:56 that, you know, there are sensitive dialogues, we see that there are, you know, things. So we
0:18:03 really see that the teleno AI factory, and together with our partnership with NVIDIA can
0:18:08 really help bring society forward. Absolutely. Because it’s some of these dialogues, as you say
0:18:14 now, around sovereignty and security, that are making people nervous, they’re making people
0:18:20 uncertain, they’re making people relook at their, you know, their fantastic IT, whatever that they’ve
0:18:26 had for the last years. So we’ve seen that there is definitely a change in climate, I would say,
0:18:33 around the seriousness of the dialogues. It seems appropriate, it makes sense to me.
0:18:38 All right, Kaaren, before we get to wrapping up, are there AI tools that you’re finding particularly
0:18:43 helpful, or, you know, just that you’re using regularly in your own life, be it work outside of
0:18:47 work, you don’t have to go into the details of what, but are there any tools that, you know,
0:18:52 you’ve been using regularly recently? Well, I have to confess, probably that I use it most to
0:18:57 help my kids with their homework, or, they say I’m cheating. I always tell them I’m not,
0:19:03 but I need it to just, you know, I have to, you know, answer all the why
0:19:09 questions and whatnot. Yes, my older child tells me actually that he’ll use it, sort of,
0:19:13 almost the way that people we talk about on the show use it, especially
0:19:18 folks in creative lines of work or doing creative, you know, projects that you’ll do something and
0:19:23 then send it to a chat bot to kind of get its take on it. And it kind of points out, oh, you missed
0:19:26 this, or, oh, here’s another way to think about that kind of thing. So I don’t think it’s cheating
0:19:31 myself, but, you know. No, it’s not cheating, but it helps get a dialogue because then exactly,
0:19:36 as you say, I then engage in discussions and we talk about whether it’s right or wrong,
0:19:41 and it is a fact that, I find, it actually is not cheating, I joke, but it actually
0:19:47 is a trigger to get the dialogue going. Which is fantastic, that’s great. Which isn’t always easy
0:19:53 with teenagers. No, no, it’s not. No, it’s not. All right, I want to give you time to talk about
0:19:58 the vision going forward. You know, this began with a vision, and it’s a vision that’s taken
0:20:03 flight now. What are your hopes? What’s your vision for the AI factory in the next, we’re
0:20:09 recording this in February 2025. So whatever time period makes sense, as we said, things are
0:20:13 moving so fast, it’s hard to know where AI will be in three years. But, you know, the next year,
0:20:17 two years, five years, what are you, where are you taking the AI factory?
0:20:21 Yeah, I mean, we certainly want to scale it. As I said, you know, we see, we’re starting in Norway,
0:20:30 we see Norway has this need. And I really believe that we can really help empower society here in
0:20:36 Norway through giving, you know, access to a place where different organizations, business,
0:20:42 whether it is the public sector or the private sector, can really innovate their businesses, make them more
0:20:49 efficient. And really sort of going back to this and feel that, yes, we are really doing this in a
0:20:56 sort of sustainable way. We’re helping ensure that we know our data and it’s safe,
0:21:01 we can run these really sensitive cases. And it’s sort of going back to this, I just love
0:21:07 this, the production of the intelligence is here in Norway. So really, I do see the AI factory as
0:21:12 being a very... maybe I’m being too ambitious, my vision is getting a bit grand, but the more I work with it,
0:21:17 the more I see it can really help move societies forward and develop.
0:21:21 Fantastic. That’s what it’s all about, if you ask me, but just my opinion.
0:21:23 I fully agree.
0:21:27 Kaaren Hilsen, this was a pleasure. Thank you for taking the time. And you’ll be on the panel at
0:21:33 GTC next month as we record this. So looking forward to that. And for folks listening,
0:21:39 who would like to find out more about the AI factory, about Telenor, anything we’ve discussed?
0:21:44 Is there a good website, URL, social media, where can listeners go online to learn more?
0:21:49 Yeah. I mean, go onto the Telenor website, you’ll find our Telenor AI factory there. You can reach
0:21:57 out to me on LinkedIn as well. So please, you’re welcome. We’re on this journey together. As you
0:22:02 say, we’re still very much in an explorative phase. So I’d love to hear from people.
0:22:06 Fantastic. Well, again, Kaaren, thank you so much. It’s been a pleasure and
0:22:09 all the best of luck with all the work you’re doing. There’s no better reason to employ AI
0:22:12 than to bring us all forward, as you said. So all the best.
0:22:14 Thanks, Noah.
0:22:18 [Music]
Telenor’s Chief Innovation Officer and Head of the AI Factory, Kaaren Hilsen, discusses Norway’s first AI factory. Opened in November, the facility processes sensitive data securely within Norway, ensuring data sovereignty and environmental sustainability. Learn how Telenor’s green computing initiatives, including a renewable energy-powered data center in Oslo, are advancing responsible and sustainable AI.
-
Temenos’ Barb Morgan Shares How AI Is Reshaping Banking – Episode 246
AI transcript
0:00:10 [MUSIC]
0:00:13 Hello, and welcome to the NVIDIA AI podcast.
0:00:15 I’m your host, Noah Kravitz.
0:00:17 Since its founding in 1993,
0:00:21 Temenos has been on a mission to revolutionize banking.
0:00:23 Its open platform enables people across the world
0:00:25 to carry out their daily banking needs,
0:00:27 and for banking providers to build new services
0:00:30 and state-of-the-art consumer experiences
0:00:33 using AI and other cutting-edge technology.
0:00:35 Starting a bit more recently,
0:00:37 our guest has been leading Temenos’ efforts
0:00:38 to drive digital transformation
0:00:41 for financial institutions across the world.
0:00:44 In October of last year, 2024, to be specific,
0:00:47 Barb Morgan joined Temenos as chief product
0:00:48 and technology officer,
0:00:51 bringing over 25 years of leadership experience
0:00:53 in global product development organizations
0:00:54 with her to the role.
0:00:57 Barb has done a lot in banking and financial services
0:01:00 to put it mildly, especially with AI and cloud tech.
0:01:02 In fact, it’ll be better to ask her
0:01:03 to tell us about her background.
0:01:05 So we’ll start there in just a second,
0:01:07 except that I will add that Barb holds
0:01:09 a Bachelor of Science in Computer Science
0:01:11 from the University of Central Oklahoma.
0:01:13 That said, Barb is here to talk about
0:01:14 generative AI and banking,
0:01:16 Temenos’ approach to AI,
0:01:18 and the importance of sustainability
0:01:19 in the industry for starters.
0:01:20 So let’s get to it.
0:01:23 Barb Morgan, welcome, and thank you so much
0:01:25 for joining the NVIDIA AI podcast.
0:01:27 – Thanks, Noah, excited to be here.
0:01:29 – Excited to have you.
0:01:31 All right, so I teased it in the intro a little bit,
0:01:34 but maybe we can start with you telling us a bit
0:01:35 about your background,
0:01:37 your journey into working with AI,
0:01:39 and how you landed at Temenos.
0:01:41 – Absolutely, so I actually started
0:01:43 with my hands on the keyboard.
0:01:46 So I was a developer many years ago.
0:01:49 When you said 25 years, I had to smile a bit
0:02:51 ’cause it reminds me how long my career has been.
0:02:55 But yes, my career did start with my hands on the keyboard,
0:01:58 but I always really enjoyed that link
0:02:00 between what we were doing with technology
0:02:02 and how that was really impacting the customer.
0:02:05 And so as my career continued
0:02:07 to kind of go through my journey,
0:02:10 I really gravitated towards those areas
0:02:13 that had a strong customer centricity.
0:02:15 And so I spent about the past 15 years of my career
0:02:18 focused in the financial services industry.
0:03:23 So I’ve led transformations inside banks and within fintechs,
0:02:24 side by side with the banks,
0:02:27 and really focused around modernizing core systems,
0:02:29 building innovative products,
0:02:31 and accelerating AI adoption,
0:02:34 which you can’t have a conversation anymore without AI.
0:02:37 But AI isn’t new to me.
0:02:40 We’ve been using it for years from fraud detection,
0:02:42 risk modeling, automation,
0:02:45 but what’s really different now is we’re seeing that shift
0:02:50 where Gen AI has shifted the entire landscape
0:02:51 where it’s not about efficiency,
0:02:54 it’s really about making the bank smarter,
0:02:58 more intuitive, and bringing that hyper-personalization
0:02:59 to the clients.
0:03:01 It’s exciting when we talk to our clients.
0:03:05 So as you mentioned, I joined Temenos in October,
0:03:08 and I’ve spent, in my past four months,
0:03:12 a lot of time out there just talking with the clients,
0:03:13 understanding what they’re thinking about,
0:03:15 whether it’s the CEO, CTO,
0:03:19 and they really want to get back to their customers, right?
0:03:22 Whether it’s having us run a banking suite for them on SaaS
0:03:24 so that they can focus on their customers
0:03:28 versus infrastructure, leveraging AI,
0:03:31 but that customer centricity is really coming out,
0:03:33 and it’s paired so nicely with the Gen AI.
0:03:37 – It’s interesting you mentioned the shift from efficiency,
0:03:39 and not that efficiency is a thing of the past,
0:03:42 I’m sure in the banking sector especially,
0:03:44 but kind of that shift from efficiency
0:03:47 to the customer relationship
0:03:49 and how can we better serve customers?
0:03:53 And is that something that you felt happening
0:03:56 before Gen AI in particular
0:03:58 kind of took center stage over the past couple of years,
0:04:00 or is it something that you think
0:04:03 kind of followed the technology that people realized,
0:04:06 like, oh, this is a great way to do all these things
0:04:08 with customer, you know,
0:04:10 personalization, customer service, what have you,
0:04:13 and that trend sort of followed the tech?
0:04:17 – I think the era of the chatbot, right?
0:04:20 Which was kind of that first introduction.
0:04:23 When you were sitting on the technology side,
0:04:25 whether inside the bank or at fintechs,
0:04:26 we thought, this is fantastic.
0:04:28 And what we realized was it was actually
0:04:30 really frustrating to the clients.
0:04:33 I don’t know if you’ve ever called into your bank
0:04:37 and it’s like, press one for this, press two for this,
0:04:40 and then you just end up mashing that zero key, right?
0:04:43 – I’m just screaming representative in the phone,
0:04:44 is that right?
0:04:47 And so now I think what we’ve seen
0:04:50 where technology is leading,
0:04:55 is that Gen AI can bring that human centered approach,
0:04:58 and really bringing more humans back
0:05:00 to that immediate touch point with customer,
0:05:05 because now all that data can get pulled instantly.
0:05:07 And so you don’t have to go through
0:05:10 that “representative, representative.”
0:05:14 And so I do think that we thought technology was leading
0:05:16 when kind of that era of the chat bots
0:05:21 and some of the different customer type efficiencies,
0:05:23 we’re playing out maybe five, 10 years ago.
0:05:26 But now I think technology truly is leading
0:05:28 and people are seeing it as an enabler
0:05:31 versus again, just that cost efficiency.
0:05:32 – Right, right.
0:05:34 Maybe you can unpack a little bit
0:05:37 what it means from the banker side
0:05:39 to deliver a better experience
0:05:43 and how they’re thinking about leveraging Gen AI
0:05:44 and related technologies.
0:05:46 I love, I’m not even a developer,
0:05:50 but I get on my high horse when people talk about Gen AI
0:05:52 as if it’s the only kind of artificial intelligence, right?
0:05:54 So machine learning, predictive analytics,
0:05:57 all these things, those are not, like we said,
0:05:58 they’re not going away,
0:05:59 but from the bankers perspective,
0:06:01 what are they excited about now?
0:06:04 What are some of your banking customers doing
0:06:07 to leverage this tech to deliver better experiences
0:06:09 and make their clients happier?
0:06:12 – Yeah, I think we’re seeing kind of two themes
0:06:14 really kind of play out
0:06:16 when we have the conversations with the banks.
0:06:20 And the first one is, how can they help their team?
0:06:22 And then secondly, can they help their customers?
0:06:26 So from the perspective of helping their teams,
0:06:27 they’re asking us questions of,
0:06:30 how can we give our bankers instant access to insight?
0:06:33 How can we leverage the historical data
0:06:36 and make recommendations almost instantly?
0:06:38 How can we have those richer
0:06:42 more meaningful conversations enabled for our bankers?
0:06:45 And so that’s really kind of that internal look of,
0:06:48 how can AI sit side by side
0:06:52 and really be that plus one in the conversation.
0:06:54 I think for the customers,
0:06:56 it goes back to that hyper personalization,
0:06:59 whether it’s tailoring loan options
0:07:00 or giving them insights
0:07:02 that they hadn’t thought about before,
0:07:06 both from historical, but also predictive in the future.
0:07:09 And then there’s the speed that customers expect now.
0:07:11 Everyone seems to have,
0:07:13 whether it’s ChatGPT or Perplexity
0:07:15 or whatever loaded onto their phone,
0:07:17 everyone expects that instant answer now.
0:07:19 And so they expect that everywhere,
0:07:21 but we’re in a highly regulated space.
0:07:23 And so making sure that we do that
0:07:25 in a very responsible way.
0:07:28 But some of the things that we’re doing that,
0:07:31 I really enjoy myself and try to entrench myself
0:07:33 with the teams as much as I can on
0:07:36 is instead of just coming up with solutions
0:07:37 and going out to our clients,
0:07:40 we’re doing a lot of co-design around the AI solution.
0:07:45 So really saying, hey, here’s five or six use cases.
0:07:46 Which of these stands out to you?
0:07:48 Which one or two of these do?
0:07:50 Let’s sit side by side.
0:07:53 Let’s think about how we can co-develop this together.
0:07:54 One that we’re working on recently
0:07:56 is an AI-powered solution
0:07:59 that allows the product managers at the bank
0:08:01 to create those financial products,
0:08:04 leveraging that data insight.
0:08:07 And that was where those predictive future capabilities
0:08:08 came into play, right?
0:08:12 So based off of the history, what can we predict?
0:08:14 And we can give you the bank’s knowledge.
0:08:17 We have bank expertise as well,
0:08:19 because we serve 1,000 different banks
0:08:21 or more than 1,000 banks.
0:08:22 And so bringing that all together
0:08:25 to think about future predictions,
0:08:27 pulling it together and giving the right products
0:08:30 to the right customers at the right time.
0:08:33 For me, it’s also,
0:08:35 we don’t wanna leave any of our customers behind.
0:08:38 One of the things that I really enjoy
0:08:40 about what we bring to our clients is,
0:08:41 we focus on flexibility.
0:08:46 And what I mean by that from a pure tech stack perspective
0:08:48 is if you’re gonna be on-prem,
0:08:50 if they wanna be in the cloud or if they want SaaS,
0:08:51 they have the optionality.
0:08:56 And so partnering most recently with NVIDIA to bring AI
0:09:02 to our on-prem banks has been hugely well received.
0:09:06 Just that ability to give that AI-driven analysis
0:09:09 with those massive data sets that they have on-prem
0:09:12 but allowing them to control the security around it,
0:09:16 allowing them to really not need to have
0:09:19 the deep technical expertise to analyze.
0:09:21 And so a lot of excitement there.
0:09:24 And it’s good too, because we were talking
0:09:26 with a bank the other day and he said,
0:09:29 our ability to have adequate data management
0:09:30 is very limited.
0:09:32 There’s areas we’ve invested,
0:09:35 but it might be 25% of our landscape
0:09:37 that we actually can pull analytics.
0:09:40 So these tools that can look across those massive data sets
0:09:42 are really exciting.
0:09:45 – So to kind of take a step back for a second,
0:09:49 if you don’t mind, Temenos offers a platform.
0:09:51 And so when you’re talking about,
0:09:53 I just kind of wanted to unpack for the listener
0:09:55 what Temenos actually does.
0:09:57 And my understanding is, it’s services,
0:10:01 but you also have a platform where clients
0:10:03 can build products for their banks?
0:10:07 – Yeah, so in kind of three different ways.
0:10:11 So first we have a end-to-end banking platform.
0:10:15 And so for our banks, within like, let’s say within the US,
0:10:17 our tier three regional banks will come in
0:10:19 and they’ll take an end-to-end platform
0:10:22 that provides all of the capabilities of banking.
0:10:25 – Almost like a turnkey banking solution, okay.
0:10:29 And that’s where we see a lot of the adoption of SaaS,
0:10:32 “bring us a bank,” even within kind of
0:10:33 that neobanking space as well.
0:10:38 But then we also, we also offer modular solutions
0:10:39 to our clients.
0:10:42 And so if you look at some of our tier one banks,
0:10:44 they don’t wanna replace their whole platform, right?
0:10:46 And so, but they may come to us and say,
0:10:50 hey, we just want your payments module
0:10:53 or we just want your originations.
0:10:56 And so giving that choice and flexibility,
0:10:58 whether they want the full platform
0:11:01 or they want modulars within the platform.
0:11:03 And then we also offer products
0:11:05 around what we call point solutions.
0:11:08 So things that may be add-ons that they may choose
0:11:10 to build themselves and plug into our platform,
0:11:11 like their digital interface.
0:11:14 Or we also offer a digital interface
0:11:17 that they can leverage with our suite.
0:11:18 – Gotcha, thank you.
0:11:20 And you serve bank customers of all sizes?
0:11:24 – We do, so we have over a thousand banks globally.
0:11:27 So we have a very global footprint
0:11:30 from the Americas to EMEA to APAC.
0:11:34 And so, with that, it’s everything from tier one banks
0:11:37 all the way down to the neobanks.
0:11:40 – Are you seeing similar or different trends
0:11:42 in terms of, I guess both adoption rates
0:11:44 from smaller banks and larger banks
0:11:45 when it comes to AI tools,
0:11:47 but also what they’re using them for?
0:11:50 Or is it everyone primarily is customer service?
0:11:53 Like this is the big, it’s not just low hanging fruit.
0:11:55 Like it’s a potentially really big win.
0:11:56 Is that just where the focus is now?
0:12:00 – I would say AI is no longer an option, right?
0:12:03 So they have to have AI, whether it’s embedded,
0:12:07 whether it’s actually more of a feature functionality,
0:12:09 maybe within their digital to allow their customers
0:12:13 to click on some type of AI agent.
0:12:17 But I would say we see two different lens on this.
0:12:21 So one is regional: you have different areas
0:12:25 that are more prone to adopt AI faster
0:12:26 than other regions.
0:12:28 And then from the customer tiering,
0:12:32 what I would say is the tier one, tier two banks,
0:12:34 absolutely they wanted embedded in their product.
0:12:38 The kind of tier three banks, regional banks,
0:12:42 they’re really focused on the personalization that it brings.
0:12:44 – Okay, you kind of alluded to this a little bit,
0:12:49 but is just gathering data and helping the banks
0:12:54 kind of find and gather up and scrub and prepare and use it.
0:12:59 You mentioned the example of only 25% of the data
0:13:00 actually being available for analytics.
0:13:02 I may be getting that wrong in my paraphrasing,
0:13:04 but is that still kind of the biggest,
0:13:07 I don’t wanna say hurdle, but one of the biggest hurdles
0:13:09 to kind of leveling up success?
0:13:13 – It is, one of the first conversations that I often have
0:13:15 or something I always try to click into
0:13:18 and probably a bit of my kind of geeky background
0:13:21 leads me into is what does your data look like?
0:13:24 Because in order to have responsible AI,
0:13:26 you have to have your data in a state
0:13:29 that you can actually leverage the tools.
0:13:34 And for us, we really take that AI explainability
0:13:35 very seriously.
0:13:37 And so we spend time with our customers
0:13:40 to understand what is the state of their data.
0:13:43 And oftentimes what we find is if you think back
0:13:45 to the transformation 10 years ago
0:13:47 that banks were undergoing,
0:13:51 it was a lot about how it looked and felt to the customers,
0:13:53 not about transforming the back end.
0:13:57 So many banks are made up of acquisitions,
0:13:58 mergers over time.
0:14:02 And so the front end looks really slick and great
0:14:04 and it looks like you’re dealing with one bank
0:14:06 and it’s actually hitting six different banks
0:14:07 in the background.
0:14:08 Right.
0:14:13 And so sometimes it is, working with our clients,
0:14:17 we have a Temenos data product, our RTDH product
0:14:22 and it is bringing that data into a state
0:14:24 that it can then be leveraged.
0:14:27 And so kind of back to that optionality
0:14:29 that we talked to our clients about,
0:14:32 sometimes we may just take a portion of the bank
0:14:36 and really focus on getting the data streamlined
0:14:39 and getting it ready to be able to use that embedded AI
0:14:40 and then proving it out.
0:14:44 And I think as we see, banks are highly regulated,
0:14:45 that’s not going away,
0:14:49 the regulations are gonna get tougher if not anything else.
0:14:52 And so we always keep that top of mind,
0:14:54 making sure that compliance standards
0:14:58 are absolutely embedded into our products
0:15:00 and fully auditable.
0:15:02 And so that’s where a lot of our conversations
0:15:04 end up leading around data is, are you ready?
0:15:05 Right, right.
0:15:07 I’m speaking with Barb Morgan.
0:15:11 Barb is the chief product and technology officer
0:15:14 at Temenos, a role she started within the last year,
0:15:17 the current chapter in an illustrious career
0:15:19 in global product development,
0:15:21 particularly across banking and financial services.
0:15:24 And we’ve been talking about generative AI in particular
0:15:27 and the banking perspective on the chat bot revolution,
0:15:29 if you wanna put it that way,
0:15:30 but just as a leaping off point,
0:15:34 so much obviously came before ChatGPT and the bots
0:15:37 in the world of machine learning, and obviously since then
0:15:40 the pace, it’s just been breakneck.
0:15:42 Barb, I wanna kind of shift gears for a second here
0:15:45 if we can and talk a little bit about sustainability
0:15:46 in the industry.
0:15:49 I know that Temenos has a point of view on this
0:15:52 and is actively working with client organizations
0:15:53 to help them be more green.
0:15:55 And I would think that your perspective,
0:15:58 both in the industry and also in kind of the international
0:16:01 nature of the work that you do with Temenos,
0:16:04 curious to hear both the company’s perspective
0:16:07 and what you’re up to with clients to help them be more green,
0:16:10 but also your take, having been around the industry
0:16:13 and around the world literally for a few decades now.
0:16:17 – So to me, sustainability, it’s not a trend.
0:16:19 I think when it really started becoming part
0:16:22 of the conversations, call it maybe 10 years ago
0:16:25 where it actually was part of annual reports
0:16:28 and things like that, I think people kind of questioned,
0:16:30 like is this long-term?
0:16:31 And now what we’ve seen the shift is,
0:16:34 it’s really a responsibility of the organization.
0:16:38 And so, when we sit down and we talk with our banks
0:16:39 and why it’s important to us
0:16:42 and bringing some of those solutions forward
0:16:45 that allow them to be greener,
0:16:48 we talk about both what it means to them,
0:16:49 where their focus is.
0:16:52 So to your point, we serve banks globally
0:16:54 and so we see different parts of the world
0:16:59 where it may be more around their carbon accounting
0:17:01 or some areas that may be like,
0:17:04 hey, we really want to understand
0:17:06 how our cloud deployments are helping.
0:17:08 And so understanding like,
0:17:11 how can we drive a greener banking future?
0:17:14 And there are always great conversations
0:17:17 because they really often talk to the values
0:17:18 of the organizations.
0:17:20 And so you get to actually spend time
0:17:22 in the cultural side of the bank.
0:17:26 And what’s really kind of cool, for lack of a better word,
0:17:30 about those conversations is when you can use AI
0:17:32 to bring the culture forward in the solutions
0:17:34 through something that really matters to them,
0:17:37 it’s a very rewarding solution, right?
0:17:41 So, like I talked about the smart carbon accounting,
0:17:44 helping our customers track their carbon footprint
0:17:47 both through how they’re using their software
0:17:49 but then being able to offer that to their customers.
0:17:51 There’s many consumers who want to know,
0:17:54 like, hey, how are the purchases that I’m making
0:17:56 impacting my carbon footprint, right?
0:17:58 So not only are we talking to our customers
0:18:00 but we’re actually impacting their end customers as well.
0:18:04 – So kind of to piggyback off that a little bit
0:18:06 and open it up a little more abstractly, I guess.
0:18:07 This is a big question,
0:18:09 but I’ll throw it at you, you can handle it.
0:18:13 How do you see the future of banking being shaped by AI?
0:18:15 And I guess the flip side of that is,
0:18:17 how do you see the future of AI growing in banking?
0:18:21 But I think really, there’s been so much,
0:18:22 we kind of joked about it for a second.
0:18:24 There’s been so much in the past couple of years
0:18:26 with gen AI and the pace isn’t slowing down.
0:18:30 And we have in our notes here to talk about AI agents,
0:18:32 which is kind of the latest thing
0:18:34 buzzword-wise in the past few months, right?
0:18:35 But certainly not a new thing
0:18:37 and certainly something that could wind up
0:18:40 really shaping the technology going forward.
0:18:41 We’ll have to see what plays out.
0:18:43 Do you have a strong view on what you think
0:18:47 is going to happen with AI and banking
0:18:49 and banking kind of being reshaped by the technology?
0:18:52 And that can be short-term next couple of years,
0:18:54 take it a little further out if you like.
0:18:56 What are your thoughts kind of generally on this?
0:18:59 – Yeah, I think even before stepping into that,
0:19:02 I think the one thing that I really see
0:19:06 and I think it’s important to kind of talk about
0:19:09 is the leadership of an organization
0:19:13 really shapes how AI is going to be accepted, right?
0:19:16 Is it seen as a friend or a foe?
0:19:18 If you have your top leadership just talking about
0:19:21 how much money we’re gonna save from AI, it’s a foe, right?
0:19:25 But we’re seeing the leaders of the organizations
0:19:28 really look at AI as not as a threat
0:19:32 and really talking about it as an enabler,
0:19:35 getting people curious, getting people engaged,
0:19:38 more and more organizations, and we do this ourselves,
0:19:40 whether you use the term eating their own dog food
0:19:44 or as the French would say, drinking their own champagne.
0:19:46 We’ve been doing that ourselves to say,
0:19:49 “Hey, let’s actually use this on ourselves.”
0:19:51 And then if it works well for us, great.
0:19:53 We can start to expand it to our customers.
0:19:56 And so when you start to see the leadership
0:19:58 of the organizations, whether it’s the CEO
0:20:02 or any of the C-suite, talking about how they’re curious,
0:20:04 how they’re using it in their daily lives,
0:20:08 how they’re getting in there and playing around themselves
0:20:11 and thinking about how can I get rid of repetitive,
0:20:14 time-consuming tasks and focus on deeper matters
0:20:15 and more strategic work,
0:20:18 you start to see that really come out.
0:20:20 And I think that’s important in order
0:20:22 for people to see AI as a tool
0:20:26 to amplify the human potential, not to replace it.
0:20:30 – Are the bankers, the employees, to put it that way,
0:20:32 are they thinking about it the same way?
0:20:33 Is there excitement?
0:20:35 Is there fear around job replacement?
0:20:37 Is there– – Any change, right?
0:20:39 There’s always gonna be a bit of fear.
0:20:44 And I think it’s up to us as the banking experts
0:20:46 and as partners to our clients
0:20:49 and then working with their teams
0:20:52 to help kind of show how the change does
0:20:54 actually help them, right?
0:20:59 And so when we see that kind of pivot away from,
0:21:01 “Oh my gosh, this is gonna replace me,”
0:21:02 to, “Wait a second,
0:21:05 “I’m actually gonna sit side-by-side with AI.”
0:21:08 And it’s gonna– – Plus one, you–
0:23:09 – Yes. – Forgive me for interrupting.
0:21:10 I just wanted to give you credit.
0:21:12 I hadn’t heard somebody use Plus One
0:21:14 to talk about AI before and I love it, right?
0:21:17 It’s the co-pilot, whatever you wanna call it,
0:21:18 but Plus One is great.
0:21:22 – Yeah, and for me, I really focus on,
0:21:25 I hate the word artificial intelligence
0:21:28 because artificial, it’s fake.
0:21:30 There’s just that negative connotation.
0:21:33 And so I often start out by talking with our clients
0:21:36 about thinking about it as augmented intelligence.
0:21:40 And that gives you that Plus One effect, right?
0:21:43 And then when you show the bankers,
0:21:46 hey, someone’s gonna walk into your branch,
0:21:48 you’re instantly gonna be able to know
0:21:51 more about that customer than they know about themselves.
0:21:55 And you’re gonna be able to have a really deep conversation
0:21:57 both about what’s right for them today,
0:22:00 what’s right for them in the future,
0:22:02 how they’re shaping those things.
0:22:03 Their eyes light up, right?
0:22:06 Because oftentimes they would have to,
0:22:08 the customer would sit in the lobby,
0:22:09 they would do a bunch of research,
0:22:11 they might be pulling paper files out,
0:22:12 they’re trying to remember,
0:22:15 “Okay, this person has been with us for 10 years
0:22:17 “and they have a mortgage and they have a car
0:22:19 “and they have this.”
0:22:19 – Right, right.
0:22:21 – “Oh gosh, what else could I offer them?”
0:22:24 When they can, through natural language questions, say,
0:22:27 “How long has this customer been with us?
0:22:29 “What is their familial history?”
0:22:33 So this may be a 30-year-old,
0:22:37 but maybe their family’s been with the bank for 25, 30 years.
0:22:40 And then when their customer walks up and they say,
0:22:44 “Hey, it’s been great, we’ve served your parents.”
0:22:47 And so excited to have you here with us.
0:22:49 And we had these great,
0:22:51 we’re looking at what loans that you have with us,
0:22:53 we could consolidate those together,
0:22:54 we could offer you a better rate.
0:22:57 We have this great potential over here.
0:22:58 They’re excited, right?
0:23:01 – Yeah, it really kind of,
0:23:03 you sort of made real in listening to you talk about that.
0:23:04 It’s a great example,
0:23:07 ’cause it makes me think of the sort of abstract talk
0:23:11 about machine learning, AI tech, kind of,
0:23:14 freeing humans up to do what humans do
0:23:15 better best.
0:23:18 And in this case, I can relate because it’s not quite AI,
0:23:21 but if it’s not in my phone calendar, I forget it, right?
0:23:24 And so I can only imagine being a banker,
0:23:26 having so many clients to serve.
0:23:28 As you said, I’m in the lobby waiting
0:23:30 because the banker is doing their best
0:23:33 to kind of do a crash course on my whole history
0:23:36 with the bank to serve me, ’cause there’s so many customers,
0:23:37 I’m getting frustrated and waiting, et cetera.
0:23:39 Yeah, let the AI do it.
0:23:41 And then it just, in real time, it pops up.
0:23:43 And yeah, that’s a great example.
0:23:47 And today, I mean, the future is gonna be all
0:23:50 about human and AI collaboration.
0:23:52 We’re already seeing kind of AI agents.
0:23:54 So to your agentic AI, right?
0:23:58 And latest buzzword, handling those routine banking tasks.
0:24:02 But if you can set it up where it’s doing your segmentation,
0:24:04 then it’s doing some product suggestions,
0:24:07 then it’s seeing, as you offer those products,
0:24:10 maybe it’s actually shaping that segmentation.
0:24:12 And so those agents are continually learning
0:24:13 from each other.
0:24:17 And then you can bring that to that human collaboration.
0:24:19 It’s just exciting.
0:24:23 I think we’ll start to see digital humans in banking
0:24:26 so that you aren’t saying “representative, representative.”
0:24:29 I was gonna say, when we were talking about that before,
0:24:33 I’m 100% for the plus one,
0:24:35 the augmented intelligence is also,
0:24:37 I like that way of thinking about it as well, right?
0:24:41 And I wanna see humans use the technology to thrive
0:24:43 and not talk about things like replacement, et cetera.
0:24:46 That being said, I’m an impatient person.
0:24:49 And so I always gravitate towards the self-checkouts
0:24:51 at stores.
0:24:55 And so if the automated banking menu could give me,
0:24:58 ’cause I never call unless I have some weird question
0:25:00 or like I’ve missed four payments
0:25:03 and wanna try to beg somebody to give me grace, right?
0:25:05 So yeah, get the automated system to that point
0:25:07 and I’ll be happy.
0:25:10 Yeah, or even think about how great would it be
0:25:13 if your phone pops up, ’cause I know we all,
0:25:16 or at least I know my phone is always within arm’s reach.
0:25:17 Yes, yep, yep.
0:25:19 Anywhere, and if it just said,
0:25:22 “Hey, Noah, looks like you missed your last payment.
0:25:24 “Would you like us to auto debit from your account
0:25:27 “and we’ll free up any late fees?”
0:25:28 And you’re like, “Yes, done.”
0:25:30 Yeah, that’s exactly the one, yeah.
0:25:34 And so that proactive monitoring, bringing that,
0:25:36 so that you aren’t even having to call in, right?
0:25:38 You, how much better would that be
0:25:41 if it automatically reaches out to you?
0:25:45 Yeah, if my creditors happen to be listening to the podcast,
0:25:48 I just made that example up, we’re good.
0:25:51 Barb, before we get to wrapping up, I wanted to ask you,
0:25:53 and I think this is something I need to start asking guests
0:25:56 going forward, so thank you for inspiring me.
0:26:00 You mentioned in your work talking to leadership,
0:26:03 it’s so important, it’s such a tone for so many things,
0:26:06 but including an organization’s a bank’s kind of perspective
0:26:07 on embracing AI.
0:26:10 And you talked about getting these clients,
0:26:12 these banking leaders, to start using the tools
0:26:12 in their own life.
0:26:13 Do you have a routine?
0:26:16 Do you have things that you use AI for
0:26:19 on a daily, regular basis that,
0:26:22 maybe a pro tip to share with the audience?
0:26:27 Yes, I might overuse it, that’s probably the engineer in me.
0:26:30 My husband, I’ll be traveling and I’ll get a message,
0:26:31 like, can you turn this thing off?
0:26:33 Like, why are all the lights coming on?
0:26:36 Why is, like, I know you wake up at six,
0:26:38 but I’m not waking up at six when you’re not home.
0:26:40 Like, why is the house waking up for me?
0:26:41 Smart home lights, yeah.
0:26:43 No, no, I mean, you know, for me,
0:26:45 I use it in a couple different ways.
0:26:49 Sometimes I use it just to say, is my message clear, right?
0:26:54 Like, when you’re so deep into whatever your specialty is,
0:26:55 right?
0:26:57 You feel like your message is clear
0:26:59 because you’ve been living, eating, breathing,
0:27:00 working on it for a while.
0:27:03 I can quickly throw that into whatever my favorite tool is,
0:27:06 whether it’s ChatGPT, or Perplexity, or whatever,
0:27:10 Copilot, and say, summarize this message.
0:27:12 What is the tone?
0:27:16 What level of audience is this reaching?
0:27:20 And hopefully, it’ll say, hey, this is actually
0:27:22 geared at an engineering audience.
0:27:24 Oh, well, wait a second, that’s not who I’m speaking to.
0:27:25 Right, that’s great.
0:27:29 Let me make sure that I bring this back into more
0:27:33 of a business speak, or this is very financially focused.
0:27:34 OK, wait a second.
0:27:37 And so I use it oftentimes in a way to sense, check me.
0:27:40 But I also use it for a bit silly stuff, right?
0:27:45 So we have four kids back in the States, all college age,
0:27:47 nineteen to 22.
0:27:50 And we were going on holidays in Mexico.
0:27:53 And as much as I think I’m a cool mom,
0:27:56 I absolutely used AI to say, what are the best things
0:27:57 to do down in Mexico?
0:27:58 Totally.
0:28:01 Yeah, and it got it pretty close to right.
0:28:03 Like, they like the different restaurants
0:28:04 that we took them to.
0:28:05 And there you go.
0:28:07 Yeah, so I use it quite often.
0:28:11 I also– I play with a lot of tools
0:28:13 outside of the financial industry,
0:28:16 because I think it’s important to see how other industries are
0:28:16 leveraging AI.
0:28:19 It gives us ideas into the financial space,
0:28:21 whether it’s maybe the insurance space.
0:28:23 I was on my insurance app the other day,
0:28:25 and they have AI embedded.
0:28:27 And I thought, wow, this is really cool.
0:28:32 And so looking for ways that other people are using AI
0:28:34 is sometimes the way that I use AI.
0:28:35 Excellent.
0:28:38 Barb, for listeners who would like to learn more
0:28:43 about Temenos’ approach to AI, other services Temenos offers,
0:28:47 maybe something a little more engineering, geeky oriented.
0:28:48 I don’t know if there’s a developer blog,
0:28:51 or you have other social media, anything.
0:28:54 Where would you direct them to go after listening?
0:28:56 Yeah, whether it’s LinkedIn, if that’s
0:29:00 their favorite, and just looking up Temenos.
0:29:03 They will definitely find a cluster of areas to go.
0:29:09 And then, of course, our website, just our www.temenos.com.
0:29:10 They can look at our products.
0:29:13 We do have more of the technical aspects, right?
0:29:16 So our developer portals, and then also just understanding
0:29:19 where our thought leadership is in the space.
0:29:19 Fantastic.
0:29:22 Barb Morgan, thank you so much for joining the podcast.
0:29:24 This was a pleasure.
0:29:26 I learned some things, which I knew I would.
0:29:27 We talked before we started.
0:29:30 Banking’s not my wheelhouse, so I appreciate that.
0:29:30 Thank you.
0:29:32 But more so, it’s just kind of– it’s always
0:29:36 fascinating to talk to somebody who’s a leader in their field
0:29:39 and has been living and breathing it for long enough to–
0:29:42 we’re talking about world-changing technology,
0:29:44 but there are deeper things that have been around for a while
0:29:48 now that are really important to shaping your perspective.
0:29:50 So your perspective is greatly appreciated.
0:29:52 Thank you.
0:29:55 [THEME MUSIC]
0:30:42 [MUSIC FADES OUT]
AI is transforming banking by providing hyper-personalized services and real-time insights, enhancing customer experiences and ensuring robust data security. Barb Morgan, chief product and technology officer at Temenos, shares her expertise on how AI is transforming the banking landscape.
-
Tara Chklovski, Anshita Saini on Technovation Pioneering AI Education for Innovation – Episode 245
AI transcript
0:00:10 [MUSIC]
0:00:13 Hello, and welcome to the NVIDIA AI podcast.
0:00:15 I’m your host, Noah Kravitz.
0:00:21 On October 17th of 2024, NVIDIA was honored to host the Technovation World Summit
0:00:24 finalist pitch and awards ceremony in Santa Clara, California.
0:00:28 The summit featured more than 50 young innovators from around the world,
0:00:30 pitching tech solutions for sustainable development goals.
0:00:33 And it was a celebration of what happens when the creativity and
0:00:37 resilience of girls meets the transformative power of technology.
0:00:41 Here to talk about the World Summit, Technovation, and
0:00:44 the incredible work young women are doing across the globe to tackle our
0:00:47 biggest problems are Tara Chklovski and Anshita Saini.
0:00:53 Tara is the founder of Technovation and a repeat guest on the podcast,
0:00:56 which I’m terribly excited about, we’ll get into that in a second.
0:01:01 And Anshita is an alumna of Technovation and currently works as a member of
0:01:06 the technical staff at Open AI, an AI company some of you might have heard of.
0:01:09 Tara, Anshita, thank you so much for taking the time to join the podcast.
0:01:12 Welcome, congratulations on the awards summit.
0:01:14 That was a few months ago.
0:01:16 So I hope your new years are off to a great start and welcome.
0:01:17 >> Thank you, Noah.
0:01:21 NVIDIA has been such a long-term supporter of Technovation that I’m so
0:01:25 happy to be back and so happy to share our stories along with
0:01:27 an alumna we’re super proud of, Anshita.
0:01:31 >> Well, I am, like I said, Tara, I’m thrilled to get to talk to you again.
0:01:35 And I’m equally thrilled to meet you, Anshita, and to hear about the work you’ve
0:01:35 been doing.
0:01:37 But let’s start, Tara.
0:01:40 You were on the show about five years ago. Time flies.
0:01:44 At that time, you were the founder of a nonprofit called Iridescent.
0:01:47 You were described, we use this quote from somebody else, not our words.
0:01:50 But you were being described as a pioneer empowering the incredible tech girls of
0:01:51 the future.
0:01:52 Seems like prescient words.
0:01:57 What have you been up to since then and how did it all lead to now, to
0:01:59 Technovation, and us getting to talk to you again?
0:02:00 >> Yeah, thank you.
0:02:03 I think I sort of see it as Iridescent initially.
0:02:07 That was the name of the organization, the nonprofit that I started almost 19 years
0:02:12 ago with the goal to empower underrepresented communities with the
0:02:18 skills so that they could be the innovators of technologies that solve the problems
0:02:22 they face so they didn’t have to wait for a savior.
0:02:28 And so I think when NVIDIA first supported us back in 2018, I could sense
0:02:33 that AI was going to be the most powerful revolutionary force.
0:02:37 And it was interesting, like we started to do that actually in 2016.
0:02:42 And I was talking to people and I was like, we need to revamp our curriculum to
0:02:43 focus on AI.
0:02:47 And nobody was talking about that, especially not in the education space.
0:02:51 And everybody would be like AI and underserved communities don’t go in
0:02:53 the same sentence.
0:02:57 And NVIDIA was one of the first companies to actually pay attention to that.
0:03:01 And they said that, okay, we will fund this program.
0:03:06 And so we launched the first AI family challenge back in 2018, 2019.
0:03:09 And it was the first global AI education program in the world.
0:03:16 And everybody, there was so much negative feedback that I got that this will not work.
0:03:17 People don’t care about it.
0:03:21 And we actually opened it up and we went into communities.
0:03:26 So we had a refugee community in Somalia participate in the AI family challenge.
0:03:30 We had families from low-income communities in Michigan participate.
0:03:33 And they all came to me and they were saying that, thank you for teaching us
0:03:37 about how these things work because we see it in our phones, we see it everywhere.
0:03:41 But nobody ever bothered to tell us about how it works.
0:03:44 And with NVIDIA, we did these three design challenges.
0:03:49 That was literally in 2019 on how a self-driving car works, how a neural network works,
0:03:53 and how parallel processing works, like using hands-on experiments.
0:03:56 Those videos are still some of our top videos of all time.
0:04:03 Because guess what, neural networks, people actually know what LLMs stand for and all that kind of stuff.
0:04:08 So I’m just so excited that NVIDIA had such a huge role to play in pioneering.
0:04:11 Not just AI, but also AI education.
0:04:15 And then when we were looking at the data of what works, what doesn’t work,
0:04:19 the program that was really coming to the forefront that has long-term impact on
0:04:24 participants’ identity was this Technovation model of girls going through an accelerator:
0:04:29 A three-month accelerator where you find problems that you care about and actually build AI solutions
0:04:34 or mobile app solutions and pitch to investors and have a demo day.
0:04:39 And the whole experience of working in a team, being the founder, is just transformational.
0:04:44 And so we changed the name of the organization from Iridescent to Technovation.
0:04:47 We cut all our other programs and just focused on this accelerator.
0:04:52 And then everybody else started to recognize AI was a thing.
0:04:57 And then ChatGPT came out in November 2022.
0:05:02 We were one of the first education organizations to embrace it, and we developed a whole curriculum
0:05:06 for our girls to say, this is how you use ChatGPT to help with ideation,
0:05:08 to help with developing your code.
0:05:15 And then we had thousands of girls in December 2022 using ChatGPT to
0:05:16 help them in their accelerator journey.
0:05:18 And since then, of course, it took off.
0:05:24 I’m just remembering so vividly that this was around the time or was the time when a lot of
0:05:29 school districts and educators, and I mean, no shade in saying this, were really struggling.
0:05:30 Well, what do we do with this?
0:05:33 We need to ban it so we can figure out how people won’t cheat.
0:05:37 And here you are saying, no, no, no, no, people want to know how this works.
0:05:37 Let’s embrace it.
0:05:38 Let’s use it.
0:05:39 And yeah, absolutely.
0:05:44 So, you know, I remember going to a local school district here in Silicon Valley and telling
0:05:48 them, OK, you’ve got to use ChatGPT and they were like, what is this?
0:05:52 And it just blew their minds.
0:05:54 And they’re like, we need to get permission.
0:05:56 I’m like, it’ll be too late, right?
0:06:00 Get this, have the girls use it and then you can deal with the consequences later on.
0:06:03 But anyway, then there was more and more information
0:06:06 about how AI is going to revolutionize the world, right?
0:06:11 And so I was like, the problem of not having enough people developing these
0:06:15 technologies is only going to get bigger because these technologies are moving so fast.
0:06:19 And so we launched the AI Forward Alliance with UNICEF as a core partner.
0:06:24 And the goal was, and the goal still is to empower 25 million innovators, AI
0:06:28 innovators, and especially females, because guess what?
0:06:32 Like globally, there are only, I think, 18 million tech professionals in the world.
0:06:35 Only 14 million men and only 4 million women.
0:06:37 I do this all the time on the show.
0:06:38 But it’s genuine.
0:06:39 I’m just not a big stats person.
0:06:43 So I apologize, but unfortunately, the 14-4 split doesn’t surprise me.
0:06:45 It’s yeah, we need to work on that.
0:06:47 Yeah, goes without saying.
0:06:48 But is that only 18 million?
0:06:50 Wow. OK. Yeah, it’s less than half a percent.
0:06:55 So the reason is, I did this study with the ILO workforce data, and we went country
0:06:57 by country, whichever country had data.
0:07:01 And I think the reality is like, I mean, if you look around, maybe not in Silicon Valley,
0:07:05 but if you look around, most people are employed in the service sector in the world.
0:07:09 Right. And so people who can actually build technology, they’re very few.
0:07:13 And so I think if countries need to progress economically, they actually
0:07:14 need to have a bigger talent base.
0:07:16 And where are you going to get your talent base?
0:07:17 Tap into the half that’s not part of it.
0:07:21 Right. So that’s where the AI Forward Alliance comes in.
0:07:24 And last year, we had record numbers.
0:07:28 I think we had like 35,000 girls participate in the Technovation program.
0:07:32 And for the finalists, it was awesome to be able to bring them to NVIDIA.
0:07:36 I was and I’m now feeling it again, so bummed that I couldn’t make it down in person.
0:07:38 But by all accounts, it was a terrific event.
0:07:42 Do you want to tell us about the summit?
0:07:44 Do you want to get into some of the pitches?
0:07:46 Yeah, I can. It’s just interesting, right?
0:07:50 Like when you give technology to people, especially young people,
0:07:53 that’s when you see the limits, the points of breakage of the technology, right?
0:07:56 But also where the innovation comes from.
0:08:01 And I think what I’ve always seen is like young people, especially girls,
0:08:05 because you underestimate them all the time, they just have such interesting ideas.
0:08:06 So I’ll give an example.
0:08:11 In 2012, I had a team of girls come up and create a mobile app
0:08:14 that was around focus and mental health.
0:08:18 This was 13 years before this whole mental health and wellness movement.
0:08:20 And I could not understand it.
0:08:24 I was like, why on earth would you want an app that blocks other apps?
0:08:28 They said, because it increases productivity. And I actually gave them negative feedback,
0:08:32 because it was so against what I was seeing.
0:08:36 But they were like, no, these apps are really messing with my productivity
0:08:39 because they’re so addictive, and it’s messing with how I feel.
0:08:41 And I was like, wow, right?
0:08:44 Like this was literally 13 years before the adults caught up.
0:08:46 And so it’s a similar thing like this year.
0:08:51 We had such interesting solutions, like a team from Vietnam
0:08:56 working on preserving their cultural heritage through lullabies
0:08:58 and putting them into large language models.
0:09:01 Also Vietnamese facial expressions,
0:09:04 because a lot of the facial expression recognition systems
0:09:06 don’t recognize Vietnamese facial expressions.
0:09:09 So they did their own training on their own data sets.
0:09:12 And so I see sort of the Technovation community as
0:09:17 bringing last mile data to large language models or large models
0:09:20 and helping them become more representative of the world.
0:09:23 As we record this, the episode that just went live
0:09:27 was an episode with the founders of a company called Gooey.AI,
0:09:31 and they are doing conceptually, to me, very similar things and,
0:09:36 you know, feeding localized data around populations, communities,
0:09:39 specific problems, challenges to be solved into LLMs
0:09:43 to create these tools to actually serve the users in those places.
0:09:47 Just the whole idea of, you know, you’re wherever you’re from,
0:09:50 wherever you live, your community’s knowledge, your language,
0:09:53 your history, your stories, your best practices of doing the things
0:09:57 that you do to thrive and survive, you know, that’s just the most important data.
0:10:02 And so being able to use that positively, you know,
0:10:04 I think it’s an area that I’m hoping gets more and more exposure,
0:10:07 because it’s really, I mean, it’s vital, it’s important.
0:10:09 And it raises the quality for everyone, right?
0:10:11 Because the large language model becomes better.
0:10:14 It’s not boring anymore, it doesn’t sound so corporate.
0:10:15 Yes. Yes.
0:10:18 I think I remember one LLM researcher saying his wife
0:10:20 calls ChatGPT “Chatty Boy.”
0:10:23 Oh, I’m sorry I took that one.
0:10:25 But I think those are the things that we need to change, right?
0:10:28 And I think that’s what this community is thinking to change, so.
0:10:32 I’m speaking today with Tara Chklovski and Anshita Saini.
0:10:36 We were just speaking with Tara about her organization Technovation
0:10:39 and the origin story from when we first had Tara on.
0:10:42 I didn’t say this earlier, I should have, but it’s episode 85.
0:10:46 If you want to go back in the archives and listen to Tara’s first appearance
0:10:49 on the show, Teaching Families to Embrace AI.
0:10:53 And we’ve been talking now about Technovation and the origin story
0:10:58 and its mission to, you know, really unlock the innovators that we need today
0:11:03 and will certainly need going forward, and girls, females, women in particular.
0:11:08 And so now, Anshita, I would love to hear your story of how you got involved with Technovation.
0:11:12 You don’t have to be the sole spokesperson for the Technovation experience,
0:11:16 but I’m really delighted to have you here with us today and would love to hear your story.
0:11:20 Yeah, for sure. So I started getting involved in Technovation through,
0:11:23 I think just serendipity, really lucky to have found the program.
0:11:27 But I had just taken my first computer science class in high school.
0:11:30 That was just AP computer science.
0:11:34 And we had done a lot of fun projects in that class that sort of showed me what coding meant,
0:11:40 what it means to create a program using, you know, just your laptop and your fingers.
0:11:44 And so from there, I was thinking, all right, this is really cool,
0:11:46 but what can I actually do with this?
0:11:48 Right? We made programs like Boggle and stuff like that.
0:11:53 But I wanted to see if I could do something actually useful and in the real world with it.
0:11:56 And so I was looking around for different opportunities to that end.
0:12:01 And that’s how I came across Technovation and that whole mission of empowering girls
0:12:05 to solve problems in their community was something that really resonated with me.
0:12:09 And so I actually ended up starting a club in my high school Technovation Club.
0:12:13 And we were just bringing together a bunch of girls from across our high school
0:12:17 that were interested in this program and interested in finding a problem to solve.
0:12:21 And so one of the first problems that I tackled through Technovation
0:12:25 actually ended up being really personal to me and really personal to the situation
0:12:28 that was going on in my high school at the time, which was vaping addiction.
0:12:33 And so this was happening at my high school, I think maybe 2018, 2019.
0:12:37 And in fact, I think the situation became so bad
0:12:40 that we actually had the doors taken off of the bathrooms at my high school.
0:12:44 And so when I sat down at Technovation to think about problems
0:12:47 going on in my community with a group of friends, we said, look,
0:12:48 this is going on right around us.
0:12:53 What can we do with computer science with technology to help this situation?
0:12:59 And so we ended up going through a lot of research from different Johns Hopkins studies.
0:13:02 We had mentors actually from Microsoft that helped us
0:13:04 through the process of building this app.
0:13:09 And I think at the end of it, it’s just really empowering to have that concept of,
0:13:13 hey, like Tara said, I can look at these problems and I can solve them.
0:13:17 I don’t have to wait for somebody else to jump in and solve this problem.
0:13:19 And so that’s how I got involved with Technovation.
0:13:22 Amazing. I have to know, though, what was the app?
0:13:24 What was the solution that you came up with?
0:13:26 Yeah, so we, again, looking at those different research studies,
0:13:29 we came up with sort of a couple of different approaches.
0:13:31 So I shouldn’t say solution.
0:13:33 What’s the approach? That’s the question. Yeah.
0:13:36 Yeah, there was a couple of different features that we had in our app.
0:13:37 One thing was pretty simple.
0:13:41 It was a craving control timer, which actually turned out to be remarkably effective.
0:13:45 So setting a 10 minute timer when you have the urge to pick up a vape
0:13:49 or pick up some addictive thing is very helpful to curb that craving.
0:13:53 We would also take a user’s profile of interesting hobbies,
0:13:57 other things that they like to do and suggest those at the time of the craving
0:14:01 control timer. Right. So you try to redirect the craving into something positive.
0:14:03 Yeah, so simple, but it’s so hard for humans.
0:14:07 Yeah, yeah, I think I still need that, right, with my phone.
0:14:12 So yeah. And then another aspect that we had was trying to create
0:14:15 a personalized quitting timeline that you could visualize along a graph.
0:14:19 We had another chat system so that you could chat with your support system
0:14:20 that was, you know, helping you through this process.
0:14:23 I think addiction is not just something that you can solve in an app.
0:14:27 It’s something where you need people, you know, a support system in community.
0:14:29 So. So a chat system hooked up to humans.
0:14:31 Yes, exactly. Yeah, amazing.
0:14:34 Yeah, your friends, they would be on the app as well, you know. Got it.
0:14:36 OK, right, right, right.
0:14:38 That sounds amazing and not to diminish it all.
0:14:40 But you’re still in high school at that point.
0:14:42 Now I want to make sure we have time to keep going forward.
0:14:46 So that was your first Technovation experience, building that app. Incredible.
0:14:50 And so you’re in high school, you’re 14, 15, something like that.
0:14:52 Maybe how does the story progress?
0:14:56 Right. So I think, again, after Technovation, that was sort
0:14:59 of the aha moment in computer science for me, where it wasn’t some gimmick.
0:15:02 It’s something that people are using every day to solve these problems.
0:15:05 And so I wanted to go further down that path.
0:15:08 And especially, I was interested at that time in these health things;
0:15:11 after building this vaping app, I was wondering what other problems
0:15:14 I could go after in this, you know, mobile health care space.
0:15:17 Well, I ended up going out for an opportunity that,
0:15:21 well, I never thought I would land, but it was this research internship
0:15:24 at the University of Washington in Seattle. Yeah.
0:15:27 Ah, go Huskies, my wife’s an alum.
0:15:29 Oh, really? OK. Yeah, go Dawgs.
0:15:32 And so, yeah, I applied to this opportunity at the Ubiquitous Computing
0:15:36 Lab, as it was called, at the University of Washington, and they had a lot of
0:15:39 different projects around solving what seemed to me at the time really big,
0:15:43 complex problems with the help of computer science or just a mobile phone
0:15:44 or something like that.
0:15:47 And so the project that I ended up working on that summer at that lab
0:15:54 after Technovation was essentially being able to detect, count and classify
0:15:57 different exercises with just a phone’s microphone.
0:15:59 And so that sounds kind of hard to wrap your head around.
0:16:04 So people doing physical exercise, but you’re using the mic, not the camera,
0:16:08 just the mic to detect what they’re doing and then do something with that information.
0:16:10 Exactly, right. That sounds super cool.
0:16:14 Yeah, it was I think it was really cool from a technological perspective,
0:16:16 but also for who we were solving the problem for.
0:16:19 Before you get into that, can I just ask, because again, I got it,
0:16:21 it’s the NVIDIA podcast I got asked.
0:16:25 So were you like bouncing sound waves off from the microphone or?
0:16:30 Yeah, exactly. The whole detection and everything just came from like the Doppler effect.
0:16:31 Yeah, yeah, yeah. Yeah, that’s so cool.
0:16:35 When you’re doing exercises, the waves just reflect back to the phone in a different way.
0:16:40 And so if we just play a really high frequency sound, we can’t even hear it, right?
0:16:43 But the phone is able to pick up the reflection of the sound back to the phone.
0:16:48 And so from there, you can actually see really interesting patterns in the sound waves to see,
0:16:52 like, for example, doing like arm raises is different than doing bicep curls.
0:16:52 And those show up.
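To make the sensing idea concrete, here is a minimal offline sketch, assuming a roughly 19 kHz tone has already been played and recorded on the phone; the carrier frequency, band width, and feature choice are illustrative assumptions, not the UW lab’s actual code.

```python
# A hedged sketch of Doppler-based motion sensing from a phone recording.
# Assumes the phone emitted an inaudible ~19 kHz tone while recording.
import numpy as np
from scipy.signal import stft

FS = 44100       # sample rate in Hz (assumed)
CARRIER = 19000  # frequency of the emitted tone in Hz (assumed)

def doppler_profile(recording: np.ndarray) -> np.ndarray:
    """Per time frame, compare sideband energy near the carrier to the
    carrier itself; motion toward/away from the phone shifts energy there."""
    f, t, Z = stft(recording, fs=FS, nperseg=2048)
    mag = np.abs(Z)
    band = (f > CARRIER - 400) & (f < CARRIER + 400)  # +/- 400 Hz window
    carrier_bin = int(np.argmin(np.abs(f - CARRIER)))
    sidebands = mag[band].sum(axis=0) - mag[carrier_bin]
    # High ratio = strong Doppler-shifted reflections, i.e., movement.
    return sidebands / (mag[carrier_bin] + 1e-9)
```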
0:16:58 Yeah. And so we were solving this problem for some populations
0:17:03 that aren’t typically addressed by normal trackers like the Fitbit or something like that.
0:17:08 And so, you know, obese, disabled populations are often not able to use a Fitbit
0:17:13 for exercise tracking because they’re just prescribed different exercises by their doctors.
0:17:17 For example, like I mentioned, the bicep curls with cans or something like that,
0:17:20 or elderly populations as well.
0:17:23 And so we wanted to create this tracker to solve this problem for people
0:17:26 that current solutions just weren’t accounting for.
0:17:29 And yeah, we ended up building this really cool app.
0:17:33 We did a research study where a bunch of us, including myself, went in
0:17:35 and we were doing a bunch of these exercises to collect data.
0:17:37 And that was my first introduction to AI.
0:17:41 We built a classification system that could classify, I think,
0:17:45 around 14 or 15 different exercises at pretty high accuracy, 80 to 85 percent.
0:17:49 And we were also able to count the number of those exercises.
0:17:52 So you get this really holistic tracker where you don’t have to select
0:17:55 the exercise that you’re doing; it recognizes it automatically. Yeah. Yeah.
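As a hedged sketch of that kind of classifier: given per-window Doppler feature vectors like the profile above, a standard multi-class model can separate roughly 15 exercise labels. The random arrays below are placeholders standing in for the real labeled recordings the study collected.

```python
# Train/evaluate a multi-class exercise classifier on feature vectors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 64))     # placeholder Doppler features per window
y = rng.integers(0, 15, size=600)  # placeholder labels for ~15 exercises

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```

Counting repetitions could then be as simple as counting peaks in the Doppler profile within each classified window.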
0:17:58 So this was your first exposure to AI per se.
0:18:00 That’s such an umbrella term these days, but that’s fine.
0:18:03 Were you intimidated?
0:18:08 Yeah, I really did not know what AI meant or that you could just implement it with,
0:18:11 you know, maybe a hundred, a little over a hundred lines of code, right?
0:18:13 Like that is a crazy concept.
0:18:14 Yeah, yeah, yeah.
0:18:18 A system that actually learns, built just, you know, by yourself.
0:18:21 And then what was the process of actually, I mean, you said, you know,
0:18:22 one of the things, right?
0:18:24 You’re like, it’s so few lines of code and it’s doing all this stuff.
0:18:29 Once you got going, once it was sort of, you know, somebody in the program,
0:18:32 the program itself, whatever it was, kind of showed you how it worked, right?
0:18:33 Took you behind the curtain.
0:18:36 What then? Was it steep learning curve?
0:18:37 Was it sort of natural? Was it?
0:18:41 I’m just kind of curious of the perspective, you know, because when I was in high school,
0:18:43 I took AP Computer Science in high school.
0:18:47 And I think I wrote code on like with a pencil on like one of those exam books
0:18:49 for like my AP final, right?
0:18:52 So, a different world, I’m just imagining it.
0:18:54 No, different world for sure.
0:18:56 I think there was a little bit of a steep learning curve for me.
0:19:00 I think that that idea of something, a computer learning was just something
0:19:02 that was so foreign to me, right?
0:19:03 Yeah, no, it still is.
0:19:05 Yeah, yeah, exactly.
0:19:07 It still is, to you and to me.
0:19:10 I think the idea of, you know, how much data we needed to train
0:19:13 a reasonably confident model was really surprising to me.
0:19:16 You know, we had so many people come in and do so many different repetitions
0:19:20 in these exercises and needed all that for the system to be able to learn.
0:19:23 And so I think that was really an interesting moment for me because I realized,
0:19:25 you know, OK, yeah, I guess that makes sense.
0:19:29 If you have so many different examples of what’s wrong and what’s right,
0:19:33 maybe it makes sense that a computer can learn to identify those patterns
0:19:35 and predict them in the future.
0:19:37 And so those are sort of some of the initial learnings I had
0:19:39 that sort of helped me wrap my head around this concept.
0:19:42 But I think now, of course, the jump from a classification system
0:19:44 to what we’re doing today is crazy.
0:19:49 Yeah. So, not to do that time leap here, but for the sake of the episode.
0:19:52 So this project, the project with the exercise tracker,
0:19:55 you were still in high school, or undergrad?
0:19:56 Yes, I was in high school.
0:19:59 That was right after my first Technovation experience. Right.
0:20:00 Right after Technovation. OK.
0:20:03 And so then you studied computer science in college.
0:20:05 Yeah, yeah. I think after all these experiences,
0:20:07 it just really felt like the natural path, for sure.
0:20:11 Yep. And did you have a focus on machine learning or I mean, what in 60 seconds?
0:20:13 What’s computer science in college like these days?
0:20:17 There’s still a lot of the core education around data structures,
0:20:19 architecture, so I had a lot of exposure to different things.
0:20:24 My honors thesis ended up being around AI, and actually an OpenAI model:
0:20:27 CLIP was what I was working with for image retrieval.
0:20:30 And so that was sort of my focus towards the end of college.
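For readers curious what CLIP-based image retrieval looks like in practice, here is a small sketch using the openly released CLIP weights; the checkpoint and function shape are illustrative, not the thesis code itself.

```python
# Rank a set of images against a text query with CLIP embeddings.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def rank_images(query, image_paths):
    images = [Image.open(p) for p in image_paths]
    inputs = processor(text=[query], images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # Cosine similarity between the query and each image embedding.
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    sims = (img @ txt.T).squeeze(1)
    order = sims.argsort(descending=True)
    return [(image_paths[i], float(sims[i])) for i in order]
```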
0:20:31 Yeah. Gotcha. OK.
0:20:33 And how long ago did you graduate?
0:20:36 Just last May, so it’s been like a little less than a year.
0:20:37 Yeah. Wow. All right.
0:20:40 So in the workforce, where are you up to now?
0:20:44 Technovation, all these experiences have led to you’re in the thick of it.
0:20:45 You’re working at open AI.
0:20:47 I am working at OpenAI.
0:20:48 Yes, I’m an engineer on
0:20:50 ChatGPT growth at OpenAI.
0:20:52 And so it’s a really cool job.
0:20:55 It’s all about growing our free and paid user base.
0:20:57 And that ends up being really broad in practice,
0:21:00 but it is a really, really cool job to be at.
0:21:02 I don’t know. It’s so new.
0:21:05 But in a way, I think that makes your perspective really interesting.
0:21:08 How similar, how different is it now, you know,
0:21:11 doing the stuff, computer science and research and engineering
0:21:15 at a job versus, you know, all of these educational experiences
0:21:16 kind of kind of leading up to it.
0:21:18 Is it is it vastly different?
0:21:22 Does it feel like the next logical step sort of on the on the journey of,
0:21:25 you know, kind of familiar, lots new, but learning and growing?
0:21:27 What’s it what’s it like being out in the workforce?
0:21:30 I think it’s sort of required a mindset shift for me.
0:21:33 I think when it comes to college and internships,
0:21:36 there is sort of this concept of some end goal, right?
0:21:38 So in college, it’s like, let me get through these classes
0:21:40 and I can graduate. I want to learn a lot.
0:21:41 I want to meet my friends.
0:21:43 I want to meet mentors and things like that.
0:21:46 But I think when it comes to working in the industry for me,
0:21:49 the mindset shift was now I’m thinking about what is the impact
0:21:50 that I can have on the world, right?
0:21:52 And how can I maximize that?
0:21:55 And before that, that was sort of in the back of my head,
0:21:56 but not really at the front.
0:21:59 And so now it’s sort of what I’m thinking about all day, every day, right?
0:22:01 I don’t have homework and assignments
0:22:05 and all these things going on to distract me from that goal.
0:22:07 And so I think that shift was really interesting.
0:22:12 And that was especially why I was interested in a team like ChatGPT
0:22:13 growth at OpenAI.
0:22:17 I think one of the things that is really important to me is making sure
0:22:21 that it’s not just people in Silicon Valley using ChatGPT to help their daily lives.
0:22:25 It’s people in, you know, corners of the world where maybe they don’t have
0:22:29 a super powerful mobile phone that can even support an internet connection
0:22:33 for ChatGPT, but maybe they can message ChatGPT through WhatsApp
0:22:37 and get help and information
0:22:38 and whatever it is that can help them in that way.
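As a hedged illustration of that kind of reach, and not OpenAI’s actual integration: relaying WhatsApp messages to a chat model can be as small as one webhook, sketched here with Twilio’s WhatsApp API and the OpenAI SDK; the model name and route are examples.

```python
# Minimal WhatsApp-to-chat-model relay (illustrative plumbing only).
from flask import Flask, request
from openai import OpenAI
from twilio.twiml.messaging_response import MessagingResponse

app = Flask(__name__)
client = OpenAI()  # reads OPENAI_API_KEY from the environment

@app.route("/whatsapp", methods=["POST"])
def whatsapp_reply():
    user_text = request.form.get("Body", "")  # the incoming WhatsApp message
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": user_text}],
    )
    twiml = MessagingResponse()
    twiml.message(completion.choices[0].message.content)
    return str(twiml)  # Twilio delivers this reply back over WhatsApp
```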
0:22:42 And so I think that sort of shift in mindset from college,
0:22:45 from learning to working, has really helped me shape my goals
0:22:48 and just feel passionate about our mission
0:22:50 and what we’re doing with a product like ChatGPT.
0:22:51 That’s amazing.
0:22:54 I mean, it’s not my place to say if this makes sense,
0:22:56 but that’s an asset to the world to have that mindset.
0:22:58 So that’s fantastic.
0:23:03 What advice would you give to girls in high school and college
0:23:07 who, you know, are just getting into the field or interested in computer science
0:23:10 or maybe, you know, the teenager who’s thinking about it
0:23:13 and getting their first exposure to to working with some of the tools?
0:23:15 Oh, this is kind of cool. Where can I go with this?
0:23:18 Any advice? Any maybe learnings from your journey?
0:23:21 Yeah, I think for my journey, I think one of the things I can say
0:23:24 is building confidence early and not being scared of failure.
0:23:28 I think I probably never would have discovered AI if I hadn’t had
0:23:31 I guess the audacity to go after that first research internship
0:23:33 and say, hey, maybe this is a room I belong in.
0:23:36 And, you know, that was not something I felt when I was applying.
0:23:40 I was talking about this boggle program that I built in my computer science class.
0:23:44 I was like, how am I about to jump from this to researching in a lab
0:23:45 and building things for real people?
0:23:49 But I think having that courage to go out and say, I can do this.
0:23:52 That this is an opportunity that is for me as much as it is for anyone else
0:23:55 is something that’s really important to help you find your path.
0:23:58 You open doors that you never knew were even there.
0:24:01 I think along with that, what goes into building confidence
0:24:05 is finding mentors and a community that can support you in these things.
0:24:08 And so for me, that came through Technovation, right?
0:24:12 Even now in the Bay Area when I’m back, that community is still there supporting me.
0:24:17 And, you know, just meeting amazing other alumni in the Bay Area was such a surprise for me.
0:24:21 I didn’t know that it would last, that the community would last this long for me.
0:24:23 Communities of women engineers in college, right?
0:24:26 Meeting other people that are doing similar things to you
0:24:28 is the most inspirational thing that you can find, right?
0:24:30 It’s right. If they can do it, I can do it too.
0:24:35 Yeah. And mentors, of course, as well, so many helpful mentors along my journey
0:24:38 have helped me figure out what’s right for me and, you know, wrap my head around
0:24:40 moving from college, going into the industry.
0:24:43 And I think that’s really important, to have some people that have been
0:24:46 through the same things that you’re going through, and getting advice
0:24:50 or even just talking through what you’re going through.
0:24:52 Yeah, absolutely. What is Wiser AI?
0:24:57 Yeah, so Wiser AI is an organization, loosely, that I’ve recently been working on.
0:24:59 It’s more of an initiative.
0:25:02 Just I think what I noticed when I first started working in AI,
0:25:07 even in research in college, was that we have done so much work to sort of uplift
0:25:08 women in technology.
0:25:12 And there was a big focus, I think, on on that gap when I was in high school
0:25:13 and even middle school.
0:25:15 And I was really seeing and feeling that around me.
0:25:20 But as I got into AI and research, I was noticing these are all really,
0:25:21 really male dominated rooms.
0:25:25 And for some reason, I just wasn’t finding as many people talking about
0:25:28 women in AI and underrepresented folks in AI.
0:25:33 And when I first landed the opportunity at Open AI, there was a lot of chatter
0:25:36 about, you know, how quickly AI is moving, you know, the exponential curve
0:25:40 at which these models are performing on these really, really powerful benchmarks.
0:25:44 And that really got me thinking, if we’re building this really powerful
0:25:48 technology that is meant to benefit everyone, how are we going to build that
0:25:52 if we don’t have everyone’s voices involved in developing the system?
0:25:57 And so Wiser AI is an initiative targeted towards that and in bringing
0:26:00 more underrepresented voices in AI, specifically women.
0:26:03 One of the things that I’m really interested in doing with this initiative
0:26:05 is supporting underrepresented founders in AI.
0:26:09 I think that’s a really interesting space where the progress in terms
0:26:13 of like female founders and, you know, the percentage of female founders
0:26:18 supported by investment, or even just the percentage of all-women-founded
0:26:20 companies is really, really surprisingly low.
0:26:23 I think that was an interest of mine at one point, right, becoming a founder.
0:26:26 And it’s really surprising to see that that’s still the case.
0:26:32 Yeah, it’s so unreflective of the actual, you know, population
0:26:34 of people walking around the planet. Yeah.
0:26:35 And so you started Wiser.
0:26:37 Yes. Yeah. Yeah.
0:26:38 Bringing together a team right now.
0:26:41 I think one of the really big things that we want to do this year and sort
0:26:45 of the big target for the organization this year is to actually hold a
0:26:49 conference, bringing together underrepresented founders in AI, seeking
0:26:52 people that can talk about what it means to build responsible AI and how
0:26:54 we can incorporate that into our systems.
0:26:58 And yeah, just just helping bring together a community that can support each other.
0:27:03 Amazing. So, Tara, I want to sort of welcome you back for this kind
0:27:06 of last sort of wrap up thing and ask both of you.
0:27:09 And I feel like this is one of those questions that like shouldn’t need
0:27:13 to be asked out loud and answered because it should be obvious, but it’s not.
0:27:16 Why is the work you’re doing so important?
0:27:21 And why is it so important to work for equitable and inclusive AI right now?
0:27:25 You’ve talked about it throughout the episode, but to kind of put a point
0:27:28 on it and really bring it to bear, why is this so important right now?
0:27:33 As I was saying, like the large language model has been built on the majority,
0:27:37 right, like the majority language on the Internet, which may not
0:27:39 necessarily reflect the reality.
0:27:44 And so you actually get boring results, because it’s the generic,
0:27:46 very lowest baseline, right?
0:27:52 And so what’s the fastest way to make more innovative products, right?
0:27:53 Is to open it up, right?
0:27:55 And I think there could be many ways of doing it.
0:28:00 I think one obvious way is look, 70 percent of the developers
0:28:04 are of a particular background and from a particular cognitive perspective,
0:28:08 right? And so when you have different people around the room, I mean,
0:28:12 I just came from India last night, and it’s fascinating, right?
0:28:15 Like just going to a different place and understanding how people
0:28:19 in different cultures think it just increases your sort of scope, right?
0:28:21 So it’s just basic stuff.
0:28:26 So I’m not big on business axioms per se, but isn’t there just a basic guideline
0:28:29 of sort of product development that, you know,
0:28:32 the more the people building the thing reflect the people who are going
0:28:36 to be using it, the better off the thing you’re
0:28:37 making is likely to be?
0:28:41 Absolutely. And the market is beyond the U.S., right?
0:28:44 No, I mean, this is, this is everything you’re building for the world.
0:28:48 So to me, it’s just a very basic common sense argument that you want
0:28:51 a diverse group of people innovating.
0:28:57 And I will say that the challenge is the big one because women represent
0:29:00 I think the largest group of people that have been historically
0:29:03 discriminated against as a whole, right?
0:29:07 So I think 151 countries legally discriminate against women.
0:29:10 And so social norms are very deeply rooted, right?
0:29:13 And I think men and women are extremely different.
0:29:17 So I think that the conversation can be very nuanced and subtle.
0:29:22 But from an economic point of view, I think you are missing out if 50%
0:29:27 of the population isn’t represented in your software development teams.
0:29:30 You know, it’s not enough to say we have equal representation
0:29:32 in sales or HR or whatever.
0:29:34 I think it has to be in the software development team.
0:29:39 So I think everybody will benefit when the products are less boring.
0:29:44 Yep. Anshita, in many ways, this whole conversation, particularly you and I
0:29:47 speaking, you talking about your work in the past several minutes, you know,
0:29:51 address this question, but specific to talking about the importance of working
0:29:54 for Equitable Inclusive AI right now, anything you’d like to add?
0:30:00 Yeah, I think overall we’ve already seen the effects of building biased systems,
0:30:03 in systems like predictive policing or in health care.
0:30:07 We know there have been many studies about how this
0:30:09 disproportionately affects underrepresented populations.
0:30:14 I think at the rate at which AI is developing right now, these systems
0:30:18 are going to be everywhere, not just in health care, but also in finance,
0:30:19 hiring everywhere.
0:30:23 And so I think we’re at a crucial point where if we don’t
0:30:26 start doing that work towards making these systems inclusive, when they’re
0:30:28 everywhere, it’s hard to take that back.
0:30:30 I think it needs to happen now.
0:30:35 For listeners who want to learn more, want to get involved,
0:30:39 maybe want to see if they can join Technovation, any and all the above,
0:30:44 all the great work you’re both doing, where can folks go online to learn more?
0:30:47 Again, I’ll plug Episode 85, Tara, your first appearance on the pod.
0:30:51 But for more information about Technovation, Tara, where can people look?
0:30:54 Yeah, I would say the Technovation season is ongoing right now, and we’re
0:30:59 actively looking for mentors. And you don’t need to have a technical
0:31:02 background to be a mentor, because what you need to bring to the table is
0:31:07 just this courage to say, I don’t know, let’s go find out together.
0:31:08 Because guess what?
0:31:11 You probably don’t know because these things are changing so fast, but it’s
0:31:15 an incredible experience where you’re project managing and cheerleading
0:31:18 a team of girls to actually build an AI prototype within three months.
0:31:23 So it’s an amazing way to actually go through an accelerator yourself.
0:31:27 We’ve had stories where mentors have applied to Y Combinator and gotten in
0:31:30 after they went through Technovation because Technovation gave the mentors
0:31:32 the courage to start their own business.
0:31:36 So I’d say 95 percent of the time when girls match with mentors,
0:31:37 they finish the full program.
0:31:41 So mentors are one of the most important pillars of our program.
0:31:43 So please sign up at Technovation.org.
0:31:45 Technovation.org?
0:31:45 Yeah.
0:31:46 OK, great.
0:31:48 And Anshita, what about Wiser?
0:31:50 Can people find out more about Wiser online?
0:31:54 Yes, and I’m actively looking for women to work with to organize this
0:31:56 conference that we have a goal of having this year.
0:31:57 Do you have a location in mind?
0:32:00 It might have to be around the Bay Area just because of production,
0:32:04 like maybe in the Bay Area, but we’re still working on a location.
0:32:06 Honestly, so open to any suggestions.
0:32:10 But we would love to have more talented, amazing, passionate women to work with.
0:32:16 They could reach out directly or wiser.ai.org is the website.
0:32:17 Excellent.
0:32:19 Well, again, thank you both so much, Tara and Anshita,
0:32:23 for taking the time to come on the podcast and share the work you’re doing.
0:32:28 It’s just really great, inspiring work to learn about and for people to get involved with.
0:32:32 And, you know, anybody out there who’s considering maybe being a mentor,
0:32:34 if you’ve never had that experience, I haven’t worked with Technovation,
0:32:38 but I’ve done other related things and it’s the best.
0:32:41 And not only that, but you get to go through an accelerator yourself.
0:32:42 It’s kind of hard to beat, right?
0:32:43 Let’s do it again.
0:32:44 Let’s catch up down the line.
0:32:48 But it was an absolute pleasure to meet you, Anshita, and all the best to both of you
0:32:49 and the work you’re doing.
0:32:50 Thank you, Noah.
0:32:51 This was a pleasure.
0:32:52 Yeah, thank you so much, Noah.
0:32:53 And thank you, Tara.
0:32:56 It’s just so inspiring to hear about the work that you’re continuing to do
0:32:58 and knowing how much of an impact it’s already had on me.
0:32:59 So thank you.
0:33:00 That made my day, right?
0:33:03 Like, listening to an alumna and saying, OK, all the hard work
0:33:05 and your vision of improving the world.
0:33:07 Like, that’s what we work every day for, right?
0:33:11 So I wouldn’t have the energy to do what we do, and the battles to fight,
0:33:13 if it weren’t for the alumnae, so it’s mutual.
0:33:16 [MUSIC PLAYING]
In this episode of the NVIDIA AI Podcast, Tara Chklovski, founder and CEO of Technovation, returns to discuss the importance of inclusive AI. With Anshita Saini, a Technovation alumna and OpenAI staff member, Chklovski explores how Technovation empowers girls through AI education and enhances real-world problem-solving skills. Saini shares her journey from creating an app that helped combat a vaping crisis at her high school to taking on her current role at OpenAI. She also introduces Wiser AI, an initiative she founded to support women and underrepresented voices in AI.
-
AI for Everyone: How Gooey.AI Empowers Global Frontline Workers with Low Code Workflows – Episode 244
AI transcript
0:00:10 [MUSIC]
0:00:13 >> Hello, and welcome to the NVIDIA AI podcast.
0:00:15 I’m your host, Noah Kravitz.
0:00:19 Our guests today were recently featured on the NVIDIA blog for their work in
0:00:22 creating Ulangizi, an AI chatbot that delivers
0:00:25 multilingual support to African farmers via WhatsApp.
0:00:28 As vital a project as it is, however,
0:00:31 Gooey.AI is much more than a single chatbot.
0:00:33 Gooey.AI is a platform for developing
0:00:37 low-code workflows built on private and open source AI models.
0:00:41 Combining ease of use with innovative features like golden Q&As,
0:00:44 Gooey enables developers to code fast and change the world.
0:00:48 Here to tell us the Gooey story are the company’s founder and CEO,
0:00:52 Sean Blagsvedt, and founder and chief creative officer, Archana Prasad.
0:00:56 Welcome to you both and thanks so much for joining the NVIDIA AI podcast.
0:00:57 >> Hello.
0:00:59 >> Hi, thanks, Noah.
0:01:02 >> So there’s a lot that I’m looking forward to you getting
0:01:05 into about the Gooey platform, how it started,
0:01:06 all the things it can do,
0:01:11 including how you’re helping developers combat AI hallucinations,
0:01:12 which is a big topic these days.
0:01:15 But I’d love it if you can start at the beginning and
0:01:18 tell us what Gooey.AI is and how you got started.
0:01:20 >> Let me take a shot at that.
0:01:24 We could start it from actually a digital arts project funded by
0:01:26 the British Council many moons ago,
0:01:29 I’d like to say 2018, 2019,
0:01:33 where we applied to create an AI persona that would matchmake
0:01:36 creators, activists, and
0:01:39 designers across the borders of the UK and India.
0:01:43 We won that award, we built out a prototype,
0:01:45 we tested it, it worked beautifully,
0:01:47 and long story short,
0:01:52 we managed to get seed funding from Techstars along the way.
0:01:55 >> We got into Techstars, which was an excellent program.
0:01:58 We took this idea of an AI persona and built
0:02:02 an entire communications app around it called Dara.network,
0:02:04 meant to service cultural organizations and
0:02:06 social impact organizations,
0:02:10 enable them to manage their alumni and keep in touch with each other easily.
0:02:14 The first AI persona we built, also confusingly called Dara, was
0:02:18 feeling lonely and wanted some friends, and we thought,
0:02:22 wouldn’t it be great to invite non-tech folks,
0:02:24 writers, playwrights, all those poets,
0:02:26 could we have them come in and craft
0:02:29 their own AI personas from scratch?
0:02:31 >> This is right in the middle of COVID.
0:02:33 So all those folks are out of work, right?
0:02:35 >> They’re isolated.
0:02:36 >> Isolated, yep.
0:02:42 >> Yeah. So we invited 23 folks from across the UK,
0:02:45 the US, India, Sri Lanka even,
0:02:48 and met constantly literally every week,
0:02:50 and ended up developing pretty much
0:02:54 an underlying architecture that enabled them to build out
0:02:55 what one might call
0:02:57 Turing-test-passing video bots, right?
0:03:04 So our co-founder Dev was hanging out in Discord forums with,
0:03:06 what’s his name, Brockman,
0:03:08 with the president of OpenAI until I think it was-
0:03:09 >> Brockman, yeah, yeah.
0:03:11 >> Yeah, it’s like five years ago,
0:03:15 and then so we had really early access to the GPT APIs,
0:03:17 and we had already built Dara
0:03:20 as an asynchronous video messaging platform,
0:03:22 kind of like Discord plus LinkedIn
0:03:25 with a little bit of Mark Polo in there.
0:03:27 And so the thought was, well,
0:03:29 what if we took, it was kind of a wild idea,
0:03:32 like what if we took the video messages that people sent,
0:03:35 piped that through Google’s speech recognition,
0:03:39 fed that into the long-form script, right,
0:03:42 that these playwrights and authors were putting together.
0:03:43 and then we had this thought of like,
0:03:47 what is it if it’s $1.50 to $2 per API call?
0:03:48 What could we get, right?
0:03:50 And then so we basically had this script
0:03:53 that they were crafting and they were writing together.
0:03:57 And then the deepfake APIs
0:04:00 were just beginning to come out, and there was text-to-speech.
0:04:01 So we’re like, well,
0:04:03 we can take what the bot says back, right,
0:04:07 that DaVinci output and then take that little text,
0:04:09 put it to a text-to-speech engine,
0:04:11 put it into a lip sync piece,
0:04:15 and then, boom, you’ve got these Turing test passing characters.
0:04:16 We called them the Radbots.
0:04:17 And those were the Radbots.
0:04:18 They were awesome.
0:04:20 They still are kind of awesome.
0:04:22 Yeah, they were crazy.
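A simplified reconstruction of that Radbots chain, with today’s OpenAI SDK standing in for the original Google speech recognition and DaVinci completion calls; the lip-sync step is a stub, since those deepfake APIs varied by vendor.

```python
# Video message -> speech-to-text -> scripted LLM reply -> TTS -> lip sync.
from openai import OpenAI

client = OpenAI()

def lip_sync(audio_path: str) -> str:
    """Placeholder for a vendor deepfake/lip-sync API call."""
    return "radbot_reply.mp4"  # assumed: the vendor returns a rendered video

def radbot_reply(video_message_path: str, character_script: str) -> str:
    # 1. Transcribe the incoming video message.
    with open(video_message_path, "rb") as f:
        user_text = client.audio.transcriptions.create(
            model="whisper-1", file=f).text
    # 2. Feed the transcript into the writers' long-form character script.
    chat = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in for the original davinci completion
        messages=[{"role": "system", "content": character_script},
                  {"role": "user", "content": user_text}])
    reply = chat.choices[0].message.content
    # 3. Speak the reply, then lip-sync the audio onto the character's face.
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
    speech.write_to_file("reply.mp3")
    return lip_sync("reply.mp3")
```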
0:04:23 And we had a little thing,
0:04:25 usually this is a public family podcast,
0:04:28 but we had the Radbots say.
0:04:32 That was the Wild West of LLMs back in 2021, yeah.
0:04:35 Yeah, and these bots really spoke their mind.
0:04:38 And they just spoke their mind.
0:04:42 They represented issues that the writers brought in,
0:04:44 that they felt were not represented well enough,
0:04:48 spoke on behalf of communities that were underrepresented,
0:04:51 kind of in a bid to reduce the algorithmic bias
0:04:54 that the group was feeling quite intensely.
0:04:56 You know, so the bots are done,
0:04:58 they’re happily chatting away, people are excited.
0:05:01 And like little kids have talked to them a thousand times,
0:05:03 like one kid, 1200 times, right?
0:05:06 And these were presented at, you know, the…
0:05:09 India Art Fair, the, you know,
0:05:12 Be Fantastic, the British Council-hosted event,
0:05:14 the India Literature Festival.
0:05:19 They got around quite a bit.
0:05:21 The point was that we had built out
0:05:23 this underlying architecture and orchestration platform
0:05:27 pretty much that enabled us to kind of plug and play
0:05:29 all the incoming new technologies
0:05:31 that were emerging in AI.
0:05:35 And that was a moment, I think, in this very room,
0:05:37 actually, when Sean and I were like,
0:05:40 “Okay, we have the messaging platform,
0:05:41 “but what can we do more?”
0:05:44 And that was pretty much the start of Gooey.
0:05:47 We felt, hey, if we can take artists and writers
0:05:49 with us on a journey, why not open it up
0:05:52 into a wider world and kind of really take this,
0:05:55 you know, this mission we had begun to feel quite deeply
0:05:57 by then of like democratizing AI
0:06:00 and really allowing people to play with it,
0:06:03 even if they didn’t know how to code, I’m not a coder.
0:06:04 So, you know, I feel that part.
0:06:06 I want to ask you quickly about your backgrounds.
0:06:08 I was wondering, kind of both technically
0:06:12 and, at the risk of, you know, being corny,
0:06:15 aside from the fact that enabling everyone,
0:06:19 artists, creatives, non-coders to use the technologies,
0:06:21 aside from the fact that that’s just an awesome thing to do,
0:06:23 why was that your focus?
0:06:25 How would you do such a thing?
0:06:28 Well, my background is actually in art and design.
0:06:31 I studied painting and animation film design
0:06:34 back in the day, and quite wonderfully met Sean.
0:06:39 He had landed in India to set up the Microsoft Research lab.
0:06:41 This was in 2005.
0:06:43 So, I was number three and she was number five
0:06:45 in that amazing organization.
0:06:47 We’re going to track down number four, get them on the pod.
0:06:49 Yeah, yeah, yeah.
0:06:52 Yeah, so we, that’s how we met.
0:06:54 And I don’t have a coding background,
0:06:56 but I did get to hang out at Microsoft Research,
0:06:58 get access to some of the brightest minds in the world
0:07:00 and the papers that were coming out of there.
0:07:03 And we were the, I’d say the first
0:07:06 and maybe the only project, what is it?
0:07:07 Advanced prototyping.
0:07:09 My title was Head of Product Management
0:07:11 and Advanced Prototyping.
0:07:13 And so, we did a lot of work.
0:07:17 You know, this is like 2004 to 2010, around,
0:07:19 you know, how do we build interfaces that reach everybody?
0:07:23 How do we build interfaces for folks where literacy is an issue?
0:07:24 You know, I have the patent on, you know,
0:07:28 machine translation and instant messaging from 2005.
0:07:31 This is a space that we have been in for a long time,
0:07:33 running big models on NVIDIA hardware,
0:07:36 trying to understand what everybody has to say
0:07:37 through channel interfaces.
0:07:41 And frankly, India went ahead and skipped email, right?
0:07:43 And everybody kind of went directly
0:07:46 into SMS-based interfaces into WhatsApp in the years ahead.
0:07:50 So this is a space that I would say we’ve been experimenting
0:07:52 in literally for 20 years in terms of like,
0:07:57 how do you build tools and interactive AI-based interfaces
0:07:58 that work for everybody, including people
0:08:01 that don’t read or write in English very well.
0:08:04 And then after that, Archana went off and…
0:08:04 – Set up an organization called Jaaga
0:08:08 and then another called Be Fantastic,
0:08:10 with co-founders Freeman and Kamiya.
0:08:14 And the idea was really, how do we take artists
0:08:17 and creative practice and technology practice?
0:08:18 How can we bring practitioners
0:08:20 from these different fields together
0:08:23 and enable them to have conversations
0:08:26 and service kind of some of the big pressing issues
0:08:27 of the time.
0:08:30 So soon a lot of activists and all that started.
0:08:33 We started Jaaga in 2009, Freeman and I,
0:08:38 and Be Fantastic kicked off as a public arts festival
0:08:39 looking at arts and technology,
0:08:44 particularly towards climate change and UNSDG kind of issues.
0:08:45 – And then meanwhile, I was off running
0:08:47 a company called Babajob,
0:08:49 which was the largest informal sector-focused job site
0:08:50 in India.
0:08:53 So how do we take drivers and cooks and maids
0:08:55 and, basically using the phone and IVR,
0:08:58 SMS, and multilingual interfaces,
0:09:00 hook them up with better jobs.
0:09:03 – Pretty deep phone interface, in a way.
0:09:04 – Yes, she was the voice in it,
0:09:07 and we did 50,000 phone calls a day;
0:09:09 our telephone bills were astronomical.
0:09:11 We had nine million users registered
0:09:13 and a million applications a month.
0:09:15 That was a big initiative that was 11 years of my life
0:09:18 while she was running JAGA in parallel.
0:09:19 And that’s how we got here.
0:09:20 – Yep.
0:09:23 So you’ve got the credentials, you’ve got the chops.
0:09:26 And I think before I interrupted you,
0:09:27 Archana, you were about to say,
0:09:30 and so we wound up with this platform that we called Gooey.
0:09:31 – Yes.
0:09:32 – We did, and I love the name.
0:09:35 It’s a little take on, you know, GUI, graphical user interfaces.
0:09:35 – Yep, yep.
0:09:38 – We talked about it all the time when the GUI was the thing.
0:09:40 – Also the connective tissue, right?
0:09:42 – And the gooey glue, you know.
0:09:46 Yeah, so we kind of pivoted pretty much overnight,
0:09:49 quite literally, we took our team over to the idea.
0:09:51 They really liked it as well.
0:09:52 And pretty soon we had Gooey,
0:09:54 which is pretty much what it is today.
0:09:55 – Well, and then, you know,
0:09:57 it was sort of founded on this premise
0:09:58 that almost started out as a joke,
0:10:01 but feels less and less like a joke every day,
0:10:04 which is that all of us are just gonna be AI prompt writers
0:10:06 and API stitchers, right?
0:10:07 In terms of like the job of the future
0:10:09 that everyone will do seems to be that.
0:10:11 And then, you know, this thought being,
0:10:15 well, if that’s the world, what kind of things do you need?
0:10:17 Like what would be the JSFiddle
0:10:18 equivalent of that?
0:10:21 As in, how, when I make something,
0:10:22 do you get to view source on that
0:10:24 and understand what I’m doing, so that you can learn
0:10:26 by building on top of what I’ve done?
0:10:27 – Right.
0:10:29 Like also from the, you know, publishing.
0:10:30 I mean.
0:10:31 – Yeah.
0:10:33 Then research, both citations and publicly available papers,
0:10:36 which goes back to the beginnings of the Enlightenment, right?
0:10:38 And the open source movements as well.
0:10:41 We wanted to say, what would that mean
0:10:44 for these new higher level abstractions
0:10:46 of, hey, you got a little bit of LLM prompts
0:10:49 and you want to call over to this other API
0:10:51 and you want to connect to some other communication platforms.
0:10:53 How do you abstract that up
0:10:55 to sort of allow a whole new generation
0:10:57 of non-coders to basically play?
0:10:59 – And efficiently too, right?
0:11:02 I mean, the idea being that it’s a one-stop spot, right?
0:11:06 You can try all kinds of different AI technologies
0:11:09 and tools without subscribing individually to each of them.
0:11:09 – To any one of them.
0:11:11 – Yeah, so it’s, yeah.
0:11:12 – And so that’s because
0:11:15 we saw a huge amount of innovation
0:11:17 coming from many sectors, right?
0:11:20 The open source ecosystem was clearly making
0:11:22 new incredible models every day.
0:11:23 – Right.
0:11:25 – And not just around LLMs, but around animation
0:11:28 and image creation and text-to-speech, right?
0:11:30 And we saw, given our work with Radbots,
0:11:33 that when you allow people who are creative
0:11:36 and empowered to put those together in novel ways,
0:11:38 you get this magic.
0:11:38 – That’s where you get the magic.
0:11:40 – And it’s thousands of interactions,
0:11:42 which is definitely bigger than the sum of its parts.
0:11:43 – Yes.
0:11:46 – And so we wanted very specifically to say, great,
0:11:49 when OpenAI or Google or the open source community
0:11:51 comes out with some new feature,
0:11:54 we should constantly allow the ecosystem to get better.
0:11:56 So we, with this whole thought of,
0:11:59 we should abstract on top so that every component
0:12:02 is essentially hotswappable and evaluatable,
0:12:04 which is where your golden questions things come in.
0:12:05 – Yes.
0:12:08 – That you can basically say, hey, OpenAI 4.0
0:12:11 or, you know, some o1 model came out,
0:12:13 is it better, cheaper, faster for me?
0:12:15 And then, you know, given our impact piece
0:12:17 of like going forward, like, is it also,
0:12:20 what’s the carbon usage of each one of them, right?
0:12:23 And how do we make that clear to those people
0:12:26 that are buying and using those APIs,
0:12:29 as well as sort of the fourth important factor
0:12:33 for any chain or workflow that you’re gonna put together?
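In the spirit of what’s described here, a hedged sketch of how golden questions make components evaluatable: hold a fixed set of question/expected-answer pairs and score each hot-swappable model on accuracy, cost, and latency before switching it into a workflow. The keyword-overlap scoring and sample questions are deliberately simple stand-ins for Gooey’s real evaluation.

```python
# Compare candidate models against a fixed "golden" Q&A set.
import time

GOLDEN_QA = [  # placeholder examples, not real evaluation data
    {"q": "When should I plant maize?", "expect": ["rain", "November"]},
    {"q": "How do I treat aphids on beans?", "expect": ["soap", "neem"]},
]

def evaluate(call_model, model_name):
    """call_model(model_name, question) -> (answer_text, cost_usd)."""
    hits, cost, start = 0, 0.0, time.time()
    for qa in GOLDEN_QA:
        answer, c = call_model(model_name, qa["q"])
        cost += c
        # Count as correct if the answer mentions all expected keywords.
        hits += all(k.lower() in answer.lower() for k in qa["expect"])
    return {"model": model_name,
            "accuracy": hits / len(GOLDEN_QA),
            "total_cost_usd": cost,
            "seconds": time.time() - start}

# Usage: run evaluate() once per candidate model and compare the results.
```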
0:12:34 – So many questions.
0:12:36 Maybe I’ll ask you, and this is sort of a bad question
0:12:39 for an audio only format, but we’re gonna,
0:12:39 that’s what we do here.
0:12:40 So we’re gonna do it.
0:12:42 A new user comes to Gooey.
0:12:43 – Yes.
0:12:45 – Someone who perhaps, you know, understands
0:12:48 what an LLM is, how the technology works.
0:12:49 They know what an API is.
0:12:51 Maybe they copy and pasted some code once or twice.
0:12:54 You know, how do they get started?
0:12:56 Is it drag and drop?
0:12:58 Is it writing things into, you know,
0:13:01 sort of a chatbot, text-only interface?
0:13:03 How does the platform actually work for the user?
0:13:05 – So at a high level, you know,
0:13:07 this is something we took from Babajob.
0:13:09 We did pretty good at SEO.
0:13:11 And the reason that we do well at SEO
0:13:13 is we try to lower friction.
0:13:18 So when you come in, if you Google like AI document search
0:13:21 or, you know, agricultural bot or AI animation,
0:13:23 you can come in there and you can see the prompts
0:13:25 and the output directly.
0:13:27 And you can go into examples
0:13:28 and you can see a bunch of other ones
0:13:30 that hopefully get better and better
0:13:33 and are relevant for your field; it’s sort of a UGC model.
0:13:34 And every single one of them,
0:13:35 you can just say, great, I like that.
0:13:36 I’m gonna tweak it, right?
0:13:38 – Right, so you grab a pre-existing one.
0:13:40 – Yep, I’m gonna change what this model is.
0:13:43 I’m gonna, and so we’re always having this kind of,
0:13:45 we’re definitely inspired by like Replicate, right?
0:13:47 Which is definitely, you know, this idea of like,
0:13:49 what are the inputs somebody else used?
0:13:50 What are the outputs?
0:13:52 But to do so in a way that we’re chaining together
0:13:54 many of these different components,
0:13:56 so you can basically see everything.
0:13:57 So that’s kind of it.
0:13:58 – And is it less drag-and-droppy
0:13:59 and more kind of pull-down menus?
0:14:01 – Yes, yeah.
0:14:03 – ‘Cause the idea there is transparency.
0:14:04 – Yeah.
0:14:05 – Like for a lot of other sites,
0:14:08 I would argue they hide the prompt
0:14:10 that’s actually making the magic happen.
0:14:12 Or they hide the model that’s making the magic happen.
0:14:15 And so for us, we have a belief of like,
0:14:19 no, the model that you’re using for today versus tomorrow
0:14:21 and all of the prompts and everything else.
0:14:22 – So no black box here.
0:14:25 It’s all digestible and viewable and inspectable
0:14:26 all the way down.
0:14:29 – We kind of called these recipes for a long time.
0:14:30 – Yeah, sure.
0:14:31 – Before we close it.
0:14:32 – Yes.
0:14:32 – Really like, these are the ingredients.
0:14:35 This is how we made this mix and, you know,
0:14:37 you can make yours starting from there, yeah.
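To make the recipe metaphor concrete, a workflow’s ingredients can be captured in a small, forkable document like the sketch below; the field names are invented for illustration and are not Gooey’s actual schema.

```python
# A "recipe": the full list of ingredients behind one workflow.
recipe = {
    "name": "support-copilot",
    "model": "gpt-4o-mini",              # hot-swappable component
    "prompt": "You are a helpful assistant. Answer briefly.",
    "speech_recognition": "google-asr",  # assumed component label
    "channels": ["whatsapp", "slack"],
    "version": 7,                        # version history makes forks diffable
}

# Forking = copy it, tweak one ingredient, get a new shareable URL/version.
fork = {**recipe, "model": "llama-3-8b-instruct", "version": 1}
```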
0:14:38 – So everything is forkable.
0:14:40 And then again, just like JS Fiddle,
0:14:42 you make one change, that’s a new URL.
0:14:43 You can share it with your friends.
0:14:45 You know, we literally next week
0:14:46 will come out with workspaces.
0:14:48 So you can work on these things collaboratively
0:14:49 with version histories.
0:14:52 So you can say, hey, I have a static endpoint
0:14:55 of like my cool co-pilot, we can work on that together.
0:14:57 And then you can do things like hook that up
0:14:58 directly inside the platform
0:15:00 to things like WhatsApp or Slack.
0:15:01 – Facebook.
0:15:02 – Or Facebook.
0:15:04 And that’s actually, I feel like an underestimated part
0:15:08 of getting these things to work on the communication tools
0:15:12 that people really use; it’s way harder than you’d think.
0:15:14 – Well, so I wanted to–
0:15:16 – There’s a friction there that we try to take away.
0:15:18 – Right, and I wanted to mention, you know,
0:15:20 and I don’t know, this is back in the day
0:15:22 I covered the mobile phone industry.
0:15:25 And I don’t know, maybe we have a great audience
0:15:26 so they probably know, but, you know,
0:15:29 for sort of a US centric point of view,
0:15:31 people don’t necessarily understand
0:15:33 that in so many parts of the world,
0:15:35 your phone is your computer period.
0:15:37 And people are sharing phones or, you know,
0:15:39 getting a phone to use for a day or that kind of thing.
0:15:40 But it all happens on the phone.
0:15:43 There’s no laptop, there’s no desktop workstation,
0:15:44 all that stuff.
0:15:46 And so when I was, you know, researching,
0:15:50 prepping for the taping and reading about how,
0:15:53 Ulangizi, the farmer chatbot, went through WhatsApp.
0:15:55 You know, it was like, oh, cool.
0:15:56 And I was like, well, of course it does
0:15:58 because that’s how people work.
0:16:01 So, you know, maybe to get back to what you were saying,
0:16:03 Sean, about putting these tools
0:16:05 into the communication platforms.
0:16:07 What were some of the hurdles, some of the challenges?
0:16:09 Maybe some of the pleasant surprises and working on that.
0:16:11 Oh, there’s a ton, right?
0:16:14 And so we’ve got a talk that’s on our site
0:16:17 called like the stuff that goes wrong, right?
0:16:19 Which is basically like, so, you know,
0:16:22 right after we began, again, our old friend
0:16:23 and, you know, director of the lab,
0:16:28 Annandana said, hey, Rikin, who runs Digital Green,
0:16:31 is working in this whole space, and like, you know,
0:16:34 bots for development is going to be a thing.
0:16:36 Which basically gets at this part of:
0:16:38 if we want to convince every farmer in the world
0:16:41 to basically change their livelihood
0:16:44 and the crops they grow because climate change
0:16:46 is necessitating that over the next decade.
0:16:48 We’ve got to convince all of them
0:16:51 to change literally the source of their income.
0:16:53 That is a hard challenge.
0:16:55 Every government in the world has this challenge
0:16:57 for the, I don’t know, several billion people
0:16:58 on earth that are farmers.
0:17:01 So there’s this piece that he recognized like,
0:17:03 okay, bots are going to be a thing.
0:17:04 Why don’t you get together?
0:17:07 And so what we did there was to say, hey, you know,
0:17:09 in the case of Digital Green,
0:17:12 they had an incredible library of thousands of videos,
0:17:16 basically of one farmer recording how they do a technique
0:17:19 better that was then shown to other nearby farmers.
0:17:20 – Farming best practice.
0:17:21 – Farming best practice.
0:17:24 There's also, you can think of it as like all of the FAQs,
0:17:26 every question that everybody's asked
0:17:28 in the ag space that goes to the government,
0:17:31 and then a bunch of like local knowledge
0:17:33 in the form of like Google Docs of like,
0:17:36 what people should do coming from local
0:17:37 on the ground NGOs.
0:17:39 And so what we did is to say, hey,
0:17:41 we’ve built this extensive platform.
0:17:42 We can have rad bots.
0:17:43 We know how to do speech recognition well.
0:17:47 We're running private keys to all of the best services.
0:17:50 Plus we have our own A100 infrastructure
0:17:51 and GPU core orchestration.
0:17:53 So we can run any public model too.
0:17:54 So then we can say, great,
0:17:56 we can take all those videos which are not in English,
0:17:59 right, transcribe them, basically use a bunch
0:18:02 of GPT-4 scripts to create synthetic data around them
0:18:04 so that it’s not just this transcript,
0:18:07 but it’s also like, what is the question
0:18:09 that a practitioner might actually ask
0:18:10 and what’s the answer here?
0:18:13 And then use all of that to basically shove that
0:18:15 into a big vector DB, right?
0:18:18 And then say, okay, we then hook that up on WhatsApp.
0:18:20 And then you put in translation APIs
0:18:22 and speech recognition APIs in front of that.
0:18:24 And then boom, you suddenly have something
0:18:28 that works in multiple languages in multiple countries
0:18:31 using locally referenced content with citations back
0:18:34 that can speak any language that is actually useful
0:18:36 to folks on the ground, or smallholder farmers.
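To make that pipeline concrete, here is a minimal Python sketch of the flow Sean describes, from local-language videos to a WhatsApp-ready Q&A service. Every client here (the ASR model, LLM, translator, and vector DB) is a hypothetical stand-in, not Gooey.AI's actual API.

```python
from dataclasses import dataclass

@dataclass
class QAPair:
    question: str
    answer: str
    source_video: str  # keeps the citation back to the original farmer video

def build_knowledge_base(videos, asr, llm, vector_db):
    """Turn local-language training videos into a searchable Q&A bank."""
    for video in videos:
        # 1. Transcribe the non-English audio.
        transcript = asr.transcribe(video.audio, language_hint=video.language)
        # 2. Use an LLM to generate synthetic Q&A pairs around the transcript,
        #    so retrieval matches questions practitioners would actually ask.
        for question, answer in llm.generate_qa_pairs(transcript):
            vector_db.add(
                vector=llm.embed(question),
                payload=QAPair(question, answer, video.url),
            )

def answer_whatsapp_message(message, asr, translator, llm, vector_db):
    """Speech recognition and translation in front, vector retrieval behind."""
    text = asr.transcribe(message.audio) if message.is_voice else message.text
    english_query = translator.to_english(text)
    hits = vector_db.search(llm.embed(english_query), top_k=3)
    draft = llm.summarize(english_query, context=hits)  # answer with citations
    return translator.from_english(draft, target=message.user_language)
```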
0:18:39 – That was what we demoed at the UN with Rikin
0:18:43 in April 2023 at their General Assembly science panel, right?
0:18:46 And so now you look across the world,
0:18:47 and bots are a thing.
0:18:49 And I'm not saying, like, obviously we weren't
0:18:51 the only people involved in this kind of transition.
0:18:54 But the thing that I think for us was exciting
0:18:58 is a bunch of people in the private sector also noticed.
0:19:00 And they said, hey, if you’re looking
0:19:03 at how do I make frontline workers productive,
0:19:06 people that need to fix your AC or do plumbing.
0:19:06 – Right, right.
0:19:08 – They have the same issues of like,
0:19:11 I need to aggregate every manual
0:19:13 of every AC sold in America,
0:19:15 plus all of the training videos around them,
0:19:18 plus ask any hard question in order for me to do my job.
0:19:20 And oh, by the way, all the master technicians
0:19:22 in that field retired with COVID, right?
0:19:24 And so there’s none left, right?
0:19:27 And so, but the technology that you need
0:19:29 to make that happen is actually the same.
0:19:32 And hence, you know, you’ll see us,
0:19:34 we talk a lot about frontline worker productivity
0:19:36 because I think we do this really well
0:19:40 by essentially aggregating all of these different parts.
0:19:41 That was the long answer.
0:19:44 – Yeah, one of the things that you mentioned a few times
0:19:48 is languages and, you know, a lot of the models,
0:19:50 I mean, English for better or for worse
0:19:54 is taking over, spreading, ubiquitous, et cetera, right?
0:19:56 And a lot of the models trained on English,
0:19:58 you’re working with all kinds of languages,
0:20:00 including, from my understanding,
0:20:03 tons of local dialects and, you know,
0:20:05 the kinds of things that the models
0:20:07 aren’t necessarily trained on.
0:20:09 Tackling that, right?
0:20:12 Talking about translations and all that kind of stuff.
0:20:15 Are you also working with, you know,
0:20:18 training foundational models in these languages,
0:20:22 or is it just a better way to tackle it by doing,
0:20:24 and I may have this wrong, so please correct me,
0:20:25 but doing what I think I understood
0:20:27 as translating back to English
0:20:30 and then using that to work with the LLMs.
0:20:33 – Again, it goes back to the sort of core philosophy of Gooey,
0:20:35 that we always wanna be the super set
0:20:37 of everything else out there.
0:20:41 I personally think, as a small startup (by small,
0:20:43 I mean, under a billion dollars in funding),
0:20:45 it is a fool's errand to try to train
0:20:48 any foundational models, right?
0:20:50 Because every six months you’re gonna be outclassed.
0:20:53 And so I’m gonna leave that to the people
0:20:55 that can put a hundred billion or more into it.
0:20:58 And yet, every single day I wanna know,
0:21:01 does that work better for my use case?
0:21:04 And we take this very use case specific
0:21:06 evaluation methodology, which is this golden-questions approach,
0:21:10 and then apply that to, hey, I have 50 farmers
0:21:12 outside of Patna in India,
0:21:16 speaking this particular dialect of Bhojpuri, right?
0:21:18 Here’s the questions that they ask.
0:21:20 Here is the expert translation or transcription
0:21:21 into Bhojpuri.
0:21:23 Here’s the expert translation of that question.
0:21:25 That is my golden set.
0:21:28 And then what we allow you to do is to say,
0:21:31 I’m gonna run this essentially custom made
0:21:34 evaluation framework across every model
0:21:36 and every combination of those things,
0:21:39 so that this week I can tell you, huh,
0:21:42 the Facebook MMS large model works actually better
0:21:45 than Google’s USM, which may suddenly work better
0:21:48 than GPT-4o audio, right?
0:21:52 And to basically allow organizations to evaluate
0:21:55 which of the current state of the art models,
0:21:57 and in particular, the combinations of those
0:22:00 work best for their use case.
0:22:02 So we have an evaluation level, not the training level.
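As a rough illustration of that golden-questions idea, an evaluation harness might look like the Python sketch below. The model names match the ones mentioned above; the data fields and helper functions are assumptions for the sketch, not Gooey.AI's real interface.

```python
# One golden example: a real question as asked, plus expert references.
golden_set = [
    {
        "audio": "patna_farmer_01.wav",   # question recorded in Bhojpuri
        "gold_transcript": "...",         # expert transcription
        "gold_translation": "...",        # expert English translation
    },
    # ... more examples collected from real farmers
]

asr_candidates = ["facebook-mms-large", "google-usm", "gpt-4o-audio"]

def score_model(model_name, golden_set, run_asr, similarity):
    """Average similarity between a model's output and the expert reference."""
    scores = [
        similarity(run_asr(model_name, ex["audio"]), ex["gold_transcript"])
        for ex in golden_set
    ]
    return sum(scores) / len(scores)

# Rank this week's state of the art for this exact use case, e.g.:
# sorted(asr_candidates, key=lambda m: score_model(m, golden_set, ...), reverse=True)
```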
0:22:05 – Right, is that a hands-on user thing,
0:22:08 figuring out which model, which combinations to use,
0:22:11 or is that something the platform does for the users?
0:22:12 – That itself is another workflow.
0:22:14 So gooey.ai/bulk, right?
0:22:16 You can upload your own golden data set,
0:22:19 and then you can then say, great, I wanna do this,
0:22:21 and again, you can see all of the work
0:22:23 that we’ve done for other organizations,
0:22:25 and then you can just sort of say, great,
0:22:28 this is how they did it, I can copy, not copy,
0:22:30 I can just fork their recipe on the website.
0:22:32 – And the advantage there is you don’t have to run
0:22:35 the DevOps to run all of those new state of the art models.
0:22:36 – Yeah, absolutely.
0:22:40 I'm speaking with Sean Blagsvedt and Archana Prasad.
0:22:43 They are the co-founders of Gooey.AI,
0:22:48 a low-code, change-the-world, literally change-the-world,
0:22:51 A lot of people say that, but I think y’all are doing it.
0:22:54 change-the-world platform for using AI models
0:22:56 for all kinds of things, but we're talking particularly
0:22:59 about frontline workers, be it an HVAC technician
0:23:03 or a farmer in a rural community in Africa.
0:23:06 Sean, you mentioned, I tease this at the beginning,
0:23:08 you talked a little bit now about the golden sets
0:23:11 and the golden Q&As, so I wanna ask you about that
0:23:13 and about issues around hallucinations.
0:23:17 It’s one thing if I’m using a chatbot to help me
0:23:19 in my writing work, and it hallucinates,
0:23:20 and I can sort of read it.
0:23:23 It's another thing if a farmer or anybody else
0:23:26 is asking a chatbot for best practices
0:23:29 for their livelihood; hallucinations are literally
0:23:31 life or death there. How do you deal with that?
0:23:35 – So there’s a variety of techniques that I’d say out there.
0:23:37 You should be suspicious of anytime anybody says
0:23:40 we’re 100% hallucination free in general.
0:23:43 So there’s the rag pattern, which says,
0:23:46 hey, I will search your documents or video
0:23:49 or whatever you put in there and I’ll only return,
0:23:51 well, then you get back those snippets
0:23:53 and then you ask the LLM to summarize it.
0:23:56 The risk of hallucination there goes down, right?
0:23:58 Because you said, hey, I’m summarizing
0:24:00 some simple paragraphs.
0:24:04 That’s probably okay, honestly for things like farming.
0:24:07 It may not be okay for things like healthcare
0:24:09 because the other thing that happens often
0:24:11 in our pipelines is you take that, you know,
0:24:14 kind of summarization and then you do a translation.
0:24:17 And that translation, you know, for English to Spanish,
0:24:19 great, we’re not gonna probably have a problem,
0:24:22 but English to Swahili, English to Kikwa,
0:24:23 you’re like, I don’t trust that.
0:24:27 So there are other techniques that we see out there,
0:24:30 where, if you really wanna do hallucination-free,
0:24:34 then what you do is you sort of translate the user’s query
0:24:36 into a vector search of which question
0:24:38 that’s already in your data bank
0:24:41 whose answer has already been approved by say a doctor.
0:24:44 Does your question most align to?
0:24:46 And then the information you give back
0:24:48 is not the answer to the user’s question.
0:24:52 It’s, hey, here’s a related question
0:24:55 that I think is very semantically similar to your question
0:24:57 with a doctor approved answer.
0:24:59 And then you use essentially your analytics, right,
0:25:02 to say, hey, how often and how far away
0:25:05 is the user’s query to the question bank that I have?
0:25:08 And then, you know, I can then go get more questions
0:25:10 that can have verified answers from doctors
0:25:13 and make that bank bigger and bigger and bigger over time.
0:25:16 And that’s how you actually get hallucination free
0:25:17 ’cause it’s a search, right?
0:25:22 So that golden set is the vetted questions and answers
0:25:24 that you’re then searching for to see.
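A minimal sketch of that "search, don't generate" pattern: map the user's query onto the nearest pre-approved question and return its vetted answer verbatim. The threshold value and field names below are assumptions for illustration, not values from the episode.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def answer_from_approved_bank(user_query, embed, bank, threshold=0.85):
    """bank: list of dicts, each holding a vetted question, its embedding,
    and a doctor-approved answer. Returns an approved answer or None."""
    query_vec = embed(user_query)
    best = max(bank, key=lambda item: cosine_similarity(query_vec, item["question_vec"]))
    score = cosine_similarity(query_vec, best["question_vec"])
    if score < threshold:
        # Track the gap: these queries become candidates for doctors to vet,
        # so the approved bank grows bigger and bigger over time.
        return None, score
    # No generation step: the reply is a related, already-approved Q&A.
    return (f"Here's a related question with an approved answer:\n"
            f"Q: {best['question']}\nA: {best['answer']}"), score
```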
0:25:25 – Well, that’s kind of–
0:25:27 – Users can’t see this, Sean made a face
0:25:28 and looked up, so I stopped.
0:25:29 – Oh, yeah.
0:25:30 So those are two different things.
0:25:31 – Okay.
0:25:33 – Like what I was talking about is what is the knowledge base?
0:25:35 It, you know, kind of a rag pattern.
0:25:36 – Yes.
0:25:38 – Golden answers is really the use case
0:25:39 specific evaluation frame.
0:25:40 – Okay, okay.
0:25:43 – And so you can think of it as most LLMs
0:25:46 look at like the MMLU as the benchmark
0:25:49 that they should be rated against,
0:25:51 which asks a bunch of multiple choice questions
0:25:53 for graduate students in things like organic chemistry.
0:25:56 That doesn’t tell you how to fix an AC.
0:25:59 It doesn’t tell you how to plant if there’s been a rainstorm
0:26:01 and you’re using this particular fertilizer
0:26:03 in the middle of, you know, Uganda.
0:26:07 For that, you need a different evaluation set, right?
0:26:09 And so that golden set is basically our answer
0:26:12 to how does somebody bring in their own use case
0:26:14 specific evaluation set?
0:26:16 And then we have a setup where, you know, basically
0:26:18 you upload that as the question-and-answer pairs.
0:26:20 And then you say, here’s one version of the bot
0:26:21 using GPT-4.
0:26:23 Here’s one version using Gemini.
0:26:24 Here’s one version using Claude.
0:26:25 I’m gonna run them all.
0:26:28 And then what we do is we allow you to specify
0:26:30 and we have some default ones,
0:26:33 which answer is semantically most similar
0:26:34 to your golden answer.
0:26:35 And then we create a score out of that.
0:26:38 And then we just, you know, average that score
0:26:38 and then give you an answer.
0:26:39 That’s it.
0:26:42 And so this allows for a very flexible framework
0:26:44 for you to do your evaluation.
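In code, that scoring loop might look something like this sketch; the bot callables and the embedding and similarity functions are placeholders, not the platform's actual API.

```python
def evaluate_bots(bots, golden_pairs, embed, similarity):
    """bots: mapping like {"gpt-4": ask_fn, "gemini": ask_fn, "claude": ask_fn};
    golden_pairs: list of (question, golden_answer) tuples."""
    results = {}
    for name, ask in bots.items():
        scores = [
            # Which answer is semantically most similar to the golden answer?
            similarity(embed(ask(question)), embed(golden_answer))
            for question, golden_answer in golden_pairs
        ]
        results[name] = sum(scores) / len(scores)  # one average score per variant
    # Highest average similarity wins.
    return dict(sorted(results.items(), key=lambda kv: kv[1], reverse=True))
```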
0:26:47 – Anything to, yeah, it was a long technical aside
0:26:47 of like, how do we–
0:26:49 – No, it’s, it’s good, it’s good.
0:26:53 – So with the Goethe-Institut, we're looking at
0:26:56 how we can enable communities, specifically women
0:26:59 and minority genders, to kind of define
0:27:03 what their own data set would look like.
0:27:05 How to create a data set that best represents
0:27:08 their community or their values.
0:27:11 How could they use those data sets to then create,
0:27:14 you know, fine-tuned models that enable others
0:27:18 within their community or outside to make imagery
0:27:20 and potentially animation even,
0:27:23 using those data sets that they have created.
0:27:26 And so that’s an exciting new project
0:27:28 that we’re gonna take off on this month.
0:27:31 And with UDub (the University of Washington), actually, we're looking at how,
0:27:35 and I think they kind of instigated the workspace feature
0:27:36 that we’ve kind of pulled out now,
0:27:39 which is how can we bring their young graduates
0:27:42 and even their PhD folks to start using AI tools,
0:27:46 quickly play with it without having to know
0:27:48 how to do the DevOps part.
0:27:51 I couldn't; it would take me another portion of my brain
0:27:52 to figure that out.
0:27:54 – I’m with you.
0:27:56 – So, you know, how do we make it possible
0:27:59 for like groups of people in their programs?
0:28:01 We’re looking at the DX arts program,
0:28:04 which is experimental arts program graduates
0:28:06 to be able to, you know, start creating stuff quickly
0:28:08 without all of the underlying stuff
0:28:12 that Sean eloquently and in great detail
0:28:13 has explained somehow.
0:28:17 – But also to do this in a collaborative way, right?
0:28:19 And I feel like that’s like the metaphor part
0:28:22 that will sort of get back into the AI workflow standards,
0:28:25 which is to say, you know, there was Word around
0:28:28 for a long time, and then we went to Google Docs,
0:28:30 and we had a huge unlock of what it means
0:28:33 to do real-time collaboration on a document.
0:28:35 And you’re like, “Wow, I can be a lot more productive.”
0:28:37 – Sure, together. – Together.
0:28:41 – Look at, like, analytics, and you take something like Amplitude.
0:28:43 Amplitude says, "Well, you used to have data analytics,"
0:28:45 and like I ran a company where I would do
0:28:47 SQL training classes because I wanted
0:28:50 to democratize data analysis at night at my company.
0:28:52 But then Tableau or, you know, in the case of Amplitude,
0:28:54 Amplitude comes along.
0:28:56 And I can just share a URL with you,
0:28:58 which is like, you know, looking at our user analytics.
0:29:00 And if you want to change that from a weekly view
0:29:03 to a daily view, it’s just a dropdown, right?
0:29:05 And then, you know, Webflow arguably did the same thing
0:29:09 from like Photoshop, right, as a standalone desktop tool
0:29:11 to something that is collaborative in the client.
0:29:13 We think we can do the same thing
0:29:15 for the AI workflows themselves, right?
0:29:18 So that, again, we are working on these things
0:29:19 and I don’t have to worry about the underlying models
0:29:20 that are underneath them.
0:29:23 And you’re working at this higher level of abstraction
0:29:26 where I get to work and see outputs in a team environment.
0:29:28 And that’s very useful for learning,
0:29:29 which is the DX arts piece.
0:29:31 And, you know, it’s very useful
0:29:33 for improving frontline worker productivity.
0:29:35 And then as we make these things bigger and bigger,
0:29:37 you know, you want to do the same thing of,
0:29:39 hey, if I’ve got an image set
0:29:42 that we feel is underrepresented in something like DALL-E,
0:29:44 I can take that image set and make my own model
0:29:46 and boom, suddenly make animation styles
0:29:48 around an indigenous art form, right?
0:29:49 That doesn’t exist there
0:29:50 ’cause the data doesn’t exist.
0:29:52 And that's really the work that we'll do with the Goethe-Institut.
0:29:55 But it’s kind of like the same metaphors
0:29:57 keep getting built on top of each other.
0:29:59 And that’s the part that I think we find very exciting.
0:30:03 – Archana, when you're working with, whether it's women,
0:30:06 minorities, whatever sort of underrepresented community,
0:30:10 and, you know, particularly in a more rural place
0:30:13 where, again, you know, there’s access via phone
0:30:16 and things like trying to find a way to use Sora online,
0:30:19 right, just isn't even in the picture; it's a different perspective.
0:30:22 Are you finding that people are interested
0:30:26 and enthusiastic about not just learning how to use AI tools
0:30:29 but being represented in the data sets?
0:30:31 Is that something that you kind of have to explain
0:30:32 from the ground up?
0:30:34 And I’m asking in part because, you know,
0:30:36 talking about arts in particular, right,
0:30:38 and underrepresented communities, you know,
0:30:41 there’s been a lot of blowback in people talking about,
0:30:43 you know, being underrepresented
0:30:47 or having their work used
0:30:49 without having been asked for consent.
0:30:51 And so kind of looking at the other side of it,
0:30:54 what’s the experience like in working with folks
0:30:57 who are coming from this totally different perspective?
0:30:58 – And thank you for that, Noah.
0:31:01 That’s a fantastic question, actually.
0:31:03 So I was, you know, recently in Manchester
0:31:05 and with friends at Islington Mill,
0:31:07 and we had a pretty deep conversation
0:31:09 pretty much around the same thing that you asked,
0:31:12 which is artists, creators definitely feel
0:31:13 there is a lot of pushback.
0:31:15 They have been exploited, their work,
0:31:17 their life’s works have been exploited.
0:31:19 Now, however, the cat's out of the bag;
0:31:23 we’re not gonna be able to rewind some of this stuff,
0:31:27 but if we have to take kind of a peek into the future,
0:31:29 one of the missions I personally have
0:31:30 and feel very deeply about,
0:31:33 and I know that Gooey is right there with me on that,
0:31:35 is that we’re kind of past the moment,
0:31:37 and like, you know, three years ago, four years ago
0:31:39 when we were doing the Radbots project,
0:31:41 it was, hey, can we enable the artists?
0:31:43 Can we give them the tools?
0:31:45 And then can they make what they would like to make?
0:31:47 I think we’re past that moment.
0:31:49 I think where we are at is
0:31:50 they need to make their own tools
0:31:53 and then make the things that they want to make
0:31:55 with the tools that are best servicing their needs.
0:31:59 That's kind of where we're at with Gooey right now.
0:32:02 How do we enable people to make their own fine-tuned models
0:32:04 that allow them to, for example,
0:32:08 create imagery or animation that they would like to see,
0:32:10 that they would like to be represented with?
0:32:13 It’s just one example of how that could play out,
0:32:16 and I feel like there’s a significant urgency around that.
0:32:19 One is that in the making of those tools,
0:32:22 they get more aware, we all learn together,
0:32:25 and, you know, the workspace model is also very much that,
0:32:29 is that we learn better together, we make better together,
0:32:33 and the more we can get people, especially creative thinkers
0:32:36 and activists on this technology,
0:32:37 the better that world will be.
0:32:41 Absolutely, no, that’s great, absolutely.
0:32:44 So getting into kind of a last topic before we wrap up, standards.
0:32:45 Yes.
0:32:50 Sean, you were talking about the move from, you know, Word to Google Docs
0:32:52 and this collaborative environment.
0:32:56 HTML, obviously, is a great example of, you know,
0:32:58 a standard that has evolved, splintered,
0:33:01 what have you, over time, but we all use the web, right?
0:33:05 How do you approach standards in this, you know,
0:33:06 new fast-moving world of AI?
0:33:09 So there’s always lessons from the past, right?
0:33:12 And so we hope so anyway.
0:33:13 We hope so, right?
0:33:15 We hope we learn the wisdom from the past.
0:33:17 But if you look at HTML,
0:33:20 HTML allowed for computer-to-computer communication
0:33:21 between networks, right?
0:33:23 But also had this other factor,
0:33:25 which I feel is completely under-appreciated,
0:33:27 which was view source, right?
0:33:29 Like the way that I learned to code
0:33:31 and figure out what HTML layout would happen
0:33:34 is 'cause I dissected the Discovery home page.
0:33:36 And then, but there’s other ones that are kind of more recent
0:33:39 that I think are also indicative, like Kubernetes, right?
0:33:43 Google, like, you know, you rewind the clock 12 years,
0:33:46 Amazon had a lock on essentially cloud server configuration
0:33:49 and deployment, and then Kubernetes came along
0:33:51 from an essentially upstart number two
0:33:53 and number three players like Google, right?
0:33:55 It was said, “Hey, I want to make it really easy
0:33:58 “to move from one platform to another.
0:34:00 “If I had a standard that could describe
0:34:02 “the configuration that I need,
0:34:03 "then suddenly you don't have vendor lock-in."
0:34:06 And that has allowed the cloud infrastructure business
0:34:07 not be dominated by one company,
0:34:10 but to have, you know, there’s at least now big three
0:34:13 plus a bunch of local vendors globally.
0:34:15 And you can use the same Kubernetes file
0:34:17 to go and say, “This is what I need for all of them.”
0:34:19 So we think there’s a similar thing
0:34:21 around AI workflows and it already happens now.
0:34:23 Like you have tools like OpenRouter
0:34:26 that allows you to really easily switch your LLM,
0:34:29 but, you know, our take is if you can define
0:34:30 those kind of like high level interfaces,
0:34:32 like what’s an LLM do?
0:34:34 You put some text in, you get some text out.
0:34:36 Maybe you put some text and an image in
0:34:38 and then you get, you know, some text out,
0:34:40 maybe now some audio, right?
0:34:42 But, you know, you look at what is the interface
0:34:43 of a speech recognition model.
0:34:45 It’s like, well, you put some audio in
0:34:47 and maybe give it a language hint
0:34:48 and you expect some text out.
0:34:50 And then again, you want to swap, right,
0:34:51 for any model that’s underneath.
0:34:54 So part of it is there’s some standard interfaces
0:34:57 for these models and then those become steps.
0:35:01 And then you can compose those into essentially a chain,
0:35:03 a LLM chain or something like that,
0:35:05 but kind of a slightly higher level.
0:35:08 And then those steps end up becoming your recipe.
0:35:11 But the thing that travels with it is that golden data set.
0:35:12 So that allows you to say,
0:35:17 “Hey, I have my desired set of inputs and outputs
0:35:21 “and then I have my current set of steps that I should take.
0:35:24 “And then I can automatically just swap out the models
0:35:27 “as new ones are released and then boom, just tell you,
0:35:30 “you should really use this one, it’s better, cheaper, faster.”
0:35:31 And then that high level thing,
0:35:33 that is the AI workflow standard.
0:35:36 It’s basically like, what are your steps
0:35:38 extracted above the use of any given AI model?
0:35:40 Maybe you have a little bit of like,
0:35:41 what are the function calls that you’re going to expose
0:35:46 in there as well, kind of as, you know, OpenAPI configs.
0:35:47 Then what’s the evaluation set?
0:35:50 And our belief is if you had that higher level thing,
0:35:51 then you can take that and say,
0:35:53 “Oh, I want to run that on Claude
0:35:55 “or I want to run that on GPT builder.
0:35:58 "I want to run that on Gooey or Dify or Relevance."
0:36:01 Then we suddenly have this, again, portable thing
0:36:02 that allows you to run.
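To illustrate what such a standard could look like, here's a sketch of the concept in Python (this is an illustration, not a published spec): steps with fixed interfaces composed into a recipe that carries its golden set, so the model behind any step can be swapped and re-scored.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    kind: str       # e.g. "speech_recognition", "llm", "translation"
    model: str      # the concrete model currently filling this slot
    run: Callable   # must conform to the standard interface for `kind`

@dataclass
class Recipe:
    steps: list
    golden_set: list  # (input, expected output) pairs travel with the recipe

    def execute(self, payload):
        # Each step's output feeds the next: audio -> text -> answer -> audio.
        for step in self.steps:
            payload = step.run(payload)
        return payload

    def swap_model(self, kind, new_model, new_run):
        """Swap the model behind a step; re-score against the golden set to
        decide whether the new one is better, cheaper, or faster."""
        for step in self.steps:
            if step.kind == kind:
                step.model, step.run = new_model, new_run
```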
0:36:05 – For folks listening and, you know, anybody,
0:36:07 but I want to kind of gear it towards that,
0:36:09 kind of new to the technology
0:36:13 or coming from less of a dev/DevOps background
0:36:16 and more of a, you know, artist, activist,
0:36:19 writer type background or, you know,
0:36:21 the dev/DevOps folks who are working with those people
0:36:25 who think it’s important to elevate those voices
0:36:27 and help them create the tools that they want to use, right?
0:36:30 What advice would you give to somebody out here listening
0:36:33 who thinks they have a new way to do it
0:36:35 or just wants to get involved with an organization
0:36:36 who’s doing it?
0:36:37 What would you tell them?
0:36:39 – Get started, get on gooey.ai.
0:36:40 It’s easy.
0:36:43 And if there’s any hiccups, contact us.
0:36:44 They’re very easy to catch.
0:36:45 And it’s not as hard.
0:36:48 It’s not as complicated as it feels.
0:36:49 There’s our platform.
0:36:50 There are others, too,
0:36:54 that are really trying to make these processes
0:36:57 simpler, faster, quicker, or more efficient.
0:36:59 And I don’t think there’s time to be wasted.
0:37:01 I think it’s now.
0:37:03 And there’s no point sitting in the sidelines
0:37:05 worrying about it or critiquing it.
0:37:08 Kind of got to get in there, make the stuff,
0:37:10 and then, possibly, make the barriers and the guardrails
0:37:12 that you need, as well.
0:37:15 You know, kind of take the bull by its horns.
0:37:15 – Yeah, excellent.
0:37:16 Yep.
0:37:20 The Gooey.AI website, gooey.ai, is great.
0:37:23 Lots of use cases, lots of technical info, videos,
0:37:24 fantastic resource.
0:37:27 Are there other places you would direct listeners
0:37:28 to go in addition?
0:37:30 Social media, partner projects,
0:37:33 anywhere else besides the gooey.ai website.
0:37:34 And I'll spell it while you're thinking:
0:37:37 G-O-O-E-Y, for less fancy words.
0:37:38 Yeah.
0:37:41 I guess one thing that I’ll add to that is,
0:37:44 you can’t do good technology that changes the world
0:37:46 just by focusing on the technology, right?
0:37:49 That actually is just a means to the end.
0:37:53 And so, I think the thing for people to get started with
0:37:55 is, for me, it actually gets back to, like,
0:37:57 what’s the problem you’re solving?
0:37:58 Do you actually have something that looks like
0:38:00 golden questions?
0:38:01 And what does that mean?
0:38:03 It means, like, if you could imagine that,
0:38:06 hey, we could give great public defenders
0:38:10 for everyone in the country at no cost,
0:38:12 what would that look like, right?
0:38:13 What would be that set of expertise?
0:38:17 If we could say, hey, for any frontline worker,
0:38:19 I will be the nurse mentor for them,
0:38:21 helping them with triage and dealing with every
0:38:23 WHO guideline that they can imagine
0:38:25 and give them the right piece of advice
0:38:26 in their own language, right?
0:38:30 That is a real need for a real expert system.
0:38:32 And so, to think not so much of, like,
0:38:33 what’s the technology piece?
0:38:35 But what is actually the problem
0:38:38 where there’s a kind of expert out there right now
0:38:41 that’s expensive from a capacity-building perspective?
0:38:42 Right, right.
0:38:45 This is a place where AI can actually be really great,
0:38:47 which is we have collected wisdom from people
0:38:49 and processes and meta-processes,
0:38:51 all of it in documents and video.
0:38:53 And I feel like in the next year,
0:38:56 even with the current limitations we see around LLMs,
0:38:58 we can do this one well.
0:39:00 And so, for people, I would say you’d have to find
0:39:03 the problem worth solving in your community
0:39:04 or your business.
0:39:07 And say, if I could enable people to have that expert here,
0:39:10 they would earn more money, do their job better,
0:39:13 live longer, you know, have a better life.
0:39:15 And so, to focus not so much on the tech, but that part.
0:39:17 And then if you can get that, then, you know,
0:39:18 the tech tools are easy.
0:39:20 Archana Prasad, Sean Blagsvedt.
0:39:22 Thank you so much for joining the podcast,
0:39:24 telling us about Gooey.AI.
0:39:27 I'll say it again for the listeners: Gooey.AI.
0:39:28 It’s easy. Check it out.
0:39:31 There’s so much to be done, so much you can do.
0:39:34 And thank you to folks like you who are making it easier
0:39:36 for more and more people to get involved,
0:39:39 be represented, and create the tools they need
0:39:40 to solve the problems they have.
0:39:41 Thank you.
0:39:42 Thank you.
0:39:45 [MUSIC PLAYING]
Co-founders Sean Blagsvedt and Archana Prasad of Gooey.AI discuss how their platform is making AI more accessible across communities. The platform enables teams to leverage multiple AI tools, enhancing productivity in sectors like agriculture, healthcare, and frontline services. Key applications include multilingual chatbots that support African farmers through WhatsApp and AI assistants that help HVAC technicians access technical documentation.
-
AI Agents Take Digital Experiences to the Next Level in Gaming and Beyond, Featuring Chris Covert from Inworld AI – Episode 243
AI transcript
0:00:10 [MUSIC]
0:00:13 Hello, and welcome to the NVIDIA AI podcast.
0:00:15 I’m your host, Noah Kravitz.
0:00:19 Digital humans and AI agents are poised to be big news this year.
0:00:21 They’re already making waves, in fact.
0:00:26 At CES 2025 in Las Vegas, Logitech G's Streamlabs unveiled its Intelligent
0:00:31 Streaming Assistant, powered by technologies from Inworld AI and NVIDIA.
0:00:35 The Intelligent Streaming Assistant is an AI agent designed to provide real-time
0:00:39 commentary during downtime and amplify excitement during high stakes moments
0:00:41 like boss fights or chases.
0:00:45 The collaboration brings together Streamlabs' expertise in live streaming tools,
0:00:49 NVIDIA ACE technology for digital humans, including AI vision models that can
0:00:53 understand what's happening on screen, and Inworld's advanced generative AI
0:00:57 capabilities for perception, cognition, and adaptive output.
0:00:58 But what are digital humans?
0:01:00 Where are they going to be used?
0:01:03 And where are they going to make an impact in the enterprise and
0:01:05 in gaming and entertainment in particular?
0:01:08 And how does a designer design in the age of digital humans,
0:01:10 agentic AI, and beyond?
0:01:14 Chris Covert, director of product experiences at Inworld AI,
0:01:18 whose press release, by the way, or blog post, I paraphrased heartily in
0:01:21 that introduction, so credit to them.
0:01:24 Chris is here to dive into these points and more as we talk about some of the
0:01:28 technology that I think is really poised to really make a big impact this year
0:01:33 going forward and kind of shape this new area of digital experience for all of us
0:01:35 in what we’ll broadly call the AI age.
0:01:36 But at any rate, Chris is here.
0:01:41 So Chris, thank you so much for joining the AI podcast and welcome.
0:01:42 Thank you for having me today.
0:01:45 It’s a pleasure to be here, not only as a longtime partner of NVIDIA,
0:01:49 but also as a huge fan of your recent announcements in AI at CES this year as
0:01:53 well, genuinely amazing time to be talking about AI in this space.
0:01:54 It really is.
0:01:55 And so you were at the show.
0:01:56 You were in Vegas.
0:01:57 Couldn’t make it personally.
0:01:58 I'll be doing DICE.
0:02:00 I’ll be doing GDC, but our team was there.
0:02:02 We worked on the project.
0:02:03 All right, so let’s get into it.
0:02:08 Tell us a bit about Inworld, Inworld AI, for listeners who may not know.
0:02:11 We can get deeper into the assistant in particular,
0:02:14 a little later in the conversation, if that’s cool, just so you can kind of set
0:02:18 the stage for all of us about what it is that we’re going to be talking about.
0:02:20 Yeah, so I, you know, extremely biased.
0:02:22 I have the best job in the world.
0:02:25 And at Inworld, we make the leading AI engine for video games.
0:02:28 And I get to work with the most creative minds in the industry,
0:02:31 most creative minds in the world of gaming and entertainment to answer the
0:02:35 question, how can we make fun more accessible to the people that make our
0:02:36 favorite experiences?
0:02:40 And I don't need to tell you, it sounds like a great job.
0:02:42 It is, it is.
0:02:43 And I’m not just blowing smoke here.
0:02:47 And we’ll definitely, as we go through this, you know, emphasize fun,
0:02:50 accessible in people, and why I think those are the most important here.
0:02:54 But Inworld's mission is to be that AI engine for this industry, right?
0:02:58 The more that the technology progresses, the lower the barrier
0:03:00 to entry to make AI experiences.
0:03:04 But we find there are still challenges, regardless of if you’re a studio
0:03:08 with a massive staff or you’re an AI kind of native company building
0:03:12 up these experiences from grassroots, that there’s a massive gap
0:03:14 between prototyping and deployment of AI systems.
0:03:19 And then to do that and find the fun for a user for an experience
0:03:20 is still incredibly challenging.
0:03:24 So we offer not only the platform, but the services to help you
0:03:26 create that fun in a deployable manner.
0:03:31 So kind of briefly run down maybe what you guys talked about at CES this year
0:03:34 and then we’ll put a pin in the assistant, like I said, and come back.
0:03:35 Yeah, that’s awesome.
0:03:38 So blog posts is out there, fantastic videos out there.
0:03:41 A lot of demonstrations of what happened, but succinctly, you know,
0:03:45 to plug the collaborative effort here, Inworld, NVIDIA, and Streamlabs
0:03:50 got together for CES and we put to the test this, you know, streaming companion
0:03:52 that could not only act as a cohost to a streamer,
0:03:54 but also serve to support the creator in other ways,
0:03:58 like handling production for them on their stream as well.
0:04:01 We showed a demo of this happening at the conference, always a risk
0:04:04 because you’re playing in a big space with low, you know, internet bandwidth.
0:04:08 But we played this over Fortnite and the streamer, the demo where in this case
0:04:11 and their agentic cohost, you know, chatting during the game,
0:04:15 grilling each other for bad rounds, reacting to trends that are happening in chat,
0:04:17 cheering together when you manage to get kills.
0:04:21 But then interestingly enough, you know, when chat or when the streamer wants a replay,
0:04:24 streamer has to just ask the agent, they’ll clip it,
0:04:27 they’ll set up the screen for that replay and they’ll keep that stream feeling fluid.
0:04:30 So it's one of those use cases, again, trying to align on, like,
0:04:35 where agentic systems actually play in enterprise and this entertainment industry,
0:04:37 and we're finding they're hyper-specialized.
0:04:40 And in this use case, we have this perfect example, you know:
0:04:43 one streamer has to wear all of these different hats during their stream.
0:04:45 How can an agentic system come in and assist them
0:04:48 so that they can focus on making the best content,
0:04:51 and all of the other things in managing that content can be done for them?
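As a rough sketch of how such a cohost might route events, consider the Python below; the event names and methods are invented for illustration and are not Streamlabs' or Inworld's actual interfaces.

```python
def handle_event(event, agent, stream):
    if event.type == "kill":
        # React to gameplay moments: cheer together on a good round.
        stream.say(agent.react(event, tone="celebratory"))
    elif event.type == "chat_trend":
        # Respond to what the audience is doing in chat.
        stream.say(agent.react(event, tone="conversational"))
    elif event.type == "streamer_speech":
        if agent.detect_intent(event.text) == "replay":
            # Production handled for the streamer: clip it, set up the
            # screen for the replay, keep the stream feeling fluid.
            clip = stream.clip_last(seconds=30)
            stream.show_replay(clip)
        else:
            stream.say(agent.banter(event.text))
```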
0:04:54 So back in the day, and this is part of why we got to come back to it.
0:04:58 I did a lot of YouTube stuff and we tried to do some live streaming.
0:05:00 This was in like the early mid 2000s.
0:05:05 And I remember setting up, you know, open source plugins to try to get graphics
0:05:06 over the live stream.
0:05:10 And then I had like a secondary camera, you know,
0:05:13 like a phone with a camera with a USB cable and all this stuff.
0:05:14 But that sounds amazing.
0:05:17 So I’m excited to talk about that to kind of then back into it
0:05:19 or start with the technology and go forward.
0:05:23 Agentic AI cohosts, digital humans.
0:05:24 Let’s talk about those things.
0:05:27 I want to start with digital humans, because we've talked about agentic AI,
0:05:30 and we'll get to it; the definition, you know, of what an agent is or isn't,
0:05:33 I think it’s still a little bit malleable and context specific.
0:05:37 But when we talk about digital humans, when InWorld talks about digital humans,
0:05:38 what are we talking about?
0:05:39 You know, that’s such a great question.
0:05:41 And it brings up an important distinction.
0:05:44 When most people say digital humans, they’re thinking of chatbots.
0:05:48 Text-based tools that respond with mostly pre-programmed scripts
0:05:52 to help a specific task or guide users through a set of questions or instructions.
0:05:56 But at InWorld, we focus on AI agents that go far beyond simple chat.
0:05:59 These agents can autonomously plan, take actions,
0:06:01 and proactively engage with their environment.
0:06:07 Unlike a typical chatbot, which waits for a user to have some kind of question
0:06:12 or response in and then that AI agent provides a canned response,
0:06:15 our agents at InWorld are designed to interpret multiple types of inputs,
0:06:19 whether that’s speech or text or even vision or sensor data.
0:06:22 And then dynamically think and respond in real time.
0:06:23 They don’t just answer questions.
0:06:25 They can initiate tasks.
0:06:26 They can adapt to different contexts.
0:06:28 And they can carry out sophisticated interactions
0:06:31 that make sense within the context of what you’re doing.
0:06:34 So if you look at where digital humans are used mostly today,
0:06:35 and again, I’m super biased,
0:06:38 it’s often in the high volume chatbot or, you know,
0:06:40 personal digital assistant space.
0:06:43 But if you consider where they can have the biggest impact,
0:06:45 that’s when they function as these truly autonomous agents.
0:06:48 Planning and acting and proactively helping people
0:06:50 in ways that traditional chatbots can’t.
0:06:52 It’s not question and answer.
0:06:54 It's answering a question that I didn't even know I had yet.
0:06:56 And that’s the core of our focus at InWorld,
0:06:58 building AI agents that can not only look human,
0:07:02 but can think and react and solve problems like one,
0:07:05 using multiple modalities to interact with their world.
0:07:06 In our case, a digital one.
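Structurally, that's a perception/cognition/action loop. Here's a minimal sketch, with all engine interfaces assumed for illustration rather than taken from Inworld's platform:

```python
def agent_tick(perception, cognition, actions, world):
    # Perceive: fuse whatever modalities are available this tick
    # (speech, text, vision, sensor data).
    observations = perception.observe(world)

    # Think: update plans given the new context. This can yield intents
    # even when no one asked a question, which is what makes it proactive.
    intents = cognition.plan(observations)

    # Act: initiate tasks and change the environment, not just reply.
    for intent in intents:
        actions.execute(intent, world)

def run_agent(perception, cognition, actions, world, running):
    while running():
        agent_tick(perception, cognition, actions, world)
```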
0:07:09 When it comes to building with agentic AI,
0:07:13 do you feel even like you have kind of a solid vision
0:07:15 of what is to come even near term?
0:07:20 Or is it still very early days of figuring out everything from,
0:07:23 you know, trying different types of infrastructure
0:07:26 to, you know, sort of the design and higher level thinking
0:07:28 about how to architect these things?
0:07:28 That’s a great question.
0:07:32 And the reality is we’re shifting toward an agentic AI framework,
0:07:33 as I mentioned before.
0:07:37 And shifting toward that framework is no longer really an option.
0:07:40 It’s becoming essential for us and our partners.
0:07:44 When you need any sense of autonomy or kind of real time
0:07:46 or runtime adaptability, latency optimization,
0:07:48 any sense of custom decision making,
0:07:52 you need a robust framework that lets you take control of that stack.
0:07:55 No one platform is going to serve the needs of our industry
0:07:56 as standalone.
0:07:58 So building a framework that is flexible,
0:08:00 that is robust to the needs and growing needs,
0:08:03 changing needs, super diverse needs of our industry
0:08:05 is incredibly important.
0:08:06 What we’re seeing more and more of is enterprises
0:08:09 that want to own that architecture end to end.
0:08:11 What I mean by that is they need the flexibility
0:08:14 to decide what models to use, where to use them,
0:08:16 how data is fed into their systems,
0:08:18 how to manage compliance and risk,
0:08:21 all of these factors that require really custom logic
0:08:22 and custom implementations.
0:08:24 So it’s adopting this, you know,
0:08:28 agentic AI framework that really puts the power in their hands.
0:08:30 It’s not just about future proofing.
0:08:33 It’s about giving them full control over how AI evolves
0:08:35 within their organization at any given time, right?
0:08:37 The industry is going to move quickly.
0:08:38 We want to make sure our partners can move
0:08:41 just as quickly as they get inspired.
0:08:45 But to me, you know, my favorite key differentiator here,
0:08:47 Future of Inworld, why the framework model,
0:08:50 why all of these changes is that such a framework
0:08:52 doesn’t just guarantee technical flexibility.
0:08:54 That’s great, and that’s going to make a lot of people happy,
0:08:57 but it also opens up the creative design space as well, right?
0:09:00 By collaborating with people that are thinking
0:09:02 of the most outrageously beautiful,
0:09:05 outrageously wonderfully weird experiences
0:09:07 you could possibly imagine and giving them tools
0:09:09 that are opened up to really let them craft
0:09:11 their own AI architectures.
0:09:14 We’re really, you know, helping push the boundaries
0:09:18 of what and how we can deliver innovative experiences, right?
0:09:21 We made the jokes earlier, but if we say that conversational AI,
0:09:22 you know, this like chat bot nature,
0:09:26 is now feeling very entry level, it’s feeling baseline,
0:09:28 then what we’re doing with this agentic framework
0:09:30 is building out that blueprint for the future
0:09:32 so that our partners and our customers
0:09:37 can help imagine, build, and deploy AI experiences
0:09:40 that genuinely didn’t feel possible even six months ago, right?
0:09:41 Taking that to a whole new level.
0:09:44 If we go more than like two years out,
0:09:47 I’m at a 10% confidence level of where this is actually going to be.
0:09:50 Yeah, I appreciate the candor.
0:09:52 Yeah, it’s just, you know,
0:09:54 and I’ve been doing this a couple of times in my career throughout
0:09:57 where I’ll work with somebody in an advisory capacity
0:10:00 where we’re looking at the future three to five years out
0:10:02 and by the time we finish our analysis,
0:10:04 the thing that we had two years out is now open sourced
0:10:06 and you’re like, cool, so back to the drawing board
0:10:09 because it’s just the industry moves so quickly
0:10:13 that what is possible, it’s actually really hard to nail down
0:10:16 and we will get into kind of, hopefully in this conversation,
0:10:20 how designing around AI and these systems is that challenge
0:10:23 because I think we’ve learned some fantastic lessons
0:10:25 that hopefully people listening can take away as well.
0:10:29 I really see there, you know, when it comes to trying not to sound too obvious
0:10:31 when we’re talking about agents,
0:10:34 the evolutionary chain of how an agent will grow,
0:10:36 how an agent, you know, the technology behind it
0:10:38 will proceed in even the near future.
0:10:41 It comes all down to agency and again, it’s agents.
0:10:44 It sounds obvious, but what I really mean by that
0:10:46 is the complexity of that cognition engine
0:10:49 and I'll try and crack this nut with an analogy
0:10:51 and I’ll try and make it fast because I could talk about this.
0:10:52 No, take your time.
0:10:53 We’ll take your time with the analogy anyway.
0:10:58 We have this, you know, first again is that conversational AI phase
0:11:00 and I’ll use a gaming analogy, right?
0:11:04 The conversational AI phase gives avatars, gives agents,
0:11:05 I’ll use them interchangeably today,
0:11:08 extremely little agency in doing anything other than speaking, right?
0:11:12 It may be able to respond to my input if I ask it to do something,
0:11:14 but it’s not going to physically change the state of something
0:11:17 other than the dialogue it’s going to tell me back.
0:11:19 So, I think in terms of an analogy,
0:11:21 like a city-building simulation game,
0:11:24 in this phase, conversational AI,
0:11:25 I can ask it about architecture
0:11:27 and it could give me some facts about architecture.
0:11:30 I can ask it about plans I have to build this little area
0:11:31 and it could give me some advice.
0:11:34 I could ask it, “Hey, how do I limit the congestion
0:11:37 or how do I maximize traffic over this little commercial area?”
0:11:41 And it could tell me in words how to potentially do those things,
0:11:43 but it won’t be able to do things like place buildings
0:11:46 or reconstruct roads or even select the optimal facade
0:11:48 for the style of the region
0:11:50 because that’s completely out of its scope.
0:11:50 And not to…
0:11:51 It can’t act.
0:11:52 Yeah, it can’t act.
0:11:53 It can’t act at all.
0:11:53 It is…
0:11:55 Okay, just making sure that I'm following.
0:11:55 Yeah.
0:11:58 It is a very rudimentary perception,
0:11:59 kind of what is it seeing,
0:12:02 what context does it have and cognition,
0:12:04 how is it planning, how is it reasoning,
0:12:06 very simple engines that drive that.
0:12:08 Really, I’m going to talk to it.
0:12:10 It’s going to say, “Oh, you know,
0:12:12 this person cares about architecture.
0:12:14 I’m going to pull everything I know about architecture,
0:12:16 turn that into a nice response,
0:12:19 weave together kind of their question into my response,
0:12:22 and then boom, now we have simple conversation AI.”
0:12:24 But not too far ahead of that,
0:12:26 and I say this lovingly and endearingly,
0:12:29 is simple task-completion AI.
0:12:31 It’s very much still call and response.
0:12:33 You have an action engine.
0:12:34 You say, “I’m having an agent
0:12:37 that can only construct new buildings
0:12:38 exactly where and when I tell them.
0:12:40 I can say place a building at this intersection
0:12:41 and it does it, boom.
0:12:43 I can say change the building to this type
0:12:44 and boom, it does it.”
0:12:44 Right.
0:12:46 And while you argue that
0:12:47 that requires a little bit more perception,
0:12:49 maybe a little bit more reasoning,
0:12:50 definitely some action,
0:12:53 it’s limited to an extremely small set of actions
0:12:56 and it’s pretty much just scripting with extra steps.
0:12:58 It knows what it’s going to expect.
0:12:59 It’s waiting for me to say it in a way
0:13:02 that it can map to that is the place of building action.
0:13:04 I'm going to place a building; done.
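That phase-two pattern ("scripting with extra steps") can be sketched as an utterance-to-action mapper over a small fixed action set. In the sketch below, `extract_action` is a hypothetical LLM call, and the actions come from the city-builder analogy, not a real game API.

```python
# The only actions the agent can take, with the arguments each expects.
KNOWN_ACTIONS = {
    "place_building": ["building_type", "location"],
    "change_building_type": ["location", "new_type"],
}

def handle_utterance(text, llm, game):
    # Ask the model to map free text onto one of the known actions.
    parsed = llm.extract_action(text, schema=KNOWN_ACTIONS)  # hypothetical call
    if parsed is not None and parsed.name in KNOWN_ACTIONS:
        # Call and response: "place a building at this intersection" -> boom.
        game.execute(parsed.name, **parsed.args)
    else:
        # Out of scope (e.g. "redesign the whole district"): fall back to talk.
        return llm.respond(text)
```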
0:13:08 If a language model provides the description,
0:13:11 including like I wish I could change the background
0:13:13 to mountains or whatever, right?
0:13:15 Or envision a background with mountains
0:13:16 and it can’t act, right?
0:13:17 Can the system be automated,
0:13:18 what you're talking about,
0:13:23 so that the language model can tell the action cognition model
0:13:24 what to do?
0:13:25 I forget what term you used.
0:13:28 I have a large action model in my head,
0:13:30 which yeah, I have thoughts about that terminology,
0:13:32 but I don’t know if they’re my own thoughts or not.
0:13:34 So we’ll get into it.
0:13:36 But you’re definitely hinting at, and again,
0:13:38 I should have upfront said this.
0:13:39 There are four phases.
0:13:41 We are at phase two.
0:13:43 I think a lot of virtual assistants are at that phase two.
0:13:46 What you’re describing in kind of intent recognition
0:13:49 or kind of passive intent recognition even
0:13:51 is there are no formal names for this,
0:13:54 but I’ll call this like an adaptive partner phase
0:13:56 where the AI is observing
0:13:58 and responding to changes on its own, right?
0:13:59 And this is the natural evolution
0:14:02 of where a lot of AI systems are heading today.
0:14:03 CES, a good example of that.
0:14:05 I have no doubt throughout the year,
0:14:07 you’ll get really close to getting here
0:14:09 as another core standard.
0:14:10 I think right now,
0:14:12 a lot of the kind of simple task completion AI
0:14:15 is a core standard across enterprise and gaming.
0:14:16 This adaptive partner phase
0:14:20 where this agent in this analogy, extended analogy,
0:14:22 would notice changes like newly built roads
0:14:25 or an influx of new residents into this tiny area
0:14:28 and automatically adapt construction plans.
0:14:30 It’s not micromanaging every decision,
0:14:33 but it feels like you’re collaborating with an agent
0:14:36 or a unit that has just enough context
0:14:37 to make smart decisions on its own,
0:14:40 like an evolution of a recommendation engine
0:14:43 being driven by a cognition engine here.
0:14:44 So it’s not just learning,
0:14:47 but it feels like it’s learning what we need
0:14:48 even before we ask it.
0:14:51 And I expect to see, again, a lot of this in the next year.
0:14:53 Again, I think that’s phase three.
0:14:55 I think there’s still a phase four.
0:14:57 And I think that’s a fully autonomous agent.
0:15:00 And that stage, again, continuing our analogy,
0:15:02 is a player two, right?
0:15:04 Where player three is it’s adapting to us.
0:15:07 Stage four is, hey, this thing is an agent all on its own.
0:15:10 It feels like I’m playing against another human.
0:15:11 It is making decisions that feel optimal
0:15:14 to its own objectives, alike to mine or not.
0:15:18 And I think this is where a lot of people think
0:15:19 about agentic AI.
0:15:21 They think we are there today.
0:15:24 There’s quite a big gap in the deployment of these systems
0:15:26 to get to that truly autonomous agent.
0:15:29 That is kind of the gold rush right now
0:15:32 is creating a cognition and a perception
0:15:34 and an action engine ecosystem
0:15:38 that can feel natural to a user.
0:15:39 I’m speaking with Chris Covert.
0:15:43 Chris is Director of Product Experiences at Inworld AI.
0:15:45 I made him kind of, I made us kind of put a pin
0:15:47 in their product announcement to kind of bring this
0:15:50 into the, from the abstract a little bit into the concrete.
0:15:52 I'm glad you humored me, Chris, because I think kind of digging
0:15:54 into, well, what does this stuff actually mean?
0:15:55 And it’s sort of moving.
0:15:57 And one of the things I should say it’s kind of moving
0:15:58 and evolving.
0:16:00 Maybe the lingo is moving, targeting,
0:16:03 but the tech and what’s actually happening is evolving
0:16:04 and still so fast.
0:16:04 All right.
0:16:07 So let’s talk about the big announcement from CES.
0:16:10 You said you wanted to talk through kind of the design
0:16:12 process a little bit and how you make these choices.
0:16:14 So if that’s something that makes sense to you to do
0:16:17 in the context of talking about the streaming assistant,
0:16:18 great.
0:16:20 And if not, take it whichever way you want to go first.
0:16:21 That sounds perfect.
0:16:21 Yeah.
0:16:23 So I’ll be honest around here, right?
0:16:26 In any demo that you see publicly,
0:16:28 especially when you have more than one company demoing it,
0:16:31 there’s a lot of design, but also a lot of urgency
0:16:33 to show something quickly and cool.
0:16:37 So of the things that I’ll say are design axioms
0:16:40 that I love to live by, we had to sacrifice some
0:16:42 just to make sure that something awesome could
0:16:44 be demonstrated at CES.
0:16:46 Putting a timeline on creative vision is always challenging.
0:16:47 But–
0:16:47 Right.
0:16:49 And that’s like you can’t– we live in–
0:16:51 I mean, I don’t build software for a living.
0:16:52 You do.
0:16:55 But we live in such an age of like, we’re past this state,
0:16:58 where beta became, all right, everybody ships beta.
0:16:59 And then we got used to it.
0:17:01 And it’s like, right, there’ll be updates
0:17:03 and downloadable content and this and that.
0:17:05 But not CES.
0:17:05 That’s the date.
0:17:06 Show up or don’t.
0:17:07 So–
0:17:08 Exactly.
0:17:08 Exactly.
0:17:11 But in terms of design, and this might sound odd,
0:17:13 coming from a tech startup on the call here today.
0:17:16 But my first piece of advice, regardless of who we’re
0:17:18 working with, is always to ignore the technology
0:17:20 and focus solely on the experience
0:17:22 that you really want to create.
0:17:24 Where is the value in the experience?
0:17:25 The why?
0:17:28 And as soon as you can lead, or not as soon as you can lead
0:17:31 with the tech, but as soon as you do lead with the tech,
0:17:32 hey, I have an idea.
0:17:34 What if we had an LLM that could?
0:17:36 You already are risking thinking too narrowly
0:17:38 and missing bigger opportunities.
0:17:41 And I genuinely would love to say the reason behind this
0:17:42 is because it’s human-centered.
0:17:45 I come from the intersection of autonomous systems, AI,
0:17:46 and human-centered design.
0:17:49 So I would love to say that it’s a human-centered approach
0:17:50 to design, and that’s why I do it.
0:17:53 But in reality, is that the technology just advances
0:17:55 so quickly by the time your idea goes into production,
0:17:58 that tech-grounded idea that you had now
0:18:00 looks like a stone wheel compared to the race car
0:18:01 you thought it was six months ago.
0:18:05 So again, like the urgency behind some of these things
0:18:08 is challenging, especially when you come from a tech-grounded
0:18:10 design principle standard.
0:18:14 So I’m a firm believer in the moonshot first approach,
0:18:17 where you begin by clarifying why you want to build something
0:18:20 before how you decide to build it, especially for anything
0:18:22 with a shelf life greater than just a few months at this stage.
0:18:24 Again, if you start with how you end up
0:18:26 with a bunch of low-hanging fruit,
0:18:28 and a bunch of low-hanging fruit makes for a really bad smoothie.
0:18:32 So what is a moonshot experience?
0:18:35 Like, I often refer in design thinking workshops
0:18:38 to the classic impact versus feasibility matrix.
0:18:41 And this isn’t like a very exciting conversation.
0:18:42 So I’ll try and keep it high energy.
0:18:44 But typically, when you’re looking at impacts–
0:18:46 I was just saying the smoothie.
0:18:48 Sorry, I couldn’t let the smoothie go unacknowledged,
0:18:50 but I didn’t want to interrupt it.
0:18:51 Well, we’ll come back to smoothie right for now.
0:18:52 OK, good.
0:18:55 Because again, this is that it’s in that quadrant,
0:18:58 or it’s in that matrix, impact feasibility.
0:19:00 Your low-hanging fruit ideas are the things
0:19:03 that are going to be low feasibility or low impact.
0:19:06 But typically, you’re aiming for the upper right quadrant,
0:19:09 which is the highest feasibility and the highest value,
0:19:11 where impact and feasibility are so high,
0:19:13 the ideas feel easy to build
0:19:15 and have a lot of inherent value in them.
0:19:17 But what I’ve experienced when working with partners
0:19:20 and their AI ambitions is that that’s actually a trap,
0:19:23 that that quadrant right there is doomed for failure.
0:19:28 Because often the ideas that are the ones worthy of pursuit
0:19:31 the most are the ideas that almost feel impossible,
0:19:34 but if real would deliver extraordinary impact.
0:19:35 Right, that’s right.
0:19:37 As you get closer to building that value,
0:19:40 and you’re going from idea to prototype to implementation,
0:19:45 the technology has likely grown not only in capability
0:19:47 and efficiency, but also in accessibility, in ways
0:19:51 that probably outpace our own kind of traditional lens
0:19:54 of what is feasible and our imagination of where the tech can go.
0:19:56 So that sounds like a lot of nothing.
0:19:58 So my actual advice is,
0:20:01 when we’re designing these types of experiences,
0:20:03 start from that moonshot that feels impossible
0:20:06 and then break it down into a functional roadmap.
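(To make the impact-versus-feasibility framing above concrete, here is a minimal Python sketch. The idea names, scores, and cutoff are invented for illustration; this is not a tool the speaker describes using.)

```python
from dataclasses import dataclass

@dataclass
class Idea:
    name: str
    impact: float       # 0-10: value delivered if it works
    feasibility: float  # 0-10: how buildable it looks today

def quadrant(idea: Idea, cutoff: float = 5.0) -> str:
    """Classify an idea on the classic impact-vs-feasibility matrix."""
    if idea.impact >= cutoff and idea.feasibility >= cutoff:
        return "upper right: feels easy and valuable (the tempting trap)"
    if idea.impact >= cutoff:
        return "moonshot: feels impossible, extraordinary impact if real"
    if idea.feasibility >= cutoff:
        return "low-hanging fruit"
    return "skip"

ideas = [
    Idea("LLM chat overlay", impact=4, feasibility=9),
    Idea("real-time streamer co-host", impact=9, feasibility=3),
]
for i in ideas:
    print(i.name, "->", quadrant(i))
```

The point of the sketch is the ordering of the checks: the moonshot branch is deliberately ranked on impact alone, mirroring the advice to start from the idea that feels impossible rather than the comfortable upper-right quadrant.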
0:20:09 What you saw at CES in this demo with Streamlabs
0:20:11 and NVIDIA and Inworld technology
0:20:13 creating the streamer assistant
0:20:15 is actually just Horizon 1, that insight.
0:20:17 What can we build that gives us insight
0:20:20 into whether this is something that provides value?
0:20:22 Where does it provide value and to who?
0:20:25 The thing we demonstrated at CES
0:20:30 was just the first pass proof of concept of this capability.
0:20:34 The kind of upper potential of where this can go,
0:20:35 where we want this to go,
0:20:37 and where we’re going to explore its value
0:20:42 is still very much architected out toward a much longer roadmap.
0:20:45 And again, showing what we showed at CES
0:20:47 and what I love about this industry right now
0:20:50 is we’re able to show really compelling integrations
0:20:53 with very meaningful use cases in this industry.
0:20:55 I’m not about to wax poetic,
0:20:58 like our industry is the same as healthcare
0:20:59 or things like that and enterprise,
0:21:02 but showing digital companions, virtual assistants,
0:21:05 digital humans at the state that we’re showing them now,
0:21:08 knowing what’s to come is an incredible place
0:21:10 to look ahead and say, “Okay, cool.
0:21:13 This scratches an itch that either the market needed
0:21:16 or that technologically is doing something
0:21:17 that could never be done traditionally before.
0:21:23 Where this moves in the next six months to six years
0:21:24 is anyone’s game.”
0:21:27 Okay. So two questions, but they’re related.
0:21:29 One’s easy. The other’s a follow-up.
0:21:32 What is the availability of the assistant?
0:21:34 It was in a very early demo state.
0:21:36 Do you have kind of a roadmap for that?
0:21:38 I should say availability or a roadmap.
0:21:40 And that might be the answer to the second part,
0:21:44 but what else is on the horizon for you for Inworld AI?
0:21:46 I mean, you talked a little bit before
0:21:49 about the broader horizon for agentic AI
0:21:51 and avatars and assistants and such,
0:21:52 but you can go back there if you like.
0:21:53 What’s coming up this year,
0:21:56 kind of maybe near-term that you’re excited about?
0:21:58 Well, man, this year being near-term
0:22:01 is funny at InWorld because we move very quickly.
0:22:03 This year is many near-terms stacked against each other.
0:22:06 Well, right. And I’m figuring we’re taping now,
0:22:08 but it’s going to go live.
0:22:10 You know, there’s a little bit of a buffer
0:22:12 and we referenced CES.
0:22:14 Yeah. So we certainly have productization ambitions
0:22:15 for this demo.
0:22:18 What we’re doing post-CES is we’re taking the feedback
0:22:21 of what we’ve built and we’re augmenting it in many new ways.
0:22:24 Again, what was built for the demo was a proof of concept.
0:22:27 As many proof of concepts on show floors are,
0:22:29 it wasn’t robust to all the inputs we would want.
0:22:30 It wasn’t robust to all the games
0:22:32 that we would want it to be playable with.
0:22:34 So we’re trying to build out that ecosystem
0:22:35 in an intelligent strategic way
0:22:37 so that if it were to go to market,
0:22:40 it would be usable to as many streamers as possible
0:22:42 who wanted to leverage this type of technology.
0:22:46 So keep your ears on the beat for what’s about to come out.
0:22:50 I have no doubt that between Nvidia, InWorld, and Streamlabs,
0:22:52 all announcements and all possible show floors
0:22:54 that we can show our advancements on
0:22:56 will be shown at the right time.
0:22:57 So super exciting for that.
0:22:59 As it relates to InWorld, oh boy,
0:23:01 it’s such a fascinating question
0:23:03 because a lot of what we’re doing,
0:23:04 we’re doing with partners
0:23:07 with such fascinating game development lead times
0:23:09 that I hope that we’ll be playing
0:23:11 more Inworld-driven experiences
0:23:13 in the next, you know, when this releases,
0:23:15 but also over the next four or five years,
0:23:17 depending on the scale of the company we’re working with.
0:23:21 So I’m genuinely excited for what the future holds
0:23:22 as our platform develops.
0:23:25 Again, as we’ve seen in just the last year or so
0:23:27 with more and more competitive players
0:23:30 in the space of providing AI tools,
0:23:32 and then just fantastic partners
0:23:35 that are helping this industry become more accessible
0:23:37 to people of all different backgrounds,
0:23:41 really hope to see the, I wouldn’t say consolidation of tools,
0:23:43 but the accessibility, I’ll keep using that word,
0:23:47 the accessibility of different AI platforms.
0:23:50 Hey, I want to use this model, but in this engine,
0:23:51 to become a lot easier.
0:23:53 And InWorld’s goal is certainly to make that happen
0:23:55 for as many industries as we can,
0:23:58 in particular the gaming and entertainment industry,
0:24:00 but it doesn’t happen without partners like Nvidia
0:24:02 and the work that you guys are doing with ACE.
0:24:05 So where I, you know, where I think InWorld is going
0:24:08 is to make that easier, to continue to work with studios
0:24:11 to find the fun and to convince every player.
0:24:14 And in particular, you know, let me be honest,
0:24:16 the YouTube commenters that, hey,
0:24:19 there is actually a world here where this technology
0:24:21 is not only fun and immersive,
0:24:23 but it’s something that the entire industry views
0:24:25 as its gold standard.
0:24:27 So I think we’re there a lot sooner than we think.
0:24:29 I think it’s right around the corner,
0:24:33 but could not be more excited to continue
0:24:35 to work with creatives to help them tell stories,
0:24:39 to help them, you know, flex and use their imagination
0:24:42 as much as possible to make the best possible experiences.
0:24:43 We do it every day.
0:24:45 We may not see it in the near term
0:24:47 because games take a while to make,
0:24:49 but genuinely excited.
0:24:51 Excellent. Well, I’m all for more fun.
0:24:53 The world always needs more fun to counterbalance everything else.
0:24:56 Chris, for listeners who would like to learn more
0:24:59 about anything specific we might have talked about
0:25:01 or just broadly about InWorld,
0:25:02 where can they go online?
0:25:05 Website, obviously, but are there social handles?
0:25:08 Is there a separate blog or even research blog?
0:25:09 Where would you direct folks?
0:25:10 Yes, to all of the above.
0:25:13 You can find all of that at inworld.ai.
0:25:17 And we have our blog for partners and experiences there.
0:25:18 We have technical releases.
0:25:19 We have research done.
0:25:21 All of our socials are linked there.
0:25:22 If you want to stay up to date
0:25:24 with all the announcements that we make,
0:25:26 we have a lot of fun and we like to talk about it.
0:25:28 So definitely stay up to date.
0:25:30 Fantastic. Well, thanks for taking a minute
0:25:31 to come talk about it with us.
0:25:32 We appreciate it.
0:25:35 And best of luck to you and InWorld AI
0:25:36 with everything you’re doing this year.
0:25:39 And maybe we can do it again down the road.
0:25:41 I love it. Thank you so much, Noah. Appreciate it.
0:25:44 [MUSIC]
AI agents with advanced perception and cognition capabilities are making digital experiences more dynamic and personalized across industries. In this episode of the NVIDIA AI Podcast, Inworld AI’s Chris Covert discusses how intelligent digital humans are reshaping interactive experiences, from gaming to healthcare, and emphasizes that the key to meaningful AI experiences lies in focusing on user value rather than just technology.
-
Firsthand’s Jon Heller Shares How AI Agents Enhance Consumer Journeys in Retail – Episode 242
AI transcript
0:00:11 [music]
0:00:15 Hello and welcome to the NVIDIA AI podcast. I’m your host, Noah Kravitz.
0:00:21 The AI community is abuzz with agents. As NVIDIA CEO Jensen Huang said in his CES keynote this past
0:00:27 January, “Agentic AI represents the next wave in the evolution of generative AI and a multi-trillion
0:00:32 dollar opportunity at that.” AI agents enable applications to move beyond simple chatbot
0:00:38 interactions to tackle complex multi-step problems through sophisticated reasoning and planning.
0:00:42 They’re expected to become a centerpiece of enterprise AI systems going forward.
0:00:47 Our guest is Jon Heller, co-CEO and founder of Firsthand, whose brand agent platform is
0:00:52 transforming how advertisers and publishers engage consumers online. Firsthand is also a
0:00:57 member of Inception, NVIDIA’s program for startups. Jon, welcome and thanks so much for taking the
0:01:02 time to join the AI podcast. Hey Noah, thank you. It’s great to be here. Very excited.
0:01:06 So let’s start with the basics. If you don’t mind, tell us a little bit about Firsthand.
0:01:12 Sure. As you said, Firsthand is building the brand agent platform and that’s because when we
0:01:19 see AI, we believe it is itself a new medium, not just a technology. Perhaps I can explain a
0:01:26 little bit about how we came to be. So I’d been working in advertising tech since the
0:01:32 internet started in the 90s and worked at DoubleClick for quite a long time. More recently,
0:01:37 I was the co-CEO and co-founder of FreeWheel, which is now Comcast’s ad-serving platform.
0:01:43 And I’ve been doing ad tech for a long time. Actually, when I left Comcast, I went back to
0:01:48 “school,” finger quotes, in AI, so online courses, because I wanted to work in something that was
0:01:54 much more broadly applicable and much earlier in its life. And I actually started by working on
0:02:00 reinforcement learning robots to act as minions and artificial teammates at video games.
0:02:05 And that’s how I started learning the guts of it. I’m biting my tongue so I don’t take us off
0:02:09 course, because that sounds fascinating as well. Oh yeah, that could be a separate conversation.
0:02:14 But I’d been working in the gaming world and some of the generative AI abilities for gaming assets
0:02:21 when language models really came out. And something struck us, something very powerful,
0:02:29 which is, and this is a metaphor for the math inside, but AI now understands the ideas,
0:02:34 intents, and needs you may have from what you’re reading, what you’re watching, what you might
0:02:40 ask it outright. And it can go find the right response or take the right responding action.
0:02:43 And everything is presented to you in a very natural human way.
0:02:50 And if you back up a step and think of that happening all the way through a consumer’s use
0:02:54 of the digital world from when they’re searching and becoming aware of things they might need,
0:02:59 when they do some investigation and read up on products or services, when they go to
0:03:04 browse or shop, when they buy, all of those modes change pretty fundamentally. They don’t
0:03:10 replace, we think they get enhanced, because instead of the world of the past, where I maybe
0:03:16 did a search, got some directions and a link, went to a place, read up on something,
0:03:20 browsed for something, saw an ad, maybe went to another place to try to find the version
0:03:27 I want. Those are all sort of separate hops, the internet where it’s the same content everybody
0:03:35 sees. AI instead is going to understand and learn at each moment what it is you need. And as with
0:03:41 most things AI, data is the core: the people who have the most and best data about a product or
0:03:46 service are the brands. They are the retailers and the people who sell it. So they can create
0:03:52 brand agents, which means your experience on the internet at all of those moments in the journey
0:03:57 from first learning about it to figuring out what the right configuration is and comparing and
0:04:03 browsing and buying is going to adapt on the fly through these agents. So it doesn’t replace the
0:04:09 web, but it changes things from you looking at stuff someone wrote to something that’s
0:04:14 partially adapting to what you actually need, understanding your needs. But the agents that
0:04:19 are doing that for you are from the retailers and brands themselves, because it’s their data that
0:04:25 is what you need. And that sort of changes the internet into kind of your internet for both
0:04:33 parties. And that really makes it a much more adaptive form, fitting to what you need, an enhancement
0:04:39 to what’s already there. But that, from a media marketing perspective, changes everything.
0:04:44 Yeah, sure. It’s interesting to hear you talk about that from the media marketing perspective.
0:04:50 We’re recording this in early 2025. Happy New Year everybody. Last year I had, especially towards
0:04:55 the end of the year, had a lot of conversations with people about agents and agentic AI,
0:05:02 a few of them on the podcast. And as a sort of consumer end user of things, I keep thinking about
0:05:07 the, and I like the way that you put it, changing it from the web to your web. I keep
0:05:13 thinking about it from the perspective of, oh, maybe in the future, I won’t have to go seek out
0:05:18 the information I want because my agents will know me and know what to go find and bring it back to
0:05:24 me. But it’s interesting to hear you talk about the agents actually being on, I guess, the advertiser
0:05:30 brand publisher side, because they already have the data about my habits. Well, it’s not that they
0:05:34 have the data about your habits. They have it about their products and services. So let me honestly
0:05:39 correct that. If you look ahead, consumers have, and will have, agents that understand them,
0:05:45 but that’s half the conversation, if you will. Got it. Okay. Those agents know what they need,
0:05:51 perhaps, and what they’re trying to do, but no one’s going to know more about the uses and
0:05:59 values of their sauces or beauty products or furnishings or cars than the brands and the
0:06:06 retailers that sell them. So we see sort of two halves to this. There’s the agents that consumers
0:06:12 may use, which may just be enhancements to the web as is. They may grow from that into something new,
0:06:16 but the folks who can answer their needs and help them not just to learn more, but actually
0:06:21 finish and convert, take the action are the brands and publishers. So for them to be able
0:06:26 to equip their agents, but then send them to all the different places consumers need,
0:06:30 that’s a big change. You’re kind of syndicating the knowledge and expertise
0:06:33 out to where the people need it at their different moments on their journey,
0:06:37 which is wildly different than having to get them all to come to you.
0:06:43 Right. Got it. No, it makes a lot of sense. And so Firsthand is young. When was it founded and
0:06:48 what was kind of, you got into this a little bit, but what was kind of the moment where it came
0:06:53 together and you decided to launch the company? So I was working in generative AI trying to figure
0:06:59 out how to make game-playable assets much easier to produce, and some commerce ideas around that.
0:07:05 When the language models first came out and we understood this change, that the folks who have
0:07:10 the best information about the products and services you might need are going to want to
0:07:14 speak in their own voice through their own agents, but everywhere. Right. And that changes
0:07:19 everything and that the folks who have additional information, the publishers who have the expertise
0:07:26 on the trends or topics in general also want to have a voice in that conversation. That really
0:07:31 changed things again, from the internet to your internet, and that sort of was the spark.
0:07:36 That’s like that’s white space. That is not how any advertising is done today. That’s not how the
0:07:40 technology stacks work. That’s not how the data management works. It’s not how the measurement
0:07:46 works. And from an entrepreneur’s point of view, white space is fantastic because you have to,
0:07:51 first of all, it’s just fun because you have to invent it. Right. But you’re not retrofitting or
0:07:56 dealing with, there’s less legacy to try to deal with. And you can, and at this point,
0:08:00 having done marketing for this many decades, there’s a lot of lessons learned, so now
0:08:04 that I can start with a clean slate, I’m going to do it this way. And then the power in AI is just,
0:08:11 it’s astounding. So that was just very exciting. And it does change retail, commerce, advertising,
0:08:18 martech, customer research. So that’s a lot of playing fields. So, I need co-founders, that
0:08:22 was thought number one. But having done this for a while, I have just very good friends that I’ve
0:08:30 worked with for decades. And Michael Rubenstein, my co-CEO, is someone I’ve known since the late 90s at
0:08:35 DoubleClick, and my CTO co-founder is someone that was instrumental in the engineering at
0:08:40 FreeWheel. You get to work with your friends that are industry experts in white space on
0:08:45 something that is this exciting, not just because of the change in the industry, but,
0:08:50 you know, just a little full disclosure here. The technology is just really fun to understand
0:08:56 and work with. Isn’t it? Yeah. It’s fantastic. Yeah. And then just, if I may, because advertising tech
0:09:02 is just at a massive, massive scale, the inference volumes and speeds and things, just a total
0:09:07 different world. Working with companies like Nvidia on figuring out how to make that actually work
0:09:13 properly for that kind of an industry. I mean, it’s sleeves-rolled-up fun. Right. Yeah. We formed
0:09:24 in August of 2023. So not too too old, but we are 27 people now. We were live in the summer. We’ve
0:09:28 got, you know, we haven’t made case studies public yet, but we’ve got, you know, actual results that
0:09:35 when you actually adapt the content and the websites and the journey, if you will, for consumers
0:09:41 into something that’s fitting the needs for them, the effects are multiples greater than
0:09:44 traditional advertising. I don’t want to harp on the nomenclature here, but I want to ask you a
0:09:50 little bit about the word agent and brand agents in particular. I read a definition of agents
0:09:55 in the intro, but, you know, for every conversation I’ve had with people about agents, there’s been
0:10:01 at least one new kind of working definition about what that might mean. Can you talk about the concept
0:10:06 of a brand agent as Firsthand uses it? And specifically, how should companies be thinking
0:10:12 about brand agents as different from other types of AI agents? So I like to think of it from what
0:10:16 are you trying, what’s it supposed to accomplish? Right. So when I think about agents as I read
0:10:25 about them, most of what I’ve observed is here’s a somewhat autonomous execution engine that will
0:10:30 go create productivity by making something more efficient. That’s a solid definition. I’ll take
0:10:36 it. Yeah. That’s fantastic. That’s a wonderful, useful, make what you already do work better
0:10:41 use of AI. Right. But when you’re talking about how to participate in a brand new medium,
0:10:46 it’s a very different set of use cases because you’re talking about how do I help a prospect
0:10:51 and a consumer through, call it a connected journey from being aware of something to learning about
0:10:55 what you might want, to finding what version you want, to actually getting it. There’s many
0:11:01 different objectives and purposes there. So for us, an agent is actually a composable collection
0:11:06 of AI capabilities. So you might even think of an agent as containing many agents. We use the
0:11:13 right capability. And you would do one with a particular composition for say, awareness raising,
0:11:19 a different one for acquisition, maybe a different one for conversion improvement and upsell. And
0:11:24 what that means is you do a few things. First, underneath the whole product is something called
0:11:31 Lakebed, which is a kind of intellectual property rights data management platform so that agents
0:11:37 in whatever form they take can cooperate on data across parties. And just to clarify, Lakebed is
0:11:41 a Firsthand product. Yeah. It always comes back to data in order for an agent to do whatever its
0:11:47 job is. If it’s happening out, if it’s for a retailer and it’s happening out on a publisher,
0:11:52 it needs to be able to, because it benefits both parties, safely use the information of the
0:11:57 publisher about, if you’re on a website about decorating and decor and home and furnishings,
0:12:03 and a retailer wants to help you with that, then they need to know not only all of their products
0:12:08 and services and offerings, but everything about what you’re looking at and the content of the page,
0:12:12 and then it’s beneficial to both parties, if it’s done with full intellectual property and
0:12:17 privacy control rights, because no one exits an event with data they didn’t enter the event with,
0:12:22 if you will. But it makes the agent capable. And then that agent might help you instead of
0:12:27 seeing what you would traditionally see as an ad, it may say, “Look, are you trying to furnish a new
0:12:32 home? Are you trying to refresh for holiday entertaining? Are you downsizing and looking
0:12:37 to sort of deal with a smaller property?” And it learns, as I said, this is the new medium of
0:12:42 effect. That agent, even though it’s happening, retail agent is happening on the publisher property,
0:12:47 it’s learning about what you need, and it’s presenting the information. And here’s where
0:12:52 the capabilities come into play. It could simply be content presentation, if you want to have open
0:12:56 Q&A, you could add a chat capability. You could close the capabilities that make sense for what
0:13:01 you’re accomplishing. But it learns, and let’s say if someone is furnishing a new home, then it’s
0:13:05 going to present them with like, here’s the five key starter pieces you need, and here’s how to
0:13:11 turn a house into a home quickly with color and accessories. And then this is new medium again.
0:13:15 Once someone engages with that agent and the retailer, that’s the first party relationship
0:13:20 between the retailer and the consumer, which is an arcane element in marketing. But that means that
0:13:25 when that consumer leaves that agent and arrives at the retailer, the retailer is allowed to
0:13:30 understand everything that occurred in that agent out in the wild. So that means when they arrive,
0:13:35 instead of arriving in the past, it would be the landing page everybody sees, the landing page itself
0:13:41 can be another agent that is therefore composing the content of the page to fit the needs of what
0:13:47 brought you there in the first case, and offering up additional help to find the right configuration
0:13:52 that makes sense in this need, etc, etc, which is beneficial to the consumer, but obviously helps
0:13:56 sell more product and is beneficial to the retailer and obviously creating great value
0:14:00 for the publisher and selling that media opportunity. Everyone has to cooperate in a way
0:14:06 that’s safe for each other. And this is why we think we’re seeing results multiples and multiples
0:14:10 better than traditional ads, is because you’re creating a utility for the consumer that happens
0:14:16 to be directly beneficial to the publisher and retailer doing it, or the brand if they’re the
0:14:20 ones who are on the other side of the conversation. Right, right. I’m going to ask you one of those
0:14:24 questions where I think I know what the answer is, and usually those are the ones that are good to
0:14:30 ask because I’m often quite wrong. Can you kind of describe the difference sort of in as plain
0:14:36 terms as you like between what you’re describing and the chatbot experience that consumers may
0:14:41 be having now? So the key is that in different circumstances with different needs, if it’s
0:14:46 on a search engine, if it’s in social, if it’s on a publisher, if it’s on your own retail site or
0:14:51 the manufacturer brand’s home, their site, what you want to do, again, starts with what its intention
0:14:58 is, then we separate the brain of the agent, which is a few pieces from Lakebed. What is it
0:15:03 allowed to know? So a retailer may want it to know only about their house brand products,
0:15:07 or they may want it to know about every product they offer. So that’s sort of a business choice.
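(As a rough illustration of that business choice, the sketch below scopes what a hypothetical agent is allowed to know, using a toy in-memory catalog. Lakebed’s actual interface is not public, so every name here is an assumption, not Firsthand’s API.)

```python
from typing import Iterable

# Toy catalog; a real system would draw from a rights-managed data platform.
CATALOG = [
    {"sku": "sofa-01", "brand": "house", "title": "3-seat sofa"},
    {"sku": "lamp-77", "brand": "acme", "title": "floor lamp"},
]

def allowed_knowledge(catalog: Iterable[dict], scope: str) -> list[dict]:
    """Return only the items the agent is permitted to know about."""
    if scope == "house_brand_only":
        return [item for item in catalog if item["brand"] == "house"]
    if scope == "full_catalog":
        return list(catalog)
    raise ValueError(f"unknown scope: {scope}")

print(allowed_knowledge(CATALOG, "house_brand_only"))  # house-brand SKUs only
```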
0:15:11 Right, right. And then you choose the sets of capabilities you want it to produce.
0:15:17 So it may be simply content, the appropriate content selection and presentation. So it’s
0:15:22 essentially building a mini website, if you will, on the fly to help them, set up with a choose-your-
0:15:28 own-adventure structure. If you want to add chat to that so that they can ask a question you
0:15:32 didn’t interpret appropriately, then that’s just additional capability. And that’s the agent that
0:15:36 may happen out in the wild. You sort of wrap that up as a campaign and send it out there and say,
0:15:41 I want it to happen this many times in these places. And if someone goes through that agent
0:15:46 onto the destination site, so for example, and what we’ve seen so far, folks are, when they’re
0:15:51 looking at an agent, they’re going to the page about the actual thing that that agent talked to
0:15:56 them about multiples on multiples on multiples higher rates than an ad does. And they’re going
0:16:01 to many, many, many of them. So hundreds of different ones, which means you’re getting people
0:16:07 to what they actually need more efficiently. And that’s not a chat experience, it’s more interwoven
0:16:13 into the way people already surf, because that last part of the agent is something we call a
0:16:18 frame. And you can call that its UI. What is the user experience you want to present? Is it image
0:16:24 rich or is it text centric? It’s almost more of an app in a way than an agent, because how it
0:16:30 appears and the way you interact with it is at least as important, frankly, for marketing
0:16:34 as what it’s able to say. Right, no, that makes a lot of sense. And then, frankly, it continues its
0:16:40 help to you through your connected journey. So search from the publisher onto the retailer,
0:16:44 perhaps onto the brand, it has context throughout, which I think, you know, no one likes to start
0:16:50 from scratch again. Again, so an agent is this mix of what is it supposed to know? What is its
0:16:56 objective? Is it content presentation plus open dialogue? Is it actually constructing the page
0:17:02 you’re on while it’s talking to you? So you pick the set of capabilities, then you layer on that UI.
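(A minimal sketch of the composition just described: an agent as a bundle of an objective, a knowledge scope, capabilities, and a frame, i.e. its UI. All field and capability names below are hypothetical illustrations, not Firsthand’s actual building blocks.)

```python
from dataclasses import dataclass, field

@dataclass
class BrandAgent:
    objective: str                      # e.g. awareness, acquisition, conversion
    knowledge_scope: str                # what it is allowed to know
    capabilities: list[str] = field(default_factory=list)
    frame: str = "text_centric"         # the UI: image-rich, text-centric, ...

    def add(self, capability: str) -> "BrandAgent":
        """Compose in one more capability; returns self to allow chaining."""
        self.capabilities.append(capability)
        return self

# A guided-conversation agent for awareness, with optional open chat layered on:
agent = BrandAgent(objective="awareness",
                   knowledge_scope="full_catalog",
                   frame="image_rich")
agent.add("content_presentation").add("guided_conversation").add("chat")
print(agent)
```

Treating the agent as data like this, rather than as a single monolithic chatbot, is one plausible way to get the "agent containing many agents" behavior described earlier.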
0:17:06 But what that does is, that’s changing the internet to your internet, because it’s taking
0:17:12 those discrete hops. I searched for something, I read an article, I saw an ad, I clicked on it,
0:17:16 I went to the site, I started from the beginning, I found the product, and it’s turning that into
0:17:21 one connected journey. Right. It understands and helps me with what I need at each of those steps,
0:17:26 and it’s with me the whole way with full context, because that’s the relationship between me and
0:17:32 the retailer. The location is less important. So you’ve spoken to this, but maybe we can put a
0:17:37 point or you can put kind of multiple points on it. But what are some of the most exciting
0:17:44 areas of potential or even specific problems that brand agents can solve for marketers and kind of,
0:17:49 you know, ways that the retail experience, I mean, you’ve been talking about it with this idea of
0:17:55 composing, you know, your web on the fly to suit what you need. Are there other specific problems
0:17:59 or big things that you’re excited about when it comes to how the experience is going to be
0:18:05 transformed by brand agents? Well, I think the very, very fundamental change is being able to take
0:18:11 the expertise and the knowledge of the retailer or the marketer or the brand and put it in the moment
0:18:17 wherever the consumer is. So taking their ability, putting it into an agent, and sending that
0:18:23 agent everywhere, so you’re helping the consumer at all those moments, is a massive change because
0:18:29 that’s why, instead of trying to sort of find a common denominator, pick one message, and figure
0:18:34 out the targeting about exactly what this one thing should match against, the AI... this is what AI
0:18:39 is designed for. It understands and learns what’s the right thing in this situation, which means
0:18:45 because your internet is just as true for the marketer as it is for the consumer. So it’s taking,
0:18:49 of all the things you could do, this is what is useful to this person here in this circumstance.
0:18:56 So that means instead of having to fine tune and sort of try to hit the bullseye, you can kind of
0:19:01 let this find all the bullseyes. And you can see that evidenced by the fact that it’s sending
0:19:06 people to hundreds of different destinations instead of just the one or two. But then if you
0:19:12 can continue that connected journey when they arrive and have that fit to what they need,
0:19:17 and that covers everything from like food and beauty to finance and auto and all forms of
0:19:21 retail, the odds of them getting to the thing they need and getting the version of it that
0:19:25 they need and being able to ask any follow-on questions about how to actually pick the right
0:19:30 set of it. We don’t have data on this yet, but I cannot wait to do conversion tracking because
0:19:37 I am very excited about that. So I think that is a wonderful opportunity that has never existed
0:19:42 before as opposed to a problem solved. Right. Something you said earlier, the way you described
0:19:49 the current process of you have something that you want to find or you need or you’re decorating
0:19:54 your home and don’t know where to start. And so you do a web search and you get a list of results
0:20:00 and you pick one and you go to it and then you read the page, scan it to find what you’re looking for.
0:20:05 And the idea of that process changing, it makes me think of things I’ve read, and
0:20:13 marketers I’ve talked to, saying that AI may fundamentally change the whole web experience, starting with
0:20:18 breaking SEO. Do you have thoughts on that? So a couple things here. The world has been built
0:20:24 at this point where there is a fairly central place to go ask your question, looking backward.
0:20:33 Right. And that sent traffic out to the places to go read the details where people might run ads
0:20:38 to then get you to go to the place where you can consummate the purchase and learn more about the
0:20:44 purchase. But that hop, hop, hop is completely changing because the knowledge is pushing out
0:20:49 to the consumer at all those different places. They still want to go to publications and read
0:20:54 about the details and what’s the best version and then to understand what’s new and trendy. So
0:21:00 everything still exists. It’s just more adaptive and it’s more sort of distributed out. I think it’s
0:21:06 less centralized trying to get someone at that first question and get them at all the moments
0:21:10 of question throughout all their moments of need on that journey. And if you can make that smooth
0:21:15 for someone and in context throughout, that’s so much better for the consumer. But it is,
0:21:19 it is still an enhancement to the internet. I think the way digital works is great. It just makes it
0:21:23 much more, it’s the internet to your internet. It makes it richer and easier to do stuff.
0:21:30 Yeah. I’m speaking with Jon Heller. Jon is co-CEO and founder of Firsthand. He’s an ad tech
0:21:37 veteran, started Firsthand with friends from his ad tech days. And they are using AI, AI agents
0:21:43 specifically to transform the way that publishers and advertisers connect with consumers, engage
0:21:47 with them, help them throughout their journeys. Everything Jon’s been talking about to this
0:21:53 point. Jon, you’ve been speaking about all the different ways, sort of conceptually and concretely,
0:21:59 that AI, AI agents, the brand agent platform that Firsthand is building can change the way
0:22:04 that consumers, you know, go on their journeys and that advertisers, publishers engage with them.
0:22:11 Are there specific industries, specific experience types? I mean, to me, this sounds like a
0:22:17 transformative thing that’s, you know, far reaching and then some. But are there certain areas that you
0:22:21 think are really, you know, ripe for transformation and you’re particularly excited about?
0:22:27 So one of the reasons this has been so much fun, and starting something that’s such white space
0:22:33 is so enjoyable, is you are learning things that you didn’t understand. I started thinking this is
0:22:39 for high consideration stuff, financial services, automotive, telecom, consumer, things where you
0:22:46 have lots and lots of details to go figure out. Yeah. Nope. I mean, yes, but more, you know, food,
0:22:50 like what dishes would make sense for the holidays and how can I sort of help you figure out the
0:22:57 shopping list for a great Thanksgiving? Or, did you know that these products lead to a balanced,
0:23:04 healthy life? So what we found is as much interest in how this can benefit your lifestyle in that
0:23:09 path as much as how much this product is useful for its features. So the interest has been very
0:23:15 broad and it’s not just been broad from industries, it’s been broad for sort of where in the distribution
0:23:21 chain it fits as much on their own properties as it is distributed out across where they market.
0:23:26 So it’s as much customer as prospect. Right. We build a platform. This is the reason our
0:23:31 agents are very composable and collections of abilities as opposed to just an agent is we
0:23:36 thought that they would have broad application and the breadth surprised me. Yeah. Talk a little bit
0:23:43 about the experience of building brand agents and when, you know, in listening to the past 20 minutes,
0:23:50 I’ve been thinking about LLMs and the chatbot experience and using RAG to, you know, put
0:23:55 my company’s specific product information or what have you in there to let the chatbot find,
0:24:00 you know, the correct information. How is the process of, you know, thinking about and building
0:24:06 and deploying agents and brand agents specifically? What’s that like? Is it similar to building with
0:24:12 an LLM and RAG? Is it radically different? Talk about that if you would. It’s much more nuanced.
0:24:16 And the reason I say that, I’ll use an example. If it’s a retailer with several hundred thousand
0:24:23 SKUs, then the objective they’re trying to solve for is different than someone who sells
0:24:29 like three packages, an energy package, a sports package, and a family package. So that affects
0:24:36 how you retrieve, right? You actually, basically the platform we built is exceptionally composable
0:24:43 for many, many reasons because the thing you’re making the agent solve for affects the way you
0:24:47 embed and tokenize. It affects the way you retrieve. It affects the way you evaluate what you
0:24:54 retrieved and ranked. It affects how you generate and the instructions you give it and then how you
0:24:59 evaluate what was generated to make sure it was doing what you needed it to do. Because one of
0:25:04 the interesting things about marketing is there’s almost as many rules about what not to do as there
0:25:10 are about what to do. So there’s quite a lot of highly flexible control layers around making sure it didn’t do
0:25:14 the thing it wasn’t supposed to do as well as did what you wanted. And all of that needs to be very
0:25:20 composable to the objectives of the parties involved. It’s always a supply chain across
0:25:27 multiple parties. So lots of rights management. And then it’s generating information that’s never
0:25:33 existed before because every agent, because that’s a first party interaction, is not just
0:25:37 generating metrics. How many times does someone engage with it? What did they click on? Did they
0:25:44 go somewhere? Did they click on a citation or call to action or a product image? But it’s recording
0:25:48 the transcripts of what was actually presented. What was the text of what was said? You have a full
0:25:53 dialogue transcript of all these interactions, which is almost as if all your marketing is also
0:26:01 a customer research survey at the same time. And that can be used to learn how to be better.
0:26:06 Yeah, that’s a wealth of data. So it has feedback loops, again, completely IP-rights-managed,
0:26:12 protected feedback loops, so that it’s improving for the person whose data it’s using to improve.
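(To make the earlier "composable to the objective" point concrete: the sketch below wires the embed/retrieve/rank/generate/evaluate stages listed above to the two example objectives given, a retailer with hundreds of thousands of SKUs versus a seller of three packages. Every stage name and option is invented for illustration; this is not Firsthand’s configuration schema.)

```python
# Hypothetical objective-driven pipeline configs: the thing the agent is
# solving for changes every stage, from embedding through evaluation.
PIPELINES = {
    "large_catalog_retailer": {      # hundreds of thousands of SKUs
        "embed": "per_sku_chunks",
        "retrieve": "dense_top_100",
        "rank": "cross_encoder",
        "generate": "short_product_cards",
        "evaluate": ["on_catalog_only", "no_competitor_mentions"],
    },
    "three_package_seller": {        # energy / sports / family packages
        "embed": "whole_package_docs",
        "retrieve": "exact_match",
        "rank": "none",
        "generate": "comparison_table",
        "evaluate": ["claims_match_package_terms"],
    },
}

def build_pipeline(objective: str) -> dict:
    """Look up the stage configuration appropriate to the agent's objective."""
    return PIPELINES[objective]

print(build_pipeline("three_package_seller")["evaluate"])
```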
0:26:18 But those feedback loops make it understand in this context, this configuration for this piece
0:26:24 of the puzzle fits. So I think the key thing is highly configurable composability. And then also,
0:26:30 just frankly, in the enterprise world, we made it so the underlying foundation
0:26:34 models you want to use are a choice, right, you know, you plug in the one you want as a cartridge, partly
0:26:39 because different companies have different desires, or maybe they’ve tuned a model, especially for
0:26:45 their world that they like. But this is going to change so fast, we need to be able to plug cartridge
0:26:50 A out and put improved cartridge B in and let the rest of the framework still operate. Right. So
0:26:56 it is, as I say, the composability and the learning, checking whether it did the right thing and
0:27:02 can it do better, that is the key part. When it comes to learning and checking to see if it did the
0:27:08 right thing, evaluation, and maybe even dealing with hallucinations, is there or are there,
0:27:13 you know, agentic-specific techniques that you’re either using or sort of discovering? Is,
0:27:19 you know, evaluating a response different when you’re working with agents? Or is it kind of the
0:27:24 same? I think the key difference is you have to have lots of different types of evaluations.
0:27:30 And this is, everything’s goal-dependent. So if it’s this type of a marketer with this type of
0:27:35 an objective, if it’s a recirculation agent to increase user time on site for a publisher,
0:27:39 that’s a different case than if it’s a retailer trying to route people to the right product or a
0:27:45 brand trying to get people to understand that these are great recipe shopping lists for this holiday.
0:27:51 So that doesn’t just change what you retrieve and how you generate, but it changes which sets of
0:27:56 evaluations you invoke. So that configurability matters a great deal. And all these feedback
0:28:04 loops are sort of configuration-specific and company-specific, because how you would tune a house
0:28:10 brand, you know, holiday effort is different than what you would do for an everything-I-have-on-my-
0:28:16 shelves, just-get-people-to-the-site type of thing. So I think, again, it’s being able to manage all
0:28:22 that to fit its purpose that is proving really important. Perhaps a naive question here. So
0:28:29 forgive me, but is sort of troubleshooting and running tech support for your customers made
0:28:35 infinitely more complicated because of all these, you know, nuanced specific use cases? Or is it
0:28:40 just kind of a new version of things that people in the ad tech world have been dealing with for years?
0:28:47 Interestingly, we have an entire experimentation module, which means you can run this in sort of
0:28:54 laboratory mode, right, right, watch all of its behaviors before it goes in front of consumers,
0:28:59 where you get yet another important feedback loop, which is, you know, no one is more expert
0:29:04 in the quality of their output and the appropriateness of it than the editors and the marketing
0:29:10 staffs of the brand. So they get to go into the experimentation module, have it run in lab mode
0:29:14 and say, that was great. That was good. That was great. And that creates yet another feedback loop,
0:29:21 so it improves to pass muster. Once you’ve got it to a state where you’re happy with what it does,
0:29:28 that reduces the once-it’s-live issue set. And again, to go back to the capabilities list,
0:29:33 this is how it’s different from chat. If you picked what we call a guided conversation,
0:29:37 which is sort of a choose-your-own-adventure motif, it’s only picking from the content you’ve already
0:29:41 told it’s allowed to know. And it’s simply surfacing the right things, phrased the right way, in the
0:29:46 right moment. And this gets back to how agents are built. If you wanted to add chat to the bottom
0:29:51 of that, then someone can ask whatever they want. That obviously triggers a whole different set of
0:29:57 protections and evaluations to keep that on track. So again, that configurability is quite important.
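(A toy sketch of that capability-dependent configurability: layering open chat onto a guided conversation pulls in a larger set of protections and evaluations than a mode that only surfaces pre-approved content. The check names here are illustrative, not Firsthand’s.)

```python
# Illustrative guardrail sets; composing in open chat triggers extra checks.
BASE_CHECKS = ["stays_on_allowed_content", "brand_tone"]
OPEN_CHAT_CHECKS = BASE_CHECKS + [
    "prompt_injection_filter",   # users can now type anything
    "off_topic_deflection",
    "claim_verification",
]

def checks_for(capabilities: set[str]) -> list[str]:
    """Pick the evaluation suite based on which capabilities are composed in."""
    return OPEN_CHAT_CHECKS if "chat" in capabilities else BASE_CHECKS

print(checks_for({"guided_conversation"}))          # base checks only
print(checks_for({"guided_conversation", "chat"}))  # full protection set
```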
0:30:01 Kind of at a really high level, how do brand agents, how does your technology
0:30:08 change the way that your customers, the brands, the publishers, think about how they want to engage
0:30:14 with customers? So in the past, you would try to get a list of target identities. So you’re
0:30:20 trying to find a list of folks, because that list is a proxy for flags that mean what they care about
0:30:26 and what they’re interested in. And then you would create a message that was appropriate to that list,
0:30:31 and then you got to go find where those folks might be and try to show that message to them.
0:30:36 And that’s when they’re out beyond your borders, if you will. When they arrive at your site,
0:30:40 you don’t know how they got there with any particular detail, and you’re sort of again
0:30:45 starting from scratch. So what we would say is you have a great set of products and services,
0:30:51 figure out how you want to talk to your different typical customers as opposed to a specific target
0:30:59 list, and then let the AI do what it does. Put it in all the places where they might be
0:31:04 beyond your property and on your property, and then it will surface to them the things that
0:31:09 matter to them, and they’re going to end up coming to you already having indicated what’s
0:31:14 important and being qualified. So you receive them. So here’s a big change. You should have agents
0:31:21 present when they arrive that receive them with that knowledge and start from step five, not step
0:31:29 zero. And that changes how you think about campaigns, how you construct them. Interestingly,
0:31:35 this is a bit arcane for ad tech, it’s vastly simpler operationally to set this up than to try
0:31:42 to write like a hundred-line-item insertion order manually. This is a classic AI question:
0:31:48 do I build a massive if-then-else statement, an IO with hundreds of line items,
0:31:55 or do I have an agent able to figure it out? Yeah. So what’s next? What’s next for Firsthand?
0:32:00 What’s the future of AI agents like? Take it wherever you like. How do you envision the technology,
0:32:06 the use of it, the implications kind of evolving over the next, I’m going to say three to five years,
0:32:13 but you can tweak that to suit as you like. Well, I think the ability for the internet to become
0:32:18 your internet, as a new medium, has a lot of implications. So I think
0:32:23 where I see agents is there’ll start to be more wisdom around this is the right configuration
0:32:27 at this point in the customer journey. This is a productive configuration, these abilities,
0:32:32 it’s going to be this guided conversation with presentation, you know, when they arrive,
0:32:36 I want page composition, but only in these portions of the site. So I think the very
0:32:43 construction of how marketing campaigns and sites are put together changes. And when it
0:32:47 becomes this connected journey where I’ve understood the consumer all along the way,
0:32:54 and I’ve helped them all along the way, and I have that data, I think that data starts to drive
0:32:59 a whole lot of additional insight and therefore evolution. Obviously, it feedback loops into
0:33:04 making the models better, but if you can ask your marketing data, you know, what are people asking
0:33:08 me for that I don’t sell? Why didn’t they like the new product launch? Why did they like the
0:33:12 new product launch? You look at all the things and say, what were the phrasings and offerings
0:33:16 that got a lot of engagement? What got no engagement? All of your marketing is customer
0:33:22 research at the same time. So you’re learning at census level what to tune next. That’s just
0:33:28 never existed before. Yeah, interesting times ahead, to say the least. Jon Heller, for folks
0:33:34 listening who would like to learn more about Firsthand, obviously the website firsthand.ai.
0:33:39 Other places they can go online, social media accounts, a blog. Where would you direct them?
0:33:44 Well, we have a LinkedIn presence, but the website is probably the best place to start.
0:33:48 Fantastic. Thank you again so much for joining the podcast. Fascinating conversation, obviously,
0:33:53 with implications for retail and brands and ad tech, but just thinking about the future of
0:33:58 the web in the age of AI and now agentic AI. It’s, I mean, just to echo what you said,
0:34:02 it must be a blast getting to build all this stuff from scratch.
0:34:08 It’s a rare opportunity to be able to work in a world that’s changing so much with something so
0:34:15 powerful and with partner companies that are just so capable and fantastic. And it is just fun.
0:34:19 Excellent. Well, best of luck to you and look forward to keeping track of your progress and
0:34:35 maybe catching up again down the road. Alrighty. Thank you so much.
0:34:45 [MUSIC]
With AI agents, organizations can reshape the landscape in retail and beyond. In this episode of the NVIDIA AI Podcast, Jon Heller of Firsthand discusses how AI Brand Agents are transforming online shopping and digital marketing by personalizing customer journeys and turning marketing interactions into valuable research data.
-
How SDSC Uses AI to Transform Surgical Training and Practice – Episode 241
AI transcript
0:00:11 [MUSIC]
0:00:14 Hello, and welcome to the NVIDIA AI podcast.
0:00:16 I’m your host, Noah Kravitz.
0:00:20 Our guest today is a machine learning and AI leader who’s worked on projects ranging from
0:00:26 sustainability and reforestation efforts to creating a virtual clothes swap platform for
0:00:30 environmentally friendly fashion, and more. But for the past year and a half or so, she’s been
0:00:35 serving as director of machine learning at the non-profit Surgical Data Science Collective,
0:00:40 where she leads research focused on utilizing video data from surgeries to develop tools
0:00:45 that can provide surgeons with immediate feedback and insights on their performance.
0:00:50 She recently gave a TEDx talk titled “Why You Want AI to Watch Your Surgery,”
0:00:54 which I encourage you all to go check out on YouTube after you listen to our conversation.
0:00:59 Because she’s here right now to talk with us about the potential for AI to help surgeons bring
0:01:05 better health care to everyone. Margaux Masson-Forsythe, welcome, and thank you so much for joining
0:01:10 the NVIDIA AI podcast. Hi, Noah. Thanks for having me. Margaux, maybe we can start
0:01:15 with a little bit about your background. I alluded to it just a little bit in the intro. You’ve
0:01:20 worked on various projects after studying machine learning and leveraging your skills
0:01:24 and experience for a lot of AI for what we’d call AI for good projects, but really just
0:01:28 in my view, projects that are helping improve quality of life for everyone.
0:01:33 So maybe you can detail that a little bit and then kind of bring us up to the present
0:01:37 with how you got involved with the Surgical Data Science Collective.
0:01:42 Yeah, sure. So like you said, I’ve been working in the AI field for quite some time.
0:01:48 My first field of study was computer science, so I’ve done enough software development,
0:01:54 and then I realized that I wanted to do a bit more scientific projects. So I went back
0:02:00 to finish my master’s and specialized in computer vision. Computer vision is something that I really,
0:02:07 really love because I’m a very visual person. And so analyzing images and videos is something
0:02:14 that I’m pretty passionate about, I would say. And from that, I’ve worked on many different
0:02:21 projects with very big video and image files. So like you said, I’ve worked on several projects
0:02:28 that go from analyzing lumber scans, for example, to satellite imagery to detect
0:02:36 deforestation, or now surgical videos. So it’s been quite a ride for sure. And I’ve learned
0:02:44 a lot, mostly on how to productize AI and how we can use AI to make an impact in the world and
0:02:50 have a focus on is it actually going to be useful, you know, and make a difference at some point.
0:02:53 So that’s kind of how I see my career so far.
0:02:59 Have you found that the projects you’ve been drawn to, has it been kind of being drawn to the
0:03:05 next sort of technical or scientific challenge, kind of pursuing the craft and kind of pushing
0:03:10 the boundaries of what computer vision and image and video analysis can do? Or have you been driven
0:03:15 more by the mission of these different projects, or has it just kind of worked out that you’ve sort
0:03:18 of been able to follow both of your passions, so to speak?
0:03:23 I would say both, actually, yes. Luckily, always.
0:03:31 I mean, I’ve always been passionate about different sciences. So I actually have a
0:03:37 hard time focusing on only one thing or one project. And when I learned about climate tech,
0:03:43 for example, I really wanted to see how I could help and how AI could help process all of this
0:03:48 giant satellite imagery, you know, remote sensing data is really hard to process.
0:03:55 And then I learned about surgical videos and I learned that there were thousands of terabytes
0:04:01 of surgical videos that were not used. And it’s a pretty big challenge because surgical videos are
0:04:06 really heavy, really long, you can imagine an eight-hour procedure. No one really wants to watch
0:04:14 those videos. So that’s when I was thinking that AI is indeed the perfect tool for this kind of data
0:04:21 that is really big and really long, but also has a temporal element to it, which is quite difficult.
0:04:28 And when I learned about that, it was a really interesting impactful project, but also in terms
0:04:32 of technical challenges, I thought it was really interesting. And that’s actually what made me
0:04:37 join SDSC in the first place. So how did you find out about, what drew you into, the
0:04:44 trove of surgical videos? I started to work at the Surgical Data Science Collective, SDSC,
0:04:50 when I met the founder, who is a pediatric neurosurgeon, Dr. Donoho. And he introduced me
0:04:56 to this issue, you know, he was telling me I have all these videos saved in drives and I don’t do
0:05:02 anything with them. And I know many of my coworkers and friends and other surgeons have drives with
0:05:07 hours and hours of surgical videos, and they are just sitting on the desk, not really doing anything
0:05:13 with them. Right. Just to set context for the audience, for me too. Is it standard procedure that
0:05:19 surgeries are just all videoed with these cameras that are inside of people, sort of, you know,
0:05:25 guiding the surgeons? Or are they sort of operating room overhead cameras? Or how does this, how does
0:05:32 it work, surgical video? That’s a great question. It’s pretty diverse. We have a lot of endoscopic
0:05:37 videos and microscopic videos. So endoscopic will go, for example, through the nose or any
0:05:43 other part of the body where you need to see inside. And actually, the endoscopic videos are
0:05:50 a really good data point for us because we see the computer vision algorithm sees what the surgeon
0:05:55 sees. Right. And that is, you know, golden because there is a lot of information in these videos.
0:06:00 For the microscopic videos, it is also used by the surgeons sometimes when they do the surgery to
0:06:06 really magnify what they’re looking at. For example, if they’re operating on very small arteries,
0:06:12 they need to have this big intense zoom. That’s what they’re going to be using. Often it’s in 3D
0:06:18 for them, and we have some of those 3D videos as well. So yeah, it’s a mix of microscopic surgical
0:06:25 videos and endoscopic videos. And we don’t yet have the video, you know, kind of like security camera
0:06:30 footage from the OR, but some of our collaborators use this kind of video to get a sense of what is
0:06:38 happening in the operating room. Right. And so you met the founder. So the surgical data science
0:06:43 collective was already in existence and you met the founder and got involved? Yes. It was pretty
0:06:50 early on. So the surgical data science collective is a non-profit organization that was started by
0:06:58 Dr. Daniel Donoho, and the main mission and the main idea was to create and analyze a repository of
0:07:05 surgical videos in order to improve surgical techniques and patient outcomes. Because still
0:07:13 today, there are 5 billion people who lack access to safe surgery and there are at least 4.2 million
0:07:21 people around the world who die within 30 days of surgery. So if we consider surgery as a disease,
0:07:27 it would be the third leading cause of death. That’s why, you know, the goal of SDSC is to
0:07:34 utilize these surgical videos to identify best practices, support medical education,
0:07:39 or even predict potential outcomes and complications in advance of surgery.
0:07:45 Right. So we’ve had other healthcare practitioners and people from the health industry on the podcast
0:07:51 talking about, one that comes to mind is talking about analyzing still images, you know, x-rays
0:07:57 and scans, and using AI to discover things, often kind of, as you said, at that super zoomed-in level,
0:08:02 you know, cancer prediction and that kind of thing. Tell us about some of the opportunities
0:08:08 and challenges involved with analyzing all of the surgical video. I would assume, you know,
0:08:12 the first challenge is just gathering the data and then processing all of it. But
0:08:17 kind of take us through it. What is it that, you know, you said improving best practices and real
0:08:22 time feedback. So maybe you can speak to that as well. Yes. So the first challenge, like you said,
0:08:27 is actually to gather all of this data. And like I said earlier, a lot of these surgical
0:08:33 videos are stored on drives and it’s really difficult to get access to these drives. Sometimes,
0:08:38 you know, you kind of have to go and fly somewhere and meet with the surgeons to be able to get the
0:08:44 videos. Most of the time, the videos are not even recorded because people don’t know what
0:08:49 they can be used for. So why would they record them? So actually, one of the biggest challenges
0:08:55 is asking people to press the record button. Right. I mean, I’m laughing, but I’m imagining,
0:08:59 you know, if I was a surgeon, that’d probably be the last thing on my mind, right? So yeah.
0:09:05 Exactly. Yeah. I mean, that is definitely not the priority. And then if they think about recording,
0:09:12 so pressing this button, they have to export the video from the device, walk around with a USB key,
0:09:18 upload the videos on a laptop, upload to the cloud. So there are so many steps here for these people
0:09:24 who are extremely busy, who have so many other important things to do during their day. It’s definitely
0:09:29 not a priority. So that is our first challenge, and that has been one of the biggest challenges
0:09:35 that we’ve had. But we’ve been pretty successful in gathering at least a good first base of a
0:09:42 surgical video library. By now, we have about 40 terabytes of surgical videos. Okay. And we expect
0:09:50 to get more, you know. And the other challenge here is to get diverse surgical videos. Obviously,
0:09:57 for an AI model, we don’t want videos from just one surgeon in one hospital doing the same
0:10:01 procedure. Hundreds of hours of tonsillectomies is only going to get you so far, I’d imagine.
0:10:09 Yes, exactly. So that is the other challenge is how do we get these videos from diverse sources
0:10:16 and diverse fields, which is also a lot of networking, because you have to go and talk
0:10:21 to the people and ask them to record and then do they want to work with us so that we can start
0:10:27 gathering these videos. So that is the second part of this challenge of data collection.
0:10:36 But in terms of the other challenges, obviously, surgical videos are quite long, but they are also
0:10:43 temporal. They’re videos, right? So it is a different type of model than you would use for still images.
0:10:47 We have kind of the same architectures that you would use for other models,
0:10:52 but we always have to think about the temporality of what is happening in the video, and that is
0:10:58 actually how we implement most of our models. Let’s say you’re trying to track surgical
0:11:03 tools. You know, you have to think about all of the challenges that come with that in surgical
0:11:10 videos, which are going to be obstructions, and sometimes you have, you know, an explosion of blood
0:11:16 or something like that. And you want to be able to track the tools without losing them while
0:11:21 dealing with these problems, which are pretty similar to other computer vision problems.
0:11:28 But it is slightly more challenging because of how messy these environments are.
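To make the “CNN plus a temporal architecture” idea concrete, here is a minimal sketch of a tool tracker that pools per-frame CNN features through a recurrent layer, so evidence from neighboring frames can carry a tool through a frame where it is momentarily hidden. SDSC’s actual models are not public; the class name, backbone choice, and every hyperparameter below are illustrative assumptions, not their implementation.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ToolTracker(nn.Module):
    """Per-frame CNN features fed through a GRU so that temporal context
    can carry a tool through brief occlusions (blood, another instrument)."""
    def __init__(self, num_tools: int = 7, hidden: int = 256):
        super().__init__()
        backbone = models.resnet18(weights=None)   # spatial feature extractor
        backbone.fc = nn.Identity()                # keep the 512-dim features
        self.cnn = backbone
        self.temporal = nn.GRU(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_tools)   # per-frame tool presence

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = clip.shape                 # (batch, time, 3, H, W)
        feats = self.cnn(clip.reshape(b * t, c, h, w)).reshape(b, t, -1)
        seq, _ = self.temporal(feats)              # mix information across time
        return self.head(seq)                      # (batch, time, num_tools)

model = ToolTracker()
dummy = torch.randn(2, 16, 3, 224, 224)            # two clips of 16 frames
print(model(dummy).shape)                          # torch.Size([2, 16, 7])
```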
0:11:35 Sure, I can only imagine. And so from a technical perspective, you know, obviously there are,
0:11:40 we’re hearing all the time these days about video models in the news kind of being opened up for
0:11:44 you know, consumer use, that kind of thing. You’ve been working with the Data Science Collective
0:11:51 for going on two years now. Is that right? Yes. So are you building tools
0:11:58 yourself? Are you using off-the-shelf ones and kind of modifying them to suit? Do you have partnerships
0:12:03 with other AI labs? How are you kind of fine tuning the tools to get what you need out of them?
0:12:11 A mix of all of what you just said. So, you know, we’re a pretty small team and
0:12:17 a nonprofit organization, so we will try to use the most efficient methods for us. A lot of the
0:12:23 time we’ll be reusing existing architectures and then fine-tuning them to our
0:12:28 needs. Combining some architectures is something that we’ve done a lot, especially with the temporal
0:12:35 model. So having, you know, a mix of a CNN and a temporal architecture. Or we’ve been playing
0:12:41 with vision transformers, and more recently with vision-text transformers, which are the big models
0:12:49 you’re talking about here. And we will always be careful about new technologies.
0:12:54 So we want to try them and we want to make sure that we stay on, you know, on top of the innovation
0:12:59 that is happening to see if we can apply it to the surgical data science field. Something that is
0:13:04 quite interesting and challenging with this kind of data is that it requires a lot of expertise
0:13:11 that as computer scientists, engineers, we don’t have. So we need to work very closely
0:13:16 with clinicians and surgical experts. And that’s where the most important part of the work is
0:13:24 happening, actually, not even the model architecture or the new cool AI tools. For us, it’s really
0:13:31 understanding what the expertise is and then what model should apply to bring that information that
0:13:37 will be useful to the surgeons. Can you maybe walk us through an example of, and correct me if I’m
0:13:42 wrong here, but I imagine you have partnerships with surgeons and other medical professionals
0:13:50 and institutions and are you sending them images or videos to just kind of analyze and let you know
0:13:55 what they see or kind of what’s the, I guess I’m wondering what the process is or what it’s like
0:14:02 getting from footage, and people reviewing the footage, to then an outcome that other practitioners
0:14:07 can benefit from, whether it’s, you know, a new technique or refining a best practice or something
0:14:14 like that. So for most of our collaborations, we will work with clinicians and surgeons who have
0:14:19 videos, but they don’t have the computer science knowledge. So they will come to us to do all of
0:14:26 the computer vision and analysis. So when they start working with us on these projects, maybe I
0:14:33 can go through a concrete example. We’ve been working with several NGOs who are focused on
0:14:41 surgical education. One of them is called All Safe and they focus on teaching surgical procedures to
0:14:47 several students all across low income countries and they do it through a digital platform. So
0:14:53 it’s online courses and then the review is done through videos. And so what we’re trying to see
0:14:59 here is can we analyze these videos and give feedback to the students with computer vision
0:15:05 and that is useful. So that is the important point: that it is useful. So then we will work
0:15:10 on developing the computer vision models to extract the features that need to be
0:15:16 extracted to do the analysis and then collaborate with the clinicians and the surgeons on what
0:15:22 exactly do they need to have in the feedback or what do they believe is something we should focus
0:15:28 on because most of the time, you know, we’re going to look at something and I’m going to think with
0:15:32 my engineer mind. Oh, I’m going to look at this feature and I’m going to make this graph and
0:15:39 it’s going to be amazing. And then I show the surgeons and they’re like, what? So that’s why
0:15:46 it’s called the Surgical Data Science Collective, because the first step is creating a community
0:15:53 with the clinicians and the computer science experts. And we also have some collaborations
0:16:00 with computer science groups, where we will work with them to analyze some of the videos we
0:16:06 have. So that is almost a connection: we have surgeons who want to do something very specific,
0:16:12 and we have computer scientists collaborators who can help us do that specific task that maybe we
0:16:18 don’t have bandwidth for. So we are trying to expand that part of our community as well to really,
0:16:24 you know, have a real impact and scale that, because it’s only going to work as a whole community
0:16:32 effort. I’m speaking with Margaux Masson-Forsythe. Margaux is the director of machine learning at
0:16:39 the Surgical Data Science Collective, a nonprofit that is using AI machine learning tools to analyze
0:16:45 video data from surgeries to develop tools and feedback loops and other mechanisms that can
0:16:51 help surgeons with insights and feedback on their procedures and techniques and really just
0:16:56 bring better health care to more people across the globe. As Margaux was just talking about,
0:17:02 you mentioned, you know, being a nonprofit doing AI research is a little bit unusual right now.
0:17:06 What is that like? Well, I mean, I would imagine
0:17:11 resources is an issue, as it is for almost all nonprofits. But
0:17:17 are there things specific to being a nonprofit AI kind of research group
0:17:24 that stick out to you? So there are many interesting aspects that come from being a nonprofit. Like you
0:17:30 said, resources are indeed limited. So we have to be creative in the way we train computer vision
0:17:36 models. We will always start simple, which is actually something I’ve always done and advocated
0:17:41 for is if you want to start a computer vision project, maybe you don’t need to start with the
0:17:47 biggest model that exists. You know, start simple with a small data set, do a proof of concept,
0:17:54 and then iterate. So that is what we have as a development pipeline and research pipeline
0:18:02 process. We will always start simple and small and then scale. And that is limited because of
0:18:07 our resources, obviously. But to me, it’s something that is actually good. I would
0:18:13 do the same if I had, you know, 10x the budget. I would probably do the same, but it helps in that way.
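In that “start simple” spirit, a proof of concept can be as small as taking a cheap pretrained backbone and fine-tuning only its classification head on a handful of labeled frames before reaching for anything bigger. The dataset path and training budget below are placeholders; this is a sketch of the workflow, not SDSC’s pipeline.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train = datasets.ImageFolder("frames/train", transform=tfm)  # hypothetical labeled frames
loader = DataLoader(train, batch_size=8, shuffle=True)

net = models.mobilenet_v3_small(weights="DEFAULT")           # small, cheap backbone
net.classifier[-1] = nn.Linear(net.classifier[-1].in_features, len(train.classes))

opt = torch.optim.AdamW(net.classifier.parameters(), lr=1e-3)  # train the head only
loss_fn = nn.CrossEntropyLoss()
for epoch in range(3):                                       # tiny budget on purpose
    for x, y in loader:
        opt.zero_grad()
        loss_fn(net(x), y).backward()
        opt.step()
```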
0:18:20 And then it brings a lot of different projects. Being a nonprofit, we’re able to work on a project
0:18:25 that maybe we wouldn’t be able to work on if we were a for-profit. For sure, it would
0:18:31 actually be completely different. And that’s why I really wanted to give SDSC a shot when I first
0:18:37 met Dr. Donoho, because I was curious about it. I was like, how are we gonna do that? You know,
0:18:43 I’ve never done AI. I’ve never seen AI done in a nonprofit. There are some others, but it’s
0:18:49 really research focused and community focused, which you wouldn’t really be able to do as well
0:18:55 in a for-profit, I believe. I’m gonna ask this, and answer, please, based on what’s happened so far
0:19:00 and/or, you know, what you see coming in the near future. What are some of the big benefits
0:19:06 for clinicians, for patients that you’ve seen or expect to see from, you know, not just the work
0:19:11 that you’re doing at the collective, but more broadly leveraging AI to help with the surgical
0:19:17 process? The AI field will bring a lot of new and good things to the medical field, I believe.
0:19:25 In the surgical space, which is what I’ve been exposed to mostly, it will bring a lot of standardization.
0:19:31 I believe something I’ve discovered working in that field is that every surgeon and every hospital
0:19:37 will perform surgeries and procedures in different ways, and no one really knows
0:19:45 the ABCs of how you’re supposed to do a specific procedure. So by having a tool here, AI, to first
0:19:50 encourage people to collect the data. So first, we’re gonna get the surgical videos, we’re gonna
0:19:56 finally start looking at these videos that are not being looked at, and then share them between
0:20:02 surgeons all across the globe. That will bring a lot of standardization, or at least they will
0:20:07 start to talk to each other, which I think is kind of beautiful, because right now, you don’t really
0:20:13 have a good way to talk to each other. And through the surgical videos, the hope is that they will
0:20:18 start to talk to each other. And you can imagine so many applications, for example, the one
0:20:25 that always comes back is education. Instead of using a medical textbook with drawings, the students
0:20:30 can watch a tutorial on how to do the specific procedure. So that’s a big difference that I
0:20:37 think will change a lot of things. And then being able to find best practices through this
0:20:44 analysis of surgical videos is gonna be pretty interesting, because who knows what is in these
0:20:51 videos. And there’s so much that has to be discovered, and there is a big need to be creative
0:20:56 when we think about this data, because no one has ever looked at this data, and no one has ever
0:21:02 really thought about what can we do with all of that, and what is my question that I want to be
0:21:08 answered. And that’s one of our challenges, actually, is sometimes we ask surgeons, “Oh,
0:21:13 what do you want to answer through all of these videos that you have?” And they don’t really know,
0:21:18 because they haven’t had this option before. Right. That’s interesting. It makes me think of
0:21:24 the examples I mentioned earlier of MRI and scan analysis and cardiac care and that
0:21:32 kind of thing. And I’m thinking about the AI tools being able to help practitioners find differences
0:21:38 in cells on a very, very sort of nano basis, right? But even with that, I’m thinking, “Oh,
0:21:42 well, they know what they’re looking for.” Or even if it’s they’re looking for an anomaly,
0:21:48 it’s still kind of we know what we’re looking for. But yeah, with surgery, my very kind of naive,
0:21:53 not knowing much about the field coming into this conversation, thinking, “Oh, well, video footage
0:22:00 is being used to train AI systems. Are we moving towards better education for humans or even training
0:22:04 robotic surgical algorithms or that kind of thing?” But it’s fascinating to hear, and it
0:22:10 makes sense to me as a non-surgeon to ask: what would they be looking for? It’s not the same as looking
0:22:16 for an anomaly in a cell that might stick out. I think at the beginning, probably, when
0:22:23 they first started to analyze MRIs with AI, they also had to be creative, because someone had to be
0:22:28 asking these questions first. And for surgical videos, one of the first steps would be to look
0:22:33 at anomalies, which is actually what we’re trying to do now: what are the outliers? Who is using this tool
0:22:38 and no one else is using it for the same procedure? So we are kind of starting with the low-hanging
0:22:46 fruit, I guess, but the deeper existential questions are not there yet. And I’m really excited to work
0:22:51 with the clinicians to help them come up with these questions by showing them the data because
0:22:56 no one else is going to come up with these questions. It has to be the people who are working
0:23:02 every day in the OR. And actually, the videos are a really great source of data, but there is so much
0:23:07 more going on. Obviously, there is the patient data, there’s the patient outcomes, there is
0:23:13 everything that is going on in the operating room. And all our engineers have actually been
0:23:19 in the operating room so that they understand what is happening behind that camera. And I’ve
0:23:25 been in the operating room a couple of times now. And it’s really helped me understand better
0:23:29 what is happening. And sometimes when we have a new procedure type that we’re exposed to,
0:23:35 I go to the OR because I want to understand better like, oh, some random questions sometimes
0:23:40 like, where are you? Where is it in the body? Or how many people are operating? Because sometimes
0:23:45 you have more than one surgeon. It’s just so many things that you don’t capture in the video,
0:23:48 but there’s still obviously a lot of information in the videos.
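As a toy illustration of the outlier question raised above (who uses a tool that nobody else uses for the same procedure?), one could run a standard anomaly detector over per-video tool-usage counts. The features and numbers here are fabricated for the example; SDSC’s real analysis is not public.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Fabricated per-video counts of how often each of three tools appears.
usage = rng.poisson(lam=[5.0, 2.0, 0.3], size=(200, 3)).astype(float)
usage[7] = [0, 0, 9]                      # one surgeon leaning on an unusual tool

detector = IsolationForest(contamination=0.02, random_state=0).fit(usage)
flags = detector.predict(usage)           # -1 marks suspected outliers
print(np.where(flags == -1)[0])           # video 7 should show up here
```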
0:23:55 Fantastic. For listeners who would like to learn more, or hopefully perhaps there are even
0:24:00 some surgeons, some clinicians listening who are thinking, oh, I have surgical video that’s, you know,
0:24:05 on a shelf on a drive somewhere, maybe I can send it in and help the cause. Where can listeners go
0:24:09 to find out more about the work that the Surgical Data Science Collective is doing,
0:24:14 the work that you are doing, perhaps to get involved as a partner? Who knows? Where can listeners go
0:24:21 to learn more? So we have our website is thesurgicalvideo.io and we can also find us on social media
0:24:28 at Surgical Data Science Collective. And I would also encourage if anyone is a computer engineer,
0:24:32 computer scientist who wants to work on a project different from what they’ve been working on
0:24:37 and is interested in surgical AI to also reach out to us, because we are working with quite a lot
0:24:42 of different partners on this. So anyone who is interested should reach out to us.
0:24:47 Fantastic. Well, Margot, thank you so much for taking the time to stop by, join the podcast
0:24:52 and talk a lot about the work you’re doing. It’s, I don’t know, stories like this, where the technical
0:24:57 aspects kind of match up with the societal impact. I think they’re just fantastic stories.
0:25:01 There’s sort of something for everybody, right? And it sounds like you’re finding a really interesting
0:25:06 path to fuse your technical interests with making an impact in your own work. So congratulations
0:25:11 and all the best to you and all of your partners and cohorts at the collective.
0:25:15 Well, thank you, Noah. Thanks for having me on the podcast. I really enjoyed the conversation.
0:25:17 Me too, our pleasure.
0:25:21 [Music]
Margaux Masson-Forsythe, director of machine learning at the Surgical Data Science Collective (SDSC), discusses how AI-driven video analysis is transforming surgical training and practice, making surgery safer and more accessible to billions of people worldwide.
-
NVIDIA’s Ming-Yu Liu on How World Foundation Models Will Advance Physical AI – Episode 240
AI transcript
0:00:10 [MUSIC]
0:00:14 Hello, and welcome to the NVIDIA AI podcast.
0:00:16 I’m your host, Noah Kravitz.
0:00:21 NVIDIA CEO Jensen Huang recently keynoted CES, the Consumer
0:00:24 Electronics Show, in Las Vegas, Nevada.
0:00:28 Amongst the many exciting announcements Jensen talked about
0:00:30 was NVIDIA Cosmos.
0:00:34 Cosmos is a development platform for world foundation models,
0:00:38 which I think we’re all going to be talking a lot about in the coming months and years.
0:00:40 What is a world foundation model?
0:00:44 Well, thankfully, we’ve got an expert here to tell us all about it.
0:00:47 Ming-Yu Liu is vice president of research at NVIDIA.
0:00:49 He’s also an IEEE fellow.
0:00:53 And he’s here to tell us all about world foundation models,
0:00:57 how they work, what they mean, and why we should care about them going forward.
0:01:03 So without further ado, Ming-Yu, thank you so much for joining the NVIDIA AI podcast, and welcome.
0:01:04 It’s great to be here.
0:01:07 So let’s start with the basics, if you would.
0:01:09 What is a world foundation model?
0:01:09 Sure.
0:01:16 So world foundation models are deep learning-based space-time visual simulators
0:01:19 that can help us look into the future.
0:01:21 It can simulate physics.
0:01:25 It can simulate people’s intentions and activities.
0:01:30 It’s like the imagination engine of AI, imagining many different environments,
0:01:32 and it can simulate the future.
0:01:35 So we can make good decisions based on the simulation.
0:01:38 We can leverage world foundation models’ imagination
0:01:43 and simulation capabilities to help train physical AI agents.
0:01:46 We can also leverage this capability to help the agent
0:01:49 make good decisions during inference time.
0:01:54 You can generate a virtual world based on text prompts, image prompts, video prompts,
0:01:57 action prompts, and their combinations.
0:02:03 So we call it a world foundation model because it can generate many different worlds
0:02:08 and also because it can be customized to different physical AI setups
0:02:10 to become a customized world model.
0:02:14 So different physical AI systems have different numbers of cameras in different locations.
0:02:18 So we want the world foundation model to be customizable for different physical AI setups
0:02:20 so they can use it in their settings.
0:02:27 So I want to ask you kind of how a world model is similar or different to an LLM
0:02:28 and other types of models.
0:02:32 But I think first I want to back up a step and ask you,
0:02:38 how is a world model similar or different to a model that generates video?
0:02:40 Because my understanding, and please correct me when I’m wrong,
0:02:45 my understanding is that you can prompt a world model to generate a video,
0:02:49 but that video is generated based on the things you were talking about,
0:02:54 based on understanding of physics and other things in the physical world,
0:02:55 and it’s a different process.
0:03:00 So I don’t know what the best way is to kind of unpack it for the listeners,
0:03:02 but one place to start might be,
0:03:07 how does a world model differentiate from an LLM or a generative AI video model?
0:03:07 So a world model is different from an LLM in the sense that an LLM is focused on generating text descriptions.
0:03:17 It generates understanding.
0:03:20 A world model is generating simulation,
0:03:26 and the most common form of simulation is videos, so they are generating pixels.
0:03:29 And so world models and video foundation models, they are related,
0:03:34 and video foundation model is a general model that generates videos.
0:03:39 It can be for creative use cases, it can be for other use cases.
0:03:45 In world models, we are focusing on this aspect of video generation.
0:03:50 Based on your current observation and the intention of the actors in your world,
0:03:52 you roll out the future.
0:03:55 So they are related, but with a different focus.
0:03:56 Gotcha, thank you.
0:03:58 So why do we need the world models?
0:04:01 I mean, I think I know part of the answer to the question,
0:04:05 we’re talking about simulating physical AI and all of these amazing things,
0:04:10 but tell us about the need for world foundation models from your perspective.
0:04:16 So I think world foundation models is important to physical AI developers.
0:04:21 Physical AI systems are AI deployed in the real world.
0:04:25 And different from digital AI, these physical AI systems
0:04:29 interact with the environment and can create damage.
0:04:32 So the harm could be real.
0:04:37 Right, so a physical AI system might be controlling a robotic arm
0:04:41 or some other piece of equipment changing the physical world.
0:04:45 I think there are three major use cases for physical AI.
0:04:47 It’s all around simulation.
0:04:51 The first one is, when you train a physical AI system,
0:04:54 you train a deep learning model, and you have thousands of checkpoints.
0:04:57 How do you know which one you want to deploy?
0:05:00 And if you deploy each one individually, it’s going to be very time consuming,
0:05:03 and if the policy is bad, you’re going to damage your kitchen.
0:05:09 So with a world model, you can do verification in the simulation.
0:05:15 So you can quickly test out this policy in many, many different kitchens.
0:05:19 And before you deploy in the real kitchen,
0:05:24 and after this verification step, you may narrow it down to three checkpoints,
0:05:26 and then you do the real deployments.
0:05:31 So you can have an easier life deploying your physical AI.
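The verification loop described here can be pictured as a simple rank-and-prune over checkpoints. In the sketch below the scoring function is a placeholder, as are the checkpoint names; in practice each score would come from rolling the candidate policy forward in many world-model-generated kitchens and measuring task success.

```python
import random

def simulated_score(checkpoint: str, n_episodes: int = 20) -> float:
    """Placeholder: a real implementation would load the checkpoint, roll it
    out in world-model-simulated environments, and average a task metric."""
    random.seed(hash(checkpoint) % 2**32)
    return sum(random.random() for _ in range(n_episodes)) / n_episodes

checkpoints = [f"policy_step_{i}.pt" for i in range(0, 10000, 500)]  # hypothetical names
ranked = sorted(checkpoints, key=simulated_score, reverse=True)
finalists = ranked[:3]                    # only these go to the real kitchen
print(finalists)
```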
0:05:35 It reminds me of when we’ve had podcasts about drug discovery,
0:05:40 and the guests talking about the ability to simulate experiments
0:05:44 and different molecular combinations and all of that work
0:05:47 so that they can narrow it down to the ones that are worth trying
0:05:49 in the actual, the physical lab, right?
0:05:53 So it sounds like, you know, similar like just being able to simulate everything
0:05:56 and narrow it down must be such a huge advantage to developers.
0:05:59 Yeah, and second application is, you know,
0:06:02 a world model, if you can predict the future,
0:06:05 you have some kind of understanding of physics.
0:06:10 You might know the action required to drive the world toward that future.
0:06:15 And the policy model, you know, the typical one deployed in physical AI,
0:06:19 is all about outputting the right action given the observation.
0:06:23 And so a world model can be used as initialization for the policy model,
0:06:27 and then, you know, you can train the policy model with less amount of data
0:06:29 because the world model is already pre-trained
0:06:34 with many different observations that span the data space.
0:06:39 So without a world model, what’s the procedure of training a policy like?
0:06:42 So one procedure is you collect data,
0:06:46 and then you start to do the supervised fine-tuning,
0:06:48 and then you may use, yeah.
0:06:52 So it’s hands on, it’s manual, you have to get all the data, it’s a lot, yeah.
0:06:59 Yeah, and the third one is, when the world model is good enough, highly accurate and fast,
0:07:04 you know, before the robot takes any action, you just simulate different actions.
0:07:08 And then check which one will really achieve your goal, and take that one.
0:07:12 You know, like having a strategist next to you before you’re making any decision.
0:07:14 Wouldn’t that be great?
0:07:17 You mentioned accuracy when the models are fast enough and accurate enough,
0:07:20 and I don’t know if it’s a fair question to ask,
0:07:22 so I ask it, interpret it the best way,
0:07:27 but like, how do you determine accuracy or measure accuracy on a world model,
0:07:31 and is there a benchmark that, you know, different benchmarks you need to hit
0:07:34 to deploy in different situations, or how does that work?
0:07:35 Yeah, it’s a great question.
0:07:39 So I think a world model development is still in its infancy.
0:07:40 Right.
0:07:46 So people are still trying to figure out the right way to measure the world model performance.
0:07:49 And I think there are several aspects a world model must have.
0:07:51 One is follow the law of physics.
0:07:54 When you’re dropping a ball, it should predict it, you know,
0:07:58 in the right position based on the laws of physics, right?
0:08:03 And also in the 3D environment, we have to have object permanence, right?
0:08:06 So when you turn back and come back, you know,
0:08:08 the object should remain there, right,
0:08:11 without any other actors moving it, it should remain in the same location.
0:08:14 So there are many different aspects I think we need to capture.
0:08:17 And I think an important task for the research community
0:08:20 is to come up with the right benchmarks,
0:08:24 so that the community can move forward in the right direction
0:08:26 to democratize this important area.
0:08:26 Right.
0:08:29 So speaking of moving forward, maybe we can talk a little bit,
0:08:34 or you can talk a little bit, about COSMOS and what was announced at CES.
0:08:41 So in CES, Jensen announced the COSMOS World Model Development Platform.
0:08:45 It’s a developer-first world model platform.
0:08:48 So in this platform, there are several components.
0:08:51 One is pre-trained world foundation models.
0:08:52 Right.
0:08:54 We have two kinds of world foundation models.
0:08:58 One is based on diffusion, the other is based on autoregressive.
0:09:03 And we also have tokenizers for the world foundation models.
0:09:06 Tokenizers compress videos into tokens
0:09:09 so that transformers can consume them for their task.
0:09:10 Right, right.
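Conceptually, a video tokenizer chops a clip into patches and maps each patch to an entry in a discrete codebook. The toy below uses a random codebook and nearest-neighbor assignment purely to show the shape of the operation; Cosmos tokenizers are learned neural compressors, not this.

```python
import torch

clip = torch.randn(16, 3, 64, 64)                  # (time, channels, H, W)
patches = clip.unfold(2, 8, 8).unfold(3, 8, 8)     # carve out 8x8 spatial patches
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(-1, 3 * 8 * 8)

codebook = torch.randn(1024, 3 * 8 * 8)            # random stand-in codebook
dists = torch.cdist(patches, codebook)             # distance to every code
tokens = dists.argmin(dim=1)                       # one discrete id per patch
print(tokens.shape)                                # 16 * 8 * 8 = 1024 tokens
```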
0:09:15 In addition to these two, we also provide post-training scripts
0:09:20 to help physical AI builders fine-tune the pre-trained model
0:09:22 to their physical AI setup.
0:09:24 Some cars have eight cameras, right?
0:09:29 And we rely on our world foundation model to predict eight views.
0:09:35 And lastly, we also have this video curation toolkit.
0:09:40 Processing videos, a lot of video, is an accelerated computing task.
0:09:43 There are many steps that need to be processed.
0:09:48 And we gathered the libraries and the ready-to-use computation code together.
0:09:53 We want to help world model developers leverage the library to curate data.
0:09:55 Either they want to build their own world models
0:10:00 or fine-tune one based on our pre-trained world foundation models.
0:10:03 So the models provided as part of COSMOS,
0:10:06 those are open to developers to use.
0:10:09 Are they open to other businesses, enterprises?
0:10:12 Yes, so this is an open-weight development platform.
0:10:15 So meaning that the model is open-weight,
0:10:18 the model weights are released for commercial use.
0:10:23 This is important to physical AI builders, right?
0:10:27 So physical AI builders, they need to solve tons of problems
0:10:32 to build really useful robots and self-driving cars for our society.
0:10:36 There are so many problems, and world model is one of them.
0:10:44 And those companies, they may not have the resources or expertise to build a world model.
0:10:47 And NVIDIA cares about our developers,
0:10:51 and we know many of them are trying to make a huge impact in physical AI.
0:10:53 So we want to help them.
0:10:58 That’s why we create this world model development platform for them to leverage
0:11:01 so that they can handle other problems,
0:11:05 and we can contribute our part to the transformation of our society.
0:11:06 Absolutely.
0:11:07 I wanted to ask you,
0:11:12 can you explain a little bit about the difference between diffusion models
0:11:15 and autoregressive models, particularly in this context?
0:11:20 Why offer both, what are the use cases and pros and cons?
0:11:23 So an autoregressive model, or AR model,
0:11:27 is a model that predicts one token at a time,
0:11:30 conditioned on what has been observed, right?
0:11:35 So GPT is probably the most popular autoregressive model.
0:11:36 It predicts one token at a time.
0:11:38 Diffusion, on the other hand,
0:11:44 is a model that predicts a set of tokens together,
0:11:50 iteratively removing noise from the initial tokens.
0:11:53 And the difference is that for AR models,
0:11:56 with a significant amount of investment in GPT,
0:12:00 there are so many optimizations that they can run very fast.
0:12:04 And diffusion, because tokens are generated together,
0:12:08 so it’s easier to have coherent tokens.
0:12:11 The generation quality tends to be better.
0:12:14 And both of them are useful for physical AI builders.
0:12:18 So some of them need speed, some of them need high accuracy.
0:12:19 So both are good.
0:12:20 Excellent.
0:12:23 So far, the most successful autoregressive model
0:12:27 is based on discrete token prediction, like in GPT.
0:12:31 So you pretty much have a set of integer tokens
0:12:33 and you predict them during training.
0:12:35 And in the case of world foundation models,
0:12:40 it means you have to organize videos into a set of integers.
0:12:44 And you can imagine it’s a challenging compression task.
0:12:46 And because of this compression,
0:12:51 the autoregressive model tends to struggle more on the accuracy.
0:12:53 But it has other benefits.
0:12:58 For example, it’s easier to integrate into the physical AI setup.
0:12:59 Got it.
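A toy way to see the two generation styles side by side: the autoregressive loop commits to one discrete token at a time, while the diffusion loop holds all positions at once and refines them from noise. Both “models” below are untrained stand-ins, so the outputs are meaningless; only the control flow is the point.

```python
import torch
import torch.nn as nn

vocab, seq_len, dim = 1024, 8, 32

# Autoregressive: predict one discrete token at a time, GPT-style.
embed, ar_head = nn.Embedding(vocab, dim), nn.Linear(dim, vocab)
tokens = [0]                                       # a start token
for _ in range(seq_len):
    h = embed(torch.tensor(tokens)).mean(0)        # toy summary of the past
    probs = torch.softmax(ar_head(h), dim=-1)
    tokens.append(torch.multinomial(probs, 1).item())

# Diffusion: generate every position together by iterative denoising.
denoiser = nn.Linear(seq_len * dim, seq_len * dim)  # untrained stand-in
x = torch.randn(seq_len * dim)                      # start from pure noise
for _ in range(10):
    x = x - 0.1 * denoiser(x)                       # one crude denoising step
```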
0:13:01 I’m speaking with Ming-Yu Liu.
0:13:04 Ming-Yu is vice president of research at NVIDIA,
0:13:07 and he’s been telling us about world foundation models,
0:13:09 including the announcement of NVIDIA Cosmos,
0:13:11 the developer platform for world models
0:13:14 that was announced during Jensen’s CES keynote.
0:13:16 So we’ve been talking a lot about,
0:13:18 you’ve been explaining what a world model is,
0:13:21 how it’s similar and different to other types of AI models,
0:13:24 just now the difference between autoregression and diffusion.
0:13:26 Let’s kind of change gears a little bit
0:13:28 and talk about the applications.
0:13:31 How will Cosmos, how are our world foundation models
0:13:33 going to impact industries?
0:13:36 Yeah, so we believe that, first of all,
0:13:40 the world foundation model can be used as a synthetic data generation
0:13:43 engine to generate different synthetic data.
0:13:45 And like what I said earlier,
0:13:50 the world model can also be used as a policy evaluation tool
0:13:54 to determine which checkpoint or which policy
0:13:58 is a better candidate for you to test out in the physical world.
0:14:01 And also, if you can predict the future,
0:14:04 you can probably reconfigure it to predict the action
0:14:09 toward that future, so it serves as a policy model initialization.
0:14:14 And also, you can have a strategist next to you before any endeavor.
0:14:16 So during inference time,
0:14:20 you can do a rollout and pick the best decision for each moment.
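That inference-time idea has the flavor of model-predictive control: sample candidate actions, imagine each future with the world model, and execute the action whose imagined future scores best. The dynamics and reward below are made up for illustration; a real system would plug a learned world model and a task metric into the same loop.

```python
import random

def world_model_predict(state: float, action: float) -> float:
    return state + action + random.gauss(0, 0.01)  # made-up dynamics

def score(state: float) -> float:
    return -abs(state - 1.0)                       # goal: reach state 1.0

state = 0.0
for step in range(5):
    candidates = [random.uniform(-0.5, 0.5) for _ in range(32)]
    best = max(candidates, key=lambda a: score(world_model_predict(state, a)))
    state = world_model_predict(state, best)       # act in the "real" world
print(round(state, 2))                             # should land near 1.0
```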
0:14:22 Are there particular industries?
0:14:26 I know working factories and industrial work,
0:14:27 anything involving robotics,
0:14:32 are there specific industries that you see benefiting from world models
0:14:33 maybe sooner than others?
0:14:38 Yes, I think the self-driving car industry and the humanoid robot industry
0:14:42 will benefit a lot from these world model developments.
0:14:47 It can simulate different environments that will be difficult
0:14:53 to have in the real world, to make sure the agent behaves effectively.
0:14:56 So I think these are two very exciting industries,
0:14:58 where the world models can impact.
0:15:01 And NVIDIA obviously has a long history, as you were saying,
0:15:04 of it’s not just about rolling out the hardware,
0:15:07 there’s the software, the stack, the ecosystem,
0:15:09 all of the work to support developers,
0:15:13 because if the devs aren’t building world-changing things with the products,
0:15:14 then there’s a problem, right?
0:15:18 What are some of the partnerships, the ecosystems,
0:15:20 relative to world foundation models?
0:15:23 And maybe there are some partners who are already doing some interesting stuff
0:15:25 with the tech you can talk about.
0:15:28 Yes, we are working with a couple of humanoid robot companies
0:15:30 and self-driving car companies,
0:15:36 including 1X, Waabi, Dioto, S10, and many others.
0:15:39 So NVIDIA believes in suffering.
0:15:43 We believe that true greatness comes from suffering.
0:15:45 So working with our partners,
0:15:50 we can look at the challenges they are facing to experience their pain
0:15:53 and to help us to build a world model platform
0:15:56 that is really beneficial to them.
0:15:57 Fantastic, yeah.
0:16:01 So I think this is the important part to make the field move faster.
0:16:02 Absolutely.
0:16:06 All right, so you talked about being able to predict the future
0:16:09 and you talked about just now that things are moving faster.
0:16:11 What do you see on the horizon?
0:16:13 What’s next for world foundation models?
0:16:16 Where do you see this going in the next five years
0:16:19 or adjust that time frame to whatever makes sense?
0:16:22 So I’m trying to be a world model now,
0:16:23 trying to predict the future.
0:16:25 Exactly, yeah, for now it’s fine.
0:16:28 Yes, I believe we are still in the infancy
0:16:32 of world foundation model development.
0:16:35 The models can do physics to some extent,
0:16:37 but not well or robustly enough.
0:16:42 That’s the critical point to make a huge transformation.
0:16:45 It’s useful, but we need to make it more useful.
0:16:49 So the field of AI advances very fast.
0:16:55 From GPT-3 to ChatGPT, it was just a year or two.
0:16:57 Right, we forget it’s all going so quickly.
0:17:00 Yeah, it’s going so fast.
0:17:04 And I believe physical AI development will be very fast too,
0:17:07 because the infrastructure for large-scale models
0:17:09 has been established.
0:17:14 That’s thanks to this large language model transformation.
0:17:17 And there’s a strong need to have physical AI assistants.
0:17:20 Especially for humanoids.
0:17:22 And there are also a lot of investments.
0:17:25 So we have the great foundation.
0:17:30 And many young researchers want to make a difference.
0:17:33 And we also have great need and investments.
0:17:35 I think this is going to be a very exciting area
0:17:37 and it’s going to move very fast.
0:17:42 I don’t want to say that it will be solved in five years or three years.
0:17:45 So I think it’s still a long way.
0:17:48 And more importantly, we also need to study
0:17:52 how to best integrate these world models
0:17:56 into the physical AI systems in a way that can really benefit them.
0:18:00 Right, and does that come through just working with partners
0:18:02 out in the field, kind of combining research with application
0:18:05 and iterating and learning?
0:18:06 Yeah, I believe so.
0:18:07 I believe in suffering.
0:18:12 So I believe that working hand in hand with our partners,
0:18:16 understanding their problems, is the best way to make progress.
0:18:19 For folks who would like to learn more
0:18:22 about any aspects of what we’re talking about,
0:18:24 there are obviously resources on the NVIDIA site.
0:18:28 And of course, the coverage of Jensen’s keynote and the announcements.
0:18:30 Are there specific places, maybe a research blog,
0:18:34 maybe your own blog or social media channels,
0:18:39 where people can go to learn more about NVIDIA’s work with world models
0:18:42 and anything else you think the listeners might find interesting?
0:18:49 Yes, so we have a white paper written for the Cosmos world foundation model platform.
0:18:52 And we welcome you to download and take a read
0:18:55 and let me know, you know, whether it’s useful to you
0:18:59 and let me know the feedback and we will try to do better for the next one.
0:19:03 Excellent. Ming-Yu, it was an absolute pleasure talking to you.
0:19:05 I definitely learned more about world models
0:19:09 and some of the particulars and the applications going forward.
0:19:10 So I thank you for that.
0:19:12 I’m sure the audience did as well.
0:19:14 But, you know, the work that you’re doing, as you said,
0:19:16 it’s early innings and it’s all changing so fast.
0:19:19 So we will all keep an eye on the research that you’re doing
0:19:22 and the applications and best of luck with it.
0:19:24 And I look forward to catching up again
0:19:27 and seeing how quickly things evolve from here on out.
0:19:28 Thank you. Thanks for having me.
0:19:32 It’s been fun and I hope next time I can share more, you know,
0:19:35 maybe a more advanced version of the world model.
0:19:38 Absolutely. Well, thank you again for joining the podcast.
0:19:39 Thank you.
0:19:42 (dramatic music)
As AI continues to evolve rapidly, it is becoming more important to create models that can effectively simulate and predict outcomes in real-world environments. World foundation models are powerful neural networks that can simulate physical environments, enabling teams to enhance AI workflows and development. Ming-Yu Liu, vice president of research at NVIDIA and an IEEE Fellow, joined the NVIDIA AI Podcast to talk about world foundation models and how they will impact various industries.
https://blogs.nvidia.com/blog/world-foundation-models-advance-physical-ai/
https://www.nvidia.com/cosmos/