AI transcript
0:00:15 Hello, and welcome to the NVIDIA AI Podcast. I’m your host, Noah Kravitz.
0:00:20 This past August, three of NVIDIA’s research leaders gave a special address at SIGGRAPH,
0:00:24 the annual International Computer Graphics and Interactive Techniques Conference
0:00:29 that’s been running since 1974. One of those people is here with us today.
0:00:35 Sanja Fidler is VP of AI Research at NVIDIA, where she leads the NVIDIA Spatial Intelligence Lab
0:00:40 in Toronto, Ontario, Canada. Sanja is here to tell us about the lab, to talk about the research she’s
0:00:45 most excited about right now, including what was presented at SIGGRAPH, and to share a little bit
0:00:50 about her own journey through the worlds of research and artificial intelligence. So without further
0:00:55 ado, let’s get to it. Sanja Fidler, welcome, and thanks for joining the AI Podcast.
0:01:00 Hi, Noah, and hi, audience. I’m very excited to be on this AI Podcast.
0:01:06 We are very excited to have you. Thanks for taking the time. There’s a lot going on, obviously,
0:01:13 and congratulations on the special address and everything else at SIGGRAPH. So we wanted to
0:01:18 start with a little bit about your own journey. You followed your passion for computer vision
0:01:24 and artificial intelligence across Europe and into North America. Can you tell us a little bit about
0:01:28 what first got you interested in the field and how your journey took you to Toronto?
0:01:37 So, maybe I’ll start with my youth, and there were actually three important breakpoints that led me to
0:01:44 where I am. And the first one actually starts with my dad. So my dad would sit on a chair next to my
0:01:50 sister and me and tell us bedtime stories. And surprisingly, he was very good at it. He was a
0:01:57 scientist, and so he would tell us stories about scientists. For example, he would tell us about
0:02:02 Nikola Tesla, who was born in Croatia, and my mom was also born in Croatia. I was born in Slovenia.
0:02:04 And was this in Slovenia? Where did you grow up?
0:02:12 Yeah, I grew up in Slovenia. So my mom was born in Croatia, my dad was born in Slovenia, and I was born in Slovenia.
0:02:38 So he would tell us stories about how, at a young age, Nikola jumped from the roof of their house holding an open umbrella, thinking he would fly. And every night there would be a new episode about his inventions, the creation of radio, alternating current. Obviously, we didn’t understand all of it, but he made it sound exciting. And the competition with Thomas Edison, right? It was almost like a Netflix series.
0:02:51 And for a child, this was very exciting. So I just could not wait to hear more to the next day. So kind of like my childhood, heroes were not movie stars or music stars. They were scientists.
0:02:52 Awesome.
0:03:16 That really kind of shaped me. So perhaps not surprisingly, you know, one day I appear in front of my parents and proclaim, I want to be an inventor. And there was even a photo of me and my sister. Actually, my sister dressed as a robot, with some cardboard boxes put around her. And maybe not surprisingly, she became an economist.
0:03:38 So this, you know, pretty much settled my profession. I was going to be an inventor, and this was at a very young age. The second moment was really thanks to my mom. And this was in primary school. I was, you know, very young. And at some point I got pretty ill. It was something like COVID, almost. I think it was called whooping cough or something.
0:03:51 I was home. Fever, coughing two to three months. And I basically missed a lot of school. I missed like, you know, fractions. A whole big chapter on math.
0:04:12 I had no idea. So I come back to school and of course I didn’t understand anything they were talking about. And I developed some sort of resistance in going to school. And before a math test, I threw a tantrum, like crying hysterically on the floor. I hate math. I don’t want to go back to school.
0:04:20 And my mom is actually a teacher. And of course, you know, having a school-hating child, that was not an option.
0:04:23 Yeah, it happens.
0:04:37 So even though she was an English teacher, she would sit down with me and work with me on the math. And she made it really interesting. So she had this really nice way of teaching me through, giving me puzzles, math puzzles.
0:04:50 And I began to like really love it. You know, as kids understand things, they also love it. And I think to this day, what drives me at the core is solving problems. And I think that’s still kind of stuck with me.
0:04:58 That pretty much settled, you know, what I would study. I was determined, at age, I don’t know, 12 or 13, that I was going to study math.
0:05:11 And the third moment was really, you know, kind of thanks to my grandma. So I was already doing my PhD and I decided to work on computer vision.
0:05:20 My PhD actually started with math, even. And then I saw this one talk on someone recognizing cats and dogs. It was very early AI at that point.
0:05:29 And it just kind of spoke to me, you know. I was always kind of dreaming of robots, and computer vision felt like the first step toward that.
0:05:41 So, you know, I was there doing my PhD and my grandma, you know, she was a very smart woman. She was actually one of the first female plastic surgeons in Yugoslavia.
0:05:42 Oh, wow.
0:05:53 Yeah, she was always telling me these stories, you know, how she graduated in med school. And the day they were graduating and they were out having fun and, you know, sirens came out, World War II started.
0:05:59 And, you know, she had to basically just go to the operating room and that was her next four years.
0:06:09 And, you know, basically fear became alien to her. But it wasn’t alien to me, really, right?
0:06:17 So I did my PhD in Slovenia, really out of the fear of leaving, right? Of going out into the wide open world alone as a woman.
0:06:23 And I was somehow not encouraged. My mom would scare the hell out of me about doing that.
0:06:32 So towards the end of my PhD, and I was kind of working on this AI, like something similar to deep networks, just my own take on it.
0:06:42 I was presenting at a conference, and a famous professor from UC Berkeley stopped by the poster, really liked it, and invited me to visit his group at Berkeley.
0:06:49 And, you know, I was beyond excited, but I still carried this kind of weight of fear and expectation.
0:06:55 And I talked to my grandma and she said, you know, Sanja, don’t listen to your mom. Just go.
0:07:00 Actually, she passed away a few months later. That was January 13, 2009.
0:07:03 And the next thing I remember, I’m sitting on a plane.
0:07:09 I look at my plane ticket to California and it was January 13, 2010.
0:07:12 It was exactly one year later.
0:07:13 Exactly a year.
0:07:17 Exactly. I’m not kidding. Like, this was exactly, it was all…
0:07:17 It was meant to be.
0:07:23 Meant to be. I was both scared and excited, but, you know, chapter two of my life was…
0:07:24 Right.
0:07:28 Was that your first time traveling abroad?
0:07:31 No, I would go before, you know, just visit New York with my family.
0:07:32 Okay.
0:07:35 This was the first time I went alone and, like, to live there.
0:07:35 Yeah, very different.
0:07:39 You know, I got my bags and here it was, you know.
0:07:40 Yeah.
0:07:41 It was scary, but…
0:07:43 And landed in Berkeley, of all places.
0:07:51 I spent a few months there, seven or eight months, then came back, graduated, and then I did my postdoc at U of T.
0:07:54 So that’s kind of what brought me to Toronto.
0:07:56 Right. Amazing.
0:08:08 I feel like the graphics and interactive industry owes a big thank you to many members of your family for inspiring you, and then your grandma kind of giving you that nudge and everything.
0:08:09 That’s amazing.
0:08:10 Why Toronto?
0:08:13 What was the link that brought you to Toronto?
0:08:20 Yeah, actually, the U of T, University of Toronto, was doing really great stuff in deep learning.
0:08:23 And like I said before, that was kind of my PhD.
0:08:27 I was really inspired by doing this hierarchical representation to recognize objects.
0:08:35 I was reading all these, like, you know, neuroscience papers that basically said this is how the brain works, right?
0:08:35 Yeah.
0:08:40 And being in Slovenia, I was isolated, so I kind of had my own take on how that would look.
0:08:54 And then I was reading these deep learning papers, and it was really appealing to me. I was kind of, you know, deciding between Berkeley and U of T, and I decided to go to U of T to kind of, like, learn from, you know, Geoff Hinton and people like that.
0:08:57 And that’s why I landed here, yeah.
0:08:58 Amazing.
0:09:00 And so you’ve been in Toronto since?
0:09:09 Yeah, I mean, I was there for a postdoc and then I got, like, a research assistant professorship in Chicago.
0:09:13 So I did a small stop, you know, there for a year and a half.
0:09:14 Okay.
0:09:18 Then a faculty position opened at U of T, and I came back in 2014.
0:09:19 Amazing.
0:09:24 And so now you head up the NVIDIA Spatial Intelligence Lab in Toronto.
0:09:24 That’s right.
0:09:25 Yeah, yeah.
0:09:27 I joined NVIDIA, that was 2018.
0:09:29 So about seven years ago.
0:09:29 Seven years ago.
0:09:34 I actually met Jensen at a computer vision conference, that was 2017.
0:09:38 And we had a really great chat about simulation.
0:09:43 I was already working on simulation for robotics at the time and I was telling him about it.
0:09:45 And I think he was also thinking about it, so it was a great conversation.
0:09:53 And then later he gave me a call or we went on a call and he said, you know, come work with me.
0:09:58 And I had other options, but the fact that he said come work with me, and not for me, just
0:10:04 told me everything about who I would be joining, and that was it, you know?
0:10:05 What a great story.
0:10:05 That’s fantastic.
0:10:07 So tell us about the lab.
0:10:13 What is, for those listening who might not fully get the term, what does spatial intelligence
0:10:16 mean and what’s the charter of your team?
0:10:17 What are you doing at the lab?
0:10:22 And how, you may have just said, sorry, I was imagining Jensen, you know, that whole conversation.
0:10:24 So, but when was the lab founded?
0:10:26 2018, May.
0:10:28 That was basically with me.
0:10:31 And then we slowly grew and also increased scope.
0:10:34 So we recently renamed ourselves to spatial intelligence.
0:10:37 I would say it’s a, you know, new, more encompassing name.
0:10:42 So spatial intelligence essentially denotes intelligence in 3D, right?
0:10:44 Intelligence in a 3D world.
0:10:49 So the same as we have LLMs representing intelligence in language, you have this whole family
0:10:53 of visual language models for intelligence in images.
0:10:56 Now we need to build the same capabilities, but in 3D.
0:11:00 And the question is, of course, you know, what that is and why?
0:11:06 Maybe I’ll motivate with robots because really, you know, that’s one of the prime motivations.
0:11:13 So at the end of the day, like robots need to operate in the physical world, in our world.
0:11:18 And this world is three-dimensional and conforms to the laws of physics.
0:11:20 And there’s humans inside, right?
0:11:21 That we need to interact with.
0:11:28 You know, AI that operates in the real physical world is typically referred to as
0:11:28 physical AI.
0:11:31 So I’ll maybe use that term quite a lot, right?
0:11:32 Yep.
0:11:38 Physical AI is really kind of the upcoming big industry, very likely larger than generative
0:11:40 and agentic AI.
0:11:45 You know, Jensen typically says everything that moves, all devices that move will be autonomous,
0:11:46 right?
0:11:47 So that’s kind of the vision.
0:11:53 So a robot to operate in the real world, obviously needs to understand the world.
0:11:54 What am I seeing?
0:11:56 What is everything I’m seeing, doing?
0:12:00 How is it going to react to my action, right?
0:12:01 So understanding.
0:12:07 It needs to act, you know, if I want to drive you from A to B, make you dinner, you know, I
0:12:10 need to actually like control that robot to make an action.
0:12:16 But then there are two other capabilities needed that are perhaps a bit less obvious.
0:12:22 So basically, it’s 3D virtual world creation and modeling and simulation.
0:12:28 And the reason is that robots need to have, like, a virtual playground that, we would like,
0:12:33 mimics the real world as faithfully as possible, where basically
0:12:37 they can train their skills and also test their skills before we’re going to deploy them in
0:12:38 the real world.
0:12:43 Like this is basically the critical thing we need to solve for deployment of robots.
0:12:49 Basically, spatial intelligence kind of comprises four core capabilities: creation
0:12:49 of virtual worlds;
0:12:55 simulation, you know, modeling how a world evolves in time
0:13:00 based on our actions; understanding; and action in the 3D world.
0:13:02 And applications are more than robots.
0:13:08 You know, architecture, construction, gaming, everyone that kind of has 3D data, 3D world data.
0:13:12 We first started with this virtual world creation,
0:13:19 so content creation. And then we expanded, because in order to develop spatial intelligence, you
0:13:22 also need physics, how the world evolves in time, and understanding.
0:13:24 That was a year or so ago, maybe less.
0:13:28 So much has happened with generative AI in particular in the past few years that
0:13:30 it kind of blurs together sometimes when I talk about it.
0:13:34 But I remember when video models started coming out.
0:13:36 The first, you know, Sora from OpenAI and some of the other ones.
0:13:43 And discussion around, well, these video models are actually also physics simulations.
0:13:47 You know, we’re discovering, we thought we were making a video model, but now we’re realizing
0:13:52 that, you know, there are properties of physics happening inside of the videos that are output
0:13:54 and all of these things.
0:13:56 What makes a good physics model?
0:14:01 And when you’re talking about modeling things that are going to happen in the future, I’ve
0:14:05 also heard, you know, that described as, well, what an AI model does is really predicting what’s
0:14:07 going to happen in the future, right?
0:14:10 And if it’s a video that’s output, it’s sort of frame by frame.
0:14:14 How do you think about the four things you just described relating to one another?
0:14:20 And I don’t know, maybe you can talk a little bit about the physical AI in particular and
0:14:25 how the evolution of how these models came to be, you know, so accurate that we can now
0:14:27 use them in simulations.
0:14:28 Yeah.
0:14:36 So NVIDIA Cosmos and the models you’re describing, right, Sora, Veo 3, and so on, learn their capabilities
0:14:36 from videos.
0:14:43 And especially NVIDIA Cosmos is kind of targeting physical AI, which really means that it’s doubling
0:14:49 down on modeling physics, not necessarily the creative aspects, but physics, capturing how
0:14:50 our world works.
0:14:56 So it forms these world simulation capabilities by learning purely from videos.
0:15:01 And we specifically target collecting videos that are real world recordings.
0:15:03 You know, there’s no human editing involved.
0:15:08 And if there’s any graphics data, it’s actually all physically simulated.
0:15:13 How we’re using physics is mainly for benchmarks, actually.
0:15:16 So you want to create benchmarks there, because you have full control, right?
0:15:21 I can have two bouncing balls, three bouncing balls with this material, you know, and more
0:15:22 complex world.
0:15:26 And there you can really go like, you know, every single test.
0:15:27 How good are you at that?
0:15:28 How good are you at that?
0:15:29 And that’s our test.
0:15:31 And you kind of hill climb that performance.
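[Editor's illustration: the benchmarking idea described here, fully controlled synthetic physics tests that a world model is scored and hill-climbed against, can be sketched in a few lines. This is a hypothetical toy, not NVIDIA's actual benchmark code; the simulator, metric, and numbers are invented for the example.]

```python
# Toy benchmark: we control the ground-truth physics (a bouncing ball),
# so any model's predicted trajectory can be scored exactly.

def simulate_ball(y0, v0, steps, dt=0.05, g=-9.8, restitution=0.9):
    """Ground-truth trajectory of a ball bouncing on the floor at y = 0."""
    y, v, traj = y0, v0, []
    for _ in range(steps):
        v += g * dt
        y += v * dt
        if y < 0:  # floor hit: reflect position and velocity, lose some energy
            y = -y * restitution
            v = -v * restitution
        traj.append(y)
    return traj

def trajectory_error(predicted, ground_truth):
    """Mean absolute error between a model's rollout and the true trajectory."""
    return sum(abs(p - t) for p, t in zip(predicted, ground_truth)) / len(ground_truth)

truth = simulate_ball(y0=1.0, v0=0.0, steps=100)  # the controlled benchmark
naive = [1.0] * 100                               # a "model" that predicts no motion
print(trajectory_error(truth, truth))             # 0.0: a perfect model scores zero
print(trajectory_error(naive, truth) > 0.1)       # True: the naive model is penalized
```

Varying the number of balls, materials, and scene complexity gives the graded test suite described above, and the error metric is what gets hill-climbed.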
0:15:34 Yeah, it’s an evolution of models, right?
0:15:40 So the first world model came out, I think it was Juergen Schmidhuber, right?
0:15:42 2019.
0:15:44 It was almost parallel to us.
0:15:50 Ours came like a few months later, where the idea was really kind of like AI replaces
0:15:51 the game engine, kind of.
0:15:54 You know, AI creates the world.
0:15:56 You have the user interaction.
0:15:58 The next frame is not human-written code.
0:16:00 It’s the AI.
0:16:01 It’s generated, yeah.
0:16:02 Obviously, that was early on.
0:16:05 It was, I forgot exactly what they were using.
0:16:06 We were using GANs.
0:16:07 Ours was called GameGAN.
0:16:13 So we trained it on Pac-Man, you know, so you could actually play Pac-Man generated by the AI.
0:16:13 Right, right.
0:16:15 Like, the frames came from the AI.
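[Editor's illustration: the AI-replaces-the-game-engine loop being described here, user action in, generated next frame out, can be sketched roughly as below. The tiny grid world and the stand-in "model" are invented for the example; a real GameGAN-style system learns the next-frame function from gameplay videos rather than using hand-written rules.]

```python
# Sketch of a world-model game loop: a learned model maps
# (current frame, user action) -> next frame, replacing the engine.
# Here a trivial stand-in "model" moves a player marker "P" on a 1D grid;
# a real system would generate actual pixels.

def world_model_step(frame, action):
    """Stand-in for a learned next-frame model on a 1D grid 'frame'."""
    pos = frame.index("P")
    new = {"left": max(0, pos - 1), "right": min(len(frame) - 1, pos + 1)}[action]
    grid = ["."] * len(frame)
    grid[new] = "P"
    return "".join(grid)

def play(frame, actions):
    """Game loop with the model in place of engine code."""
    frames = [frame]
    for a in actions:
        frame = world_model_step(frame, a)  # the AI generates the next frame
        frames.append(frame)
    return frames

print(play("P....", ["right", "right", "left"]))
# ['P....', '.P...', '..P..', '.P...'] -- every frame after the first is model output
```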
0:16:20 We had an episode of the podcast with somebody who created GAN Theft Auto.
0:16:23 So like Grand Theft Auto, but being generated.
0:16:24 Oh, that was yours.
0:16:25 Okay.
0:16:26 Yeah, that was our stuff.
0:16:27 Cool, that’s cool.
0:16:28 Yeah, yeah, great.
0:16:31 I, forgive me, I don’t remember offhand who the guest was, but yep, that was so cool.
0:16:36 We released the code, so, you know, people just went crazy, and it was amazing to see, you
0:16:37 know, where it went.
0:16:38 Yeah.
0:16:40 Yeah, we actually also applied it to driving.
0:16:41 That was 2021.
0:16:42 It was called DriveGAN.
0:16:49 You know, same technology, but trained on a lot of autonomous driving videos, and it almost kind
0:16:50 of became a driving simulator.
0:16:55 You know, Cosmos really took it to new heights, but at the time, it was kind of like imagining
0:16:58 how this could be useful for physical applications.
0:17:03 So that was all kind of GAN-based with all kind of known limitations.
0:17:08 And, you know, in the meantime, diffusion models came out, and it was clear that, you know,
0:17:11 like that’s also the next big leap in video modeling.
0:17:18 And actually, 2023, we kind of partnered up with some of the students that did the latent
0:17:24 diffusion that really was kind of a big breakthrough in images because you didn’t model pixels anymore,
0:17:29 but these kind of latent codes, which made it significantly more efficient.
0:17:34 So we kind of applied that and extended it to video, and that led to Video LDM, and really,
0:17:38 you know, you could see the future by looking at those results.
0:17:44 Obviously, it was not Sora yet or, you know, Cosmos, but like we were on to something, right?
0:17:49 And then, you know, the industry actually kind of switched to this latent diffusion architecture.
0:17:55 And then, you know, then it’s about scaling, and obviously the architecture changed a little bit
0:17:58 behind the scenes and data and so on.
0:18:02 And that is basically what created the modern-age models.
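[Editor's illustration: the latent diffusion idea mentioned here, running diffusion on compact latent codes instead of raw pixels, can be caricatured in a few lines. The toy average-pooling "encoder" and the simple noise blend below are invented for the sketch; the real method uses a learned VAE encoder and a trained denoising network.]

```python
import random

# Toy of the latent diffusion idea: compress pixels to a small latent,
# then do the (noising/denoising) diffusion process in that cheaper space.

def encode(pixels, factor=8):
    """Toy 'encoder': average-pool blocks of `factor` pixels into one latent value."""
    return [sum(pixels[i:i + factor]) / factor
            for i in range(0, len(pixels), factor)]

def add_noise(latent, t, rng):
    """One forward-diffusion step: blend the latent toward Gaussian noise at level t."""
    return [(1 - t) * z + t * rng.gauss(0, 1) for z in latent]

rng = random.Random(0)
pixels = [float(i % 7) for i in range(512)]  # stand-in for a 512-value frame
latent = encode(pixels)                      # diffusion now operates on 64 values
noisy = add_noise(latent, 0.5, rng)          # noising happens in latent space

print(len(pixels), len(latent))  # 512 64: an 8x smaller space to model
```

The efficiency win is exactly this dimensionality gap, which is what made extending the approach from images to video tractable.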
0:18:05 So I understand that your lab has grown recently.
0:18:11 Can you talk a little bit about the new areas that the lab’s now encompassing
0:18:15 and how that kind of furthers the overall goals, the overall charter of the lab?
0:18:16 Yeah, yeah.
0:18:22 So when I joined, I joined Rev’s organization, and Rev was building Omniverse.
0:18:27 Omniverse is this, you know, like state-of-the-art simulation platform
0:18:30 where robots can be robots, as Jensen says.
0:18:30 Right.
0:18:35 And talking to Rev at the time, he mentioned, you know,
0:18:36 there was a huge team working on it.
0:18:39 Obviously, they were able to render really fast.
0:18:42 You know, they had this real-time ray tracing and so on.
0:18:46 So really kind of the key missing piece was content.
0:18:49 And mind you, this was like 2018, right?
0:18:52 AI was in its infancy for that.
0:18:54 And that’s how we started.
0:18:58 I said, okay, like, how can we actually make this platform workable,
0:19:03 especially for physical AI, where it’s really about modeling the world,
0:19:05 which is messy, diverse, you know?
0:19:08 Like, it’s really, like, challenging.
0:19:14 So we started with content, and yeah, we developed, you know, a bunch of techniques for that.
0:19:19 And through, you know, through kind of the period of our lab, we became more and more ambitious.
0:19:26 And, you know, we realized that the pipeline for physical AI or this 3D spatial intelligence
0:19:31 also needs to change because you need to have, you know, better physics algorithms.
0:19:33 Physical objects interact with each other.
0:19:37 A plastic bottle, whether there’s water inside, I can put it on fire.
0:19:42 And, you know, there is no cheating, like, in a game where I can kind of stage it.
0:19:44 Like, this needs to be all simulated.
0:19:45 It’s real.
0:19:45 It’s real.
0:19:47 It needs to feel real, right?
0:19:48 I can put my finger on it.
0:19:49 Bad things happen, right?
0:19:53 Like, the robot, if it’s training there, it needs to kind of experience it in this way.
0:19:58 So, you know, it was clear, kind of, that we need the next evolution of physics, and a
0:19:59 colleague joined the team for that.
0:20:02 And also, you know, perception is obviously important.
0:20:09 And Laura joined a team, and she was, she’s very interested in 3D perception, but going towards
0:20:15 open world, meaning, you know, like, anything, anything in this room, I should be able to recognize
0:20:18 it and understand my affordances with it.
0:20:20 And then, you know, that can lead to a better action.
0:20:26 So, we expanded the team, basically, like, by building blocks that we actually need, you
0:20:28 know, building the full stack for spatial intelligence.
0:20:30 And you mentioned Omniverse.
0:20:34 Your lab has been very involved with the creation of Omniverse.
0:20:39 What are some of the innovations, some of the research breakthroughs you mentioned, you know,
0:20:40 physics models improving?
0:20:44 What are some of the other innovations that really made Omniverse possible and helped to
0:20:45 grow into what it is today?
0:20:50 Yeah, I think, you know, first of all, Omniverse is created by many teams.
0:20:54 At NVIDIA, right, much, much, much larger than any single team.
0:20:57 Really, kind of the vision of Jensen and Rev.
0:21:04 It has a mountain of technology, you know, real-time ray tracing powered by DLSS, which
0:21:09 puts, you know, AI in the loop, AI-powered physics solvers, like I was saying.
0:21:11 So, that’s just scratching the surface.
0:21:14 And I really can’t take credit for any of that.
0:21:19 So, I can maybe tell you a little bit about what we were thinking when we started with
0:21:22 our 3D content creation work.
0:21:26 And I would really say that we doubled down on two directions.
0:21:29 Both turned out to be very important in the end.
0:21:34 And it’s really kind of this perseverance through time that created something of value.
0:21:39 So, the first one was, okay, you know, clearly there’s a graphics pipeline.
0:21:41 We know everything and how that works.
0:21:50 So, why don’t we lift images and videos to 3D to be fully compatible with existing graphics pipelines?
0:21:55 And we really doubled down on differentiable rendering as this foundational technology.
0:21:58 Meaning, you know, graphics goes from 3D and renders to images.
0:22:03 This is differentiable, meaning kind of like amenable to AI.
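[Editor's illustration: a toy version of the differentiable rendering path described here, recovering a 3D parameter from an image by backpropagating through a renderer. The deliberately tiny "renderer" and its numbers are invented for the example; real systems differentiate through full rasterization or ray tracing.]

```python
# Toy differentiable rendering: a "renderer" maps a 3D parameter (here just
# an object scale) to pixel intensities. Because the map is differentiable,
# we can recover the 3D parameter from a target image by gradient descent,
# i.e. lift an image back to 3D.

WEIGHTS = (0.2, 0.5, 1.0)  # fixed per-pixel response to object scale

def render(scale):
    """Toy differentiable renderer: three pixels whose brightness grows with scale."""
    return [scale * w for w in WEIGHTS]

def loss_and_grad(scale, target):
    """Squared pixel error and its analytic gradient w.r.t. the scale."""
    pixels = render(scale)
    loss = sum((p - t) ** 2 for p, t in zip(pixels, target))
    grad = sum(2 * (p - t) * w for p, t, w in zip(pixels, target, WEIGHTS))
    return loss, grad

target = render(3.0)  # "photo" of an object whose true scale is 3.0
scale = 0.0           # initial guess
for _ in range(200):
    _, grad = loss_and_grad(scale, target)
    scale -= 0.5 * grad  # gradient step *through* the renderer

print(round(scale, 3))  # 3.0: the 3D parameter is recovered from the image
```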
0:22:09 So, this path led to, you know, one of the first image-to-3D models, which was called GANverse3D,
0:22:13 and one of the first generative models of 3D assets, GET3D.
0:22:18 And as the latest achievement, we also made foundational improvements for 3D Gaussian splats.
0:22:24 I don’t know whether I need to explain that in further detail, but essentially, it’s a really, you know,
0:22:30 a really, like a new neural graphics primitive that you can easily optimize from videos.
0:22:33 And we added ray tracing capabilities to it.
0:22:41 And at SIGGRAPH, we actually announced the integration of 3DGRUT, we call it 3D Groot, into Omniverse.
0:22:46 So, basically, now you can download Omniverse or Isaac, which basically helps you train robots.
0:22:52 You can scan, you know, with your phone or whatnot, this environment, and boom, you have it in Isaac.
0:22:54 And you can start, you know, training robots just here.
0:22:57 Like, there is no, you know, it doesn’t take a few weeks for it.
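[Editor's illustration: as a rough intuition for the splatting primitive mentioned above, a deliberately simplified 1D caricature, not the actual 3D Gaussian splatting method: a scene is a set of Gaussians with position, width, and brightness, and rendering accumulates their contributions per pixel. The scene values are invented for the example.]

```python
import math

# 1D cartoon of Gaussian splatting: render an "image" by summing
# Gaussian contributions at each pixel. Real 3D Gaussian splatting adds
# projection, anisotropic covariances, and opacity blending.

def splat_render(gaussians, width=10):
    """Render a 1D 'image' by accumulating each Gaussian's contribution per pixel."""
    image = []
    for x in range(width):
        value = sum(a * math.exp(-((x - mu) ** 2) / (2 * s ** 2))
                    for mu, s, a in gaussians)
        image.append(value)
    return image

scene = [(2.0, 1.0, 1.0), (7.0, 0.5, 2.0)]  # two splats: (center, width, brightness)
image = splat_render(scene)
peak = max(range(10), key=lambda x: image[x])
print(peak)  # 7: the brightest pixel sits under the brighter, tighter splat
```

Optimizing these per-Gaussian parameters against video frames is what makes the representation easy to fit from a phone scan.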
0:22:58 It’s amazing.
0:23:03 It all makes sense in terms of, you know, looking at the way you’re describing the way things have built up
0:23:05 and building blocks and adding features.
0:23:06 And, oh, cool, that makes sense.
0:23:10 And then I sort of listened to you describe, like, oh, take your phone, wave it around the room,
0:23:12 and now the robot can train in the room.
0:23:15 And it’s still, it’s just so exciting.
0:23:15 It’s so mind-blowing.
0:23:17 It’s very cool.
0:23:18 Yeah, it’s exciting.
0:23:20 But that’s basically what you want, right?
0:23:20 Like, scale.
0:23:25 I want to just go and take what’s here into sim, right?
0:23:27 And boom, the robot is training.
0:23:36 So the second path is that we kind of saw some fundamental limitations of this graphics pipeline
0:23:39 because, you know, you need to also model agents and physics.
0:23:42 Like, it all kind of, you know, also felt daunting.
0:23:47 So we also made this bold bet on AI that is basically the world model, right,
0:23:51 that does the whole content creation and world simulation based on user interaction, all in one.
0:23:54 And that was the chain of models that you described earlier, right?
0:24:01 So, like, two different things that all now kind of, like, came together in, like, really, I think, useful capabilities.
0:24:02 Yeah.
0:24:11 So how has the advent of AI and 3D content creation and sort of specifically in workflows changed the way that people get the work done,
0:24:17 the way that researchers or designers can create objects and create scenes and kind of manipulate things?
0:24:20 What’s the impact of AI been so far on these workflows?
0:24:26 Yeah, I think, like, this technology really democratizes access to these tools.
0:24:29 And basically, it gives everyone the chance to become a creator.
0:24:33 You know, I have no idea how to use 3D software.
0:24:36 I mean, I’ve tried it a few times, but now I could be reasonable, you know?
0:24:42 If I wanted to, you know, actually do robotics,
0:24:45 I can reasonably get, you know, this object into a simulated world.
0:24:54 The cool thing is that it also gives additional superpowers, you know, right, to creators that have the talent, you know?
0:25:01 So artists, designers, they can actually use this technology to now do many more creative things.
0:25:04 I have seen so much amazing stuff coming out that I wouldn’t even think of.
0:25:10 I think it’s really kind of empowering to the entire population in different ways, which is great to see.
0:25:13 We had Danny Wu from Canva on recently.
0:25:16 He’s the head of AI products there, if I got that right.
0:25:24 And he was describing a similar thing, but kind of more on a level I could relate to, because I write, I talk, I mostly work with words.
0:25:27 I can’t draw or paint to save my life, you know?
0:25:34 And so that ability, having that superpower, if I want to see how something might look, an idea, it lets me do that now, right?
0:25:40 And so I can only imagine the 3D physical world with, you know, 3D design and talking about simulations.
0:25:42 The stuff you’ve seen must be pretty cool.
0:25:43 Yeah, I think so.
0:25:58 So talking about this 3D world and physical AI, and you spoke to it a little bit earlier, but how are all of these advances with the technology and computer vision included, enabling robotics, autonomous vehicles?
0:26:06 You talked about it a little bit, but maybe you can kind of put a point on, you know, how physical AI has really started to take off.
0:26:16 Yeah, if there’s anything that people take away from this talk, it is that physical AI can’t scale through real-world trial and error.
0:26:17 Yeah.
0:26:26 Like, it’s simply not possible to put my car out there or a robot out there, and it’s going to mess up my kitchen here by bumping everything and so on, right?
0:26:32 This is super expensive, unsafe, and it’s just going to take us forever to get there, right?
0:26:36 So simulation is really the answer here.
0:26:53 All right, and if we do it right, if we are actually able to use computer vision and other techniques to basically, you know, create these virtual worlds that feel real, then it’s possible to train in this kind of parallel virtual universe, and safely.
0:27:02 Essentially, basically, you know, accelerating time before we can deploy robots and also bringing the overall cost down, right?
0:27:06 Because now we are doing it in the cloud as opposed to having this very…
0:27:09 To remodel your kitchen after every test, yeah.
0:27:16 So what are some of the methods that are key to making simulations physically accurate or more physically accurate as we go?
0:27:30 Yeah, I think, like, the jury is still out how exactly to achieve, you know, physically accurate, like something I can completely trust, simulation at a scale, diversity, and realism of the real world.
0:27:36 It’s hard in the traditional way, with these, like, different physics solvers, you know, that’s so hard.
0:27:40 For the world models, it’s also kind of hard.
0:27:48 There’s still hallucinations and, you know, sometimes objects disappear, go one in another, and obviously that’s going to keep improving.
0:27:53 So the likely success is going to come in some sort of a combination of both.
0:28:02 And obviously, we’re going to keep pushing on each direction as far as possible, and maybe meet in between until we reach that point.
0:28:04 There is a combination of that, right?
0:28:11 Using these world models with this more traditional approach that really makes sure that physics and simulation is correct.
0:28:24 Yeah, and the other very important message is that in computer vision and robotics, the really big breakthrough is a VLM, so visual language model, that is able to reason.
0:28:24 Okay.
0:28:32 This is basically, you know, how humans navigate the long tail of very diverse and rare scenarios in the physical world.
0:28:38 So we’re kind of bringing that knowledge from language into, you know, the physical world.
0:28:49 We’re encountering completely new situations that we have never seen before in training, and now the VLM can come to our rescue, basically to resolve all this long tail.
0:28:53 And that is really kind of the discontinuity from before, right?
0:28:56 That is a tool that we have now that before was missing.
0:29:00 So that’s probably the most bold statement I can make right now.
0:29:02 Fair enough.
0:29:08 Do the traditional methods get baked into these models, or how do you go about combining them, the two approaches?
0:29:16 Yeah, what you could do, for example, is you can use kind of the traditional way, which is also not full of AI everywhere, right?
0:28:24 To kind of have a coarse simulation in 3D with solvers that we know, where we know how to model certain effects, right?
0:29:30 So you can make that simulation, you can render it out, and that becomes a guidance to a world model.
0:29:32 You know, I take that as input.
0:29:32 Right, okay.
0:29:42 It’s kind of telling me, oh, things should be roughly here and here, and then it becomes much more feasible to create pretty pixels out of that, both in time and space.
0:29:44 So that’s kind of like what we’re thinking right now.
0:29:44 Yeah.
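[Editor's illustration: the guidance scheme just described, a cheap traditional solver producing coarse states that steer a generative model, might be sketched like this. The interpolating "world model" below is a stand-in invented for the example; a real system would condition a video diffusion model on the rendered coarse simulation.]

```python
# Hybrid sketch: a coarse traditional simulation provides guidance
# keyframes, and a "world model" fills in the dense frames between them.

def coarse_simulation(steps, dt=0.5, g=-9.8, y0=30.0):
    """Cheap solver: free-fall positions at large time steps (the guidance)."""
    return [y0 + 0.5 * g * (i * dt) ** 2 for i in range(steps)]

def world_model_refine(keyframes, frames_between=4):
    """Stand-in world model: dense frames guided by the coarse keyframes."""
    dense = []
    for a, b in zip(keyframes, keyframes[1:]):
        for j in range(frames_between):
            dense.append(a + (b - a) * j / frames_between)
    dense.append(keyframes[-1])
    return dense

guidance = coarse_simulation(steps=5)  # 5 coarse, physically correct states
video = world_model_refine(guidance)   # dense "generated" frames pinned to them
print(len(guidance), len(video))       # 5 17: physics guides, the model fills in
```

The point of the design is that the generative model never drifts far from physically correct states, while still producing dense, detailed output.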
0:29:47 I’m speaking with Sanja Fidler.
0:30:06 Sanja is a vice president of AI research at NVIDIA, and we’re talking about the work that her Spatial Intelligence Lab in Toronto has been doing, along with the evolution of AI and models and solvers in the mix, and all the things that go into making these models more accurate so we can rely on them and trust them, as you were saying.
0:30:15 And we’ve also talked a little bit about SIGGRAPH, and I mentioned at the top that you gave a special research address alongside some other NVIDians at this year’s SIGGRAPH.
0:30:22 What are some of the notable things from NVIDIA’s presence at the show that maybe we can impart to listeners here?
0:30:26 What are some of the things that they should take away from what NVIDIA did at SIGGRAPH?
0:30:34 Yeah, I mean, this year at SIGGRAPH, we really tried to send a message on physical AI in the keynote.
0:30:42 And the reason is because this is a really important area with big impact, and the SIGGRAPH community has a lot to give.
0:30:42 Yeah.
0:30:43 A lot to give.
0:30:45 A lot of expertise is already there.
0:30:49 The Gaussians, splats, NeRFs, I mean, that all comes out of that, right?
0:30:53 We discussed simulation is key, like I literally mean simulation is key.
0:31:00 So everyone in the audience should feel empowered to help us in this quest of robotics, you know?
0:31:04 And the cool thing is that it feels also early stage.
0:31:06 Like I said before, it’s open-ended.
0:31:08 We don’t kind of know yet, you know?
0:31:10 We are hypothesizing.
0:31:15 So we really hope the audience kind of connects with, here’s a new challenge for you.
0:31:15 Yeah, yeah.
0:31:19 How can you, you know, what do I do next, right?
0:31:25 And graphics is very mature, but here is a new challenge for you that maybe needs to, you know, think outside the box.
0:31:32 So I think the key to success, you know, we suspect will be the combination of Cosmos.
0:31:35 And so this is NVIDIA’s world foundation model platform.
0:31:40 And this is both video generation, so simulation of the world, as well as reasoning.
0:31:48 Reasoning about the laws of physics, reasoning about all the agents in the scene, the scenarios, and so on, and physics simulation.
0:31:59 So all these three pieces interacting together, you know, that’s our bet of creating really physically, but also semantically accurate simulations of the real world in the future.
0:32:05 So I think that’s really kind of like what I hope the SIGGRAPH audience takes away from the keynote.
0:32:13 Yeah, that spirit of, I mean, it even goes back to what you said about when you met Jensen and he invited you to come work with him, right?
0:32:19 That spirit of collaboration, you know, open source being what it is, conferences, obviously.
0:32:33 But we’ve had guests across all different industries come on the show and talk about how important it is to, you know, share research and trade notes with other people who, on the other side of the world, working in other industries, et cetera.
0:32:43 As AI continues to touch and evolve and change, you know, virtually every industry you can think of, how important is that?
0:32:46 Or what are you getting from this experience of working across so many industries?
0:32:50 And does it feel like AI is kind of bringing industries together?
0:32:59 Or does it feel like, you know, different industries are kind of hunkering down and siloing in their own approach to how they use these emerging technologies?
0:33:03 Yeah, it’s definitely bringing them in, bringing them together, right?
0:33:06 Because the workflows essentially are very similar.
0:33:08 At some point, they’re very similar.
0:33:12 And the difference is the data and the expertise, domain expertise that’s different.
0:33:20 And actually, there is even sharing, you know, how I do autonomous driving versus humanoid robots versus factory simulation architecture.
0:33:38 There’s some commonalities between things that could be shared and, you know, having this kind of like data-driven approach to simulation could really bring industries together and benefit from one another and build tech that can essentially make all of us, all of these industries better.
0:33:40 And open source, you mentioned open source.
0:33:43 I am a believer in open source.
0:33:47 And it’s great to see NVIDIA is also a big believer in open source.
0:33:51 Like I said, you know, in a lot of areas, we’re also still early on.
0:33:56 And that’s the only way to keep progress going, you know, and really build up these capabilities.
0:34:09 Even though in a lot of ways the timeframe we’ve been talking about, the last seven, eight years in particular, isn’t all that long, in AI terms, and particularly in this recent kind of generative AI revolution, it’s a long time.
0:34:17 So you’ve been doing this, you know, almost since the beginning of early object recognition and AI kind of now through to everything we’re talking about today.
0:34:19 What’s next on the horizon?
0:34:32 And is there a breakthrough that you’re either sort of waiting for or, you know, maybe more secretly kind of thinking like, I think this is going to happen soon, whether it’s something that’s particular to your own work that you’re doing or kind of more broadly.
0:34:38 What’s the next big breakthrough in AI that you’re looking forward to or maybe just kind of hoping to see?
0:34:41 Well, I think it’s going to be robots.
0:34:41 All right.
0:34:45 So I started the story with my sister in the cardboard boxes.
0:34:45 Your sister.
0:34:57 And me dreaming about a robot taking the dog out in the morning, which my parents made me do, and I really like to sleep in the morning.
0:34:59 That was kind of the early dream.
0:35:01 My grandma lived 30 years alone.
0:35:03 My grandpa died quite early.
0:35:15 So the first talks I gave as a faculty member all started with a grandma and a cute WALL-E-like robot in a kitchen talking to each other, you know,
0:35:23 and the robot helping her. That was kind of the common thread of, let’s build this technology because it can be really powerful and useful.
0:35:32 And I now believe after many years in this field that we’re likely going to see that in our lifetime.
0:35:32 Yeah.
0:35:39 Robots in some form, autonomous cars are already, you know, out there to some extent, right?
0:35:41 And more is coming.
0:35:45 So that’s that’s the breakthrough I’m looking forward to.
0:35:45 Yes.
0:35:49 Have you ever had a robot in your home with you for a period of time?
0:35:50 You ever lived with a robot?
0:35:54 I have a robot that kind of wipes the floor.
0:35:57 I would love to have something that does more than that.
0:36:05 So, Sonia, as we look to wrap up here, and this has been fantastic again, thank you for taking the time.
0:36:12 What advice would you give to researchers out there who are interested in the work the Spatial Intelligence Lab is doing,
0:36:18 might be interested in collaborating, working with you in some way, joining the lab, collaborating from afar?
0:36:28 And then in particular, what are some of the skills and research areas that you think are becoming increasingly important now and, you know, will continue to be at least for the next few years?
0:36:33 Yeah, actually, the bar that we have is both low and high.
0:36:34 So I’ll explain what I mean.
0:36:40 So I think what we are looking for is people with immense passion.
0:36:45 You know, I feel like I still haven’t lost the passion of the first day.
0:36:48 You know, I wake up and I am excited.
0:36:52 So I think the passion is what drives us forward, the energy.
0:36:54 This is only a podcast, but the passion comes through in your voice.
0:36:56 So yeah, I’d say you’re doing all right.
0:37:03 Yeah, I mean, that’s what drives us because, you know, as a researcher, life is not easy.
0:37:06 Most of the time things don’t work, you know, and that’s basically what we do.
0:37:09 It could be six months, a year where you don’t get that result.
0:37:15 So you’re really that kind of like, you know, passion and energy that makes you keep going.
0:37:21 I think kind of wanting and having the ability to go technically deep is very important.
0:37:29 You know, not jump from one thing to another when things get hard, but like, let’s learn really the fundamentals to the level we need so we can innovate.
0:37:35 And I guess like maybe to my first point, the high level of perseverance, right?
0:37:40 Like we want to keep going and there is no wall thick enough, right?
0:37:43 The rest, I think we can teach people.
0:37:48 Like if you have these basic things, a lot of the other stuff comes along.
0:37:48 Yeah.
0:37:52 So in terms of like, you know, it’s mostly also interest, right?
0:37:58 So we are very interested in this 3D world, modeling and understanding 3D worlds.
0:38:05 So people that are interested and shared passion for the same topic, you know, please contact us.
0:38:07 We would be ready to work with you.
0:38:08 Fantastic.
0:38:16 For listeners who want to learn more about the lab, about the work we’ve been talking about, where are the best places to go online?
0:38:17 Is there a homepage for the lab?
0:38:19 Is it on the NVIDIA site?
0:38:21 Any social media handles to follow?
0:38:23 Where would you send them?
0:38:24 It’s all on the NVIDIA website.
0:38:34 Probably if you search for NVIDIA Spatial Intelligence Lab or NVIDIA Toronto, which is our old name, it should pop up.
0:38:35 Yeah.
0:38:35 Fabulous.
0:38:37 Sonia, again, this has been great.
0:38:46 I think that the story, just imagining, you know, your dad telling you those stories as a kid and with your sister and then the advice your grandma gave you.
0:38:48 Overcome your fear, get out there.
0:38:49 Just absolutely fantastic.
0:38:56 Congratulations to you and all the team, everyone you work with for all the work you’ve been doing and SIGGRAPH, of course.
0:39:00 And we really look forward to following your progress in the future.
0:39:01 Best of luck.
0:39:01 Yeah, thanks.
0:39:03 It was really fun talking to you.
0:02:04 And was this in Slovenia? Where did you grow up?
0:02:12 Yeah, I grew up in Slovenia. So my mom was born in Croatia, my dad was born in Slovenia, and I was born in Slovenia.
0:02:38 So he would tell us stories about how, at a young age, Nikola jumped from the roof of their house holding an open umbrella, thinking he would fly. And every night there would be a new episode about his inventions, the creation of radio, alternating current. Obviously, we didn’t understand it all, but the way he made it sound, and the competition with Thomas Edison, right? It was almost like a Netflix series.
0:02:51 And for a child, this was very exciting. So I just could not wait to hear more the next day. So kind of like, my childhood heroes were not movie stars or music stars. They were scientists.
0:02:52 Awesome.
0:03:16 That really kind of shaped me. So perhaps not surprising, you know, one day I appear in front of my parents and proclaim, I want to be an inventor. And there is even a photo of me and my sister. Actually, my sister dressed as a robot; I had to pay her quite a bit of money to put some cardboard boxes around her. And maybe not surprisingly, she became an economist.
0:03:38 So this, you know, pretty much settled my profession. I was going to be an inventor, and this was at a very young age. The second moment was really thanks to my mom. And this was in primary school. I was, you know, very young. And at some point I got pretty ill. It was something like COVID almost. I think it was called whooping cough or something.
0:03:51 I was home. Fever, coughing two to three months. And I basically missed a lot of school. I missed like, you know, fractions. A whole big chapter on math.
0:04:12 I had no idea. So I come back to school and of course I didn’t understand anything they were talking about. And I developed some sort of resistance in going to school. And before a math test, I threw a tantrum, like crying hysterically on the floor. I hate math. I don’t want to go back to school.
0:04:20 And my mom is actually a teacher. And of course, you know, having a school-hating child, that was not an option.
0:04:23 Yeah, it happens.
0:04:37 So even though she was an English teacher, she would sit down with me and work with me on the math. And she made it really interesting. So she had this really nice way of teaching me through, giving me puzzles, math puzzles.
0:04:50 And I began to really love it. You know, as kids understand things, they also love them. And I think to this day, what drives me at the core is solving problems. And I think that’s still kind of stuck with me.
0:04:58 That pretty much settled, you know, what I would study. I was determined, at age, I don’t know, 12, 13, that I was going to study math.
0:05:11 And the third moment was really, you know, kind of thanks to my grandma. So I was already doing my PhD and I decided to work on computer vision.
0:05:20 My PhD actually started with math. And then I saw this talk on someone recognizing cats and dogs. It was very early AI at that point.
0:05:29 And it just kind of spoke to me, you know, I was always kind of dreaming of robots and computer vision felt like the first step to do.
0:05:41 So, you know, I was there doing my PhD and my grandma, you know, she was a very smart woman. She was actually one of the first female plastic surgeons in Yugoslavia.
0:05:42 Oh, wow.
0:05:53 Yeah, she was always telling me these stories, you know, how she graduated in med school. And the day they were graduating and they were out having fun and, you know, sirens came out, World War II started.
0:05:59 And, you know, she had to basically just go to the operating room and that was her next four years.
0:06:09 And, you know, basically fear became alien to her. And it wasn’t for me, really, right?
0:06:17 So I stayed in Slovenia for my PhD, really out of the fear of leaving, right? Of going into the wide open world alone as a woman.
0:06:23 And I was somehow not encouraged. My mom would scare the hell out of me of doing that.
0:06:32 So towards the end of my PhD, and I was kind of working on this AI, like something similar to deep networks, just my own take on it.
0:06:42 I was presenting at a conference and a famous professor at UC Berkeley stopped by the poster, really liked it, and invited me to visit his group at Berkeley.
0:06:49 And, you know, I was beyond excited, but I still carried this kind of weight of fear and expectation.
0:06:55 And I talked to my grandma and she said, you know, Sonia, don’t listen to your mom. Just go.
0:07:00 Actually, she passed away a few months later. That was January 13, 2009.
0:07:03 And the next thing I remember, I’m sitting on a plane.
0:07:09 I look at my plane ticket to California and it was January 13, 2010.
0:07:12 It was exactly one year later.
0:07:13 Exactly a year.
0:07:17 Exactly. I’m not kidding. Like, this was exactly, it was all…
0:07:17 It was meant to be.
0:07:23 Meant to be. I was both scared and excited, but, you know, chapter two of my life was…
0:07:24 Right.
0:07:28 Was that your first time traveling abroad?
0:07:31 No, I would go before, you know, just visit New York with my family.
0:07:32 Okay.
0:07:35 This was the first time I went alone and, like, living.
0:07:35 Yeah, very different.
0:07:39 You know, I got my bags and here it was, you know.
0:07:40 Yeah.
0:07:41 It was scary, but…
0:07:43 And landed in Berkeley, of all places.
0:07:51 I spent a few months there, seven, eight months, and came back, graduated, and then I did my postdoc at U of T.
0:07:54 So that’s kind of what brought me to Toronto.
0:07:56 Right. Amazing.
0:08:08 I feel like the graphics and interactive industry owes a big thank you to many members of your family, for inspiring you and then grandma kind of giving you that nudge and everything.
0:08:09 That’s amazing.
0:08:10 Why Toronto?
0:08:13 What was the link that brought you to Toronto?
0:08:20 Yeah, actually, the U of T, University of Toronto, was doing really great stuff in deep learning.
0:08:23 And like I said before, that was kind of my PhD.
0:08:27 I was really inspired by doing this hierarchical representation to recognize objects.
0:08:35 I was reading all these, like, you know, neuroscience papers that basically said this is how the brain works, right?
0:08:35 Yeah.
0:08:40 And I was in Slovenia, I was isolated, so I kind of had my own take on how that would look.
0:08:54 And then I was reading these deep learning papers, and they were really appealing to me. I was kind of, you know, going between Berkeley and U of T, and I decided to go to U of T to kind of learn from, you know, Geoff Hinton and people like that.
0:08:57 And that’s why I landed here, yeah.
0:08:58 Amazing.
0:09:00 And so you’ve been in Toronto since?
0:09:09 Yeah, I mean, I was there for a postdoc and then I got, like, a research assistant professorship in Chicago.
0:09:13 So I did a small stop, you know, there for a year and a half.
0:09:14 Okay.
0:09:18 Then this position, faculty position, opened at U of T and then came back 2014.
0:09:19 Amazing.
0:09:24 And so now you head up the NVIDIA Spatial Intelligence Lab in Toronto.
0:09:24 That’s right.
0:09:25 Yeah, yeah.
0:09:27 I joined NVIDIA, that was 2018.
0:09:29 So about seven years ago.
0:09:29 Seven years ago.
0:09:34 I actually met Jensen at a computer vision conference, that was 2017.
0:09:38 And we had a really great chat about simulation.
0:09:43 I was already working on simulation for robotics at the time and I was telling him about it.
0:09:45 And I think he was also thinking about it, so it was a great conversation.
0:09:53 And then later he gave me a call or we went on a call and he said, you know, come work with me.
0:09:58 And I had other options, but the fact that he said, come work with me and not for me, just
0:10:04 told me everything about who I would be joining, and that was it, you know?
0:10:05 What a great story.
0:10:05 That’s fantastic.
0:10:07 So tell us about the lab.
0:10:13 What is, for those listening who might not fully get the term, what does spatial intelligence
0:10:16 mean and what’s the charter of your team?
0:10:17 What are you doing at the lab?
0:10:22 And how, you may have just said, sorry, I was imagining Jensen, you know, that whole conversation.
0:10:24 So, but when was the lab founded?
0:10:26 2018, May.
0:10:28 That was basically with me.
0:10:31 And then we slowly grew and also increased scope.
0:10:34 So we recently renamed ourselves to spatial intelligence.
0:10:37 I would say it’s a, you know, new encompassing word.
0:10:42 So spatial intelligence essentially denotes intelligence in 3D, right?
0:10:44 Intelligence in a 3D world.
0:10:49 So the same as we have LLMs representing intelligence in language, you have this, all this family
0:10:53 of vision language models for intelligence in images.
0:10:56 Now we need to build the same capabilities, but in 3D.
0:11:00 And the question is, of course, you know, what that is and why?
0:11:06 Maybe I’ll motivate with robots because really, you know, that’s one of the prime motivations.
0:11:13 So at the end of the day, like robots need to operate in the physical world, in our world.
0:11:18 And this world is three-dimensional and conforms to the laws of physics.
0:11:20 And there’s humans inside, right?
0:11:21 That we need to interact with.
0:11:28 You know, we typically refer to such AI that operates in the real physical world as
0:11:28 physical AI.
0:11:31 So I’ll maybe use that term quite a lot, right?
0:11:32 Yep.
0:11:38 Physical AI is really kind of the upcoming big industry, very likely larger than generative
0:11:40 and agentic AI.
0:11:45 You know, Jensen typically says everything that moves, all devices that move will be autonomous,
0:11:46 right?
0:11:47 So that’s kind of the vision.
0:11:53 So a robot to operate in the real world, obviously needs to understand the world.
0:11:54 What am I seeing?
0:11:56 What is everything I’m seeing, doing?
0:12:00 How is it going to react to my action, right?
0:12:01 So understanding.
0:12:07 It needs to act, you know, if I want to drive you from A to B, make you dinner, you know, I
0:12:10 need to actually like control that robot to make an action.
0:12:16 But then there are two other capabilities needed that are perhaps a bit less obvious.
0:12:22 So basically, it’s 3D virtual world creation and modeling and simulation.
0:12:28 And the reason is that robots need to have like a virtual playground that mimics the real
0:12:33 world as faithfully as possible, almost perfectly, where basically
0:12:37 they can train their skills and also test their skills before we’re going to deploy them in
0:12:38 the real world.
0:12:43 Like this is basically the critical thing we need to solve for deployment of robots.
0:12:49 Basically, spatial intelligence kind of comprises four core capabilities, which
0:12:49 is modeling,
0:12:55 so creation of virtual worlds, but then also, you know, modeling how it evolves in time
0:13:00 based on our actions, and understanding and action in the 3D world.
0:13:02 And applications are more than robots.
0:13:08 You know, architecture, construction, gaming, everyone that kind of has 3D data, 3D world data.
0:13:12 We first started with this virtual world creation,
0:13:19 so content creation, and then we expanded, because in order to develop spatial intelligence, you
0:13:22 also need physics, which evolves in time, and understanding.
0:13:24 That was a year or so ago, maybe less.
0:13:28 So much has happened with generative AI in particular in the past few years that
0:13:30 it kind of blurs together sometimes when I talk about it.
0:13:34 But I remember when video models started coming out.
0:13:36 The first, you know, Sora from OpenAI and some of the other ones.
0:13:43 And discussion around, well, these video models are actually also physics simulations.
0:13:47 You know, we’re discovering, we thought we were making a video model, but now we’re realizing
0:13:52 that, you know, there are properties of physics happening inside of the videos that are output
0:13:54 and all of these things.
0:13:56 What makes a good physics model?
0:14:01 And when you’re talking about modeling things that are going to happen in the future, I’ve
0:14:05 also heard, you know, that described as, well, what an AI model does is really predicting what’s
0:14:07 going to happen in the future, right?
0:14:10 And if it’s a video that’s output, it’s sort of frame by frame.
0:14:14 How do you think about the four things you just described relating to one another?
0:14:20 And I don’t know, maybe you can talk a little bit about the physical AI in particular and
0:14:25 how the evolution of how these models came to be, you know, so accurate that we can now
0:14:27 use them in simulations.
0:14:28 Yeah.
0:14:36 So NVIDIA Cosmos and the models you’re describing, right, Sora, VO3 and so on, learn their capabilities
0:14:36 from videos.
0:14:43 And especially NVIDIA Cosmos is kind of targeting physical AI, which really means that it’s doubling
0:14:49 down on modeling physics, not necessarily the creative aspects, but physics, capturing how
0:14:50 our world works.
0:14:56 So it’s forming this world simulation capabilities by learning purely with videos.
0:15:01 And we specifically target collecting videos that are real world recordings.
0:15:03 You know, there’s no human editing involved.
0:15:08 And if there’s any graphics data, it’s actually all physically simulated.
0:15:13 How we’re using physics is mainly for benchmarks, actually.
0:15:16 So you want to create, because you have full control, right?
0:15:21 I can have two bouncing balls, three bouncing balls with this material, you know, and more
0:15:22 complex world.
0:15:26 And there you can really go like, you know, every single test.
0:15:27 How good are you at that?
0:15:28 How good are you at that?
0:15:29 And that’s our test.
0:15:31 And you kind of hill climb that performance.
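A concrete way to picture the benchmarking Sonia describes: a classical solver gives exact, fully controllable ground truth, so a world model's rollout can be scored against it. The sketch below is hypothetical (a 1D bouncing ball with a restitution coefficient), not the actual Cosmos benchmark suite.

```python
import numpy as np

def simulate_ball(y0=1.0, v0=0.0, g=9.8, restitution=0.8, dt=0.01, steps=200):
    """Reference trajectory from a simple solver: free fall with damped bounces."""
    heights, y, v = [], y0, v0
    for _ in range(steps):
        v -= g * dt
        y += v * dt
        if y < 0.0:                # floor hit: reflect position, damp velocity
            y, v = -y, -v * restitution
        heights.append(y)
    return np.array(heights)

def benchmark(predicted, reference):
    """Score a model rollout against solver ground truth (lower is better)."""
    return float(np.mean((predicted - reference) ** 2))

reference = simulate_ball()                 # exact, controllable ground truth
no_bounce = simulate_ball(restitution=0.0)  # a "model" that forgot restitution
score = benchmark(no_bounce, reference)     # positive: the rollout deviates
```

Because every parameter (restitution, number of balls, materials) is under your control, you can generate arbitrarily many such tests and hill-climb a model's score on each.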
0:15:34 Yeah, it’s an evolution of models, right?
0:15:40 So the first world model came out, I think it was Jürgen Schmidhuber, right?
0:15:42 2019.
0:15:44 It was almost parallel to us.
0:15:50 Ours came like a few months later, where the idea was really kind of like AI replaces
0:15:51 the game engine, kind of.
0:15:54 You know, AI creates the world.
0:15:56 You have the user interaction.
0:15:58 Next frame is not human written code.
0:16:00 It’s the AI.
0:16:01 It’s generated, yeah.
0:16:02 Obviously, that was early on.
0:16:05 It was, I forget exactly what they were using.
0:16:06 We were using GANs.
0:16:07 Ours was called GameGAN.
0:16:13 So we trained it on Pac-Man, you know, so you could actually play Pac-Man on the AI.
0:16:13 Right, right.
0:16:15 Like, the frames were from the AI.
0:16:20 We had an episode of the podcast with somebody who created GAN Theft Auto.
0:16:23 So like Grand Theft Auto, but being generated.
0:16:24 Oh, that was yours.
0:16:25 Okay.
0:16:26 Yeah, that was our stuff.
0:16:27 Cool, that’s cool.
0:16:28 Yeah, yeah, great.
0:16:31 I, forgive me, I don’t remember offhand who the guest was, but yep, that was so cool.
0:16:36 We released the code, so, you know, people just got crazy, and it was amazing to see what, you
0:16:37 know, where it went.
0:16:38 Yeah.
0:16:40 Yeah, we actually also applied it to driving.
0:16:41 That was 2021.
0:16:42 It was called DriveGAN.
0:16:49 You know, same technology, but just a lot of autonomous driving videos, and it almost kind
0:16:50 of became a driving simulator.
0:16:55 You know, Cosmos really took it to new heights, but at the time, it was kind of like imagining
0:16:58 how this could be useful for physical applications.
0:17:03 So that was all kind of GAN-based with all kind of known limitations.
0:17:08 And, you know, in the meantime, diffusion models came out, and it was clear that, you know,
0:17:11 like that’s also the next big leap in video modeling.
0:17:18 And actually, in 2023, we kind of partnered up with some of the students that did the latent
0:17:24 diffusion work, which really was kind of a big breakthrough in images because you didn’t model pixels anymore,
0:17:29 but these kind of latent codes, which made it significantly more efficient.
0:17:38 became, you know, you could see the future by looking at those results.
0:17:44 Obviously, it was not Sora yet or, you know, Cosmos, but like we were on to something, right?
0:17:49 And then, you know, the industry actually kind of switched to this latent diffusion architecture.
0:17:55 And then, you know, then it’s about scaling, and obviously the architecture changed a little bit
0:17:58 behind the scenes and data and so on.
0:18:02 And that basically is creating the modern age models.
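The efficiency argument behind latent diffusion, denoising compressed latent codes instead of raw pixels, can be made concrete with a toy stand-in encoder (an 8x block average, which is an assumption here, not the learned autoencoder that latent diffusion models actually use):

```python
import numpy as np

def encode(img, factor=8):
    """Stand-in encoder: block-average the image by `factor` along each axis."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

pixels = np.zeros((512, 512))  # pixel-space diffusion would denoise all of this
latent = encode(pixels)        # latent-space diffusion denoises this instead
compression = pixels.size // latent.size
```

Every denoising step now touches 64x fewer values, which is roughly where the speedup comes from; a learned decoder then maps the final latent back to pixels.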
0:18:05 So I understand that your lab has grown recently.
0:18:11 Can you talk a little bit about the new areas that the lab’s now encompassing
0:18:15 and how that kind of furthers the overall goals, the overall charter of the lab?
0:18:16 Yeah, yeah.
0:18:22 So when I joined, we joined Rev’s organization, and Rev was building Omniverse.
0:18:27 Omniverse is this, you know, like state-of-the-art simulation platform
0:18:30 where robots can be robots, as Jensen says it.
0:18:30 Right.
0:18:35 And talking to Rev at the time, he mentioned, you know,
0:18:36 there was a huge team working on it.
0:18:39 Obviously, they were able to render really fast.
0:18:42 You know, they had this real-time ray tracing and so on.
0:18:46 So really kind of the key missing piece was content.
0:18:49 And mind you, this was like 2018, right?
0:18:52 We were primed for that.
0:18:54 And that’s how we started.
0:18:58 I said, okay, like, how can we actually make this platform workable,
0:19:03 especially for physical AI, where it’s really about modeling the world,
0:19:05 which is messy, diverse, you know?
0:19:08 Like, it’s really, like, challenging.
0:19:14 So we started with content, and yeah, we developed, you know, a bunch of techniques for that.
0:19:19 And through, you know, through kind of the period of our lab, we became more and more ambitious.
0:19:26 And, you know, we realized that the pipeline for physical AI or this 3D spatial intelligence
0:19:31 also needs to change because you need to have, you know, better physics algorithms.
0:19:33 Physical materials interact with each other.
0:19:37 Plastic, whether it’s got water inside, I can put it on fire.
0:19:42 And, you know, there is no cheating, like, in a game where I can kind of stage it.
0:19:44 Like, this needs to be all simulated.
0:19:45 It’s real.
0:19:45 It’s real.
0:19:47 It needs to feel real, right?
0:19:48 I can put my finger on it.
0:19:49 Bad things happen, right?
0:19:53 Like, the robot, if it’s training there, it needs to kind of experience it in this way.
0:19:58 So, you know, it was clear, kind of, that we need the next evolution of physics, and I can
0:19:59 join the team.
0:20:02 And also, you know, perception is obviously important.
0:20:09 And Laura joined the team, and she was, she’s very interested in 3D perception, but going towards
0:20:15 open world, meaning, you know, like, anything, anything in this room, I should be able to recognize
0:20:18 it and understand my affordances with it.
0:20:20 And then, you know, that can lead to a better action.
0:20:26 So, we expanded the team, basically, like, by building blocks that we actually need, you
0:20:28 know, building the full stack for spatial intelligence.
0:20:30 And you mentioned Omniverse.
0:20:34 Your lab has been very involved with the creation of Omniverse.
0:20:39 What are some of the innovations, some of the research breakthroughs you mentioned, you know,
0:20:40 physics models improving?
0:20:44 What are some of the other innovations that really made Omniverse possible and helped to
0:20:45 grow into what it is today?
0:20:50 Yeah, I think, you know, first of all, Omniverse is created by many teams.
0:20:54 At NVIDIA, right, much, much, much larger than any single team.
0:20:57 Really, kind of the vision of Jensen and Rev.
0:21:04 It has a mountain of technology, you know, real-time ray tracing powered by DLSS that
0:21:09 puts, you know, AI in the loop; AI-powered physics solvers, like I was saying.
0:21:11 So, that’s just scratching the surface.
0:21:14 And I really can’t take credit for any of that.
0:21:19 So, I can maybe tell you a little bit about what we were thinking when we started with
0:21:22 our 3D content creation work.
0:21:26 And I would really say that we doubled down on two directions.
0:21:29 Both turned out to be very important in the end.
0:21:34 And it’s really kind of this perseverance through time that created something of value.
0:21:39 So, the first one was, okay, you know, clearly there’s a graphics pipeline.
0:21:41 We know everything and how that works.
0:21:50 So, why don’t we lift images and videos to 3D to be fully compatible with existing graphics pipelines?
0:21:55 And we really doubled down on differentiable rendering as this foundational technology.
0:21:58 Meaning, you know, graphics goes from 3D and renders to images.
0:22:03 This is differentiable, meaning kind of like amenable to AI.
0:22:09 So, this path led to, you know, one of the first image-to-3D models, called GANverse3D,
0:22:13 and one of the first generative models of 3D assets, GET3D.
0:22:18 And as the latest achievement, we also made foundational improvements for 3D Gaussian splats.
0:22:24 I don’t know whether I need to explain that in further detail, but essentially, it’s a really, you know,
0:22:30 a really, like a new neural graphics primitive that you can easily optimize from videos.
0:22:33 And we added ray tracing capabilities to it.
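The differentiable-rendering idea described above, a renderer maps scene parameters to an image, so gradients of an image loss can flow back into the scene, can be pictured with a deliberately tiny sketch. Everything here (the 1D "image", the blob shape, the single brightness parameter) is hypothetical and for illustration only, not NVIDIA's actual pipeline:

```python
# Toy sketch of differentiable rendering: a "renderer" maps one scene
# parameter (the brightness of a fixed blob) to a 1D image; because the
# renderer is differentiable, we can recover the parameter from a target
# image by gradient descent. Hypothetical minimal example.
import numpy as np

xs = np.linspace(0.0, 1.0, 64)                 # pixel coordinates of a 1D "image"
blob = np.exp(-((xs - 0.7) ** 2) / 0.01)       # fixed blob shape centered at x = 0.7

def render(brightness):
    """Differentiable renderer: scene parameter -> image."""
    return brightness * blob

target = render(2.0)                           # observed image (ground-truth brightness 2.0)

b, lr = 0.0, 0.5                               # initial guess and step size
for _ in range(200):
    resid = render(b) - target                 # image-space error
    grad = np.sum(resid * blob) / len(xs)      # analytic gradient through the renderer
    b -= lr * grad                             # gradient step on the scene parameter

print(round(b, 3))                             # -> 2.0 (parameter recovered from pixels)
```

Real systems like 3D Gaussian splatting do the same thing at scale, optimizing millions of splat positions, opacities, and colors from video frames instead of one brightness value.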
0:22:41 And at SIGGRAPH, we actually announced the integration of 3DGRUT, spelled G-R-U-T, into Omniverse.
0:22:46 So, basically, now you can download Omniverse or Isaac, which basically helps you train robots.
0:22:52 You can scan, you know, with your phone or whatnot, this environment, and boom, you have it in Isaac.
0:22:54 And you can start, you know, training robots just here.
0:22:57 Like, you know, it doesn’t take you a few weeks for it.
0:22:58 It’s amazing.
0:23:03 It all makes sense in terms of, you know, looking at the way you’re describing the way things have built up
0:23:05 and building blocks and adding features.
0:23:06 And, oh, cool, that makes sense.
0:23:10 And then I sort of listened to you describe, like, oh, take your phone, wave it around the room,
0:23:12 and now the robot can train in the room.
0:23:15 And it’s still, it’s just so exciting.
0:23:15 It’s so mind-blowing.
0:23:17 It’s very cool.
0:23:18 Yeah, it’s exciting.
0:23:20 But that’s basically what you want, right?
0:23:20 Like, scale.
0:23:25 I want to just go and take what’s out here into sim, right?
0:23:27 And boom, the robot is training.
0:23:36 So the second one, the second path is we kind of saw the fundamental, some fundamental limitations of this graphics pipeline
0:23:39 because, you know, you need to also model agents and physics.
0:23:42 Like, it all kind of, you know, also felt daunting.
0:23:47 So we also made this bold bet on AI that is basically the world model, right?
0:23:51 That does the whole content creation, world simulation based on user interaction, all in one.
0:23:54 And that was the chain of models that you described earlier, right?
0:24:01 So, like, two different things that all now kind of, like, came together in, like, really, I think, useful capabilities.
0:24:02 Yeah.
0:24:11 So how has the advent of AI and 3D content creation and sort of specifically in workflows changed the way that people get the work done,
0:24:17 the way that researchers or designers can create objects and create scenes and kind of manipulate things?
0:24:20 What’s the impact of AI been so far on these workflows?
0:24:26 Yeah, I think, like, this technology really democratizes access to these tools.
0:24:29 And basically, it gives everyone the chance to become a creator.
0:24:33 You know, I have no idea how to use 3D software.
0:24:36 I mean, I’ve tried it a few times, but now I could be reasonable at it, you know?
0:24:42 If I wanted to, you know, actually do robotics,
0:24:45 I can reasonably get, you know, this object into a simulated world.
0:24:54 The cool thing is that it also gives additional superpowers, you know, right, to creators that have the talent, you know?
0:25:01 So artists, designers, they can actually use this technology to now do many more creative things.
0:25:04 I have seen so much amazing stuff coming out that I wouldn’t even think of.
0:25:10 I think it’s really kind of empowering to the entire population in different ways, which is great to see.
0:25:13 We had Danny Wu from Canva on recently.
0:25:16 He’s the head of AI products there, if I got that right.
0:25:24 And he was describing a similar thing, but kind of more on a level I could relate to, because I write, I talk, I mostly work with words.
0:25:27 I can’t draw or paint to save my life, you know?
0:25:34 And so that ability, having that superpower, if I want to see how something might look, an idea, it lets me do that now, right?
0:25:40 And so I can only imagine the 3D physical world with, you know, 3D design and talking about simulations.
0:25:42 The stuff you’ve seen must be pretty cool.
0:25:43 Yeah, I think so.
0:25:58 So talking about this 3D world and physical AI, and you spoke to it a little bit earlier, but how are all of these advances with the technology and computer vision included, enabling robotics, autonomous vehicles?
0:26:06 You talked about it a little bit, but maybe you can kind of put a point on, you know, how physical AI has really started to take off.
0:26:16 Yeah, if there’s anything that people take away from this talk, it’s that physical AI can’t scale through real-world trial and error.
0:26:17 Yeah.
0:26:26 Like, it’s simply not possible to put my car out there or a robot out there, and it’s going to mess up my kitchen here by bumping everything and so on, right?
0:26:32 This is super expensive, unsafe, and it’s just going to take us forever to get there, right?
0:26:36 So simulation is really the answer here.
0:26:53 All right, and if we do it right, if we are actually able to use computer vision and other techniques to create, you know, these virtual worlds that feel real, then it’s possible to train in this kind of parallel virtual universe, and safely.
0:27:02 Essentially, basically, you know, accelerating time before we can deploy robots and also bringing the overall cost down, right?
0:27:06 Because now we are doing it in the cloud as opposed to having this very…
0:27:09 To remodel your kitchen after every test, yeah.
0:27:16 So what are some of the methods that are key to making simulations physically accurate or more physically accurate as we go?
0:27:30 Yeah, I think, like, the jury is still out on how exactly to achieve physically accurate simulation, something I can completely trust, at the scale, diversity, and realism of the real world.
0:27:36 It’s hard in the traditional way, with these different physics solvers, you know; that’s so hard.
0:27:40 For the world models, it’s also kind of hard.
0:27:48 There are still hallucinations and, you know, sometimes objects disappear or pass one into another, and obviously that’s going to keep improving.
0:27:53 So the likely success is going to come in some sort of a combination of both.
0:28:02 And obviously, we’re going to keep pushing on each direction as far as possible, and maybe in between, until we reach a point.
0:28:04 There is a combination of that, right?
0:28:11 Using these world models with this more traditional approach that really makes sure that physics and simulation is correct.
0:28:24 Yeah, and the other very important message is that in computer vision and robotics, the really big breakthrough is the VLM, the vision language model, that is able to reason.
0:28:24 Okay.
0:28:32 This is basically, you know, how humans navigate the long tail of very diverse and rare scenarios in the physical world.
0:28:38 So we’re kind of bringing that knowledge from language into, you know, the physical world.
0:28:49 We’re encountering completely new situations that we have never seen before in training, and now the VLM can come to our rescue, basically resolving all this long tail.
0:28:53 And that is really kind of the discontinuity from before, right?
0:28:56 That is a tool that we have now that before was missing.
0:29:00 So that’s probably the most bold statement I can make right now.
0:29:02 Fair enough.
0:29:08 Do the traditional methods get baked into these models, or how do you go about combining them, the two approaches?
0:29:16 Yeah, what you could do, for example, is use kind of the traditional way, which is also not full of AI everywhere, right?
0:29:24 To have a coarse simulation in 3D with solvers that we know, where we know how to model certain effects, right?
0:29:30 So you can make that simulation, you can render it out, and that becomes guidance to a world model.
0:29:32 You know, I take that as input.
0:29:32 Right, okay.
0:29:42 It’s kind of telling me, oh, I should be roughly here and here, and then it becomes much more feasible to create pretty pixels out of that, both in time and space.
0:29:44 So that’s kind of like what we’re thinking right now.
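The guidance idea just described, a classical solver produces a rough trajectory, which is rendered into coarse frames that condition a generative world model, can be sketched roughly as follows. Every name here is hypothetical, and the "world model" is a trivial stand-in (an upsampler), not an actual learned model:

```python
# Hypothetical sketch of "coarse sim guides a world model": a classical
# solver computes a rough trajectory, it is rasterized into tiny guidance
# frames, and a world model (stubbed as an upsampler here) would turn the
# guidance into detailed pixels. Illustrative only.
import numpy as np

def coarse_sim(steps=10, dt=0.1, g=9.8):
    """Classical solver: a ball dropped from 5 m, explicit Euler integration."""
    y, v, traj = 5.0, 0.0, []
    for _ in range(steps):
        v -= g * dt                        # gravity updates velocity
        y = max(0.0, y + v * dt)           # velocity updates height, floor at 0
        traj.append(y)
    return traj

def render_guidance(y, height=8):
    """Rasterize the ball's height into a tiny 1-pixel-wide guidance frame."""
    frame = np.zeros(height)
    frame[min(height - 1, int(y))] = 1.0   # mark the cell containing the ball
    return frame

def world_model(frame, scale=4):
    """Stand-in for a generative world model: here it just upsamples the
    guidance; a real model would synthesize realistic pixels around it."""
    return np.repeat(frame, scale)

frames = [world_model(render_guidance(y)) for y in coarse_sim()]
print(len(frames), frames[0].shape)        # -> 10 (32,)
```

The point of the split is trust: the solver guarantees the ball is "roughly here and here" at each step, and the generative model only has to fill in plausible appearance around that constraint.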
0:29:44 Yeah.
0:29:47 I’m speaking with Sanja Fidler.
0:30:06 Sanja is Vice President of AI Research at NVIDIA, and we’re talking about the work that her Spatial Intelligence Lab in Toronto has been doing, along with the evolution of AI, and the models and solvers in the mix, and all the things that go into making these models more accurate so we can rely on them and trust them, Sanja, as you were saying.
0:30:15 And we’ve also talked a little bit about SIGGRAPH, and I mentioned at the top that you gave a special research address alongside some other NVIDians at this year’s SIGGRAPH.
0:30:22 What are some of the notable things from NVIDIA’s presence at the show that maybe we can impart to listeners here?
0:30:26 What are some of the things that they should take away from what NVIDIA did at SIGGRAPH?
0:30:34 Yeah, I mean, this year at SIGGRAPH, we really tried to send a message on physical AI in the keynote.
0:30:42 And the reason is because this is a really important area with big impact, and the SIGGRAPH community has a lot to give.
0:30:42 Yeah.
0:30:43 A lot to give.
0:30:45 A lot of expertise is already there.
0:30:49 The Gaussian splats, the NeRFs, I mean, that all comes out of that community, right?
0:30:53 We discussed simulation is key, like I literally mean simulation is key.
0:31:00 So everyone in the audience should feel empowered to help us in this quest of robotics, you know?
0:31:04 And the cool thing is that it feels also early stage.
0:31:06 Like I said before, it’s open-ended.
0:31:08 We don’t kind of know yet, you know?
0:31:10 We are hypothesizing.
0:31:15 So we really hope the audience kind of connects with, here’s a new challenge for you.
0:31:15 Yeah, yeah.
0:31:19 How can you, you know, what do I do next, right?
0:31:25 And graphics is very mature, but here is a new challenge for you that maybe needs to, you know, think outside the box.
0:31:32 So I think the key to success, you know, we suspect will be the combination of Cosmos.
0:31:35 And so this is NVIDIA’s world foundation model platform.
0:31:40 And this is both video generation, so simulation of the world, as well as reasoning.
0:31:48 Reasoning about the laws of physics, reasoning about all the agents in the scene, the scenarios, and so on, and physics simulation.
0:31:59 So all these three pieces interacting together, you know, that’s our bet of creating really physically, but also semantically accurate simulations of the real world in the future.
0:32:05 So I think that’s really kind of like what I hope the SIGGRAPH audience takes away from the keynote.
0:32:13 Yeah, that spirit of, I mean, it even goes back to what you said about when you met Jensen and he invited you to come work with him, right?
0:32:19 That spirit of collaboration, you know, open source being what it is, conferences, obviously.
0:32:33 But we’ve had guests across all different industries come on the show and talk about how important it is to, you know, share research and trade notes with other people who, on the other side of the world, working in other industries, et cetera.
0:32:43 As AI continues to touch and evolve and change, you know, virtually every industry you can think of, how important is that?
0:32:46 Or what are you getting from this experience of working across so many industries?
0:32:50 And does it feel like AI is kind of bringing industries together?
0:32:59 Or does it feel like, you know, different industries are kind of hunkering down and siloing in their own approach to how they use these emerging technologies?
0:33:03 Yeah, it’s definitely bringing them in, bringing them together, right?
0:33:06 Because the workflows essentially are very similar.
0:33:08 At some point, they’re very similar.
0:33:12 And the difference is the data and the expertise, domain expertise that’s different.
0:33:20 And actually, there is even sharing, you know, how I do autonomous driving versus humanoid robots versus factory simulation architecture.
0:33:38 There’s some commonalities between things that could be shared and, you know, having this kind of like data-driven approach to simulation could really bring industries together and benefit from one another and build tech that can essentially make all of us, all of these industries better.
0:33:40 And open source, you mentioned open source.
0:33:43 I am a believer in open source.
0:33:47 And it’s great to see NVIDIA is also a big believer in open source.
0:33:51 Like I said, you know, in a lot of areas, we’re also still early on.
0:33:56 And that’s the only way to keep progress going, you know, and really build up these capabilities.
0:34:09 Even though in a lot of ways the timeframe we’ve been talking about, the last seven or eight years in particular, isn’t all that long, in AI terms, and particularly in this recent generative AI revolution, it’s a long time.
0:34:17 So you’ve been doing this, you know, almost since the beginning of early object recognition and AI kind of now through to everything we’re talking about today.
0:34:19 What’s next on the horizon?
0:34:32 And is there a breakthrough that you’re either sort of waiting for or, you know, maybe more secretly kind of thinking like, I think this is going to happen soon, whether it’s something that’s particular to your own work that you’re doing or kind of more broadly.
0:34:38 What’s the next big breakthrough in AI that you’re looking forward to or maybe just kind of hoping to see?
0:34:41 Well, I think it’s going to be robots.
0:34:41 All right.
0:34:45 So I started the story with my sister in a cart.
0:34:45 Your sister.
0:34:57 And me dreaming about a robot taking the dog out in the morning, which my parents made me do, and I really like to sleep in the morning.
0:34:59 That was kind of the early dream.
0:35:01 My grandma lived 30 years alone.
0:35:03 My grandpa died quite early.
0:35:15 So the first talks I gave as a faculty all started with a grandma and a cute, WALL-E-like robot in a kitchen, talking to each other, you know,
0:35:23 and the robot helping her. That’s kind of the common thread of: let’s build this technology because it can be really powerful and useful.
0:35:32 And I now believe after many years in this field that we’re likely going to see that in our lifetime.
0:35:32 Yeah.
0:35:39 Robots in some form, autonomous cars are already, you know, out there to some extent, right?
0:35:41 And more is coming.
0:35:45 So that’s that’s the breakthrough I’m looking forward to.
0:35:45 Yes.
0:35:49 Have you ever had a robot in your home with you for a period of time?
0:35:50 You ever lived with a robot?
0:35:54 I have a robot that kind of wipes the floor.
0:35:57 I would look to have something that does more than that.
0:36:05 So, Sonia, as we look to wrap up here, and this has been fantastic again, thank you for taking the time.
0:36:12 What advice would you give to researchers out there who are interested in the work the Spatial Intelligence Lab is doing,
0:36:18 might be interested in collaborating, working with you in some way, joining the lab, collaborating from afar?
0:36:28 And then in particular, what are some of the skills and research areas that you think are becoming increasingly important now and, you know, will continue to be at least for the next few years?
0:36:33 Yeah, actually, the bar that we have is both low and high.
0:36:34 So I’ll explain what I mean.
0:36:40 So I think what we are looking for is people with immense passion.
0:36:45 You know, I feel like I still haven’t lost the passion of the first day.
0:36:48 You know, I wake up and I am excited.
0:36:52 So I think the passion is what drives us forward, the energy.
0:36:54 This is only a podcast, but the passion comes through in your voice.
0:36:56 So yeah, I’d say you’re doing all right.
0:37:03 Yeah, I mean, that’s what drives us because, you know, as a researcher, life is not easy.
0:37:06 Most of the time things don’t work, you know, and that’s basically what we do.
0:37:09 It could be six months, a year where you don’t get that result.
0:37:15 So you’re really that kind of like, you know, passion and energy that makes you keep going.
0:37:21 I think kind of wanting and having the ability to go technically deep is very important.
0:37:29 You know, not jump from one thing to another as things go hard, but like, let’s learn really the fundamentals to the level we need so we can innovate.
0:37:35 And I guess like maybe to my first point, the high level of perseverance, right?
0:37:40 Like we want to keep going and there is no wall thick enough, right?
0:37:43 The rest, I think we can teach people.
0:37:48 Like if you have these basic things, a lot of the other stuff comes along.
0:37:48 Yeah.
0:37:52 So in terms of like, you know, it’s mostly also interest, right?
0:37:58 So we are very interested in this 3D world, modeling and understanding 3D worlds.
0:38:05 So people that are interested and shared passion for the same topic, you know, please contact us.
0:38:07 We would be ready to work with you.
0:38:08 Fantastic.
0:38:16 For listeners who want to learn more about the lab, about the work we’ve been talking about, where are the best places to go online?
0:38:17 Is there a homepage for the lab?
0:38:19 Is it on the NVIDIA site?
0:38:21 Any social media handles to follow?
0:38:23 Where would you send them?
0:38:24 It’s all on the NVIDIA website.
0:38:34 Probably if you search for spatial intelligence lab NVIDIA, or NVIDIA Toronto AI lab, which is our old name, it should pop up.
0:38:35 Yeah.
0:38:35 Fabulous.
0:38:37 Sonia, again, this has been great.
0:38:46 I think that the story, just imagining, you know, your dad telling you those stories as a kid and with your sister and then the advice your grandma gave you.
0:38:48 Overcome your fear, get out there.
0:38:49 Just absolutely fantastic.
0:38:56 Congratulations to you and all the team, everyone you work with for all the work you’ve been doing and SIGGRAPH, of course.
0:39:00 And we really look forward to following your progress in the future.
0:39:01 Best of luck.
0:39:01 Yeah, thanks.
0:39:03 It was really fun talking to you.
Sanja Fidler, VP of AI Research at NVIDIA, joins the AI Podcast to share her journey from early curiosity to leading the Spatial Intelligence Lab in Toronto. Sanja discusses her path through research and what drew her to the world of AI and computer vision. She explains her team’s work on spatial intelligence—teaching AI to understand and create in 3D—and how this research is helping make content creation and simulation more accessible for everyone. She also discusses how breakthroughs in simulation, 3D modeling, and vision language models are powering the future of robotics and autonomous systems. Learn more at ai-podcast.nvidia.com.