What Open Source Teaches Us About Making AI Better – Ep. 278


AI transcript
0:00:16 Hello, and welcome to the NVIDIA AI podcast. I’m your host, Noah Kravitz. Right now, the
0:00:21 world is watching AI evolve faster than ever before. And that progress isn’t just being
0:00:26 fueled by technological breakthroughs in scale. It’s being fueled by human collaboration.
0:00:31 Open source models, open data sets, and shared research are giving developers, enterprises,
0:00:37 and governments the building blocks they need to innovate together. NVIDIA has been part of this
0:00:41 movement from the very beginning, contributing open libraries, publishing data sets and research,
0:00:47 and most recently, sharing families of open models. Which brings us to today’s episode.
0:00:53 We’re talking about Nemotron, specifically unlocking the secret of Nemotron. On the surface,
0:00:58 Nemotron may look like just another open model family. But the real story is how it anchors
0:01:02 NVIDIA’s strategy for building accelerated infrastructure and driving increased adoption
0:01:08 of AI everywhere. Joining us to unpack this open secret are two of the leaders driving this work
0:01:13 forward. Bryan Catanzaro is Vice President of Applied Deep Learning Research at NVIDIA,
0:01:19 and Jonathan Cohen is Vice President of Applied Research at NVIDIA. Bryan and Jonathan are here
0:01:25 today to talk Nemotron. I can’t wait. Gentlemen, welcome to the AI podcast. Thank you so much for
0:01:28 making the time to join us. Thank you for having us. Yeah, it’s great to be here.
0:01:34 So let’s start at the top, and I’ll direct this one to you, Bryan, to get us going, if that’s all right.
0:01:40 What is Nemotron? And as a follow-up, why did NVIDIA decide to build its own family of models
0:01:44 when you already work with essentially every major model builder out there?
0:01:53 Nemotron is NVIDIA’s open technology for artificial intelligence. Nemotron includes models that we
0:02:01 train. It also includes data sets that we release, as well as algorithms and methodologies. And our goal
0:02:08 with Nemotron is to support the community in building customizable AI that can be integrated
0:02:14 deeply and tightly into the beating heart of every business around the world. Our second goal with
0:02:20 Nemotron is to help NVIDIA design systems for deploying and constructing AI. There are a lot of
0:02:29 questions about how AI works that touch the various design decisions that go into building NVIDIA’s
0:02:34 software and hardware systems. And we can answer those questions better because we build Nemotron.
0:02:42 So, you know, ultimately, we’re excited to open up Nemotron even further and continue to put it out
0:02:48 there for the community. We love learning from the community. Nemotron is built in collaboration with
0:02:53 the community, where we learn a lot from what others are doing. And then we try to
0:02:59 contribute what we can back. We think that this is a great opportunity for NVIDIA to support
0:03:01 the AI industry.
0:03:06 Yeah, so Nemotron is a collection of large language models. And it’s probably worth saying,
0:03:14 they’re text models and multimodal LLMs. And we’ve settled on three sizes;
0:03:18 we think of them as weight classes. So we have smaller models that we call Nano.
0:03:24 We have medium-sized models we call Super. And then we have the largest frontier-sized models,
0:03:30 which we call Ultra. So Nemotron collectively refers to everything Bryan said, plus this
0:03:32 family of models we just described.
0:03:40 So how does Nemotron fit into NVIDIA’s broader AI strategy? Because from what I understand,
0:03:45 it’s not just the models, and I say “just” even though the models are a huge effort; it’s also kind
0:03:48 of a cornerstone for growing the ecosystem.
0:03:53 Yeah, well, you know, if you think of NVIDIA as an accelerated computing platform company,
0:03:58 and you ask the question, well, what does an accelerated computing platform mean in this age
0:04:06 of AI? It includes chips, it includes networking, it includes the software stack, but it
0:04:12 also includes the models. And, you know, when we think about what a platform is today, a platform is
0:04:17 all of those components. And if you’re building AI applications, you care about the quality
0:04:21 of the models, but you also care about the performance. Like Bryan mentioned, one of the
0:04:25 reasons that we train Nemotron models is so we can learn; we are pushing the limits ourselves,
0:04:29 so that we can learn and make sure that our platform is the best. But it also means we can
0:04:35 do co-design: we can cooperatively design the model architecture, the software stack, and the hardware,
0:04:40 all of the components together. And we’ve been doing that. And that
0:04:45 gives us opportunities to make things more efficient, lower latency, higher throughput, more energy
0:04:51 efficient, by improving things across that entire stack, up into the model architecture. And so
0:04:55 Nemotron is a really important part of that strategy, as an accelerated computing platform
0:05:00 company, where our success comes from this full-stack co-design and optimization.
0:05:08 Yeah. One thing I wanted to add to that is that these days, there are new things that are part of
0:05:13 accelerated computing that maybe people haven’t considered. So for example, data sets that we use for
0:05:20 pre-training and post-training models have a dramatic effect on how quickly the model converges. In fact,
0:05:27 you know, comparing different revisions of our Nemotron pre-training set, we’ve accelerated
0:05:34 pre-training by a factor of 4x just by having a smarter pre-training data set, which means that
0:05:38 you can actually train a much smarter model with the same amount of compute.
0:05:45 Yeah. What makes one data set better, more optimized to help the model converge faster than
0:05:51 another? Well, what we’re trying to do with LLMs is build something intelligent that can help us solve
0:05:57 problems. It can answer questions. It can reason. It turns out that if you just take all of the
0:06:02 text that humankind or computers have ever produced on the internet and train an LLM on it, that’s kind
0:06:08 of where the community started many years ago. But it turns out that’s not the most intelligent way of
0:06:15 building AI, because a lot of that text isn’t adding very much intelligence. And so every organization that
0:06:21 builds LLMs spends an enormous amount of effort and compute in understanding their data set, refining it,
0:06:29 rephrasing it using synthetic data generation. And the effort that we put into these data sets has an
0:06:35 enormous impact on how quickly the models train. And also on the overall strength of the model once it’s
0:06:42 trained. And so these days, I believe that the data sets that we release as part of Nemotron are an
0:06:48 important part of NVIDIA’s accelerated computing efforts, because it’s not really possible to think
0:06:54 about how fast a system is for training in isolation. If you’re training on a data set that’s not very smart, it’s
0:06:59 going to take an enormous amount of compute to get to the same amount of intelligence as if you were
0:07:05 training on a data set that was much more polished. And so that’s kind of the genius of
0:07:12 accelerated computing is that we try to understand the problem from first principles. And we try to
0:07:20 optimize the entire stack end to end. And these days, it seems pretty important that NVIDIA’s
0:07:23 accelerated computing platform includes Nemotron.
0:07:23 Right.
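
As a toy illustration of the kind of curation being described: a sketch of a heuristic quality filter for pretraining documents. The scoring rules here are invented for the example; real pipelines rely on learned quality classifiers, deduplication, and LLM-based synthetic rephrasing at enormous scale.

```python
# Toy pretraining-data filter: keep documents that pass simple quality
# heuristics. Real curation uses learned classifiers, not rules like these.
def quality_score(doc: str) -> float:
    words = doc.split()
    if len(words) < 50:                                # too short to teach much
        return 0.0
    unique_ratio = len(set(words)) / len(words)        # penalize repetition/boilerplate
    alpha_ratio = sum(w.isalpha() for w in words) / len(words)
    return 0.5 * unique_ratio + 0.5 * alpha_ratio

def curate(corpus: list[str], threshold: float = 0.6) -> list[str]:
    return [doc for doc in corpus if quality_score(doc) >= threshold]

spam = "buy now " * 100
good = ("Curating pretraining data means measuring every document, removing "
        "near duplicates, filtering boilerplate, and rephrasing weak text with "
        "synthetic generation so that each token the model sees carries signal. "
        "Better data lets the same compute budget reach higher quality, which "
        "is why data work is now a core part of accelerated computing.")
print(len(curate([spam, good])))  # 1: only the information-dense document survives
```

The point of the sketch is the economics, not the heuristics: every low-signal document removed is compute not wasted, which is how a better data mix can translate into the 4x convergence speedup described above.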
0:07:31 I can give another example that’s interesting. So reasoning, the way these models reason is they
0:07:36 generate thinking tokens, right? You ask it a question, and then it generates a lot of tokens as
0:07:42 it thinks through the answer. And there are very clear examples where you can generate a lot of tokens and
0:07:47 not actually make a lot of progress towards the answer, or you can be more efficient, generate fewer
0:07:51 tokens, and make more progress. And again, from the same perspective of accelerated computing,
0:07:57 you don’t really care, you know, whether it generated 10,000 tokens. Well, you do care: you care
0:08:01 whether you can generate the same quality answer in 2,000 tokens instead of 10,000 tokens;
0:08:06 that’s a 5x speedup. And so that’s also part of the accelerated computing story. So all of these
0:08:08 are opportunities we have to make things faster.
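
The arithmetic behind that claim is worth making concrete. Holding answer quality fixed, decoding cost scales roughly linearly with the number of tokens generated, so token efficiency converts directly into speedup (the numbers below are the illustrative ones from the conversation):

```python
# Reasoning efficiency expressed as an effective speedup: at fixed answer
# quality, decoding cost is roughly proportional to thinking tokens generated.
def reasoning_speedup(baseline_tokens: int, efficient_tokens: int) -> float:
    return baseline_tokens / efficient_tokens

# A same-quality answer in 2,000 tokens instead of 10,000 is a 5x win,
# indistinguishable, from the user's perspective, from 5x faster hardware.
print(reasoning_speedup(10_000, 2_000))  # 5.0
```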
0:08:14 Yeah, exactly. Accelerated computing has never just been about how many arithmetic operations per second
0:08:19 you can perform. It’s really about what capabilities you can provide.
0:08:26 Yeah. Yeah. And I think the key to NVIDIA’s historic success is, as a company, we’ve always focused very
0:08:31 much, and had deep expertise, on the actual end applications people care about. You know, whether it’s computer
0:08:37 graphics or high-performance computing, or deep learning, or now modern AI, it’s really thinking about what’s the end
0:08:41 goal, and how do you build a platform that gets you to that goal with the least amount of
0:08:43 time you have to wait: you know, lowest latency, highest throughput.
0:08:49 You talked some about the openness, the collaboration, and co-design. That’s such an important piece of
0:08:56 this: open source is a big part of Nemotron. What does it mean? And maybe Bryan, I’ll ask you this first,
0:09:02 but either of you: what does it mean to call Nemotron one of the most open AI development efforts that
0:09:11 we’ve ever seen? Well, we really think that it’s important for AI to be trusted and widely deployed.
0:09:17 And in order for that to happen, we think it’s important that enterprises have the option to
0:09:24 understand the data sets and the technologies behind AI and fine tune them for their own problems and then
0:09:30 integrate them very tightly into the software and systems that they use to solve problems for their
0:09:37 markets. We think AI is not a one-size-fits-all solution. And we’ve seen in the past, you know,
0:09:44 many instances where open platform technologies really allowed different industries to develop
0:09:51 differentiated solutions for the problems they face. You know, for example, the internet as an open technology
0:09:56 had really different implications for different industries, like healthcare versus retail. You
0:10:01 know, the way that those organizations used the internet to change the work that they do was
0:10:07 quite different. But the fact that the internet was an open technology allowed many companies,
0:10:14 many industries, to think about solving their problems in a new way using the internet. And when we
0:10:19 think about AI, it seems obvious that enterprises need that ability as well. You know,
0:10:26 the world’s most important and valuable data always has the most sensitivity about it. And so we think
0:10:34 it’s important to support enterprises as they learn how to deploy AI, so that they can do it in a way that
0:10:41 respects their work, their privacy, and the important ways that they go about problem solving,
0:10:48 uniquely for their business. And so we think it’s really important that there exists an open
0:10:55 foundation for organizations around the world to build and deploy AI. And Nemotron is how we’re
0:10:56 contributing to that.
0:11:01 I can just add one thought: from the perspective of accelerated computing, if you think
0:11:06 about it, you know, we come up with some way to make a chip faster. How does the world consume the
0:11:11 benefits of that acceleration? Well, in the case of a chip, you buy a chip and you get
0:11:18 the benefits. But what if we come up with a technique that makes models more efficient at thinking, or a
0:11:24 data set mix that saves you time in training? How does the rest of the world receive the
0:11:29 benefits of that? Like, in what form do you package it? And I think the only answer is we have
0:11:34 to teach everyone what we did by sharing it through open source: open-weight models,
0:11:40 sharing the data sets, explaining how they work, sharing the algorithms. So I think it’s natural
0:11:45 that open source is the delivery mechanism for the technology that’s going into our platform.
0:11:53 So, a bit of a hypothetical: I’m an IT leader or a business leader at an
0:11:58 organization, and I’m hearing what you guys are saying, and we want to do this. We have
0:12:03 specialized needs in our industry, and we have troves of our data that represent our company’s
0:12:08 intelligence and our special way of doing things that has brought us success. And we’re ready to
0:12:14 embrace the AI age and transform. We could use Nemotron, and I’m going to walk through
0:12:19 this, so point me to when I get it wrong: we could use Nemotron to take an open source model,
0:12:26 and we could customize it, train it on our company data and the rest of our industry’s data, to
0:12:32 help it understand what we do and the problems we’re trying to solve in our industry. Nemotron
0:12:38 could help us add reasoning capabilities and that sort of thing to it. And then we have,
0:12:42 and I don’t want to misuse the term sovereign, and if you guys want to, talk about
0:12:49 sovereign AI, but we would then have our own model, customized and adapted to our business,
0:12:54 our industry, the way we do things, our data. And it’s ours, because we took an open model and
0:12:57 we trained it. And so we don’t have to worry about the sensitive data
0:13:03 being out in some commercial model somewhere, or what have you, because it’s our model now.
0:13:07 I think that’s close; that’s one aspect. I mean, there are many aspects. So, for
0:13:13 example, you might say, NVIDIA trained a Nemotron model, and it’s great, but since you’ve
0:13:16 disclosed all your training data, we looked at your training data and, for whatever reason, we have some
0:13:22 policies where this data we can’t use. And we can say, that’s fine. Everything you need to reproduce
0:13:27 what we did is there. You can train your own model, excluding that data. Or you say, well, I like the
0:13:32 data, but the mix is wrong. I don’t know, I’m a sovereign project and it really needs to be very
0:13:37 good at speaking this language and understanding this culture, and that data wasn’t as represented in your
0:13:42 training set as I want it to be. Everything that we did is transparent, and so you can make these
0:13:47 modifications yourself. I mean, that’s one aspect. Fantastic. Yeah. Right. NVIDIA has released
0:13:53 data sets, recipes, alignment techniques alongside the models. So along these same lines of building
0:13:58 trust and transparency, why is all of that important? Why is this full level of transparency
0:14:03 important for the end users, you know, to be able to customize and deploy safely?
0:14:12 Well, I think ultimately, if you don’t know what’s in a technology, it’s harder to trust it. And every
0:14:17 business has different ways of thinking about the problems they’re solving. They have different problems.
0:14:22 And I think it’s important as we get more sophisticated about deploying AI and we integrate
0:14:29 it more tightly into business problems around the world, for businesses to be able to inspect,
0:14:35 you know, how is this AI built? And, you know, therefore I can build trust that it’s going to
0:14:40 help my business solve problems. You know, the integration is a really important point as
0:14:47 well. So with Nemotron models, there’s a really broad spectrum of integration. You can run it
0:14:53 locally on a machine without any internet. You could also run it through an API in the cloud and
0:15:00 everything in between. You can deal with your business’s sensitive data using the same data
0:15:07 management and security protocols that your business already has. And I think for a lot of applications
0:15:11 of AI, that level of customizability and introspection is going to be essential.
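
As a sketch of what that local customization could look like in practice, here is a minimal supervised fine-tuning step using the Hugging Face transformers library. The model ID and documents are placeholders, and a production recipe would add label masking for padding, LoRA/PEFT adapters, evaluation, and alignment stages; this only illustrates that the weights and the data both stay on your own infrastructure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholders: substitute a real open-weight checkpoint and your own corpus.
model_id = "nvidia/<nemotron-model-name>"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # many causal LMs ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)
model.train()

# Your private documents never have to leave your infrastructure.
company_docs = ["<internal document text>", "<another internal document>"]
batch = tokenizer(company_docs, return_tensors="pt", padding=True, truncation=True)

# One illustrative training step: next-token prediction on your own data.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss = model(**batch, labels=batch["input_ids"]).loss  # causal LM loss
loss.backward()
optimizer.step()
```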
0:15:19 I also want to say that I think there’s a real big benefit to open technologies in the sense that
0:15:27 they tend to develop faster. So NVIDIA believes that helping AI grow creates opportunity for us. And we
0:15:32 think that one of the best ways of helping AI grow is to contribute in an open way to the community.
0:15:41 I think when you consider a technology that’s being developed kind of independently by a few
0:15:46 different organizations, but they’re not able to share very much about what they’re doing,
0:15:51 there’s obviously going to be a lot of reinvention that has to happen and the progress is going to be
0:15:59 slower. And so if we are able as a community to come together, you know, to contribute ideas,
0:16:06 data, and models to each other and learn from each other, I think we will progress faster. And, you know,
0:16:11 we’ve seen that over the past couple of years as various organizations have been contributing
0:16:18 to the open technologies for AI. It’s really helped the community move forward. And, you know, like,
0:16:25 for example, OpenAI just released GPT-OSS. That was a fantastic thing for the field. Alibaba has been
0:16:32 doing some great work with Qwen models. Obviously, Meta’s family of Llama models has been extraordinarily
0:16:40 helpful in helping the field grow and develop. And at NVIDIA, we know that when AI grows,
0:16:44 it’s opportunity for everyone. It’s opportunity for businesses, because they can solve new problems,
0:16:48 and it’s opportunity for us, because we work with every business that’s building AI.
0:16:52 Yeah, I mean, a good example of that playing out is our own research groups.
0:16:56 Often, if you have some idea for a way to improve a model,
0:17:00 we’ll just take one of the existing open-weight models, not necessarily Nemotron,
0:17:06 whichever gives you the best vehicle for trying out your idea, right, prove it in some way and
0:17:10 publish a paper, release the result, right? So, like, we are building on all the work from
0:17:13 these other organizations that release open-weight models all the time as well.
0:17:18 And, you know, this is no news to you guys, or probably many listeners of the show,
0:17:23 but that same sentiment has been echoed so many times over the past couple of years
0:17:28 in particular, by guests we’ve had from all industries and walks of research and life: you know,
0:17:32 the more we’re collaborating, the faster we move as a whole.
0:17:33 Yeah.
0:17:38 Our guests today are Bryan Catanzaro and Jonathan Cohen. They’re both from NVIDIA. Bryan is Vice
0:17:44 President of Applied Deep Learning Research, while Jonathan serves as Vice President of Applied Research.
0:17:51 And they’re here talking to us about NVIDIA Nemotron, a family of open models and open technology.
0:17:58 We’ve been talking about the importance of open technologies to the AI community in general,
0:18:04 to NVIDIA, the learning that goes into informing really the whole stack: the hardware, the models,
0:18:10 the software, the connectivity, networking, everything. And the data sets, as Bryan was talking about,
0:18:15 and how it all really comes together to make things advance faster and more efficiently,
0:18:20 broadly speaking. Nemotron has been a huge effort at NVIDIA, with many teams working
0:18:26 together, and they still are, to bring this to life, from advanced research to commercially licensed models
0:18:32 and data sets now. Can you guys talk about the pipeline from research to production models,
0:18:37 what that’s like, what it’s been like for Nemotron? Well, it is a huge effort, and it takes a lot of
0:18:44 people with different talents coming together to build Nemotron. We’ve organized the project around
0:18:49 basically the different stages of development that a model has to go through, pre-training,
0:18:55 post-training, alignment, and so forth, as well as different functional areas, like for example,
0:19:04 long context recall or image understanding. So within each of these areas, we have multiple teams working
0:19:11 together, some of which are very researchy, very theoretical, and others are very engineering-focused,
0:19:17 and the whole spectrum in between. I would say it’s a great honor to be part of a project where people
0:19:24 are coming together to build something like this. It’s also a big challenge, trying to get so many
0:19:31 brilliant minds pointed in the same direction. I think one of the central challenges facing
0:19:36 every AI development effort around the industry these days is: how do we work together to build one amazing
0:19:42 thing, as opposed to building a hundred small things? And that’s really something that’s been inspiring to
0:19:50 watch come together. Yeah, if you compare it with a large-scale software effort, there’s this famous
0:19:57 observation called Conway’s Law: the communication patterns observed within a
0:20:02 piece of software tend to mirror the communication patterns of the organizational structure that built
0:20:08 that software. And training a model, I mean, Conway’s Law is definitely an issue, but it’s just
0:20:12 a very different endeavor. It’s not like I build a module and you build a module and we have a nice,
0:20:20 clean interface. Somehow, all of these things have to get combined together. You know, in
0:20:24 Bryan’s examples, image understanding and long-context recall have to get combined
0:20:31 into a single training recipe and a single dataset mix. And so the modularity is, I think,
0:20:37 less than in software engineering. And so this idea that you can just decompose it and have lots of
0:20:42 teams with sort of clean interfaces between them doesn’t really work as well. And so I think there’s
0:20:46 a real struggle in scaling up an effort like this to a very large team to do something really big.
0:20:48 Is there a new paradigm emerging?
0:20:49 It’s an interesting question.
0:20:50 You know, organizing, yeah.
0:20:55 I wonder, you know, whether over the next five, ten years there’ll be some new law named after someone,
0:21:00 some management principle here. It’s an interesting thing that we’ve certainly been
0:21:03 thinking about. But it does present these challenges. I think one of the most important
0:21:10 principles that we’ve settled on is you just need a lot of internal openness and transparency.
0:21:15 You have a lot of people across the company and outside of the company working on all
0:21:20 these problems. You have to solicit all their ideas and you have to encourage them all to work together.
0:21:27 That’s the only way forward. And so that just takes a very mature culture and, you know,
0:21:34 good leadership and egoless operation, and everyone being really motivated,
0:21:36 at the end of the day, by the work.
0:21:42 I would say also that one of the amazing things about AI is that it’s such a general technology
0:21:46 that it really changes the way that we do
0:21:55 AI, you know, it used to be like 20 years ago, when I was a grad student, that it was common for
0:22:02 people to build state of the art models in computer vision on their own, like one graduate student on
0:22:07 their own to build a model that was state of the art in some important area of computer vision.
0:22:15 And, you know, that’s kind of how we were trained as PhD students: go be brilliant on your own.
0:22:24 Well, with modern AI, the best results come from using industrial-scale equipment and, you know,
0:22:30 general models that can then be taught how to solve important problems. But that requires working
0:22:36 together. So one of the first things that AI has changed is the development of AI itself,
0:22:41 and organizations that can figure out how to collaborate and work together succeed. And,
0:22:47 you know, that’s one of the reasons also that we really believe in Nemotron as an open project:
0:22:54 because we’ve seen how openness internally has made it possible for us to solve whole classes of new
0:23:01 problems with AI. We believe that as Nemotron and other open efforts come together, bringing more
0:23:06 ideas and more force to bear on the development of AI, the results will be stronger.
0:23:11 Jonathan, NVIDIA has a history of building end-to-end products, you know,
0:23:17 self-driving comes to mind, gaming, of course, SuperPODs, but then disaggregating them for the world
0:23:22 to use. Does Nemotron follow that same pattern in your mind? And if so, how?
0:23:27 Yeah, I think so. I think when we talk about that, and Jensen talks about this a lot, you know,
0:23:33 what we mean is that our solutions, the things we ultimately build, are very complicated,
0:23:39 integrated systems with many layers and many components. And on the one hand, we need to build
0:23:43 the whole thing ourselves because it doesn’t work unless you build the whole thing yourself. So we
0:23:47 need to train a whole model at the end of the day. You know, it doesn’t make sense for us to release,
0:23:55 like, I don’t know, a way to make a reasoning recipe without actually training a model to do reasoning,
0:23:59 you know, like you have to do these things and put the whole thing together. But at the same time,
0:24:05 I think it’s very important that we put all of the components into the ecosystem and allow people to
0:24:10 consume the parts that they want and not consume the parts that they don’t want. So this is how our
0:24:16 hardware is, you know: we design data-center-scale computers at this point, but we don’t sell them as
0:24:21 a single data center. We design the whole thing, we build the whole thing, then we chop it up into
0:24:28 pieces and we sell it through, you know, normal sales channels, and our customers are free to
0:24:32 take the parts they want and replace the rest. And so it’s truly an ecosystem. You know, if you don’t
0:24:36 like our CPU, use a different CPU. You don’t like the storage, use a different
0:24:40 storage. You don’t like this networking, use a different networking. And we’re
0:24:44 open and interoperable with all these things. And it’s a tremendous engineering challenge to
0:24:49 work that way. But I think it’s why we’ve been so successful: because it allows us to
0:24:55 harness the power of the entire computing industry, because we’re not really locking anyone
0:25:00 out at all, right? We’re including everybody. And so when we think about large language models,
0:25:04 I guess we’re thinking in the same way. So we’re going to develop techniques and anyone is free to
0:25:08 take them. You know, other companies that train large language models for a living are free to take
0:25:11 anything we built. They probably won’t take all of it, but they’re free to take anything.
0:25:15 They want to take the software. That’s great. They want to take some of our data sets. That’s great.
0:25:20 They want to take the software and the data sets and some of the training recipes, but modify them.
0:25:23 That’s great. They want to take the finished models. That’s great. So in that sense, I think,
0:25:28 philosophically, that’s absolutely how we think about products; it’s how we think about
0:25:31 hardware, how we think about software, and it’s now how we also think about foundation models.
0:25:38 And I think that’s one of the things that makes NVIDIA unique as a big tech company: although
0:25:46 we do full-stack integration, we don’t dictate to our customers how that technology
0:25:52 is going to be deployed, used, or even assembled. We know that it’s not a one-size-fits-all problem.
0:25:58 Right. And so we’re happy to support companies of all shapes and sizes in every industry as they
0:26:05 develop and deploy AI. And because NVIDIA has this orientation, this supportive orientation,
0:26:11 where we understand that it’s not one-size-fits-all, that actually is the secret to why
0:26:17 we are able to collaborate with all of these companies. And we want to do that with
0:26:23 AI technology as well. Kind of switching gears a little bit, but still talking along technical lines.
0:26:29 Can you share any exciting technical breakthroughs that came about during the Nemotron development
0:26:36 process, and what they might mean going forward, specifically in terms of efficiency and deployment,
0:26:42 but really take it as broad as you like. Yeah. Well, NVIDIA is thinking about
0:26:48 AI from an accelerated computing perspective. And we have a belief that the faster we can make a model,
0:26:54 the smarter it’s going to be. And this follows just because, clearly, if we’re able to think
0:26:59 quicker, then we can get more thoughts in the same amount of time, and that can help us solve problems,
0:27:04 you know? So we’re bringing this perspective of accelerated computing to AI in kind
0:27:10 of a unique way. A couple of things just from the past few months that we’ve demonstrated that
0:27:16 I’m really excited about. One is, we released a model we call Nemotron Nano V2.
0:27:22 It is a hybrid state space model. So it’s not a pure transformer model; it uses this other
0:27:27 technology for reasoning over sequences, called a state space model, that has some pretty big efficiency
0:27:34 benefits. You know, on the same hardware, compared with other models of the same intelligence, we’re about
0:27:39 six to 20 times faster. And, you know, we’re pretty excited about the
0:27:45 capabilities of this model, but it’s just the beginning. You know, we have really ambitious
0:27:51 plans to continue evolving the architectures behind Nemotron, as well as the systems that
0:27:58 are used to build and deploy it. Another thing that we were able to show recently is we trained a
0:28:05 Nemotron model using four-bit floating point arithmetic, and we were able to get world-class results,
0:28:12 which is really exciting, because using only four bits per parameter of the neural network
0:28:19 can be dramatically more energy efficient than using other representations. And we know that the
0:28:25 development of AI is going to be constrained by the efficiency with which we can train it and deploy it.
0:28:31 And so showing people new algorithms that are more efficient is going to help
0:28:37 push the industry forward. And, you know, it’s not enough to say, hey, I’ve got this system,
0:28:42 it’s really fast at low-precision arithmetic, if no one understands how to use it. Right?
0:28:48 So Nemotron is our way of demonstrating to the community, like, hey, you can take advantage
0:28:53 of this amazing low-precision hardware to train a world-class model if you follow this algorithm.
0:28:59 Right. It’s amazing that four bits is enough. Like, if you just think about
0:29:03 how low-resolution that is, the fact that it works is pretty incredible.
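
An aside on the state space model idea mentioned above: for intuition, here is a toy sketch of the per-channel linear recurrence such a layer computes. Production hybrid architectures (Mamba-style selective SSM layers interleaved with attention) make the parameters input-dependent and use fused, hardware-efficient scans; every name and constant below is illustrative. The key property is that the state is fixed-size, so cost grows linearly with sequence length instead of quadratically as in pure self-attention.

```python
import numpy as np

# h_t = a * h_{t-1} + b * x_t ;  y_t = c * h_t  (per channel)
# O(sequence_length) time with O(1) state, versus attention's
# O(sequence_length^2) cost and ever-growing key/value cache.
def ssm_scan(x, a, b, c):
    seq_len, dim = x.shape
    h = np.zeros(dim)
    y = np.empty_like(x)
    for t in range(seq_len):
        h = a * h + b * x[t]   # fold the new token into the fixed-size state
        y[t] = c * h           # read out
    return y

x = np.random.randn(1024, 64)   # (tokens, channels)
a = np.full(64, 0.9)            # decay per channel, |a| < 1 for stability
b = np.ones(64)
c = np.ones(64)
print(ssm_scan(x, a, b, c).shape)  # (1024, 64)
```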
0:29:09 So maybe can you rephrase, for folks who might be listening, myself included, who don’t fully
0:29:15 get the ramifications, what doing four-bit arithmetic and these results really mean?
0:29:22 Well, one fun analogy from my childhood comes from video games. I don’t know if you remember
0:29:27 the 8-bit Nintendo. I mean, yes, of course. Then there was the 16-bit Nintendo system. And it was
0:29:33 like, wow, there are so many more colors with the 16-bit Nintendo. It’s like, wow, you know,
0:29:38 look at that smooth gradient, right? So if you only have eight bits, you can represent 256
0:29:46 numbers. With 16 bits, you can represent about 65,000 numbers. With four bits, you can represent 16.
0:29:52 Right. So it’s a very, very small set of options to pick from. Like, if you’re going to draw a picture
0:29:58 using four-bit numbers, it’s actually going to be pretty hard to make it look smooth. Right. Of course,
0:30:03 what we’re doing with our four-bit training hardware and software isn’t as straightforward as
0:30:09 just using exactly one of 16 numbers for every parameter in the neural net. The numbers actually
0:30:15 come in blocks, and the blocks have scaling factors attached to them in hierarchical ways. And that’s
0:30:20 all accelerated by software and hardware that we’ve built in Transformer Engine and in our Blackwell
0:30:26 GPU generation. And so it’s kind of amazing that, you know, we’re able to take this raw
0:30:34 material that’s very coarse and rather small, and we’re able to make it flexible enough to
0:30:39 train a world-class neural network. Right. But on some level, I always like to think of this as:
0:30:45 you can have any number you want, as long as it’s one of these 16, and somehow, you know, it still works.
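
As a rough illustration of the block-and-scale idea (this is not NVIDIA's actual NVFP4/Transformer Engine format, which uses floating-point 4-bit encodings and hierarchical scale factors; it is a toy integer version), here is a sketch of quantizing a tensor in blocks, with one scale per block so that 16 levels can still cover very different value ranges:

```python
import numpy as np

def quantize_blockwise_4bit(weights, block_size=32):
    # Toy block-wise 4-bit quantization: each block shares one scale factor,
    # and every value is snapped to one of 16 signed integer levels.
    flat = weights.reshape(-1, block_size)
    # One scale per block, chosen so the block's largest magnitude maps to 7.
    scales = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0                      # avoid dividing by zero
    q = np.clip(np.round(flat / scales), -8, 7)    # 16 levels: -8 .. 7
    return q.astype(np.int8), scales

def dequantize(q, scales):
    return (q * scales).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_blockwise_4bit(w)
print("mean abs error:", np.abs(w - dequantize(q, s)).mean())
```

The per-block scales are what make a 16-level grid usable: a block of small weights gets a small scale and therefore fine resolution, while a block containing an outlier gets a large scale, rather than one global scale wrecking resolution everywhere.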
0:30:52 It is pretty miraculous. Yeah. Amazing. As we wrap up the conversation, let’s look ahead to the future
0:30:58 of Nemotron. What can developers and enterprises expect next? You’ve talked about it a
0:31:02 little bit, some of the things coming through the pipeline and that you’re working on, but what can
0:31:08 devs and enterprises expect from Nemotron? And, you know, perhaps more importantly, how can they start
0:31:13 to engage with Nemotron right now? Well, I can just say, you know, you should expect us to train some big
0:31:18 models. We’ve recently trained some smaller models; we’ll be training some bigger
0:31:23 models. You can expect us to incorporate more multimodal technology. From NVIDIA,
0:31:28 we have some of the world’s best, well, I guess the world’s best open-weight speech recognition
0:31:33 models at this point. That technology hasn’t really been incorporated into Nemotron, and we’re working
0:31:38 towards adding audio and these kinds of capabilities. So I think there’s a lot of just really cool technology.
0:31:44 We’re working on bringing all of the best technology across NVIDIA and concentrating it in
0:31:49 Nemotron. I think that’s, you know, that’s something people can look forward to. Or Bryan,
0:31:54 what would you say? Well, yeah, I would also reinforce how important reasoning is to
0:31:59 Nemotron. It’s been a core part of Nemotron development for the last year. And we were super
0:32:05 proud that we were able, for example, to take Nemotron reasoning and add it to Meta’s Llama family.
0:32:10 We know that there’s a lot more work to do to make reasoning even stronger. And we’re really
0:32:16 excited to do that. Bryan, John, this has been a great, really informative conversation,
0:32:21 and just to hear the two of you talk about Nemotron from the inside out, just a treat.
0:32:27 So for folks who are listening and want to get started with Nemotron, the models are available
0:32:31 now. Yeah. So our models are available on Hugging Face. You can download them.
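
For example, here is a minimal sketch of pulling one of the open-weight checkpoints with the Hugging Face transformers library. The model ID below is a placeholder; substitute an actual repository name from NVIDIA's collection at huggingface.co/nvidia.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder ID: pick a real Nemotron checkpoint from huggingface.co/nvidia.
model_id = "nvidia/<nemotron-model-name>"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain why dataset quality affects training efficiency."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```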
0:32:37 Perfect. You can also experience all of them on build.nvidia.com, and download them there as
0:32:43 well. Excellent. We do have a landing page on nvidia.com for Nemotron, and we’re busy filling
0:32:49 it out right now, gathering all of the Nemotron content together in one place. So I would go there.
0:32:54 Excellent. And a work in progress, I’m sure, as the content, like the technology itself, evolves and
0:33:00 evolves. Again, John, Bryan, both of you, I know there’s a tremendous amount on your plates with
0:33:05 Nemotron and everything else. So we appreciate the hour, coming on and, you know, helping shout
0:33:09 from the rooftops, telling the world about all the fantastic work you and your teams have been doing.
0:33:16 Congratulations, and all the best going forward. You know, as you said, not just inside of NVIDIA, but
0:33:20 collaborating with the community and working to raise all the boats together.
0:33:21 Thanks for having me. Thanks, everyone.

Learn how NVIDIA’s Nemotron family of open source models is redefining accelerated computing. NVIDIA’s Bryan Catanzaro and Jonathan Cohen discuss the breakthroughs in efficiency, openness, and collaboration — sharing how Nemotron empowers developers and enterprises to innovate, customize, and trust AI for every industry.

Browse the full AI Podcast archive: ai-podcast.nvidia.com
