AI transcript
0:00:06 The first-party app is a really great way to get 800 million WAUs or whatever now.
0:00:07 A tenth of the globe, right?
0:00:09 Yeah, yeah, 10% of the globe uses it.
0:00:10 Every week, every week.
0:00:14 Yeah, even within OpenAI, the thinking was that there would be like one model that rules them all.
0:00:15 It’s like definitely completely changed.
0:00:19 It’s, like, becoming increasingly clear that there will be room for a bunch of specialized models.
0:00:21 There will likely be a proliferation of other types of models.
0:00:25 Companies just have giant treasure troves of data that they are sitting on.
0:00:28 The big unlock that has happened recently is with the reinforcement fine-tuning.
0:00:33 With that setup, we’re now letting you actually run RL, which allows you to leverage your data way more.
0:00:37 OpenAI sells weapons to its own enemies.
0:00:43 Every day, thousands of startups build on OpenAI’s API, many trying to compete directly with ChatGPT.
0:00:45 It’s the ultimate platform paradox.
0:00:48 Enable your competitors or lose the ecosystem.
0:00:51 Sherwin Wu runs this high-wire act.
0:00:57 He leads engineering for OpenAI’s developer platform, the API that powers half of Silicon Valley’s AI ambitions.
0:01:04 Before OpenAI, he spent six years at Opendoor teaching machines to price houses where a single wrong prediction could cost millions.
0:01:10 Today, Sherwin sits down with a16z general partner Martin Casado to explore something nobody expected.
0:01:14 That the models themselves are becoming anti-disintermediation technology.
0:01:16 You can’t abstract them away.
0:01:22 And every attempt to hide them behind software fails because users already know and care which model they’re using.
0:01:25 It’s changing everything about how platforms work.
0:01:36 Sherwin and Martin talk about why OpenAI abandoned the dream of one model to rule them all, how they price access to intelligence, and why deterministic workflows might matter more than pure AI agents.
0:01:40 Sherwin, thanks very much for joining.
0:01:42 So we’re being joined by Sherwin Wu.
0:01:47 It’d be great, actually, if you provided the long form of your background as we get into this, just for those that may not know you.
0:01:51 I mean, I view Sherwin as one of the top AI thought leaders, so I’m really looking forward to this.
0:01:51 Yeah, yeah.
0:01:52 Thanks for having me.
0:01:53 I’m really excited to be on the podcast.
0:01:57 Yeah, so a little bit more of my background, so maybe we can start from present day and go backwards.
0:02:01 So I currently lead the engineering team for OpenAI’s developer platform.
0:02:03 So the biggest product in there, of course, is the API.
0:02:06 Is there more for the developer platform than the API?
0:02:07 It’s kind of assumed that it’s synonymous.
0:02:11 Well, so I also think about other things that we put into our platform side.
0:02:15 So technically, our government work is also like offering and deploying this into different areas.
0:02:16 Yeah, like I’ve talked about.
0:02:18 Oh, like so you have like a local deployment.
0:02:18 Yeah, yeah.
0:02:20 So we actually do have a local deployment.
0:02:20 I didn’t know that.
0:02:22 At Los Alamos National Lab. It’s super cool.
0:02:23 I went to visit it.
0:02:25 It’s very different than what I’m used to.
0:02:28 But yeah, in a classified supercomputer with our model running there.
0:02:29 So there’s that.
0:02:31 But like mostly at the API.
0:02:32 Did you go to Los Alamos?
0:02:33 We did.
0:02:34 Yeah, I did go to Los Alamos.
0:02:34 It’s great.
0:02:35 They showed us around.
0:02:37 They showed us some of the historic sites.
0:02:39 I just worked at Livermore, man.
0:02:40 So I’ve got like an image.
0:02:41 Oh, yeah, yeah, yeah.
0:02:42 My first job out of college, so.
0:02:43 Right, right, right.
0:02:44 Maybe you sell to them next.
0:02:45 Yeah, well, we hope to.
0:02:46 Yeah, so I work on the developer platform.
0:02:49 I’ve been working on it for around three years now.
0:02:50 So I joined in 2022.
0:02:54 I was basically hired to work on the API product, which at the time was the only product that
0:02:54 OpenAI had.
0:02:57 And I’ve basically just worked on it the entire time.
0:03:00 I’ve always been super interested in the developer side and kind of like the startup story of this
0:03:00 technology.
0:03:03 And so it’s been really, really cool to kind of see this evolve.
0:03:05 And so that’s my time at OpenAI.
0:03:08 Before OpenAI, I was at Opendoor for around six years.
0:03:10 I was working on the pricing side.
0:03:11 My general background before.
0:03:12 That’s such a dissonance.
0:03:13 Yeah, yeah, yeah.
0:03:15 Pricing at Opendoor to, like, running the API.
0:03:16 It’s such a different…
0:03:19 It’s been fascinating, actually, for me to see the differences between the companies.
0:03:21 Like, they’re run so differently.
0:03:23 They both have Open in the name, so there’s some overlap.
0:03:24 But that’s pretty much it.
0:03:27 But yeah, I was there for around six years working on the pricing team.
0:03:29 So our team basically would run the ML models.
0:03:32 This is actually pricing the assets at Opendoor?
0:03:33 Yeah, yeah.
0:03:33 The inventory.
0:03:34 Exactly.
0:03:36 So yeah, Opendoor would buy and sell homes.
0:03:40 And their main product was buying homes directly from people selling them with all cash offers.
0:03:43 And so my team was responsible for how much we would pay for them.
0:03:46 And so it was a really fun, like, ML challenge.
0:03:49 It had a huge operational element to it as well, because not everything was automated, obviously.
0:03:52 But it was a really fascinating technical challenge.
0:03:57 Is there any sense of that on the API side, like, GPU capacity buying, or is it just totally unrelated?
0:04:01 On the API side, there is a small bit of, like, how we price the models.
0:04:05 But I don’t think we do anything as sophisticated as Opendoor.
0:04:07 Opendoor is just, like, such a hard problem.
0:04:09 It’s, like, such an, like, expensive asset.
0:04:11 The holding costs are very expensive.
0:04:13 You’re, like, holding onto it for, like, months at a time.
0:04:14 There’s, like, a variability in the holding time.
0:04:17 And that’s a long tail of potential things that could go wrong.
0:04:18 Long tail, yes.
0:04:20 And, like, you try to think about it from a portfolio perspective.
0:04:24 And, like, if you’re holding onto one of them for, like, two years, it blows everything up, like, goes negative.
0:04:26 So it’s a very, very different challenge.
0:04:26 Six years?
0:04:27 Different challenge, yeah.
0:04:28 Yeah, six years there.
0:04:28 Wow.
0:04:29 Lots of unknowns.
0:04:31 Saw a lot of the booms.
0:04:31 Saw a lot of the struggles.
0:04:33 And then we IPO’d for a lot.
0:04:36 But, yeah, just in general, it was a very great experience.
0:04:44 I think, for me, it also just had such a, like, business-operations, by-the-book type of culture, whereas OpenAI is, like, very different.
0:04:49 What’s so interesting, I was just thinking about it now, it’s, like, even for a company like that, like, you don’t think about it as a tech company.
0:04:52 But if there is a deep technology problem, it actually is the pricing, right?
0:04:52 Yes.
0:04:53 It’s actually an ML problem.
0:04:54 Yeah, that’s what attracted me to the company.
0:04:55 It’s not, like, the website.
0:04:55 It’s not the platform.
0:04:56 Yeah, yeah, yeah, yeah.
0:04:56 It’s not the platform.
0:04:57 It’s not the API.
0:04:58 It’s literally that.
0:04:59 Yep, yep, yep.
0:05:00 And that’s what attracted me to it.
0:05:01 I think that’s what was interesting.
0:05:06 It’s also a way, like, lower margin business than OpenAI, because you’re, like, making a tiny spread on these homes.
0:05:07 Yeah, right.
0:05:09 They would talk about, like, basis points, like, eating bps for breakfast and all that.
0:05:10 Yeah, yeah.
0:05:12 Anyways, I was at Opendoor for around six years.
0:05:16 And then before that was my first job out of college, which was at Quora, Adam D’Angelo’s company.
0:05:16 No kidding.
0:05:18 Yeah, so I was working on the news feed.
0:05:21 So worked on news feed ranking for a bit, worked on the product side.
0:05:26 That was actually my first exposure to, like, actual ML and industry and learned a lot from the engineers at Quora.
0:05:28 We basically hired a lot of the early feed engineers from Facebook.
0:05:29 Was Charlie still there when you were there?
0:05:32 Charlie was not there when I was there.
0:05:32 Okay, so you joined, like, right after he left.
0:05:33 Yeah, yeah, yeah.
0:05:35 And that was a really legendary team.
0:05:38 It’s still known to be kind of this super iconic founding team.
0:05:38 Yeah, yeah.
0:05:40 The early founding team was really solid.
0:05:44 But I still think that even while I was there, I still, like, am amazed at the quality of the talent that we had.
0:05:44 Phenomenal.
0:05:47 I think this was, like, when the company was, like, 50 to 100 people.
0:05:50 But, yeah, like, a bunch of the perplexity team was there.
0:05:51 Dennis was on the feed team with me.
0:05:51 Yeah, yeah.
0:05:52 Johnny Ho, Jerry Ma.
0:05:53 Yeah, that’s right.
0:05:54 This is crazy.
0:05:59 And then Alexandr Wang, of Scale, now MSL, you know, was there between high school and college.
0:06:00 It was an incredible team.
0:06:02 I think I kind of took it for granted when I was there.
0:06:02 Yeah.
0:06:03 It was a good group.
0:06:04 How did you get to Quora?
0:06:05 What did you study in undergrad?
0:06:08 Yeah, so before that, I was at MIT for undergrad.
0:06:13 I studied computer science, did, like, one of those combined bachelor’s and master’s degrees, kind of, like, crammed it in.
0:06:17 I ended up at Quora because I got into what we call an externship there.
0:06:20 So at MIT, you actually get January off.
0:06:22 So there’s, like, the fall semester and then January’s off.
0:06:22 That’s cool.
0:06:24 And then you have the spring semester.
0:06:26 And so it’s called independent activities period.
0:06:27 So some people just, like, take classes.
0:06:28 Some people just do nothing.
0:06:31 But some people will do, like, month-long internships.
0:06:34 And some crazy companies will offer a month-long internship to a college student.
0:06:35 Yeah, yeah.
0:06:37 And it really is just kind of, like, a way to get people in.
0:06:39 Did you come out here from Boston?
0:06:39 Yeah, yeah.
0:06:40 Kind of that way.
0:06:40 Yeah, it was crazy.
0:06:42 So you had to apply.
0:06:45 I remember, yeah, this is, I think, 2013, January or something.
0:06:45 You had to apply.
0:06:48 And I remember the Quora internship was the one that just paid the most.
0:06:49 They paid, I think it was, like, $8,000, $9,000.
0:06:51 And I was like, wow, that’s, like, all for a month.
0:06:53 And you’re just, like, kind of ramping up, like, half the time.
0:06:54 I can eat for a year.
0:06:55 Yeah, yeah.
0:06:56 As a college student, it was like, great.
0:06:58 And, yeah, they would kind of, like, fly you out here.
0:07:00 So I did the interviews and then luckily got an offer.
0:07:02 And so, yeah, I came out for January.
0:07:04 That was right when they moved into their new Mountain View office.
0:07:08 And I basically, yeah, honestly just ramped up for, like, two weeks
0:07:11 and then have two weeks of good productivity working on the feed team.
0:07:14 So was that, like, user-facing product work?
0:07:18 Yeah, I distinctly remember my externship project for those two weeks
0:07:20 was just to, like, add a couple features to our feature store.
0:07:22 And that would make its way into the model.
0:07:26 I remember my mentor there was Tudor, who’s now running, I think,
0:07:27 it’s called Harmonic Labs.
0:07:28 Yeah, yeah, yeah.
0:07:29 Crazy team.
0:07:29 Crazy team.
0:07:33 I mean, by the way, I think it’s one of the untold stories of Silicon Valley
0:07:35 is, like, how good that original team at Quora is.
0:07:37 I mean, a lot of them are still there and still good.
0:07:39 But the diaspora from Quora is everywhere.
0:07:40 Yeah, yeah.
0:07:43 That’s actually how I ended up at OpenAI, too, kind of fast-forwarding from there.
0:07:45 Because OpenAI kind of kept a quiet profile-ish.
0:07:48 I’d always kind of kept tabs on them because a bunch of the core people I knew
0:07:49 kind of, like, ended up there.
0:07:50 I kept kind of, like, checking in on it.
0:07:52 And they were like, yeah, something crazy is happening here.
0:07:53 You should definitely check it out.
0:07:55 So, yeah, I definitely owe a lot to Quora.
0:07:58 But, yeah, part of the reason why I went there versus other options as a new grad
0:08:00 was the team was just so incredible.
0:08:02 And I just felt like I could learn a ton from them.
0:08:04 I didn’t think about everything afterwards.
0:08:07 I was just like, man, if I could just absorb some knowledge from this group of people,
0:08:08 it would be great.
0:08:08 Awesome.
0:08:13 So one place I wanted to start is something that I find very unique about OpenAI is it’s
0:08:16 both a pretty horizontal company.
0:08:17 Like, it’s got an API.
0:08:21 Like, I would say we’ve got this massive portfolio of companies, right?
0:08:24 And I would say a good fraction of them use the API.
0:08:29 And then it’s also a vertical company in that you’ve got full-on apps, right?
0:08:29 Yep.
0:08:31 Like, everybody uses ChatGPT, for example.
0:08:36 And so you’re responsible for the API and kind of the dev tool side.
0:08:40 So maybe just to begin with, is there an internal tension between the two?
0:08:42 Like, is that a discussion?
0:08:48 Like, the API may, whatever, it may help a competitor to, like, the vertical version?
0:08:51 Or is it not, if things are just growing so fast, it’s not an issue?
0:08:53 I would just love how you think about that.
0:08:55 By the way, it’s very unusual for companies to have both of that.
0:08:57 These two things this early, it’s very unusual.
0:08:58 Yeah, yeah, I completely agree.
0:09:00 I think there is some amount of tension.
0:09:04 I think one thing that really helps here is Sam and Greg, just from a founder perspective,
0:09:08 have, since day one, just been very principled in the way in which we approach this.
0:09:12 They’ve always kind of told us, we want ChatGPT as a first-party app.
0:09:13 We also want the API.
0:09:16 And the nice thing is, I think they’re able to do this because, at the end of the day,
0:09:19 it kind of comes back to the mission of OpenAI, which is to create AGI
0:09:21 and then to distribute the benefits as broadly as possible.
0:09:24 And so if you interpret this, you want it on as many surfaces as possible.
0:09:27 And the first-party app is a really great way to get, you know,
0:09:29 I don’t know, it’s like 800 million WAUs or whatever now.
0:09:31 800 million wows?
0:09:32 Yeah, yeah.
0:09:34 It’s pretty, it’s actually mind-boggling to think about.
0:09:38 I don’t think, many people listening to this don’t understand how big that is.
0:09:40 Yeah, it’s crazy, yeah.
0:09:43 It’s got to be, like, actually historic for the time it’s taken to get to 800 million.
0:09:47 It’s historic, it’s also just, like, yeah, the amount of time and just, like, how much we’ve
0:09:48 got to scale up.
0:09:49 A tenth of the globe, right?
0:09:52 Yeah, yeah, 10% of the globe uses it weekly.
0:09:53 Every week, every week.
0:09:53 Yeah, yeah.
0:09:54 And it’s growing, and it’s growing.
0:09:57 So, like, at some point, you know, it’ll hit, like, you know, it’ll go even higher than that.
0:10:00 And so, so, yeah, like, obviously the reach there is unmatched.
0:10:04 But then also just, like, being able to have a platform where we can reach even more than just
0:10:04 that.
0:10:08 Like, one thing we talk about internally sometimes is, like, what does our end user
0:10:09 reach from the API?
0:10:11 Like, it’s actually, like, really, really broad.
0:10:14 It might even be bigger, it’s hard to say because ChatGPT is growing so quickly.
0:10:18 But, like, at some points, it was definitely larger than ChatGPT.
0:10:21 And the fact that we’re able to get Tappen in all of this and get the reach that we want,
0:10:22 I think, is really good.
0:10:24 But, yeah, I mean, there’s definitely some tension sometimes.
0:10:27 I think the, I think it’s come up in a couple of places.
0:10:29 I think one of them is on the product side.
0:10:32 So, as you mentioned, you know, sometimes there are competitors kind of, like, building on
0:10:38 our platform who, you know, might not be happy if ChatGPT launches something that competes
0:10:39 with them.
0:10:39 Yeah.
0:10:43 I mean, that’s a tale as old as the cloud or operating systems or whatever.
0:10:49 So, like, that’s, you know, I think it’s more, like, does ChatGPT worry about the competitor,
0:10:51 you know, type thing.
0:10:53 Like, you know, you enabling a competitor.
0:10:54 Yeah, yeah.
0:10:58 So, I mean, the interesting thing is, like, I would say not particularly, mostly just because
0:10:59 we’ve been growing so quickly.
0:11:02 It’s like, you know, it’s such a, you know, force right now.
0:11:03 Yeah, yeah.
0:11:05 Growth solves so many, so many different things.
0:11:08 And, like, the other way we think about it is, like, everyone’s kind of building
0:11:10 around AGI, building towards AGI.
0:11:12 Of course, there’s going to be some overlap here.
0:11:16 So, yeah, I mean, but, but I would say, like, at least in my position, I feel more of this
0:11:18 tension from the customer, like, the API customers themselves, right?
0:11:21 It’s like, oh, my gosh, you know, you’re like, are you going to build this thing that I’m
0:11:22 working on?
0:11:25 Yeah, that story is as old as computer systems.
0:11:27 There’s never not been a computer platform that didn’t have that problem.
0:11:30 So, okay, so I kind of go back and forth on this one.
0:11:39 I want to try one out on you, which is the problem historically with, you know, offering a core
0:11:42 services and APIs, you can get disintermediated, right?
0:11:43 And so I can build on top of it.
0:11:48 But then, you know, the user doesn’t know, like, whatever, I build on top of the cloud,
0:11:51 but I disintermediate from the cloud, and then I can switch to another cloud or whatever.
0:11:57 And it occurs to me that that’s kind of hard to do with these models, because the models are
0:11:58 so hard to abstract away.
0:12:01 Like, they’re just, they’re just unruly, right?
0:12:06 If you try to, like, have traditional software drive them, they just don’t kind of manage very
0:12:06 well.
0:12:12 So part of me thinks that it’s almost like this, like, anti-disintermediation technology
0:12:16 that you kind of have to expose it to the user directly.
0:12:17 Does that make sense?
0:12:21 And so I’m wondering if, like, so even if I think ChatGPT is really just trying to expose
0:12:24 the model to the user, the API is kind of just trying to expose the model to the user.
0:12:28 So I think there’s almost this argument that’s like, if the real value is in the models, it
0:12:31 doesn’t really matter how you get it to them, because it’s going to be very tough for someone
0:12:36 to abstract it away in the classic sense of computer science of, like, they don’t know
0:12:36 that they’re using the model.
0:12:39 Like, you always know you’re using GPT-5.
0:12:42 Yeah, and the interesting thing is, I think, like, the entire industry kind of has slowly
0:12:43 changed their mind around this, too.
0:12:46 I think, like, in the beginning, we kind of thought, like, oh, these are all going to be
0:12:46 interchangeable.
0:12:47 It’s just like software.
0:12:48 Yeah, yeah, exactly.
0:12:50 So the piece of infrastructure that you can just swap out, yeah.
0:12:53 But I think we’re learning this on the product side with, like, you know, the GPT-5 launch
0:12:57 and, like, 4.0 and, like, how so many people liked 0.3 and 4.0 and all of that.
0:12:58 I felt that.
0:13:00 Yeah, I felt that when it changed.
0:13:02 I’m like, you’re not as nice to me.
0:13:04 Like, I like the validation.
0:13:05 Yeah.
0:13:09 It’s actually funny, because I really loved GPT-5’s personality, but I think it’s, like,
0:13:11 the way I used, you know, chat GPT was very utilitarian.
0:13:12 Oh, I see.
0:13:14 It’s, like, you know, mostly for work or just, like, information.
0:13:15 Yeah, I’ve definitely come around, just so you know.
0:13:17 But, like, I actually felt a dissonance when it changed.
0:13:20 It’s, like, it’s, like, there’s this emotional thing that goes on.
0:13:25 But it’s almost like it’s an anti, you know, disintermediation technology.
0:13:27 Like, you kind of have to show this to the user.
0:13:27 Yeah.
0:13:27 Yeah.
0:13:31 And then you see a lot of, like, you know, more successful products like Cursor, like, do this directly,
0:13:33 especially the coding products where users want more control.
0:13:37 We’ve even seen some, like, you know, like, more general consumer products do this.
0:13:39 And so it’s definitely been true on the consumer side.
0:13:42 The interesting thing is I think it’s also been true on the API side.
0:13:43 And that’s also something that I think—
0:13:44 No, no, exactly.
0:13:45 No, that’s exactly what I’m saying.
0:13:49 So, like, the argument could be that I could use the API to disintermediate you.
0:13:55 But, like, you don’t see that happening because it’s so hard to put a layer of software between a model and a person.
0:13:57 You almost have to expose the model.
0:13:58 Yes, yes.
0:14:04 And I think, if anything, I think the models are, like, almost, like, diverging in terms of, like, what they’re good at
0:14:05 and, like, their specific use case.
0:14:07 And I think there’s going to be more and more of this.
0:14:15 But, yeah, basically, it’s been surprisingly hard for—or, like, the retention of people building on our API is, like, surprisingly high,
0:14:17 especially when people thought you could just kind of swap things around.
0:14:21 You might have, you know, like, even tools that help you swap things around.
0:14:25 But, yeah, the stickiness of the model itself has been surprising.
0:14:29 And do you think that is because of a relationship between the user and the model?
0:14:38 Or do you think it’s more of a technical thing, which is, like, my evals work for, like, open AI and, you know,
0:14:40 and, like, the correctness maintains or—
0:14:40 Yeah, yeah.
0:14:41 I think it’s both.
0:14:45 So I think there’s definitely an end-user piece here, which is what we’ve heard from some of our customers.
0:14:48 Like, they just get familiar with the model itself.
0:14:51 But I also think there’s a technical piece, which is, like, the—
0:14:54 Also, as a developer, especially with startups, you’re, like, really going deep with these models
0:15:00 and, like, really, like, iterating on it, trying to get it really good within your particular harness.
0:15:01 You’re iterating on your harness itself.
0:15:03 You’re giving it different tools here and there.
0:15:07 And so you really do end up, like, building a product around the model.
0:15:12 And so there is a technical piece where, you know, as you kind of keep building with a particular product,
0:15:19 like GPT-5, you’re actually, like, building more around it so that your product works uniquely well with that model.
0:15:27 So I use Cursor and just for, like, a lot of stuff, like, writing blogs and, like, you know, we’re investors.
0:15:29 And I use it for—sometimes for coding.
0:15:32 And it’s remarkable how many models I use in Cursor.
0:15:35 So, like, literally my go-to model is GPT-5.
0:15:35 I love GPT-5.
0:15:38 I think it’s a phenomenal, like, you know.
0:15:41 And then, like, I use, like, max mode with GPT-5 for planning.
0:15:45 And then—but, you know, like, I mean, I like the tab complete model that’s in Cursor.
0:15:48 And, like, you know, the new model they just dropped is for, like, some basic—you know, some stuff.
0:15:49 Yeah, the Composer one.
0:15:50 Like, yeah, the Composer one’s good.
0:15:51 Yeah.
0:15:54 And so, like, you know—
0:15:56 And I think that, like, kind of reflects this, too.
0:15:59 Because it’s, like, it’s a particular model for each particular use case.
0:15:59 Yes, yeah, yeah, yeah.
0:16:01 Like, I’ve talked to a bunch of people who’ve used the new Composer model.
0:16:04 And it’s just really good for, like, fast—
0:16:04 It’s super fast.
0:16:07 Like, first pass, like, keep you in flow kind of thing.
0:16:10 And then you kind of, like, bubble out to another model if you want, like, you know, deeper thinking or something like that.
0:16:13 I mean, I literally sit down and ask GPT-5 to help me plan something out.
0:16:15 And it’s really good at that.
0:16:15 Yep.
0:16:19 And then, you know, like, when I’m coding and I’m doing, like, the quick chat thing, then I’ll use Composer.
0:16:19 Yeah.
0:16:22 And if there’s, like, whatever, there’s, like, some crazy bug or something like that.
0:16:28 Like, so, you know, do you remember, like, in the early days of all of this, where, like, there’s going to be one model?
0:16:37 And, like, I mean, like, even, like, investors, like, we will never invest in a model company because, like, there will only be one model and it’s going to be AGI.
0:16:40 But, like, the reality, it feels like there’s this massive proliferation of models, like you said before.
0:16:41 They’re doing many things.
0:16:45 And so maybe two questions, maybe too blunt or too crass.
0:16:46 But the first one is, what does that mean for AGI?
0:16:50 And the second one is, what does that mean for OpenAI?
0:16:54 Like, does that mean that, like, you end up with a model portfolio?
0:16:56 Do you select a subset?
0:16:58 Do you think this all gets superseded by some God model in the future?
0:16:59 Like, how does that play out?
0:17:01 Because it’s against what most people thought.
0:17:04 Most people thought this is all going towards one large model that does everything.
0:17:04 Yeah.
0:17:08 I think the crazy thing about all this is just, like, how everyone’s thinking has just changed over time.
0:17:08 Totally.
0:17:12 Like, I distinctly remember this, like, and the crazy thing is not that long ago.
0:17:14 It’s just, like, three, like, two or three years ago.
0:17:19 I remember, like, even within OpenAI, the thinking was that there would be, like, one model that rules them all.
0:17:22 And it’s like, why would you, I mean, like, this kind of goes to the fine-tuning API product.
0:17:24 It’s like, why would you even have a fine-tuning product?
0:17:26 Why would you even want to, like, iterate on it?
0:17:28 There’s going to be this one model that just subsumes everything.
0:17:33 And that was also, like, the most simplistic view of what AGI will look like.
0:17:33 Yeah.
0:17:38 And, yeah, it’s, like, definitely completely changed since then.
0:17:44 But then the other thing to keep in mind is, like, it might continue to change, like, even from where we are today.
0:17:44 Yeah.
0:17:50 But it’s, like, becoming increasingly clear, I think, that there will be room for a bunch of specialized models.
0:17:52 There will likely be a proliferation of other types of models.
0:17:55 I mean, you see us do this with, like, the codex model itself.
0:18:01 We have, like, you know, we have, like, GPT-4.1 and, like, 4o and, like, 5 and all of this.
0:18:04 And so I do think there’s room for all of this.
0:18:06 I don’t think that’s bad for what it’s worth.
0:18:12 Like, if anything, I think, you know, as we’ve tried to move towards AGI, things have just been very unexpected.
0:18:15 And I think the market just evolved and the product portfolio evolves because of that.
0:18:17 So I don’t think it’s a bad thing at all.
0:18:19 What I do think it means—
0:18:22 You can easily argue it’s very good for OpenAI and very good for, like, the model companies.
0:18:27 Yeah, because you don’t have, like, you know, winner-take-all, consolidated dynamics, right?
0:18:31 I mean, you just have a healthier ecosystem, a lot more solutions you can provide, you know.
0:18:35 Yeah, and as the ecosystem grows, it generally is helpful.
0:18:39 Like, this is one thing we actually think about a lot, too, is as the general, like, AI ecosystem grows,
0:18:41 like, OpenAI just stands to benefit a lot from this.
0:18:46 And this is also why we’ve, like, some of our products we’ve even started opening up to other models, right?
0:18:50 Like, our Evals product now allows you to bring in other models to all of this.
0:18:53 We think it’s, like, any rising tide generally helps us here.
0:18:56 But, yeah, I think as we move into a world where there will be a bunch more models,
0:19:00 this is why we’ve kind of invested in our model customization product with the fine-tuning API,
0:19:03 with the reinforcement fine-tuning, opening that up as well.
0:19:09 It’s also part of why we open-sourced GPT-OSS as well, because we want to be able to, you know, facilitate.
0:19:13 I want to talk about that in just a bit, because the open-source is actually very interesting.
0:19:16 I mean, actually, I thought the open-source model was great.
0:19:16 Yeah.
0:19:18 But clearly, it’s something that a company has to be careful with.
0:19:18 Yeah.
0:19:24 But before that, I want to talk a little bit about the fine-tuning API.
0:19:28 So, I’ve noticed that you are moving towards kind of more sophisticated use of things,
0:19:34 like, you know, like fine-tuning, which, you know, in a way, you could read that as a bit of a capitulation,
0:19:39 that, like, you know, there is product-specific data,
0:19:43 and there’s product-specific use cases that a general model won’t do, to your point, right?
0:19:45 So, like, as opposed to proliferation model, you do that.
0:19:48 It seems like a lot of that data is actually very, very valuable, right?
0:19:56 And so, you know, to what extent is there, like, interest in almost a tit-for-tat,
0:20:02 where you can, like, expose, you know, the ability to get product data into fine-tuning,
0:20:09 and then you also benefit from that data because the vendors provide it to you,
0:20:14 versus, like, this is 100%, you know, like, they keep their own data,
0:20:15 and there’s kind of no interest in that.
0:20:19 Because it feels to me like the next level of scaling, this is kind of where we’re at,
0:20:21 and so I’m just kind of curious how…
0:20:23 Yeah, so, I mean, maybe even, like, taking a step back,
0:20:27 the main reason why we even invested in a fine-tuning API in the very beginning
0:20:33 is, one, there’s been huge demand from people to be able to customize the models a bit more.
0:20:35 It kind of goes into, like, prompt engineering, and also, like,
0:20:37 I think the industry’s changed their mind on that as well, like, it’s evolved.
0:20:39 But the second thing is exactly what you said,
0:20:44 which is the companies just have giant treasure troves of data that they are sitting on
0:20:48 that they would like to utilize in some fashion in this AI wave.
0:20:51 And you can, you know, the simple thing is to put it in, like, you know,
0:20:53 some, like, vector store, like, do RAG with it or something.
0:20:53 Yeah.
0:20:55 But there’s also, you know, if they have a more technical team,
0:20:57 they do want to see how they can use it to customize the models.
0:21:01 And so that is actually the main reason why we’ve invested in this.
0:21:06 The interesting thing was way back, kind of back in, like, 22, 23,
0:21:09 our fine-tuning offering was, I’d say, like, too limited
0:21:12 so that it was very difficult for people to tap into and use this data.
0:21:16 So it was just, like, an SFT, like, a supervised fine-tuning API.
0:21:17 And, like, we’re like, oh, you can kind of use it,
0:21:20 but in practice it really is only useful for, like,
0:21:23 it’s honestly just, like, instruction following plus-plus.
0:21:24 You, like, kind of change the tone.
0:21:25 You’re just, like, really, like, instructing it.
0:21:29 But I think the big unlock that has happened recently
0:21:30 is with the reinforcement fine-tuning model
0:21:35 because with that setup, we’re now letting you actually run RL,
0:21:36 which is more finicky and it’s, like, harder
0:21:38 and, you know, like, you need to invest more in it.
0:21:40 But it allows you to leverage your data way more.
0:21:43 By the way, this is just a naive question for me,
0:21:48 which is, it feels, from just my understanding from my own portfolio,
0:21:49 it feels like there’s two modalities of use.
0:21:51 One of them is I’ve got a treasure trove of data
0:21:53 that I’ve had for a long time,
0:21:55 and I create my model on that treasure trove of data,
0:21:57 and all that happens offline, and then I deploy that.
0:21:59 There’s another one, which is, like,
0:22:00 I actually have the product being used in real time.
0:22:02 I’ve got a bunch of users.
0:22:02 Yeah.
0:22:05 And, like, I can actually get much closer to the user.
0:22:07 I can kind of A-B test and decide which data,
0:22:10 and, like, it’s kind of more of a near-real-time thing.
0:22:15 Is, like, is this focus on, like, more product stuff
0:22:16 or more treasure trove?
0:22:18 So the dream with the fine-tuning API
0:22:19 was that we should be able to handle both, right?
0:22:21 It’s like, we actually had this dream,
0:22:22 and we have this whole, like, Laura set up
0:22:24 with the fine-tuning inference
0:22:25 where we should just be able to scale
0:22:27 to, like, millions and millions of these fine-tuned models,
0:22:28 which is usually what would happen
0:22:30 if you had, like, this online learning thing.
0:22:30 Yeah, yeah, yeah, exactly.
0:22:33 In practice, it’s mostly been the form, right?
0:22:34 In practice, it’s mostly been, like, the offline data
0:22:36 that they’ve, like, already created
0:22:38 or they are creating with experts or something
0:22:39 rather than, like, live data from their product,
0:22:41 that they’re able to use here.
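(To unpack the LoRA point above: in a setup like that, each fine-tune is just a small low-rank delta applied on top of shared, frozen base weights, which is why serving millions of variants is plausible; a request for a given customer only swaps in a tiny adapter. A toy sketch, with all shapes and numbers illustrative:)

```python
# Toy sketch of why a LoRA setup makes per-customer fine-tunes cheap to
# serve: every variant shares the frozen base weight W and only adds a
# small low-rank delta (alpha/r) * B @ A. Shapes are illustrative.
import numpy as np

d, r = 1024, 8                       # hidden size, LoRA rank (r << d)
W = np.random.randn(d, d) * 0.02     # shared base weight, loaded once

def make_adapter(seed):
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(r, d)) * 0.01   # per-customer: 2*r*d params,
    B = np.zeros((d, r))                 # ~1.5% of d*d in this toy case;
    return A, B                          # B starts at zero, as in LoRA init

def forward(x, adapter, alpha=16):
    A, B = adapter
    # The base matmul is shared across all customers; the adapter only
    # adds a cheap low-rank correction on top.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = np.random.randn(1, d)
y_customer_42 = forward(x, make_adapter(42))   # swap adapters per request
```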
0:22:43 But the main thing I was trying to say
0:22:44 around the reinforcement fine-tuning API
0:22:47 is it kind of changes the paradigm away
0:22:49 from just, like, small incremental,
0:22:51 like, tone improvements, which is what SFT did,
0:22:54 to actually improving the model
0:22:55 to potentially SOTA level
0:22:58 on a particular use case that you know about.
0:22:59 Like, that’s where people have really started
0:23:02 using the reinforcement fine-tuning API.
0:23:05 And that’s why it’s gotten more uptake.
0:23:07 Because if the discussion is less, like,
0:23:09 hey, I can make this model, you know,
0:23:11 not, like, speak in a certain way better,
0:23:11 it’s less compelling.
0:23:13 But if it’s, like, hey, for, like,
0:23:15 you know, medical insurance coding
0:23:17 or for, like, coding planning,
0:23:17 agentic planning or something,
0:23:19 you can create the world’s best model
0:23:21 using your data set with RFT,
0:23:22 then it becomes a lot more…
0:23:25 And will you ever, like, or maybe do you,
0:23:27 will you ever, like, find ways
0:23:28 to get access to that data?
0:23:29 Like, you know, like,
0:23:30 Listen, if I had the data
0:23:31 and I wanted cheap GPUs,
0:23:32 I’d trade you for it.
0:23:33 Like, I don’t know.
0:23:35 Yeah, I mean, we’ve talked about this
0:23:36 and we’ve actually been piloting
0:23:37 some pricing here, too,
0:23:38 where it’s, like,
0:23:40 because this data is, like, really helpful
0:23:42 and it’s kind of hard to get.
0:23:44 And if you actually build
0:23:46 with a reinforcement fine-tuning API,
0:23:48 you can actually get discounted inference
0:23:50 and potentially free training, too,
0:23:50 if you’re willing to share the data.
0:23:52 It’s always kind of, you know,
0:23:53 it’s up to the customer there.
0:23:54 But if they do,
0:23:55 it is helpful for us
0:23:57 and there will be benefits
0:23:58 for the customer as well.
0:23:58 That’s awesome.
0:24:00 Okay, you said that
0:24:02 the views on prompt engineering have changed.
0:24:02 Yeah.
0:24:04 Actually, I wasn’t aware of that.
0:24:05 All the other things,
0:24:05 I was aware of.
0:24:06 This one, I wasn’t.
0:24:06 Yeah, I mean,
0:24:08 I think the prevailing view,
0:24:09 this is back in 2022.
0:24:11 I remember I was talking to so many people
0:24:11 and they were basically,
0:24:12 I mean, this is similar
0:24:15 to, like, the single model AGI view as well,
0:24:15 which is, like,
0:24:17 like, prompt engineering
0:24:18 is just not going to be a thing
0:24:19 and you’re just not going to have to think
0:24:20 about what you’re putting
0:24:22 in the context window in the future.
0:24:23 Like, the model would just be good enough
0:24:24 and it’ll just, like, know.
0:24:26 It’ll know what you need to do.
0:24:28 Yeah, that’s definitely not a thing.
0:24:28 Yeah, but, like,
0:24:30 I don’t know,
0:24:30 maybe people forget it,
0:24:30 but, like,
0:24:31 that was, like,
0:24:32 a very common belief back then
0:24:33 because, like,
0:24:33 the scaling laws
0:24:34 or whatever,
0:24:35 something with the scaling laws
0:24:35 and, like,
0:24:36 you’ll just mind meld with the model
0:24:37 and, like,
0:24:37 you just, like,
0:24:39 prompting and, like,
0:24:39 instruction following
0:24:40 will be so good
0:24:41 that you won’t really need to do it.
0:24:42 And if anything,
0:24:43 like, yeah,
0:24:43 it’s, like,
0:24:44 clearly been wrong.
0:24:44 Yeah, yeah, yeah.
0:24:46 But it is interesting
0:24:48 because I think it’s a slightly different world
0:24:53 relative to the, you know,
0:24:54 like, GPT-3.5 or something.
0:24:54 Yeah.
0:24:56 But I think the name of the game now
0:24:57 is less on, like,
0:24:58 prompt engineering
0:24:59 as we had thought about it
0:24:59 two years ago.
0:25:00 It’s more of, like,
0:25:01 it’s, like,
0:25:02 the context engineering side
0:25:02 where it’s, like,
0:25:03 what are the tools you give it?
0:25:04 What is, like,
0:25:05 the data that it pulls in?
0:25:06 When does it pull in the right data?
0:25:07 Well, this is very interesting.
0:25:07 I mean,
0:25:08 to reduce it to, like,
0:25:10 an almost absurdly simplistic level,
0:25:10 like,
0:25:13 the weird thing about RAG,
0:25:14 for example,
0:25:15 the classic use of RAG
0:25:15 is, like,
0:25:16 you’re using, like,
0:25:17 cosine similarity
0:25:19 to choose something
0:25:20 that you’re going to feed
0:25:21 into a superintelligence, right?
0:25:22 So, like,
0:25:22 you know,
0:25:22 you’re like,
0:25:23 I’m going to randomly
0:25:24 It’s like insulting almost, yeah.
0:25:24 I’m going to, like,
0:25:26 randomly grab this thing
0:25:27 based on, like,
0:25:28 fucking embedding space.
0:25:28 It doesn’t really,
0:25:29 you know,
0:25:29 and, like,
0:25:30 and then, you know,
0:25:31 when you want the superintelligence
0:25:32 to decide the thing to do,
0:25:33 and so it’s, like,
0:25:34 pushing intelligence
0:25:35 in that retrieval
0:25:37 clearly is something
0:25:38 that makes a lot of sense.
0:25:38 It’s almost like
0:25:40 pushing the intelligence out
0:25:40 in a way.
0:25:40 Exactly.
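(The pattern being poked at here, spelled out: classic RAG embeds the query, ranks stored chunks by cosine similarity, and stuffs the top hits into the context window, so a fixed geometric lookup decides what the model gets to see. A self-contained toy sketch; embed() is a stand-in, not a real embedding model:)

```python
# The "classic RAG" pattern: embed the query, rank stored chunks by
# cosine similarity, and prepend the top hits to the prompt. Everything
# here is a self-contained toy; embed() is a hash-seeded stand-in whose
# scores carry no real semantics, it just shows the mechanics.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)          # unit-normalize the vector

def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    # On unit vectors, cosine similarity reduces to a dot product.
    scored = sorted(chunks, key=lambda c: float(embed(c) @ q), reverse=True)
    return scored[:k]

docs = ["refund policy ...", "pricing tiers ...", "api rate limits ..."]
context = "\n".join(top_k("how do refunds work?", docs))
# `context` then gets stuffed into the prompt for the model.
```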
0:25:42 And to be fair,
0:25:42 I think, like,
0:25:43 RAG was kind of introduced
0:25:44 when the models were, like,
0:25:45 it was, like,
0:25:45 pre-reasoning models.
0:25:46 It was, like,
0:25:47 you only had to kind of,
0:25:47 like,
0:25:48 one shot to, like,
0:25:48 do this,
0:25:49 and it wasn’t that smart.
0:25:50 But now that we do have
0:25:51 the reasoning models,
0:25:51 now that we have,
0:25:51 I mean,
0:25:52 if you, like,
0:25:54 one of my favorite models
0:25:54 is actually 03
0:25:55 because it was, like,
0:25:57 one of the most diligent models.
0:25:57 I use 03.
0:25:58 It would just, like,
0:25:59 do all these tool calls,
0:26:00 and it’s, like,
0:26:01 really the intelligence itself
0:26:02 trying to, like,
0:26:03 do the, you know,
0:26:04 tool calls or RAG
0:26:05 or anything like that
0:26:06 or write the code
0:26:07 to execute.
0:26:09 And so the paradigm
0:26:09 has shifted there,
0:26:10 but, yeah,
0:26:10 because of that,
0:26:11 I think, like,
0:26:11 context engineering,
0:26:12 prompt engineering,
0:26:16 okay,
0:26:17 so you have API,
0:26:18 so you have the API,
0:26:18 which is horizontal,
0:26:20 you’ve got ChatGPT
0:26:21 and other products
0:26:21 which are vertical.
0:26:22 We haven’t even talked
0:26:22 about pixels.
0:26:24 This is all just language.
0:26:26 Are agents a new modality?
0:26:27 Is that something else?
0:26:29 Like, you know,
0:26:32 like a codex or…
0:26:32 What do you mean
0:26:33 by modality here?
0:26:33 Like,
0:26:35 I mean,
0:26:36 they feel both vertical
0:26:37 and horizontal to me
0:26:38 in a way.
0:26:38 Like, to me,
0:26:39 ChatGPT is a product,
0:26:40 right?
0:26:41 It’s like it’s a product
0:26:42 and, like,
0:26:43 my mom uses it, right?
0:26:43 Yep.
0:26:44 And an API
0:26:45 is a dev thing.
0:26:46 You know,
0:26:46 you kind of give it
0:26:47 to a developer
0:26:47 and, like,
0:26:48 a CLI is kind of
0:26:49 somewhere in between to me.
0:26:50 It’s like,
0:26:51 is it a product?
0:26:52 Is it, like,
0:26:53 it is horizontal?
0:26:53 Like,
0:26:55 how is it handled internally?
0:26:56 Is it a totally separate team
0:26:58 that does agents or…
0:26:59 No, so it’s,
0:27:01 yeah,
0:27:02 it’s interesting
0:27:02 because, like,
0:27:04 I think the way that I,
0:27:05 the way that you frame it
0:27:06 just now
0:27:06 almost seemed like
0:27:07 agents was, like,
0:27:07 this, like,
0:27:08 singular concept
0:27:09 that, like,
0:27:09 you know,
0:27:10 might have its own
0:27:11 particular team.
0:27:12 Maybe a better question
0:27:13 is what is an agent to you?
0:27:14 Yeah, yeah, yeah, yeah.
0:27:16 Even getting a language
0:27:17 is, like,
0:27:18 important for this conversation.
0:27:20 So, I actually don’t even know
0:27:20 if it would be helpful
0:27:21 for me to share,
0:27:22 but my general take on agents
0:27:24 is it’s an AI
0:27:25 that will take actions
0:27:25 on your behalf
0:27:26 that can work
0:27:27 over long-time horizons.
0:27:28 Okay.
0:27:29 And I think that’s the
0:27:30 pretty general…
0:27:30 Pretty utilitarian.
0:27:31 Yeah, yeah.
0:27:31 Definitely.
0:27:31 But, like,
0:27:32 if you think about it that way,
0:27:33 yeah, I mean,
0:27:34 maybe this is what you mean
0:27:34 by modality,
0:27:35 but it is just a, like,
0:27:37 way of, like,
0:27:38 using AI.
0:27:40 And it is a,
0:27:41 I guess it could be viewed
0:27:41 as a modality,
0:27:42 but we don’t view it
0:27:43 as, like,
0:27:43 a separate thing,
0:27:44 separate from AI.
0:27:47 Well, let me just try
0:27:49 and kind of, you know,
0:27:49 give you a sense
0:27:50 of where this question
0:27:50 is coming from.
0:27:51 Like, I know how
0:27:52 to build a product,
0:27:52 like, and we know
0:27:53 how to do go-to-market
0:27:53 for products.
0:27:54 We know how to do,
0:27:56 like, you know,
0:27:57 we know the implications
0:27:58 of turning them
0:27:58 into platforms.
0:27:59 Like, it’s just,
0:28:00 we’ve been doing this
0:28:01 for a very long time, right?
0:28:02 We know how to do
0:28:03 the same thing for APIs, right?
0:28:04 We know how to do billing.
0:28:05 We know, like,
0:28:06 the tension of, like,
0:28:07 people build on top of it
0:28:08 and all of that stuff.
0:28:08 And, like,
0:28:09 what I’ve been trying to,
0:28:10 and this is just maybe
0:28:11 a personal inquiry,
0:28:14 it’s just not clear for me
0:28:15 for an agent
0:28:15 if you,
0:28:17 if it sits in one
0:28:18 of those two camps,
0:28:19 is it more like
0:28:19 the product camp?
0:28:21 Is it more like the,
0:28:22 or is it,
0:28:24 because it’s kind of both.
0:28:24 Like, I could, like,
0:28:25 literally give you coding.
0:28:26 Yeah, yeah.
0:28:27 And, like,
0:28:28 as a user,
0:28:29 and then you just talk to it,
0:28:30 or I could, like,
0:28:32 build in a way,
0:28:33 kind of embed it
0:28:34 in, like, my app.
0:28:35 And so, like,
0:28:36 but then that means
0:28:37 something to you
0:28:38 as far as, like,
0:28:38 you know,
0:28:39 how do you price it
0:28:40 and what does it mean
0:28:40 for ecosystem?
0:28:41 Like, for example,
0:28:42 like, would you be fine
0:28:43 if I started a company
0:28:44 and just, like,
0:28:44 built it around Codex?
0:28:45 Is that a thing?
0:28:47 Starting a company
0:28:48 and building it around Codex?
0:28:48 Codex, yeah, yeah.
0:28:49 I actually think
0:28:49 that would be great.
0:28:50 Like, it’s a,
0:28:51 we, like,
0:28:51 released, like,
0:28:52 the Codex SDK
0:28:52 and we, like,
0:28:53 want people to be able
0:28:54 to build on it and hack on it.
0:28:54 Yeah.
0:28:56 Actually, I think this might be
0:28:57 what you’re getting at,
0:28:57 which is,
0:28:59 and this is, like,
0:29:00 a kind of a unique thing
0:29:00 about OpenAI
0:29:01 and kind of reflects
0:29:02 on how it’s run,
0:29:03 which is,
0:29:03 at the end, like,
0:29:04 at the end of the day,
0:29:06 OpenAI is like an AGI company.
0:29:08 It’s like an intelligence company.
0:29:08 Yeah, for sure.
0:29:08 And so,
0:29:09 agents are just, like,
0:29:10 one way in which
0:29:11 this intelligence
0:29:12 kind of be manifested.
0:29:13 And so,
0:29:14 the way that I’d say
0:29:15 we actually think about internally
0:29:16 is all of our different
0:29:16 product lines,
0:29:18 Sora, Codex, API,
0:29:18 ChatGPT,
0:29:20 are just different interfaces
0:29:21 and different ways
0:29:21 of deploying this.
0:29:22 So, you don’t really.
0:29:23 So, there’s no, like,
0:29:23 single team that is, you know,
0:29:25 like, thinking about agents.
0:29:26 I would say the way
0:29:27 that it manifests itself more
0:29:28 is, like,
0:29:29 each product area
0:29:29 thinks about, like,
0:29:30 what is, you know,
0:29:31 this intelligence
0:29:32 is actually turning
0:29:33 into a form
0:29:33 where, like,
0:29:33 it can actually,
0:29:34 agentic behavior
0:29:35 is more possible.
0:29:36 What would that look like
0:29:37 in a first-party product
0:29:38 like ChatGPT?
0:29:39 What would that look like?
0:29:40 Like, this is actually
0:29:40 why Codex ended up
0:29:41 becoming its own product.
0:29:42 Like, what would it look like
0:29:43 in a coding-style product?
0:29:44 Like, we explored it
0:29:45 and ChatGPT, like,
0:29:46 kind of worked there,
0:29:47 but, like, actually,
0:29:48 the Klai interface
0:29:49 actually makes a lot more sense.
0:29:49 That’s another interface
0:29:50 to deploy it.
0:29:51 And then if you look
0:29:52 about the API itself,
0:29:52 it’s, like,
0:29:54 this is another interface
0:29:54 to deploy it.
0:29:56 You’re thinking about it
0:29:56 in a slightly different way
0:29:58 because it’s a developer-first mindset.
0:29:59 We’re helping other people
0:29:59 build it.
0:30:00 The pricing is slightly different.
0:30:01 But it’s all these, like,
0:30:02 different manifestations
0:30:03 of this core, like,
0:30:04 intelligence
0:30:06 that is the agent behavior.
0:30:07 It is so remarkable
0:30:09 how much of this entire economy
0:30:09 is basically
0:30:10 just token laundering.
0:30:12 In a sense, right,
0:30:13 it’s literally, like,
0:30:14 anything I can do
0:30:15 to get, like,
0:30:16 English in
0:30:17 or, like,
0:30:18 a natural language in
0:30:18 and then, like,
0:30:19 you know,
0:30:20 the intelligence out.
0:30:20 Yeah.
0:30:21 And, I mean,
0:30:22 it’s because these things
0:30:24 are so resistant to layering.
0:30:25 It’s so hard to layer over
0:30:26 a language model.
0:30:26 Like, you know,
0:30:28 like, I could even do it
0:30:29 pretty easily with, like,
0:30:29 Codex.
0:30:30 I could just, like,
0:30:31 use it, you know,
0:30:33 as a component of a program
0:30:34 and just, you know,
0:30:35 basically launder intelligence.
0:30:36 I mean, of course,
0:30:36 you know,
0:30:37 I’d be charged to do that.
0:30:39 So, I actually,
0:30:40 my view of this
0:30:41 and having seen now
0:30:42 so many kind of launches
0:30:42 of different products,
0:30:44 I’ve seen agent launches
0:30:44 and the definition
0:30:45 that you have,
0:30:46 I’ve definitely seen APIs,
0:30:48 and I’ve seen products
0:30:48 on these,
0:30:49 is, like,
0:30:52 they’re actually quite different
0:30:53 than, like,
0:30:55 what we’re used to.
0:30:56 Like, the COGS is different,
0:30:58 the defensibility is different.
0:30:58 Like, oh,
0:30:59 so we’re kind of rewriting it.
0:31:01 And so it’s kind of like,
0:31:02 you know,
0:31:02 you came from
0:31:04 a kind of pricing background.
0:31:04 I mean,
0:31:05 you were working on
0:31:06 a model for pricing,
0:31:07 now you have the API,
0:31:08 so I just love your thoughts
0:31:09 on, like,
0:31:10 I mean,
0:31:12 how have you evolved
0:31:12 your thinking
0:31:13 and how do you price
0:31:15 these, you know,
0:31:16 access to intelligence
0:31:17 where,
0:31:18 you know,
0:31:19 you don’t know
0:31:19 how many people
0:31:20 are going to use it.
0:31:20 It’s almost certainly
0:31:21 usage-based billing,
0:31:22 not something else.
0:31:24 Like, can you talk
0:31:25 just a bit about, like,
0:31:26 philosophy around pricing
0:31:26 on these things?
0:31:27 Is it different
0:31:28 for product versus API?
0:31:29 Yeah, I think
0:31:33 the honest truth there
0:31:33 is, like,
0:31:34 it’s evolved over time
0:31:34 as well.
0:31:34 And, like,
0:31:35 I actually think
0:31:35 the simplest,
0:31:36 like,
0:31:37 the reason why
0:31:37 we’ve done
0:31:38 usage-based pricing
0:31:38 on the API,
0:31:39 honestly,
0:31:39 is because
0:31:40 it’s been, like,
0:31:41 it’s closest to
0:31:42 how it’s actually
0:31:42 being used.
0:31:43 And so that’s kind
0:31:44 of how we started.
0:31:45 I actually think
0:31:46 usage-based pricing
0:31:47 on the API
0:31:48 has, like,
0:31:50 surprisingly held strong
0:31:50 and, like,
0:31:51 I actually think
0:31:51 this might be
0:31:51 something that
0:31:52 we’ll keep doing
0:31:53 for quite a long time,
0:31:55 mostly because…
0:31:56 I don’t know
0:31:56 how you don’t
0:31:57 do usage-based.
0:31:58 Yeah, yeah, yeah.
0:31:58 I just don’t know
0:31:59 how that…
0:32:00 Yeah, and then
0:32:01 there’s also the strategy
0:32:01 of, like,
0:32:02 how we price it
0:32:03 and internally
0:32:04 one thing we do
0:32:05 is we always make sure
0:32:06 that we actually
0:32:08 price our usage-based pricing
0:32:08 from a, like,
0:32:09 cost-plus perspective.
0:32:10 Like, we’re actually
0:32:10 just, like,
0:32:11 trying to make sure
0:32:12 that we’re being
0:32:12 responsible
0:32:14 from a margin
0:32:14 perspective.
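(A toy version of usage-based, cost-plus pricing as described here: meter input and output tokens, and derive per-million-token rates from serving cost plus a markup. Every number below is made up for illustration:)

```python
# Toy illustration of usage-based, cost-plus pricing: meter tokens, apply
# per-million-token rates derived from serving cost plus a margin.
COST_PER_M_INPUT = 0.40    # hypothetical serving cost, $/1M input tokens
COST_PER_M_OUTPUT = 1.60   # hypothetical serving cost, $/1M output tokens
MARGIN = 0.30              # hypothetical cost-plus markup

def price_per_million(cost: float, margin: float = MARGIN) -> float:
    return cost * (1 + margin)

def charge(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1e6 * price_per_million(COST_PER_M_INPUT)
            + output_tokens / 1e6 * price_per_million(COST_PER_M_OUTPUT))

# A request with 12k input tokens and 30k reasoning+output tokens:
print(f"${charge(12_000, 30_000):.4f}")   # ~$0.0686
```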
0:32:15 By the way,
0:32:16 this is a huge shift
0:32:16 in the industry
0:32:17 in general
0:32:18 just because, like,
0:32:18 I remember the shift
0:32:19 from on-prem
0:32:20 to recurring.
0:32:20 Yeah.
0:32:21 That was a big,
0:32:22 big deal.
0:32:22 Like, that created
0:32:23 Zuora.
0:32:23 Like, it created
0:32:24 a whole company.
0:32:24 There were, like,
0:32:25 whole books
0:32:25 on it.
0:32:26 Like, a bunch of
0:32:27 consultants on how
0:32:28 you do this
0:32:28 to change, like,
0:32:29 you know,
0:32:29 and, like,
0:32:30 I think the shift
0:32:31 to usage
0:32:33 is as big or bigger.
0:32:34 And it’s also
0:32:35 even a really hard
0:32:35 technical problem.
0:32:36 Yeah.
0:32:38 I can’t even
0:32:39 imagine 800 million
0:32:40 WAU.
0:32:40 Like, how do you
0:32:41 build?
0:32:41 Yeah.
0:32:43 Well, 800 million
0:32:43 WAU is a little
0:32:44 easier because it’s
0:32:45 not usage-based pricing.
0:32:46 It’s subscription.
0:32:47 So, it’s like,
0:32:47 that was, like,
0:32:48 way easier.
0:32:49 But, I mean,
0:32:49 there’s still, like,
0:32:51 a lot of users
0:32:52 on the API
0:32:52 that we need to,
0:32:53 like, you know,
0:32:53 manage all the
0:32:54 billing side.
0:32:54 There’s some, like,
0:32:55 overages or stuff
0:32:55 you’ve got to deal
0:32:56 with on that?
0:32:57 What do you mean
0:32:58 by overages?
0:32:58 Like, I don’t know.
0:32:59 I guess I don’t know.
0:33:00 Most people have quotas
0:33:01 and then we’ll kind
0:33:01 of, like,
0:33:02 there are, like,
0:33:02 max quotas
0:33:03 that we don’t
0:33:04 let people go over.
0:33:04 But, like,
0:33:04 in practice,
0:33:05 these quotas are,
0:33:06 like, pretty massive.
0:33:06 And that would
0:33:07 literally be, like,
0:33:08 one of the most
0:33:08 complex systems
0:33:09 somebody’s ever built
0:33:10 if you would do
0:33:10 usage-based
0:33:11 at, like,
0:33:11 that scale.
0:33:11 I mean,
0:33:12 these are very,
0:33:12 very, very,
0:33:13 and, like,
0:33:13 you have to be correct.
0:33:14 Like, these are
0:33:15 very hard systems
0:33:15 to scale.
0:33:16 Yep, yep, yep,
0:33:16 yep, yeah.
0:33:17 Yeah, I mean,
0:33:17 we have a whole team
0:33:18 thinking about this
0:33:18 now internally.
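(A minimal sketch of the correctness problem being described: usage events stream in from many inference nodes, retries can duplicate them, and the ledger still has to be exactly right. Idempotency keys plus aggregation is the core idea; everything below is an illustrative in-memory toy, not OpenAI’s actual system:)

```python
# Minimal sketch of correct-by-construction usage metering: dedupe events
# by idempotency key so retries never double-bill, then aggregate per org.
from collections import defaultdict

class UsageLedger:
    def __init__(self):
        self.seen: set[str] = set()                 # dedupe on event IDs
        self.totals = defaultdict(lambda: {"input": 0, "output": 0})

    def record(self, event_id: str, org: str, input_toks: int, output_toks: int):
        if event_id in self.seen:                   # retried event: ignore
            return
        self.seen.add(event_id)
        self.totals[org]["input"] += input_toks
        self.totals[org]["output"] += output_toks

ledger = UsageLedger()
ledger.record("evt-1", "org-acme", 12_000, 30_000)
ledger.record("evt-1", "org-acme", 12_000, 30_000)  # duplicate: ignored
assert ledger.totals["org-acme"] == {"input": 12_000, "output": 30_000}
```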
0:33:20 Yeah, I mean,
0:33:20 usage-based pricing
0:33:21 is also interesting.
0:33:22 So, there’s,
0:33:23 we acquired this company
0:33:25 called Rockset
0:33:26 a while ago.
0:33:26 A founder,
0:33:27 his name is Ben Kott.
0:33:28 Yeah, Ben Kott’s awesome.
0:33:30 Ben Kott’s incredible.
0:33:31 He’s one of the best.
0:33:32 Like, Ben Kott,
0:33:33 if you’re listening,
0:33:33 we’re huge fans.
0:33:34 I’m a huge fan.
0:33:35 He’s going to love this.
0:33:37 Yeah, he’s great, man.
0:33:37 He’s a legend.
0:33:38 Anyways,
0:33:39 I was talking to him
0:33:40 about pricing as well.
0:33:42 And his take is
0:33:44 that pricing is kind of
0:33:44 like a one-way ratchet.
0:33:45 And, like,
0:33:46 basically,
0:33:46 once you get a taste
0:33:47 of usage-based pricing,
0:33:48 you’re never going to go back
0:33:48 to, like,
0:33:49 the per seat,
0:33:50 the, like,
0:33:51 per deployment type pricing.
0:33:52 And I think that’s
0:33:53 definitely true.
0:33:53 And I think it’s just
0:33:54 because it’s getting,
0:33:55 it gets closer and closer
0:33:55 to, like,
0:33:56 your true utility.
0:33:57 You’re getting all this thing.
0:33:58 The main pain point is,
0:33:58 like,
0:33:59 you have to maintain
0:34:00 all this infra.
0:34:00 Yeah, to, like,
0:34:01 get it to work well.
0:34:02 But if you do have it,
0:34:03 he thinks it’s, like,
0:34:04 a one-way ratchet
0:34:04 where, like,
0:34:05 there’s just, like,
0:34:05 no going back.
0:34:06 And then,
0:34:07 and I think the hot new thing
0:34:08 now is, like,
0:34:09 oh, with AI,
0:34:09 you can now kind of
0:34:10 measure, like,
0:34:10 outcomes.
0:34:11 And so that’s, like,
0:34:12 another, you know,
0:34:12 like, step forward.
0:34:13 And if that works,
0:34:14 like, maybe it’s a
0:34:14 one-way ratchet.
0:34:16 So we thought about that.
0:34:17 It’s like, you know,
0:34:17 is there some type of,
0:34:17 like,
0:34:18 outcome-based pricing?
0:34:19 This is more on the
0:34:20 first-party side.
0:34:20 On an API,
0:34:22 it’s kind of hard to measure that.
0:34:23 Very hard.
0:34:23 I mean,
0:34:25 that’s hard because
0:34:26 you end up having to
0:34:28 price and value
0:34:29 non-computer science
0:34:30 infrastructure, right?
0:34:31 Like, you’re literally
0:34:32 going into verticalization
0:34:32 now.
0:34:33 Like, you’re, like,
0:34:34 I mean, listen,
0:34:35 if it’s, like,
0:34:36 porting a code base,
0:34:37 maybe you’d have
0:34:37 some expertise.
0:34:38 But if it’s, like,
0:34:39 whatever, like,
0:34:40 increasing crop yields.
0:34:42 Like, at some level,
0:34:43 you need to, like…
0:34:44 But there could be
0:34:44 a world where, like,
0:34:45 the AI is, like,
0:34:46 getting to a point where it can,
0:34:46 like, actually, you know,
0:34:47 make judgments of these
0:34:48 and do it in an
0:34:49 accurate enough way
0:34:49 where it can tie it
0:34:50 to billing.
0:34:51 I think this is a problem
0:34:52 with AI conversations
0:34:53 because, like,
0:34:53 at any point in time,
0:34:53 you’re, like,
0:34:54 but it could get good.
0:34:56 Yeah, yeah, yeah.
0:34:57 It’s not a problem anymore.
0:34:58 Yeah, yeah.
0:34:58 At some point,
0:34:59 it’ll be solved.
0:35:00 It’s so much like
0:35:01 the prompt engineering
0:35:02 and the single AGI,
0:35:03 I think, from before.
0:35:03 Yeah.
0:35:04 Yeah, it’s like,
0:35:05 when you reach that level
0:35:06 of, when you push it
0:35:06 that far,
0:35:07 everything’s kind of solved.
0:35:09 On outcome-based pricing,
0:35:11 it sounds very appealing.
0:35:12 Like, if it can work,
0:35:12 it can work.
0:35:14 But one thing that
0:35:15 we’ve started realizing
0:35:17 is it actually ends up
0:35:18 correlating quite a bit
0:35:19 with usage-based pricing,
0:35:20 especially with test-time compute.
0:35:21 Like, if the thing
0:35:21 is just, like,
0:35:22 thinking quite a bit.
0:35:23 Like, actually,
0:35:24 you know,
0:35:25 if you charge just by
0:35:25 usage-based
0:35:27 and not outcome-based,
0:35:27 you’re, like,
0:35:29 basically approximating
0:35:30 outcome-based at this point.
0:35:31 If the thing is, like,
0:35:32 thinking for, like,
0:35:32 so long,
0:35:33 it’s, like,
0:35:33 highly correlated
0:35:34 with what it’s doing.
0:35:35 It’s just adding more value.
0:35:36 Yeah, yeah, exactly.
0:35:36 Exactly.
0:35:37 And so, like,
0:35:38 maybe at the end of the day,
0:35:39 like, usage-based pricing
0:35:39 is all you need
0:35:40 and it’s, like,
0:35:40 we’re just going to,
0:35:41 like, you know,
0:35:42 live in this world forever.
0:35:44 But, yeah,
0:35:45 I don’t know.
0:35:46 It’s constantly evolving.
0:35:47 I think our thinking
0:35:48 has evolved here as well.
0:35:51 I personally am, like,
0:35:52 keeping track of if
0:35:53 the outcome-based pricing
0:35:55 setups can actually work here.
0:35:56 But at least on the API side, I think, you know, it’s such a usage-based setup,
0:35:59 and we have so much infrastructure around this.
0:36:01 And so, I think we’ll probably
0:36:01 stay with that for a while.
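To make the usage-versus-outcome point above concrete, here is a minimal sketch of usage-based metering where test-time reasoning tokens bill at the output rate, so a task that thinks longer simply costs more. The model name and per-million-token rates are made up for illustration; they are not OpenAI's actual prices.

```python
# Hypothetical per-1M-token rates for an illustrative model (not real prices).
PRICES_PER_1M = {
    "example-reasoning-model": {"input": 1.25, "output": 10.00},
}

def invoice(model: str, input_tokens: int, output_tokens: int,
            reasoning_tokens: int) -> float:
    """Compute a usage-based charge. Reasoning (test-time compute) tokens are
    billed at the output rate, so cost loosely tracks how hard the task was --
    the sense in which usage-based pricing approximates outcome-based pricing."""
    rates = PRICES_PER_1M[model]
    cost = input_tokens / 1e6 * rates["input"]
    cost += (output_tokens + reasoning_tokens) / 1e6 * rates["output"]
    return cost

# An easy task vs. one that "thinks" 20x longer before answering.
print(invoice("example-reasoning-model", 2_000, 500, 400))    # ~$0.0115
print(invoice("example-reasoning-model", 2_000, 500, 8_000))  # ~$0.0875
```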
0:36:02 So, how do you think
0:36:03 about open source?
0:36:04 I mean, you know,
0:36:06 I think you’re the only
0:36:07 big lab that’s releasing
0:36:08 open source.
0:36:08 Is that?
0:36:09 No, Google has
0:36:10 some of theirs.
0:36:10 Okay.
0:36:11 Yeah, mostly smaller
0:36:12 models on their side.
0:36:13 Yeah, yeah, yeah.
0:36:13 That’s right.
0:36:14 So, how do you think
0:36:14 about open source
0:36:16 vis-a-vis, you know,
0:36:18 competition, cannibalization,
0:36:19 you know, like,
0:36:22 what’s the strategic goal?
0:36:23 What’s the complexity?
0:36:24 Yeah, yeah.
0:36:28 So, I personally love
0:36:29 open source.
0:36:30 Like, I think it’s great
0:36:30 that there’s a…
0:36:32 All of us grew up with it, right?
0:36:32 Yeah, all of us grew up with it.
0:36:33 Like, the internet
0:36:34 wouldn’t exist without it.
0:36:34 Like, you know, so much of the world was built on top of it.
0:36:37 Cloud wouldn’t exist without it.
0:36:38 Yeah.
0:36:39 Nothing would exist without it.
0:36:41 Except for maybe Windows.
0:36:42 And so, it was interesting
0:36:42 because, like,
0:36:44 I felt like over the last years
0:36:44 before we launched
0:36:45 the open source model,
0:36:46 I know Sam feels this way as well.
0:36:47 Yeah.
0:36:47 It’s like, there’s this, like,
0:36:49 weird, like, you know,
0:36:51 mindset where
0:36:52 because OpenAI
0:36:52 hadn’t launched anything,
0:36:53 it just seemed like
0:36:55 it was super, like, anti…
0:36:55 Like, OpenAI was, like,
0:36:57 super anti-open source.
0:36:57 Yeah.
0:36:59 But I’d actually been
0:36:59 having conversations with Sam
0:37:00 ever since I joined
0:37:01 about open sourcing a model.
0:37:02 We were just trying to think about, like, how can we sequence it?
0:37:05 And compute is always a hard thing.
0:37:06 It’s like, do we have the compute
0:37:08 to kind of, like, train this thing?
0:37:08 Yeah.
0:37:09 So, we’ve always wanted
0:37:10 to kind of do this.
0:37:10 I’m really glad
0:37:11 that we were able
0:37:12 to finally do it.
0:37:13 I think it was early…
0:37:14 Was it earlier this year?
0:37:15 Mm-hmm.
0:37:16 I, like, lost sense of time.
0:37:17 AI time is so great.
0:37:18 Yeah, I was like,
0:37:18 was it last year?
0:37:19 No, it was this year.
0:37:20 Yeah, when gpt-oss came out.
0:37:23 And so, I was just really glad
0:37:24 that we did that.
0:37:25 The way that I generally think about it
0:37:28 is, one, I think as a…
0:37:30 This is also particularly true
0:37:32 for OpenAI because, as you said,
0:37:32 we are a vertical
0:37:33 and a horizontal company.
0:37:35 It’s like, we want to continue
0:37:36 investing in the ecosystem.
0:37:37 And just from a, like,
0:37:37 brand perspective,
0:37:38 I think it’s good.
0:37:39 Yeah.
0:37:39 But then also,
0:37:41 I think from OpenAI’s perspective,
0:37:45 if the AI ecosystem
0:37:46 grows more and more,
0:37:47 it’s, like, a rising tide.
0:37:49 Yeah, it’s all, like,
0:37:50 really helpful for us.
0:37:51 And if we can launch
0:37:52 an open source model
0:37:53 and it helps, like,
0:37:53 unlock a whole bunch
0:37:54 of other use cases
0:37:55 in the other industries,
0:37:56 I think that’s, you know,
0:37:58 that’s actually really good for us.
0:38:01 Also, what people talk about a lot is, like, how well these open source AI business models actually work,
0:38:07 because, like, the cannibalization risk is actually very low.
0:38:10 Yeah.
0:38:12 And, like, you don’t really
0:38:13 enable competitors a lot
0:38:13 because, I mean,
0:38:14 when we say open source,
0:38:16 you really mean open weights, right?
0:38:17 It’s not like they can recreate it.
0:38:18 Right, you know?
0:38:19 And, like, if I can distill
0:38:20 your API as well
0:38:21 as I can distill, like,
0:38:21 you giving me the weights
0:38:22 in some way, like,
0:38:22 and so, like,
0:38:23 it doesn’t really change
0:38:24 that dynamic a lot.
0:38:26 But, like…
0:38:26 Yeah, I mean,
0:38:27 to be clear, like,
0:38:28 we have not seen
0:38:29 cannibalization at all
0:38:30 from the open source models.
0:38:32 It’s, like, it seems like
0:38:32 a very different set
0:38:33 of use cases.
0:38:35 The customers tend to be,
0:38:35 like, slightly different.
0:38:37 The use cases are very different.
0:38:38 And, by the way,
0:38:39 it turns out inference
0:38:39 is super hard,
0:38:40 like, to actually have,
0:38:41 like, scalable, fast,
0:38:42 performant.
0:38:44 That’s a hard, hard problem.
0:38:44 Yeah, so, like,
0:38:45 I’d say the way
0:38:46 that I personally think
0:38:46 about open source
0:38:48 in relation to the API business
0:38:49 in particular is,
0:38:49 well, one,
0:38:50 it hasn’t shown
0:38:51 cannibalization risks,
0:38:52 so, you know,
0:38:53 I’m not particularly worried
0:38:53 about that.
0:38:54 But also, like,
0:38:55 especially for all these
0:38:56 major labs, like,
0:38:56 there are usually, like,
0:38:57 two or three models
0:38:58 where, like,
0:38:59 that is where you’re making
0:39:00 all of your impact,
0:39:00 all of your revenue.
0:39:01 Yeah, yeah.
0:39:01 And those are the ones
0:39:02 where we’re throwing
0:39:03 a bunch of resources
0:39:04 into improving the model,
0:39:05 and these tend to be
0:39:06 the larger ones
0:39:06 that are, like,
0:39:07 extremely hard to inference.
0:39:08 Yeah.
0:39:09 We have a really
0:39:09 cracked inference team
0:39:10 at OpenAI,
0:39:12 and my sense
0:39:12 is, like,
0:39:12 even if we just, like,
0:39:13 you know,
0:39:13 open source them,
0:39:15 like, if we just literally
0:39:16 open sourced GPT-5
0:39:16 or something,
0:39:17 it would be really,
0:39:18 really hard to inference it
0:39:19 at the level
0:39:20 that we are able
0:39:21 to get it to do.
0:39:22 There’s also,
0:39:22 by the way,
0:39:23 like,
0:39:23 feedback loop
0:39:24 between the inference team
0:39:24 and, like,
0:39:25 the training team,
0:39:25 too,
0:39:25 so, like,
0:39:26 we can kind of,
0:39:26 like,
0:39:27 optimize all of that.
0:39:28 Is it possible
0:39:29 to verticalize models
0:39:30 for products?
0:39:32 Do you, like,
0:39:32 train models
0:39:33 specifically for products?
0:39:33 Yeah, I mean,
0:39:34 to actually,
0:39:35 yeah.
0:39:36 I think,
0:39:36 I mean,
0:39:37 we’ve kind of done this
0:39:39 with GPT-5 Codex,
0:39:39 right?
0:39:40 Or do you mean,
0:39:40 like,
0:39:41 even more verticalization?
0:40:41 I mean, like, deep, deep verticalization,
0:40:45 where, you know, the model wouldn’t even be released on its own;
0:40:49 it’s, like, actually part of a product.
0:39:52 I think we’re,
0:39:52 like,
0:39:54 basically starting to move
0:39:55 in that direction.
0:39:57 I think there’s a question
0:39:57 of how deeply
0:39:58 you verticalize it.
0:40:00 I think most
0:40:00 of what we’ve done
0:40:01 is mostly at,
0:40:01 like,
0:40:02 the post-training,
0:40:02 like,
0:40:03 the tool use level.
0:40:03 Like,
0:40:04 Codex is particularly
0:40:05 good at using the,
0:40:06 sorry,
0:40:07 GPT-5 Codex is particularly
0:40:07 good at using the
0:40:08 Codex harness.
0:40:10 But there’s,
0:40:10 like,
0:40:11 even deeper verticalization
0:40:12 you can do,
0:40:12 like,
0:40:13 that one I think
0:40:14 is more of an open question.
0:40:14 Yeah, so, like, a lot of my mental model of this comes from the pixel space,
0:40:23 which is, like, you know, you can LoRA a bunch of image models, right?
0:40:26 And you can do a bunch of stuff to make them better and more suitable for some products, for example.
0:40:31 But, like, these open source models are really, really good.
0:40:36 And, like, you would believe that you could verticalize a model for, like, editing or cut and paste or this or that,
0:40:42 you know, like, as actually part of a product, but you actually don’t see that happen.
0:40:46 Yeah.
0:40:47 It’s almost always,
0:40:48 like,
0:40:48 you’re just kind of
0:40:49 exposing,
0:40:49 like,
0:40:50 a model,
0:40:51 not something,
0:40:51 like,
0:40:52 specific to a product.
0:40:52 Yeah,
0:40:52 I think,
0:40:54 I think there’s a distinction
0:40:54 to be made between
0:40:55 the,
0:40:55 like,
0:40:56 the image model space
0:40:57 and the text model space.
0:40:57 Yeah,
0:40:58 probably.
0:40:59 Also because the image models
0:41:00 tend to be way smaller
0:41:00 and,
0:41:00 like,
0:41:01 you can iterate on it
0:41:02 a lot faster.
0:41:02 Like,
0:41:02 that’s why you get that
0:41:03 crazy,
0:41:04 cool proliferation
0:41:04 of,
0:41:04 like,
0:41:05 the image model side.
0:41:06 Whereas,
0:41:06 like,
0:41:07 I don’t know,
0:41:08 for the text models,
0:41:08 there’s always going to be this, like, really big pre-training step that, like, you have to invest in.
0:41:12 And then even the post-training side is, like, you know, not the easiest thing.
0:41:24 Like, I actually think, like, that’s one of the bigger bottlenecks.
0:41:28 Because I think you are right that, like, on the image side,
0:41:31 yeah, you can, like, fine-tune an image diffusion model to be, like, extremely good at, like, editing faces.
0:41:35 Yeah, like, something very specific.
0:41:37 And then you build a product around that.
0:41:38 And it’s like, yeah, you can just kind of put all these resources into, and iterate on, that one specific model.
0:41:42 Whereas it’s a much heavier lift.
0:41:44 It seems like it, on the text side.
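For readers unfamiliar with the LoRA technique Martin references above, here is a minimal, self-contained sketch of the idea: freeze a pretrained layer and train only a small low-rank update on top of it. This is an illustrative toy in plain PyTorch, not any particular image model's fine-tuning recipe; the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen pretrained linear layer with a trainable low-rank update:
    y = W x + (alpha / r) * B A x, where A and B are small rank-r matrices."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the base model stays frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# e.g., wrap one projection layer of a model you already have; only A and B
# receive gradients, so the adapter is tiny and cheap to train and to swap.
layer = LoRALinear(nn.Linear(320, 320))
out = layer(torch.randn(2, 320))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 5120
```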
0:41:46 I gotta say,
0:41:48 it is a bit of an anti-pattern
0:41:49 to do both languages,
0:41:50 like,
0:41:51 language-based models
0:41:52 and diffusion,
0:41:52 like,
0:41:53 pixel models
0:41:54 in the same company.
0:41:54 Like, most that have tried, it’s seemed very clunky to do it.
0:42:00 But, I mean, you and Google are the two kind of counterexamples for this.
0:42:04 And so,
0:42:04 like,
0:42:06 is it possible
0:42:06 to even,
0:42:06 like,
0:42:07 converge the infrastructures
0:42:08 on these things?
0:42:09 Like,
0:42:09 I mean,
0:42:11 is it totally different orgs?
0:42:12 Is it shared infrastructure?
0:42:12 Like,
0:42:14 how do you operationalize?
0:42:14 Yeah.
0:42:15 I think,
0:42:16 I think you’re totally right.
0:42:17 It’s an anti-pattern.
0:42:17 It’s pretty tough
0:42:18 to pull off.
0:42:20 I think,
0:42:20 honestly,
0:42:21 like,
0:42:22 props to Mark
0:42:23 on our research team
0:42:23 for,
0:42:23 like,
0:42:23 you know,
0:42:24 structuring things
0:42:25 in a way
0:42:26 where we’re able to do it.
0:42:27 From my perspective,
0:42:28 I think the biggest thing
0:42:29 is I think our,
0:42:29 like,
0:42:29 image,
0:42:30 like,
0:42:30 our,
0:42:32 I think we call it
0:42:33 the world simulation team,
0:42:33 like,
0:42:34 the team that builds Sora
0:42:35 and all that
0:42:36 under Aditya
0:42:38 is just extremely solid.
0:42:38 Like,
0:42:39 they are probably,
0:42:39 it’s like the highest
0:42:41 concentration of,
0:42:41 like,
0:42:42 talent that I’ve seen
0:42:43 in a while.
0:42:44 But is it the same,
0:42:44 like,
0:42:45 is it the same,
0:42:45 is it like,
0:42:46 are they,
0:42:46 like,
0:42:47 totally separate infrastructure?
0:42:48 Do they use the same infrastructure?
0:42:48 Yeah,
0:42:48 yeah,
0:42:48 yeah.
0:42:49 So it’s,
0:42:49 it’s actually,
0:42:49 like,
0:42:50 pretty separate.
0:42:50 So,
0:42:51 and I think that’s part of the reason
0:42:53 why we’re able to kind of do this.
0:42:53 Well,
0:42:53 it’s like,
0:42:53 one is,
0:42:54 like,
0:42:55 the team needs to be extremely strong,
0:42:55 which they are.
0:42:56 And then two is, they’re run very separately.
0:42:59 They’re kind of,
0:42:59 like,
0:43:00 thinking about their own
0:43:01 particular roadmap.
0:43:02 They think about productization
0:43:04 very separately as well,
0:43:04 right?
0:43:05 Which is how,
0:43:05 like,
0:43:06 the Sora app kind of
0:43:08 came out of that as well.
0:43:09 And then, yeah, even, like, the inference stacks are kind of different.
0:43:14 They own a lot more around their inference stack, and they optimize it pretty separately.
0:43:19 And so,
0:43:20 I think that,
0:43:21 that contributes to,
0:43:23 to helping us run things in parallel,
0:43:23 but,
0:43:25 it’s pretty hard to pull off for sure.
0:43:26 Maybe you can educate me on this.
0:43:28 Like, so I think about APIs as mostly text-based for OpenAI.
0:43:32 Do you guys do actual pixel-based stuff?
0:43:34 Yeah,
0:43:34 yeah,
0:43:34 we do.
0:43:35 We have a bunch.
0:43:36 So, DALL·E 2 is in the API.
0:43:38 Of course, yeah.
0:43:40 The OG model.
0:43:42 DALL·E 2 is in the API.
0:43:44 That was like the first real text-image model,
0:43:44 right?
0:43:44 Yeah,
0:43:44 yeah,
0:43:45 yeah,
0:43:45 yeah.
0:43:47 That was actually the model that got me to go to OpenAI.
0:43:48 No kidding.
0:43:50 Because it was the summer when I was looking for,
0:43:51 I was thinking about something new.
0:43:52 It’s when Dolly 2 came out
0:43:55 and it just completely blew my mind.
0:43:55 Wow.
0:43:56 And I distinctly remember, I was, like, asking it to do the simplest thing, like draw a picture of a duck or something.
0:44:01 It’s like the simplest thing now.
0:44:01 And it just, like, generated a picture of a, you know, like a white duck.
0:44:04 And so, that was actually the thing that kind of got me to OpenAI in the first place.
0:44:09 But yeah,
0:44:11 we have a bunch in our API.
0:44:13 The image gen model is in our API as well.
0:44:16 And then Sora 2 is in our API.
0:44:16 We launched it at DevDay.
0:44:17 It’s actually been a huge hit.
0:44:18 I’ve been very,
0:44:19 very surprised.
0:44:21 Need more GPUs for that.
0:44:23 But the amount of use cases.
0:44:24 And then from your standpoint, like, you can converge that,
0:44:28 like, the API infrastructure, probably.
0:44:28 Yeah.
0:44:29 So there’s,
0:44:29 yeah,
0:44:30 I’d say on the API side,
0:44:32 a lot of the infrastructure is shared for those.
0:44:34 But once you reach the inference level,
0:44:34 they’re separate, right?
0:44:36 Because you got to inference them differently.
0:44:37 And it is that team
0:44:39 that has just like been really laser focused
0:44:41 on making that side particularly efficient
0:44:46 and work well separate from the text models.
0:44:46 But yeah,
0:44:48 we have image gen,
0:44:48 we have video gen,
0:44:51 and we’ll continue adding more to the API there.
0:44:54 So it feels like we’ve been evolving
0:44:57 our thinking as an industry
0:44:57 on a bunch of stuff, right?
0:44:58 Like one of them for sure
0:45:00 is like the models like we’ve talked about.
0:45:01 The other one is like context engineering.
0:45:03 It seems to me that like actually
0:45:05 how you build agents and expose them
0:45:06 has evolved too.
0:45:07 So maybe you can talk a bit about that.
0:45:08 Yeah.
0:45:08 Yeah.
0:45:09 I think,
0:45:11 so at DevDay this year
0:45:12 when we launched our agent builder,
0:45:13 I got a bunch of questions around this
0:45:15 because the agent builder is like,
0:45:15 yeah,
0:45:16 it’s like the bunch of different nodes
0:45:18 and it’s like the deterministic thing.
0:45:18 And I was like,
0:45:20 oh, is this really like the future of agents?
0:45:23 And we obviously put a lot of thought into this
0:45:24 when we were thinking about building that product.
0:45:26 But the way I think about it is-
0:45:27 Do you think they came from a point of view of being constrained, by the way?
0:45:28 They’re like,
0:45:30 oh, this is too constraining and like-
0:45:30 Yeah.
0:45:31 I think people are like,
0:45:31 it’s too constraining.
0:45:33 It’s not like AGI forward.
0:45:33 You know,
0:45:34 like at the end of the,
0:45:34 again,
0:45:35 at the end of the day,
0:45:36 the AGI will do everything.
0:45:36 And so like,
0:45:37 why not?
0:45:39 Why have nodes in this like node builder thing?
0:45:40 Just tell it what to do.
0:45:41 Yeah.
0:45:43 And so I think there’s like two things at play here.
0:45:44 One of them is like,
0:45:46 there is a like practicality component.
0:45:49 And then the other thing is I think there are actually like different types of work
0:45:51 that exist out there that could be automated into agents.
0:45:53 And so on the practicality side is,
0:45:53 yeah,
0:45:54 like the models today,
0:45:56 just like maybe in some future world,
0:45:58 instruction following would be so good
0:46:01 that you just like ask it to do this four-step process.
0:46:03 And it like always does the four-step process exactly.
0:46:05 We’re still not there yet.
0:46:06 And in the meantime, you know, this entire industry is being born, and a lot of, you know, people still want to use these models.
0:46:11 Like, what can you build for them?
0:46:13 So there’s a practicality component of it.
0:46:15 When did you launch that?
0:46:16 DevDay. So it feels like forever ago, but it was earlier this month, October, like October 6th or something.
0:46:23 Yeah.
0:46:24 So less than a month ago.
0:46:24 Yeah.
0:46:25 Okay.
0:46:27 It’s been crazy seeing the reception to it.
0:46:31 By the way, I think the video where Christina on my team demos the agent builder is, like, one of the most viewed videos on our YouTube channel now.
0:46:38 I will say,
0:46:41 I will say just anecdotally from kind of my perspective,
0:46:42 people love it.
0:46:42 That’s great.
0:46:43 But I also saw the dissonance too.
0:46:44 Like I saw when it came out,
0:46:45 people were like,
0:46:45 wait,
0:46:46 what is this?
0:46:46 Yeah,
0:46:46 exactly.
0:46:47 No code,
0:46:47 low code.
0:46:47 Yeah,
0:46:48 exactly.
0:46:49 It’s another low code thing.
0:46:51 And now people love it.
0:46:51 Yeah.
0:46:51 Yeah.
0:46:51 Yeah.
0:46:52 So there’s a practicality piece.
0:46:53 There’s another piece, which is, like, when we were talking to our customers,
0:46:58 because at the end of the day, a lot of this agent work is just trying to automate work, like what people do in their day-to-day jobs,
0:47:05 we realized there’s actually, like, two different types of work.
0:47:06 There’s the work that we think about,
0:47:08 which is like maybe what like software engineers do,
0:47:08 which is like,
0:47:09 it’s very undirected.
0:47:10 There’s like a high level goal.
0:47:12 And then you have like,
0:47:12 you know,
0:47:14 you have your cursor and you’re just like writing,
0:47:15 writing code.
0:47:15 And,
0:47:16 and you,
0:47:18 you’re kind of like exploring things and going towards an objective.
0:47:19 That’s like,
0:47:19 I don’t know,
0:47:20 more like knowledge-based work,
0:47:21 like data analysis,
0:47:22 maybe like that,
0:47:23 like codings kind of like this.
0:47:23 Yeah.
0:47:24 Um,
0:47:25 But then there’s another type of work, which is actually, we realized, maybe even more prevalent in industry than software.
0:47:30 We’re just,
0:47:31 we’re just not aware of it,
0:47:33 which is work tends to be very procedural,
0:47:35 very like SOP oriented,
0:47:37 like customer support is a good example of this.
0:47:37 Like customer support,
0:47:40 there’s like very clear policy that these agents and people have to
0:47:40 follow.
0:47:40 Yeah.
0:47:42 And it is actually not great for them to deviate from this and, like, try something else.
0:47:46 The people running these teams just really want these SOPs to be followed.
0:47:50 Yeah.
0:47:53 And this pattern actually generalizes to a ton of different work.
0:47:55 A standard operating procedure?
0:47:55 Yeah.
0:47:55 Sorry.
0:47:57 So it’s just, like, the way in which you need to operate the support team.
0:48:03 But, like, this extends to, like, marketing, to sales, to a bunch of things, way more than it has any right to.
0:48:07 Yeah.
0:48:08 And what we realized is like,
0:48:11 there’s a huge need on that side to have determinism here.
0:48:11 Yeah.
0:48:15 Of which an agent builder with nodes that kind of like helps enforce this
0:48:16 thing ends up being very,
0:48:16 very helpful.
0:48:17 But I think a lot of us,
0:48:18 especially in Silicon Valley,
0:48:19 don’t really appreciate that.
0:48:21 There’s like a ton of work that actually falls into this camp.
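Here is a toy sketch of the deterministic, node-based idea being described, not OpenAI's actual Agent Builder API: each node is plain code, routing between nodes is deterministic, and a model call (stubbed here) would only fill in steps the SOP permits. All names and the refund threshold are illustrative.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Node:
    run: Callable[[dict], dict]    # transforms the workflow state
    route: Callable[[dict], str]   # deterministic routing, not the model's whim

def classify(state: dict) -> dict:
    # Imagine a constrained LLM call here that must return a fixed label.
    state["intent"] = "refund" if "refund" in state["message"].lower() else "other"
    return state

def refund_policy(state: dict) -> dict:
    # Pure code: the SOP is enforced entirely outside the model.
    state["approved"] = state["amount"] <= 50
    return state

GRAPH = {
    "classify": Node(classify, lambda s: "refund_policy" if s["intent"] == "refund" else "end"),
    "refund_policy": Node(refund_policy, lambda s: "end"),
}

state = {"message": "I want a refund", "amount": 30}
current = "classify"
while current != "end":
    node = GRAPH[current]
    state = node.run(state)
    current = node.route(state)
print(state)  # {'message': 'I want a refund', 'amount': 30, 'intent': 'refund', 'approved': True}
```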
0:48:22 I got to say, like, there’s a pattern that’s similar to this. I’m wondering if you’ve seen it too:
0:48:30 some regulated industries actually can’t let any generated content go to a user.
0:48:30 Yeah.
0:48:30 Right.
0:48:31 Yep.
0:48:31 Yep.
0:48:33 And so what they do is, I think it’s so interesting,
0:48:35 they’ll, like, either pass in, like, a conversation tree, and, like, you can choose something from here.
0:48:41 Yeah.
0:48:42 So there’s some human elements to it.
0:48:43 So it was part of the prompt.
0:48:45 They’re like, here are the viable things you can say; choose which one to say.
0:48:50 So the language reasoning happens in the model, but nothing generated comes out.
0:48:52 Interesting.
0:48:52 Does that make sense?
0:48:52 Yeah.
0:48:53 Yeah.
0:48:53 Yeah.
0:48:53 Yeah.
0:48:53 Yeah.
0:48:55 And then another one I’ve seen is, like, actual pseudocode.
0:48:57 They’ll pass in, like, a Python function,
0:48:59 like you’d ask a human to, like, use the pseudocode to write actual code.
0:49:06 It actually has a response catalog as part of it,
0:49:08 and it has, like, the logic to apply.
0:49:09 And then.
0:49:10 Interesting.
0:49:13 And so, like, the model takes the language in from the human user.
0:49:17 And then, you know, the logic of how to respond is in Python code,
0:49:22 ’cause it just turns out that, like, there’s been a lot of code written for these types of things.
0:49:27 And then it actually includes the responses that you would send out.
0:49:28 Does that make sense?
0:49:28 Actually,
0:49:30 a lot of NPCs are done this way.
0:49:31 Like actually video game NPCs.
0:49:31 So,
0:49:32 yeah.
0:49:32 So, ’cause the way that I think about it is, like, you know...
0:49:36 So that way, with the NPCs, the actual code being generated by the model is not what ends up making it to the end user?
0:49:42 Just to the.
0:49:43 That’s it.
0:49:43 It’s not; the code is not being generated by the model.
0:49:46 It’s that the prompt has the code.
0:49:48 So, like, let’s say that I have an NPC, and let’s say you’re the gamer.
0:49:54 And so you’re coming in and you’re talking to my NPC, but my NPC has some logic that it needs to do.
0:49:59 Like, if you say a certain thing, I’ll give you a key, or maybe a little barter.
0:50:01 Yeah.
0:50:04 Like describing the game logic in English just doesn’t work.
0:50:04 Yeah.
0:50:05 Actually,
0:50:06 if you try and do it and then,
0:50:09 and then like actually scripting the output doesn’t work either.
0:50:11 If you needed to use it in a game context,
0:50:12 like you would have to know,
0:50:15 like give like a specific direction or a specific this or that.
0:50:15 Yeah.
0:50:15 Yeah.
0:50:18 So how do you make these things behave in a more constrained way?
0:50:21 People pass in functions.
0:50:24 Like, they’ll, like, describe the logic in Python.
0:50:26 So like my prompt will be like,
0:50:28 you’re an NPC in a video game.
0:50:29 The user just asked you a question.
0:50:31 Here’s the logic you should go through.
0:50:32 If the user says this,
0:50:33 then do this.
0:50:34 It’s like the pseudocode.
0:50:36 Like if the user has this,
0:50:36 you know,
0:50:37 in the belt,
0:50:37 do this,
0:50:38 like whatever,
0:50:38 whatever,
0:50:38 whatever.
0:50:40 And then here are the set of valid responses.
0:50:42 And so you’re almost constraining.
0:50:43 I see.
0:50:43 I see.
0:50:45 And then when it actually does do a response,
0:50:45 you can,
0:50:47 you can validate that it’s one of those responses.
0:50:47 I see.
0:50:49 It’s like highly structured.
0:50:49 Yeah.
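Here is a hedged sketch of the pattern Martin is describing, covering both the NPC case and the regulated-industry case: the prompt carries the logic as pseudocode plus a fixed response catalog, and the application validates that the model's choice is in the catalog, so nothing free-form ever reaches the user. The model call is stubbed here; in practice it would be a real chat-completion request, and all the names and game logic are made up for illustration.

```python
RESPONSES = {
    "GIVE_KEY": "Here, take this key. It opens the north gate.",
    "BARTER": "I might part with it... for 20 gold.",
    "REFUSE": "I have nothing for you, traveler.",
}

SYSTEM_PROMPT = f"""You are an NPC gatekeeper. Apply this logic:
    if the player mentions the password: choose GIVE_KEY
    elif the player offers gold: choose BARTER
    else: choose REFUSE
Reply with exactly one id from: {sorted(RESPONSES)}"""

def call_model(system_prompt: str, player_message: str) -> str:
    # Stand-in for a real model call that returns a catalog id.
    return "BARTER" if "gold" in player_message.lower() else "REFUSE"

def npc_reply(player_message: str) -> str:
    choice = call_model(SYSTEM_PROMPT, player_message)
    if choice not in RESPONSES:   # validate: nothing generated leaks through
        choice = "REFUSE"         # deterministic fallback
    return RESPONSES[choice]

print(npc_reply("Would you take some gold for that key?"))  # catalog text only
```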
0:50:51 So the NPC still only exists in that,
0:50:52 like the space that it can act in,
0:50:54 is still only within the space of the program that you gave.
0:50:55 Yeah.
0:50:56 Well, the logic is in there.
0:50:59 So it can have a normal conversation, but, like, inasmuch as you’re trying to guide the logic, it’s for, like, game design or game logic.
0:51:05 And so, like, you see this with NPCs, but you also see this with regulated industries, where it’s, like, I literally can’t have it.
0:51:09 Yeah.
0:51:11 I was going to say, what you described kind of sounds like, you know, giving the SOPs to, like, your set of human operators.
0:51:15 Yeah, yeah.
0:51:16 To stick to it, please.
0:51:17 Yeah.
0:51:19 You must say these three things and here’s like the discussion.
0:51:20 And like,
0:51:21 you cannot give a refund if it’s like less than this amount.
0:51:22 Yeah.
0:51:22 Yeah.
0:51:22 Yeah.
0:51:22 Yeah.
0:51:23 Very interesting.
0:51:23 Yeah.
0:51:23 Yeah.
0:51:23 Yeah.
0:51:23 I mean, yeah, I don’t want to equate them to NPCs, but, like, this is similar.
0:51:28 I’m just saying, it’s actually, like, if you want to really guarantee what happens, there’s, like, a set of techniques that you use.
0:51:36 And like,
0:51:38 there’s some situations where you want to constrain what they do.
0:51:40 It could be from a regulatory standpoint.
0:51:42 It could be because you want it to run for a long time.
0:51:46 And it also could be because I actually have game logic and my game logic is a traditional program.
0:51:48 Like I have like a monetary system.
0:51:49 I have an item system.
0:51:50 I have a battle system.
0:51:52 Like you can’t describe that in English.
0:51:54 Like you have to kind of give it to them so it can behave within that.
0:51:55 And that is,
0:51:57 that is exactly the problem I think we’re trying to solve here.
0:51:57 Right.
0:51:58 That’s so awesome.
0:51:59 If you do not give it any of this,
0:52:00 like it can just kind of go off and do,
0:52:01 do whatever.
0:52:03 And yet there are, like, real regulatory concerns around this.
0:52:04 Yeah.
0:52:07 And that is the exact use case that I think we’re trying to target with the agent builder.
0:52:08 That’s awesome.
0:52:08 Well,
0:52:09 listen,
0:52:11 we’re running out of time and there’s a million more things I want to ask you,
0:52:13 but I really appreciate your time to come in.
0:52:15 It was a great kind of survey of, like, what’s going on,
0:52:20 and particularly, like, teasing apart horizontal versus vertical in this space.
0:52:20 Yeah.
0:52:21 Which I really wanted to do.
0:52:22 So thank you so much.
0:52:22 Yeah.
0:52:22 Thank you.
0:52:28 Thanks for listening to this episode of the A16Z podcast.
0:52:29 If you liked this episode,
0:52:31 be sure to like comment,
0:52:31 subscribe,
0:52:35 leave us a rating or review and share it with your friends and family.
0:52:37 For more episodes,
0:52:37 go to YouTube,
0:52:38 Apple podcasts,
0:52:39 and Spotify.
0:52:46 Follow us on X at a16z and subscribe to our Substack at a16z.substack.com.
0:52:47 Thanks again for listening.
0:52:48 And I’ll see you in the next episode.
0:52:50 As a reminder,
0:52:54 the content here is for informational purposes only, should not be taken as legal,
0:52:56 business, tax, or investment advice,
0:53:00 or be used to evaluate any investment or security and is not directed at any
0:53:03 investors or potential investors in any A16Z fund.
0:53:07 Please note that A16Z and its affiliates may also maintain investments in the
0:53:08 companies discussed in this podcast.
0:53:10 For more details,
0:53:11 including a link to our investments,
0:53:15 please see a16z.com forward slash disclosures.
In this episode, a16z GP Martin Casado sits down with Sherwin Wu, Head of Engineering for the OpenAI Platform, to break down how OpenAI organizes its platform across models, pricing, and infrastructure, and how it is shifting from a single general-purpose model to a portfolio of specialized systems, custom fine-tuning options, and node-based agent workflows.
They get into why developers tend to stick with a trusted model family, what builds that trust, and why the industry moved past the idea of one model that can do everything. Sherwin also explains the evolution from prompt engineering to context design and how companies use OpenAI’s fine-tuning and RFT APIs to shape model behavior with their own data.
Highlights from the conversation include:
• How OpenAI balances a horizontal API platform with vertical products like ChatGPT
• The evolution from Codex to the Composer model
• Why usage-based pricing works and where outcome-based pricing breaks
• What the Harmonic Labs and Rockset acquisitions added to OpenAI’s agent work
• Why the new agent builder is deterministic, node based, and not free roaming
Resources:
 Follow Sherwin on X: https://x.com/sherwinwu
 Follow Martin on X: https://x.com/martin_casado
Stay Updated:
If you enjoyed this episode, be sure to like, subscribe, and share with your friends!
Find a16z on X: https://x.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Listen to the a16z Podcast on Spotify: https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX
Listen to the a16z Podcast on Apple Podcasts: https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711
Follow our host: https://x.com/eriktorenberg
Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see http://a16z.com/disclosures
