GPT-5 Breakdown – w/ OpenAI Researchers Isa Fulford & Christina Kim

AI transcript
0:00:02 I mean, I think it’s pretty unique at OpenAI
0:00:05 to be able to work on something that’s so generally useful.
0:00:07 I mean, it’s like everything they tell you not to do at a startup
0:00:09 is just, like, your user is anyone.
0:00:13 You just kind of take it for granted that you literally have this, like, wizard in your pocket.
0:00:16 We’re trying to make the most capable thing,
0:00:19 and we’re also trying to make it useful to as many people as possible
0:00:20 and accessible to as many people as possible.
0:00:24 I think we hear this with GPT-5 internally when people are testing it.
0:00:26 They’re like, “Oh, I thought I asked, like, a really hard question.”
0:00:29 I feel, like, a little bit insulted that it only thought for, like, two seconds.
0:00:31 Or, like, when it doesn’t even want to think at all.
0:00:35 Today’s episode was recorded the day GPT-5 launched.
0:00:39 A major milestone not just for OpenAI, but for the entire AI ecosystem.
0:00:42 Joining me in the studio, fresh off the launch livestream,
0:00:45 were three people who were instrumental in making this model a reality.
0:00:48 Christina Kim, researcher at OpenAI,
0:00:50 who leads the Core Models team on post-training.
0:00:53 Isa Fulford, researcher at OpenAI,
0:00:57 who leads Deep Research and the ChatGPT agent team on post-training.
0:00:59 And A16Z general partner, Sarah Wang,
0:01:02 who’s helped lead our investment in OpenAI since 2021.
0:01:04 We talk about what’s new in GPT-5,
0:01:07 from major leaps in coding and creative writing
0:01:09 to meaningful improvements in reasoning, behavior, and trust.
0:01:12 We also get into training, RL environments,
0:01:14 and why data quality is more important than ever.
0:01:17 We also cover agents, what that word actually means,
0:01:19 the paradigm shift for async workflows,
0:01:21 and the golden age for the idea guys.
0:01:22 Let’s get into it.
0:01:23 Let’s get into it.
0:01:29 As a reminder, the content here is for informational purposes only,
0:01:32 should not be taken as legal, business, tax, or investment advice,
0:01:35 or be used to evaluate any investment or security,
0:01:39 and is not directed at any investors or potential investors in any A16Z fund.
0:01:42 Please note that A16Z and its affiliates
0:01:45 may also maintain investments in the companies discussed in this podcast.
0:01:48 For more details, including a link to our investments,
0:01:53 please see A16Z.com forward slash disclosures.
0:01:59 So slow news day, not much going on for you guys.
0:02:01 Thank you for, thank you for coming on.
0:02:04 I know, obviously, you know, Tina, you were just on the, on the live stream.
0:02:07 We’re recording day of, congratulations.
0:02:08 Thank you.
0:02:12 For those who are unfamiliar, why don’t you introduce what you guys do at OpenAI?
0:02:13 Yeah, I’m Christina.
0:02:16 I lead the core models team on post-training.
0:02:17 I’m Isa.
0:02:21 I lead the Deep Research and ChatGPT agent team on post-training.
0:02:24 And Tina, you’ve been here for, or you’ve both been here for a while now.
0:02:27 Do you know what you want to give a little bit of your history at the company?
0:02:30 Yeah, I’ve been at OpenAI for about four years now.
0:02:33 I originally worked on WebGPT, which was the
0:02:37 first LLM using tool use, but it was just one question.
0:02:39 And so the model learned how to use the browser tool,
0:02:41 but you could only ask one question and you got an answer back.
0:02:43 And then we kind of just had this realization like,
0:02:46 oh, you normally, when you have questions, you have more questions after that.
0:02:49 And so we started building this chat bot.
0:02:52 And then that’s what eventually became ChatGPT.
0:02:55 And what have been the reactions so far?
0:02:57 You know, but it’s only been a few hours, but in your live stream,
0:03:01 like what are any reflections, any, what can you, what can you tell us the day of?
0:03:03 I’m honestly really excited.
0:03:07 I think that obviously we have some great eval numbers, and numbers are always really exciting,
0:03:09 but the thing I’m really excited about with this model is just that
0:03:11 it’s way more useful across
0:03:14 all the things that people actually use ChatGPT for.
0:03:17 The eval numbers look good,
0:03:20 but also when people actually use it,
0:03:24 I think they’ll notice quite a big difference in its utility.
0:03:25 I mean, these are my personal use cases.
0:03:27 I use it for coding and writing all the time.
0:03:29 And it’s just a huge step change.
0:03:30 Yeah.
0:03:34 Sarah, you’ve been involved in helping lead our investment since 2021.
0:03:37 Do you, uh, either want to share more or tee up how
0:03:40 you’ve been thinking about, uh, this as it relates to coding or more broadly?
0:03:43 Yeah. Well, actually, just on the topic of coding,
0:03:47 it was a huge deal to have Michael Truell come on there and, um,
0:03:50 not only showcase the capabilities, but also say,
0:03:53 this is the best coding model on the market.
0:03:56 Um, and so just curious to the extent that you can share,
0:03:59 what did you do differently to get these results?
0:04:03 Yeah, I think huge shout out to the team, um, especially Michelle Pokrass.
0:04:06 It takes a lot, I think, to get these things right.
0:04:09 And like eval numbers is one thing, like I said,
0:04:12 but to get the actual usability and like how great it is at coding,
0:04:14 I think it just takes a lot of detail and care.
0:04:18 Um, I think the team put a lot of effort into data sets
0:04:20 and thinking about the reward models for this.
0:04:24 Um, but I think it’s just literally just caring so much about getting coding working well.
0:04:28 And, and maybe actually just to double click on front end web development,
0:04:32 I mean, we’ve seen as sort of investors in the ecosystem,
0:04:35 that’s obviously taken off in the last six to eight months.
0:04:41 Um, if you could pinpoint, uh, the improvement to that piece specifically,
0:04:43 is it around, is it more around aesthetics?
0:04:46 Um, or is there sort of another capability, um,
0:04:50 leap forward in terms of what we can do with front end, um, web development?
0:04:53 I think there’s gonna be a lot more we can do with front end.
0:04:55 I think the way we’ve gotten this big leap, I mean,
0:04:57 if you compare it to o3’s front end coding capability,
0:04:59 this is just totally next level.
0:05:00 Yeah, totally.
0:05:01 It feels very different.
0:05:03 And I think it kind of just goes back to what I was saying.
0:05:05 The team just really cared about like nailing front end.
0:05:07 Um, and that means like getting the best data,
0:05:10 like thinking about the aesthetics of the model and all of these things.
0:05:14 Um, I think it’s just all those details that are really coming together
0:05:16 and making the model like great at front end.
0:05:17 Really exciting to see.
0:05:20 Loved, loved the demos in the, in the live stream too.
0:05:22 I wanted to, uh, ask about model behaviors.
0:05:24 Cause I know you, you worked on that too.
0:05:27 Um, but how did you guys think about that for GPT-5?
0:05:30 And there are a lot of things that, you know, um,
0:05:33 we’ve talked about in prior models, of, like, sycophancy
0:05:34 and characteristics like that.
0:05:36 Um, how did you guys think about for this?
0:05:38 What did you guys change or tweak?
0:05:39 Yeah.
0:05:42 The design of this model has been very, very intentional for model behavior,
0:05:45 especially with the sycophancy issues that we had, like, a few months ago with 4o.
0:05:48 Um, and we’ve just spent a lot of time thinking about like, yeah,
0:05:49 what is the ideal behavior?
0:05:52 Um, and I think for post training, what’s really,
0:05:56 or one of the reasons I really like post training is it feels more like an art
0:05:57 than maybe even like other areas of research.
0:05:59 Cause you kind of have to make all these trade-offs, right?
0:06:03 Like you have to think about like for my rewards, like all these different rewards,
0:06:04 I could be optimizing during the run.
0:06:06 Like how do like, how does that trade off against it?
0:06:07 Right?
0:06:11 I want the assistant to be, like, super helpful and engaging, but maybe that’s
0:06:16 a bit too engaging, and getting too engaging gets you the overly effusive assistant
0:06:17 that we had.
0:06:20 Um, so I think it’s really like a balancing act of trying to figure out like,
0:06:23 what are like the characteristics and like, what do we want this model to actually feel
0:06:24 like?
0:06:28 And I think we were really excited with GPT-5 because it’s kind of a time to, like, reset
0:06:33 and rethink, um, especially since it’s so easy to make something very engaging
0:06:36 in an unhealthy way. How can we make this a very healthy,
0:06:37 helpful assistant?
0:06:42 Say more about how you achieved such a reduction in hallucinations, but also
0:06:43 deception.
0:06:45 What’s the relation between those?
0:06:48 I guess, for me, I find hallucinations and deception pretty related.
0:06:51 So the model, um, and we kind of saw this a lot with the reasoning models.
0:06:55 Like the reasoning model would understand that it didn’t have some ability, but then it still
0:06:56 really wanted to respond.
0:06:59 I think we really baked it into the models that they want to be helpful.
0:07:02 And so they’re like, whatever I can say to be helpful in that moment.
0:07:05 Um, and that’s kind of the distinction we draw between deception and hallucinations.
0:07:10 Sometimes it seems the model will just say something quickly.
0:07:14 Um, and we see a lot of this reduction with thinking: when the models are
0:07:19 able to think step by step, they can actually pause before blurting out an answer,
0:07:22 which is what it feels like a lot of the previous models were doing with hallucinations.
0:07:26 Over the next few weeks, as you’re evaluating usage, what are the biggest questions that
0:07:30 you’re having or that you’re sort of anticipating, uh, being potentially answered?
0:07:34 I’m just really curious to see how all of these things, um, reflect in usage, right?
0:07:36 Like I think coding is way, way better.
0:07:38 Like what does this actually unlock for people?
0:07:41 And I think we’re really excited to be offering these models at the price points that we have.
0:07:45 Cause I think this actually like unlocks like a lot more use cases that really weren’t
0:07:46 there before.
0:07:51 Maybe previous competitor models are good at coding, but the price point is not as exciting.
0:07:55 And so I think with this number of capabilities that we have in this model and the price point,
0:08:00 I’m kind of excited to see like all the new startups and like developers, like doing things on top of it.
0:08:02 Yeah, we’re excited too.
0:08:08 But by the way, just on the topic of usage, um, you obviously have a lot of products with a ton of usage already.
0:08:16 And since we have one of the, uh, deep research gurus here too, um, how did deep research, ChatGPT operator,
0:08:22 sort of your existing products inform how you went about approaching GPT-5?
0:08:30 One thing that’s interesting is with reinforcement learning, um, training a model to be good at a specific capability is very data efficient.
0:08:33 You don’t need that many examples to teach it something new.
0:08:40 And so the way that we think about it on my team is we’re trying to push capabilities and things that are like useful to people.
0:08:46 So like deep research, it was the first model to do like very comprehensive browsing.
0:08:50 But then when O3 came out, it was also good at comprehensive browsing.
0:09:00 And that’s because we’re able to, um, take the datasets that we’ve created for, um, the, you know, frontier agent models and then contribute it back to the, um, frontier reasoning models.
0:09:06 So we always want to make sure that the capabilities that, um, we’re pushing with agents makes it into, into their flagship models as well.
0:09:08 Yeah, that’s great. Very self-reinforcing.
0:09:18 Uh, you mentioned all the startups that you’re excited to see come. Like, can you flesh out what you think that could look like, or even just, high level, some opportunities you’re more excited about because of this?
0:09:20 I mean, like people always say vibe coding.
0:09:25 I think basically like non-technical people like have such a powerful tool at their hands.
0:09:27 I think really you just need some good idea.
0:09:30 And like, you’re not going to be limited by the fact that like, you don’t know how to code something.
0:09:33 Like, you saw two of our demos at the beginning, which were front end coding.
0:09:35 And that literally took minutes.
0:09:39 I think that would have honestly taken me, like, a week to actually build fully interactive.
0:09:47 Um, and so I think we’re just going to have a lot more, I would expect like maybe a lot more like indie type of like businesses built around this because of the fact that like, you just need to have the idea.
0:09:51 Write a simple prompt and then you get the full fledged app.
0:09:52 It’s the world of the ideas guy.
0:09:53 Yeah.
0:09:54 It’s our time.
0:09:55 I think so.
0:09:56 Finally.
0:09:57 Yeah.
0:10:00 Um, how about in the broader sort of, uh, AGI discourse?
0:10:04 Like, what does this mean? Is it an accelerant or not?
0:10:12 Or, like, how do we think about the broader AI discourse, in terms of what does GPT-5 mean here, or does it change the conversation in any way?
0:10:19 I think GPT-5 is, um, obviously state of the art in, like, all the things we talked about.
0:10:23 Um, but I think it’s showing that, like, you know, we can continue pushing the frontier here.
0:10:26 And I feel like there’s always people like, oh, we’re hitting a wall.
0:10:27 Like things aren’t actually improving.
0:10:37 Um, and I think the interesting thing is I feel like we’ve almost saturated a lot of these evals and the real like metric of like how good our models are getting is I think can be like usage, right?
0:10:45 Like who, what are the new use cases that are being unlocked and like what, how, like how many more people are using this in their daily lives to help them like across multiple tasks.
0:10:51 So I feel like that’s actually, like, the ultimate metric that I’m excited about in terms of, are we getting to AGI?
0:10:59 Yeah, actually, I think Greg made this comment about how he was comparing the last model to this model and the benchmark went from 98 to 99.
0:11:02 He’s like, clearly we’ve saturated the benchmarks.
0:11:05 Um, at least on that, that front which I think is instruction following.
0:11:08 Um, what benchmarks do you pay attention to?
0:11:10 Like how do you guys think about evals, right?
0:11:15 Cause given you’re already saturating what’s out there, um, to a large extent or, or doing very well along those dimensions.
0:11:18 Um, what actually gets you to push the frontier?
0:11:27 I mean, usage would be kind of post the model release, but before you get there, what are you guys looking to internally to help guide you?
0:11:29 Is it a lot of internal evals that you created?
0:11:33 Um, you know, is it early access to startups seeing what they think?
0:11:36 Maybe it’s a combo of all the above, but how do you weigh all those things?
0:11:42 Yeah, I mean, I think on, on our team, we really work backwards from the capabilities we want the models to have.
0:11:50 So maybe we want it to be good at creating slide decks or something, or good at editing spreadsheets.
0:12:00 And then if evals for those things don’t exist, we try to make evals that are representative measures of that capability in a way that’s actually going to be useful for users.
0:12:04 Um, and then we’ll, um, a lot of those are internal.
0:12:11 We’ll collect them maybe from human experts or, um, you know, try and synthetically create examples or we’ll actually look at usage data.
0:12:14 Um, and then for us, we’ll just try and hill climb on those.
0:12:16 Um, and yeah.
0:12:17 Yeah.
0:12:22 I think we make this joke a lot internally that, like, if you want to nerd-snipe someone into working on something, you just need to make a good eval.
0:12:26 And then people are going to be so happy to try to hill climb that.
0:12:27 Yeah.
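[A minimal, hypothetical sketch of the "make a good eval, then hill climb it" loop described above: a fixed task set, a grading rule, and a single pass-rate number to push up. The ask_model stub and the tasks below are illustrative placeholders, not an OpenAI tool.]

```python
# Hypothetical illustration of "hill climbing" an eval: none of these names
# or tasks come from OpenAI; ask_model is a stand-in for the model under test.

def ask_model(prompt: str) -> str:
    # Replace with a real model call; hard-coded so the sketch runs as-is.
    canned = {"What is 2 + 2?": "4"}
    return canned.get(prompt, "I don't know")

# Each task pairs a prompt with a grading rule (here, a substring match).
TASKS = [
    {"prompt": "What is 2 + 2?", "answer": "4"},
    {"prompt": "Name the capital of France.", "answer": "Paris"},
]

def run_eval(tasks) -> float:
    passed = sum(
        1 for t in tasks
        if t["answer"].lower() in ask_model(t["prompt"]).lower()
    )
    return passed / len(tasks)

if __name__ == "__main__":
    # Re-run after every model, prompt, or data change and try to push this up.
    print(f"pass rate: {run_eval(TASKS):.0%}")
```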
0:12:30 I like what you said about starting with the capabilities first.
0:12:34 How do you prioritize which you actually are shooting for?
0:12:41 Let’s say there’s this dimension of maybe deeper into everyday use versus getting much deeper into the expert use cases.
0:12:42 Mm-hmm.
0:12:43 How do you think about that trade-off?
0:12:45 What does that trade-off mean practically speaking?
0:12:47 And what do you guys prioritize when?
0:12:53 I mean, I think it’s pretty unique at OpenAI to be able to work on something that’s so generally useful.
0:12:58 I mean, it’s like everything they tell you not to do at a startup is just like your user is anyone.
0:13:04 Like for deep research, we wanted it to be good across like every single domain someone might want to do research in.
0:13:11 And I think you only have the privilege of doing that if you work at a company that has like huge distribution and like all different kinds of users.
0:13:25 So, yeah, I mean, I think if you choose a capability that’s quite general, like online research, you just have to make sure that you represent like a distribution of tasks across loads of different domains if you want to get good at all of them.
0:13:35 But then, yeah, sometimes it’s hard to decide to focus on one specific thing because there are just so many different verticals that you could choose from.
0:13:38 But I think in some cases, maybe like coding will be really important.
0:13:41 So then, you know, a specific team will focus on coding.
0:13:53 But I think in general, because the capabilities are so general, usually like the next model improvement just kind of improves performance on a pretty broad range.
0:14:01 Yeah, I think we’ve kind of seen this like with the progression of even the models that we’ve had in ChatGPT, like as the model gets smarter, it’s better at instruction following.
0:14:06 It’s better at tool use and like just more things get unlocked as we just continue to make smarter models.
0:14:16 So I think like a good chunk of our team also like does focus on just getting general intelligence up because I think the wins that we get from there are like Isa saying like pretty great.
0:14:20 Whenever we get a new base model, it’s just saying like, oh, wow, suddenly this clicks. It works.
0:14:25 And I think we kind of saw that moment with like operator because we had been working on computer usage.
0:14:31 But I think it was hard to finally get the model to actually work without the multimodal capabilities to really support it.
0:14:33 Like, you couldn’t have something like Operator when it launched.
0:14:40 Yeah, it’s the same thing with everyone was talking about agents, but we didn’t really have a way of actually training useful agents.
0:14:46 I mean, I think everyone was talking about all these agent demos, but nothing that actually really works.
0:14:58 But I think when we saw the reinforcement learning algorithm working really well on math and physics problems and coding problems, it became pretty clear, like just from reading through the chain of thought, like, okay, this thing’s actually like thinking and reasoning and backtracking.
0:15:04 And to build something that’s able to like navigate the real world, it also needs to have that ability.
0:15:09 So we realized, okay, like this is the thing that’s going to actually let us get to useful agents.
0:15:18 And so I think it’s interesting at OpenAI because you have people pushing like, you know, foundational algorithms, getting really good at math, getting a gold medal in the IMO.
0:15:26 And then on post training, we’ll often take like those methods and try and figure out how to make things that are most useful and like usable to all of our users.
0:15:32 How much of the improvements are coming from the architecture versus the data versus the scale?
0:15:34 Like where, how do you sort of think about that?
0:15:36 My opinion, I’m very data-pilled.
0:15:37 Like I think data is very important.
0:15:48 And I think like, I think deep research was so good because Isa put so much thought and like careful attention to like the data curation that they did and thinking about all the different use cases she wanted to have represented.
0:15:51 So I’m on team data.
0:15:52 Yeah.
0:16:03 I mean, I think all are very important, but especially now that we have such an efficient way of learning, high quality data is even more important.
0:16:09 Maybe on the data topic, we’ve been talking a lot about RL environments.
0:16:13 Um, it’s a popular space for startups who all want to work with you guys.
0:16:18 Um, and I, I was curious just to get your thoughts on this since you’ve been data, or you’re data-pilled.
0:16:22 Um, but what are the bottlenecks that you see for the next stage?
0:16:43 I mean, maybe tying it to RL environments, um, is there sort of a lack of good, realistic RL environments, such that that’s sort of the next frontier? Which maybe creates an opportunity for these startups, once you, you know, are able to really work within an environment that takes a long time to build.
0:16:53 These are not, you know, built in a day or two, and you need them before you can actually automate labor to the full extent, you know, the way that you would need computer use to do.
0:17:01 Yeah, I think in my, in my opinion, I do think, um, there’s a lot of value in getting really good tasks and getting really good tasks requires really good RL environments.
0:17:08 Um, I think the more complicated, the more realistic, the more simulated we can make them, I think the better we’ll get.
0:17:14 And I think we’re kind of seeing that like tasks matter, just like tasks matter more at this point, given the fact that we have such a strong algorithm.
0:17:22 Um, so I think the data, creating data and figuring out like the best tasks to train on is like the, one of the big questions we have.
0:17:30 Yeah, like there’s some generalization from training on like one website to another, but if you want to get really, really good at something, the best thing to do is just like train on that exact thing.
0:17:31 Right.
0:17:39 So, um, yeah, I think we’re definitely just constrained by the things that we can represent in a way that we can train on.
0:17:42 Like the ChatGPT agent, for example, has such general tools.
0:17:50 Um, it has a browser and a terminal and between those two things, you can basically do most of the tasks that a human does on a computer.
0:17:54 So in theory, you can ask it to do anything that you can do on your computer.
0:17:56 It’s obviously not good enough to do that yet.
0:18:00 But with the tools it has, in theory, you can push it really, really far.
0:18:07 So now we just have to, like, make it really good at all those things, uh, by, you know, training on way more things.
0:18:08 Yeah.
0:18:10 Let’s talk about, uh, creative writing.
0:18:12 Maybe you talk about the improvements there, how you think about it.
0:18:15 That’s one of my favorite improvements in GPT-5.
0:18:22 Um, the writing, I honestly find it’s very tender and touching, especially for a lot of the creative writing that we want to do.
0:18:26 Um, we were thinking through like a bunch of different samples for the live stream.
0:18:30 And like, every time I was like, oh, that’s like actually like, that like hits.
0:18:33 Like, it’s like, it’s like, it’s like spooky.
0:18:37 And I’m just like, oh, this feels like someone, like someone should have written this.
0:18:39 Um, but I think it’s really cool.
0:18:43 Cause you can actually really use it for, um, like helping you with things.
0:18:48 Like, my example that I did in the live stream was helping me write a eulogy, something that’s, like, kind of hard to write.
0:18:51 Especially since writing isn’t really something a lot of people are good at.
0:18:53 Like I’m personally a very, very bad writer.
0:18:54 That’s not true.
0:18:57 I think it’s.
0:18:58 But it makes a better story.
0:19:02 Compared to maybe the other things I’m better at.
0:19:12 Um, but it’s so great to have this tool to help me craft things. Like, I use it literally for things as simple as Slack messages, to figure out how to phrase something well.
0:19:16 And it’ll help me, give me some iterations on how to, how to say something to the team.
0:19:17 I want to see those prompts.
0:19:18 Yeah.
0:19:20 We’re now all just looking for em dashes.
0:19:21 That was good to say.
0:19:22 Right?
0:19:24 Where do you stand on the em dash discourse?
0:19:25 I like em dashes.
0:19:26 I do that normally now.
0:19:27 People think I’m just using it.
0:19:28 I know, I know.
0:19:29 I know.
0:19:30 Me too.
0:19:33 Going back to the, the discourse for a second.
0:19:45 Sam said in his interview with Jack, if you had said 10 years ago that we would get models at the level of, sort of, PhD students, um, I would think, wow, the world looks so different.
0:19:47 And yet we’ve basically taken it for granted.
0:19:50 Um, do you think basically the improvements are similar?
0:20:04 Like, as soon as we get them, we’re just going to be like, oh, you know, now this is the standard? Or do you think at some point there’s going to be, like, oh my God? Like, how do you think about people’s ability to, um, sort of acclimate or adjust?
0:20:05 Yeah.
0:20:07 I mean, it seems like people adjust really quickly, don’t you think?
0:20:08 Yeah.
0:20:09 Like whatever happens basically.
0:20:11 I feel like ChatGPT got released and everyone was like, wow, that’s so cool.
0:20:14 But then you just kind of take it for granted that you literally have this like wizard in your pocket.
0:20:17 You can like ask it whatever, whatever random thought you have.
0:20:20 And it just pops out like a good essay and you’re like, oh, okay, cool.
0:20:21 That that’s what’s happening.
0:20:25 I guess people adapt to things rather quickly in my opinion with technology.
0:20:26 And it is really easy.
0:20:35 And I think because the form factor is so easy, even with like new tools, like deep research and ChatGPT agent, it’s like presented in such like a, like easy way that people already know how to interface with.
0:20:45 Like, I think as long as that’s true, even with the models getting like much smarter than us, like I think it’ll be, it’s still going to be like quite approachable to people.
0:20:53 Do you think the jump from GPT four to five was bigger or three to four or maybe three and a half to four?
0:21:01 I mean, at least one, one thing for me and my usage of it is sometimes I’m wondering if I have hard enough questions to ask it to actually like highlight the difference.
0:21:08 Because when it gets to a point where it’s just answering what you need so well, it’s like almost harder to tell the difference in some areas.
0:21:16 But with writing, yeah, I’ve been, I’ve been using it for a few weeks and it’s just kind of blown me away in a way that models previously haven’t.
0:21:20 Maybe I’m biased, recency bias, but I think the jump from four to five is most impressive for me.
0:21:26 Cause I guess with 3.5, when we first released it, the most common use case for me then also was still just for coding.
0:21:35 And, but now like, even though four was better at coding, I feel like the jump between four and five in terms of like breadth of ability to do things is just way different and way more.
0:21:42 And you can just handle a lot more complex things than like before, like with the context length being much longer as well.
0:21:46 Like, I think the jump from four to five to me is, like, much bigger.
0:21:50 Is there anything the model categorically can’t do?
0:21:53 I guess for five, we don’t really take like actions in the real world yet.
0:21:55 We’re going to team up with agent for that.
0:22:05 Yeah, as I said, you could ask the agent to do anything, but it’s not capable enough to do everything you want it to do yet.
0:22:14 We take a conservative approach, especially with like asking the user for confirmation before doing any kind of action that’s irreversible.
0:22:18 So like sending an email or ordering something, booking something.
0:22:26 So I think I can imagine quite, you know, a number of tasks where you’d want to take, like, bulk actions, which you might not be able to do right now because it would ask you every single time.
0:22:36 But I think as people get more comfortable using these things and as they get better and you trust them more, you might, you know, allow it to do things for you without checking in with you as much.
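[A minimal sketch of the conservative pattern described above, where the agent asks for explicit user confirmation before any irreversible action such as sending an email or placing an order. The action names and the execute stub are hypothetical, not ChatGPT agent's actual interface; granting blanket approval is what would enable the bulk-action case mentioned here.]

```python
# Hypothetical confirmation gate for irreversible agent actions; the action
# names and execute() are stand-ins, not ChatGPT agent's real tool interface.
IRREVERSIBLE = {"send_email", "place_order", "book_reservation"}

def execute(action: str, **kwargs) -> str:
    return f"executed {action} with {kwargs}"  # stand-in for a real tool call

def run_action(action: str, confirm, **kwargs) -> str:
    # Reversible actions run directly; irreversible ones need user approval.
    if action in IRREVERSIBLE and not confirm(action, kwargs):
        return f"skipped {action}: user declined"
    return execute(action, **kwargs)

if __name__ == "__main__":
    # In a real product the confirmation would be a UI prompt; a user who
    # pre-approves everything gets the "bulk actions" behavior.
    ask = lambda action, args: input(f"Allow {action} {args}? [y/N] ").lower() == "y"
    print(run_action("place_order", ask, item="mechanical keyboard"))
```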
0:22:45 Maybe just to build on that question for in terms of what it can’t do today, but what you would sort of direct future research toward.
0:22:52 If you look at coding something like end to end DevOps, for example, that feels like the logical next set of capabilities.
0:22:58 Do you guys think we’ll get there and I don’t know what you’ll name it, but 5.5 or GPT 6.
0:23:00 How far are we from something like that?
0:23:05 Yeah, I don’t know about the exact thing of DevOps, but I do feel like with the models getting much smarter.
0:23:11 One other thing that came to my mind when you asked me the question is like longer running tasks and like things like that.
0:23:22 I think GPT-5 is great because, like, yeah, within a couple of minutes, maybe you get a full-fledged app. But then what would it look like if you actually gave it, like, an hour, a day, a week? What can it actually get done?
0:23:25 And I think that’s, there’s gonna be a lot of interesting stuff.
0:23:27 We’re interested to see what will happen there.
0:23:33 Yeah, I think a lot of it is not just about the model capability, but it’s actually like how you set it up in a way to do things.
0:23:41 Like, I’m sure that you could build something that’s monitoring, you know, your Humio or, like, Datadog, whatever, with these current models.
0:23:43 It’s just like setting up the harness, like to make that possible.
0:23:55 And same for, for like agentic tasks. I think a lot of things that will be quite useful will be when the agent like proactively does something for you, which I don’t think is impossible today.
0:24:05 It’s just not like set up that way. But eventually, like as it proactively does things for you, then we might get feedback on whether that was useful and we can make it like even better at like triggering.
0:24:13 Agents is probably, or agent is probably the most overused word of 2025. That being said, your agent’s launch was extremely exciting.
0:24:21 What does that word mean to you in the context of capabilities that you’d like to build in the near term or have already built?
0:24:25 And what is sort of most important that the agent is able to do on behalf of your users?
0:24:36 I guess my very general definition would just be something that does useful work for me, um, on my behalf, and I would say asynchronously.
0:24:42 So like you’d kind of leave it and then come back and get, either get a result or like a question about what it’s doing.
0:24:54 And then in terms of, I guess, roadmap for agents, I mean, longer term, you want it to be able to do anything that, you know, a chief of staff or assistant or something like that would do for you.
0:25:05 Um, but I think in the more immediate term, there are a lot of new capabilities that we launched in ChatGPT agent that we just want to improve.
0:25:08 So one of the main capabilities is, um, deep research.
0:25:20 So just being really good at synthesizing information from the internet, but also, um, I think we can improve capabilities on synthesizing information from like all of the, the services that you use and like private data that you have.
0:25:28 And then, um, also being better at creating and editing artifacts like docs or slides and spreadsheets.
0:25:35 Cause I think so much of like the work that’s useful that people do in their jobs is basically just research and making something.
0:25:45 Um, but then also I’m personally like love all the consumer use cases, um, like making it better at like shopping or planning a trip and those kinds of things are like also really fun.
0:25:57 Um, and so that also involves, like, taking an action, which is interesting because it’s often kind of the last step of a task.
0:26:00 And it’s the, maybe a task that would take less time for a human.
0:26:07 And it’s actually a very hard research question to get it to do something, like book something or use a calendar picker.
0:26:12 Um, but yeah, once you have the end to end flow working really well, it can basically do, do anything.
0:26:13 Yeah.
0:26:14 That’s incredible.
0:26:15 On the shopping piece.
0:26:23 I now do not make a single large ticket purchase without having ChatGPT put all the options in a table for me along the dimensions I care about.
0:26:24 It’s, it’s incredible.
0:26:27 Um, but I want to push on the async piece.
0:26:35 Um, because I, I don’t know if you would agree with this, but it felt like a revelation to me, at least, um, at the beginning of the year that people were willing to wait.
0:26:38 Cause you kind of think about, Oh, we want it faster.
0:26:41 Like the value prop of this tool is that it gives me the answer fast.
0:26:42 Right.
0:26:43 That was sort of very 2024.
0:26:46 Um, clearly this paradigm has shifted.
0:26:50 People are willing to wait for high quality, high value answers and work.
0:27:01 Um, how do you think about the trade-off between how long you take to get something back to the user versus the value that you’re actually providing?
0:27:04 And like, what do you think is the ideal frontier for something like that?
0:27:05 Yeah.
0:27:10 It’s interesting because, um, I built retrieval in ChatGPT and was on the browsing team before this.
0:27:16 Um, Tina was also on the browsing team, and we were always making these trade-offs and optimizations for latency.
0:27:24 And so we were thinking, how can you best like fill the context with information you’ve retrieved so that the answer is pretty good in a few seconds.
0:27:29 And so I think with deep research, I was just very excited to like remove latency as a constraint.
0:27:35 And since we were going for these, we’re going for these tasks that are really hard for humans to do and would take humans many hours to do.
0:27:36 Yeah.
0:27:46 I think we felt like, you know, if you asked an analyst to do this and it would take them 10 hours or two days, it seems reasonable that, um, someone would be willing to wait, like, five minutes in your product.
0:27:50 Um, so I think that was the, we just kind of made that bet.
0:27:56 And luckily it seems like it’s the case, but I do also think that, you know, initially people were like, oh, this is amazing.
0:27:57 It’s doing all this work.
0:28:00 Um, that would have taken me so long.
0:28:03 And now people are like, okay, but I want to, now I want it in 30 seconds.
0:28:04 Right.
0:28:07 To the point on the, the bar changing.
0:28:08 Yeah.
0:28:10 Cause yeah, I was going to say, is there any sort of rule of thumb?
0:28:18 I’m sure it’s constantly shifting where as long as you’re 10 times faster than it would take the human to do, they’re willing to wait for it?
0:28:20 Or is that just constantly shifting sand?
0:28:23 I think with these launches, people’s expectations keep changing.
0:28:24 Yeah.
0:28:25 Yeah.
0:28:28 I don’t think we have, like, a specific number.
0:28:38 One thing that’s interesting is I think sometimes people just bias toward thinking that the longer answer is more thorough or has done more work, which I don’t necessarily think is the case.
0:28:41 Like deep research, for example, always gives you a really long report.
0:28:44 Um, but sometimes for me, I don’t want to read this whole long report.
0:28:46 I actually don’t, don’t like that.
0:28:49 And so agent, like it will only give you a long report if you ask for it.
0:28:54 But I think sometimes, since people are used to always getting a really long report, they’re like, wait, I’ve been waiting.
0:28:55 Like where’s my long report?
0:29:00 Um, but sometimes it’s like really hard to find a specific piece of information and would have also taken a human a long time.
0:29:04 Cause it’s on, like, page 10 of the results where it finds this information.
0:29:20 So, um, I think it’s interesting also how you can condition people’s expectations with, um, with a product so that when you change or like with deep research, it always thinks for a really long time, which again, I don’t necessarily think is a feature, but I think now people are like really used to the amount of time that they wait.
0:29:21 Um, and so, so.
0:29:26 I think we hear this with GPT-5 internally when people are testing it, and they’re like, oh, I thought I asked, like, a really hard question.
0:29:32 I feel, like, a little bit insulted that it only thought for, like, two seconds, or, like, when it doesn’t even want to think at all.
0:29:34 It’s like the Mark Twain line.
0:29:36 I didn’t have time to write you a short letter.
0:29:38 So I wrote you a long one.
0:29:39 Yeah.
0:29:44 Why don’t you talk about the bottlenecks? Like, why don’t we have reliable agents?
0:29:47 What are the main bottlenecks as you see them?
0:29:48 Yeah.
0:29:52 I think a big part of it is that the things we train on, we’re often really good at.
0:29:58 And then sometimes with the things outside of that, it can be a bit, um,
0:29:59 sometimes it’s good at those things, sometimes it’s not good at those things.
0:30:05 Um, so I think, yeah, creating more data across like a broader range of things that we want it to be good at.
0:30:18 Um, I think also what’s interesting with agents is, when something is doing something on your behalf and it has access to your, you know, your private data and the things that you use,
0:30:22 it’s kind of more scary, the different things it could do to achieve its final goal.
0:30:32 Um, you know, in theory, if you asked it to buy you something and, like, make sure that you like it, it could go and buy five things just to make sure that you liked one of them.
0:30:33 Right.
0:30:35 Which you might not necessarily want.
0:30:40 So I think that there’s definitely like having oversight during training is also like an interesting area.
0:30:48 I think there’s just like new things that we have to like develop to, you know, push these agents even further.
0:30:51 Um, so yeah, I think that that’s part of it.
0:31:01 And then also, every time we have a smarter, like, base model or something like this, it improves every model that’s built on top of that.
0:31:06 So I think that will also help, especially with like multimodal capabilities, as Tina said, with like computer use.
0:31:12 Um, cause it’s literally just looking at screenshots of the webpage.
0:31:26 And it’s a little interesting because of the way that humans focus on specific things. It’s a lot to expect a model to just take a whole image and know everything about the image, when, when we’re looking at something, we’ll focus on a specific thing.
0:31:27 Yeah.
0:31:30 I think that there’s just lots of room for improvement and lots of, in lots of areas.
0:31:31 Sorry.
0:31:32 That was kind of a general answer.
0:31:33 No, no.
0:31:41 Well, actually I was gonna, maybe that last example, um, gets into something that we were curious about, which is, and this ties back to training data, um, as well.
0:31:49 But what, what sort of, I guess, what specific categories of browsing tasks are challenging for agents, um, today?
0:31:54 And like, I don’t know if you have thoughts on how you’d overcome this for sort of the next iteration of the model.
0:32:00 I mean, I think one thing is, so pre-training is based on what data is available, right?
0:32:09 And so when we’ve done this pre-training, there’s not much data out there to begin with on people using computers. Like, computer usage is not really a thing that there’s lots of data out there for.
0:32:09 Yeah.
0:32:13 And this is something we actually have to like, seek out now that this is a capability that we want.
0:32:15 So I think that’s actually probably a big one.
0:32:16 Mm-hmm.
0:32:18 Just for general improvements of like computer usage.
0:32:22 Do you think you’ll lean more heavily on human data vendors to help collect that?
0:32:35 Or, given it doesn’t exist, to your point, like, recorded in the way that’s maybe most helpful for training, like, how do we get it? But it is probably the most useful application of the models to, you know, at least knowledge work.
0:32:37 Um, like how do you overcome that?
0:32:48 I mean, I think one cool thing is for, for example, for initial deep research, there’s not really any data sets that exist for browsing in the same way that you have a math data set that already exists.
0:32:49 Right.
0:32:55 So we, we have to create all this data, but once you have good browsing models or good computer use models, you can like bootstrap them to help you make synthetic data.
0:32:57 So I think that’s like pretty promising area.
0:33:03 Christina, can you explain what mid-training is, and what does it achieve that pre- or post-training doesn’t?
0:33:08 So I think with your pre-training runs, these are your big runs.
0:33:09 These are the massive ones.
0:33:12 Like that’s what we’re building all these giant clusters for.
0:33:20 Um, so you can kind of think of mid-training as literally in the middle: we do it after pre-training, but before post-training.
0:33:26 You can kind of think of it as a way to extend the model’s intelligence without having to do a whole new pre-training run.
0:33:29 So this is mostly just focused on data, building off of the pre-trained models.
0:33:34 Um, so this is a way for us to do things like updating the knowledge cut off of these models, right?
0:33:39 So when you pre-train it, you’re kind of like, okay, shoot, now we’re kind of stuck in this date and we can never update it again.
0:33:42 And it doesn’t quite make sense to put all that data into post-training.
0:33:49 Um, and so mid-training is just a smaller pre-training run to help expand the model’s intelligence and up-to-dateness.
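[A toy, hypothetical illustration of the idea above: mid-training as a smaller continued pre-training pass over fresher data, run after the big pre-training run and before post-training, typically at a modest learning rate so new knowledge is added without overwriting what the base model already learned. The tiny model, the random "recent" batches, and the hyperparameters are stand-ins, not OpenAI's pipeline.]

```python
# Toy sketch of continued pre-training ("mid-training") over fresher data.
# Everything here is a stand-in: the tiny model, the fake recent corpus,
# and the hyperparameters are illustrative, not OpenAI's actual setup.
import torch
import torch.nn as nn

VOCAB, DIM = 1000, 64

# Stand-in for a checkpoint coming out of the big pre-training run.
pretrained = nn.Sequential(nn.Embedding(VOCAB, DIM), nn.Linear(DIM, VOCAB))

# Stand-in for a small corpus of recent documents, already tokenized.
recent_batches = [torch.randint(0, VOCAB, (8, 32)) for _ in range(10)]

# Same next-token objective as pre-training, just much smaller in scale
# and with a small learning rate to avoid clobbering existing knowledge.
opt = torch.optim.AdamW(pretrained.parameters(), lr=1e-5)
loss_fn = nn.CrossEntropyLoss()

for tokens in recent_batches:
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    logits = pretrained(inputs)                      # (batch, seq-1, vocab)
    loss = loss_fn(logits.reshape(-1, VOCAB), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# The resulting checkpoint would then go on to post-training.
```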
0:33:51 Christina, did you work on WebGPT?
0:33:52 Yes, I did.
0:33:54 Okay. So you’re basically like an AI historian.
0:33:55 Um, yes, yes.
0:33:57 She also works on computer use.
0:33:58 I’m an elder.
0:34:06 So can you like reflect back a little bit to, you know, four years ago, five years ago and sort of reflect on like, what are the biggest thing?
0:34:12 Like if you were to predict the five years out, like what are the inflection points or biggest things that would have surprised you?
0:34:18 Honestly, with WebGPT, the main thing we were just excited about was trying to ground these language models.
0:34:22 Like it’s, they had so many issues with like hallucinations and the model just saying random things.
0:34:25 And, like, the fact that we didn’t really do mid-training back then.
0:34:29 So, like, how do we make sure the model is actually the most factually up to date?
0:34:32 So then that’s kind of how we thought about like, oh, let’s give it a browsing tool.
0:34:33 I think that makes sense.
0:34:38 Um, and then, yeah, like I said, that kind of went on from like, oh, I actually want to keep asking questions.
0:34:39 So, what would a chatbot look like?
0:34:43 But at this point, I think there had been a few chatbots by a few other companies.
0:34:48 And I feel like a chatbot is also like a very common AI thing to think of.
0:34:50 Um, but they were quite unpopular at the time.
0:34:57 So we weren’t really even sure that like, this is actually something useful for people to work on or like people to use, or will people be excited about this?
0:35:01 Is this really, like, a research innovation? Like, are we passing the Turing test here?
0:35:06 Like, um, but I think it kind of clicked for me that, like, maybe there was actually something interesting happening here.
0:35:09 Um, we gave early access to about 50 people.
0:35:12 Most of those people being like people I lived with at the time.
0:35:16 Uh, and there, two of my roommates just used it all the time.
0:35:17 They just like would never stop using it.
0:35:21 And they would just have these long conversations and they would ask it like quite technical things.
0:35:23 Cause they’re also AI researchers.
0:35:25 And so I was just like, oh, this is like kind of interesting.
0:35:26 Like, I don’t know.
0:35:32 And at the time we were kind of thinking, like, okay, we have this chatbot, should we make this a really specific, like, meeting-bot type of thing?
0:35:34 Do we like make it a coding helper?
0:35:41 Um, but it was interesting to see my two roommates just use it like for anything and everything and just like literally be chatting with it.
0:35:43 Like the whole work day as they’re using it.
0:35:45 So I was like, oh, this is kind of interesting.
0:35:51 But then it was also interesting to see like the majority of the people that I gave access to on that 50 person list, like didn’t really use it that much.
0:35:56 But I was like, oh, it’s like, there’s clearly like something here, but it’s like not quite maybe for everyone yet.
0:35:57 Um, but there’s something here.
0:36:02 When did you realize like I’m working at one of the most important companies of this generation?
0:36:06 Like, like when was the moment where you were like, hey, this is something that I obviously believe is important.
0:36:07 That’s why I joined.
0:36:09 But that you realized like the scale and significance.
0:36:12 Honestly, I kind of had this moment before I joined OpenAI.
0:36:23 Like, I think with the scaling laws paper and GPT-3, it just kind of hit me that, like, if this exponential is true, there’s not really much else I want to spend my life working on.
0:36:25 Um, and like, I want to be part of this like story.
0:36:29 Like, I think there’s, there’s going to be so many interesting things unlocked with this.
0:36:40 And I think this is probably the next, like, step change in terms of technology, and it kind of made me realize, like, oh, I should probably go start reading about deep learning and figure out how I can get into one of these labs.
0:36:41 Isa, what was your moment?
0:36:51 I think for me, it was also before I started working at OpenAI. Um, I think I first learned about OpenAI in an AI class or some kind of computer science class.
0:36:53 And they were saying like, oh, they trained on the whole internet.
0:36:54 It’s like, oh, that’s so crazy.
0:36:56 Like, what is this company?
0:37:04 And then I started using GPT-3. I think I was a power user of the OpenAI Playground.
0:37:10 And at a certain point, like had early access to these like different OpenAI features, like embeddings and things like that.
0:37:12 And just became this like big OpenAI fan.
0:37:16 Um, which is like a little embarrassing, but you know, it’s fine because it got me here.
0:37:19 And eventually they’re like, okay, like you’re stalking us.
0:37:21 Do you want to interview here?
0:37:25 Um, but yeah, I think it was pretty clear to me just from how much I was using GPT-3, which,
0:37:28 compared to what we have now, just pales in comparison.
0:37:33 But I was like, from then I was hooked and just trying to figure out a way to, to, to work here.
0:37:37 Maybe a, uh, a question or more on the company building front.
0:37:43 Um, we all sort of read and reread Calvin French-Owen’s piece, uh, just his reflections on working at OpenAI.
0:37:57 Um, curious, and you don’t have to comment on that piece unless you want to, but, um, would love your reflections on the change that you’ve seen over the last four years, or, you know, even less than that, given I think that was only covering one year of change.
0:38:00 Um, but what are the biggest things that you’ve seen change at OpenAI?
0:38:05 I mean, when I first joined OpenAI, the applied team was 10 engineers or something.
0:38:07 It just like, we didn’t really have this like product arm.
0:38:08 We had just launched the API.
0:38:10 It was just a completely different world.
0:38:21 And I think AI is in most people’s mind now after ChatGPT, but I think pre-ChatGPT, like people didn’t really know what AI was or really like thought about it as much.
0:38:24 Um, it’s kind of cool working in a place that like my parents know what I do now.
0:38:26 And like, it’s like, that’s really cool.
0:38:31 Um, and I think the company obviously is just a lot bigger, but I think with that, we can just take a lot more bets.
0:38:36 I think when I first joined OpenAI, there were obviously way less, um, people.
0:38:37 Like it was much, much smaller.
0:38:38 It was around like 200-ish people.
0:38:42 And I think we’re close to like a few thousand for sure.
0:38:43 Yeah.
0:38:44 Yeah.
0:38:46 When I joined, it was also a few hundred before ChatGPT.
0:38:52 So it’s obviously, yeah, very different in how, you know, all of your friends have heard of, you know, what you work on.
0:38:54 But I think culturally, obviously the company is much bigger.
0:39:00 I still think we’ve maintained, um, this, it, it still feels very much like a startup.
0:39:06 I think some people who come from a startup are surprised at like, oh, I’m working even harder than when I was working on the startup that I founded.
0:39:08 I think ideas can still come from anywhere.
0:39:11 And if you just like take initiative and want to make something happen, you can.
0:39:14 And this doesn’t really matter like how senior you are or anything like that.
0:39:17 I think we’ve been able to maintain that culture, which I think is pretty special.
0:39:19 Yeah, we definitely reward agency.
0:39:21 And I think that’s like what has been true.
0:39:24 And I think, especially in the research side, the teams are quite small.
0:39:27 Like, when Isa was working on deep research, it was still, like, two people.
0:39:30 So like, I think we still do that on the research side.
0:39:34 Like most research teams are quite small and nimble for that reason.
0:39:35 Um, so.
0:39:43 And earlier you said, um, you know, we do something at OpenAI, which startups never do, which is, you know, try to appeal to every single person with the product.
0:39:54 What, um, are there other things that come to mind that OpenAI just does differently than all your peers or other startups, or things that we may not appreciate being on the outside?
0:40:05 I mean, I think it’s different for, um, different teams, but, um, my, the, my team collaborates so closely with the applied, like the engineering team and the product team and design team.
0:40:11 Um, in a way that I think sometimes like research can be quite separate from like the rest of the company.
0:40:12 But for us, it’s like so integrated.
0:40:13 We all sit together.
0:40:20 Um, you know, sometimes like the researchers will help with like implementing something.
0:40:23 I’m not sure that engineers are always happy about it, but we’ll try.
0:40:31 Um, like, we’ll, like, get into the front end code, but, um, and vice versa, they’ll help us with things that we’re doing for, like, model training runs and things like that.
0:40:35 So I think, um, some of the like product teams are quite integrated.
0:40:44 I think it’s for, for post training, um, it’s, it’s a pretty common pattern, which, um, I think just lets you move really quickly.
0:40:57 I guess one thing that I think is unique about OpenAI is that you’re both very much a consumer company by revenue, et cetera, um, products, but also an enterprise company.
0:41:03 How does that internally, like what would you guys consider yourself or is that even just the wrong paradigm to think about?
0:41:09 Yeah. I mean, I guess if you tie it to the mission, it’s like, we’re trying to make the most capable thing,
0:41:14 and we’re also trying to make it useful to as many people as possible and accessible to as many people as possible.
0:41:17 So like in that framing, I think it makes a lot of sense.
0:41:22 The concept of taste has become, um, also very widely used.
0:41:24 What does good taste mean within OpenAI?
0:41:27 How do you know it when you see it?
0:41:32 Um, and is that something that, um, even in a world where
0:41:35 the cost to produce everything just keeps going down and down,
0:41:41 is that the one thing that’s not commoditizable, or is that also shifting, given maybe that can go into the training data?
0:41:47 No, I think taste is quite important, especially now that, like I said, our models are getting smarter.
0:41:48 It’s easier to use them as tools.
0:41:51 Um, so I think having the right direction matters a lot now.
0:41:55 Um, and, like, having the right intuitions and the right questions you want to ask.
0:41:58 Um, so I would say maybe it matters more now than before.
0:42:06 I think also I’ve been surprised by how often the thing that is the most simple and easy to explain is the thing that works the best.
0:42:15 And so sometimes it sounds very obvious, but, um, you know, it’s quite hard to get the details of something right.
0:42:22 But I think usually good researcher taste is just simplifying the problem to, like, the dumbest thing or the most simple thing you can do.
0:42:28 Yeah. I feel like with every research release we do, when people figure out what happened there, they’re like, oh, that’s so simple.
0:42:38 Like, oh, obviously that would have worked. Um, but I think it’s knowing to try that obvious, or at-the-time-not-obvious, thing that is obvious in hindsight.
0:42:47 Yeah. And then all of the details around the hyperparameters and all these things and, like, the infra, that’s obviously very hard, but the actual concept itself is usually pretty straightforward.
0:42:51 Hmm. Very cool. Taste is Occam’s razor. Yeah.
0:43:04 So, sort of in closing here, uh, obviously a historic day. Do you want to contextualize sort of what this means in the context of the mission, and, you know, where you’ve been to get to now, and where you’re going?
0:43:10 Yeah. I think with GPT-5, the word that’s been in my mind throughout all of this is usable.
0:43:17 And I think the thing that we’re excited about is getting this out to everyone. Um, we’re excited to get like our best reasoning models out to free users now.
0:43:25 And I think just getting our smartest model yet to, like, everyone. And I’m just excited to see what people are going to actually use it for.
0:43:28 That’s a great place to wrap. Tina, Isa, thanks so much for coming on the podcast.
0:43:29 Yeah. Thank you.
0:43:30 Thank you for having us.
0:43:44 Thanks for listening to the a16z podcast. If you enjoyed the episode, let us know by leaving a review at ratethispodcast.com/a16z. We’ve got more great conversations coming your way. See you next time.

GPT-5 just launched, marking a major milestone for OpenAI and the entire AI ecosystem.

Fresh off the live stream, Erik Torenberg was joined in the studio by  three people who played key roles in making this model a reality:

  • Christina Kim, Researcher at OpenAI, who leads the core models team on post-training
  • Isa Fulford, Researcher at OpenAI, who leads deep research and the ChatGPT agent team on post-training
  • Sarah Wang, General Partner at a16z, who’s led our investment in OpenAI since 2021

They discuss what’s actually new in GPT-5—from major leaps in reasoning, coding, and creative writing to meaningful improvements in trustworthiness, behavior, and post-training techniques.

We also discuss:

  • How GPT-5 was trained, including RL environments and why data quality matters more than ever
  • The shift toward agentic workflows—what “agents” really are, why async matters, and how it’s empowering a new golden age of the “ideas guy”
  • What GPT-5 means for builders, startups, and the broader AI ecosystem going forward

Whether you’re an AI researcher, founder, or curious user, this is the deep-dive conversation you won’t want to miss.

Timecodes:

0:00 ChatGPT Origins

1:57 Model Capabilities & Coding Improvements

4:00 Model Behaviors & Sycophancy

6:15 Usage, Pricing & Startup Opportunities

8:03 Broader Impact & AGI Discourse

16:56 Creative Writing & Model Progress

32:37 Training, Data & Reflections

36:21 Company Growth & Culture

41:39 Closing Thoughts & Mission

Resources

Find Christina on X: https://x.com/christinahkim

Find Isa on X: https://x.com/isafulf

Find Sarah on X: https://x.com/sarahdingwang

Stay Updated: 

Let us know what you think: https://ratethispodcast.com/a16z

Find a16z on Twitter: https://twitter.com/a16z

Find a16z on LinkedIn: https://www.linkedin.com/company/a16z

Subscribe on your favorite podcast app: https://a16z.simplecast.com/

Follow our host: https://x.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures
