AI transcript
0:00:13 The real ultimate end state of AI, and thus AI agents, is these are autonomous things that run in the background on your behalf, executing real work for you.
0:00:18 The more work that it’s doing without you having to intervene, the more agentic it’s becoming.
0:00:21 Somehow it produces output that it feeds back into itself.
0:00:27 It’s literally just the ampersand in Linux, which is, it’s a background task.
0:00:29 And it’s like the worst assistant in the world.
0:00:34 And agentification is just hiring a lot of these really bad interns.
0:01:08 From the old school idea of agents as just background tasks to today’s vision of fully autonomous systems, we’ll explore what this means for coding, enterprise workflows, and how whole industries might reorganize around agents.
0:01:10 Let’s get into it.
0:01:19 I thought I’d start this wide-ranging podcast by asking the very simple but very provocative question, what is an agent?
0:01:20 Oh, boy.
0:01:21 To who?
0:01:21 Steven.
0:01:22 Okay, Steven.
0:01:22 Oh, to me?
0:01:23 Oh, I go for it.
0:01:32 So I actually have a very old person view of what an agent is, which is it’s literally just the ampersand in Linux, which is, it’s a background task.
0:01:33 Okay.
0:01:39 Because, like, you type something into o3, and then it’s like, hey, I’m trying this out.
0:01:40 Oh, wait, I need a password.
0:01:41 Can’t do that.
0:01:43 And it’s like the worst assistant in the world.
0:01:48 And really, it’s just because they need to entertain you while it’s taking a long time to answer your prompt.
0:01:56 And so that’s my old person view of what an agent is, and agentification is just hiring a lot of these really bad interns.
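For readers who haven't met the reference: a minimal sketch of the Unix background-job idea he's describing. The shell's `&` detaches a command so the prompt comes straight back; the Python below mimics the same fire-and-forget shape. It assumes a POSIX `sleep` and is not tied to any agent framework.

```python
import subprocess

# In a Unix shell, "&" detaches a job so the prompt comes straight back:
#   $ long_running_job &
#   $ wait                  # later, collect the result
# The same fire-and-forget shape from Python; `sleep 5` stands in for real work.
job = subprocess.Popen(["sleep", "5"])        # runs in the background
print(f"foreground is free while pid {job.pid} churns")
job.wait()                                    # rejoin when you want the result
```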
0:01:58 The interns, they’re getting better.
0:02:02 They are, but they still don’t remember if I have a password to Nature.
0:02:06 Is it possible you guys just had bad interns in, like, the 80s and 90s?
0:02:07 We had terrible interns.
0:02:08 I have, like, a very high esteem for interns.
0:02:11 But now a real answer.
0:02:15 No, no, I mean, I think collectively we’re seeing what these are becoming.
0:02:23 So if you think about two years ago, the post-ChatGPT moment, we thought that we were looking at the form factor of AI, which is you’re talking back and forth to something.
0:02:33 And I think to Steven’s point, the real ultimate end state of AI, and thus AI agents, is these are autonomous things that run in the background on your behalf and execute real work for you.
0:02:40 And ideally, you’re interacting with them relatively little, relative to the amount of value that they’re creating.
0:02:47 And so there’s some kind of metric where the more work that it’s doing without you having to intervene, the more agentic it’s becoming.
0:02:48 And I think that’s sort of the paradigm that we’re seeing.
0:03:00 The only addition I’d have, beyond long-running, which I agree with, is that somehow it produces output that it feeds back into itself as input. Because you can actually do long-running inference.
0:03:05 Like, you can make a video that’s really long-running, but it’s just basically a single-shot video, and you just throw more compute at it.
0:03:12 I think there’s, like, technical limitations if you start feeding the input back in, because we’re not quite sure how to contain that, too.
0:03:20 And so I think you can measure things based on how long they run, and you can also measure them by how many times they’ve actually taken their own guidance, which would be kind of more of a measure of agency.
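A minimal sketch of the two measures being floated here: how long the loop runs, and how many times the system consumed its own output as input before drifting out of distribution. Every function below is a hypothetical placeholder, not a real model API.

```python
# "Agency" as counted self-guided steps: output is fed back in as input,
# and we stop when the output is no longer sensible. All toy placeholders.

def run_model(prompt: str) -> str:
    """Stand-in for one model call."""
    return prompt + " -> next step"               # toy transformation

def still_sensible(text: str) -> bool:
    """Stand-in for the hard part: is the output still in distribution?"""
    return len(text) < 200                        # toy guard

def agent_loop(task: str, max_steps: int = 50) -> tuple[str, int]:
    state, self_guided = task, 0
    for _ in range(max_steps):
        output = run_model(state)
        if not still_sensible(output):            # drifted out of distribution
            break
        state = output                            # output becomes the next input
        self_guided += 1                          # one more self-guided step
    return state, self_guided

final_state, agency_score = agent_loop("summarize these logs")
print(agency_score)                               # crude "agency" measure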
0:03:26 Yeah, because I do think it’s important that in this transition—look, we are—what Aaron described is where we’re going to be.
0:03:30 It’s just that, what are the interesting steps that happen along the way?
0:03:37 Because we are going to need, for the time being, it to stop and say, am I heading in the right direction or not?
0:03:52 Because putting aside all the horror stories about, you know, taking action without consent and using accounts and data or whatever, there is this thing where you just don’t want to waste your time on the clock while it’s churning away, way off in the wrong direction.
0:04:00 Yeah, so the question is, to what extent do they have their own agency, which to me means they’ve spit something out and they’ve kind of consumed it back up again, and it’s still a sensible thing.
0:04:08 Which, by the way, as you start thinking of these things in distribution, it’s actually a very difficult thing to do, because it doesn’t know if it’s going to be spitting something out that’s still in distribution when it brings it back in.
0:04:10 Like, they don’t have that self-reflection.
0:04:15 So, I think there’s actually a very kind of technical question here, to what extent we can make these things have independent agency.
0:04:17 But we can make them long run pretty easily.
0:04:19 Yeah, yeah, we’re good at the long run.
0:04:19 The long running thing.
0:04:21 What you get back is, yeah.
0:04:29 Yeah, I mean, I think the interesting thing is how the ecosystem is sort of solving or mitigating then the issues.
0:04:32 Like, you’re seeing sort of this logical division of the agents.
0:04:35 So, they might be long running, but they’re not actually trying to do everything.
0:04:44 And so, the more that you subdivide the tasks out, then actually the more that they can go pretty far on a single task without getting kind of totally lost on what they’re working on.
0:04:51 Well, Unix is going to prove to be right, which is, like, you’re going to want to break things up into much smaller granularity of tools.
0:04:58 And I think to other points that you’ve made on X, like, you’re going to want to divide things up so that it’s like an expert in this thing.
0:05:04 And then it might be a different, let’s just say, body of code where you go and ask, you know, are you good at this thing?
0:05:06 Let me get your answer on this part of the problem.
0:05:07 Yeah, it’s kind of interesting.
0:05:14 I don’t know how much you’ve plotted this, but, like, the conversation on AGI has sort of evolved very clearly in the past, like, six months.
0:05:26 And I think that the consensus was, maybe not even consensus, what some of the view was, let’s say, two years ago, was this sort of monolithic system that’s just super intelligent, and it solves all things.
0:05:35 And now if you kind of fast forward to today and, let’s say, whatever we agree kind of state-of-the-art is, it’s sort of looking like that’s probably not going to work for a variety of reasons, at least in today’s architecture.
0:05:42 So then what do you have is maybe a system of many agents, and those agents have to become very, very deep experts in a particular set of tasks.
0:05:45 And then somehow you’re orchestrating those agents together.
0:05:47 And then now you have two different types of problems.
0:05:48 One has to go deep.
0:05:50 The other has to be really good at orchestration.
0:05:54 And that maybe is how you end up solving some of these issues over the long run.
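A toy sketch of that two-part shape: narrow deep experts, plus an orchestrator whose whole job is routing. The specialist names and keyword routing are invented for illustration; a real system might use a classifier or another model as the router.

```python
# "Deep experts plus orchestrator": the depth problem lives inside each
# specialist, the orchestration problem lives in the router.

SPECIALISTS = {
    "legal":   lambda task: f"[contract analysis: {task}]",
    "finance": lambda task: f"[financial model: {task}]",
    "coding":  lambda task: f"[proposed patch: {task}]",
}

def route(task: str) -> str:
    """Orchestration: pick the right deep expert for the task."""
    for domain in SPECIALISTS:
        if domain in task.lower():
            return domain
    return "coding"                    # arbitrary default expert

def orchestrate(tasks: list[str]) -> list[str]:
    """This layer only coordinates; it never goes deep itself."""
    return [SPECIALISTS[route(t)](t) for t in tasks]

print(orchestrate(["review the legal exposure", "update the finance forecast"]))
```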
0:05:56 I just think it’s very difficult to think cleanly about this.
0:06:02 Like, I’ve still yet to see a system that performs very well where you can draw a circle around it that doesn’t have a human being in it somewhere.
0:06:03 Oh, yeah.
0:06:07 So in a sense, like, the G, the general, often seems to be coming from the human.
0:06:07 Yeah.
0:06:13 So, like, I just, listen, these things are tremendously good at increasing productivity of humans.
0:06:18 At some point, maybe they’ll increase productivity without humans, but until then, it’s just very hard for me to actually talk cleanly.
0:06:25 Well, and it’s so important for people to get past sort of the anthropomorphization of AI because that’s what’s holding everybody back.
0:06:32 Like, AGI is about robot fantasy land, and that leads to all the nonsense about destroying jobs and blah, blah, blah.
0:06:39 And none of that is helpful, because then you have to dig yourself out of that hole just to explain, wow, it’s really, really good at writing a case study.
0:06:39 Right, right.
0:06:45 Which, it writes a better case study than all the people that work for you, but it doesn’t know who to write it about.
0:06:47 It doesn’t know what necessarily you want to emphasize.
0:06:50 It doesn’t know what the budget is, what’s needed, how many words.
0:07:02 Right, but it also turns out, like, the term AGI just does an awful lot of work. So, for example, someone asked me recently, they say, well, are you worried that if we have AGI, then you’ll no longer be investing in software companies?
0:07:04 I’m like, well, I mean, okay, you have AGI.
0:07:07 I’m still investing in software companies, right?
0:07:13 And so, like, just because you have AGI says nothing about economic equilibrium or economic feasibility, et cetera.
0:07:18 So, like, just the term AGI does basically infinite work for every kind of fear we have and maybe every hope that we have.
0:07:26 And then when we tie it down to, like, not only it solves a class of problems, but the economics pencil out yes or no, we can actually have a more sensible discussion.
0:07:29 Which I actually, I think, is finally entering the discourse.
0:07:32 I think we’re actually talking a lot more sensibly now than we were a year ago.
0:07:41 And so when people say things like, in the AI 2027 paper, when they talk about sort of automated research, recursive self-improvement, does that feel like fiction or fantasy?
0:07:49 Or is the thinking that, even with those things, we’re, you know, sort of nowhere near peak software and there would just be unlimited demand?
0:07:52 I think you’ve got to go first for each question.
0:07:57 I need you to anchor us in reality and then we can deviate.
0:08:03 Well, look, I think that, first, I’m just not a fan right now of buying into anything by year.
0:08:11 Because whatever year you want to buy into, in 2027, we’re just going to be having a fight over what we meant by the metrics.
0:08:16 And it just turns into, like, OKRs for an industry, which is just, like, a ridiculous place to be.
0:08:17 That’s really funny.
0:08:21 But I think that everything takes 10 years, but you can’t predict anything in 10 years.
0:08:23 So how do you even reconcile that?
0:08:28 And I think that you just have to recognize that we’re on an exponential curve.
0:08:30 So no one’s predictive powers work.
0:08:32 And it’s just going to keep happening.
0:08:33 It’s not going to plateau.
0:08:36 It’s not going to, you know, all of a sudden we’re done.
0:08:39 And that’s what makes this a different kind of platform shift.
0:08:49 If you just look at the progress, it’s the same as what went through with storage, with bandwidth, with productivity on computing, with connectivity around the world.
0:08:51 Like, because it’s exponential, you can’t predict it.
0:08:54 And it’s just folly to sit around and try to predict.
0:09:03 Now, you can do science fiction, and you can say, in the future, when we all have our personal AI with all this other stuff, and then that’s great, but then you say it’s going to happen in 2029, you’re an idiot.
0:09:05 And so…
0:09:07 That sounds totally correct, right?
0:09:15 Because basically, three years ago, you would not have been able to conceive of Claude Code, or Cursor, or name-your-background-agent writing code.
0:09:18 So it’s like, what is the point of having some date at which you’re naming something?
0:09:25 And so we’ve actually seen probably vastly more progress in the past just two years of actual applied AI than we would have thought.
0:09:29 And yet, does it matter that one or two of the predictions didn’t play out?
0:09:29 No.
0:09:37 So I think it’s probably more interesting to think about, like, where is the technology from more of a classic Moore’s Law standpoint?
0:09:38 Like, how much compute do we have?
0:09:40 How much data are we working through?
0:09:41 How powerful are these models?
0:09:43 Let me ask you, like, as semi-olds.
0:09:44 Well, I mean, like…
0:09:44 Guilty.
0:09:56 Like, after AI collapsed, and machine translation and machine vision failed, you couldn’t find anybody who thought that those would become solved problems.
0:10:00 Or after neural nets imploded, and, like, literally, you were teaching…
0:10:01 Or expert systems.
0:10:01 Or expert systems.
0:10:10 But you were teaching, and if you tried to teach neural nets, like, the students would rebel because you were wasting everybody’s time.
0:10:14 In 1999, like, Hinton couldn’t get funded trying to do neural nets.
0:10:18 Grad school was this three-volume history of artificial intelligence thing.
0:10:19 Neural nets was, like, eight pages.
0:10:24 You know, ironically, I remember when ML was the cool thing and neural nets was the old thing.
0:10:28 And now, like, you know, ML is, like, the old thing and neural nets are the cool thing.
0:10:28 Right, or NLP.
0:10:30 And so, the fact that…
0:10:34 So, we will return to all of these problems that couldn’t be solved.
0:10:35 Like, even, like, this…
0:10:38 Everyone’s favorite one, oh, it doesn’t understand math.
0:10:39 Right.
0:10:43 Like, okay, that is a solvable problem because math is solvable.
0:10:44 Like, there’s just…
0:10:52 No one put the math layer in to understand what a number was and to, you know, hard-code it and just build in an expert system for math, which is actually a well-understood thing.
0:10:55 Because we’ve had Maxima since, like, 1975.
0:10:56 You know.
0:11:01 I think it’s important to, like, maybe for us to describe how hard it is to predict anything, right?
0:11:02 So, let’s take recursive self-improvement.
0:11:03 This is one of my favorite ones.
0:11:11 So, the theory of recursive self-improvement is you have a graph or you have a box, which is the thing, and then there’s an arrow that goes back to the box, which says improve.
0:11:13 And then, of course, you look at that and you’re like…
0:11:22 So, I guess, you know, like, from an intuitive lay perspective, every time you have a box with an arrow back in it, you’re like, okay, we’re done, right?
0:11:30 But, like, if you know anything about nonlinear control theory, answering that question is one of the most difficult questions that we know in all of technical sciences, right?
0:11:31 Like, does it converge?
0:11:33 Does it diverge?
0:11:35 Like, does it asymptote, right?
0:11:41 So, for example, you could recursively self-improve if you’re doing basic search, but you asymptote, right?
0:11:41 Right.
0:11:47 And so, like, saying recursive self-improvement from, like, a deeply technical perspective says almost nothing.
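A worked toy example of why the arrow back into the box says so little, assuming (purely our illustration, not the speakers') that each pass closes a fixed fraction r of the gap to some ceiling c set by data, compute, or algorithms:

```latex
% Toy self-improvement loop: capability x_n, ceiling c, improvement rate r.
x_{n+1} = x_n + r\,(c - x_n), \qquad 0 < r < 1
% Unrolling the recurrence:
x_n = c - (c - x_0)(1 - r)^n \;\longrightarrow\; c \quad \text{as } n \to \infty
```

Every iteration is a genuine improvement, yet the loop asymptotes at c rather than diverging, so "it improves itself" and "it hits a ceiling" are simultaneously true.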
0:11:59 But, unfortunately, because we tend to anthropomorphize AI, we say recursive self-improvement, and all of a sudden we’re like, and then it, like, overcomes energy boundaries and human intelligence.
0:12:02 Well, that’s how it goes from being a toddler to being, like, an eight-year-old.
0:12:04 It’s just because it figured out how to learn.
0:12:05 It recursively self-improves, right?
0:12:15 So, I mean, the reality is, like, nonlinear control systems, which are feedback loops that are adaptive, we don’t even have the math for a relatively simple system to understand what happens.
0:12:18 You have to actually know the distributions that come out and go into them.
0:12:20 And so these things are going to improve.
0:12:21 They’re going to continue to improve.
0:12:23 Maybe they’ll improve themselves.
0:12:27 But just because they do improve themselves doesn’t mean they can continue to do it.
0:12:29 And this is kind of part of this entire journey as we’re learning about these systems.
0:12:34 Again, the good news is I think we’re talking a lot more sensibly now than we were a year ago, and hopefully that will continue.
0:12:39 Hopefully the discourse can recursively self-improve so we just talk more sensibly.
0:12:41 Well, the good news is that’s involving humans, so we don’t actually have to learn.
0:12:46 But I think that, I mean, you must be seeing this even with customers.
0:12:50 I mean, like, take the conversation about, like, hallucinations and things like that.
0:12:54 How dramatically that’s altered in just the past two years, say.
0:12:56 Yeah, on two dimensions, actually.
0:13:00 So, on one dimension, the problem of hallucinations has improved.
0:13:13 So, as the models get better, as our understanding of, you know, whether it’s RAG or whatever improves, even the efficacy of the context window has improved.
0:13:17 So, you have the technical improvements, you know, kind of across the stack.
0:13:27 And equally, you have a kind of a cultural understanding to some degree within the enterprise as to, like, okay, actually, no, these are non-deterministic systems.
0:13:28 They’re probabilistic.
0:13:46 So, you’re starting to see almost a culture shift, which is, okay, you can actually implement AI in essentially more and more critical use cases because the employees that are using those systems understand that they do actually have to do the work to verify it.
0:13:54 And then the only question is, what is that ratio of the time it took to verify versus if I had done it myself, and how much efficiency is gained for whatever that workflow is?
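That ratio with illustrative numbers (the figures are invented, not from the conversation): say the task takes 60 minutes by hand, 5 minutes to prompt, and 10 minutes to verify:

```latex
\text{gain} \;=\; \frac{T_{\text{self}}}{T_{\text{prompt}} + T_{\text{verify}}}
\;=\; \frac{60\ \text{min}}{5\ \text{min} + 10\ \text{min}} \;=\; 4\times
```

If verification takes as long as doing the work yourself, the gain collapses to 1x or below, which is exactly why the expert who can verify quickly captures most of the value.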
0:14:18 But we’re going from, probably, like, two and a half years ago, where there was this instant excitement, oh, my God, this is going to be the greatest thing of all time, to a reality check within three to six months, because everybody was like, hallucination is going to be the massive problem, to now, a couple years after that, which is like, okay, we’re seeing the hallucination rates shrink.
0:14:24 We’re seeing the quality of the outputs increase, and we understand that you do have to go and review the work that these AI, you know, agents are doing.
0:14:27 And that takes on a different form depending on the use case.
0:14:30 So, in the form of coding, that means just, like, you just have to go review the code.
0:14:33 Which you had to do anyway.
0:14:34 People seem to be forgetting.
0:14:45 You had to do anyway, but, like, there was probably at least a little bit of, like, theory as to, like, what part you should go review with extra level of detail because you kind of knew the person you were working with.
0:14:48 It also implicitly limits the value of AI, which people are uncomfortable with.
0:14:49 Right, right.
0:14:56 Which just basically says it helps people that know more than the AI does, in a sense. It starts to actually kind of bisect the utility.
0:15:04 Yeah, yeah, basically it’s super interesting, which is the productivity of an expert is outpacing everything else.
0:15:18 Which, you know, I think we could have probably predicted based on historical events, and I think you’ve got some good theories about, you know, the type of skills that make the right user for these models for a given use case.
0:15:27 So, we’re seeing that, you know, where the expert engineers are like, like, I don’t mind that it’s a slot machine where I’m pulling it and I see what comes out because I know I can still get 10x productivity.
0:15:28 It gives me good ideas.
0:15:38 Yeah, and I can get it good enough that it’s worth that productivity gain, whereas if you were, like, not an expert engineer and you did this slot machine, you probably would try and go and deploy, you know, the ones that were wrong, too.
0:15:40 And you actually don’t know which lever to pull.
0:15:40 Right.
0:15:45 Which is, like, a big thing is, like, literally knowing, like, what to ask for and what language to use will get you a better perspective.
0:15:52 Well, I think that this is just an incredibly important point that you’re making and it really gets to the heart of what it means to use a tool.
0:15:58 Like, you know, you put me in front of, like, a 12-inch chop saw and say, like, go fix the fence.
0:16:01 Really, really bad idea.
0:16:02 I mean, I could go buy one.
0:16:04 I could cruise the Home Depot.
0:16:05 There’s a reason.
0:16:08 And I’m like, ooh, dang, man, I don’t have a DeWalt.
0:16:14 And I could buy it, but it’s really not a particularly good idea.
0:16:29 And I think that how these platform shifts happen, and why there’s so much excitement over coding, is that the best way for a platform shift to take hold is for the experts, the closest you have to experts in the new platform,
0:16:34 to become the most enthusiastic and the biggest users overall.
0:16:39 Like, I’ve been practicing yoga over at the Cubberley Community Center in Palo Alto because the studio is closed for a remodel.
0:16:43 But what’s neat is that was, like, the OG place for computer clubs.
0:16:44 Oh, nice.
0:16:52 Like, in the early 1990s and the late 80s, like, if you ever wanted to meet the computer club, you would go, and, like, this is, like, Halt and Catch Fire.
0:16:56 And it’s, like, a bunch of people with soldering irons and shit.
0:17:04 And, you know, when it didn’t work, when something was broken, it wasn’t, like, oh, man, these things are terrible.
0:17:12 That was, like, the whole meeting was, like, who could get, like, one of these new discrete graphics cards to actually work and debug the driver?
0:17:16 Can anyone print? Is there anyone in this room who can print in this new thing called PostScript?
0:17:19 And I think that’s what’s really happening right now.
0:17:23 And so, first, it’s obvious it should happen with development and coding first.
0:17:30 Because they’re the most forgiving and the most understanding of, like, what’s a bug, what’s a thing that can never get fixed?
0:17:34 And the thing to watch for is no one is saying that coding can’t get fixed.
0:17:40 Like, whatever it’s generating that’s bad for, like, a 2X coder rather than a 10X coder,
0:17:43 no one is saying, well, that’ll never be fixed.
0:17:43 Right.
0:17:48 And then the next thing that’s going to happen is going to be what I think is just going to be, like, the creation of words.
0:17:55 Like, the marketing document, the positioning document, all of this long-form stuff where, if you’re really good at that job,
0:17:58 you can, you know the right questions to ask.
0:17:59 Yeah.
0:18:00 You know what looks good.
0:18:02 And then you can get really domain-specific.
0:18:07 Like, on the next, you know, the next level is, like, oh, I need to understand, like, a competitor.
0:18:07 Yeah.
0:18:11 Which then is using real information from the internet in real time, not just statistical.
0:18:14 And then you’re like, well, they already know what the competitor does.
0:18:15 Right.
0:18:22 And then my favorite scenario, the one that just constantly has these aha moments, is: attack this thing I just wrote.
0:18:23 Yeah.
0:18:27 I’m not interested in you getting, adding em dashes and making it a little bit better.
0:18:29 I just want to know, what did I miss?
0:18:34 You said one, I think, recently, on this last one, about, like, here’s my earnings statement.
0:18:34 Yeah.
0:18:37 For people, that’s the thing you read.
0:18:38 Or tell me the analyst questions.
0:18:41 That you read, after, to the analysts.
0:18:43 So now, like, attack it like an analyst.
0:18:47 And there’s, like, 6,000 hours per company of analyst questions.
0:18:50 It knows what they’re going to ask, they only ask me three questions anyway.
0:18:50 Yeah.
0:18:51 Ooh, expense line, you know.
0:18:52 Right.
0:18:54 And I feel like this is the thing that.
0:18:56 Do not watch this if you’re an analyst.
0:18:56 Yeah.
0:19:00 And this is not any advice about being an analyst or anything.
0:19:00 Fine.
0:19:04 But this is what’s really going to happen with writing.
0:19:06 And then it’s going to happen with PowerPoint and slides.
0:19:08 And then it’s going to happen with video.
0:19:15 But it’s really important to call out, which is, you’re getting the consensus mean response.
0:19:20 And so, in the limit, it’s offloading a lot of kind of busy work if you’re a professional.
0:19:22 Like, if you’re a professional, you actually know all of these things.
0:19:24 You just don’t have the time to go through all of it.
0:19:25 And you may not remember it.
0:19:29 So, in a way, it’s productivity helpful.
0:19:33 But it’s not, you know, solving the problems where you are a particular expert.
0:19:39 And this is maybe why, for those that are non-expert, it’s a little bit more threatening because it can do that job.
0:19:47 Yeah, well, maybe to bridge a few threads and throw in a different tangent. So, Steven, you were asking, like, where is the enterprise now?
0:19:48 So, that was the coding piece.
0:19:56 I think where you’re seeing this is, you know, kind of clear understanding, which is, okay, what I’m going to get out will be correlated to what I put in.
0:19:58 So, how precise I put the prompt.
0:20:08 Like, I think prompting doesn’t go away anytime soon, simply because the leverage you get on the set of instructions you’re going to give the AI at the start is still going to be massive.
0:20:11 Wait, wait, wait, if the prompting went away, what would you end up with?
0:20:18 Well, I mean, two years ago, I think that, like, people were like, like, you’ll just tell the AGI what you want it to produce.
0:20:22 There’s just one prompt, like, you unbox it and you say, go do something.
0:20:23 My agent.
0:20:24 Be a software engineer.
0:20:33 Right, no, literally, that was, like, that was, like, an open debate and it was like, no, you’re probably missing the fact that what is in my head is going to be unbelievably germane to the thing that I’m trying to produce.
0:20:35 And, like, I have to somehow give you that context.
0:20:39 Like, there’s no world where you have that context without me telling it to you.
0:20:43 And now you’re seeing it, like, you’re seeing these incredibly unhinged prompts, which are, like, pages long.
0:20:48 And the output you’re getting from that is actually, like, way better than if you didn’t give it that context.
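A sketch of the leverage being described: the terse and the "pages long" prompt hit the same model, but only one carries the context that otherwise lives in the author's head. The prompt contents are invented for illustration, and call_model is a placeholder, not any real API.

```python
# Same model, very different conditioning: the difference is how much of
# the author's context the instructions carry.

TERSE = "Write a case study."

DETAILED = """Write a ~600-word customer case study.
Audience: enterprise IT buyers evaluating storage platforms.
Emphasize: migration timeline and cost savings; name the customer [CUSTOMER].
Structure: problem, solution, quantified results, one pull quote.
Tone: plain and concrete. Do not invent numbers; use [METRIC] placeholders.
"""

def call_model(prompt: str) -> str:
    """Placeholder for a real model call."""
    return f"<completion conditioned on {len(prompt)} chars of instructions>"

print(call_model(TERSE))     # model must guess everything that was in your head
print(call_model(DETAILED))  # the leverage lives in the instructions
```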
0:20:54 So I think there’s a clear understanding of that side on the enterprise use cases and then a clear understanding that you’ve got to go and review it.
0:20:59 And then on this point about, like, well, you know, what is, you know.
0:21:04 Wait, I just have to say, we forget that formal languages came out of natural languages for a reason.
0:21:10 We didn’t, like, start with formal languages and go, oh, it’s much easier to speak in English than in the formal language.
0:21:11 It’s the opposite.
0:21:15 It’s like, we have this natural language, we’re like, it’s very tough to convey the information that I want to.
0:21:15 Right.
0:21:16 You and I are both experts.
0:21:19 We understand the solution space, so let’s communicate more efficiently.
0:21:19 Right.
0:21:21 So to think that this somehow wouldn’t happen.
0:21:22 And that’s what jargon is.
0:21:23 Of course.
0:21:27 Jargon is just a formalized way that people who have domain expertise talk to each other.
0:21:28 That’s exactly right.
0:21:40 So the thing that is kind of the most, like, fun to think about right now, at least, is, and maybe you could give us a little history lesson on this with some interesting parallels.
0:21:49 So when does the style of work change because of the tool versus the tool sort of adapted to the style of work?
0:22:00 And so what I’m starting, we’re like only in day one of this, but what I’m starting to see kind of some patterns emerge, which is we thought agents would go and learn how we work and then automate that.
0:22:03 And then the question, and so basically agents conform to how we work.
0:22:07 The question is, when is the moment when we conform to how agents are best used?
0:22:09 And you’re seeing this in a couple areas.
0:22:16 So you’re seeing this in engineering to start with, which is like people are saying, okay, I’m going to have agents and then sub agents for parts of the code base.
0:22:19 And then I’m going to give them kind of read me files that the agents read.
0:22:25 And then I’m going to actually optimize my code base for the agent as opposed to the other way around in other forms of knowledge work.
0:22:41 So within how we use Box with our AI product, like, you’re starting to see people basically tell the agent its complete job, and the workflow is now almost like the agent is dictating the workflow of the future, as opposed to it just mapping to the existing workflow.
0:22:49 So I don’t know like what the history is on this of like, when does the work pattern itself shift because of what the technology is capable of?
0:23:00 But I think probably where this goes has to be some version of that, which is it’s not going to just be the agents just plop into how we currently do our work and then just automate everything.
0:23:07 I do think you start to change what the work is itself and then agents actually go in and accelerate that.
0:23:10 Well, as important as that is, it’s actually more important.
0:23:10 Okay.
0:23:24 Like, because what happens, to reuse the word in a different way, this anthropomorphization of work, is that the first tools actually anthropomorphize the work.
0:23:28 And so, like, if you go back, this is every single evolution of computing.
0:23:33 I mean, like, how long did it take for Steve Jobs to get rid of the number buttons on a smartphone?
0:23:41 Like, like, they still had number buttons or, like, you look at cars and until Elon got rid of all the controls, everybody kept all of the controls.
0:23:42 I don’t want to get in that fight.
0:23:53 But, like, what happened with every technology shift is, you know, if you were to look at what accounting software looked like in the 60s before IBM said, stop.
0:24:11 We all use double entry, but we need to have people skilled in how computers can do the accounting, not how people can, because we’re never going to figure out how to close the books if we have to automate this whole room of people in green eyeshades with a manual process based on how far apart the desks were.
0:24:11 Right.
0:24:27 And everything that happened with the rise of PCs and personal productivity started off, and I always use this example because I’ve watched it happen, like, five times now, which is the first PCs that did word processing, the biggest request was, how do I fill in, like, expense reports?
0:24:35 And so, this whole world grew up of tractor-fed paper that was pre-printed with the expense report.
0:24:45 And so then, software, we wrote all of this code, like, are you using an Avery 2942 expense report, or is it a New England Business Systems A397?
0:24:57 And, like, you know, and then you had, like, these adjustments in the print dialog, like 0.208 inches, and you moved little things around, and then you would print out, like, eight dinner, $22, and that was all you printed.
0:25:04 And then someone said, you know, we could use the computer to actually print the whole thing.
0:25:12 And then, like, fast forward, and finally, Concur said, you know, why just take a picture, why not just take a picture of the receipt?
0:25:14 And then we could do all of it.
0:25:19 And so then the whole thing gets inverted, and every single business process ended up being like that.
0:25:23 And then there are things that really, really do change the tools.
0:25:27 Like, when email came along, you know, it used to be to prepare an agenda for a meeting.
0:25:35 Somebody would open up Word and type in all the things and then print it out, and everybody would show up to the meeting with this very well-formatted agenda.
0:25:41 And now, and then, like, email came out, and that whole use case for Word just evaporated.
0:25:47 And then an email agenda became no formatting, nothing, just, like, here are the eight things we’re going to talk about.
0:25:49 And you show up, and everybody’s like, did you get the agenda?
0:25:55 You know, what’s interesting about the AI one is it’s kind of, it’s like we’re seeing the same thing but vis-a-vis AI.
0:25:58 So nobody really predicted the generative stuff.
0:25:59 And we’ve had AI for a very long time.
0:26:05 So we’ve had chatbots, we’ve had, you know, and so you had these kind of, like, AI-shaped holes in the enterprise for a long time.
0:26:11 And a lot of the mistakes that we see today is people are taking the generative stuff and trying to kind of cram it into the old models.
0:26:19 And, like, it’s really a new behavior that’s emerging that’s very much more, like, it used to be you’d centrally sell, you know, AI to some platform team.
0:26:25 And then they would kind of try to get the NLP thing to work or the voice to work for, like, talking to people on the phone for support.
0:26:30 And it was this kind of very central, a lot of the adoption that we see is, like, much more individual, for example.
0:26:34 And so I just think that there is a bit of a mismatch, as we’re seeing now, that’s getting ironed out, too.
0:26:45 Well, and so I think the question is, yeah, are we in the phase where we’re trying to graft the agents and work in, basically, what we’ve been doing for 30, 40 years of software?
0:26:52 And is this going to be actually, like, the first real step function shift we’ve seen in what the workflow itself should look like?
0:26:53 Oh, we are.
0:26:58 I mean, like, if you, you know, remember, people, like, I tried to jam the internet into Office.
0:26:58 Right.
0:27:01 And it was fun to watch.
0:27:03 But, I mean, you were not watching.
0:27:13 But, like, but everybody around was trying to jam the internet into their product because that’s the only way you could envision it.
0:27:13 Right.
0:27:18 And it didn’t really, like, you were like, well, where else would the internet go?
0:27:20 Like, there’s no word processor on the internet.
0:27:21 Right, right.
0:27:22 Like, there’s no spreadsheet on the internet.
0:27:28 And then other people would be like, well, let me just try to implement Excel using these seven HTML tags with no script.
0:27:31 That turned out to not be a really good idea either.
0:27:32 The best was, like, let’s do PowerPoint.
0:27:33 Well, how do you do it?
0:27:39 You give them five edit controls, tell them their bullet points, and then we’ll generate a GIF on the back end and send it back to you as the slide.
0:27:39 Yeah.
0:27:43 Okay, that was not it, and so there was that whole, like, phase.
0:27:46 I think, actually, maybe the main point is just the durability of Office.
0:27:47 It transcends all, all disruptions.
0:27:48 It does.
0:27:50 I like to think it pretty much rises above everything.
0:27:53 But the thing is, is that that’s where we are now.
0:27:53 Yeah.
0:27:59 Is everybody trying to jam AI in. But, you know, do you think, I mean, just to dig a little bit.
0:28:03 So do you think this is similar to the internet and that it’s a consumption layer change?
0:28:05 Because I always viewed the internet as very much a consumption layer change.
0:28:08 Like, I go to a, you know, instead of going to my computer, I go to the internet.
0:28:19 But otherwise, things kind of are the same where AI has got this weird quirk, which for the first time I can recall, programs are abdicating logic to a third party.
0:28:20 Like, we’ve always abdicated resources.
0:28:21 Yeah, yeah.
0:28:24 Like, so we’d be like, okay, I’ll use your disks or whatever, but, like, I’m writing the logic.
0:28:27 But this time it feels like we’re changing the consumption layer.
0:28:35 So, like, you know, when my son, you know, talks to an AI character, and, you know, he’s not going to Wells Fargo.com, he’s going to an AI character.
0:28:38 And so, like, that’s changing kind of how we’re interacting with the computer.
0:28:44 But also, these programs are no longer kind of written by a human in the same way.
0:28:47 So, I feel like the change is maybe a bit more sophisticated.
0:28:52 Oh, I think, but this is the, this is why it’s a platform shift, and not just an application shift.
0:29:00 Like, where each platform shift changes the abstraction layer with which you interact with computing.
0:29:04 But what that also does is it changes what you write the programs to.
0:29:06 Do you remember ever abdicating logic?
0:29:11 Oh, here’s a great, here’s an example of how disruptive this can be.
0:29:19 The first word processors in the DOS era, the character mode era, they all implemented their own print drivers and clipboard.
0:29:27 So, if you were Lotus and you wanted to put a chart into a memo, you couldn’t, because you didn’t sell a word processor.
0:29:33 So, you actually made a separate program to make something that the leading word processor could consume.
0:29:38 And if you were WordPerfect, your ads said, we support 1,700 printers.
0:29:38 Wow.
0:29:42 Like, and you won reviews because you had 1,700 and Microsoft had 1,200.
0:29:43 Oh, that’s so…
0:29:44 And so then along comes…
0:29:45 That’s a great one, actually.
0:29:46 And so Windows comes along.
0:29:56 And if you were trying to enter the word processing business, step one: I need to hire a team of 17 people to build device drivers for Epson and Okidata and Canon printers.
0:29:58 Because you can’t get them anywhere.
0:30:03 Microsoft came along and for Windows built print drivers and a clipboard.
0:30:20 And all of a sudden, and Macintosh did it also, there was a way for two applications that had no a priori knowledge of each other to interchange data. But of course, if you were WordPerfect or Lotus, that’s a disruption, you got creamed by that, because of your ability to control your information.
0:30:25 And so, and what happened was a bunch of developers were like, wow, this is cool.
0:30:39 Because now, when we did C++ for Windows, the demo, in fact, at that Cubberley Community Center, I would go and I would show brand new Windows programmers in 1990, like, hey, you don’t have to write print drivers, and you can use the clipboard.
0:30:43 And like literally standing ovation of, you know, 10 people at the thing.
0:30:52 And, but they were, like, more than happy to let data interchange between products, because they were like, that’s nothing but opportunity for me.
0:31:03 They probably, from an emotional standpoint, felt exactly the same way as, like, a vibe coder does today, which is, you’ve just given me the platform, and it was just a print driver.
0:31:09 The writing code for Windows book was like this big, but the writing a device driver for an Epson printer was this big.
0:31:09 Yeah, yeah, yeah.
0:31:11 Writing it for a Canon printer was this big.
0:31:20 And so, I’m just actually trying to think of, like, the, but the paradigm shift is the same, which is there’s been many times where we’ve reduced the amount of work a developer has to do.
0:31:23 But I just don’t remember ever where the programmer abdicates logic.
0:31:25 Like, so for example, SDN didn’t?
0:31:27 Not logic.
0:31:30 Like, I, I would always say what is correct and what’s not correct.
0:31:31 Right?
0:31:32 I think you undersold it though.
0:31:38 No, this is the thing, by the way, everybody, that Martin invented and worked on and stuff.
0:31:39 Like, but, but it’s a big deal.
0:31:43 Maybe we should post more of your pitch from the time when you were pitching this.
0:31:48 I mean, well, no, no, like logic specifically, which is, I am writing an app.
0:31:54 My app is, whatever, some vertical SaaS app for a certain customer base.
0:31:58 The answer the app gives is based on logic that I’ve written historically, right?
0:32:01 Like, if I run it on the cloud, the cloud is not producing an answer.
0:32:02 It’s providing resources.
0:32:08 If I’m using your device driver, it’s, you know, providing access to a device resources.
0:32:12 But if I’m like, hey, large model, tell me the answer here.
0:32:13 You’re actually abdicating application.
0:32:15 I think maybe you’re right.
0:32:15 Maybe this is not going to feel.
0:32:20 No, I just think, I think what you’re, you’re almost playing, like, incumbent, in the
0:32:26 sense of trying to, like, decide this is abdicating the logic and this isn’t.
0:32:31 When in fact, like, it really was, like, a huge competitive advantage for WordPerfect.
0:32:34 And, and they didn’t want to give it up.
0:32:36 And they, they fought against it.
0:32:39 And the number of people who didn’t want to do, like, a great clipboard.
0:32:43 And the next example, of course, is the browser where people literally gave up.
0:32:48 Like you in a, in Windows or in Mac, you could rasterize anything you wanted.
0:32:51 You wanted a button that you pushed and it spun and it animated like a rainbow.
0:32:53 You could do that in your product.
0:32:59 But then the web came along and you’re like, wow, I have to use a gray button that says submit.
0:33:04 And, and that was like, we do use a bunch of third party things.
0:33:06 Well, but it took a long time for those to show up.
0:33:16 And so early in the internet, the magazines in particular and, and the printed media were the ones who absolutely wouldn’t go to the internet because they would not give up their ability to format.
0:33:29 And they, and this is, is another part about the tooling and where, where, what happened, what’s going to happen with AI is that like a huge amount of the productivity software space today is like the preparation of output.
0:33:32 Like Office is basically a format debugger.
0:33:37 Like all it is, is like 7,000 commands for how to do kerning and bold and italic.
0:33:42 And like, it turns out AI not only doesn’t care, you could ask it to make whatever you want.
0:33:46 Like you could just say, I’d like this to be a double index part chart thing.
0:33:47 That’s not a thing.
0:33:49 I just, but, but you, you can do that.
0:33:52 And it, it will just figure out something that looks like that.
0:33:53 And you’ll go, Ooh, cool.
0:33:57 And this was where to this disempowering and experts and who’s not an expert.
0:34:04 When, when productivity software rose, the big thing about it was that there were people who figured out how to like make like killer charts.
0:34:07 Like Ben at the ovens, like killer chart guy.
0:34:07 Yes.
0:34:11 And there were people who were like, every meeting started with, how did you make that chart?
0:34:15 Like I could be on an airplane and somebody would be like making a shitty chart.
0:34:20 So like, in this case, the application is like, actually, what’s the way to, to visually represent the data?
0:34:21 Which is absolutely an abdication.
0:34:21 Right.
0:34:31 And it turns out, like, 90% of the people never really got to be expert at doing that task, even though 90% of the tool is about it. So what happens is, each generation…
0:34:33 But the programmer didn’t abdicate the logic in this case.
0:34:33 This is the user.
0:34:37 But, you know, what’s the user, what’s the programmer in that?
0:34:48 And, in fact, what the programmer was doing was, like, we would invent a thing called wizards or whatever, you know, and that would make a whole bunch of choices for you, style sheets or whatever.
0:34:55 And so in a sense, we were making a bunch of choices for the user, which to the experts looked like disempowering the experts who were tweaking.
0:35:03 And so, this is all, like, there’s some Steve Jobs quote that he loves, about Schopenhauer, how if you’ve seen the conjurer.
0:35:06 If you’ve seen the conjurer, it’s not a trick anymore.
0:35:13 And I really feel like this is like the third or fourth time that this has happened just in my lifetime of watching this.
0:35:25 Well, so something that’s really caught my attention, because it’s the most senior people I know, is that a lot of very senior developers are spinning up a lot of background agents, like code agents, and they’re interfacing at like the GitHub PR level.
0:35:32 And so it’s not obvious to me why you do a bunch as opposed to one, and it’s not obvious to me why you wouldn’t interact directly.
0:35:35 So it feels like something’s going on here, but I’m not quite sure what, and I would love your thoughts.
0:35:54 Well, my read on it, and then I guess I would kind of sort of throw out like what then happens next as a result of this, because to me it’s actually a little bit of an epiphany on what the future work design could look like in this world, because engineers, back to the prior conversation, are just the first to experience this.
0:36:12 But my read, from talking to similar folks that are, like, all in on this, is it’s basically the context rot problem, which is, you know, the more that we put in the context window, the more it gets confused, the lossier the answers get.
0:36:16 And so you have to have some kind of way to partition what an agent should work on.
0:36:26 And we see this in building agents internally, which is, you know, the panacea that I think we maybe would have hoped for is like, well, you just put a million tokens into the context window, and then obviously, you know.
0:36:31 Oh, so you’re saying this is almost like a counter trend to the AGI, it’s almost like the opposite.
0:36:36 It’s the opposite, but it only works because the models are so good.
0:36:40 Yeah, but you’re giving more things more specific tasks rather than one thing less specific tasks.
0:36:43 Right, and so, but like, I think this is why it’s happening.
0:36:52 So basically, the craziest version of this is I was talking to somebody who is in startup land, and they have, to your point, they have all these sub-agents.
0:36:58 But what’s amazing is it maps one-to-one to each microservice in their code base.
0:36:59 And so they have an agent per microservice.
0:37:04 They have effectively a readme for the agent, and that agent owns the microservice.
0:37:10 And they, I don’t know the specific number, but let’s just say you could have, you know, dozens or hundreds of these things going on.
0:37:21 And you’re effectively mitigating this issue, which is, if you just said, here’s my entire code base, you know, go run wild, you know, to one agent, it’s, it will just, you know, produce worse and worse code over time.
0:37:22 Yeah.
0:37:23 Because it’s going to have context rot.
0:37:27 It’s not going to know exactly what you’re trying to do in that one area of the microservice.
0:37:30 But the sub-agent model seems to be working for that paradigm.
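A sketch of the pattern just described: one agent per microservice, each briefed by its own readme so no single context window has to hold the whole code base. The paths, the readme convention, and run_agent are assumptions for illustration, not any specific product's API.

```python
# One scoped agent per microservice, each loading its own brief.
from pathlib import Path

SERVICES = ["billing", "auth", "notifications"]      # one agent per service

def run_agent(service: str, brief: str, task: str) -> str:
    """Placeholder for spawning a scoped coding agent."""
    return f"[{service} agent, briefed with {len(brief)} chars, does: {task}]"

def dispatch(service: str, task: str) -> str:
    readme = Path("services") / service / "AGENT_README.md"  # per-agent brief
    brief = readme.read_text() if readme.exists() else ""
    return run_agent(service, brief, task)           # context stays scoped

# Dozens of these can run side by side, each seeing only its own slice,
# which is the mitigation for context rot being described:
for svc in SERVICES:
    print(dispatch(svc, f"add request tracing to the {svc} service"))
```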
0:37:32 I love this counter pattern.
0:37:32 Yeah.
0:37:39 Because everybody is like, they’re going to, like, you know, models will get, you know, smarter, and you’ll give them higher level tasks, and they’ll do things longer.
0:37:39 Yes.
0:37:40 This is a counter one.
0:37:42 I want to tweet that, but you have more Twitter followers.
0:37:42 Oh, wow.
0:37:44 We can, we can collectively do it.
0:37:48 Well, but then, so then the question is, okay, so let’s just assume this works in engineering.
0:37:54 You have this interesting dynamic, which is, well, then that means that, like, some of the coding practices will be pretty different in the future.
0:37:58 We’ve talked about this idea of, you know, the individual engineer becomes the manager of agents.
0:38:01 So that was already kind of, I think, a well-understood path.
0:38:04 This is, like, a supercharger of that, of that concept.
0:38:08 And then the question is, like, how does that translate to almost every form of work?
0:38:26 Because if I am now, you know, the lawyer and working on cases, and I can have 20 sub-agents that all, you know, do a different case and then basically, you know, come back in some kind of task queue that I’m going through, like, obviously, one, just the sheer leverage now you get is going to be insane.
0:38:37 But I do think the way that you, you know, might even organize the work and what the, you know, workflows within an organization are, you know, inevitably going to change as a result of that.
0:38:53 Oh, but, I mean, I think this just gets to the, you know, essentially that the flow in the workflow has been serialized or linearized based sometimes on knowledge, but other times on tooling.
0:39:00 And so what happens when the tooling changes is you just get this realignment of what’s truly serial and what’s not.
0:39:09 Like, if you’re planning an event for a company, which is still going to keep happening, you know, like, oh, I have to book the venue, I have to invite all these people, we have to create all these materials.
0:39:12 Well, they’re actually not particularly gated on each other.
0:39:12 Right.
0:39:16 But if you have an events person, they’re gated.
0:39:16 Right.
0:39:23 And so now an events person can start spinning up all of these, these different elements and then they’re going to come back.
0:39:27 Like, I’ve gotten as far as I can on collateral until I get a logo for this event.
0:39:28 Right.
0:39:33 Like, I’ve gotten as far as I can on invites until I get the date and the time and the venue.
0:39:40 And I think there’s no, there’s no reason why you can’t spin up all those in parallel because, of course, how does that happen today?
0:39:46 Well, if you’re a company and you use Box and you’ve done, this is your 58th event, you know, you have a folder called event.
0:39:46 Right.
0:39:52 And people take the folder and go event 59 and they make a copy of it and all the stuff in it.
0:40:01 And, well, if you think about that workflow, that’s exactly what a series of different background tasks or agents could go do.
0:40:11 And so I think the reason that you could be doing all that in coding is, well, there’s a natural way to break that up, because there’s a bunch of programming that’s not.
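The event example as a dependency graph: the tasks and gates come from the conversation above; the scheduler is a generic topological pass, not any particular product. Anything whose dependencies are met runs as one parallel wave; gated tasks wait for the wave before them.

```python
# Fan out what isn't gated; hold what is.

DEPS = {
    "book_venue": set(),
    "design_logo": set(),
    "send_invites": {"book_venue"},        # gated on date, time, venue
    "make_collateral": {"design_logo"},    # gated on the logo
}

def waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; everything in a wave can run in parallel."""
    done: set[str] = set()
    plan: list[list[str]] = []
    while len(done) < len(deps):
        wave = [t for t, d in deps.items() if t not in done and d <= done]
        if not wave:
            raise ValueError("circular dependency")
        plan.append(wave)
        done.update(wave)
    return plan

print(waves(DEPS))
# [['book_venue', 'design_logo'], ['send_invites', 'make_collateral']]
```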
0:40:12 Right, but there’s the other side.
0:40:21 But there’s also a bit of an indictment of your ability to give it a high-level order, you know, it kind of suggests that the human being needs to be, you know, giving them more granular orders.
0:40:27 Otherwise, you know, to start a company, you’d issue one prompt, you’d go to the beach for six months and you’d go back and you’d have a full company.
0:40:36 Which is this almost re-anthropomorphizing effect, which is, like, it turns out we did kind of figure out division of labor.
0:40:45 We figured it out in the context of a lot of physical, you know, kind of analog limits that we clearly had that agents won’t have.
0:40:49 But now, you know, there’s no kind of total free lunch.
0:40:55 So you have this context rot issue, which is that you do actually have to subdivide the tasks at some point.
0:40:56 So then the question is, like, what are the right—
0:40:58 I mean, it may not be a context rot issue.
0:41:04 Like, the Occam’s razor here is you need to give them specific instructions for specific tasks.
0:41:04 Right.
0:41:08 And if you give them higher level instructions, independent of context, they just don’t know what you want.
0:41:10 Right, and this gets to the formal language part.
0:41:10 Yeah.
0:41:18 Like, at some point, if you try to use, like, the uber frontier model to get the whole thing done, you have to tell it the whole thing.
0:41:19 Yeah, exactly.
0:41:21 And that just seems like a lot of work.
0:41:36 Whereas if you have to tell it less because the part of the model you’re using knows more, it’s basically a different way of thinking about templates or a different way of thinking about starting artifacts or scoping the context in a generic world of context.
0:41:47 But then there’s this—I mean, it might, though, be the right architecture in general if you assume that, you know, there’s—we’re never going to get to a point where the model is just 100% perfect, right?
0:42:02 And so it might also be the right kind of architecture design because at some point you’re going to have—you don’t want an agent or a set of agents to go so far down a path when there was a step that it needed to check in with you on because there’s just the compounding effect of that.
0:42:12 So you do need to kind of subdivide the work also because if you do have gating, you know, moments that are going to have a bunch of dependencies, the agent does need to know, like, at what point should I roll that back up to the user?
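A minimal sketch of that check-in point: the agent executes steps until it reaches one marked as gating, then rolls up to the user rather than compounding a possible error. The step names, the gating set, and execute are all invented placeholders.

```python
# Run until a gating step, then ask the human before proceeding.

GATING = {"send_contract_to_client"}          # steps that need human sign-off

def execute(step: str) -> str:
    return f"[done: {step}]"                  # placeholder for real work

def run_with_checkins(steps: list[str]) -> None:
    for step in steps:
        if step in GATING:                    # dependency-heavy or irreversible
            answer = input(f"About to '{step}'. Proceed? [y/N] ")
            if answer.strip().lower() != "y":
                print("stopped; rolled back up to the user")
                return
        print(execute(step))

run_with_checkins(["draft_contract", "send_contract_to_client", "archive_docs"])
```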
0:42:22 Yeah, again, against the common narrative, now that I think about it, it seems that the trend is prompts are getting more complex, not less.
0:42:29 And we’re seeing more agents, not less, doing more narrow tasks, which is almost this kind of counter-AGI narrative.
0:42:33 It’s almost like these are much more specialized and much more deep, looking with much more specific instructions.
0:42:57 And there’s, like, sort of a history of this: wow, maybe we can actually solve it if we specialize a little bit more. Like, if you take expert systems, at first they thought expert systems would just be experts, and then, like, by the time you got to the actual published research at Stanford, it was like, this is an expert system for deciding what type of infectious disease you have, as long as you have one of these seven.
0:43:02 No, literally, there was a paper that was like, there’s just one digestive disorder that this medical expert system actually handles.
0:43:14 I do want to, though, because you wouldn’t want—like, there is one big difference, which is somehow the model itself is packing in the inherent intelligence or capability to solve all of these.
0:43:21 Like, we are benefiting from the fact that at least you can build these all on Claude 4 and GPT-5.
0:43:24 And they all run on a computer, too.
0:43:34 Let me try to demonstrate this one with an old person example, which was, like, early in the PC era, there were word processors and spreadsheets and graphics and databases.
0:43:38 And a lot of people were like, why are there these four programs?
0:43:40 There should only be one program.
0:43:46 And my answer to that, like, which often involved screaming, was, have you been to an office supply store?
0:43:55 Because if you go to an office supply store, there’s, like, paper with numbers, and then there’s blank rectangles of paper, and then there’s transparency paper.
0:43:58 And, like, this has been around a really long time.
0:44:00 There’s some reason that these are different.
0:44:01 It’s human context.
0:44:05 How many minutes did it take you for you to know Google Wave wasn’t going to work?
0:44:06 Zero.
0:44:07 Okay, okay, okay.
0:44:08 It was instant.
0:44:09 It was instant.
0:44:12 No, I mean, and, but this was the thing.
0:44:20 There was an ancient Mac product that was lauded by the industry, called ClarisWorks, which was like, oh, you could have a spreadsheet inside a word processor.
0:44:23 And my first reaction is, have you seen a person use a spreadsheet?
0:44:25 Because their monitor can’t be big enough.
0:44:34 So they just want as many cells as you could possibly have, and you’re sitting there saying it has to fit on an 8.5 by 11 sheet of paper on a Mac.
0:44:42 And I think that one of the things that happens is that these lenses that humans bring to specialization, like, really, really matter.
0:44:51 And if you think about the medical profession, and you think about going from a GP to the radiologist to a specialist to a nurse practitioner, through the whole series,
0:44:55 they’re each going to look at and use AI in a different way.
0:45:03 So then the only thing would be, okay, so that was, that level of specialization and division of labor emerged over a hundred year period.
0:45:03 Right.
0:45:12 You know, alongside tools, but also driven by a lot of the physical constraints and realities of how organizations emerged.
0:45:19 So the only question would be, in a post-agent world, in 10 years from now, do those divisions of labor look exactly the same,
0:45:24 or do those shift also because the agents collapse, you know, some of the functions, and is there some blurring?
0:45:26 And then is there just a new set of roles?
0:45:34 Like, clearly there’s a role emerging in a bunch of organizations, which is like, my role is, I’m the AI productivity person.
0:45:40 And like, I just like, have a way of, you know, creating all new forms of productivity in the organization with AI.
0:45:46 So like, clearly we’ll have a bunch of new roles, but is our current division of labor going to also collapse in some interesting ways because of AI?
0:45:54 Well, I think that, like, if you actually stick with the medical example, we’re just going to wake up and there’s going to be way more people with way more specialties.
0:45:54 Right.
0:45:57 And AI will have created more jobs.
0:45:58 And in the interim,
0:46:01 Do you think AI causes more specialization over time?
0:46:01 Absolutely.
0:46:01 Yeah.
0:46:04 Because everyone’s getting, every human is going to be way better.
0:46:04 Right.
0:46:06 And more knowledge will accumulate.
0:46:12 And I think this is a thing that has really happened with computing that people forget.
0:46:15 Like, there used to just be like this morass of marketing.
0:46:16 Right.
0:46:17 And R&D.
0:46:17 Right.
0:46:21 And, like, there used to just be coding.
0:46:28 And then there was coding and testing and design and product management and program management and, you know, usability and research and all of these specialties.
0:46:30 And all of those had their own tools.
0:46:30 Right.
0:46:31 Go to a construction site.
0:46:34 I remember growing up, our neighbors built a house.
0:46:35 We live in an apartment.
0:46:36 And they built a house.
0:46:37 And there was Clem, the carpenter.
0:46:42 And you built a house with a guy named Clem who used all the tools and everything.
0:46:43 And now, like, you build a house.
0:46:50 And it’s like this 20-person list of subcontractors, all who have whole companies that do nothing but, like, put in pavers.
0:46:51 You know?
0:46:52 And that’s what it’s going to be.
0:46:56 I mean, there’s been a long disaggregation in the history of IT, right?
0:46:58 Like, everything in the same sheet metal.
0:47:01 Then, you know, disaggregate the OS and the hardware.
0:47:02 Then you disaggregate the apps.
0:47:03 Right.
0:47:05 And then, it was kind of interesting.
0:47:08 Like, in the last 15 years, we saw the app itself get disaggregated.
0:47:11 Like, independent functions got disaggregated, right?
0:47:14 It’s like almost any API would become a company, right?
0:47:15 You’d have, like, the Twilios.
0:47:16 Like, Auth became a company.
0:47:18 Like, PubSub became a company, et cetera.
0:47:26 And so, it may very well be the case that every agent becomes, like, a whole new vertical and a whole new specialization.
0:47:29 Well, and then you can actually build a company around it.
0:47:33 Like, it may be the case that today, just like with APIs, one company will have a whole bunch of agents.
0:47:37 It may be the case in the future that a third party will provide that agent as an independent company.
0:47:42 Well, the opportunity, to your point, is really there for that.
0:47:43 Yeah.
0:47:50 Because, like, it used to be, like, the impedance to creating a company and distributing was infinite.
0:47:56 It used to be ridiculous to think that a single API like Auth could become a company, but then, you know, of course it became a company.
0:48:00 Or it used to be ridiculous to think you could build a whole company out of signing documents.
0:48:00 Right.
0:48:07 And, like, not just a whole company, but then all of a sudden you realize, wow, the addressable market for that is huge.
0:48:16 And it’s way bigger than signing, because of all the stuff that got done around it that was just baked into a company, causing headcount and waste and fraud and abuse.
0:48:21 Well, I think you can kind of underwrite thousands of these companies emerging.
0:48:35 So, Jared Friedman had a tweet about basically, like, go deep on a workflow, basically do the job of some part of the economy, say a payroll specialist, and then build an agent for that.
0:48:39 And it’s not obvious that there’s not literally, like, a thousand of those.
0:48:39 Yeah, yeah, yeah.
0:48:41 So, for every vertical and every line of business.
0:48:44 I just love this because this is, like, literally the anti-AGI.
0:48:50 It’s basically following, like, the long arc of computer science, where, as the market grows, the granularity at which you can create a company gets finer.
0:48:51 Well, it’s also economic growth.
0:48:52 Like, take that payroll example.
0:49:00 Like, take Salesforce, which is always my favorite example: the idea of having a productive sales force used to just mean hiring a consultancy.
0:49:00 Right.
0:49:09 And the only way you could ever fix it was hiring a consultancy to show up and analyze what everybody does and then do a report that says, this is how you need to reorg.
0:49:12 And it usually meant do the opposite of whatever you have, and then they would leave.
0:49:16 And then, you know, people tried, but there was no cloud.
0:49:22 So, to build, like, CRM, you had to do all that consulting work and then roll it out.
0:49:25 And then it was static and you couldn’t maintain it.
0:49:25 Right.
0:49:29 And then, all of a sudden, it’s like, oh, here’s Marc Benioff and here’s a whole way to do all this.
0:49:32 And not only that, the people actually like it.
0:49:32 Right.
0:49:39 And they think they’re better at selling because they’re using their phone and they’re putting in a few notes about this client, which helps everybody.
0:49:43 And I think that’s what’s really going to happen with all of this.
0:49:49 And so, suddenly, something that looks really, really small becomes, like, a whole thing because there’s no problem with distribution.
0:49:51 There’s no problem with customization.
0:49:55 You know, we’ll actually have ways to solve security and privacy.
0:49:58 Just like we solved reliability and things like that.
0:50:05 And I think it’s just, I mean, look at, you know, the stuff that you’re a world expert in: the stack of internet technologies, of networking technologies.
0:50:09 I mean, if you had asked me 15 years ago, would CDNs be big companies?
0:50:10 I never would have said yes.
0:50:12 I’m like, I don’t think it makes any sense.
0:50:13 Like, how could you have a company that’s a cache?
0:50:20 I think that people are probably way too afraid of the model providers kind of eating them.
0:50:33 And I think it was basically a phenomenon of the first wave, which was, if you had figured out that you could do something basic on GPT, you know, 2 and 3,
0:50:35 where it was a text interface that produced more text,
0:50:37 like, yes, ChatGPT ate you.
0:50:38 Like, that clearly happened.
0:50:39 Yeah.
0:50:45 But basically since then, most enterprises want kind of applied use cases for AI and AI agents.
0:50:52 And so, so it’s not obvious that the current crop of companies, if you’re doing AI for healthcare, if you’re doing AI for life sciences,
0:50:57 if you’re doing AI for financial services, if you’re doing AI for coding at the right parts of the stack,
0:51:06 AI for coding may be the one asterisk area, which will be hypercompetitive, simply because the model companies, like, don’t want to use somebody else’s product to build their own models.
0:51:08 And so that kind of almost forces them to get really good at AI for coding.
0:51:18 But with that as the one kind of, you know, exception, I think basically we’re just in a five-year period right now where you’re going to have to build agents for every vertical, every domain,
0:51:21 and there’s a playbook that’s starting to emerge of what that needs to look like.
0:51:27 I mean, so I think there was kind of a technical head fake that happened early on, which was pre-training.
0:51:31 So the pre-training really was a 10 out of 10 technical innovation.
0:51:38 Two years ago, I had a friend that was building, like, their own aging model, a post-trained aging model.
0:51:40 Like, we’re going to make it so good at aging.
0:51:43 Like, you know, this is a text-to-image model.
0:51:46 And they wanted to make it so, like, old people looked really good in it.
0:51:51 And then, of course, the next version of, like, Midjourney or whatever comes out, and it does a better job of it.
0:51:54 And the thing with pre-training was you’re just kind of consuming all of the world’s existing data.
0:51:57 You’re draining all of that, and it perfectly generalized, right?
0:52:04 But it feels like technically that’s passed, and now we’re more in post-training and RL, which is a lot more domain-specific.
0:52:05 And so—
0:52:09 Well, and the moment that you have access to some set of data that is only—
0:52:09 Exactly.
0:52:12 —just for that enterprise, and so who gets permission to access that data?
0:52:14 Who gets permission to do the workflow on it?
0:52:15 It’s going to be applied companies.
0:52:22 Yeah, and if we had an infinite number of tokens, then the models would just continue to generalize.
0:52:23 But it’s pretty clear that that’s not happening.
0:52:28 And so now we’re going into the phase, which we all understand very well, where companies have to choose which domains they go into.
0:52:29 Right.
0:52:31 And they’ve got to solve the long-tail problems there, get access to the data, et cetera.
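[Editor’s note: a toy sketch to make the pre-training-versus-post-training point concrete. Everything here is hypothetical (the prompts, the “gold workflows,” the reward), and this is nobody’s actual pipeline; it just illustrates why RL-style post-training is domain-specific: the reward can only be scored against gold answers the enterprise holds, which is exactly the data-access point above.]

```python
# Toy REINFORCE loop: the reward is computable only from domain data
# ("gold workflows"), so whoever holds that data can post-train for it.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical domain data: prompt -> the workflow a specialist would run.
domain_examples = [
    ("invoice_total", "sum_line_items"),
    ("payroll_tax", "apply_bracket"),
    ("claim_code", "icd10_lookup"),
]
gold = dict(domain_examples)
actions = sorted(set(gold.values()) | {"generic_reply"})

# Tiny stand-in "policy": per-prompt logits over possible workflows.
logits = {p: np.zeros(len(actions)) for p in gold}

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

lr = 0.5
for _ in range(300):
    prompt = domain_examples[rng.integers(len(domain_examples))][0]
    probs = softmax(logits[prompt])
    i = rng.choice(len(actions), p=probs)           # sample an action
    r = 1.0 if actions[i] == gold[prompt] else 0.0  # domain-specific reward
    grad = -probs
    grad[i] += 1.0                                  # d log pi(a) / d logits
    logits[prompt] += lr * r * grad                 # reinforce rewarded actions

for p in gold:  # after training, each prompt maps to its gold workflow
    print(p, "->", actions[int(np.argmax(logits[p]))])
```

Without the gold labels there is no reward signal at all, which is the sense in which post-training, unlike pre-training, forces a company to pick its domains.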
0:52:38 And I also think that the shadow cast by large companies, and I’ve been the one casting that shadow,
0:52:41 this we’re-going-to-put-you-out-of-business-and-stomp-you thing, is ridiculous.
0:52:46 And it has never, in any technology way, lived up to the fear that people have.
0:52:50 Look, if you built a new word processor in 1995, you were an idiot.
0:52:53 Like, that was not the thing to go build.
0:53:00 You know, and—but, you know, there was a time just 10 years earlier where, like, companies built standalone spell checkers.
0:53:02 Like, it was just a thing.
0:53:04 You went to the store and you bought a spell checker.
0:53:07 And, like, it had more words than the other spell checker.
0:53:15 And so the thing that’s not being said now, which we should do a whole episode on, is, like, what is the actual platform?
0:53:16 Yeah, this is a great topic.
0:53:22 Because, like, it’s all well and good to say that the large models will go subsume every application.
0:53:25 The thing is, the minute they start doing that, no one will be on their platform.
0:53:26 That’s a great topic.
0:53:26 Right.
0:53:36 Because, like, no developer is going to sit around and build for you if you’re going to subsume them. There’s a phrase for this in the Mac and Apple world: Sherlocking.
0:53:39 And so it has a real chilling effect.
0:53:42 And that’s one of the things all the model people are going to learn very, very quickly.
0:53:48 There’s a chilling effect, but I think there really is also just a problem of, like, it’s hard to go deep in 50 categories.
0:53:49 Like, you just can’t—
0:53:53 I think everybody’s scared because pre-training was actually the one thing that was good at that.
0:53:56 And then now they have to actually—yeah, I agree.
0:54:07 You do have to, like—at some point, it becomes purely just an execution issue, which is, like, I don’t know how anybody would set up a company to be able to beat 50 startups across 50 different domains.
0:54:07 No, it’s ridiculous.
0:54:17 And, in fact, it’s only good, because what happens is that the big company raises the awareness of a whole category.
0:54:23 And then you just swoop in and you go, to them, I’m just a feature.
0:54:24 Right, yeah.
0:54:26 But to you, I’m—this is my whole life.
0:54:27 Right.
0:54:31 I always come back to this: there’s a whole company that just signs things.
0:54:31 Right.
0:54:36 Like, I cannot believe there’s a whole company that just signs things.
0:54:38 I have so much to say about this topic.
0:54:53 I mean, even minimally, if you graph the willingness to pay for an inference versus the cost to serve it, for most companies, for most spaces, something like 20% of the inferences are 80% of the cost.
0:54:58 So, like, actually, the problem for the application is just to choose which ones to take on, and those tend to be more domain-specific.
0:54:59 Yeah, yeah.
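[Editor’s note: the 20/80 claim is easy to make concrete with back-of-the-envelope arithmetic. The numbers below are purely illustrative, not from the episode; the point is that a minority of request types can dominate serving cost, so choosing which inferences to serve, and at what margin, is the application company’s problem.]

```python
# Illustrative only: willingness to pay (wtp) vs. cost to serve, per request
# type. With these made-up numbers, ~21% of traffic carries ~80% of the cost.
requests = [
    {"kind": "short_lookup",     "wtp": 0.002, "cost": 0.0004, "n": 8000},
    {"kind": "doc_summary",      "wtp": 0.010, "cost": 0.0060, "n": 1500},
    {"kind": "agentic_workflow", "wtp": 0.050, "cost": 0.0200, "n": 2500},
]

total_n = sum(r["n"] for r in requests)
total_cost = sum(r["cost"] * r["n"] for r in requests)
for r in requests:
    print(f"{r['kind']:>16}: {r['n'] / total_n:5.0%} of traffic, "
          f"{r['cost'] * r['n'] / total_cost:5.0%} of cost, "
          f"margin per call {r['wtp'] - r['cost']:+.4f}")
```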
0:55:02 This is the problem of inviting the three of us on here, which is, like—
0:55:04 We just opened up, like, the next two hours.
0:55:06 Just getting us to shut up is the trick.
0:55:08 Guys, thank you so much for coming on.
0:55:09 This is fantastic.
0:55:13 Thanks for listening to the A16Z podcast.
0:55:19 If you enjoyed the episode, let us know by leaving a review at ratethispodcast.com slash A16Z.
0:55:22 We’ve got more great conversations coming your way.
0:55:23 See you next time.
0:55:27 As a reminder, the content here is for informational purposes only.
0:55:38 It should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund.
0:55:43 Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast.
0:55:50 For more details, including a link to our investments, please see A16Z.com forward slash disclosures.
What exactly is an AI agent, and how will agents change the way we work?
In this episode, a16z general partners Erik Torenberg and Martin Casado sit down with Aaron Levie (CEO, Box) and Steven Sinofsky (a16z board partner; former Microsoft exec) to unpack one of the hottest debates in AI right now.
They cover:
- Competing definitions of an “agent,” from background tasks to autonomous interns
- Why today’s agents look less like a single AGI and more like networks of specialized sub-agents
- The technical challenges of long-running, self-improving systems
- How agent-driven workflows could reshape coding, productivity, and enterprise software
- What history — from the early PC era to the rise of the internet — tells us about platform shifts like this one
The conversation moves from deep technical questions to big-picture implications for founders, enterprises, and the future of work.
Timecodes:
0:00 Introduction: The Evolution of AI Agents
0:36 Defining Agency and Autonomy
1:54 Long-Running Agents and Feedback Loops
4:49 Specialization and Task Division in AI
6:20 Human-AI Collaboration and Productivity
6:59 Anthropomorphizing AI and Economic Impact
9:10 Predictions, Progress, and Platform Shifts
11:31 Recursive Self-Improvement and Technical Challenges
13:20 Hallucinations, Verification, and Expert Productivity
16:20 The Role of Experts and Tool Adoption
22:14 Changing Workflows: Agents Reshaping Work Patterns
45:55 Division of Labor, Specialization, and New Roles
48:47 Verticalization, Applied AI, and the Future of Agents
54:44 Platform Competition and the Application Layer
55:29 Closing Thoughts and Takeaways
Resources:
Find Aaron on X: https://x.com/levie
Find Martin on X: https://x.com/martin_casado
Find Steven on X: https://x.com/stevesi
Stay Updated:
Let us know what you think: https://ratethispodcast.com/a16z
Find a16z on Twitter: https://twitter.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://x.com/eriktorenberg
Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.