Building the Real-World Infrastructure for AI, with Google, Cisco & a16z


AI transcript
0:00:03 The good news is infrastructure is sexy again, so that’s kind of cool.
0:00:10 This is like the combination of the build-out of the internet, the space race, and the Manhattan Project all put into one.
0:00:19 Where there’s a geopolitical implication of it, there’s an economic implication, there’s a national security implication, and then there’s just a speed implication that’s pretty profound.
0:00:23 I mean, I think it’s easy to say. I’ve seen nothing like this. I’m fairly certain no one’s seen anything like this.
0:00:31 The internet in the late 90s, early 2000s was big, and we felt like, oh my gosh, can’t believe the build-out, the rate.
0:00:36 This makes it, I mean, 10x is an understatement. It’s 100x what the internet was.
0:00:42 The AI boom isn’t just changing software. It’s transforming the physical infrastructure that runs it.
0:00:50 Today, you’ll hear a conversation with Amin Vahdat from Google, Jeetu Patel from Cisco, and Raghu Raghuram from a16z
0:00:57 on what it takes to build the real-world systems behind large-scale AI, from chips and power to data centers and networking.
0:01:02 They discuss the scale of the current build-out, the new constraints on compute power and interconnect,
0:01:08 and how specialization in hardware and architecture is reshaping both the industry and global geopolitics.
0:01:14 It’s a grounded look at how infrastructure itself is being reinvented for the AI era and what comes next.
0:01:15 Let’s get into it.
0:01:20 What better time and place to talk infrastructure?
0:01:21 All right.
0:01:29 So we were back in the green room, and just as the first question was getting answered, I got cut off.
0:01:31 So this could be an entire repeat for all I know.
0:01:33 So anyway, let’s go, right?
0:01:35 The first question is similar.
0:01:38 So both of you, firstly, welcome and thank you for being here.
0:01:39 Thank you.
0:01:42 And I hope you’ll have a great day and a half as well.
0:01:49 Both of you have been in the industry for a while, and both of you have lived through many infrastructure cycles, right?
0:01:54 So have you seen anything like this cycle from your vantage point?
0:02:04 Not from an investor vantage point, but from your internal vantage point where you are responsible for building things and planning for things and so on.
0:02:06 Any one of you, where do you want to start?
0:02:07 You want to start, Amin?
0:02:09 I mean, I think it’s easy to say.
0:02:10 I’ve seen nothing like this.
0:02:12 I’m fairly certain no one’s seen anything like this.
0:02:20 The internet in the late 90s, early 2000s was big, and we felt like, oh my gosh, can’t believe the build out, the rate.
0:02:23 This makes it, I mean, 10x is an understatement.
0:02:26 It’s 100x what the internet was.
0:02:31 I think the upside is, as big as the internet was, same thing, 10x and 100x.
0:02:32 Yeah, nothing like it.
0:02:33 Yeah, I’d agree.
0:02:36 I don’t think there are any priors to the size, the speed, and scale.
0:02:40 I’d say the good news is infrastructure is sexy again, so that’s kind of cool.
0:02:42 It was a long time where it wasn’t sexy.
0:02:50 The thing I would say that’s really interesting is this is like the combination of the build out of the internet, the space race.
0:03:04 And the Manhattan Project all put into one, where there’s a geopolitical implication of it, there’s an economic implication, there’s a national security implication, and then there’s just a speed implication that’s pretty profound.
0:03:07 So, yeah, none of us have ever seen it at this size and scale.
0:03:10 On the other hand, I think we are grossly underestimating.
0:03:14 Like, the most common question I get asked right now is, is there a bubble?
0:03:16 I think we’re grossly underestimating the build out.
0:03:20 I think there’s going to be much more needed than what we are putting the projections towards.
0:03:23 So, that’s the follow-on question.
0:03:26 Where are we, do you think, in the CapEx spend cycle?
0:03:31 But more importantly, what are the signals that you guys use internally, right, in your thinking?
0:03:36 I mean, you have to plan data centers, whatever, four or five years in advance.
0:03:38 You have to buy nuclear reactors and whatnot.
0:03:43 So, how do you think about the demand signals as well as your technology signals?
0:03:49 And, Jeetu, the same thing for you, but from the point of view of enterprise and neoclouds, et cetera.
0:03:50 Amin?
0:03:52 We’re early in the cycle, is what I would say.
0:03:54 Certainly relative to the demand that we’re seeing.
0:04:00 And internally, externally, we’re, I mean, I can say here, oversubscribed tremendously.
0:04:05 In other words, our internal users are, we’ve been building TPUs for 10 years.
0:04:09 So, we have now seven generations in production for internal and external use.
0:04:14 Our seven and eight-year-old TPUs have 100% utilization.
0:04:17 That just shows what the demand is.
0:04:19 Everyone, of course, prefers to be on the latest generation.
0:04:21 But whatever they can get.
0:04:29 So, this tells me that the demand is tremendous, but also who we’re turning away and the use cases that we’re turning away.
0:04:31 It’s not like, oh, yeah, that’s kind of cool.
0:04:35 It’s, oh, my gosh, we’re actually not going to invest in this.
0:04:38 And there’s no option because that’s where we are on the list.
0:04:40 Same with many of you in the room.
0:04:47 We’re working with many of you in the room, and many of you are telling me directly, and thank you, we need more, earlier.
0:04:53 Now, the challenge here, though, is, as you said, we’re limited by power.
0:04:56 We’re limited by transformers and land.
0:04:57 We’re limited by permitting.
0:05:02 And we’re limited by backed-up delivery of lots of things in the supply chain.
0:05:11 So, one worry I have is that the supply isn’t actually going to catch up to the demand as quickly as we’d all like.
0:05:16 I heard in the previous session some of the discussions of the trillions of dollars that we’re going to be spending, which I think is accurate.
0:05:19 I’m not sure that we’re going to be able to cash all those checks.
0:05:24 In other words, literally, you can have all the money, and you can’t spend it all as fast as you want.
0:05:27 I think that’s going to extend for three, four, five years.
0:05:29 Wow.
0:05:32 And how do you deal with the depreciation cycles that are involved there?
0:05:36 Does the demand curve and the depreciation cycle curves match up?
0:05:38 Well, fortunately, we buy just in time.
0:05:40 But the nice thing is just in time for the hardware.
0:05:46 The depreciation cycle for the space and power is more like somewhere between 25 and 40 years.
0:05:47 So, we have benefits there.
0:05:57 I think if you think of on the networking side and you look at both enterprise and the hyperscalers as well as neoclouds, I think the story is quite different.
0:06:03 So, the enterprise is pretty nascent in its build-out of true infrastructure.
0:06:19 I just don’t think the data centers are ready. If you assume that 100% of the data centers at some point in time will need to get re-racked, you will need a very different level of power per rack compared to what used to be there in the traditional data centers.
0:06:24 I just don’t think that the enterprises are far enough along.
0:06:30 Maybe the few enterprises that are at super high scale might be there, but I don’t think the enterprises are far enough along.
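To put rough numbers on the re-racking point above, here is a back-of-envelope sketch. The per-rack kW figures are illustrative industry ballparks, assumptions on my part rather than numbers from the conversation:

```python
# Back-of-envelope: rack count supported by a fixed facility power budget
# as per-rack draw climbs. kW figures are rough illustrative assumptions.

FACILITY_MW = 10.0  # hypothetical data center hall

for label, kw_per_rack in [("traditional enterprise rack", 8),
                           ("dense AI rack", 120)]:
    racks = FACILITY_MW * 1000 / kw_per_rack
    print(f"{label} at {kw_per_rack} kW: ~{racks:,.0f} racks in {FACILITY_MW:.0f} MW")

# ~1,250 racks vs ~83: the same hall holds an order of magnitude fewer racks,
# each needing far denser power delivery, cooling, and networking.
```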
0:06:32 Hyperscalers and neoclouds is a completely different story.
0:06:42 And to Amin’s point on this notion of scarcity of power, compute, and network being the three big kind of constraints in this thing,
0:06:53 I would say right now that because there’s not enough power singularly in one location, data centers are being built where the power is available rather than power being brought to where the data centers are.
0:06:58 And that’s why you’re seeing a lot of projects that are being built out all throughout the world.
0:07:08 The other point, though, is the lion’s share of the constraints that we’re going to have, I think, are going to be sustained for a long period of time.
0:07:20 And as you have data centers that are being built farther and farther apart, one, there’s going to be a huge demand for scale-up networking so that you can have a rack that gets more and more networking for scale-up.
0:07:26 The second is you’re going to have a lot of demand for scale-out where you have multiple racks and clusters that need to get connected together.
0:07:41 The third is, we just launched a new piece of silicon as well as a new chip and a system for scale-across networking, where you might have two data centers that act as a logical data center that could be up to 800, 900 kilometers apart.
0:07:46 And you will see that just because there’s not going to be enough concentration of power in a single location.
0:07:49 So you’ll just have to have different architectures that get built out.
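To make the 800 to 900 kilometer figure concrete, a minimal sketch of the propagation delay between two such sites, assuming light in fiber travels at roughly 200 km per millisecond and a modest path-stretch factor. Both assumptions are for illustration; real routes and equipment add more delay:

```python
# Propagation delay between two data centers acting as one logical site.
# Assumes ~200 km/ms signal speed in fiber; ignores switching and queuing.

FIBER_KM_PER_MS = 200.0

def one_way_ms(distance_km: float, path_stretch: float = 1.3) -> float:
    """Fiber rarely runs straight; path_stretch pads the point-to-point distance."""
    return distance_km * path_stretch / FIBER_KM_PER_MS

for km in (800, 900):
    print(f"{km} km: ~{one_way_ms(km):.1f} ms one way, ~{2 * one_way_ms(km):.1f} ms RTT")

# ~5-6 ms one way: far too slow to pretend it's a single rack, but workable
# for traffic patterns engineered to tolerate it, hence new architectures.
```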
0:07:55 Actually, that brings us to the next topic that I want to discuss, the future of systems and networking and so on and so forth.
0:08:05 So Google bought the first, or at least large-scale, scale-out commodity servers in production for the web revolution.
0:08:09 And now NVIDIA is bringing back the mainframe in a different form.
0:08:12 So what do you think happens next?
0:08:19 I mean, is this a new style of coherent cluster-wide computing that we need and there’s going to be shared memory and all sorts of things?
0:08:21 Or do you think the pattern changes again?
0:08:29 I don’t think we’re quite back to mainframes, in that it is still the case that people are running on scale-out architectures across these pools.
0:08:36 In other words, whether you have GPUs or TPUs, you’re not necessarily saying, hey, that’s my GPU supercomputer.
0:08:39 You’re saying I’ve got 16,384 GPUs.
0:08:42 And maybe I’m going to go grab some subset.
0:08:46 Now I’ve got a uniform all-to-all connectivity in many cases, which is fantastic.
0:08:47 Same with TPUs.
0:08:52 It’s not like I say I have a 9,000 chip pod and I have to make my job fit on that.
0:08:54 Maybe I actually only need 256.
0:08:56 Maybe I need 100,000.
0:09:00 So I do think that actually the software scale-out is still going to be there.
0:09:02 I’ll note two things, though.
0:09:11 One, you’re absolutely right that, say, about 25 years ago at Google and other places simultaneously, there was really a transformation of computing infrastructure.
0:09:19 Like the notion that actually you would scale-out on commodity PCs, essentially, the same ones that you could buy off the shelf, running a Linux stack.
0:09:24 And that’s what you would do for disk, that’s what you would do for compute, that’s what you would do for networking.
0:09:25 I mean, you all take it for granted.
0:09:28 This is sort of, it was radical.
0:09:32 There are many people who thought that this was a terrible idea that wasn’t going to work.
0:09:40 I think the exciting thing about this moment right now is actually that we’re going to be reinventing, and I’m not just saying Google, we are all going to be reinventing computing.
0:09:48 And five years from now, whatever the computing stack is from the hardware to the software, it’s going to be unrecognizable.
0:09:52 And by the way, there was this co-design, because if you think about it, I’ll use Google examples, because I know those best.
0:10:01 Bigtable, Spanner, GFS, Borg, Colossus, they were hand-in-hand co-designed with the hardware, the cluster scale-out architecture.
0:10:06 And you wouldn’t have done the scale-out hardware if you didn’t have the scale-out software.
0:10:09 Same thing is going to happen in this moment.
0:10:12 So I think actually the mainframe is going to look very, very different.
0:10:23 I do think there will be this extreme demand for an integrated system, because right now we are very fortunate at Cisco, where we do everything from the physics to the semantics.
0:10:25 You think about the silicon to the application.
0:10:30 And other than power, one of the constraints is how well integrated are these systems?
0:10:36 And do they actually work with the least amount of lossiness across the entire stack?
0:10:40 And so that level of tight integration is going to be super important.
0:10:48 And what that means the industry will have to evolve into is we will have to work like one company, even though we might actually be multiple companies that actually do these pieces.
0:10:59 And so when we work with hyperscalers like Google or others, there’s a deep design partnership that actually goes on for months and months together ahead of time before we actually even do the deal.
0:11:04 And then once the deal is done, of course, there’s a tremendous amount of pressure to make sure that they’re moving pretty fast.
0:11:14 But I think the industry’s muscle of making sure that you operate in an open ecosystem and not be a walled garden is going to get important at every layer of the stack.
0:11:16 Right, Raghu.
0:11:25 And so let’s talk about, to segment the stack a little bit, one of the most interesting topics, which is processors, right?
0:11:33 Clearly, there’s an amazing vendor producing an amazing processor that has massive market share today, right?
0:11:37 And we see startups all the time doing all sorts of processor architectures.
0:11:42 You’ve got an amazing processor inside your fortress.
0:11:46 What do you think happens next in processor land?
0:11:48 Yeah, we’re huge fans of NVIDIA.
0:11:53 We sell a lot of NVIDIA products and chips.
0:11:54 Customers love them.
0:11:56 We’re also huge fans of our TPUs.
0:12:00 I think the future is actually really exciting.
0:12:08 And actually, it’s not that, I don’t think that we’ve hit the point of, okay, there’s TPUs, there’s GPUs, there’s whatever, Trainiums or something else.
0:12:10 We’re really seeing the golden age of specialization.
0:12:13 And that’s my observation.
0:12:23 In other words, if you look at it, a TPU, I’ll use that example again because I know it best, is, for certain computation, somewhere between 10 and 100 times more efficient per watt than a CPU.
0:12:26 And it’s the watts that really matter.
0:12:28 That’s hard to walk away from, right?
0:12:29 10 to 100x.
0:12:41 And yet, we know that there are other computations that if you built even more specialized systems for, but not just a niche computation, computations that we run a lot of at Google, right?
0:12:48 For example, maybe for serving, maybe for agentic workloads, that would benefit from an even more specialized architecture.
0:12:56 So, I think that actually one bottleneck is, how hard is it and how long does it take to turn around a specialized architecture?
0:12:57 Right now, it’s forever.
0:13:06 For the best teams in the world, really from concept to live in production, the speed of light is two and a half years.
0:13:10 I mean, that’s if you nail everything, right?
0:13:12 And there are a few teams that do.
0:13:16 But how do you predict the future two and a half years out for building specialized hardware?
0:13:18 So, A, I think we have to shrink that cycle.
0:13:25 But then, B, at some point when things slow down a little bit, and they will, I think we’re going to have to build more specialized architectures.
0:13:30 Because the power savings, the cost savings, the space savings are just too dramatic to ignore.
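A minimal sketch of why that 10x to 100x perf-per-watt figure dominates the economics when power is the binding constraint. The workload size and the baseline are invented placeholders; only the 10x and 100x multipliers echo the range cited above:

```python
# Power required to sustain a fixed workload at different perf-per-watt
# levels. Workload size and the 1 unit/W baseline are placeholders; the
# 10x and 100x multipliers echo the TPU-vs-CPU range cited above.

WORKLOAD_UNITS_PER_SEC = 1e9  # arbitrary illustrative demand

def megawatts_needed(perf_per_watt: float) -> float:
    return WORKLOAD_UNITS_PER_SEC / perf_per_watt / 1e6

for label, ppw in [("general-purpose baseline", 1),
                   ("specialized, 10x", 10),
                   ("specialized, 100x", 100)]:
    print(f"{label}: {megawatts_needed(ppw):,.0f} MW")

# 1,000 MW vs 100 MW vs 10 MW: under a fixed power budget, specialization
# is the difference between a feasible build and an impossible one.
```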
0:13:35 And this will actually have a really interesting implication on geopolitical structures as well.
0:13:40 Because if you think about what’s happening in China, China actually doesn’t make two nanometer chips.
0:13:42 They make, you know, seven nanometer chips.
0:13:48 And so, if you think about it, they have an unlimited amount of power.
0:13:52 And they have an unlimited amount of engineering resources.
0:13:59 And so, what they can do is do the optimization on the engineering side, keep the seven nanometer chips, and make sure that they give people unlimited amount of power.
0:14:03 We might have a different architectural design, where you have to get extremely power efficient.
0:14:07 You don’t have as many engineers as you might enjoy in China.
0:14:10 And you can actually go to two nanometer chips.
0:14:16 And those might be power efficient in some ways, but they might have thermal lossiness in other ways.
0:14:24 So, like, there’s a whole bunch of things that have to get factored in on the architecture that will get more specialized, even by geo and by region.
0:14:34 And then, depending on how the regulatory frameworks evolve, you know, how that geo then expands.
0:14:42 Like, if China expands to different regions in the world, you will have a very different architecture that plays out than if America expands to different regions in the world.
0:14:50 So, this is a very interesting kind of game theory exercise to go through on what happens in the next three years in tech in general.
0:14:52 And no one knows right now.
0:14:53 Yeah.
0:14:55 That’s the beauty of the world that we live in.
0:14:56 Yeah, yeah.
0:15:01 So, we’ll soon be measuring systems by engineers per token in addition to watts per token.
0:15:02 All right.
0:15:05 So, let’s turn to another topic, which…
0:15:06 Engineer per kilowatt.
0:15:07 Engineer per kilowatt.
0:15:08 In the U.S.
0:15:11 Networking, right?
0:15:15 Obviously, you alluded to it, scale up, scale out.
0:15:18 In your case, you mentioned scale across.
0:15:24 So, it seems to me that networking is also going to get reinvented in a fairly significant way.
0:15:31 So, what are the leading signs that you’re seeing and the signals that you’re seeing and the direction networking is going to take?
0:15:34 Yeah, networking is going to need a transformation for certain.
0:15:41 In other words, the amount of bandwidth that’s needed at scale within a building is just astounding.
0:15:44 I mean, and it’s going up.
0:15:49 The network is becoming a primary bottleneck, which is scary.
0:15:53 So, more bandwidth translates directly to more performance.
0:16:03 And then, given that the network winds up actually being a small power consumer, the delivered utility you get per watt is a superlinear benefit.
0:16:05 Like, spend a little bit here, get way more there.
0:16:09 So, I think that that side is absolutely there.
0:16:18 I’ll put in a plug here in that for these workloads, we actually know what the network communication patterns are a priori.
0:16:20 So, I think this is a massive opportunity.
0:16:29 In other words, do you then need the full power of a packet switch when actually you know what the rough circuits are going to be?
0:16:32 And I’m not saying you need to build a circuit switch, but there is an optimization opportunity.
0:16:37 The other aspect of this here is these workloads are just incredibly bursty.
0:16:38 Yeah.
0:16:49 And to the point where, and we’ve written about this, power utilities notice when we’re doing network communication relative to computation at the scale of tens and hundreds of megawatts.
0:16:58 Like, massive demand for power, stop all of a sudden and do some network communication, and then burst back to computing.
0:17:06 So, how do you build a network that needs to go at 100% for a really short amount of time and then go idle?
0:17:07 Yeah.
0:17:11 And then same actually for the scale across use case, which we’re absolutely seeing.
0:17:16 You don’t run large scale free training across all your wide area data center sites 12 months of the year.
0:17:25 So, and then you’re going to, this is the problem I think about a lot is let’s say you build the latest, greatest chips in these three data center sites.
0:17:30 How long are you going to be there before you migrate to the latest, latest chips in three other sites?
0:17:34 And then what do you do with the network that you left behind?
0:17:35 People are going to run jobs on them.
0:17:36 Yeah.
0:17:42 But you’re not going to need nearly the network capacity that you did for large scale training, pre-training anyway.
0:17:51 So, the shift of needing massive networks for like 5% of the time, I don’t know how to build a network like that.
0:17:54 So, if any of you do, please let me know.
0:17:56 I mean, if you don’t know how to build this, there’s nobody that knows how to build this.
0:17:57 We’re trying to figure it out.
0:17:59 It actually is a fascinating problem.
0:17:59 Yeah.
0:17:59 Yeah.
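One way to see why the “massive network for 5% of the time” problem is hard: the amortized cost of each delivered bit scales inversely with duty cycle. A toy sketch, with normalized cost figures rather than real pricing:

```python
# Toy model of the duty-cycle problem: a link provisioned for peak training
# bursts but busy only a fraction of the time. Costs are normalized, not real.

def effective_cost(peak_capacity_cost: float, duty_cycle: float) -> float:
    """Cost per unit of *useful* traffic, relative to a fully utilized link."""
    return peak_capacity_cost / duty_cycle

for duty in (1.0, 0.5, 0.05):
    print(f"duty cycle {duty:>4.0%}: effective cost {effective_cost(1.0, duty):.0f}x")

# At 5% utilization every delivered bit costs ~20x what it would on a fully
# utilized link, which is why a network sized for rare bursts breaks the
# usual economics.
```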
0:18:07 I do think like, if you think of, if power is the constraint and if compute is the asset, I think network is going to be the force multiplier.
0:18:07 Mm-hmm.
0:18:21 Because, you know, if you have low latency, high performance, and high energy efficiency, then every kilowatt of power you save moving the packet is a kilowatt of power you can give to the GPU.
0:18:22 Yeah.
0:18:24 Which is, you know, super important.
0:18:37 The other thing is, you know, when you think about scale up versus scale out versus scale across, you’ll also need, especially on inference versus training, there are different things that get optimized.
0:18:41 Like, you might optimize for latency much more on training runs.
0:18:43 You might optimize much more for memory on inferencing.
0:18:47 There are architectural differences.
0:19:04 And so, I also feel like the way that networking will evolve is, rather than it being a training infrastructure that then gets applied to inferencing, you might have inferencing-native infrastructure that gets built over time.
0:19:12 And so, there’s good considerations to look at on, like, how all of the architectural components are moving.
0:19:29 But, in my mind, like, if I were to say, strategically, one of the biggest things that’s happening in networking from our vantage point is, if you’re just a wrapper around Broadcom, then you’ve got a monopoly that’s going to be a very predatory one.
0:19:46 And so, one of the big reasons where Cisco is super relevant is, you don’t just have a Broadcom world with people just wrapping Broadcom, that their systems are on Broadcom, but you will actually have a choice of silicon.
0:19:53 And that choice and diversity of silicon is going to be super important, especially for high-volume, you know, kind of consumption patterns.
0:19:59 So, last question on the system, since you brought that up, and we’ll move to use cases.
0:20:05 Inference, both of you have mentioned, I mean, you talked about it in the context of the processors.
0:20:07 You just started talking about the architecture.
0:20:13 Are you deploying specific architectures for inference today, I mean?
0:20:18 Are you, is it still shared workloads?
0:20:31 We are deploying specialized architectures for inference, and I think as much software as hardware, but the hardware is also deployed in different configurations, is the way I would say it.
0:20:41 And then the other aspect of inference that is becoming really interesting is reinforcement learning, especially on the critical path of serving, because latency just becomes absolutely critical.
0:20:52 And I think that, so how you would build your system, and how you would connect it up to one another, and of course networking plays a key role there, becomes increasingly interesting.
0:21:05 And are there singular choke points that, if removed, would accelerate the thousand-fold reduction in the cost of inference that we need, or is this just a natural curve that we are riding down?
0:21:07 So, I mean, two things here.
0:21:12 One, again, maybe many of you are familiar with this, pre-fill and decode on inference look very, very different.
0:21:18 So, actually, ideally, you would have different hardware; the balance points are different.
0:21:23 So, that’s one opportunity, it comes with downsides, we can talk about that.
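A rough sketch of why prefill and decode balance differently: prefill pushes the whole prompt through at once, while decode streams the weights for every single generated token. Arithmetic intensity, FLOPs per byte of weights moved, makes the gap visible. All numbers here are illustrative assumptions, not any specific model or chip:

```python
# Arithmetic intensity of prefill vs decode, ignoring KV-cache traffic.
# 2 FLOPs per parameter per token is the usual dense-transformer estimate.
# Parameter count and precision are illustrative assumptions.

PARAMS = 70e9          # hypothetical model size
BYTES_PER_PARAM = 2    # bf16 weights

def flops_per_byte(tokens_in_flight: int) -> float:
    flops = 2 * PARAMS * tokens_in_flight
    weight_bytes = PARAMS * BYTES_PER_PARAM  # weights read once per pass
    return flops / weight_bytes

print(f"prefill, 2048-token prompt:  {flops_per_byte(2048):,.0f} FLOPs/byte")
print(f"decode, one token at a time: {flops_per_byte(1):,.0f} FLOPs/byte")

# ~2048 vs ~1: prefill saturates compute while decode saturates memory
# bandwidth, so the ideal hardware (or a disaggregated serving setup)
# differs per phase.
```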
0:21:30 What I would say, though, is that maybe something people don’t realize is that we’re actually driving massive reductions in the cost of inference.
0:21:42 I mean, 10Xs and 100Xs, the problem or opportunity is the community, the user base, keeps demanding higher quality, not better efficiency.
0:21:53 So, just as soon as we deliver all the efficiency improvements we’re looking for, the next generation model comes out, and it is, whatever, intelligence per dollar is way better.
0:21:58 But, you still pay more, and it costs more, relative to the previous generation.
0:21:59 And then we repeat the cycle.
0:22:10 And it’s almost like the longer the reasoning that you have, the more impatient the market gets, right?
0:22:18 So, for example, if you have a 20-minute reasoning cycle, like, for example, with deep research, you could have autonomous execution for about 20 minutes.
0:22:19 That was interesting.
0:22:28 Now, you have, you know, most of the coding tools that can go up to 7 hours to 30 hours of, you know, duration of autonomous execution.
0:22:32 When that happens, there’s actually a greater demand to, say, compress that time down.
0:22:43 And so, it’s kind of a self-fulfilling prophecy where you need to have more performance because of the fact that you’ve been able to go out and do things for a longer autonomous amount of time.
0:22:50 And so, it’s almost a never-ending loop where you’ll need to have more performance for inference in perpetuity.
0:22:50 Yeah.
0:22:59 Though, intelligence per dollar is a business model metric, so it is not just a processor capability.
0:22:59 No, it’s end-to-end.
0:23:00 Absolutely.
0:23:00 Yeah.
0:23:01 So, okay.
0:23:05 So, let’s change topics and talk about actual usage, right?
0:23:09 So, both of you have massive organizations.
0:23:17 Where are the key wins that you’re getting today with applying all the AI that’s available to you?
0:23:21 And then we’ll talk about what your customers are doing.
0:23:24 But I’m actually curious about what you’re doing internally.
0:23:25 Within the teams?
0:23:26 Yeah.
0:23:26 Yeah.
0:23:28 So, I mean, coding is the obvious one.
0:23:32 And that’s actually picking up increasing traction and increasing capability.
0:23:41 We just actually, in the last couple of days, published a paper that showed how we applied AI techniques to do instruction set migration.
0:23:55 So, in other words, we actually had a fairly massive migration from x86 to ARM, making our entire code base, and at Google, it’s a very, very large code base, instruction set agnostic, including, you know, future RISC-V or whatever else might come along.
0:23:59 Tens of thousands, hundreds of thousands of individual changes.
0:24:01 Your entire code base, you’re going to make it agnostic.
0:24:06 Entire code base, because we want and need all of our code base to be agnostic.
0:24:07 Man, that’s a crazy-ass project.
0:24:08 Yeah.
0:24:10 So, it was.
0:24:13 The motivation, though, for this actually was a few years ago.
0:24:20 We had this amazing legacy system called BigTable, and then a new amazing system called Spanner.
0:24:24 And we decided to tell the company, hey, everyone needs to move from BigTable to Spanner.
0:24:28 And by the way, BigTable was amazing for its time, but Spanner was better.
0:24:33 The estimate for doing that migration for Google was seven staff-millennia.
0:24:35 How much?
0:24:36 How much?
0:24:37 Seven staff-millennia.
0:24:43 We had to create a new unit, you see.
0:24:45 And it wasn’t made up, or people being lazy.
0:24:47 It’s like, this is what it was.
0:24:49 It’s endearing that they came up with that, though.
0:24:50 And you know what we decided?
0:24:52 Long live BigTable.
0:24:56 It just wasn’t worth it, honestly.
0:24:59 Like, the opportunity cost was too high.
0:25:04 And we have these sorts of migrations, TensorFlow to JAX.
0:25:08 We actually, I mean, again, somewhat private, but not too secret.
0:25:13 We’ve effected this internally with AI assist, and it went integer factors faster.
0:25:20 Now, there are other tasks which the tools probably aren’t quite yet up to the, whatever, standard for.
0:25:23 But the area under the curve is getting bigger and bigger and bigger.
0:25:30 So, we’re seeing probably like three or four really good use cases.
0:25:32 And then we’re seeing some use cases which are not working yet.
0:25:35 And so, what is working?
0:25:38 Code migrations are working relatively well.
0:25:46 So far, we use largely a combination of Codex, Claude, and Cursor, some Windsurf.
0:25:50 And so, code migrations tends to work pretty well.
0:26:01 Debugging, oddly enough, has actually been very, very productive with these tools, especially with CLIs.
0:26:07 Where we’ve not done as good a job.
0:26:11 And then front-end zero-to-one projects tend to do extremely well.
0:26:12 Like, the engineers are super productive.
0:26:18 When you go to code that’s older, and especially further down in the infrastructure stack,
0:26:21 much harder to go out and get that to happen.
0:26:24 But the challenge that we have to orient our engineers on,
0:26:29 this is actually much more of a cultural reset problem than it is just a technical problem.
0:26:33 Which is, if someone uses something and says, this isn’t working right,
0:26:39 you can’t put it back on the shelf saying, this doesn’t work for another six or nine months.
0:26:42 You have to come back to it within four weeks and see if it works again.
0:26:48 Because the speed at which these tools are kind of advancing is so fast
0:26:53 that you almost have to kind of get, like, I was with 150 of our distinguished engineers today.
0:27:01 And what I had to urge them to do is assume that these tools are going to get infinitely better within six months.
0:27:01 Yeah.
0:27:05 And make sure that you get your mental model to where that tool is going to be in six months
0:27:08 and what are you going to do to be best in class in six months,
0:27:10 rather than assessing it for where it is today,
0:27:13 and then putting it aside for six months,
0:27:15 assuming that that’s not going to work for the next six months.
0:27:17 I think that’s a big strategic error.
0:27:19 So, we’ve got 25,000 engineers.
0:27:26 I’m hoping that we can get at least two or three X productivity
0:27:29 within a very short amount of time within the next year.
0:27:34 And we’ll be able to see if that happens.
0:27:40 The second, a couple of the big areas that we’re starting to see some good responses is in sales.
0:27:43 Preparation going into an account call, really good.
0:27:48 Legal contract reviews, actually much better than what we had thought.
0:27:54 And then the last one is not super high inference volume, but product marketing.
0:28:00 I think the first ChatGPT take on a competitive
0:28:04 is always better than what any product marketing person comes up with by themselves.
0:28:06 So, we should never start from a blank slate.
0:28:08 Just start from ChatGPT and then go from there.
0:28:09 Okay.
0:28:11 We could be talking about the topic for a long time,
0:28:13 but they showed me the two-minute warning.
0:28:15 So, I want to focus on one last question here.
0:28:18 So, we’ve got a lot of founders here, right?
0:28:19 Building amazing companies.
0:28:22 So, what is the most interesting development
0:28:26 they should look forward to in the next calendar year,
0:28:28 let’s call it, or the next 12 months?
0:28:32 A, from your company, and B, from the industry.
0:28:34 If you were to look at your crystal ball.
0:28:38 I mean, I think to build on the point,
0:28:40 these models are getting more spectacular
0:28:42 by the month.
0:28:45 And then they’ll be from whatever companies you like,
0:28:47 a bunch of really exciting ones, including ours.
0:28:50 I forgot to say, you’re not allowed to say models will get better.
0:28:51 Everybody knows.
0:28:52 The models are going to get,
0:28:55 but I mean, they’re getting scary good,
0:28:57 is the part that I would say.
0:29:02 But I think that then the agents that get built on top of them
0:29:04 and the frameworks for making that happen
0:29:05 are also getting scary good.
0:29:10 So, the ability to have things go quite right
0:29:13 for quite long over the coming 12 months
0:29:14 is going to be transformative.
0:29:17 I mean, do you want to leak any aspect of your roadmap?
0:29:19 Next 12 months?
0:29:20 Not right now.
0:29:21 Okay.
0:29:23 Do you, Jeetu?
0:29:26 I’d say the big shift,
0:29:28 and what I would urge startups to do,
0:29:31 is don’t build thin wrappers around models
0:29:32 that are other people’s models.
0:29:34 I think the combination of a model
0:29:37 working very closely with the product
0:29:39 and the model getting better
0:29:41 as there’s feedback in the product
0:29:42 is going to be super important.
0:29:44 So, you are going to need foundation models,
0:29:46 but if you just have a thin wrapper,
0:29:49 I think the durability of your business
0:29:51 will be very, very short-lived.
0:29:55 So, that would be something that I would urge you on,
0:29:57 and I think the intelligent routing layer of some sort
0:29:57 that says,
0:29:59 I’m going to use my models for these things,
0:30:01 I’m going to probably use foundation models
0:30:02 for other things,
0:30:04 and dynamically keep optimizing will be,
0:30:06 I think Cursor does that pretty well,
0:30:10 but that’ll be a good way
0:30:12 that the software development lifecycle will evolve.
0:30:15 What you should expect from Cisco is,
0:30:16 look, truth be told,
0:30:17 for the longest time,
0:30:19 people thought Cisco was a legacy company.
0:30:21 Like, there has been,
0:30:23 and I think in the past year,
0:30:25 hopefully, you’ve paid attention,
0:30:26 and if you haven’t,
0:30:28 like, our stock price is doing pretty well.
0:30:30 I think there’s a level of momentum in the business,
0:30:32 there’s a spring in the step in the employee base.
0:30:34 So, you should expect,
0:30:36 like I said,
0:30:38 from the physics to the semantics
0:30:40 in every layer from silicon to the application,
0:30:44 a fair amount of innovation in silicon,
0:30:44 and networking,
0:30:45 and security,
0:30:46 and observability,
0:30:47 and the data platform,
0:30:49 as well as applications,
0:30:50 you know, from us.
0:30:52 And we’re excited to work with
0:30:54 the startup ecosystem,
0:30:57 and so if you ever feel like you want to work with us,
0:30:59 make sure that you reach out to us.
0:31:01 Were you going to say something, Amin?
0:31:02 I mean,
0:31:04 one aspect that I want to highlight about the models
0:31:07 is where we were with,
0:31:07 let’s say,
0:31:09 text models two and a half,
0:31:10 three years ago.
0:31:11 They were fun.
0:31:12 Like, hey,
0:31:13 write me a haiku about Martine.
0:31:15 Did a great job.
0:31:15 Now,
0:31:16 they’re amazing.
0:31:18 I think that what’s going to happen
0:31:19 in the next 12 months
0:31:20 is the same thing is going to be happening
0:31:22 with input and output of images and video
0:31:23 to these models.
0:31:25 And to the extent that,
0:31:26 even for images,
0:31:28 imagine them as productivity
0:31:30 and educational tools.
0:31:31 Not just,
0:31:31 okay,
0:31:33 here’s Martine as Superman.
0:31:33 I don’t know,
0:31:34 like that’s cool too,
0:31:35 right?
0:31:38 But using it for productivity gains
0:31:39 and learning,
0:31:40 I think is going to be
0:31:40 really,
0:31:41 really transformative.
0:31:42 Awesome.
0:31:44 So I’m not allowed to extend this session.
0:31:45 Thanks for a great conversation,
0:31:46 Amin.
0:31:52 Thanks for listening to this episode
0:31:53 of the A16Z podcast.
0:31:55 If you liked this episode,
0:31:56 be sure to like,
0:31:57 comment,
0:31:57 subscribe,
0:31:59 leave us a rating or review
0:32:01 and share it with your friends and family.
0:32:02 For more episodes,
0:32:03 go to YouTube,
0:32:04 Apple Podcasts,
0:32:05 and Spotify.
0:32:07 Follow us on X at A16Z
0:32:09 and subscribe to our substack
0:32:11 at A16Z.substack.com.
0:32:13 Thanks again for listening
0:32:14 and I’ll see you in the next episode.
0:32:16 As a reminder,
0:32:17 the content here
0:32:18 is for informational purposes only,
0:32:19 should not be taken
0:32:20 as legal business,
0:32:21 tax,
0:32:22 or investment advice,
0:32:23 or be used to evaluate
0:32:24 any investment or security
0:32:25 and is not directed
0:32:26 at any investors
0:32:27 or potential investors
0:32:29 in any A16Z fund.
0:32:30 Please note that A16Z
0:32:31 and its affiliates
0:32:32 may also maintain investments
0:32:33 in the companies discussed
0:32:34 in this podcast.
0:32:35 For more details,
0:32:36 including a link
0:32:37 to our investments,
0:32:39 please see A16Z.com
0:32:41 forward slash disclosures.

AI isn’t just changing software, it’s causing the biggest buildout of physical infrastructure in modern history.

In this episode, Raghu Raghuram (a16z) speaks with Amin Vahdat, VP and GM of AI and Infrastructure at Google, and Jeetu Patel, President and Chief Product Officer at Cisco, about the unprecedented scale of what’s being built — from chips to power grids to global data centers.

They discuss the new “AI industrial revolution,” where power, compute, and network are the new scarce resources; how geopolitical competition is shaping chip design and data center placement; and why the next generation of AI infrastructure will demand co-design across hardware, software, and networking.

The conversation also covers how enterprises will adapt, why we’re still in the earliest phase of this CapEx supercycle, and how AI inference, reinforcement learning, and multi-site computing will transform how systems are built and run.

 

Resources:

Follow Raghu on X: https://x.com/RaghuRaghuram

Follow Jeetu on X: https://x.com/jpatel41

Follow Amin on LinkedIn: https://www.linkedin.com/in/vahdat/

 

Stay Updated: 

If you enjoyed this episode, be sure to like, subscribe, and share with your friends!

Find a16z on X: https://x.com/a16z

Find a16z on LinkedIn: https://www.linkedin.com/company/a16z

Listen to the a16z Podcast on Spotify: https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX

Listen to the a16z Podcast on Apple Podcasts: https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711

Follow our host: https://x.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.


