Tesla’s Road Ahead: The Bitter Lesson in Robotics

AI transcript
0:00:02 This is the big race in robotics.
0:00:04 The smarter your brain, so to speak,
0:00:07 the less specialized your appendages have to be.
0:00:11 AI has pushed every single one of these kind of to its limit
0:00:13 and to a new state of the art.
0:00:14 The way they’re solving precision
0:00:17 is, instead of throwing more sensors on the car,
0:00:19 to basically throw more data at the problem.
0:00:22 Data is absolutely eating the world.
0:00:23 What is good enough?
0:00:24 We used to have the Turing test,
0:00:27 which obviously we’ve blown past now.
0:00:30 His shorthand for it was like the AWS of AI.
0:00:33 He’s got this idea of this distributed swarm
0:00:36 of unutilized inference computers.
0:00:39 Whether that’s an oil rig, whether that’s a mine,
0:00:41 whether that’s a battlefield,
0:00:43 there’s so many different use cases
0:00:46 for a lot of this underlying technology
0:00:49 that are really starting to see the light of day.
0:00:51 It’s basically not an if, but a when.
0:00:53 An inevitability.
0:00:54 Early this month,
0:00:58 Elon Musk and the team at Tesla held their We Robot event,
0:00:59 where they unveiled their plans
0:01:02 for the unsupervised full self-driving Cybercab
0:01:05 and RoboVan, plus Optimus,
0:01:08 their answer to consumer-grade humanoid robots,
0:01:10 and also what Musk himself predicted
0:01:14 would be, quote, “the biggest product ever of any kind.”
0:01:16 Now, of course, none of these products are on the market yet,
0:01:19 but several demos were on show at the event.
0:01:21 Naturally, the response was mixed.
0:01:23 Supporters said we got a glimpse of the future,
0:01:26 while critics said the details were missing.
0:01:29 But in today’s episode, we’re not here to debate that.
0:01:31 What we do want to talk about is what this indicates
0:01:35 about the intersection of where hardware and software meet.
0:01:38 So what does Rich Sutton’s 2019 blog post, The Bitter Lesson,
0:01:41 tell us about the decisions that Tesla’s making in autonomy?
0:01:45 And how realistic is the quoted $30,000 price range?
0:01:48 Also, what are the different layers of the autonomy stack?
0:01:50 And where do we get the data to power it?
0:01:51 And what does any of this look like
0:01:53 when you exit the consumer sphere?
0:01:56 We cover all this and more with a16z partners,
0:01:58 Anjney Midha and Erin Price-Wright.
0:02:01 Anjney previously founded Ubiquity6,
0:02:04 a pioneering computer vision and multiplayer technology company
0:02:07 that sat right at this intersection of hardware and software,
0:02:10 and was eventually acquired by Discord.
0:02:13 Erin, on the other hand, invests on our American Dynamism team
0:02:15 with a focus on AI for the physical world.
0:02:17 And if you’d like to dig even deeper here,
0:02:20 Erin has penned several articles on the topic
0:02:22 that we’ve linked in our show notes.
0:02:24 All right, let’s get to it.
0:02:29 As a reminder, the content here is for informational purposes only,
0:02:33 should not be taken as legal, business, tax, or investment advice,
0:02:35 or be used to evaluate any investment or security
0:02:38 and is not directed at any investors or potential investors
0:02:39 in any a16z fund.
0:02:42 Please note that a16z and its affiliates
0:02:46 may also maintain investments in the companies discussed in this podcast.
0:02:48 For more details, including a link to our investments,
0:02:51 please see a16z.com/disclosures.
0:03:00 So last week, Tesla had their We, Robot event
0:03:03 and Musk announced the Cybercab, the Robovan
0:03:07 (or “Robo-van,” as he liked to pronounce it), and the Optimus.
0:03:10 You guys are so immersed in this hardware software world,
0:03:13 I’d love to just get your initial reaction.
0:03:16 From my perspective, it wasn’t that there was anything in particular
0:03:17 that was super surprising.
0:03:20 But what was exciting as just sort of a culmination
0:03:22 of one thing that Elon Musk does really well
0:03:24 and Tesla has done really well,
0:03:28 which is continue to pour love and energy and money and time
0:03:33 into a dream and a vision that’s been going on for a really long time,
0:03:36 like well past when most financial investors
0:03:39 and most people kind of lost the luster of self-driving cars
0:03:44 after their initial craze in the mid to late 2010s.
0:04:46 And they’ve just continued to plod along
0:03:48 and to continue to make developments
0:03:51 and now we’re finally seeing this glimpse of the future
0:03:53 for the first time in a really long time.
0:03:57 I think that’s right. I think it was very impressive, but unsurprising.
0:03:58 Yeah.
0:04:01 So I think the two schools of thought when people watched the event was,
0:04:05 one was absolutely this whole, “Oh my God, this is such vaporware.”
0:04:08 He shared literally nothing on engineering details.
0:04:12 “What the hell? Come on, give us the meat on timelines and dates and prices.”
0:04:16 And then the opposing view was like, “Holy shit. They’re still going.
0:04:18 They haven’t given up on any of this autonomy stuff
0:04:19 that he’s been talking about for years.”
0:04:22 And I’m absolutely more empathetic towards the latter view,
0:04:25 which is that I saw it as an homage to the bitter lesson.
0:04:27 It’s this sort of amazing blog post, which I’m going to do a terrible job
0:04:30 of summarizing, by this great computer scientist, Rich Sutton,
0:04:34 which basically says that over the last 70 years or so of computer science history,
0:04:36 what we’ve learned is that general purpose methods
0:04:40 basically beat out any specific methods in artificial intelligence,
0:04:43 in particular, basically the idea that if you’re working on solving a task
0:04:47 that requires intelligence, you’re usually better off leveraging Moore’s law
0:04:52 and more compute and more data than trying to hand engineer a technique
0:04:54 or a set of algorithms to solve a particular task.
0:04:57 And broadly speaking, that’s been the big grand debate
0:04:59 in self-driving and autonomy I would say for the last two decades, right?
0:05:03 This is the sort of general purpose bitter lesson school versus the,
0:05:06 let’s model self-driving as a specific task.
0:05:11 As a set of discrete decision-making algorithms unconnected to each other.
0:05:15 A system to solve, let’s say, edge detection around stop signs, right?
0:05:18 Whereas self-driving is a really hard problem.
0:05:20 And you could totally say, well, there’s so many edge cases in the world
0:05:23 that we should map out each of those edge cases.
0:05:25 And I think it was an homage to the bitter lesson.
0:05:28 So that’s what I was most excited about: he did actually share details
0:05:31 that their pipeline is basically an end-to-end deep learning approach.
0:05:34 Which is incredible and probably true only for the last,
0:05:37 my guess is 18 to 24 months, right?
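To make that contrast concrete, here is a toy sketch, not Tesla’s actual architecture, of the two schools being described: a hand-engineered pipeline of specialized rules versus one end-to-end network that maps raw pixels to controls and improves mainly with more data and compute.

```python
# Toy contrast between a hand-engineered driving pipeline and an
# end-to-end learned one. Names and shapes are illustrative only.
import torch
import torch.nn as nn

def hand_engineered_policy(frame: torch.Tensor) -> torch.Tensor:
    """Specialized, rule-based stages: detect -> decide -> control."""
    stop_sign_score = frame.mean()            # stand-in for an edge/shape detector
    brake = 1.0 if stop_sign_score > 0.5 else 0.0
    steer = 0.0                               # every edge case needs its own hand-written rule
    return torch.tensor([steer, brake])

class EndToEndPolicy(nn.Module):
    """One generic network: pixels in, controls out; improves with data and compute."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 2),                 # [steering, braking]
        )

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        return self.net(frame)

frame = torch.rand(1, 3, 64, 64)              # fake camera frame
print(hand_engineered_policy(frame[0]))
print(EndToEndPolicy()(frame))
```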
0:05:39 Yeah. And I mean, in the bitter lesson,
0:05:44 he also talks about the fact that it’s really appealing to do the opposite.
0:05:46 Because in the short term, you will get the benefit,
0:05:51 but the broader deep learning approach ends up winning out in the long term.
0:05:54 And a lot of people talk about, and Musk says about himself,
0:05:56 that the timelines sometimes are off,
0:05:59 but he’s basically banking on that premise in the long term.
0:06:02 It’s basically not an if, but a when.
0:06:03 An inevitability.
0:06:09 And I think the event was the first time that it really did feel,
0:06:13 in an emotional sense, for the average American consumer.
0:06:16 I’m not talking about the super-duper tech literate people
0:06:18 who wanted the details of the underlying models and their weights,
0:06:21 but like for the average American consumer,
0:06:25 the first time that this version of the future felt like an inevitability.
0:06:29 And before we get into maybe the specifics around where else hardware
0:06:31 and software are intersecting,
0:06:34 I’d love to just talk about that, that average person who’s watching,
0:06:37 because you guys are meeting with companies and investors,
0:06:40 and this has been going on for quite some time.
0:06:43 So I’m just curious if maybe you noticed anything under the hood,
0:06:47 or maybe the meta in that announcement or event,
0:06:50 that maybe the average person watching wouldn’t. You know, what are they seeing?
0:06:53 They’re saying things like, oh, maybe it was human-controlled
0:06:55 and not like fully AI on-device,
0:06:58 or other people are commenting on the fact that these humanoids
0:07:01 are shaped like humans, like why do we need that?
0:07:03 On the topic of humanoids,
0:07:08 I think humanoids are a great choice of embodiment for a robot
0:07:13 to really emotionally connect to and speak to a human being watching,
0:07:16 because I can relate to a human form factor.
0:07:18 Obviously, we found out that it was teleoperated,
0:07:23 which is in my opinion still doesn’t take away from like how cool and amazing it was.
0:07:28 The human form factor is a way to connect what is happening with robotics
0:07:31 to a regular person who is like, okay, yes, I like see myself in that.
0:07:34 This looks like Star Wars or some other sci-fi movie.
0:07:37 In reality, maybe this is like a controversial opinion.
0:07:41 I don’t see the vast majority of economic impact over the next decade
0:07:44 from robotics coming from the humanoid form factor,
0:07:47 but that doesn’t take away from the power of the symbol
0:07:50 of having a humanoid make a drink at this event,
0:07:55 because it just like connects back to this sort of science fiction promise
0:07:59 of our childhoods getting sort of finally delivered.
0:08:02 The opening sequence, he started with like a sci-fi,
0:08:04 I think it was a Blade Runner visual and he was like,
0:08:06 we all love sci-fi and I want to be wearing that jacket
0:08:07 that he’s wearing in the picture,
0:08:09 but we don’t want any of the other dystopian stuff.
0:08:13 And so that definitely stuck out to me is that he did not start
0:08:14 the way he usually does.
0:08:16 It’s often a technical first sort of story,
0:08:20 but he started with here’s a vision for where I think the world should go.
0:08:22 So it was much more Disney-esque in that and it was quite poetic.
0:08:25 I think they literally did it on the Warner Brothers Studio lot.
0:08:28 And so they like recreated a bunch of cities
0:08:31 and I think they had on site at the event the robovans
0:08:34 taking people around from these simulated cities.
0:08:37 There was a sort of theatricality to it all that stuck out to me,
0:08:38 which I thought was quite different.
0:08:40 And I thought it was refreshing
0:08:43 because the core problem with this branch of AI,
0:08:46 which is largely deep learning based and bitter lesson based,
0:08:47 is that it’s an empirical field.
0:08:50 Unlike, call it Moore’s Law, which was predictive,
0:08:53 where you basically know if you double that number of transistors,
0:08:55 you get this much more performance on the chip.
0:08:56 And it’s just about pure execution.
0:08:58 AI is much more empirical.
0:09:01 You don’t really know when the model is going to get done training
0:09:03 and when it does get done training, whether it will converge or not.
0:09:06 Or even what does converging mean, like what is good enough?
0:09:10 We used to have the Turing test, which obviously we’ve blown past now.
0:09:13 It’s a feeling more than it is a set of discrete metrics
0:09:14 that you can really point to.
0:09:15 Right.
0:09:19 So it made a lot of sense to me that he’s trying to decouple
0:09:23 this idea of progress from a specific timeline.
0:09:23 I see.
0:09:25 Because I just think we’re setting ourselves up
0:09:28 for every time you ask a deep learning researcher.
0:09:30 So when’s that GPT-5 model going to show up?
0:09:32 It’s like the most frustrating question ever, right?
0:09:34 Because they don’t know.
0:09:34 We don’t know.
0:09:37 And frankly, sometimes they show up earlier than scheduled
0:09:38 and sometimes later.
0:09:40 And by the way, you can look at the stock market’s reaction.
0:09:43 It’s a prime example of how people have been so conditioned by,
0:09:47 I would say, the Steve Jobsian, Apple-like cadence year on year of like,
0:09:50 here’s your new iPhone, it’s incremental but predictable.
0:09:53 I think forecasting that the tech industry keeps trying to reward.
0:09:55 And I think what he’s doing is pretty refreshing, which is saying,
0:09:58 look, here’s a vision for where we want to go.
0:09:59 But it’s decoupled.
0:10:01 The second thing on the humanoid piece that I was quite impressed by
0:10:04 is actually the quality of the tele-operation.
0:10:06 So everybody’s talking about how, oh, this is fake.
0:10:07 This is all smoke and mirrors.
0:10:08 It’s just people.
0:10:10 Tele-operation is real.
0:10:12 I was going to say, why is no one talking about that?
0:10:13 Have you ever tried?
0:10:14 I mean, I’ve tried.
0:10:15 It’s so hard.
0:10:19 We were at a company two weeks ago and they’ve got these tele-op robots.
0:10:25 And the founder was demoing a mechanical arm that he was tele-operating with a gamepad.
0:10:27 And he was folding clothes with it.
0:10:28 And I was like, oh, that looks simple.
0:10:30 He’s like, here, try it.
0:10:32 It was one of the hardest manipulation things I’ve ever tried.
0:10:37 And by the way, we tried that with a VR headset, with six-DoF motion controllers,
0:10:39 almost harder to do.
0:10:42 Tele-operating something, especially over the internet,
0:10:45 in a smooth fashion with precision is incredibly hard.
0:10:49 And I don’t think people appreciate the degree to which they’ve really solved that pipeline.
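For a feel of what that pipeline involves, here is a minimal teleop loop sketch, using hypothetical stand-in functions rather than any real robot API: even before you deal with network latency and dropped packets, you have to smooth a jittery operator input and rate-limit the commanded motion.

```python
# Minimal teleop loop sketch: smooth a noisy operator input and rate-limit
# the command sent to a (hypothetical) remote arm. Real systems also have to
# handle network latency, dropped packets, and force feedback.
import time
import random

ALPHA = 0.2          # low-pass filter weight on new readings
MAX_STEP = 0.05      # max radians the joint may move per control tick

def read_gamepad_axis() -> float:
    """Stand-in for a real gamepad read; returns a noisy value in [-1, 1]."""
    return max(-1.0, min(1.0, 0.3 + random.gauss(0, 0.1)))

def send_joint_command(angle: float) -> None:
    """Stand-in for sending a command to the remote arm over the network."""
    print(f"commanded joint angle: {angle:.3f} rad")

filtered, joint_angle = 0.0, 0.0
for _ in range(10):                                     # ~10 control ticks
    raw = read_gamepad_axis()
    filtered = ALPHA * raw + (1 - ALPHA) * filtered     # smooth out jitter
    target = filtered * 1.57                            # map [-1, 1] to +/- 90 degrees
    step = max(-MAX_STEP, min(MAX_STEP, target - joint_angle))
    joint_angle += step                                 # rate-limit the motion
    send_joint_command(joint_angle)
    time.sleep(0.02)                                    # ~50 Hz loop
```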
0:10:50 Yeah, I was actually really impressed by that.
0:10:56 And I think that there’s huge opportunity for tele-op in production applications
0:10:59 that will have massive economic benefit,
0:11:03 even before we have true robots running around managing themselves.
0:11:07 Because if you think about there’s all these really hard and really dangerous
0:11:10 or hard to get to jobs, or there’s labor differentials
0:11:14 where it’s a lot harder to hire people to do certain things in certain locations.
0:11:19 And if we can imagine a future where the tele-op that we saw last week at the event
0:11:22 is something that’s widely available, that’s incredible.
0:11:25 Imagine not having to go and service a power line,
0:11:29 but you can actually tele-op a robot to do that for you,
0:11:31 but still have the level of human training and precision needed
0:11:35 to make a really detailed and specific evaluation.
0:11:38 The promise of that is really cool, even before we get to robots.
0:11:40 So that was really exciting.
0:11:42 Yeah, it’s like a stop along this journey.
0:11:46 And so if we talk about that journey, the arc of hardware and software coming together
0:11:49 in maybe a different way than we’ve seen in the past.
0:11:52 Just as an example, so Mark famously said, software is eating the world.
0:11:54 That was back in 2011, and here we are in 2024.
0:11:58 And it does feel like the last decade has been a lot of traditional software,
0:12:02 not so much integrating with the physical world around us.
0:12:05 And so where would you place us in that trajectory?
0:12:07 Because we’re seeing it with autonomous vehicles,
0:12:10 but I get the sense that’s not the only place where this is happening.
0:12:15 Yeah, this is where I spend 95% of my time in all of these industries
0:12:19 that are just starting to see the glimmers of what autonomy
0:12:22 and sort of software-driven hardware can bring.
0:12:25 What’s really interesting is just actually a dearth of skills
0:12:29 of people who know how to deal with hardware and software together.
0:12:31 You have a lot of people that went and got computer science degrees
0:12:34 over the last decade, and relatively speaking,
0:12:38 a lot fewer who went and got electrical engineering or mechanical engineering degrees.
0:12:39 And we’re starting to see the rise of, oh, shoot,
0:12:43 we actually need people who understand not just how the software works
0:12:49 in the cloud with Wi-Fi, where you have unlimited access to compute,
0:12:51 and you can retry things as many times as you want,
0:12:54 and you can ship code releases all day every day.
0:12:57 But you actually have kind of a hardware deployment,
0:12:59 where you have limited compute in an environment
0:13:02 where you maybe can’t rely on Wi-Fi all the time,
0:13:04 where you have to tie your software timelines
0:13:06 to your hardware production timelines.
0:13:10 Like, these are a really difficult set of challenges to solve.
0:13:13 And right now, there just isn’t a lot of standardized tooling
0:13:15 for developers and how to do that.
0:13:19 So it’s interesting, we’re starting to see portfolio companies of ours
0:13:22 across really different industries that are trying to use autonomy,
0:13:27 whether it’s oil and gas or water treatment or HVAC or defense.
0:13:31 They’re like sharing random libraries that they wrote
0:13:33 to connect to, like, particular sensor types.
0:13:36 Because there’s not this, like, rich ecosystem of tooling
0:13:38 that exists for the software world.
0:13:42 So we’re really excited about what we’re starting to see emerge in the space.
0:13:44 Even Elon said when he’s talking about these two different products
0:13:47 that he’s unveiling, right, Optimus,
0:13:50 and then you have the RoboVans or CyberCabs.
0:13:53 And those seem like two completely different things,
0:13:55 but he even said in the announcement,
0:13:56 he said, “Everything we’ve developed for our cars,
0:13:59 the batteries, power electronics, advanced motors,
0:14:03 gearboxes, AI inference computer, it all applies to both.”
0:14:04 Right, so you’re seeing this overlap.
0:14:05 That’s super exciting.
0:14:06 When I was watching it, I was just nerding out
0:14:09 because my last company was a computer vision 3D mapping
0:14:10 and localization company.
0:14:14 So I unfortunately spent too much of my life calibrating LiDAR sensors
0:14:15 to our computer vision sensors.
0:14:18 Because our whole thesis when I started back in 2017
0:14:21 was that you could do really precise positioning
0:14:22 just off of computer vision.
0:14:25 And that you didn’t need fancy hardware like LiDARs or depth sensors.
0:14:27 And to be honest, not a lot of people thought that we could pull it off.
0:14:30 And frankly, I think there were moments when I doubted that too.
0:14:33 And so it was just really fantastic to see that his bet
0:14:36 and the company’s bet on computer visions
0:14:38 and a bunch of these sensor fusion techniques
0:14:41 that would not need specialized hardware
0:14:43 would ultimately be able to solve
0:14:45 a lot of the hard navigation problems,
0:14:48 which basically means that the way they’re solving precision
0:14:51 is, instead of throwing more sensors on the car,
0:14:53 to basically throw more data at the problem.
0:14:56 And so in that sense, data is absolutely eating the world.
0:14:59 And you asked, where on the trajectory are we of software eating the world?
0:15:01 And I think we’re definitely on an exponential
0:15:03 that has felt like a series of stacked sigmoids.
0:15:05 Often it feels like you’re on a plateau.
0:15:08 But a series of plateaus totally make up an exponential
0:15:09 if you zoom out enough.
0:15:11 And earlier in the conversation we talked about the bitter lesson,
0:15:13 a number of other teams in the autonomy space
0:15:15 decided to tackle it as a hardware problem,
0:15:16 not a software problem, right?
0:15:17 Where they said, well…
0:15:22 More LiDAR, more expensive LiDAR, more GPUs, more GPUs.
0:15:25 And Elon’s like, you know, actually I want cheap cars
0:15:27 that just have computer vision sensors.
0:15:30 And what I’m going to do is take a bunch of the custom,
0:15:34 really expensive sensors that many other companies put on the car
0:15:35 at inference time,
0:15:37 and just use them at train time.
0:15:40 So Tesla does have a bunch of like really custom hardware
0:15:43 that’s not scalable, that drives around the world
0:15:45 in their parking lots and simulation environments and so on.
0:15:49 And then they distill the models they train on that custom hardware
0:15:50 to a test time package.
0:15:53 And then they send that test time package to their retail cars,
0:15:55 which just have computer vision sensors.
0:15:57 And the reality is that’s a raw arbitrage, right?
0:15:58 Between sensor stacks.
0:16:02 And it allows the hardware out in the world to be super cheap.
0:16:04 The result there is software is eating the sensor stack
0:16:08 out in the world that makes the cost of these cars so much cheaper
0:16:11 that you can have a $30,000 fully autonomous car
0:16:14 versus $100,000-plus cars
0:16:17 that are fully loaded with these LiDAR sensors and so on.
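What is being described is, broadly, a form of cross-modal distillation: a teacher model that sees the expensive train-time sensors supervises a camera-only student that ships to the fleet. A minimal sketch of that idea, illustrative rather than Tesla’s actual pipeline:

```python
# Sketch of cross-modal distillation: a teacher trained with camera + LiDAR
# features supervises a camera-only student that runs on cheap retail hardware.
import torch
import torch.nn as nn

class Teacher(nn.Module):
    """Sees both modalities at train time (expensive sensor rig)."""
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(64 + 32, 2)     # -> [steering, braking]

    def forward(self, cam_feat, lidar_feat):
        return self.head(torch.cat([cam_feat, lidar_feat], dim=-1))

class Student(nn.Module):
    """Camera-only; this is what ships to the retail fleet."""
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(64, 2)

    def forward(self, cam_feat):
        return self.head(cam_feat)

teacher, student = Teacher(), Student()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for _ in range(100):                          # toy training loop on fake features
    cam = torch.randn(8, 64)
    lidar = torch.randn(8, 32)
    with torch.no_grad():
        target = teacher(cam, lidar)          # teacher output acts as the label
    loss = nn.functional.mse_loss(student(cam), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```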
0:16:19 But I think in order to have the intuition
0:16:20 that you can even do that,
0:16:23 you really actually have to understand hardware.
0:16:26 If you just understand software and hardware
0:16:30 is like a sort of a scary monster that lives over here
0:16:32 and maybe you have a special hardware team that does it,
0:16:35 it’s going to be hard for you to have the confidence to say,
0:16:37 “No, we can do it this way.”
0:16:38 I think you’re totally right,
0:16:41 which is that the superpower that Tesla has
0:16:44 is his ability to go full stack, right?
0:16:46 Because a lot of other industries
0:16:49 often segment out software versus hardware like you’re saying.
0:16:51 And that means that people working on algorithms
0:16:55 and the autonomy part just treat hardware as like an abstraction, right?
0:16:57 You throw over a spec, it’s an API,
0:16:59 it’s an interface that I program against
0:17:00 and I have no idea what’s going on.
0:17:02 You don’t have to worry about the details, it doesn’t matter.
0:17:03 Which by the way is super powerful.
0:17:06 It’s unlocked this whole general purpose wave
0:17:09 of models like ChatGPT and so on, right?
0:17:11 Because it allows people who specialize in software
0:17:13 to not have to think about the hardware.
0:17:16 It’s also what’s driven sort of the software renaissance
0:17:17 of the last 15 years.
0:17:18 Absolutely, decoupling, right?
0:17:20 Composition and abstraction is sort of the fundamental basis
0:17:22 of the entire computing revolution.
0:17:24 But I think when you’re like him
0:17:26 and you’re trying to bring a new device to market,
0:17:28 kind of like what Jobs did with the iPhone,
0:17:29 by going full stack,
0:17:32 you end up unlocking massive efficiencies of cost.
0:17:35 And I think what may have been lost
0:17:38 in the sort of theatricality of it all at this event
0:17:40 is the fact that he’s able to deliver
0:17:43 an autonomous device to retail consumers
0:17:45 at a cost profile through vertical integration
0:17:47 that would just not be possible
0:17:48 if it was just a software team
0:17:50 buying hardware from somebody else and building on top.
0:17:52 Can we talk about those economics, by the way?
0:17:54 Just attacking that head on.
0:17:56 Both Optimus and the Cybercab were quoted
0:17:59 as being in the under-$30K range.
0:18:00 Is that really realistic?
0:18:02 And then tied into what you were saying,
0:18:04 we see other autonomous vehicles
0:18:07 which are betting more on the lidar and the sensors,
0:18:11 which also have come down in price pretty substantially.
0:18:14 My guess is Elon is backing into the cost
0:18:16 based on what people are willing to pay.
0:18:18 And he will do whatever it takes
0:18:19 to get those costs to line up.
0:18:21 I mean, it’s the same thing he did with SpaceX.
0:18:24 He will operate within whatever cost constraints
0:18:25 he needs to operate within,
0:18:26 even if the rest of the market
0:18:29 or the research community is telling him
0:18:30 it’s not possible.
0:18:33 Obviously, like a 30K humanoid robot
0:18:37 is way less than what most production
0:18:39 industrial robotic arms cost today,
0:18:41 which I think are more in the 100K range
0:18:43 for the ones that are used in like the high end factory.
0:18:47 So if he can get it down to 30K, that’s really exciting.
0:18:48 I also don’t necessarily think
0:18:50 you need even a 30K humanoid robot
0:18:53 to accomplish a wide swath of the automation tasks
0:18:55 that would pretty radically transform
0:18:57 the way our economy functions today.
0:18:58 Yeah, I think Erin’s right
0:19:00 in that there’s probably a top-down directive
0:19:01 to just do whatever it takes
0:19:02 to get into the cost footprint of–
0:19:04 This car has to cost 30K.
0:19:04 Right.
0:19:06 But I think if you do a bottoms-up analysis,
0:19:07 I don’t think you end up too far
0:19:09 because actually if you just break down
0:19:11 the kind of BOM (bill of materials) on a Tesla Model 3,
0:19:13 you’re not dramatically far off
0:19:14 from the sensor stack you need
0:19:17 to get to a $30,000 car, right?
0:19:19 This is the beauty of solving
0:19:20 your hardware problems with software
0:19:24 is you don’t need a $4,000 scanning LiDAR
0:19:25 on the car.
0:19:28 So I think on the CyberCab,
0:19:29 I feel much more confident
0:19:32 that the cost footprint is going to fall in that range
0:19:33 because it’s frankly–
0:19:35 We kind of have at least an ancestor on the streets, right?
0:19:38 The thing that gets prices up is custom sensors
0:19:40 because it’s really expensive
0:19:43 to build custom sensors in short production runs.
0:19:45 And so you either have scale of manufacturing
0:19:47 like an Apple and you make a new CMOS sensor
0:19:49 or a new Face ID sensor
0:19:51 and you get cost economies of scale
0:19:53 because you’re shipping more like 30 million devices
0:19:54 in your first run.
0:19:56 Or you just lean on commodity sensors
0:19:57 from the past era
0:19:59 and then you tackle most of your problems in software,
0:20:01 which is what he’s doing.
0:20:03 And to that point, when he’s betting on software,
0:20:05 another interesting thing that he announced
0:20:07 was really over-specking these cars
0:20:09 to almost change the economics potentially
0:20:11 based on the fact that those cars
0:20:13 could be used for distributed computing.
0:20:13 To your point, Ange,
0:20:17 if you put a bunch of really expensive sensors on the car,
0:20:20 you can’t really distribute the load of that
0:20:23 in any other way than driving the car, right?
0:20:25 But if you actually have this computing layer
0:20:27 that’s again, in his case,
0:20:29 he’s saying he’s planning to over-spec,
0:20:30 that actually can fundamentally change
0:20:32 like what this asset is.
0:20:34 And you kind of saw the same thing
0:20:35 even with Tesla’s today
0:20:37 where he’s talking about this distributed grid, right?
0:20:39 Where all of a sudden these large batteries
0:20:41 are being used not just for the individual asset.
0:20:43 So do you have any thoughts on that idea
0:20:44 or if we’ve seen that elsewhere?
0:20:46 He was a bit skimpy on details on that.
0:20:47 Of course.
0:20:50 But I think he did say that the AI5 chip is over-spec’d.
0:20:54 It’s probably going to be four to five times more powerful
0:20:56 than the HW4, which is their current chip.
0:20:58 It’s going to draw four times more power,
0:21:01 which probably puts it in that like 800 watts or so range,
0:21:02 which for context,
0:21:05 your average hair dryer is at about 1800 watts.
0:21:07 I mean, it’s hard to run power on the edge.
0:21:11 But I think what he said was something to the effect of like,
0:21:14 your car’s not working 24 hours a day.
0:21:17 So if you’re driving, call it eight hours a day in LA traffic.
0:21:19 God bless whoever’s having to do that.
0:21:20 For real.
0:21:21 Hopefully they’re using self-driving.
0:21:22 One would hope.
0:21:24 Actually, he opened up his pitch with a story
0:21:25 about driving to El Segundo
0:21:27 and he’s saying you can fall asleep
0:21:28 and wake up on the other side.
0:21:30 But I think the t-shirt size he gave
0:21:37 was about 100 gigawatts of unallocated inference compute
0:21:38 just sitting out there in the wild.
0:21:41 And I think his shorthand for it was like the AWS
0:21:42 of AI, right?
0:21:45 He has got this idea of this distributed swarm
0:21:48 of unutilized inference computers.
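Taking the figures mentioned in this conversation at face value, roughly 800 watts of inference draw per next-generation in-car computer and a ballpark of 100 gigawatts of idle fleet compute, a rough back-of-envelope looks like this (both inputs are loose estimates):

```python
# Back-of-envelope using figures from the conversation; both are rough guesses.
watts_per_car = 800                # approximate AI5-class inference power draw
fleet_inference_watts = 100e9      # "about 100 gigawatts" of idle fleet compute
idle_hours_per_day = 16            # if a car drives ~8 hours, ~16 hours are idle

cars_needed = fleet_inference_watts / watts_per_car
idle_energy_wh_per_day = fleet_inference_watts * idle_hours_per_day

print(f"implied fleet size: ~{cars_needed / 1e6:.0f} million vehicles")
print(f"idle inference energy per day: ~{idle_energy_wh_per_day / 1e12:.1f} TWh")
```

That implied fleet size is far larger than the number of Teslas on the road today, so the 100-gigawatt figure reads as a forward-looking one rather than a description of the current fleet.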
0:21:50 And it’s a very sexy vision.
0:21:51 I really want to believe in it.
0:21:53 Ground us on, is this realistic?
0:21:55 Well, I think it’s realistic for workloads
0:21:58 that we don’t know yet in the following sense, right?
0:22:00 That the magic of AWS is that it’s centralized
0:22:03 and it abstracts away a lot of the complexity
0:22:05 of hardware footprints for developers.
0:22:08 And by centralizing almost all their data centers
0:22:09 in single locations with very cheap power
0:22:12 and electricity and cooling,
0:22:14 these clouds are able to pass on very cheap inference costs
0:22:15 to the developer.
0:22:16 Now, what he’s got to figure out is
0:22:19 how do you compensate for that in a decentralized fashion?
0:22:20 And I think we have kind of prototypes of this today.
0:22:23 Like, they’re these vast decentralized clouds.
0:22:24 I think one is literally called vast
0:22:26 of people’s unallocated gaming rigs.
0:22:31 People have millions of Nvidia 4090 gaming cards
0:22:32 sitting on their desks that aren’t used.
0:22:34 And historically, those have not yet turned
0:22:37 into great businesses or high-utilized networks
0:22:39 because developers are super sensitive
0:22:42 to two things: cost and reliability.
0:22:44 And by centralizing things,
0:22:47 AWS is able to ensure very high uptime and reliability,
0:22:50 whereas somebody’s GPU sitting on their…
0:22:52 Maybe available, maybe they’re driving to El Segundo right now.
0:22:53 Right, they’re right.
0:22:54 And there are just certain things,
0:22:56 especially with AI models that are hard to do
0:22:58 on highly distributed compute
0:23:00 where you actually need good interconnect
0:23:03 and you need things to be reasonably close to each other.
0:23:03 Maybe in his vision,
0:23:06 there’s a world where you have Optimus robots
0:23:10 in every home and somehow your home Optimus robot
0:23:12 can take advantage of additional compute
0:23:14 or additional inference with your Tesla car
0:23:16 that’s sitting outside in your driveway.
0:23:17 Who knows?
0:23:18 Right.
0:23:20 Okay, well, this event clearly was focused
0:23:24 on different models that are consumer-facing.
0:23:25 So again, CyberCab,
0:23:27 that’s for someone using an autonomous vehicle.
0:23:30 Optimus is a humanoid robot, probably in your home.
0:23:33 But, Erin, you’ve actually been looking
0:23:36 at the hardware software intersection
0:23:37 in a bunch of other spaces, right?
0:23:39 And as you alluded to earlier,
0:23:41 maybe different applications
0:23:43 with better economics at least today.
0:23:45 I think long-term,
0:23:48 there’s no market bigger than the consumer market.
0:23:51 So everyone having a robot in their home
0:23:54 and a Tesla car in their driveway
0:23:57 that’s also a robot taxi has huge economic value.
0:23:59 But that’s also a really long-term vision
0:24:02 and there’s just so much happening in autonomy
0:24:05 that’s taking advantage of the momentum
0:24:07 and the developments that companies like Tesla
0:24:09 have put forward into the world over the last decade
0:24:11 that actually have the potential
0:24:15 to have meaningful impact on our economy in the short term.
0:24:17 I think the biggest broad categories for me
0:24:21 are largely the sort of dirty and unsexy industries
0:24:24 that have very high cost of human labor,
0:24:28 often because of safety or location access,
0:24:32 whether that’s an oil rig out in the middle of Oklahoma
0:24:35 somewhere that’s a three-hour drive from Oklahoma City,
0:24:38 whether that’s a mine somewhere in rural Wyoming
0:24:40 that freezes over for six months out of the year
0:24:43 so humans can’t live there and mine,
0:24:46 whether that’s a battlefield where you’re starting to see
0:24:49 autonomous vehicles go out and clear bombs and mines
0:24:51 from battlefields to protect human life.
0:24:54 There’s so many different use cases
0:24:56 for a lot of this underlying technology
0:24:58 that are really starting to see the light of day.
0:25:00 So very excited about that.
0:25:02 And as we think about that opportunity,
0:25:05 you’ve also talked about this software-driven autonomy stack.
0:25:07 So as you think about the stack, what are the layers?
0:25:09 Can you just break that down?
0:25:09 Yeah, sure.
0:25:12 So across whether it’s a self-driving car
0:25:14 or sort of an autonomous control system,
0:25:19 we’re seeing the stack break down into pretty similar categories.
0:25:20 So first is perception.
0:25:23 You have to see the world around you,
0:25:25 know what’s going on, be able to see if there’s a trash can,
0:25:28 be able to understand if there’s a horizon, if you’re a boat.
0:25:30 The second is something Anj knows really well,
0:25:32 which is localization and mapping.
0:25:33 So, okay, what do I see?
0:25:36 How do I find out where I am within that world
0:25:39 based on what I can see and what other sensors I can detect,
0:25:42 whether it’s GPS, which often isn’t available
0:25:46 in battlefields or in warehouses, et cetera.
0:25:47 The third is planning and coordination.
0:25:52 So that’s, okay, how do I take a large task
0:25:55 and turn it into a series of smaller tasks?
0:25:58 So what is more of an instant reaction?
0:26:02 I don’t have to really think about how to take a drink of water,
0:26:07 but I might have to think about how to make a glass of lemonade from scratch.
0:26:12 So how do I think about compute across those different types of regimes
0:26:13 when something is more of an instinct,
0:26:16 versus when something has to be sort of taken down
0:26:19 and processed into discrete operations?
0:26:20 And then the last one is control.
0:26:23 So that’s like, how does my brain talk to my hand?
0:26:26 Like, how do I know what are the nerve endings doing
0:26:29 in order to pick up this water bottle and take a drink out of it?
0:26:31 And that’s a really interesting kind of field
0:26:34 that’s existed for decades and decades,
0:26:36 but for the first time, probably since the ’70s,
0:26:38 we’re starting to see really interesting stuff happen
0:26:41 in the space of controls around autonomy and robotics.
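As a sketch of how those four layers typically fit together in code, using generic interfaces rather than any particular company’s stack:

```python
# Generic autonomy stack skeleton mirroring the four layers described above:
# perception -> localization/mapping -> planning -> control.
from dataclasses import dataclass

@dataclass
class WorldModel:
    obstacles: list              # what perception saw
    pose: tuple | None = None    # where we are, filled in by localization

class Perception:
    def observe(self, sensor_frame) -> WorldModel:
        # e.g. detect trash cans, horizon lines, other agents
        return WorldModel(obstacles=["trash_can"])

class Localization:
    def locate(self, world: WorldModel) -> WorldModel:
        # fuse vision, IMU, maybe GPS (often unavailable indoors or on battlefields)
        world.pose = (12.0, 4.5, 0.3)
        return world

class Planner:
    def plan(self, world: WorldModel, goal: str) -> list[str]:
        # break a large task into a series of smaller steps
        return ["approach", "grasp", "lift"] if goal == "pick_up" else ["wait"]

class Controller:
    def execute(self, step: str) -> None:
        # translate each step into low-level actuator commands
        print(f"executing: {step}")

# One tick of the loop:
world = Localization().locate(Perception().observe(sensor_frame=None))
for step in Planner().plan(world, goal="pick_up"):
    Controller().execute(step)
```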
0:26:45 And I would say like, all of these are pre-existing areas.
0:26:49 None of this is wildly new, but I think in the last two years,
0:26:51 especially with everything that’s happening
0:26:53 with deep learning video language models,
0:26:57 broadly speaking, AI has pushed every single one of these
0:27:00 kind of to its limit and to a new state of the art.
0:27:03 And there just aren’t tools that exist to tie all that together.
0:27:05 So every single robotics company,
0:27:06 every single autonomous vehicle company
0:27:10 is basically like rebuilding this entire stack from scratch,
0:27:13 which we see as investors as a really interesting opportunity
0:27:15 as the ecosystem evolves.
0:27:17 And as you think about that ecosystem,
0:27:19 people kind of say that as soon as you touch hardware,
0:27:21 you’re literally working on hard mode
0:27:23 compared to just a software-based business.
0:27:25 So what are the unique challenges?
0:27:29 Even with maybe that AI wave today that’s pushing things ahead,
0:27:32 how would you break down what becomes so much harder?
0:27:34 I think Ange touched on this a little bit before,
0:27:37 but the more you can commoditize the hardware stack, the better.
0:27:39 So the most successful hardware companies
0:27:42 are the ones that aren’t necessarily inventing a brand new sensor,
0:27:45 but are just taking stuff off the shelf and putting it together.
0:27:47 But still like tying everything together is really hard.
0:27:51 Like when you think about releasing a phone, for example,
0:27:53 Apple has a pretty fast shipping cadence,
0:27:57 and they’re still releasing a new phone only every once a year.
0:28:02 So you have to essentially tie a lot of your software timelines
0:28:05 to hardware timelines in a way that doesn’t exist
0:28:08 when you can just sort of ship whenever you want in a cloud.
0:28:09 If you need a new sensor type,
0:28:12 or you need a different kind of compute construct,
0:28:14 or you need something fundamentally different in the hardware,
0:28:16 you’re bound by those timelines.
0:28:19 You’re bound by your manufacturer’s availability.
0:28:24 You’re bound by how long it takes to quality engineer and test a product.
0:28:26 You’re bound by supply chains.
0:28:29 You’re bound by figuring out how these things have to integrate together.
0:28:34 So the cycles are often just quite a lot slower.
0:28:37 And then the other thing is when you’re interacting with the physical world,
0:28:41 you get into use cases that touch safety in a really different way
0:28:44 than we think about with pure software alone.
0:28:49 And so you have to design things for a level of hardiness and reliability
0:28:52 that you don’t always have to think about with software by itself.
0:28:55 If your ChatGPT is a little slow, it’s fine.
0:28:56 You can just try again.
0:28:57 But if you have an autonomous vehicle
0:29:02 that’s like driving a tank on a battlefield autonomously,
0:29:05 and something doesn’t work, you’re kind of screwed.
0:29:09 So you have to have a much higher level of rigor and testing and safety
0:29:13 built into your products, which slows down the software cycles.
0:29:18 The Holy Grail is sort of general purpose intelligence for robotics,
0:29:22 which we still don’t have. When you train a general model,
0:29:26 you basically get the ability to build hardware systems
0:29:28 that don’t have to be particularly customized.
0:29:31 And that reduces hardware iteration cycles dramatically.
0:29:33 Because you can basically say, look, roughly speaking,
0:29:36 these are the four or five commodity sensors you need.
0:29:38 The smarter your brain, so to speak,
0:29:41 the less specialized your appendages have to be.
0:29:43 And I think what a number of really talented teams
0:29:44 are trying to solve today is,
0:29:47 can you get models to generalize across embodiments?
0:29:50 Can you train a model that can work seamlessly
0:29:53 on a humanoid form factor or a mechanical arm,
0:29:55 a quadruped, whatever it might be?
0:29:57 And I’m quite bullish that it will happen.
0:29:59 I think the primary challenge there
0:30:00 that teams are struggling with today
0:30:03 is the lack of really high quality data.
0:30:06 The big unknown is just how much data,
0:30:08 both in quantity and quality,
0:30:10 do you really need to get models to be able to reason
0:30:13 about the physical world spatially
0:30:16 in a way that abstracts across any hardware?
0:30:18 I’m completely convinced that once we unlock that,
0:30:20 the applications are absolutely enormous.
0:30:23 Because it frees up hardware teams,
0:30:23 like Erin was saying,
0:30:26 from having to couple their software cycles
0:30:27 to hardware cycles.
0:30:29 It decouples those two things.
0:30:30 And I think that’s the holy grail.
0:30:34 I think what Tesla, the victory of the autonomy team over there,
0:30:37 is having realized eight years ago
0:30:40 the efficacy of what we call early fusion foundation models,
0:30:42 which is the idea that you take a bunch of sensors
0:30:46 at training time and different inputs of vision, depth.
0:30:48 You take in video, audio.
0:30:51 You take a bunch of different six-DoF sensors
0:30:52 and you tokenize all of those
0:30:55 and you fuse them all at the point of training.
0:30:57 And you build an internal representation
0:30:59 of the world for that model.
0:31:02 In contrast, the LLM world does what’s called late fusion.
0:31:04 You often start with a language model.
0:31:05 Let’s train just on language data
0:31:07 and then you duct tape on these other modalities,
0:31:09 like image and video and so on.
0:31:11 And I think the world has now started to realize
0:31:13 that early fusion is the way forward,
0:31:16 but of course they have an eight-year head start.
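To make the early-versus-late fusion distinction concrete, here is a toy sketch with made-up dimensions, not any production model: each sensor stream is tokenized into a shared embedding space and a single transformer attends across all of them from the start of training, rather than bolting extra modalities onto a model trained on one.

```python
# Toy sketch of early fusion: tokenize each sensor stream, concatenate the
# tokens, and train a single transformer over the fused sequence.
import torch
import torch.nn as nn

D = 64  # shared token dimension

class EarlyFusionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.video_proj = nn.Linear(512, D)   # video patch features -> tokens
        self.depth_proj = nn.Linear(128, D)   # depth features -> tokens
        self.imu_proj = nn.Linear(6, D)       # six-DoF IMU readings -> tokens
        layer = nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(D, 7)    # e.g. joint targets / controls

    def forward(self, video, depth, imu):
        tokens = torch.cat([
            self.video_proj(video),           # (B, Tv, D)
            self.depth_proj(depth),           # (B, Td, D)
            self.imu_proj(imu),               # (B, Ti, D)
        ], dim=1)                             # fuse *before* any modeling
        fused = self.encoder(tokens)
        return self.action_head(fused.mean(dim=1))

model = EarlyFusionModel()
out = model(torch.randn(2, 16, 512), torch.randn(2, 8, 128), torch.randn(2, 32, 6))
print(out.shape)  # torch.Size([2, 7])
```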
0:31:17 And so I get really excited when I see teams
0:31:20 either tackling the data sort of challenge
0:31:22 for general spatial reasoning,
0:31:24 or teams that are taking these early fusion
0:31:26 foundation model approaches to reasoning
0:31:29 that then allow the most talented hardware teams
0:31:31 to focus really on what they know best.
0:31:34 Where are these companies getting training data?
0:31:36 Because you mentioned Tesla, for example.
0:31:38 Yes, we’ve had cars on the road,
0:31:40 tons of them with these cameras and sensors.
0:31:44 I still think that one of the smartest things Elon did
0:31:47 was turn on full self-driving for everybody
0:31:50 for like a month-long trial period last summer.
0:31:53 I have a Tesla and I turned it on for my free month.
0:31:56 And it was like a life-changing experience using it.
0:31:58 And I obviously couldn’t get rid of it.
0:32:00 And so now, not only do I now pay for full self-driving,
0:32:01 but I also–
0:32:02 You’re feeding the pipeline.
0:32:04 I’m giving him all my data.
0:32:06 So to me, that’s really clever.
0:32:08 And so I’m curious if you talk about
0:32:10 some of these other applications.
0:32:13 Do they have the number of devices,
0:32:16 or in this case, cars for Tesla capturing this data?
0:32:19 Or how else are we going to get this spatial data?
0:32:22 This is the big race in robotics right now.
0:32:24 I think there are several different approaches.
0:32:27 Some people are trying to use video data for training.
0:32:30 Some people are investing a lot in simulation
0:32:33 and creating digital 3D world.
0:32:37 And then there’s a mad rush for every kind of generated data
0:32:38 that you could possibly have.
0:32:41 So whether that’s robotic tele-operated data,
0:32:44 whether that’s robotic arms and offices,
0:32:48 most of these robotics companies have pretty big outposts
0:32:50 where they’re collecting data internally.
0:32:52 They’re giving humanoids to their friends
0:32:53 to have in their homes.
0:32:55 It’s a yes and scenario right now
0:32:58 where everyone is just trying to get their hands on data
0:32:59 literally however they can.
0:33:00 I think it’s the Wild West.
0:33:03 But if you’re Tesla, then the secret weapon you have
0:33:05 is you’ve got your own factories, right?
0:33:08 So the Optimus team has a bunch of humanoids
0:33:09 walking around the factories,
0:33:11 constantly learning about the factory environment.
0:33:13 And that gives them this incredible self-fulfilling
0:33:14 sort of compounding loop.
0:33:16 And then of course he’s got the Tesla fleet,
0:33:19 like Erin was saying earlier with FSD.
0:33:21 I’m proud to have been a month one subscriber for it.
0:33:24 And I’m happy that I’m contributing to that training cycle
0:33:26 because it makes my Model X smarter next time around.
0:33:29 So the challenge then is for companies that don’t have
0:33:32 their own full stack, sort of fully integrated environment,
0:33:34 right, where they don’t have deployments out in the field.
0:33:37 And to Erin’s point, you can either take the simulation route
0:33:38 for that and say we’re going to create
0:33:40 these sort of synthetic pipelines.
0:33:43 Or we’re seeing this huge build out of teleop fleets.
0:33:44 Like with language models,
0:33:46 you had people all around the world in countries
0:33:47 showing up and labeling data.
0:33:50 You have teleop fleets of people
0:33:52 piloting mechanical arms halfway around the world.
0:33:56 I think there’s an interesting sort of third new category
0:33:57 of efforts we’re tracking,
0:33:59 which is crowdsourced coalitions, right?
0:34:02 So an example of this is the DeepMind team
0:34:04 put out a robotics dataset called RT-X
0:34:06 maybe a year and a half ago,
0:34:08 where they partnered with a bunch of academic labs
0:34:09 and said, hey, you send us your data.
0:34:11 We’ve got compute and researchers.
0:34:13 We’ll train the model on your data and then send it back to you.
0:34:15 And what’s happening is there’s just different labs
0:34:18 around the world who have different robots of different kinds.
0:34:20 Some are arms, some are quadrupeds, some are bipeds.
0:34:21 And so instead of needing all of those
0:34:23 to be centralized in one place,
0:34:25 there’s a decoupling happening where some people are saying,
0:34:28 well, we’ll specialize in providing the compute
0:34:30 and the research talent, and then you guys bring the data.
0:34:32 And then it’s a give-to-get model, right?
0:34:34 Which we saw in some cases with the internet early on.
0:34:35 And Nvidia is an example of this,
0:34:39 where instead of their research team stacking a bunch of robots in-house,
0:34:41 they’re partnering with people like pharma labs
0:34:44 who have arms doing the pipetting and wet lab experiments
0:34:45 and saying, you send us the data.
0:34:46 We’ve got a bunch of GPUs.
0:34:47 We’ve got some talented deep learning folks.
0:34:49 We’ll train the model, send it back to you.
0:34:51 And I think it’s an interesting experiment.
0:34:53 And there’s reason to believe this sort of give-to-get model
0:34:56 might end up actually having the highest diversity of data.
0:34:59 But we’re definitely in full experimentation land right now.
0:35:01 Yeah, and my guess is we’ll need all of it.
0:35:03 So it sounds like data is a big gap
0:35:05 and it sounds like some builders are working on that.
0:35:08 But where would you guys like to see more builders focused
0:35:11 in this hardware software arena,
0:35:14 especially because I do think there are some consumer-facing areas
0:35:15 where people are drawn to.
0:35:16 They see an event like this and they’re like,
0:35:18 oh, I want to work on that.
0:35:20 Yeah, I’m pretty excited about the long tail
0:35:23 of really unsexy industries
0:35:26 that have outsized impact on our GDP
0:35:28 and are often really critical industries
0:35:31 where people haven’t really been building for a while,
0:35:36 things like energy, manufacturing, supply chain, defense.
0:35:38 These industries that really carry the U.S. economy
0:35:41 and where we have under-invested from a technology perspective,
0:35:43 probably in the last several decades,
0:35:45 are poised to be pretty transformed
0:35:48 by this sort of hardware software melding in autonomy.
0:35:49 I’d love to see more people there.
0:35:52 I’m very excited for all the applications
0:35:53 they’re in and talked about.
0:35:54 And I think to unlock those,
0:35:57 we really need a way to solve this data bottleneck, right?
0:36:00 So startups, builders who are figuring out really novel ways
0:36:02 to collect that data in the world,
0:36:04 get it to researchers, make sense of it, curate it.
0:36:06 I think that’s sort of a fundamental limit
0:36:08 around progress across all of these industries.
0:36:09 We just need to sort of 10x
0:36:11 the rate of experimentation in that space.
0:36:16 All right, that is all for today.
0:36:19 If you did make it this far, first of all, thank you.
0:36:21 We put a lot of thought into each of these episodes,
0:36:23 whether it’s guests, the calendar Tetris,
0:36:25 the cycles with our amazing editor, Tommy,
0:36:27 until the music is just right.
0:36:29 So if you like what we put together,
0:36:33 consider dropping us a line at ratethispodcast.com/a16z.
0:36:36 And let us know what your favorite episode is.
0:36:38 It’ll make my day, and I’m sure Tommy’s too.
0:36:40 We’ll catch you on the flip side.
0:36:43 (gentle music)

What does Rich Sutton’s “Bitter Lesson” reveal about the decisions Tesla is making in its pursuit of autonomy?

In this episode, we dive into Tesla’s recent “We, Robot” event, where they unveiled bold plans for the unsupervised full-self-driving Cybercab, Robovan, and Optimus—their humanoid robot, which Elon Musk predicts could become “the biggest product ever.”

Joined by a16z partners Anjney Midha and Erin Price-Wright, we explore how these announcements reflect the evolving intersection of hardware and software. We’ll unpack the layers of the autonomy stack, the sources of data powering it, and the challenges involved in making these technologies a reality.

Anjney, with his experience in computer vision and multiplayer tech at Ubiquity6, and Erin, an AI expert focused on the physical world, share their unique perspectives on how these advancements could extend far beyond the consumer market.

For more insights, check out Erin’s articles linked below. 

 

Resources: 

Find Anj on Twitter: https://x.com/anjneymidha

Find Erin on Twitter: https://x.com/espricewright

Read Erin’s article ‘A Software-Driven Autonomy Stack Is Taking Shape’: https://a16z.com/a-software-driven-autonomy-stack-is-taking-shape/

AI for the Physical World: https://a16z.com/ai-for-the-physical-world/

 

Stay Updated: 

Let us know what you think: https://ratethispodcast.com/a16z

Find a16z on Twitter: https://twitter.com/a16z

Find a16z on LinkedIn: https://www.linkedin.com/company/a16z

Subscribe on your favorite podcast app: https://a16z.simplecast.com/

Follow our host: https://twitter.com/stephsmithio

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
