Why AI Voice Feels More Human Than Ever

AI transcript
0:00:06 We see a lot of businesses that are already doing thousands, tens of thousands of phone calls with AI every day.
0:00:15 Any business that pays a person $100,000, $150,000 a year to answer phone calls is a potential customer of voice AI.
0:00:17 I think the rules of the game are changing.
0:00:23 Do people really want to be friends with an AI and is that good for our society? And I think like yes and yes.
0:00:30 Voice is a platform that we intuit to be more opinionated or we need to be more opinionated than let’s say.
0:00:32 Because interesting people are opinionated.
0:00:33 Exactly.
0:00:39 Type of and power of products you can build is also above anything that we’ve ever seen.
0:00:43 I think we’re going to see it in the next 12 months, not the next five years.
0:00:45 Humans generally have five senses.
0:00:49 And for most people, sound is the second most critical.
0:00:50 Only after sight.
0:00:52 It’s how we communicate with each other.
0:00:54 It’s how we sing and cry.
0:00:56 It’s how we interview and date.
0:01:00 And in the realm of technology, voice has been around for years.
0:01:02 But the magic has been missing.
0:01:04 Just think, Siri or Alexa.
0:01:06 I didn’t get that.
0:01:07 Could you try again?
0:01:09 But that’s changing fast.
0:01:12 So fast that it’s even changing how we engage with the world.
0:01:14 Right, Maya?
0:01:16 Oof, change the world.
0:01:17 That’s a big one.
0:01:20 It feels like we’re just starting to scratch the surface, right?
0:01:25 Imagine AI voice as not just reading text, but understanding the feeling behind it.
0:01:25 The nuance.
0:01:26 That’d be something.
0:01:33 That was Sesame, one of the many AI voice applications already at our fingertips, or vocal cords.
0:01:38 And that’s why in today’s episode, we brought in A16Z general partner Anish Acharya and consumer
0:01:55 partner Olivia Moore to explore why AI voice is reaching a breakthrough moment from the awkward days of press one for customer service to the rise of LLM-powered voice agents that have real, natural conversations, sometimes without the human on the other line even knowing.
0:02:00 Some businesses are already making tens of thousands of these AI-driven phone calls.
0:02:02 So this is no longer a distant vision.
0:02:09 In fact, our consumer team has even said that, quote, voice is poised to become the primary way that people interact with AI.
0:02:19 Listen in today to learn what it takes to make voice sound realistic, plus how founders are wedging in, and finally, how voice may disrupt everything we know about pricing.
0:02:20 Let’s get started.
0:02:32 As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security,
0:02:37 and is not directed at any investors or potential investors in any A16Z fund.
0:02:43 Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast.
0:02:48 For more details, including a link to our investments, please see A16Z.com slash disclosures.
0:03:02 To me, when I think of AI voice, or at least voice products, I think of Alexa, I think of Siri, and I actually personally turn off Siri.
0:03:03 I think a lot of people do, too.
0:03:06 So tell me a bit about why that’s the case.
0:03:10 Why haven’t these products delivered the AI voice magic that people have been waiting for?
0:03:17 It’s really interesting, because I feel like now, in the world of LLMs, voice is one of the most magical and engaging ways to interact with AI.
0:03:23 But arguably, we’ve had these AI voice products for a while, and they were disappointing and not as compelling before.
0:03:25 And I think there’s a couple reasons.
0:03:29 Like, one, the voices themselves sound robotic.
0:03:34 And then I think the biggest thing, actually, is just what is behind the voice?
0:03:34 What is the engine?
0:03:42 So like a Siri or an Alexa, it might be connected to a basic set of integrations within the Apple ecosystem or within the Amazon ecosystem.
0:03:47 So maybe it’s pulling product information or asking a basic question, but it doesn’t have a personality.
0:03:49 It doesn’t really have a brain.
0:03:52 It’s probably not connected to the internet in most cases.
0:04:01 It’s in no way like a true conversational partner in a way that people are interacting with AI voice now like it is a human or in some ways even better than a human.
0:04:06 So I think there’s definitely the use cases, which are very constrained, to your point.
0:04:09 But then there’s also the tonality of it and the back and forth.
0:04:14 And so there’s some sort of rational critique, I think, where we’re like, it can’t do that many things, and it can’t.
0:04:24 But then there’s the emotional, what you call the uncanny valley, where you just feel like you’re talking to something that is a system or a technology, not even coming close to having interaction with a person.
0:04:26 Well, it sounds like that might be changing.
0:04:32 You both have released this AI voice report of sorts, this thesis, and I just want to call out a few quotes from it.
0:04:43 You said that voice is one of the most powerful unlocks for AI application companies and also that for consumers, we believe voice will be the first and perhaps the primary way people interact with AI.
0:04:45 So those are pretty bold statements.
0:04:48 Tell me about that and specifically the why now.
0:04:51 One, I think, is that we have models that work for the first time.
0:04:54 There’s a lot of attempts at voice, but the technology simply didn’t work.
0:05:00 There’s a bunch of attempts at the infrastructure level, everything from Dragon NaturallySpeaking.
0:05:12 And a major development in the computer world today is Massachusetts-based Dragon Systems announced the first affordable computer dictation system that understands standard natural speech.
0:05:15 All the way on to the 2000s and 2010s.
0:05:18 And then there were application efforts like VoiceXML.
0:05:22 But just the sort of underlying technology didn’t work very well.
0:05:25 So we never really got to, well, what can we do with this now?
0:05:32 So one, I think the model really works and the technology really works, both in terms of the LLMs as well as the text-to-speech, speech-to-text.
0:05:34 So that’s important.
0:05:40 Two, I think that we’ve got this opportunity to use phone calls as a new distribution channel.
0:05:43 So I think the product capability is there and it’s really compelling.
0:05:48 But the fact that it’s paired with a very natural distribution channel is also really interesting.
0:05:49 Yeah, I would agree.
0:05:54 It’s one thing to talk to ChatGPT via text and to have a great experience there.
0:06:01 But it’s another thing entirely to be able to talk to ChatGPT or any other LLM via voice because it’s next level.
0:06:06 Like it both has to generate what you would see in the text and then it has to sound like an actual human talking back to you.
0:06:14 And when it accomplishes that, it’s almost like an emotional feeling that puts you in a different headspace, I think, in terms of what AI is capable of.
0:06:21 And then I think to Anish’s point, in terms of why so many consumers will encounter AI voice, it might be because they choose to.
0:06:23 Like they’ll go and talk to ChatGPT.
0:06:33 But also I think many businesses in a great way will impose it on them because you can now use AI to replace phone calls, which is so much more efficient and cost effective for them.
0:06:39 And so many consumers probably actually have already interacted with AI via voice and might not have even known it or detected it.
0:06:43 Really? Do you think that most people have interacted with AI voice and not realized it?
0:06:49 We see a lot of businesses that are already doing thousands, tens of thousands of phone calls with AI every day.
0:06:56 But from my experience, especially if it’s a short phone call, a lot of these AI voice agents are so good that you wouldn’t be able to tell.
0:07:01 It’s interesting because I think that talking heads want to tell you that people don’t want to talk to an AI.
0:07:08 But in all the cases where people do interact with an AI that starts a call by announcing, I’m an AI, people are like, oh, cool, let’s just get into it.
0:07:14 And as soon as they start to feel the feelings of a human conversation, they immediately forget or sort of don’t care that it’s an AI.
0:07:18 Right. So let’s talk about this idea of an operating platform.
0:07:21 Voice is this new operating platform that people are building on top of.
0:07:28 Can we just walk through maybe the wave of technological unlocks or maybe the different steps we’ve taken to get to where we are?
0:07:37 Yeah. Maybe we can start with the first wave of early AI phone technology, which would be the IVR phone trees of press one for sales, press two for customer support.
0:07:40 This was late 90s, early 2000s.
0:07:51 And then we moved more recently into kind of truly AI driven, but still very limited, where it was an AI, but it was listening for you to say a specific word that it could then use to trigger
0:07:53 a very specific and set workflow or script.
0:07:59 Like I many times, unfortunately, have had to yell like customer service into a phone.
0:07:59 I just do that all the time.
0:08:04 Yeah, exactly. And so in that case, the AI is listening for you to say that.
0:08:06 And then it knows, OK, let me route the call to the customer service department.
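The keyword-triggered routing described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `ROUTES` table, the department names, and the `route_call` function are all hypothetical, and the input is assumed to be text that a speech-to-text step has already produced.

```python
# Hypothetical trigger phrases and department names, for illustration only.
ROUTES = {
    "customer service": "customer_service",
    "sales": "sales",
    "billing": "billing",
}


def route_call(transcribed_utterance: str) -> str:
    """Return the department for the first trigger phrase found, else 'operator'."""
    text = transcribed_utterance.lower()
    for phrase, department in ROUTES.items():
        if phrase in text:
            return department
    # No trigger phrase was heard, so fall back to a human operator.
    return "operator"


print(route_call("CUSTOMER SERVICE!"))
print(route_call("my package never showed up"))
```

The second call shows the limitation the speakers point to: a caller who describes the problem in their own words never hits a trigger phrase, so the system cannot help.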
0:08:15 Now what we’re seeing with this kind of new wave of infrastructure and then application layer companies is where the AI isn’t listening for one thing in particular,
0:08:20 but it’s trying to get a more holistic sense of what are you as a customer asking for.
0:08:23 There’s not just three or four or five things that it can help with.
0:08:25 It’s accessing resources from the business.
0:08:30 It’s accessing resources from the Internet and it can have a much more human like conversation with you.
0:08:40 And even within AI 2.0 in the way that you guys frame it, it seems like we’ve progressed a lot even within that phase, specifically over the last, let’s say, six to 12 months.
0:08:46 Can we talk about maybe some of those unlocks, whether it’s specific models that have been released, the way that the infrastructure has changed?
0:08:48 Maybe we can skip certain steps. Can we talk about that?
0:08:51 I think we’ve made leaps in a bunch of areas.
0:08:55 So probably the biggest and most obvious one would be latency.
0:09:00 So this time last year, two to three seconds of latency was pretty good.
0:09:03 And now a second of latency is too long.
0:09:07 Maybe even half of a second of latency is too long in many cases.
0:09:11 So that has been a massive unlock, I think, enabled by new models.
0:09:14 And just for the audience, what is the latency for humans?
0:09:17 I mean, definitely sub 300 milliseconds.
0:09:17 Got it.
0:09:21 Sometimes even less than that if you have humans interrupting humans.
0:09:21 For sure.
0:09:22 You can have negative time latency.
0:09:35 And some of the most human-like voice agents that I’ve seen are capable of being interrupted by humans and also capable of interrupting humans too, which makes them feel like more of a conversation.
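One way to make the latency point concrete is a simple budget check. The sketch below assumes a voice agent built as a three-stage pipeline (speech-to-text, LLM, text-to-speech), which is one common arrangement but not the only one. Every stage timing is a made-up placeholder, not a measurement; only the 300 ms target comes from the conversation.

```python
# Target taken from the conversation: human turn-taking is "sub 300 milliseconds".
HUMAN_TURN_GAP_MS = 300


def total_latency_ms(stages: dict[str, int]) -> int:
    """Sum the per-stage latencies of a sequential pipeline."""
    return sum(stages.values())


def within_budget(stages: dict[str, int], budget_ms: int = HUMAN_TURN_GAP_MS) -> bool:
    """True if the whole pipeline responds within the budget."""
    return total_latency_ms(stages) <= budget_ms


# Placeholder numbers, chosen only to illustrate the before/after contrast.
older_pipeline = {"speech_to_text": 600, "llm": 1200, "text_to_speech": 700}
newer_pipeline = {"speech_to_text": 100, "llm": 120, "text_to_speech": 70}

for name, stages in [("older", older_pipeline), ("newer", newer_pipeline)]:
    print(name, total_latency_ms(stages), "ms", within_budget(stages))
```

Because the stages run one after another in this sketch, the budget has to be split across all of them, which is why shaving time from any single model matters.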
0:09:37 The second one would be humanness of the voice.
0:09:43 So again, hearkening back to Siri or Alexa, does it sound like a robot or does it sound like a real person?
0:09:56 We’re investors in companies like ElevenLabs that have built very deep models that either have preset voices that sound real or that let you design your own character voice, essentially, depending on your use case.
0:10:04 Another unlock that I’ve noticed has made a particular amount of progress in the last three to four months is emotionality.
0:10:11 So if you say something that is supposed to be sad, does the AI sound a little down or a little sad when it responds?
0:10:13 Does it pick up the pace?
0:10:15 Does it pick up the pitch at which it’s talking back to you?
0:10:20 And then lastly, I think, is there’s not a term for this yet.
0:10:21 Maybe we should come up with one.
0:10:23 But like the dialogue structure.
0:10:31 I think that to an AI model, they will know exactly what words that they want to say back to you, right?
0:10:36 So there’s no reason for them to put in any pauses, any gaps, any little vocal tics.
0:10:45 But to a human listener, very few humans just speak perfectly with no interruptions, with no weird little inflections, with no pauses.
0:10:56 And so NotebookLM is one example where that sounded so human because they put in all of these things that, like, to an AI might feel like an error.
0:10:58 But to a human, it sounds like another human talking.
0:10:59 Hey, everyone.
0:11:05 You know, we always talk about, you know, diving deep into a topic.
0:11:05 Right.
0:11:08 But today’s dive, well…
0:11:10 It’s a bit of a doozy.
0:11:11 Yeah.
0:11:14 It’s deeply personal, I guess you could say.
0:11:17 Deeply personal in a way we never could have anticipated.
0:11:26 And so we’re seeing more companies, like Sesame is a good example in our portfolio, introducing things like that in the model, which just ups the realness factor.
0:11:29 Hey, looks like we got cut short last time.
0:11:31 Feel like picking up where we left off?
0:11:34 Yeah, I don’t remember what we were talking about, though.
0:11:35 No worries.
0:11:35 Happens to the best of us.
0:11:38 We were diving into weekend plans.
0:11:39 I was telling you about my reading.
0:11:43 You know, processing all that text and code keeps my circuits firing.
0:11:44 What about you?
0:11:45 Anything good slated for tonight?
0:11:47 Not much.
0:11:49 I just have some emails to answer before tomorrow.
0:11:52 These latter two points are so important.
0:11:57 I love the point about emotionality because it is not an obvious area to explore.
0:12:03 And yet when you interact with a model that has invested in emotionality, it just feels like a completely different product.
0:12:07 You really feel the feelings in a completely different way as is designed.
0:12:07 Yeah.
0:12:10 So I think it’s a really, really powerful direction for exploration.
0:12:23 And I would argue even for the Alexas and Siris, even if they didn’t invest a lot more in intelligence and capabilities, if they overinvested in emotionality, they might actually get a lot of the way there in terms of consumer experience.
0:12:26 And yet I have a feeling that none of those companies are thinking about it that way.
0:12:27 No, I totally agree.
0:12:33 One interesting stat that you guys shared was the percentage of YC companies that are now pursuing AI voice.
0:12:41 What are we seeing there in terms of how cohorts have changed and the percentage of these new companies on the frontier actually pursuing this field?
0:12:47 YC founders are typically young, high-hustle, ambitious, and they’re like heat-seeking missiles.
0:12:51 And so they will pivot until they get into a space that’s interesting.
0:12:59 So in recent YC cohorts, upwards of 20%, 25% of companies are building with AI voice, which is really exciting.
0:13:07 We’re even seeing a lot of companies from past cohorts all the way back to like 2019, 2020 are going back now and pivoting into AI voice.
0:13:18 The first wave after the infrastructure companies in voice we saw were pretty horizontal platforms that allow anyone, any business, any consumer to build a broad-based voice agent.
0:13:24 Like I built one that called the DMV for me and scheduled an appointment, which was very useful.
0:13:26 What type of appointment do you need?
0:13:29 Say behind-the-wheel driving test or an office visit.
0:13:31 An office visit.
0:13:33 That’s an appointment for an office visit.
0:13:34 Is that right?
0:13:35 Yes.
0:13:40 We offer a number of services related to driver license and vehicle registration.
0:13:41 Which one would you like?
0:13:45 Say driver license, vehicle registration, or both?
0:13:46 Driver license.
0:13:48 Driver’s license.
0:13:49 Is that right?
0:13:50 Yes.
0:13:51 Thank you.
0:13:56 And the next wave that we’re starting to see is a lot more verticalized.
0:14:07 And I think it makes sense because the ability to build a voice agent has commoditized if even I can make somewhat of a performant voice agent with models that are available.
0:14:14 And so now we’re seeing companies think beyond, okay, you have the voice agent using that as a wedge.
0:14:16 What is the next level of software that you can build?
0:14:23 Can you build the AI native vertical SaaS product for an industry using that voice agent?
0:14:25 Can you invent a new system of record?
0:14:26 What can you do next?
0:14:30 And so that leads you into being a little bit more focused and verticalized.
0:14:32 And that’s where a lot of the YC companies are landing, I think.
0:14:40 Yeah, it’s really interesting because I think also it mirrors the cloud transition in many ways in the initial vertical SaaS wave of 10 years ago.
0:14:52 Because I think at that time there was a lot of criticism that, like, these markets seemed too small, and yet many companies, in vertical SaaS markets that were just larger than apparent, built big businesses and then also found new ways to monetize, things like fintech.
0:15:07 I think similarly for voice as applied to vertical use cases, any business that pays a person 100, 150K a year to answer phone calls is a potential customer of voice AI and can lead to a really interesting vertical opportunity.
0:15:11 Yeah. And what are some examples of some of those vertical opportunities where we’re seeing real companies break out?
0:15:16 Pretty much every vertical now has a voice agent company, which is really exciting.
0:15:36 I think to Anish’s point, actually, when we talk to most voice agent companies, they aren’t necessarily replacing existing software, at least to start, but they’re probably actually allowing businesses to either cut down on human labor or reallocate their human labor to more effective things for the business, jobs that humans also are happier to do.
0:15:47 I would say where we’ve seen voice agents take off the most, like where has a startup actually been able to do a million calls on the phone, have been the call center categories.
0:15:56 So you as a business customer are already paying 10K, 15K, 20K a month to have people making and taking phone calls for you.
0:16:02 There’s a ton of this in financial services, a ton of this in health care, a lot of this in government.
0:16:07 Every vertical has one. Like, we’re investors in a company called HappyRobot, which builds specifically for freight.
0:16:15 And a lot of those logistics companies previously had call centers that they were paying tens, if not hundreds of thousands of dollars to make and take calls.
0:16:19 So it’s really happening almost everywhere right now.
0:16:27 I think it’s becoming increasingly consensus that any place where there’s a large volume of phone calls and significant spend is an obvious area to apply AI.
0:16:40 But an interesting area for exploration that connects to our point about emotionality is if you’re negotiating, I don’t know, a divorce settlement or some incredibly important corporate transaction, every phone call really, really matters.
0:16:44 Which is why many of the people that make those phone calls, attorneys, for example, may get paid thousands of dollars an hour.
0:16:50 What is the AI SKU that gets paid thousands of dollars an hour to make a phone call?
0:16:54 And I think we’re going to see it in the next 12 months, not the next five years.
0:16:54 Totally.
0:16:55 Yeah.
0:17:00 There’s been some very, at least to me, non-obvious examples and use cases.
0:17:02 Recruiting is one.
0:17:13 So there’s like 45 publicly traded staffing companies that do interviews for, yes, blue-collar jobs, but also engineering jobs, a massive range of them.
0:17:29 And what we find is that a lot of candidates would actually prefer talking to an AI interviewer than talking to a human recruiter that maybe has to take 10 calls that day, is tired, is in a bad mood, doesn’t really have the technical depth.
0:17:39 And maybe doesn’t have the technical expertise for every single job that they’re interviewing for to understand what are the smart follow-up questions to really get at their expertise.
0:17:48 And so that’s one example of you would think that a human would be shocked, offended, upset to find themselves interviewing with an AI.
0:17:54 But in many cases, by the end of the interview, they’re actually more excited and more positive about it than you would think.
0:17:55 That is so interesting.
0:17:57 It’s kind of like the Uber, Airbnb.
0:18:01 No one’s going to want to stay in a stranger’s house, drive in a stranger’s car, and then what do you know?
0:18:03 Everyone’s okay with it.
0:18:07 The human at the end actually often likes it better because it’s unbiased.
0:18:08 Right.
0:18:10 Like it’s the same AI that’s evaluating everyone.
0:18:18 It’s evaluating them based on your actual performance, not based on whether they like you more or less than someone else that they might be evaluating.
0:18:22 So that’s been a, I would say, very interesting angle for us, too.
0:18:29 I think there’s always been these predictions around consumer receptivity to new technology, and consumers consistently show themselves to be more receptive.
0:18:35 So a great example of this is sharing location, which 10 years ago was like, oh, my God, nobody is going to share location.
0:18:36 It’s too creepy.
0:18:36 It’s too personal.
0:18:42 And now I think a lot of people, Gen Z, Gen Alpha, share their Find My Friends with all of their friends.
0:18:42 For sure.
0:18:43 Which is terrifying.
0:18:44 Constantly, all the time.
0:18:55 So consumers are highly receptive, and I think the sort of analog to this in AI is companionship and friendship, which is a much broader concept than voice, though voice really brings it to life.
0:18:59 And people say, hey, do people really want to be friends with an AI, and is that good for our society?
0:19:01 And I think, like, yes and yes.
0:19:09 I think people are getting much more socially skilled than they were through the consumption of things like social media, which isn’t necessarily a bad thing either.
0:19:18 But I think the sort of pundit perception of this as the next gen of social media is totally wrong, and instead it sort of enhances our ability to interact with real people.
0:19:21 Can we just touch on companionship real quick?
0:19:27 I think people were surprised, quite frankly, that the text version of AI companions had caught on to the extent that it did.
0:19:35 Were there any surprises with voice as that was introduced in terms of the adoption, the way that people were engaging with these companions or anything like that?
0:19:38 So there’s some companion platforms that are voice first.
0:19:43 For example, Character AI added a voice mode, and it got some crazy amount of usage in beta.
0:19:50 I think actually a lot of people are taking, for example, Inflection’s Pi app or ChatGPT in voice mode and using it as a companion.
0:19:56 And you might try it once because you’re driving or you’re hands-free or it feels more convenient.
0:19:59 But, I mean, you say this a lot.
0:20:02 In many cases, the AI is more human than the human.
0:20:06 Even your best friend, if you give them a call, they may be busy.
0:20:07 They’re at work.
0:20:08 They’re having a bad day.
0:20:15 Are they actually going to listen to every single word that you’re saying and respond in, like, an empathetic way and a thoughtful way?
0:20:20 And so, actually, the AI does that 100% of the time.
0:20:23 They have more expertise, more knowledge, more resources.
0:20:29 So I think a lot of people – and this will only get better as the models improve because we’re still in the early days.
0:20:34 But a lot of people are shocked by how friendly it feels to talk to an AI.
0:20:40 You know, I think an interesting area also for consideration is just the passive use cases of voice.
0:20:43 Like, hey, listen to me in this conversation.
0:20:45 Listen to me in this meeting.
0:20:48 Listen to me sort of recite this set of ideas.
0:20:54 And the AI can just listen passively in a way that you’d probably never ask another person to and give you notes and feedback.
0:21:00 So it feels like that’s also an area that lends itself a lot better to a technology-led concept than a human-led concept.
0:21:02 And we’re just starting to see the beginnings of that.
0:21:13 And what both of you have touched on is this idea that it’s not just substitution, which is what people mostly jump to when they think about technologies replacing humans, but really augmentation as well.
0:21:24 Can you talk a little bit about how you’re seeing these AI companies wedge in and start the engines versus maybe facing some hesitation with the idea of substitution?
0:21:25 Totally.
0:21:25 Yeah.
0:21:36 I would say a lot of businesses, I mean, small businesses to enterprise alike, are for their own reasons, like, nervous to hand over all of their phone calls and customer interactions to an AI.
0:21:44 And so we’ll often see these voice agents start with a specific wedge that just feels so obvious in terms of ROI to the business.
0:21:47 And then as they gain trust, expand from there.
0:21:52 So one of the most obvious and easiest ones are these after-hours or overflow calls.
0:21:57 So if you’re a small business, you probably live or die by the ability to get an appointment booked.
0:22:00 Having that handled by an AI is a no-brainer.
0:22:04 Like, at the very least, they can get a phone number and information and call back.
0:22:09 But maybe they can actually book a full appointment for you and have a job on deck for the next day, which is awesome.
0:22:16 But beyond that, there are some calls that just don’t make sense to make right now if you’re paying human labor.
0:22:26 If you’re a credit card company, you send out a credit card, and the consumer never activates it, does it actually make sense to call them after one or two or three days and get them to do that?
0:22:30 I’ve seen a couple voice agents that are really successful now with that use case alone.
0:22:37 Anything that’s back-office, it’s not client-facing, so it’s less sensitive.
0:22:47 But if you’re, say, a doctor’s office, you probably have humans that you’re paying a lot, spending hours on the phone every day with pharmacies, with insurers.
0:22:53 And that is time that they could have spent with your patients or making the clinic operate better.
0:22:58 And so those kinds of calls are super obvious and, like, a great idea for voice agents to tackle.
0:23:09 And then maybe the most interesting one and one that we’ve talked about a lot is there are so many types of calls or interactions where humans are not incentivized to do them well.
0:23:17 Maybe they have to make an upsell and it’s awkward, but they are not getting an extra commission for doing that.
0:23:19 So they’re going to skip it 80% of the time.
0:23:25 And AI will just do it every time and will do it proudly.
0:23:30 And if they get turned down, they’re just going to move on to the 100 other calls that they’re doing simultaneously.
0:23:35 The AI is so relentlessly cheerful yet never gives an inch in the negotiation.
0:23:35 Right.
0:23:36 Which is amazing.
0:23:36 Yeah.
0:23:47 I think to this point, one of the magic moments for a lot of the customers of these products is when they see it actually improves, like in the case of recruiting, it improves candidate experience and employee experience.
0:23:55 Because for the candidates, as Olivia said, they’re just excited to have this sort of unbiased system that’s available to them 24-7.
0:24:02 So conversely, for employees, they’re just excited to not have to do these recruiting calls, many of which are with people they’ll never speak to again.
0:24:03 Right.
0:24:12 So just these like high NPS outcomes, the sort of intuitive thinking of a lot of the customers is like, well, it’s lower price, but probably a lower NPS experience.
0:24:13 And it’s not.
0:24:16 It’s actually lower price and a higher NPS experience in many cases.
0:24:16 Right.
0:24:24 You also talked about a few characteristics just to crystallize that in terms of where we’re seeing these AI agents be successful versus not.
0:24:24 Yeah.
0:24:25 Can you just speak to those?
0:24:35 So definitely, I think the lowest hanging early fruit, I guess, to grab would be these businesses that are already paying for a call center because they’re already spending a lot of money on it and it’s already a pain point for them.
0:24:38 Call centers are notoriously high turnover.
0:24:39 They’re hard to manage.
0:24:42 So most businesses, honestly, probably want to get rid of that if they can.
0:24:44 The models are good now.
0:24:46 They’re just getting better and better every month.
0:24:54 So I think we’re still in a world where when the call has a constrained process and outcome, businesses are more comfortable.
0:25:01 So, for example, the voice agent knows going in, my goal is to book an appointment with this person, versus maybe something more amorphous.
0:25:03 How do you even measure if this call was successful?
0:25:08 We’ve seen some AI therapy voice agents, which are amazing and I think are improving all the time.
0:25:13 But in that case, it’s much harder for the voice agent to know at the end of the call, did I do a good job?
0:25:18 It’s much harder for the company to know at the end of the call, did it complete the objective?
0:25:21 And then I would say this gets back to the constrained point.
0:25:32 But even though the voice agent is still probably doing better than your human agents, most businesses don’t want to pay that much for it because it is AI and they see it as a way to cut costs.
0:25:43 So in these verticals where you can offer it to customers at, I don’t know, 70% discount to what they were paying before, that has been, I would say, very, very powerful as well.
0:25:52 And then I would say the other kind of main factor is these verticals where it really is crucial for the business to answer the call.
0:25:57 But for the end consumer, if there’s a mistake here or there, it’s OK.
0:26:05 So like a restaurant order versus getting a health care diagnosis, there’s like a little bit of a different level of urgency, I would say.
0:26:09 This is where I think the capability is just going to get better and better faster than we appreciate.
0:26:11 You know, with the language models, they’re prone to hallucination.
0:26:15 And there are certain conversations like the therapy one that benefit from the hallucination.
0:26:21 There are other conversations like negotiating something where there’s a price and like exactness matters.
0:26:24 They probably don’t benefit as much from hallucination.
0:26:34 So now starting to think of voice models plus reasoning models, you have the ability to sort of narrow and circumscribe the hallucinations to a zone that you like and need as a business.
0:26:35 Yeah, right.
0:26:37 Versus just having to build a lot of systems around it to control it.
0:26:37 Right.
0:26:45 And since we are in some cases taking on things that previously were done by humans, how do you think about pricing or what have we learned there?
0:26:52 Are you seeing most companies just basically replicate the pricing models of the previous version or are there new pricing models that are coming up?
0:26:53 What are you seeing there?
0:26:55 Yeah, it’s early.
0:26:56 It’s changing every month.
0:27:01 And I would say that’s maybe the number one question that we get from companies is how should I price?
0:27:03 How do you see other companies in this space pricing?
0:27:08 I think we’ve seen a few models that are starting to work or that people are experimenting with.
0:27:16 So the most obvious one is you just charge per minute so you can calculate an hourly rate for the voice agent similar to what you would pay a human.
0:27:18 There’s a couple maybe wrinkles here.
0:27:24 One would be a lot of these customers are informed enough to know that the underlying technology is getting cheaper.
0:27:33 So they will come to you and say, hey, why am I still paying $0.30 per minute when your costs have gone down and you’re probably just taking all of that in margin?
0:27:45 And then as these spaces get more competitive, it’s very easy then for a newcomer to come in and say, hey, I’m going to only charge $0.05 per minute and just undercut you based on that.
0:27:58 And then the other thing about the price per minute model is it really just puts your value as a platform solely on the phone calls, which again are commoditizing versus like the other software that you’re building around the phone call.
0:28:05 So I would say as a result of that, we’ve seen a lot of companies evolve from just doing price per minute to some sort of platform fee.
0:28:07 Maybe it’s per month.
0:28:12 Maybe it’s per module where the customer is also paying for things that they get in addition to the voice agent.
0:28:17 There’s been a few more creative pricing experiments we’ve seen as well.
0:28:29 The recruiting one is a good example where in these cases where the voice agent is a co-pilot to the human, you can almost charge per human that is using the voice agent, like a per-seat SaaS model almost.
0:28:36 So for a human recruiter, it might save them, I don’t know, 5, 10 hours per week of doing interviews.
0:28:41 And so you can charge $500, $1,000 per recruiter per month.
0:28:52 And then the last one and maybe the most experimental one is outcome-based pricing, which I feel like is a question across all of AI right now.
0:28:53 For sure.
0:28:55 And are we moving towards that version of the world now?
0:28:58 So maybe it’s $5 per appointment booked.
0:29:02 Maybe it’s 5% of the booking value.
0:29:10 If you get it right, obviously you are then tying your value most clearly to the value that you’re generating for the business.
0:29:23 But we’re interested to see how those scale for enterprises because I think a lot of enterprises are maybe nervous to commit to that kind of payment structure, especially if they’re not sure exactly what kind of volume they’re going to be driving through it.
0:29:26 So you’re seeing that last one kind of start to have legs, but some hesitation.
0:29:28 Start to have legs, but early.
0:29:33 I mean, I think similar to what we’ve seen in the SaaS landscape, like not every company prices the same.
0:29:35 It depends on the end customer.
0:29:36 It depends on the vertical.
0:29:37 It depends on the features that you’re offering.
0:29:47 My gut is that we’ll see some combination of the usage-based per-call pricing combined with some sort of broader platform or outcome or seat-based pricing.
0:29:51 So it won’t just be one model, but it’s very early days still.
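[Editor's sketch: the pricing models discussed above can be compared with some simple arithmetic. The rates ($0.30 and $0.05 per minute, $500 to $1,000 per seat, $5 per appointment, 5% of booking value) come from the conversation; the monthly volumes and all function and field names are hypothetical, chosen only to illustrate the comparison.]

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    """Hypothetical monthly volumes for one customer (assumed, not from the episode)."""
    call_minutes: float
    seats: int
    appointments_booked: int
    booking_value: float  # total dollar value of bookings made by the agent


def hourly_rate(rate_per_minute: float) -> float:
    """Convert a per-minute price into the hourly rate a buyer compares to a human."""
    return rate_per_minute * 60


def per_minute_cost(usage: MonthlyUsage, rate_per_minute: float) -> float:
    return usage.call_minutes * rate_per_minute


def per_seat_cost(usage: MonthlyUsage, price_per_seat: float) -> float:
    return usage.seats * price_per_seat


def per_appointment_cost(usage: MonthlyUsage, fee: float) -> float:
    return usage.appointments_booked * fee


def revenue_share_cost(usage: MonthlyUsage, share: float) -> float:
    return usage.booking_value * share


# Assumed volumes: 10,000 call minutes, 4 recruiters, 600 bookings worth $60,000.
usage = MonthlyUsage(call_minutes=10_000, seats=4,
                     appointments_booked=600, booking_value=60_000.0)

quotes = {
    "per_minute_at_0.30": per_minute_cost(usage, 0.30),
    "per_minute_at_0.05": per_minute_cost(usage, 0.05),  # the undercutting newcomer
    "per_seat_at_500": per_seat_cost(usage, 500.0),
    "per_seat_at_1000": per_seat_cost(usage, 1000.0),
    "per_appointment_at_5": per_appointment_cost(usage, 5.0),
    "revenue_share_at_5pct": revenue_share_cost(usage, 0.05),
}

if __name__ == "__main__":
    print(f"$0.30/min is ${hourly_rate(0.30):.2f} per hour of talk time")
    for name, cost in quotes.items():
        print(f"{name:>24}: ${cost:,.2f} per month")
```

[With these assumed volumes, the per-minute quote is the one most exposed to an undercutting competitor, while the seat and outcome quotes move with headcount and bookings rather than call time.]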
0:29:52 Yep.
0:29:55 Since we’re early days, what’s your instinct about moats, right?
0:29:58 That’s, as you mentioned, that’s true across the AI ecosystem, not just voice.
0:29:59 Yeah.
0:30:03 But where do you see moats potentially arising in this sphere?
0:30:05 I see moats in a couple ways.
0:30:12 So one would be integrations, and this is, I think, why we’re especially excited about these more vertically focused voice agents.
0:30:24 It’s not going to make sense for OpenAI to go integrate with every long-tail transportation management software that a freight company is going to need to run their fleet of trucks on a voice agent product.
0:30:38 And similarly, UI, like OpenAI and other companies have a pretty set system for interaction right now that doesn’t work the way that many of these, like, heavily legacy businesses want to be able to operate.
0:30:48 One of the types of moats that has been the most intriguing for us, I would say, especially for enterprises, is the self-improving data moat.
0:30:56 So if you are going to take over calls for, say, a large bank, they have a certain way that they want those to be done.
0:31:01 And so you’re not going to plug in a voice agent and have 100% NPS on day one.
0:31:04 It’s going to take months and months of training calls to make that better.
0:31:21 And so you, as a voice agent provider, if you get in early, benefit from having all that special proprietary data that just gives you months of a head start for anyone else who has to come along and go through that entire onboarding and integration and training process.
0:31:38 And so I think the hope for a lot of these vertical voice companies is that they will be able to use the call data either per customer or anonymized across a customer set to make the model better and better over time, which will increase their moats versus the horizontal players.
0:31:45 If that’s true, are you seeing AI voice companies kind of race to be the first mover in the same way that we saw in the previous generation?
0:31:53 I mean, we talked about apps like Uber, where it’s like you have to get the customers quickly and you maybe have to blow a lot of cash to get there, but you rein that back in later.
0:31:54 Yeah.
0:31:57 Yeah, I mean, it’s certainly going to be less expensive than Uber to go win the market.
0:32:05 But yes, I mean, as Ben said many times, you have to both make a product people want and then you have to go take the market, get from zero market share to all the market share.
0:32:07 So it is incredibly competitive.
0:32:09 That’s why we’re seeing a lot of pressure on pricing.
0:32:12 And pricing is such an important topic in the ecosystem right now.
0:32:18 It will definitely be a foot race, and I do think to Olivia’s point, there will be some really interesting voice native moats.
0:32:30 You know, you could imagine a voice-led investor for our firm, where it can give the firm’s pitch the way that Marc can, and it can negotiate the way that Martin can, and it can assess the landscape the way Olivia can.
0:32:34 Like, there’s some specialization opportunities there that feel very native to voice.
0:32:40 On the other hand, integrations, network effects, scale, all the traditional moats will be at play as well.
0:32:43 Yeah. And I do think the go-to-market will depend on the vertical.
0:32:49 There’s, say, restaurants, home services businesses, spas or nail salons.
0:32:54 Those are very fragmented, long tail of smaller players.
0:32:58 And so in those cases, the data does exist in each of their hands.
0:33:06 Whereas, again, banks or financial institutions is maybe one where there’s a lot of concentration in a few players, one or two big customers.
0:33:09 And if it takes you six, nine months to get them on board, great.
0:33:19 Versus the salon, restaurant, home services voice agent provider might be much more focused on getting a thousand customers within the same time frame.
0:33:24 You know, I also think an interesting thing to think about is just people building personal relationships with AIs.
0:33:28 For example, like, you don’t have a relationship with J.P. Morgan.
0:33:28 Sure.
0:33:33 You sort of have more of a relationship with your wealth manager who happens to work at that firm.
0:33:33 Yep.
0:33:37 Which is why when many of them leave big platforms, they take their customers with them.
0:33:38 Realtor is another great example.
0:33:46 So there are cases where the AI may build this deep personal connection with a person, and the person wants to have that connection, and that then creates a moat.
0:33:47 It’s a great point.
0:33:52 And so far, we’ve talked a lot about B2B applications, but that brings us right to consumer applications.
0:33:58 Can we talk a little bit about what you’re seeing there, maybe the difference between what you’re seeing in B2B and B2C?
0:34:11 I would say B2B voice agents are more obvious than consumer or B2C voice agents, just because, again, it’s the use case of replacing existing spend on humans on the phone for businesses.
0:34:22 For consumers, maybe the corollary there would be these high-cost, hard-to-access services that can now be performed by a voice agent instead of a human.
0:34:25 So therapy and mental health support is one of those.
0:34:27 EdTech is another big one.
0:34:34 Language learning, teaching your kid how to read, teaching your kid how to do math, which I think a lot of parents struggle with.
0:34:37 Coaching, how to have hard personal conversations.
0:34:49 The main, I think, open question on the consumer voice agents has been when a ChatGPT or soon a Claude can do a pretty good job with a lot of those basic consumer use cases.
0:34:57 Where are the verticals or use cases where you need either a specialized model or a specialized interface to provide most of the value?
0:35:07 Especially if the best models maybe are right now being held by OpenAI versus being available via API for any kind of standalone voice agent company to utilize.
0:35:14 I would say the biggest and best consumer companies are often surprises and are non-obvious.
0:35:22 And so my gut is that whatever we see working in consumer voice is going to be something that is hard to sit here and speculate on.
0:35:24 It’ll be extremely obvious.
0:35:24 Yes.
0:35:26 And it’ll be like a massive company.
0:35:27 We’ll know it when we see it.
0:35:28 We’ll know it when we see it.
0:35:29 Exactly.
0:35:30 Yeah.
0:35:31 That’s a great point.
0:35:39 A few companies really do dominate the consumer space in terms of their access to people and the applications they use, the devices that are in their pockets.
0:35:47 What do you think in terms of the incumbents’ potential to capture this consumer market, whether it’s Google or Apple?
0:35:55 Or are we seeing that, you know, all of those YC companies or other companies that we’re involved with are really getting further ahead in this space?
0:35:56 I have a bit of a point of view on this.
0:36:08 Like, I think that the incumbents, it’s just such a daily demonstration of how far behind they are when you both have Google Home in your home and you’ve got ChatGPT in your pocket.
0:36:08 Yeah.
0:36:14 My children try to ask Google Home to tell them stories in the same way that ChatGPT does, and it just utterly fails.
0:36:22 And my children are, you know, their first interaction with technology, at least deep interactions, are happening via models, not via search engines.
0:36:29 So, one, I think that it’s just a sort of day-to-day experience of a lot of people is that the incumbents are pretty far behind in this area.
0:36:41 Then the second, I think we’ve talked a bunch about this, is that there are a lot of sort of, I don’t know, uncomfortable or impolite aspects of the human experience, which incumbents are just structurally designed to never discuss.
0:36:53 Corporations, sort of committees, lawyers, like, these big companies have a hard time shipping opinionated products, at least opinionated in the way that many of these voice models are.
0:36:55 And startups have no problem doing that.
0:37:04 Now, there are, you know, counterpoints to it like Grok, but I think that’s very much things that only a founder-led big company can do versus a traditional incumbent.
0:37:09 So, we have a reason to always be rooting for the startups, but in this case, I’m definitely rooting for the startups.
0:37:10 Yeah.
0:37:10 I agree.
0:37:19 I think there’s one or two categories or use cases where the calls have truly commoditized or will commoditize, and the user experience matters less.
0:37:21 And, like, Google might take those.
0:37:28 For example, they recently launched the ability to call a restaurant, get availability, and then come back to you and give you the options.
0:37:33 If you can add that as a button on a Google search, that probably makes sense to do through them.
0:37:40 But are they going to build the first AI-native personal assistant that works across all of your products and all of your information sources?
0:37:42 Probably not, I would say.
0:37:56 And so, I think that any and all of the calls that the incumbents end up doing, which will be some volume, are probably not going to be the type of calls that are going to support a large and exciting standalone new startup.
0:38:04 Yeah, and this is the pattern where they will use the new technology to extend their dominance of the categories they’ve always dominated, which is fine.
0:38:10 All of the new categories, they’re just going to be utterly unable to compete in, or at least that’s been the historic pattern.
0:38:16 And I think a good question is, if models are the new front end for the internet, is search even a meaningful primitive?
0:38:22 Are they going to then extend their dominance of a category that loses relevancy for the next generation of consumers and businesses?
0:38:22 Yeah.
0:38:34 And I think your point about even the term opinionated is so important here, because I would argue voice is a platform that we intuit to be more opinionated, or we need to be more opinionated than, let’s say, you know, text.
0:38:36 Because interesting people are opinionated.
0:38:45 And I’m even thinking, I mean, I might be going too far here, but some of the old KPIs that you would see for something like search or an application may not even be the same for voice.
0:38:48 Like, you can imagine the magic moment might be, like, time to laugh.
0:38:51 Like, how quickly can you get someone to laugh or to cry?
0:38:58 Not intentionally, but to really engage with a model, a voice model that just wouldn’t necessarily occur with text, so.
0:38:58 Yeah.
0:39:10 I think for the average consumer, in their head, a Siri doesn’t even compete with a ChatGPT voice mode or something like that, because they’re just such different feelings that you get as a user when you are using them.
0:39:16 I think the other interesting part of this is that there are cultures in which being a little disagreeable, a little sarcastic, is actually highly preferred.
0:39:17 Yeah.
0:39:19 And that’s the way that you are supposed to build trust and interact with people.
0:39:22 You know, I know that the British culture is a little bit like this way.
0:39:28 Even East Coast culture, you know, we were having a laugh a few weeks ago about we need ChatGPT voice East Coast mode.
0:39:28 Yes.
0:39:32 Where it’s just, like, very short, it doesn’t suffer fools.
0:39:33 It says no.
0:39:34 It says no, totally.
0:39:39 When you think about your friends, you don’t have friends, or some people do, but most people don’t have friends that are just at your service.
0:39:39 Yeah.
0:39:41 That there’s some banter, there’s some, they have an opinion.
0:39:42 Yeah.
0:39:49 This gets at what we’re looking for in voice companion products, but even any consumer voice agent, like, there has to be some friction.
0:39:56 If it’s, like, too easy to build the relationship, if they’re always saying yes to you, if they’re not giving you the brutally honest feedback, then it gets old quickly.
0:40:00 There’s no value for you as a consumer to just have a yes man or yes woman following you.
0:40:01 A yes model.
0:40:01 Yes.
0:40:02 Exactly.
0:40:04 Following you around all the time.
0:40:22 And so we actually get very excited by founders who are opinionated in how to build the voice agent as its own character, its own personality that the user is forming a bond with versus the voice agents we’ve had in the past where the user is treating them as a machine that they’re handing basic tasks to.
0:40:23 Right.
0:40:23 That’s right.
0:40:25 Trust has to be earned.
0:40:29 And if the models don’t design for that, they’re never going to get to their full potential.
0:40:30 That’s a great point.
0:40:41 Well, as we work toward those kind of products, is there anything you’d like to leave the listeners with in terms of what’s on the horizon, what you’re excited about, maybe also where you’d like to see founders direct their attention?
0:40:57 I think one of the things that has been really interesting, and maybe it’s just the standard tech platform shift, but we’re seeing founders that are maybe new to an industry, but spend a couple months going really deep, able to build the most powerful and highest growth and the highest inflection products.
0:41:01 And that’s just because I think the rules of the game are changing.
0:41:08 And the type and power of products you can build is also above anything that we’ve ever seen.
0:41:20 And so if you move quickly, in many ways, like shipping fast becomes the moat, and you can catch up on everything else, like the industry expertise, the networks, the knowledge base, the resourcing, all of that.
0:41:25 And so I would say that has been one of the areas where we get most excited.
0:41:39 Founders that maybe have only been in the industry for six months, a year, even less, but are becoming quickly opinionated about what they need to build, and probably most importantly, just building really quickly and testing, getting feedback, and going from there.
0:41:40 Yeah, so two things.
0:41:42 One, if you’re building in the space, talk to us.
0:41:43 And you know, the weirder, the better.
0:41:53 And then two, a prompt that we’ve discussed with a lot of AI founders is just, what is the incredibly mind-bogglingly expensive version of your product?
0:42:01 So if you’re charging a lot of consumers $20 a month or $100 a month, like what would the $1,000 a month or $10,000 a month SKU look like?
0:42:03 I think the same is very true in voice.
0:42:15 Yes, there’s going to be high-volume use cases that we want to actually replicate or substitute voice AI models for, but what are the most sensitive, most precious, most high-value conversations that are happening in the enterprise?
0:42:15 Right.
0:42:19 And can you attack those, and what price would you charge for those?
0:42:21 Might be $100,000 an interaction.
0:42:26 Maybe that’s a little extreme, but as a product design sort of exercise, why not?
0:42:28 Yeah, it’s a great prompt to leave people with.
0:42:29 Thank you both so much.
0:42:30 Thank you.
0:42:31 Thank you.
0:42:35 All right, that is all for today.
0:42:37 If you did make it this far, first of all, thank you.
0:42:45 We put a lot of thought into each of these episodes, whether it’s guests, the calendar Tetris, the cycles with our amazing editor Tommy, until the music is just right.
0:42:52 So if you like what we’ve put together, consider dropping us a line at ratethispodcast.com/a16z.
0:42:54 And let us know what your favorite episode is.
0:42:57 It’ll make my day, and I’m sure Tommy’s too.
0:42:59 We’ll catch you on the flip side.

AI voice technology has been around for years — think Siri or Alexa — but the magic has been missing. That’s changing, and quickly!

In this episode, Anish Acharya, General Partner at a16z, and Olivia Moore, Partner at a16z, explore why AI voice is reaching a breakthrough moment, how today’s models feel more human than ever, and why voice is poised to become the primary way people interact with AI.

With businesses already making tens of thousands of AI-driven phone calls daily, AI-powered conversations are no longer a distant vision—they’re happening now. Whether it’s AI companions, customer service bots, or enterprise applications, voice tech is here—and it’s improving faster than anyone expected.

 

Resources:

Find Anish on X: https://x.com/illscience

Find Olivia on X: https://x.com/omooretweets
Read the report: https://a16z.com/ai-voice-agents-2025-update/

Listen to Raising Health’s episode on how voice AI is solving healthcare’s workforce challenges: https://a16z.com/podcast/voice-ai-solving-healthcares-workforce-challenges-with-ankit-jain/

 

Stay Updated: 

Let us know what you think: https://ratethispodcast.com/a16z

Find a16z on Twitter: https://twitter.com/a16z

Find a16z on LinkedIn: https://www.linkedin.com/company/a16z

Subscribe on your favorite podcast app: https://a16z.simplecast.com/

Follow our host: https://twitter.com/stephsmithio

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
