A Big Week in Tech: NotebookLM, OpenAI’s Speech API, & Custom Audio

AI transcript
0:00:05 There are elements of it that are almost similar to early ChatGPT.
0:00:12 Anyone who’s now building a conversational voice product can have access to that level of conversational performance.
0:00:18 The way that the majority of people may experience AI for the first time is actually going to be via the phone call.
0:00:27 We’re taking the oldest and most information-dense of all of our mediums of communication and finally making it almost programmable.
0:00:31 Phone calls are kind of this API to the world.
0:00:39 Within a couple weeks of deploying their voice model, they’d had three million users do 20 million calls.
0:00:42 Last week was yet another big week in technology.
0:00:50 For one, NotebookLM, Google’s latest sensation, has been making its way across the Twitterverse with its new Audio Overview feature.
0:01:00 The feature uses end-user customizable RAG, which basically means that people can create their own context window for generating surprisingly good podcasts across 35 languages.
0:01:11 And to add to the voice mix, OpenAI held their developer day and announced their real-time speech-to-speech API, enabling any developer to add real-time speech functionality to their own apps.
0:01:16 Plus, they noted a whopping three million active developers on the platform.
0:01:28 Finally, we saw one video model company, Pika, break through the AI noise with their 1.5 model, giving us fodder to discuss what is really required to capture attention in 2024 and beyond.
0:01:36 Today, we discuss all that and more with a16z Consumer Partners Olivia Moore and Bryan Kim, and General Partner Anish Acharya.
0:01:44 This was also recorded in two segments, one with Olivia and another with all three partners, so you’ll hear us pivot between the two.
0:01:50 Plus, Anish actually predicted that this would be the year of voice, despite it never historically working as an interface.
0:01:58 In fact, Microsoft CEO Satya Nadella even previously called the past decade’s generation of assistants “dumb as a rock.”
0:02:01 Well, it certainly seems like we’re turning a corner.
0:02:04 Let’s get started.
0:02:13 As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security,
0:02:17 and is not directed at any investors or potential investors in any a16z fund.
0:02:23 Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast.
0:02:34 For more details, including a link to our investments, please see a16z.com/disclosures.
0:02:41 Another big week in tech. I think the biggest thing I’ve seen is NotebookLM, so just a quick recap for the audience.
0:02:46 Google is kind of known for these side quests becoming main quests, and this product actually has been around for a while.
0:02:58 It originated in 2023, but its new audio overview feature has been taking over Twitter with these AI-generated podcast hosts, which are surprisingly good.
0:03:01 And I’m saying that as a podcast host who has this job.
0:03:10 And so basically what people can do is they can drop in their own information in a context window, and then it’ll use that to spin up these podcasts.
0:03:13 Olivia, you’ve actually tried these out, right?
0:03:17 Yeah, so I think it originated as something for researchers or academics.
0:03:24 The idea was that you would store all of your notes, all of your papers, all of your information within this Google workspace.
0:03:31 And then this new feature that they’ve added is these two AI agents essentially that play the role of podcast hosts,
0:03:36 and they go back and forth talking about the data, asking questions, getting into examples.
0:03:49 The thing that’s really interesting to me about it going viral in the past week or so has been there’s actually nothing that feels incredibly new or even, in some ways, incredibly cutting-edge about it.
0:03:55 Like it’s not OpenAI’s brand-new real-time model that cuts voice latency down to almost nothing.
0:04:04 In fact, with NotebookLM, you have to wait three to five, sometimes ten, minutes for it to generate the episode once you click the button.
0:04:12 I think what’s really striking about it is the realism and the humanness of the voices and then also how they interact with each other.
0:04:16 Yes, the filler words, the intonation, the interruptions.
0:04:18 Exactly, they disagree with each other, they interrupt each other.
0:04:21 Like this is not just upload a script and get a read out.
0:04:24 It does feel like two human beings talking.
0:04:34 And to that point, the other kind of striking thing about it is it’s not just repeating or summarizing the points that you upload in whatever data sources.
0:04:38 They’re actually answering and asking really interesting and deep questions.
0:04:46 They’re making comparisons, they’re making analogies, they’re taking it a step deeper of almost like how would you teach someone about this topic?
0:04:58 I uploaded basically a bunch of true crime court case filings and it did a podcast about the case and then it spent the last two minutes diving into the ethics of why are we entertained by true crime?
0:05:03 Should we be using this information to create media, things like that?
0:05:07 So it’s really kind of like a next level interpretation of the content, I would say.
0:05:14 Totally, I’ve seen so many examples of this. Someone uploaded just their credit card statement and the hosts were able to grill them on that.
0:05:20 Even that, I don’t think the grilling was prompted per se, it was like, just talk about this, find something interesting within this.
0:05:25 Yeah, there has to be some sort of very creative LLM or something behind the scenes.
0:05:30 One of the other use cases I loved was someone uploaded their resume and their LinkedIn profile.
0:05:38 And it made like an eight minute podcast describing them as this incredible legendary mythic figure and going over all the high points of their careers.
0:05:45 I really like that because I see some people using some of the music LLMs and then using them for let’s say a really nice birthday song.
0:05:55 And so when you played with NotebookLM, was it the kind of thing where sometimes you’re on, let’s say, DALL-E or Midjourney and you’re like, oh, it’s not quite what I want, and you’re just playing the AI slot machine?
0:06:01 Was it like that? Or was it first shot? I’m getting exactly the kind of podcast I was hoping for.
0:06:14 It’s a little bit slot machine in that the output is different every time, but I would say it’s a lot more reliable in that almost every generation that I would do, something would be interesting, it would be on topic, it would be usable.
0:06:22 One example I did, I got very into it. At first I was sticking to uploading academic papers, I was like, I’m going to use this for its intended purpose.
0:06:27 And then one of my generations, I was like, the hosts, they sound like they’re flirting with each other, right?
0:06:36 Yes. They have such good chemistry. And so I was like, what would happen if I upload literally a one sentence document that’s like, I think you guys are in a secret relationship?
0:06:44 And they went off on like a two to three minute podcast that sounds, I swear, like the meet-cute in a romantic comedy or something.
0:06:52 It’s incredibly emotionally compelling, I would say. And so now my vision, I have to do like a full audio drama, then we have to end it on Netflix.
0:07:01 Exactly. It’ll be like the first fully AI avatar movie using the voices inspired by the NotebookLM characters.
0:07:05 This one’s about AI, but like AI and relationships.
0:07:05 Really?
0:07:12 Yes, specifically AI that are like hosting a show like us.
0:07:12 Interesting.
0:07:15 In Google’s NotebookLM environment.
0:07:16 Oh, wow.
0:07:19 So like, could we be secretly dating?
0:07:20 Exactly.
0:07:20 That’s wild.
0:07:22 That’s what the document asks.
0:07:26 Someone thinks we’re giving away like secret love notes to each other through our banter.
0:07:29 Well, what was the end? Did they agree?
0:07:33 I mean, you have to listen to it and get your take.
0:07:37 What if those AIs, you know, actually developed feelings for each other?
0:07:38 Like real feelings.
0:07:39 Yeah, exactly.
0:07:43 So it’s like you’re saying two lines of code could fall in love over a spreadsheet or something.
0:07:44 That’s the idea.
0:07:44 Yeah.
0:07:47 It’s kind of wild, but also kind of, I don’t know.
0:07:48 I know, right?
0:07:49 Intriguing.
0:07:55 And so given that you have played around with it and that a lot of the feedback is really good
0:07:59 and people are pleasantly surprised by this, what’s your reaction?
0:08:02 Because like you said, there are products like this out there.
0:08:06 I mean, with AI, there are so many trends as we’ve seen, like products that get really hot one week
0:08:08 and then something more interesting comes along.
0:08:10 Could be me just being optimistic.
0:08:12 It feels like there’s something here.
0:08:16 And I hate to make this comparison, but there are elements of it that are almost
0:08:18 similar to early ChatGPT.
0:08:22 In that one, it’s really usable even for people who aren’t academics,
0:08:24 people who don’t know that much about prompting.
0:08:29 Anyone can upload a paper and kind of generate a podcast.
0:08:35 The other thing that feels ChatGPT-esque is that people are already using it, quote-unquote, off-label.
0:08:39 And maybe it’s not NotebookLM itself that becomes the winning product.
0:08:40 We’ll see.
0:08:43 I think there’s a lot Google could do to extend this more.
0:08:44 They could make it a mobile app.
0:08:46 You could customize the voices.
0:08:51 I could see it being used for kid bedtime stories if they tweaked it a little bit.
0:08:57 But I think something about the format of personalized podcasts or personalized audio is going to happen.
0:09:03 Some of the experiences or the podcast being generated are no doubt impressive,
0:09:06 but also feel a little maybe gimmicky or like cool once.
0:09:11 But is this really something that you can see evolving into something practically useful?
0:09:16 I for one can see it actually becoming a real product because right now it’s doing podcasts,
0:09:16 for example.
0:09:22 But over time, it may be easier to add avatars or videos as backdrop of what they’re talking about.
0:09:26 And that becomes basically a short-form YouTube video that is very personalized.
0:09:30 So one of the fun examples was like kids love Minecraft.
0:09:34 In Minecraft, when there’s like a new Bedrock Edition that drops,
0:09:36 and there’s like release notes that are pages and pages long.
0:09:40 And kids rely on YouTube to figure out what’s new, like what changed.
0:09:40 If you drop the release notes into NotebookLM and just say, tell me what’s new
0:09:46 and tell it in a way that kids love.
0:09:51 And then it generates this 20-minute or 10-minute back and forth of,
0:09:53 can you believe this new update?
0:09:55 It allows this character to fly.
0:10:00 But those are the types of things that actually become really interesting in an everyday use case.
0:10:04 It makes me want to have like a digital diary or something where you can upload it.
0:10:07 And then it gives you like a podcast of like how the last month of your life has been.
0:10:08 Oh my God.
0:10:13 Because the innovation is less like a new medium and more how they’ve really unlocked
0:10:18 something to your point around how to make any topic exciting and generate insights
0:10:21 and make it something that you really want to listen to and spend time on.
0:10:23 Potentially unlimited outputs.
0:10:24 I totally agree.
0:10:27 It could be videos, it could be avatars.
0:10:31 The interesting thing about that is I’d always thought of it as you can read something,
0:10:36 watch something or listen to something, but maybe a nuance of listening is listening to it
0:10:37 in conversation format.
0:10:41 I do think there’s something really magical about this, just the two hosts going back and forth on the topic.
0:10:42 Yes.
0:10:45 There was a TikTok I saw yesterday that had 2 million likes,
0:10:46 completely organic.
0:10:50 And it was a law school student who was studying for her midterm and she had uploaded like,
0:10:52 I don’t know, 60 pages of lecture notes.
0:10:56 And then it generated a 12 minute podcast for her to review before the exam.
0:11:01 If you even hear another human being telling a story around an example or a case,
0:11:04 it makes it so much easier to remember and understand.
0:11:06 You’re basically opening up another lane, right?
0:11:09 Because you can read something as you’re listening to something,
0:11:12 as you’re immersed in something else in the real world.
0:11:16 Maybe another thing to talk about is OpenAI’s Dev Day.
0:11:21 They released a lot, but maybe the highlight point was this real-time speech-to-speech API.
0:11:25 Anish, I know you’ve thought a lot about this idea that real-time really matters
0:11:29 for speech and that latency is almost like a metric that we’re going to hear a lot more about.
0:11:30 Yeah.
0:11:35 There’s a threshold above which voice doesn’t really work as a modality to interact with the
0:11:37 technology because it doesn’t feel real.
0:11:41 And below that threshold, which is maybe 300 or 400 milliseconds,
0:11:43 it sort of holds the illusion of talking to a person.
0:11:46 Phone calls are kind of this API to the world.
0:11:51 So it feels like the way that the majority of people may experience AI for the first time
0:11:53 is actually going to be via the phone call.
0:11:55 And that is unlocked by this real-time technology.
0:11:58 And the crazy thing is like so much still runs on the phone system.
0:11:59 Absolutely.
0:12:01 Even if you just think about one vertical, like healthcare,
0:12:04 it’s like taking incoming calls from patients.
0:12:07 It’s like doctors calling other doctors, calling pharmacies, insurers.
0:12:10 So if we think about how this becomes more real-time,
0:12:13 are there different applications that you think are unlocked,
0:12:15 like let’s say music, education?
0:12:18 How does real-time voice maybe change some of those industries?
0:12:20 Most of the EdTech products we’ve seen so far have been like,
0:12:23 you attempt a homework problem, maybe then you take a screenshot,
0:12:26 you upload it to an AI product, it tells you if it’s right or not.
0:12:31 And now with real-time, both voice and some of the video and vision model stuff,
0:12:33 it’s actually almost like having a tutor sitting next to you,
0:12:36 going through it with you, even with some of the vision stuff.
0:12:38 Show it your piece of paper.
0:12:43 So now it’s like AI is moving towards actually helping you learn versus
0:12:49 a lot of the use cases so far have been maybe cheating adjacent in like,
0:12:50 how do I just get to the answer?
0:12:52 Now what is your process?
0:12:53 That’s actually really interesting.
0:12:57 You’re basically saying that in a way the lack of latency allows
0:13:03 for people to integrate in that moment and in the past maybe because there was more latency,
0:13:06 people took shortcuts because they didn’t want to wait.
0:13:09 Or if it’s with you, it can say here’s the way you’re doing it.
0:13:13 Here’s another way actually that might make more intuitive sense for you to solve this math problem.
0:13:18 It’s going along the journey of understanding with you versus just being kind of answers or
0:13:21 outcome-based, which a lot of the AI products have been historically.
0:13:27 What’s really interesting about that is that there’s a sort of design language or design cues
0:13:29 that are already built into conversations.
0:13:33 So interrupting is one or the sort of uh-huh, uh-huh is another.
0:13:38 So that actually should unlock much more interesting product experiences as well because,
0:13:40 and of course the latency is necessary for that.
0:13:44 But so is the ability to even understand these parts of sort of, I don’t know,
0:13:48 they’re not quite nonverbal, but they’re also not part of the explicitly spoken
0:13:48 language.
0:13:53 For a lot of products, especially in consumer, it’s not just about being optimal per se
0:13:54 or perfect, right?
0:13:57 In fact, what a lot of people are commenting on when you see the NotebookLM examples,
0:14:02 it is the filler words, it is the interrupting, it is the imperfections that people are drawn to.
0:14:08 This is a big step forward, and for anyone who tried to use the ChatGPT voice mode before,
0:14:12 essentially you would press a button, you would say something,
0:14:16 the LLM would pause, it would interpret it, it would generate something to say back,
0:14:19 and then it would return an answer, but it’d take at least a couple seconds.
0:14:21 It was very buggy, it was very glitchy.
0:14:25 It was more like sending a voice memo, having someone hear it,
0:14:29 and send back a voice memo than having an actual live conversation with a human.
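To make that contrast concrete, here is a minimal sketch of the turn-based flow just described, assuming the OpenAI Python SDK; the model names are illustrative stand-ins, not necessarily what ChatGPT itself used. Each stage blocks on the previous one, which is where the multi-second delay came from.

```python
# A hedged sketch of the turn-based voice pipeline described above:
# speech -> text -> LLM -> text -> speech. Assumes the OpenAI Python
# SDK; model names are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def turn_based_reply(audio_path: str) -> bytes:
    # 1) Speech to text: transcribe the recorded turn (one round trip).
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=f
        )
    # 2) Text to text: generate a reply (a second round trip).
    chat = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": transcript.text}],
    )
    # 3) Text to speech: synthesize the reply (a third round trip).
    speech = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=chat.choices[0].message.content,
    )
    return speech.read()  # audio bytes to play back to the user
```

Three sequential round trips per turn is why the old mode felt like trading voice memos rather than holding a conversation.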
0:14:36 And so the new model is truly more like almost zero latency, full live conversation.
0:14:42 This has been available through ChatGPT’s own Advanced Voice Mode, which people are using
0:14:46 and loving. But what happened this week at Developer Day was they’re essentially making
0:14:52 that available via API for every other company. So anyone who’s now building a conversational
0:14:58 voice product can have access to that level of conversational performance, which is huge
0:15:03 and really exciting because it brings a lot of AI conversation products from barely workable,
0:15:08 or not really workable, to suddenly extremely good and very human-like.
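For developers, a rough sketch of what a session against the new API looks like. The endpoint, headers, and event names below follow the Realtime API as OpenAI described it at Dev Day; treat the specifics as assumptions that may have changed since recording.

```python
# A rough sketch of a Realtime API session over WebSocket. The URL,
# headers, and event names follow OpenAI's Dev Day announcement;
# treat them as assumptions, not a verified, current contract.
import asyncio
import json
import os

import websockets  # pip install websockets

async def realtime_session() -> None:
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # `extra_headers` is the keyword in older websockets releases;
    # newer versions call it `additional_headers`.
    async with websockets.connect(url, extra_headers=headers) as ws:
        # Ask the model to speak. Audio comes back as a stream of small
        # deltas, so playback can begin almost immediately instead of
        # waiting on a full transcribe -> generate -> speak loop.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "instructions": "Greet the caller warmly.",
            },
        }))
        async for message in ws:
            event = json.loads(message)
            if event.get("type") == "response.audio.delta":
                pass  # base64-encoded audio chunk: decode and play it

asyncio.run(realtime_session())
```

The key difference from the pipeline sketched earlier is that audio streams back incrementally over one persistent connection, which is what keeps the exchange inside the few-hundred-millisecond window that preserves the feeling of a live conversation.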
0:15:12 Yeah, totally. You had a tweet that said this is a massive unlock for AI voice agents.
0:15:16 I’m expecting to see a lot more magical products in the next few months.
0:15:20 We’re quickly leaving the era of latency and conversational experience being a blocker.
0:15:22 Can you speak just a little more to that in particular?
0:15:29 Yeah, absolutely. Many of the AI voice products didn’t really feel even SMB-caliber in terms of
0:15:35 quality, let alone maybe like an enterprise could actually deploy this. So now it is,
0:15:40 I think arguably enterprise quality in terms of real companies being able to replace humans
0:15:45 on the phone with an AI on the phone. We’re seeing this for all sorts of use cases.
0:15:51 The most obvious is maybe having someone answer the phone at a pizza shop to take orders or at a
0:15:56 small business to book nail appointments, all the way to things that are a lot more complicated,
0:16:01 like even doing interviews, first round interviews with AI, which is crazy to think about,
0:16:08 but it’s happening, or even more kind of vertical specific use cases like freight brokers spend
0:16:13 all day on the phone calling carriers, calling truckers and trying to find someone to haul a
0:16:20 load in a certain price range. Now you could do that with an AI that can call 100 carriers at once
0:16:25 and negotiate the price instead of having a human being do those calls sequentially all day.
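As a sketch of the fan-out this enables: the concurrency is ordinary async code once each call is an API request. Here `place_ai_call` is a hypothetical helper standing in for whatever voice-agent service actually dials the carrier.

```python
# A sketch of the fan-out pattern: call many carriers concurrently
# instead of sequentially. place_ai_call is a hypothetical helper,
# simulated here so the sketch is runnable.
import asyncio
import random

async def place_ai_call(carrier: str, max_price: int) -> int | None:
    # Hypothetical: dial the carrier with an AI voice agent, negotiate,
    # and return the agreed price (None if no deal was reached).
    await asyncio.sleep(0.1)  # stands in for a minutes-long phone call
    price = random.randint(800, 1500)
    return price if price <= max_price else None

async def find_carrier(carriers: list[str], max_price: int):
    # Fan out: every call happens at once, where a human dispatcher
    # would have to work through the list one call at a time.
    prices = await asyncio.gather(
        *(place_ai_call(c, max_price) for c in carriers)
    )
    deals = [(c, p) for c, p in zip(carriers, prices) if p is not None]
    return min(deals, key=lambda deal: deal[1], default=None)

best = asyncio.run(find_carrier([f"carrier-{i}" for i in range(100)], 1200))
print(best)  # e.g. ('carrier-42', 815)
```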
0:16:29 This new API, and there’s other open source attempts at the same type of model,
0:16:33 is really going to allow those products to shine. Yeah, and some of the products you’re
0:16:38 describing are kind of voice first, but many of the apps that we’ve had to date
0:16:42 are typically not voice first, perhaps because we actually haven’t had the technology.
0:16:48 And so I want to refer to Anish’s big idea at the end of 2023, which right now feels very
0:16:53 accurate. He was right on. Yeah, he said that voice-first apps will become integral to our lives,
0:16:58 and he basically says that despite voice specifically being the oldest and most common form of human
0:17:03 communication, it’s never really worked as an interface for engaging with technology.
0:17:08 It feels like voice is one of the biggest things that’s being unlocked by AI. Voice is the easiest
0:17:13 content to create, and we’re all creating audio all day every day, essentially, but that content
0:17:20 has never really been captured or used or automated in some ways. Like now, even outside of real time,
0:17:25 there are so many products that will listen to your meeting and will hear you say something and
0:17:32 can automatically Slack someone with a follow-up or use it to trigger a commit in GitHub or a task
0:17:37 in Asana that your team has to follow up on. And so I think what we’re seeing now, both real
0:17:42 time voice and non real time voice is we’re taking the oldest and most information dense of all of
0:17:48 our mediums of communication and finally making it almost programmable and usable in a really
0:17:54 powerful way. The one thing I think we didn’t quite predict when we were forecasting voice for
0:18:01 this year was that it’s really, really been working for B2B and not as much on consumer quite yet.
0:18:07 We’re getting there. I think on B2B, even thinking about the voice agents, a lot of businesses are
0:18:12 struggling to find people to answer the phones for all sorts of roles or struggling to retain them.
0:18:17 It’s expensive. And so it’s super natural to plug in an AI that can perform at similar quality.
0:18:22 The consumer use cases are a little bit less obvious. It’s probably worked the most in companion
0:18:28 so far. So again, ChatGPT Advanced Voice Mode, or Character.AI, I think they announced within a
0:18:34 couple weeks of deploying their voice model, they’d had three million users do 20 million calls.
0:18:40 Really? Yes. Wow. Because if you’re spending hours each day anyway, talking to this companion,
0:18:46 giving it a voice and making it more real makes a lot of sense. So that to me was like the shining
0:18:53 star of voice so far. OpenAI did highlight two other consumer use cases on Dev Day.
0:19:00 And both of them were actually these kind of high touch, expensive human services almost
0:19:06 that are now democratized with AI. So one of them is a company called Speak that does language
0:19:11 learning. This might be controversial. I love Duolingo as a product. I love it as a brand,
0:19:15 but I think it’s hard to use it to learn a language end to end because it’s just limited
0:19:20 as an interface. So if you really want to learn a language, you might have to pay someone, I don’t
0:19:26 know, $50 to $100 an hour to tutor you. And so the idea of Speak is you have an AI voice agent
0:19:30 that is essentially your language tutor, and it’s much more accessible and affordable. So that was
0:19:36 one. And then the second one they highlighted was what if you had a nutritionist via AI? So this
0:19:40 is a product called Healthify where you can send in photos and then talk live about what you’re
0:19:45 eating every day in your diet. So I think we’ll see more of those use cases unlocked with better
0:19:50 voice models. Yeah, I need that. I’ve been saying for a while, I didn’t think of it specific to
0:19:54 voice, but that I need an AI to just call me out on my BS to be like, yeah, these are your goals,
0:19:58 you said you were going to run like, yeah, you didn’t do the things that you said you were going
0:20:04 to do. But also, what you’re describing used the Duolingo versus Speak example. But in Anish’s
0:20:09 prediction, he also talks about how, yes, some of these big companies are going to integrate these
0:20:14 APIs or integrate this technology. But Gmail is probably still going to look like Gmail. And so
0:20:18 how do you think about that balance between the incumbents utilizing this technology and then
0:20:23 what’s going to sprout that’s completely new. It’s really interesting and something that we watch
0:20:26 really closely in consumer in particular, because you would think that the Googles, the
0:20:31 Microsofts have all of your data, they have all of your permissioning, there’s a lot that they
0:20:37 could do. I think what we’ve seen is they’re structurally in some ways disadvantaged in building
0:20:42 towards this AI shift in a really native way. One, it’s like, these are big companies now,
0:20:46 they have a lot of people, they have a lot of competing priorities. And then the second thing
0:20:53 would be, in some ways, they would cannibalize their own products. Like, our view has been Google is
0:20:59 likely to maybe add AI to augment Gmail, but are they likely to create the AI-native version of
0:21:05 Gmail that you could only conceptualize in the past three to six months, probably not just because
0:21:10 again of how big of a company they are and the fact that they have so much riding on the continued
0:21:16 success of the existing product. A good example of this is actually Zoom added transcriptions.
0:21:22 Are people using that? Yes, but there’s also been a ton of products that are independently
0:21:27 successful in doing AI meeting notes. And those largely are building towards more specific and
0:21:32 opinionated workflows for different types of jobs or tasks. And it’s just something that Zoom is
0:21:36 never going to do because they’re such a broad-based platform. Talk about a completely new
0:21:40 platform, like, imagine Zoom, but it’s asynchronous. Yes, right. They’re never going to build that,
0:21:46 to your point, because they’re inherently synchronous. Clearly OpenAI is investing in voice, right?
0:21:51 And that’s not necessarily a given, right? If you think about, they also do imagery,
0:21:56 they haven’t really talked about DALL-E in a while, right? They also do video. Sora came out a little
0:22:01 while ago, but there really seems to be this voice push despite them operating across modalities.
0:22:03 Is that a signal people should be paying attention to?
0:22:08 I think so. I think we’ve already seen, even though it’s still so, so early, almost like
0:22:15 eras of AI so far. Creative Tools was the first era and still a massive era. And I think we saw
0:22:20 a ton of investment in image generation, video generation, music generation, much of which
0:22:28 is still happening, especially it feels like as AI moves from pure consumer use cases into more
0:22:35 kind of controllable, highly monetizable enterprise use cases, it does feel like voice is kind of a
0:22:42 unique unlock in that it’s a real game changer for companies in particular to be able to capture
0:22:49 and utilize this audio data that they never had before. Maybe another thing worth talking about
0:22:54 here from Dev Day is that they announced that they have three million active developers in the
0:23:01 ecosystem and they tripled the number of active apps in the last year. Since you’ve been studying
0:23:06 consumer for so long, maybe ground the audience and how much quicker is this happening per se
0:23:11 than, let’s say, the app era when Apple released its app store. How long did it take for three
0:23:16 million active developers to be building on it? And just how big is that kind of number today?
0:23:18 Yeah, that’s a great question. I have no idea.
0:23:23 As you were asking the question, I was like, do I know that for app store? I’m like, I do not.
0:23:25 Well, it took, I assume, years.
0:23:29 Three million developers, that’s incredible. Like my math was like, look, I don’t know the
0:23:34 app store number, but let’s say each developer has the ability to, I don’t know, like maybe reach out
0:23:39 to hundreds or a thousand unique users. That’s sort of how I think about basically the reachability
0:23:43 of what they’re building. I think the other question is like, what is the revenue per developer in
0:23:49 the app store, and is that a proxy for AI? Yeah, that’s super interesting. There is data that I
0:23:55 think I put out where you look at, it’s not necessarily the app store ones, but it’s the SaaS,
0:24:00 like historical SaaS companies versus GenAI companies and how the GenAI companies
0:24:07 are reaching a scale of revenue way faster than their SaaS counterparts. It’s very interesting.
0:24:12 Yeah, I think a big part of that though is because GenAI is so well set up for consumption revenue.
0:24:17 And so many SaaS businesses are SaaS. They’re like, you pay a fixed fee for the service monthly.
0:24:21 And with a lot of these new businesses, you’re paying on a consumption basis.
0:24:25 You’re also pricing it as a subset of labor costs, which are traditionally priced far
0:24:29 higher than software costs. I think that’s like a far more compelling argument for why the revenue
0:24:36 ramp is much faster, versus I think the reason the report gave was that the GenAI companies
0:24:42 require training costs upfront, therefore their imperative to make money is higher than SaaS,
0:24:47 which maybe, but we know the ones that are making money aren’t necessarily incurring a huge training
0:24:52 cost upfront. Much more likely, it’s that they’re replacing labor costs, or it’s just so useful or so unique
0:24:58 that the willingness to pay is just higher. For sure. I might buy that argument in consumer in
0:25:03 that the willingness to pay of consumers is way higher post-GenAI than pre-GenAI, so maybe,
0:25:08 but for SaaS, I mean SaaS businesses have always existed to make money. But the developer community
0:25:12 at 3 million people actively developing on it today, based on how old this platform is.
0:25:16 Like that is incredible. Yeah, I also think I’m seeing so many people who wouldn’t have previously
0:25:22 called themselves a developer, creating just really small apps or even using the API for themselves
0:25:26 in a way that if we use the parallel of the app store in the past, you weren’t really creating
0:25:30 an app for yourself back in the day. That was like the barrier to entry for that would just be too
0:25:35 high and it just wasn’t on many people’s radars. You know, the story of a lot of productivity
0:25:41 and prosumer companies is enabling app creation. Like Notion is a big app platform. Actually,
0:25:45 people have created these like daily habit tracker apps and a bunch of other things in the Notion
0:25:49 app store. Yeah, agencies built on top, yeah. Totally, yeah. Airtable, obviously; there’s products
0:25:53 like Retool. But there’s a lot of people who have had, or at least there’s this latent demand, to make
0:25:58 apps, especially for people that are non-technical in a business context or a hobbyist context.
0:26:04 And I think the AI thing is really unlocking it. Yeah. The app store example is a
0:26:09 very good one because we’re seeing this maybe fragmentation in a positive way of the types
0:26:15 of developers that are building on OpenAI models. There’s literally people who we talk to who are
0:26:20 like, I’m never going to raise venture funding. I am printing cash basically. I’m making a million
0:26:26 or two million dollars a month off of this. Not always thin; sometimes very sophisticated kinds
0:26:31 of products that target maybe a really specific use case. So we see that, and that could be an
0:26:35 OpenAI developer, but also we could see a developer who’s, no, I’m going to build a $50
0:26:41 billion company utilizing or fine-tuning these models. So similar to the app store, we saw
0:26:46 a big range of people who are like, I’m just going to be a solopreneur making an app, to, I’m going
0:26:53 to build a generational business on top of the app store. Maybe the difference to me here so far has
0:26:59 been kind of like, as with everything in AI, the slope of the curve or the speed of ramp. I don’t
0:27:04 think we often saw, especially in the early days of an app store, solopreneurs making millions of
0:27:09 dollars a month. That’s something that has been very uniquely enabled by AI. Yeah. And you see this
0:27:16 overlapping with the code LLM space, right? You’ve got Cursor and Replit and all of these tools that
0:27:21 allow people who couldn’t code before to become a developer. Totally. Yes. You don’t have to be a
0:27:26 developer or a designer or there’s so many skill sets now that you can abstract away to AI as long
0:27:32 as you have good taste and good ideas. That tooling did not exist in the app store era and now exists
0:27:38 in the AI era. Well, maybe to that end, clearly there’s a lot of building happening. And we’ve
0:27:43 talked about this before, but I’d love to talk about the playbook, right? Because you’re going to
0:27:48 build something within AI. It’s more competitive than ever to get that attention. And so maybe one
0:27:54 frame for us to talk about that against is Pika’s launch of 1.5 this week. And I just saw so many
0:28:00 meme videos. It was so viral. People squishing things and inflating things, right? Taking a meme
0:28:06 and distorting it. Exactly. It was actually really fun. So in a pretty intuitive way, I understand
0:28:11 why that kind of model went viral. But we are getting to the point where is there fatigue when
0:28:15 someone releases a new model? I’d love for you to just maybe break down what you might call the
0:28:22 anatomy of a successful launch in this world. If you think about video as a category, when Sora
0:28:29 first came out with their examples, minds were blown. Yes, minds were blown. And I think that
0:28:35 became this like front of mind of, oh my God, you can create and generate videos. Now, the interesting
0:28:40 thing about video is that it’s not all created equal, right? There’s a character centric video,
0:28:46 and then you have more of a scene generation video. What is happening in the scene, the content density
0:28:52 of the video always mattered, right? Slow motion movement of the scene is video, but it’s a lot
0:28:58 less interesting. Cat walking around a garden. Interesting, but cat’s moving. Cool. What we’re
0:29:05 seeing now is these products are becoming a lot more opinionated and a lot more specific, if you
0:29:11 will. So we talked about Pika, but you also have the likes of Viggle, where it’s templatized what
0:29:16 you can do, like the Lil Yachty dance walkout scene. That’s very opinionated, like it’s not any
0:29:22 video. It’s a very specific movement and scene where you’re putting yourself in. Pika’s the same
0:29:28 thing where all the sort of templates that are going viral are you take a specific object in the
0:29:35 video and you’re modulating it, whether you’re squishing it, blowing it up, like inflating it,
0:29:41 floats away. It’s sort of unexpected. It is unexpected what’s happening in the video, right?
0:29:46 It’s not a cat walking and oh, it’s at point A and might go to point B. How interesting. You don’t
0:29:53 expect the meme guy looking at another woman to actually be squished in a picture. You don’t expect
0:29:58 all these different meme characters to be blown up all of a sudden. And I think that unexpectedness
0:30:04 is sort of the next evolution of what’s happening. Yeah. I mean, one thing that was really interesting
0:30:11 there is there’s a subset of things that people expect from video and with AI, it’s not enough
0:30:15 to just give people that. Or maybe there is some subset if you’re creating a stock video company,
0:30:20 that’s one thing. But in order to go viral, in order to garner attention in this very busy world,
0:30:25 you need some sort of unknown quantity, an opinionated point of view on what that
0:30:30 should be, right? They could have easily said, oh, like we want video to be longer because that’s
0:30:34 hard. That’s really hard. Like 30 second video with some consistency in the scenes are difficult
0:30:39 things to do. They could have done that. But instead, the team decided, you know what, we’re
0:30:44 going to pick like objects in the scene and do weird stuff with it. Do you think that’s required
0:30:50 now to basically design around some sort of viral element? I think if there has been a large
0:30:56 shocking development in the underlying modality, again, video with Sora-type models, like you do need some
0:31:04 unexpected element of, again, opinion to garner attention, or the quality just needs to be an
0:31:09 order of magnitude better, not just 20% better but much better; then I think you get attention.
0:31:14 But that’s the underlying tech stack evolution, which I think will continue to see as well.
0:31:18 So I wouldn’t say it’s like a playbook of the only way to do it is to come with wacky,
0:31:23 like very attention grabbing things. There’s of course the underlying technical evolution
0:31:26 that will continue to sort of push the boundary forward.
0:31:35 All right, that is all for today. If you did make it this far, first of all, thank you.
0:31:38 We put a lot of thought into each of these episodes, whether it’s guests, the calendar
0:31:44 Tetris, the cycles with our amazing editor Tommy until the music is just right. So if you’d like
0:31:50 what we put together, consider dropping us a line at ratethispodcast.com/a16z. And let us know
0:31:56 what your favorite episode is. It’ll make my day, and I’m sure Tommy’s too. We’ll catch you on the flip side.
0:32:00 [Music]

Last week was another big week in technology. 

Google’s NotebookLM introduced its Audio Overview feature, enabling users to create customizable podcasts in over 35 languages. OpenAI followed with their real-time speech-to-speech API, making voice integration easier for developers, while Pika’s 1.5 model made waves in the AI world.

In this episode, we chat with the a16z Consumer team—Anish Acharya, Olivia Moore, and Bryan Kim—about the rise of voice technology, the latest AI breakthroughs, and what it takes to capture attention in 2024. Anish shares why he believes this could finally be the year of voice tech.

 

Resources: 

Find Olivia on Twitter: https://x.com/omooretweets

Find Anish on Twitter: https://x.com/illscience

Find Bryan on Twitter: https://x.com/kirbyman01

 

Stay Updated: 

Let us know what you think: https://ratethispodcast.com/a16z

Find a16z on Twitter: https://twitter.com/a16z

Find a16z on LinkedIn: https://www.linkedin.com/company/a16z

Subscribe on your favorite podcast app: https://a16z.simplecast.com/

Follow our host: https://twitter.com/stephsmithio

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
