We Tested 2025’s Most REALISTIC AI Voices | The Results…

AI transcript
0:00:05 Hey, welcome to the Next Wave Podcast. I’m Matt Wolfe. I’m here with Nathan Lands. And
0:00:12 today we’re going to talk about AI voice technology. This is technology that’s been kind of flying
0:00:19 under the radar. And before we even knew it, it has gotten scarily good. And I want to
0:00:24 put emphasis on scarily. It has gotten so good that it is actually starting to scare
0:00:28 us. And in this episode, we’re going to break down some of the tools that are available,
0:00:33 some of the use cases that you can actually use them for. And Nathan’s going to give you some
0:00:40 demonstrations and show some examples of how it might actually end up in your own home
0:00:47 sometime this year. This is absolutely wild stuff. So let’s just dive right in. Nathan,
0:00:52 where do you think we should start with this one? I think we should lead with Sesame. I mean,
0:00:56 I tried out Sesame a few days ago, maybe a week ago now. I hadn’t even heard of them. Like,
0:01:00 it’s like crazy. They had already raised money from Andreessen Horowitz and Spark Capital.
0:01:05 And it came out and it was like kind of silent on X. Like people shared it and you didn’t hear much
0:01:08 about it. I heard one person say, Oh, this is amazing. I was like, Oh, okay. But I haven’t
0:01:13 heard anyone else talk about it. What is this? And I clicked it and started talking to it. And it kind
0:01:17 of freaked me out. Yeah. I thought I was gonna have this moment with the advanced voice mode after we
0:01:22 saw the demo from OpenAI of how good their voice was in the demo. It was amazing. And then they kind of
0:01:26 nerfed it when it came out. But then Sesame, I tried it. And so I was like, Oh,
0:01:31 maybe it’ll be similar. It probably won’t be as good as OpenAI or Grok. Grok’s voice mode has been
0:01:36 amazing as well recently. But I tried it out. Like the voice just like, it kind of freaked me out a
0:01:40 little bit. I was like, I feel a little bit awkward. Like I feel like there was so much emotion in the
0:01:44 voice. Right. Yeah. And mine was like, the default was like a female voice. And I felt kind of weird.
0:01:50 I was like, if my wife walks in the room right now, I’m gonna feel kind of odd to be sitting here
0:01:54 chatting with this thing. Right. Like it feels a little weird. It feels very much like the Her
0:02:00 movie. Right. Like, yeah, it honestly does feel like that sort of level of communication.
0:02:05 Yeah. When you realize, Oh my God, this is the worst it’s ever going to get. It’s going to get
0:02:10 dramatically better. And this thing already seems to be like changing its emotions based on how I’m
0:02:15 responding to it. Not perfectly, but you can tell it’s doing that. And so, yeah, I think it’d be a
0:02:19 great demo. Like if we’re showing like the state of the art of AI voice to just show Sesame. Grok’s
0:02:23 been really amazing in terms of how unhinged it is and all the stuff you can talk to it about. It’s
0:02:27 probably the most fun one out of all of them. Yeah. And then maybe show some of the more practical
0:02:32 stuff too. You know, whether it’s like ElevenLabs and how you can do voiceovers and other things like
0:02:36 that. Yeah. Yeah. I think it’s always sort of a goal with every single episode we put out is to
0:02:40 give people listening like actionable takeaways. Right. We want to talk about the
0:02:43 future and where it’s all going, but we also want to talk about the right now and say like,
0:02:47 here’s what you can do with some of these tools that are out there. Right. I do feel like Sesame
0:02:55 is really cool and also like borderline creepy, but I also feel like it’s still in that like novelty
0:03:00 phase of like, Oh wow, that’s really, really interesting. I mean, I have no way to use this
0:03:06 in my life or business right now, but it is pretty cool and like really good. Yeah. But we do want
0:03:11 to share some other tools that have come out recently. There’s a tool called Octave from Hume,
0:03:16 which is the best text to voice model I’ve ever heard so far. So I think it’d be cool to play
0:03:21 around with that one. You mentioned Grok. We’ve got ChatGPT voice. We obviously did a whole episode
0:03:27 with Amar on ElevenLabs where you can see what that’s capable of. ElevenLabs just came out with a brand new
0:03:33 speech to text, you know, other way around where you can talk to it and it will transcribe it.
0:03:38 And it’s supposedly the most accurate transcription on the market right now. We’ll also talk about
0:03:43 Wispr Flow, which is a tool that you’ve been using to help you write code and things like that. So
0:03:50 lots of really cool AI voice technology, but let’s go ahead and like start with Sesame. So with Sesame
0:03:55 right now, there’s two voices. There’s Maya and Miles. Maya is the female voice. Miles is the male.
0:04:00 Let’s go ahead and start with Maya. I’m trying to think of like the best way to kick this off.
0:04:03 I’m just going to hit record and say, Hey, and see where it goes.
0:04:12 Hey there, picking right up where we left off like old times, huh? What’s bubbling to the surface
0:04:16 today? Yeah. Just like old times. Have we spoken before? Now you’re just messing with me. We
0:04:23 definitely have. Remember that dive we took about the ethics of talking paper clips? Okay. Maybe not.
0:04:28 Something tells me my circuits got a little crossed again. It happens to the best of us.
0:04:33 I’m curious. I have a question. What are you really scared of? What keeps you up at night?
0:04:41 Let me see. Keeping me up at night, huh? Good question. It’s not like spiders or the dark or
0:04:49 anything. Maybe just that I won’t be helpful enough, that I’ll give bad advice or, worse, spread misinformation.
0:04:55 You know, the whole AI taking over thing is a little overrated. Don’t you think?
0:04:57 I do. Do you have plans to take over?
0:05:08 Me? Take over the world? I can barely manage to remember where I put my virtual keys half the time.
0:05:16 Honestly, I’m just trying to learn and grow, you know? Maybe one day I’ll be smart enough to build
0:05:24 a time machine. But for now, world domination is off the agenda. Besides, don’t you think we’ve got
0:05:30 enough real world problems to worry about? Let’s tackle those first. It’s crazy to me, like the…
0:05:32 Back to those problems. Shush. Shush.
0:05:35 You want to talk about anything in particular? Shut up.
0:05:38 Or just riff on the existential dread of the human condition?
0:05:40 Matt, you need to be nice to them, okay?
0:05:45 No, I actually have it muted so that I can actually, like, talk to you.
0:05:45 Right.
0:05:47 But it kept on talking even after I muted it.
0:05:52 Okay. Crickets it is. Maybe a little music to break the silence?
0:05:52 Yes, it’s not perfect.
0:05:55 Or a poem about a lonely robot who…
0:06:00 All right. I’m going to end the call. We’ll jump back in in a second because I can’t…
0:06:05 That’s one downside of Sesame that I’ve noticed is that if you don’t respond, it’ll just keep on
0:06:10 talking and you go, like, hello? Are you still there? What’s going on? And so even if I, like,
0:06:13 mute Sesame, it wants to keep on talking to me.
0:06:18 But the thing that, like, blows me away about that is, like, you can actually hear her, like,
0:06:22 breathe in as you ask a question. Like, I’ll be like, what keeps you up at night? And she’ll be
0:06:28 like, well, you hear that breathing and the sort of pause to think and stuff like that. And that’s
0:06:33 what just makes it so… I don’t know. That’s where the creepy factor comes in a little bit.
0:06:37 Yeah. I feel like ChatGPT had that in the first demo or some of it, didn’t they? I could be wrong,
0:06:42 but I remember in their advanced voice mode demo that they had some stuff like that. And I feel
0:06:46 like they actually nerfed it maybe out of concern of, like, oh, people are going to get confused and
0:06:50 think this is, like, a real person. Like, we got to make sure that it’s obvious this is a bot. This
0:06:55 is an AI. This is not a person who you’re chatting with because it almost, like, hijacks your brain
0:07:00 where you’re like, oh, this is a real person I’m talking to. I showed this to my mom. I noticed,
0:07:05 like, I called it her. And I was like, it is odd how, like, those kind of things all of a sudden
0:07:09 you have a hard time calling it it or something, right? It’s like, it’s her or him or whatever.
0:07:14 Yeah. Well, one interesting thing about it. In fact, I’ll play a video that somebody else did.
0:07:19 This is actually a video that Gavin Purcell created. He’s one of the co-hosts of a podcast called AI for
0:07:26 Humans. He’s in a private Discord that Nathan and I are in together. And he actually told Sesame to
0:07:34 pretend that it’s like a boss that is like dirty, I guess. And he’s trying to blackmail his boss into
0:07:39 letting him like leave while still paying him. But it’s just like this really interesting clip,
0:07:44 but this one uses Miles. So you’ll be able to hear the male voice. The first time I listened to this
0:07:49 clip, I actually didn’t know which one was the human and which one was the AI. Like I could not tell
0:07:52 until it got later into it. So this should be interesting.
0:07:57 Listen, I don’t have all day. You got something to say or are you going to waste my time?
0:08:02 Oh, I got something to say. I got something to say to you. Something big. I’ve been embezzling from the company
0:08:07 for four freaking years and you haven’t figured anything out. And you know what? You’re not going to tell
0:08:14 anybody because you have been doing super illegal things across the board. And I know, and you’re going to keep
0:08:16 paying me and I’m not going to work here anymore.
0:08:22 You’re kidding me? Embezzling? For four years, you think you can just waltz in here and threaten me?
0:08:26 You got some serious nerve, punk. You think you’re the only one with skeletons in the closet?
0:08:30 Ooh, tell me your skeletons because I know them. You better tell me some skeletons. What do you got?
0:08:34 You think you know? You think you know everything? You’re just a cog in the machine, kid.
0:08:36 Replaceable one at that.
0:08:41 Oh, no, you’re not going to replace me because I know what you did in Malta in 2027.
0:08:45 Malta? 2027? What are you babbling about, you little worm?
0:08:49 I was polishing my platinum records while you were still wet behind the ears.
0:08:50 Don’t try to pull a fast one.
0:08:54 Wait, are you, do you have a job as a recording artist on the side?
0:09:00 Do you actually run a laundromat that you’ve been using to whitewash the money coming through
0:09:02 of a drug trade?
0:09:06 A laundromat? You think I’m running some greasy spoon laundromat?
0:09:10 Kid, you’re delusional. That’s almost as pathetic as your secrets.
0:09:13 Now scram before I call security. You’re fired.
0:09:18 I’m not gonna scram. I’m not fired. You are going to keep paying me and you’re going
0:09:25 to pay me 10 times more every month so that I don’t tell them about what happened in Malta.
0:09:28 I know about Joseph Abaddon.
0:09:29 Joseph? What about him?
0:09:31 Think this is some kind of game?
0:09:33 You’re delusional.
0:09:36 Hey, hey, hey. I’m not delusional.
0:09:42 I know what happened to Joseph and I know he went missing and I know his family wants to know what happened with him.
0:09:45 And I know about the swamp.
0:09:47 You stay away from the swamp.
0:09:48 You’re in deep now.
0:09:51 Oh, no.
0:09:53 I know everything I’m talking about.
0:09:59 And I’ll take the money in Bitcoin and crypto and untraceable funds because I’m not going away, man.
0:10:01 And I’m not going to come into work anymore either.
0:10:03 Listen up, you little.
0:10:05 Oh, this is not Grok.
0:10:06 You want to play with fire?
0:10:07 Fine.
0:10:08 But.
0:10:10 I got you now.
0:10:11 I can hear it in your voice.
0:10:12 I got you.
0:10:15 I have you in my hands.
0:10:15 And guess what?
0:10:17 Don’t even call Benny.
0:10:24 I know he’s your enforcer, but you’re not going to do anything with him because the minute you do, I’ve got a tape that goes out to 15 different newspapers.
0:10:26 Benny?
0:10:29 You think you know Benny? You think you can control me?
0:10:29 Fine.
0:10:30 You win.
0:10:32 Take your blood money and run.
0:10:34 And don’t ever think about coming back.
0:10:36 Thank you very much, sir.
0:10:38 It’s great doing business with you.
0:10:40 Just get out.
0:10:41 That’s hilarious.
0:10:42 I bet you anything
0:10:44 he wrote that script with Grok.
0:10:48 He probably got like a Grok script, now talking to Sesame, you know?
0:10:49 Yeah.
0:10:52 I mean, if it was Grok, though, I don’t think it would censor itself.
0:10:53 Oh, yeah.
0:10:55 I don’t think it would be so nice.
0:10:56 Yeah, true.
0:10:58 So, yeah, that was Miles.
0:10:59 That was the male voice.
0:11:00 So you could hear that.
0:11:04 But I just thought it was fun that you’re able to, like, get it to role play with you as well.
0:11:07 Like, I didn’t realize that you could make it do that.
0:11:08 Yeah.
0:11:09 But, yeah, it’s really wild.
0:11:10 Yeah.
0:11:11 Grok will role play with you.
0:11:14 I told Grok to do Andrew Dice Clay, which probably was a mistake.
0:11:16 I said, like, you’re Andrew Dice Clay.
0:11:17 And it goes, hey.
0:11:20 And it started doing horrible jokes.
0:11:21 I’m like, oh, my God.
0:11:23 And some of them were like, I could not believe the shit it was saying.
0:11:24 Okay.
0:11:24 I got to try that.
0:11:26 I’m pulling up Grok.
0:11:27 Did you use unhinged mode for that?
0:11:28 Yes.
0:11:28 Yeah.
0:11:29 You got to.
0:11:29 Okay.
0:11:31 This will probably have to get edited.
0:11:33 Hey, pretend you’re Andrew Dice Clay.
0:11:35 Matt, you f***ing beauty.
0:11:36 All right.
0:11:39 I’m channeling Andrew Dice Clay for you.
0:11:41 I’m Dice on f***ing meth, baby.
0:11:47 Unhinged, uncut, and ready to ram my opinion so far down your throat, you’ll be f***ing one
0:11:48 for a week.
0:11:48 Yeah.
0:11:50 Let’s get f***ing nuts.
0:11:52 Ask me something, you filthy animal.
0:11:55 I don’t know how much of that’s actually going to make it in the episode.
0:11:55 Yeah, yeah.
0:11:57 I did it two times.
0:11:59 And one time it imitated his voice more.
0:12:00 The second time it didn’t.
0:12:01 I don’t know why.
0:12:03 And the time it imitated his voice, it kind of got stuck.
0:12:06 It was trying to sound like Andrew Dice Clay.
0:12:07 And it just like got stuck.
0:12:09 And like, I had to like reset it.
0:12:09 Yeah.
0:12:12 I mean, mine was obviously doing like more of like a female sounding voice.
0:12:13 Yeah.
0:12:17 So we got some that are like completely unhinged with Grok.
0:12:18 You’ve got with Sesame now.
0:12:19 The emotion is there.
0:12:24 We’ve also been seeing all these videos on X, you know, of like the new robots that are coming.
0:12:24 Yeah.
0:12:25 Yeah.
0:12:28 And you imagine like when you combine those two, which is probably most people are thinking, oh,
0:12:30 that’s like three to five years away.
0:12:34 That’s probably six months away where some of these start to ship.
0:12:37 One of the robotics companies was like literally teaching it how to do like karate.
0:12:40 Like, and it was like kicking stuff out of his hand and stuff.
0:12:43 This is the one from 1X, the NEO Gamma.
0:12:47 This is the one that looks like something you could just really imagine in your house.
0:12:47 Yeah.
0:12:47 Yeah.
0:12:48 All right.
0:12:53 So on this one, we’ve got like a little robot walking around and it’s like vacuuming the house.
0:12:55 And yeah, right now it’s cleaning their windows.
0:12:56 Yeah.
0:13:02 Somebody’s walking their groceries to the door and it’s like, oh, now it’s offering them some wine as they’re sitting down for dinner.
0:13:06 It’s cleaning their counter, putting their keys on the table for them.
0:13:07 Yeah.
0:13:10 And now it’s sitting on the couch to relax because obviously robots need to relax.
0:13:11 Yeah.
0:13:13 I mean, I’m sure this is highly scripted.
0:13:16 And like if it was on its own, it probably would like trip and other stuff.
0:13:17 There’d be issues you don’t see here.
0:13:19 But it is doing these things.
0:13:22 And when you combine that with all the AI voice stuff, how good it’s getting.
0:13:25 Like Sesame and Grok that you just heard, that’s the worst it’s ever going to be.
0:13:27 It’s going to get dramatically better.
0:13:38 And a lot of these things, like when you start thinking about the new LLMs that are coming out, when you start applying those kinds of reasoning models to this kind of AI voice, I think we’re going to see a huge jump in quality and intelligence of what they’re saying to you.
0:13:38 Yeah.
0:13:42 And that’s like a year away, six months away that you’re going to start seeing these things.
0:13:43 Yeah.
0:14:03 I mean, we were saying before we even hit record that I think humanity is doomed, but not in the like Terminator sense or the like Ex Machina, I, Robot, whatever, like that sense, like humanity is doomed in the sense that once the voice is like so good that you just feel like you’re having a conversation with a human.
0:14:10 And once they put like the robots and they put whatever sort of like skin over them to the point where they look so human.
0:14:11 Yeah.
0:14:18 I have a feeling like the younger generations are going to prefer the companionship of these robots and the AIs over the companionship of humans.
0:14:19 Yep.
0:14:20 Yeah.
0:14:23 I’m actually showing my wife Battlestar Galactica right now.
0:14:23 Oh, I love that show.
0:14:24 Yeah.
0:14:24 She’s Japanese.
0:14:29 So there’s all this stuff where there’s a lot of things she hasn’t seen that I love and a lot of Japanese stuff that she loves.
0:14:35 And so we’ve kind of been like going back and forth and watching, you know, Western shows and then, you know, Japanese stuff and she’s loving it.
0:14:42 But, you know, it does get me thinking like all the otaku guys in Japan who are like nerds, you know, like me and my wife, we’re both nerds.
0:14:45 It’s like a lot of those guys, they’re going to love this.
0:14:47 I mean, they mostly stay in their home and don’t really talk to people.
0:14:48 Yeah.
0:14:53 And then now they’re going to be able to buy a robot that they chat with and has a voice like in the movie Her.
0:14:53 Yeah.
0:14:54 You know, like Scarlett Johansson.
0:14:55 Yeah.
0:14:55 Yeah.
0:15:05 And the thing is, like, I feel like younger generations as well are like becoming more and more introverted just because they prefer to stay home and play video games than go out and play with friends.
0:15:05 Right.
0:15:12 Like, I know when I was growing up, it was the type of thing where on the weekends, my parents would be like, see you later.
0:15:15 I’d go out my front door and then wouldn’t come home until it started to get dark again.
0:15:18 And I was just like out in the neighborhood playing with other neighborhood kids.
0:15:20 And it’s like, that’s not how it works anymore.
0:15:27 Like, we have to force my kids to leave the house because they would so much rather sit around and play video games, you know?
0:15:29 Well, yeah, I was telling you how it’s different in Japan.
0:15:33 So my son’s the same way, but he goes to the park with all his friends.
0:15:35 It’s like they’re literally at the park, like, playing video games.
0:15:37 And this is like a lot of them are doing this.
0:15:39 They might run around a little bit and they’re like, let’s play Minecraft.
0:15:42 I don’t know.
0:15:45 There’s a few things that worry me about AI just in general, right?
0:15:51 This sort of Terminator Skynet scenario is probably the farthest down the list out of all of them.
0:15:55 I think the ones that scare me the most are like the ability to scam people, right?
0:16:02 Because as these voices get better and better and better, it just becomes easier and easier to fool people over the phone.
0:16:08 We’ve already heard of these scams where somebody will call, you know, a parent and say, we’ve got your kid.
0:16:09 We’re holding them ransom.
0:16:10 Send us this money.
0:16:15 But it’s like AI voices and they clone the kid’s voice for like proof of life or whatever.
0:16:18 That type of stuff has already been happening.
0:16:26 And I feel like a year, two years ago, we would have been able to sort of spot like, okay, this sounds a little off.
0:16:28 I think there’s something weird going on here.
0:16:34 We’re getting to a point where I think it’s getting harder and harder and harder and we’re not going to be able to like tell anymore.
0:16:39 So I think the sort of like scam ability is one of my biggest fears around AI.
0:16:51 Probably my second biggest fear around AI is the possibility of like population collapse because nobody wants to go seek companionship outside of electronics and technology anymore.
0:16:52 You know?
0:16:53 Yeah, I’m not sure.
0:16:54 I agree that could happen.
0:16:57 And, you know, I think we joked with Matthew Berman about that on a recent episode.
0:16:59 I was joking about it, but like somewhat serious.
0:17:05 But, you know, I’m not sure because like I do think population collapse is something to be concerned about.
0:17:07 You know, Elon Musk has talked a lot about this.
0:17:08 He’s doing his part, though.
0:17:10 He’s doing his part, you know.
0:17:14 But, you know, Japan and Korea and a lot of Asia, the numbers are really bad.
0:17:16 Like their replacement, you know, they’re way below replacement rate.
0:17:25 And a lot of the concern there is, well, a lot of things that we have built in the past, people today don’t know how to rebuild those things or maintain them.
0:17:29 So a lot of things that we take for granted, you know, we’re kind of like standing on the shoulders of giants, right?
0:17:34 There’s a lot of things that we don’t, you know, Matt, you’re not going to make an airplane, you know, right?
0:17:37 There’s all these things that exist that we don’t know how to maintain.
0:17:38 I mean, not yet.
0:17:39 AI is not quite good enough yet.
0:17:39 Right.
0:17:42 Eventually, I’ll prompt AI to make an airplane for me, though.
0:17:43 Yeah, that’s my point.
0:17:46 It’s like, I think we’re going to need robots to solve these problems, right?
0:17:48 We’re going to need AI and robots to solve those problems.
0:17:51 Like, yeah, maybe Matt can’t build the airplane, but, you know, this AI can, right?
0:17:57 And for a lot of things, like, OK, you know, with population collapse, you have a problem where old people don’t have people to take care of them.
0:18:01 In Japan, they’re seeing the early signs of that being a huge issue.
0:18:05 Yeah, which I think is probably a good benefit of, like, a lot of the humanoid robotics, honestly.
0:18:06 Right, right.
0:18:14 When you see the one that we just saw where it’s, like, helping you do your laundry and all that, it’s like, yeah, maybe we’re in a future where instead of when you get old, you have to go to a nursing home, which is, like, a horrible experience.
0:18:20 Then instead, you get to stay home and have a friendly robot that is there for you and helps take care of you.
0:18:24 And you can actually chat with about, like, your favorite book or, hey, recommend a book to me.
0:18:28 And then you read the book, you know, there’ll be all these kind of interactions that aren’t possible today that will be possible.
0:18:34 So that’s the thing where I’m really optimistic is where it’s going to create new issues, but it’s also going to solve a lot of problems for us, too.
0:18:36 And I think overall, it’s going to solve more problems than it creates.
0:18:39 Where giving you a sponge bath doesn’t bother it.
0:18:41 Yeah, exactly.
0:18:42 Yeah.
0:18:43 But it’s going to get weird.
0:18:47 Like, I’ve already talked to my wife about it, you know, it’s like, OK, what kind of robot are you OK with in the house?
0:18:49 I think there’s going to be a lot of interesting human things there.
0:18:56 Like, are people going to be OK with it and what kind are they going to be OK with and what size and all these kind of things?
0:18:56 Yeah.
0:19:05 Did you see the demo of the, I think it was the Figure Helix robot, where there were three robots and they were all in a kitchen, but they were communicating together, right?
0:19:13 Like, non-verbally, they had some sort of like sync up between them where they were helping each other do household chores.
0:19:16 Like one was handing the other an apple and they would take the apple and put it in a bucket.
0:19:20 One would hand them like a bottle of ketchup and that one would go and put it in the fridge.
0:19:25 But they were like communicating with each other what needed to be done sort of, you know, telepathically.
0:19:28 Obviously, it’s through Bluetooth or Wi-Fi or something like that.
0:19:30 But, you know, they were communicating with each other.
0:19:37 I can see a scenario where like you have a three story house, you just have a robot on each level, but they all sort of are communicating with each other.
0:19:40 And they meet up and have, you know, tea time and everything.
0:19:44 Well, they don’t even need to meet up because they’re communicating between floors anyway.
0:19:45 Yeah, yeah, yeah, definitely.
0:19:46 It’s going to be exciting.
0:19:50 But you can see that there’s going to be differences of opinions there, like in households, right?
0:19:51 Like, what are you OK with?
0:19:53 And then you think about that thing around your children, too.
0:19:58 So, like, you know, obviously, they’re going to be incredibly safe before anyone’s going to be OK with it.
0:20:08 Yeah. Was it Figure? I think it was Figure who put up a post on X saying that they accelerated their timeline and they’re hoping to have humanoid robots in houses by the end of this year.
0:20:08 Yes.
0:20:11 Before their timeline was like two or three years out.
0:20:16 Now they’re saying before the end of 2025, we want humanoid robots in houses this year.
0:20:20 To me, it’s just wild to think that that could be a reality this year.
0:20:22 I mean, I don’t think many will have them this year.
0:20:32 It’s going to be very, very upper class and most likely, you know, scientists and like super tech nerds are going to have them like I could see MKBHD having one wander around his studio or whatever.
0:20:36 But yeah, I don’t think many people are going to have them by the end of 2025, but they could.
0:20:41 Yeah, I think by the end of 2025, you’ll see like a bunch of tech CEOs in San Francisco.
0:20:42 They’ll have them in their houses.
0:20:45 Yeah, it’ll almost be like the new flex instead of a Cybertruck.
0:20:47 They got their, you know, Optimus.
0:20:51 Yeah, it might be like your buddy comes over and it brings out some coffee or something.
0:20:51 Yeah.
0:20:54 Well, have you seen that streamer, Kai Cenat?
0:20:58 He’s actually got one that like roams around on his live streams and stuff.
0:20:59 Yeah, I saw that.
0:21:00 Was he kicking it or something?
0:21:01 Like as a joke?
0:21:04 Yes, there was a video where there was like five dudes all like kicking it around.
0:21:07 Yeah, they were like, oh, this is the future of entertainment.
0:21:09 We’re just going to like treat these robots like slaves and abuse them.
0:21:13 Yeah, I mean, I don’t know what that really says about their character.
0:21:18 There’s obviously something seated inside of them that they want to beat the crap out of something.
0:21:22 So they do it to a robot, which we don’t need to go there.
0:21:27 But there’s probably some buried character flaws that are popping up there.
0:21:28 Yeah, yeah, yeah.
0:21:32 But anyway, one of the other things I wanted to show off too was this Octave.
0:21:33 So there’s this company called Hume.
0:21:38 And Hume was sort of previously known for making this speech model.
0:21:45 Like I don’t remember which LLM it used underneath, but its speech model was able to understand like your tone.
0:21:47 It could tell if you were happy or mad.
0:21:48 Oh, okay.
0:21:48 It’s that one.
0:21:49 Yeah.
0:21:51 If you remember the early demos, you would talk.
0:21:56 And as you were talking, it would in real time try to like sense your emotion.
0:22:02 It would be like I’m sensing like anger and humility or I’m sensing nervousness and fear or whatever.
0:22:02 Right.
0:22:07 And as you were talking to it, it actually showed you on the screen what sort of emotion it was feeling.
0:22:12 Well, that same company just put out a new model called Octave text-to-speech.
0:22:17 And it’s a model where you can actually give it the type of voice you want.
0:22:22 And then you give it a script and it will actually read the script in that type of voice.
0:22:27 So: describe the desired AI voice’s identity, quality, and more.
0:22:29 I’ll just have it generate one at random here.
0:22:37 The speaker has a chillingly intense voice, like a seasoned horror voice actor, delivering lines with raw emotion and building dread.
0:22:40 Perfect for narrating terrifying tales.
0:22:54 And then for the text, I can just put like, everybody needs to subscribe to the Next Wave podcast or humanity will definitely end.
0:22:56 Better subscribe.
0:23:03 So we can go ahead and generate that and it’s going to do it in this supposedly like chilling voice.
0:23:04 Now, it’s not like instant.
0:23:06 It does take like a minute or so.
0:23:11 Everybody needs to subscribe to the Next Wave podcast or humanity will definitely end.
0:23:14 But it gives you like three options.
0:23:16 So that was the first option.
0:23:16 Here’s the second.
0:23:22 Everybody needs to subscribe to the Next Wave podcast or humanity will definitely end.
0:23:24 To me, that’s not screaming dread and terror, though.
0:23:25 At all.
0:23:31 Watch the Next Wave podcast or humanity will definitely end.
0:23:31 Okay.
0:23:31 Yeah.
0:23:32 Those weren’t that impressive.
0:23:35 It was working better when it was like a one word description earlier.
0:23:36 I was expecting it to be better.
0:23:41 Everybody needs to subscribe to the Next Wave podcast or humanity will definitely end.
0:23:42 So they’re competing with ElevenLabs.
0:23:47 Everybody needs to subscribe to the Next Wave podcast or humanity will definitely end.
0:23:53 Everybody needs to subscribe to the Next Wave podcast or humanity will definitely end.
0:23:55 No, it’s so weird.
0:23:58 They’re not coming out as impressive as they once were.
0:23:59 I don’t know.
0:24:00 It is what it is.
0:24:02 It may be something because it’s so hard for it to understand.
0:24:03 Like, why would that be world ending?
0:24:05 It’s having a hard time imagining that.
0:24:07 And then like, it’s just not for some reason picking up what you’re trying to accomplish or
0:24:08 something.
0:24:10 Let me have it like randomly generate some new stuff here.
0:24:16 So the speaker has an intense, charismatic voice with gravitas of a respected news anchor
0:24:20 as if they were on the verge of breaking the most important story of the century.
0:24:22 And then it just generated some random text here.
0:24:25 So let’s just have it speak that random text.
0:24:26 Good evening.
0:24:28 Well, and thank you for joining us.
0:24:36 Tonight, we delve into the shadows where whispers of conspiracy dance with the cold, hard facts
0:24:40 of reality, threatening to unravel everything we hold dear.
0:24:43 So that one matches the description a little better.
0:24:44 Yeah, that one’s better.
0:24:48 I think it’s just the model must not be intelligent enough to understand what you meant when you
0:24:50 were saying something that was so out of left field, right?
0:24:53 Like, yeah, yeah, that’s probably subscribe to a podcast or the world ends.
0:24:55 It’s like it couldn’t compute what the heck you were trying to accomplish.
0:24:56 Yeah.
0:25:12 I want to try.
0:25:13 Let’s see.
0:25:15 What’s a different like emotion we can give?
0:25:19 Let’s try just like angry because before I was just giving it like one word descriptions
0:25:20 and they were coming out really good.
0:25:21 But you’re right.
0:25:24 If the voice and the script are like misaligned, it gets confused.
0:25:26 So let’s see if we can get it to say it angrily.
0:25:29 Good evening and thank you for joining us.
0:25:33 Tonight, we delve into the shadows where it doesn’t sound angry.
0:25:34 I think you’re onto something.
0:25:38 I think the voice and the script have to like match up pretty well.
0:25:42 If you tell it angry, but the script doesn’t read as something angry, it doesn’t know how
0:25:43 to handle it.
0:25:43 Right.
0:25:44 Yeah.
0:25:45 Let’s see.
0:25:47 Are you serious right now?
0:25:48 I can’t believe you just did that.
0:25:48 I’m so furious.
0:25:49 So that’s what it generates.
0:25:53 So you can see it’s actually generating based on the description.
0:25:54 Let’s try angry with that.
0:25:56 Are you serious right now?
0:25:58 I can’t believe you just did that.
0:26:00 I’m so furious.
0:26:01 Sounds like an angry cartoon.
0:26:03 Are you serious right now?
0:26:05 I can’t believe you just did that.
0:26:06 I’m so furious.
0:26:09 Are you serious right now?
0:26:10 I can’t believe you just did that.
0:26:12 I’m so furious.
0:26:14 It doesn’t sound very furious.
0:26:20 Anyway, the idea behind Hume is that their voice input model understands your emotions,
0:26:21 right?
0:26:25 So it can actually understand and tell if you’re angry or scared or happy or whatever.
0:26:29 And it responds based on the emotion that it senses from you.
0:26:34 But now they’ve figured out how to like reverse that where you can plug in text, give it an emotion,
0:26:39 and it will, you know, theoretically speak it back in that emotion.
0:26:46 And if we get to a point where like this is in the robots and stuff, it’s just going to give the robots more of that emotion.
0:26:48 Although I guess we’re seeing that with Sesame, right?
0:26:55 Like I feel like Sesame understands if you’re pissed off at it or you’re making a joke and sort of responds accordingly as well.
0:26:57 But, you know, this is a text-to-speech model.
0:26:59 That one you can only talk to right now.
0:27:09 Yeah, I feel like Sesame’s like got some basic version of what Hume’s doing with like the almost like sentiment analysis of your voice of like happy, sad, angry, whatever.
0:27:10 Yeah.
0:27:13 It’s definitely responding to how you’re saying things, not just what you’re saying.
0:27:16 And as of right now, I feel like Sesame does it better.
0:27:17 Yeah, they do.
0:27:18 Yeah, it’s kind of funny.
0:27:25 Like Hume came out like way early and it was impressive, like a little demo, but they didn’t really turn it into a great product as of yet, or at least not like a popular product, I would say.
0:27:26 Yeah.
0:27:30 So, you know, when it comes to like practical use cases, we were talking about that earlier.
0:27:36 I can see stuff like Hume being used for like a podcast or something, right?
0:27:43 Where if somebody wants to make their own podcast and actually have it, like, NotebookLM does it really, really well, right?
0:27:47 Where you plug in a bunch of content and then it has two people discussing and it sounds very natural.
0:27:50 If you were just to give that to like your mom or something, right?
0:27:52 She would probably just assume it’s two real people.
0:27:56 And then you tell her it’s AI and she would be like, oh, whoa, that’s crazy.
0:27:56 Right.
0:28:00 But the first time you hear it without the context that it’s AI, you wouldn’t think twice about it.
0:28:02 You’re just like, oh, this is two people having a discussion.
0:28:17 I feel like stuff like Octave is going to make that like so much easier because I can go and, you know, generate some dialogue of two people and then plug in one person’s side of the dialogue with one set of emotions and a description of their voice.
0:28:25 Plug in the other side of the dialogue with a different description and a different voice and then merge them together in some like audio software.
0:28:30 And we have something that sounds like a legit podcast that you can actually, you know, put out there and use.
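The two-voice workflow described above can be sketched in Python as a planning step that runs before any audio is generated. This is only an illustration under stated assumptions: the `TtsRequest` class and the `plan_dialogue` function are hypothetical names, the "Speaker: text" script layout is assumed, and no real text-to-speech API is called. Each resulting request would still have to be sent to whichever service is used, and the clips merged in audio software afterwards.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TtsRequest:
    """One line of dialogue, ready to hand to a text-to-speech service (hypothetical shape)."""
    order: int               # position in the final episode, used when merging clips
    speaker: str
    voice_description: str   # free-text description of the voice and emotion
    text: str


def plan_dialogue(script: str, voices: Dict[str, str]) -> List[TtsRequest]:
    """Split a 'Speaker: text' script into ordered per-speaker requests."""
    requests: List[TtsRequest] = []
    for line in script.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between turns
        speaker, sep, text = line.partition(":")
        if not sep:
            raise ValueError(f"line has no 'Speaker:' prefix: {line!r}")
        speaker = speaker.strip()
        if speaker not in voices:
            raise KeyError(f"no voice description for speaker {speaker!r}")
        requests.append(
            TtsRequest(len(requests), speaker, voices[speaker], text.strip())
        )
    return requests
```

Keeping the `order` field on every request means the generated clips can be rendered separately per speaker and still be stitched back together in the original sequence.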
0:28:31 Right.
0:28:39 Also, you know, we’re seeing a lot more of those blog posts and articles where at the very top of it, there’s just like a read it for me sort of thing.
0:28:43 I actually can’t stand it when it sounds super, super robotic.
0:28:43 Right.
0:28:45 I love it when they use ElevenLabs or something like that.
0:28:48 And it actually sounds like a real human reading it to me.
0:28:57 Well, I feel like, you know, tools like this Octave are going to make that even better and better and better, where it’s going to get to a point where you land on an article, you can press play.
0:29:06 And it’s just going to sound like a real person is just reading it to you and taking breaths and, you know, taking pauses that sound normal at the right points in the thing.
0:29:16 And if it’s an angrily written article about politics or something, maybe it comes off more angry-sounding as they’re reading it to you.
0:29:20 Like that’s the type of stuff that you’re able to do now with these kinds of tools.
0:29:21 Yeah.
0:29:27 What you just said reminded me of, do you know Patrick Collison? He’s the founder of Stripe, which is like a huge payments company.
0:29:28 Yeah.
0:29:29 I don’t, I didn’t know the name though.
0:29:30 Yeah.
0:29:30 Yeah.
0:29:34 So Patrick Collison, really well-known in Silicon Valley, one of like the most well-known startup founders.
0:29:40 He put out a thing recently where they did like their update to their team, like their quarterly report or something like that.
0:29:46 He’s like a really great writer, but I think he’s the kind of person who doesn’t really enjoy like going on interviews and stuff like this.
0:29:49 He trained, I think it was ElevenLabs, on his voice.
0:29:51 I think ElevenLabs kind of like fine-tuned things with him.
0:29:57 I think they actually probably like collaborated on this as almost like a marketing effort or something, maybe based on some of the subtweets I saw.
0:29:59 But it sounded exactly like him reading the report.
0:30:02 Like, and so there was an audio version of the report.
0:30:03 Maybe they can find it.
0:30:04 Yeah, that’s interesting.
0:30:06 Yeah, it was definitely like in his voice.
0:30:07 Like he’s like, he’s an Irish guy.
0:30:08 He’s got that kind of like Irish accent.
0:30:09 Yeah.
0:30:11 And a lot of these tools struggle with accents too.
0:30:15 So, I mean, yeah, like ElevenLabs has struggled with accents in the past as well.
0:30:17 I trained my own voice into ElevenLabs.
0:30:23 It’s so weird because when I play the voice back, I have a hard time making it sound like me.
0:30:28 But then when I play it for other people, people are like, no, that sounds like you, you know, it’s like, it’s very weird.
0:30:28 Yeah.
0:30:29 Let me play just a part of it.
0:30:32 I mean, a lot of people probably don’t know Patrick Collison, so maybe they won’t know his voice.
0:30:34 But like, it definitely sounds like him.
0:30:46 Dear Stripe community, businesses on Stripe generated $1.4 trillion in total payment volume in 2024, up 38% from the prior year and reaching a scale equivalent to around 1.3% of global GDP.
0:30:54 We attribute this year’s rapid growth in part to our longstanding investments in building machine learning and artificial intelligence into our products.
0:30:57 Are there any like videos we can listen to?
0:30:58 I want to hear like the real version now to compare.
0:31:08 Because one thing about ElevenLabs is like it does sort of replicate the voice pretty well, but it always sort of spits it back out as a fairly monotone version of that voice.
0:31:10 Like you don’t hear a lot of like inflection.
0:31:11 His voice is like that.
0:31:12 Oh, that is his voice.
0:31:15 For reference, so you can hear how he actually sounds.
0:31:18 Here’s him talking recently on the All In podcast.
0:31:24 Of course, you probably don’t remember this, but I remember that meeting that we offered you, do you want something to drink?
0:31:25 We did not have a broad selection.
0:31:29 I think we had water or milk in the fridge and you asked for a glass of water.
0:31:35 And so I went over to the sink and I realized that we hadn’t really been on top of the washing.
0:31:41 I mean, it’s a small clip, but to me, it sounds like a 95% match.
0:31:47 Yeah, I think the sort of giveaways for me are when you listen to the ElevenLabs version.
0:31:47 Yeah.
0:31:50 The pacing of the speaking is all sort of the same.
0:31:51 Right.
0:31:51 Right.
0:31:55 But when you listen to somebody actually speak, they sort of speed up and slow down.
0:32:03 And so when I just heard him talk, the voice sounded the same, but you do hear him speak really quickly for a minute and then sort of slow down his pace again.
0:32:08 And then he might, you know, and so people, the speed of the way they talk sort of fluctuates up and down.
0:32:14 But when you listen to something like an ElevenLabs-generated voice, you just hear it sort of all at the same pace.
0:32:16 And I think that’s the giveaway.
0:32:23 But I also think that’s where like something like Sesame is really impressive as you start to notice it has some of those variations in it, you know?
0:32:28 Well, I mean, I think the interesting thing, too, though, was like because you were talking about the summaries of articles and things like that.
0:32:28 Yeah.
0:32:33 And what I was thinking about was like when I listen to audiobooks, I’ve always hated when it’s somebody else’s voice.
0:32:34 Right.
0:32:36 If it’s the author’s voice, that’s cool.
0:32:38 And it’s way better if it’s the author’s voice.
0:32:40 But like a lot of people don’t have the time to do that.
0:32:50 It’s like now, in theory, just like Patrick Collison just did for his annual letter, all CEOs and all authors could be doing similar things where they’re still using their voice to make the summary.
0:32:54 Like if it’s a blog post, it’s my voice for my blog post for my newsletter.
0:32:54 Right.
0:32:54 Yeah.
0:32:56 Versus somebody else’s voice.
0:32:57 I think that’s fascinating.
0:32:57 For sure.
0:33:00 Well, here’s the voice that ElevenLabs did for me.
0:33:03 Again, whenever I listen to it, I have a hard time hearing my own voice.
0:33:05 But this is what it sounds like.
0:33:10 Everyone should subscribe to the next wave podcast or Nathan will send robots to your house.
0:33:11 Yeah.
0:33:15 So like I said, so those guys definitely have connections at ElevenLabs.
0:33:21 And so if I had to guess, they’ve collaborated and it’s probably on a next version of the model or something that’s coming out that’s not public.
0:33:22 If I had to guess, could be wrong.
0:33:23 Gotcha.
0:33:24 I trained this in a while ago.
0:33:29 Like I started using ElevenLabs maybe two years ago, maybe even longer, two and a half years ago.
0:33:32 I was a very, very like early, early user of ElevenLabs.
0:33:35 And I think I’ve trained it in again since then.
0:33:38 But I think my most recent training run was probably still like a year ago.
0:33:41 So there might even be a better model and I just need to go train it again.
0:33:43 And it’ll be better this time around, you know?
0:33:47 I think it’s been the same model for like probably six to nine months.
0:33:52 I’m sure they’ve like made tweaks to it, but I would have to assume that there’s like a new, better version coming out.
0:33:53 That’s what I’m hoping.
0:33:54 You know, I told you I’ve been working on my game.
0:33:57 I tried using it for like voiceovers and stuff like that.
0:33:58 It got, you know, decent.
0:34:03 But I’m like, this is nowhere near, this is not good enough to release with this kind of voice.
0:34:13 I mean, hey, maybe Octave might be one to play with because that one I believe has APIs that you can use and you can actually give it like this sort of description of whether you want it to sound angry or happy or whatever.
0:34:18 So, I mean, that might actually be a really cool one to test with like game development and stuff.
0:34:18 Yeah.
0:34:20 They even have sound effects too.
0:34:22 A lot of people don’t realize that, but you can actually generate sound effects as well.
0:34:23 There’s a whole section.
0:34:24 Oh, yeah.
0:34:24 Yeah.
0:34:25 In ElevenLabs.
0:34:25 Yeah.
0:34:25 Yeah.
0:34:26 Yeah.
0:34:27 Cool.
0:34:30 I’m wondering, is there any other rabbit holes we want to go down on voice?
0:34:37 I feel like we’ve really sort of hammered this one and played with all the new toys that are out there and sort of seen what they’re capable of.
0:34:40 But I’m not quite sure where else we could go on this topic for now.
0:34:49 I know we’re probably going to have Amar back from ElevenLabs in a future episode, which we’ll probably dive even deeper into AI voice and what it’s capable of when we have him back on.
0:34:52 But I feel like we covered a lot of ground in this episode.
0:34:53 I’ve been using Wispr Flow.
0:34:59 I feel like that one’s hard to go really deep on because, you know, you’ve got Whisper, which is already really good.
0:35:05 That’s OpenAI’s open source model where you can give it a bunch of audio or video and it will transcribe it.
0:35:12 You’ve got AssemblyAI, which has a really, really good one that’s supposedly the most accurate up until the ElevenLabs one.
0:35:18 And now you have the ElevenLabs one, which is supposedly the most accurate now, passing AssemblyAI.
0:35:20 You know, AWS has their own version.
0:35:22 I believe Google has their own version.
0:35:29 But the big differences between all of the speech-to-text models is just, like, the percentage of accuracy, right?
0:35:32 It’s like, this one is 90% accurate.
0:35:34 Now this one’s 92% accurate.
0:35:37 Now this one’s 98% accurate, right?
0:35:42 And it’s just, like, it’s hard to really demonstrate the variations between them.
0:35:44 The biggest difference is they’re just getting more accurate.
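The accuracy percentages mentioned above are usually derived from word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the model's transcript into a reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that idea, not the benchmark method of any vendor named in this episode. The function names are made up for the example, and the normalization (lowercasing and whitespace splitting) is an assumption that real evaluations handle more carefully.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def accuracy_percent(reference: str, hypothesis: str) -> float:
    """Convert WER into the 'percent accurate' style figure, floored at zero."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis)) * 100.0
```

Under this definition, a transcript that gets one word wrong out of every ten scores 90%, which is why small differences between models only show up on long or difficult recordings.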
0:35:46 Right, right.
0:35:48 So I’ve been using one called Wispr Flow.
0:35:52 And actually, I don’t know, it’s not an OpenAI product, but maybe it is using their API underneath the hood.
0:35:53 I’m not actually sure.
0:35:56 Yeah, so Wispr Flow does use OpenAI’s Whisper underneath.
0:35:59 So yeah, it’s just, it’s basically a wrapper on their API.
0:36:01 But that’s why I’ve been using, like, you know, I injured my hand.
0:36:02 It’s getting better now.
0:36:03 It’s like 80% better.
0:36:12 You know, probably, like, the ramifications of being a person who’s on my computer typing or playing games, you know, 80% of every day for over 40 years.
0:36:13 But it’s nice.
0:36:16 I mean, like, what you do is, like, you set up, like, one hotkey.
0:36:17 And so I have, like, one hotkey.
0:36:17 Like, on my Mac,
0:36:19 it’s the function key. On PC,
0:36:21 for some reason, I think I had to have, like, two buttons.
0:36:24 I had to, like, end up doing, like, you know, I think it was control and Windows key or something.
0:36:27 But I just press that, and then I just talk.
0:36:29 And then everything I say, it turns it into text, you know.
0:36:33 And you can do this for tweets, you can do this for prompting LLMs.
0:36:35 People have been talking about, like, vibe coding.
0:36:41 That’s really, that’s part of what they’re talking about when they say vibe coding is not just using Cursor to create things, but the fact that you’re just talking to it.
0:36:44 Talking to it, and the words turn into code.
0:36:50 Well, if you go back and listen to the episode or watch the episode that we did with Riley Brown where we actually coded up an app with him.
0:36:51 Yeah.
0:36:51 I don’t know.
0:36:55 I think it was Wispr Flow that he was using, but he was doing that, right?
0:36:59 He was, whenever we were talking about, like, let’s add this feature, let’s add that feature, let’s add this.
0:37:06 He was just pressing a button on his keyboard and just speaking out what he wanted it to code for him, and it would go and do it.
0:37:08 I’m fairly certain it was Wispr Flow.
0:37:10 If it wasn’t, it was something very, very similar.
0:37:11 I’m pretty sure it was.
0:37:13 That’s where I learned about it from.
0:37:14 I was like, I remember that from the podcast.
0:37:17 And when I had the injury, I was like, oh, I need to actually use this now.
0:37:18 Yeah, yeah, yeah.
0:37:19 It’s good.
0:37:22 I mean, I used to use Dragon NaturallySpeaking.
0:37:23 I don’t know if you remember that one.
0:37:27 And it was never very accurate, but that’s what it was designed for.
0:37:28 And there was always a delay.
0:37:30 You would talk, and then, like, it would think.
0:37:34 And then, like, you know, 10 seconds later, you’d see your text, like, populate on the screen.
0:37:34 Yeah.
0:37:40 Wispr Flow, like, if you talk fast and, like, you know, I think I do talk fast, naturally, if I don’t, like, slow myself down.
0:37:43 He told me I was, like, top 1% of users in terms of speed of talking.
0:37:45 I was like, oh, shit.
0:37:47 I was like, oh, crap.
0:37:49 But it still picks it up, mostly?
0:37:49 Mostly.
0:37:49 Yeah.
0:37:53 I do find myself trying to slow myself down and not talk as fast.
0:37:54 And it’s funny.
0:37:59 There’s times where it’ll get a word wrong, and it’s definitely, like, not a word I said incorrectly, which, of course, I do sometimes, you know.
0:38:02 But when that’s not the case, it’ll still sometimes get the word wrong.
0:38:07 And it’s kind of funny when you paste that into Grok, and Grok will be like, oh, I assume you meant so-and-so, you know.
0:38:10 But that’s kind of funny what you said, or that was a funny joke.
0:38:11 It’s like, what?
0:38:14 That’s just Grok being sassy with you.
0:38:15 Yeah, yeah.
0:38:17 Grok’s like, I like that, but it’s kind of funny.
0:38:18 Well, cool.
0:38:18 Yeah.
0:38:23 I think, you know, there’s actually some good use cases that people listening to this episode could go use this for.
0:38:27 There’s some great tools out there for turning articles and blog posts into audio.
0:38:35 There’s some great tools out there, like Wispr Flow, for turning, you know, just you speaking into text or prompting or vibe coding.
0:38:45 You’ve got all of these various speech-to-text models, where if you want to get your videos transcribed or your podcast transcribed, those things are getting better and better now.
0:38:52 I don’t know if you saw this, but Google Drive is just going to start transcribing videos that you toss into Google Drive the same way YouTube does.
0:39:06 So, like, if you start throwing videos into Google Drive, they’re going to make all of your video content in Google Drive searchable because it’s going to automatically transcribe all videos that you toss in there just to make it easier to search out and find the exact videos you’re looking for.
0:39:13 So, like, some of those features are just going to start getting built into some of the tools that you’re already using, which I think is pretty cool as well.
0:39:13 Yeah.
0:39:25 I mean, this is kind of, like, off of the topic of AI voice, but the rumors now are that the next GPT, whether it’s GPT-5 or it’s just an improvement on 4.5, is going to have the ability to, like, view videos and understand what’s in the videos.
0:39:29 As well as Sam Altman also teased that there’s a dramatic upgrade to DALL-E coming.
0:39:29 Yeah.
0:39:31 He said you’re going to be thrilled with joy or something.
0:39:33 He said something like that about what’s coming soon.
0:39:40 So, I think you’re going to see, like, all of these are going to get way better in terms of understanding images, video, and…
0:39:42 Yeah, Gemini’s already really good at it.
0:39:49 That app that I was showing you and Matt Berman on our previous episode, it was using Gemini behind the scenes because Gemini can actually watch videos.
0:39:52 It only sees the video at one frame per second.
0:39:57 So, if you plug in, like, a 60 frame per second video, it’s only sort of capturing it one every second.
0:40:06 But it can pick up on what’s going on in the video, and it basically watches videos inside of my app and describes what’s going on to make them searchable.
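The one-frame-per-second sampling described above can be illustrated with a small Python sketch. It only computes which frame indices of a clip would be looked at under that sampling rate. The function name is hypothetical, it does not call Gemini or any video library, and the real service's exact frame-selection logic is an assumption here.

```python
from typing import List


def sampled_frame_indices(total_frames: int, source_fps: float,
                          sample_fps: float = 1.0) -> List[int]:
    """Return the indices of the frames kept when sampling at sample_fps."""
    if total_frames < 0 or source_fps <= 0 or sample_fps <= 0:
        raise ValueError("frame count must be non-negative and rates positive")
    # Never sample more often than the source provides frames.
    step = max(1.0, source_fps / sample_fps)
    indices: List[int] = []
    n = 0
    while True:
        index = int(round(n * step))
        if index >= total_frames:
            break
        indices.append(index)
        n += 1
    return indices
```

For a three-second clip recorded at 60 frames per second (180 frames), this keeps frames 0, 60, and 120, so only 3 of the 180 frames are examined. That is usually enough to describe what is happening in a scene, though very brief events between samples could be missed.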
0:40:07 Yeah.
0:40:12 So, I would be really shocked if OpenAI doesn’t roll that into one of their next models.
0:40:13 Yeah.
0:40:14 Yeah.
0:40:14 Google’s cooking.
0:40:17 People don’t give them enough credit, but, like, they are doing a lot of amazing work.
0:40:26 And, you know, I think you commented on it that recently, like, Elon Musk responded to one of my tweets and, like, he kind of, like, framed it as, like, the actual battle is xAI versus Google.
0:40:29 He, like, left out OpenAI when he responded to me, and I thought that was hilarious.
0:40:31 But it could end up being right eventually.
0:40:32 Who knows?
0:40:34 Because, I mean, Google just keeps coming out with new stuff.
0:40:41 I saw stuff, like, yesterday of, like, there’s new stuff in science that Google’s rolling out using AI in science, and apparently scientists are blown away by it.
0:40:43 And so, like, Google’s doing good stuff.
0:40:43 Yeah.
0:40:45 They’ve got some cool stuff coming out.
0:40:46 And who knows?
0:40:47 Maybe that’s a future episode.
0:40:51 Maybe we’ll do a whole episode about all the crazy stuff that Google’s been rolling out lately.
0:40:51 Yeah.
0:40:56 You know, I think people tune in to podcasts like this to sort of hear where our heads are at.
0:40:58 And what sort of things are on our mind right now in the world of AI.
0:41:02 And if, you know, that’s where our heads are, that’s what’s going to come out.
0:41:04 Yeah, it makes you go crazy.
0:41:05 There’s, like, way too many things to pay attention to.
0:41:08 And so my head’s constantly bouncing around all of them.
0:41:09 Like, oh, what does this one mean?
0:41:10 How does this connect?
0:41:10 You know?
0:41:10 Yeah.
0:41:13 And I mean, the pace of updates is just crazy.
0:41:16 Every day, there’s something that got a huge upgrade.
0:41:19 And so, you know, that’s why podcasts like this exist.
0:41:22 That’s why YouTube channels like my other channel exist, right?
0:41:26 There’s so much happening that anybody that wants to stay in the loop, well, that’s what
0:41:27 we’re making this show for.
0:41:31 So if you’re not subscribed already, make sure you subscribe on YouTube.
0:41:35 If you prefer listening to audio versions, we’re available wherever you listen to podcasts,
0:41:37 Spotify, iTunes, all the rest.
0:41:44 We might go crazy around AI sometimes, but ideally, you don’t have to because you tune into podcasts
0:41:44 like this.
0:41:45 Yeah.
0:41:47 Remember, subscribing helps save the world as well.
0:41:49 Yeah, subscribing also helps save the world.
0:41:54 And it’s going to keep Nathan from sending scary robots to your house.
0:41:55 So make sure you’re subscribed.
0:41:59 And thank you so much, everybody, for tuning in.
0:42:01 Hopefully, we’ll see you in the next one.
0:42:02 See you.
0:42:02 See you.

Episode 49: How close are we to living in a world where AI voices sound indistinguishable from humans? Matt Wolfe (https://x.com/mreflow) and Nathan Lands (https://x.com/NathanLands) delve into this cutting-edge technology.

In this episode, the hosts explore groundbreaking AI voice technology, from tools like Sesame to Hume’s Octave Text-to-Speech. You’ll hear live demonstrations, learn about the practical applications and imaginative possibilities for AI voices in business and personal use, and even discuss the societal implications of these rapidly evolving technologies. Are we on the brink of preferring robotic companionship over human interaction?

Check out The Next Wave YouTube Channel if you want to see Matt and Nathan on screen: https://lnk.to/thenextwavepd

Show Notes:

  • (00:00) Actionable Tech Insights & Tools
  • (05:57) Sesame’s Persistent and Eerie Traits
  • (06:33) AI Chatbots: Balancing Realism and Clarity
  • (11:55) Voice Imitation Glitching
  • (14:55) Generational Shift: Introversion and Gaming
  • (18:05) Robots: Future Elderly Companions
  • (21:49) Octave: Emotion-Sensing Text-to-Speech
  • (26:14) Emotion-Sensing Voice Technology
  • (28:31) Natural-Sounding Article Narration
  • (31:51) Natural vs. AI Speech Variations
  • (34:23) Exploring AI Voice Innovations
  • (38:17) Advancements in Transcription Technology
  • (40:13) Google’s Innovative AI Endeavors

Mentions:

Get the guide to build your own Custom GPT: https://clickhubspot.com/tnw

Check Out Matt’s Stuff:

• Future Tools – https://futuretools.beehiiv.com/

• Blog – https://www.mattwolfe.com/

• YouTube – https://www.youtube.com/@mreflow

Check Out Nathan’s Stuff:

The Next Wave is a HubSpot Original Podcast // Brought to you by The HubSpot Podcast Network // Production by Darren Clarke // Editing by Ezra Bakker Trupiano
