Gemini 2.0: 20x Cheaper Than GPT-4?! (DEEP DIVE) | Logan Kilpatrick

AI transcript
0:00:02 (upbeat music)
0:00:06 – Hey, welcome back to the Next Wave Podcast.
0:00:07 I’m Matt Wolf.
0:00:08 I’m here with Nathan Lanz.
0:00:11 Today we’re joined by Logan Kilpatrick,
0:00:15 who is the Senior Product Manager over at Google DeepMind.
0:00:17 And the day we’re recording this episode
0:00:19 is the same day that Google just released
0:00:22 a whole bunch of new AI tools.
0:00:26 Gemini 2.0 Flash, Gemini 2.0 Flash-Lite,
0:00:30 Gemini 2.0 Pro, all sorts of really, really cool stuff
0:00:32 coming out of Google right now
0:00:34 and Logan’s gonna break it all down.
0:00:36 And you’re gonna get a pretty grand overview
0:00:39 of where the AI world is headed according to Google.
0:00:41 So let’s just go ahead and dive right in
0:00:43 with Logan Kilpatrick.
0:00:44 – Thank you so much for joining us.
0:00:46 It’s probably a really busy day.
0:00:47 So I really appreciate you taking the time
0:00:48 to join us today.
0:00:50 – Yeah, I’m excited to catch up with you both
0:00:51 and talk about all things Gemini
0:00:53 and what’s happening in the AI world.
0:00:55 – Well, so this is actually your second time on the show.
0:00:58 So we’ve already kind of dove into some of the backstory
0:01:00 and introduced people to you in a past episode.
0:01:02 So let’s just jump straight into it.
0:01:06 Can you break down what is 2.0 Flash, 2.0 Flash-Lite,
0:01:08 2.0 Pro, like what are the differences?
0:01:10 What’s better about these models
0:01:12 than what was out prior to them?
0:01:14 – I think this is an exciting moment for us
0:01:16 just because of like the amount of effort and work
0:01:18 that’s gone into bringing Gemini 2.0
0:01:19 actually into the world.
0:01:21 And Matt, you were there, Nathan,
0:01:23 I don’t remember if you were there at I/O last year,
0:01:27 but we announced 1.5 Flash and the long context
0:01:29 and a bunch of this other stuff last May.
0:01:30 So like literally less than a year ago.
0:01:33 And for the last year-ish, Flash has been this
0:01:36 like wild success story for us
0:01:38 of building a model that developers really love.
0:01:41 And a lot of that is rooted in like the right trade-offs
0:01:44 of like cost, intelligence, performance, capabilities.
0:01:46 And if you look at the 1.5 Flash model
0:01:48 and you think about like, how do we do this better?
0:01:50 It’s like, you have to make it more powerful.
0:01:52 You have to give it more capabilities.
0:01:55 You have to do that all while not making it cost
0:01:57 a lot more money for developers.
0:02:00 And it feels like we pulled a rabbit out of the hat
0:02:01 to a certain extent with 2.0 Flash
0:02:04 because like the actual cost for developers,
0:02:06 it was historically like seven and a half cents
0:02:08 per million tokens.
0:02:09 Now it’s 10 cents.
0:02:11 So the blended cost is actually less for this model.
0:02:13 And we did all that while like,
0:02:15 this model is actually better than Pro.
0:02:16 It has all these capabilities.
0:02:18 It’s natively agentic.
0:02:21 It has search built into it and code execution built into it.
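The "blended cost" arithmetic here is easy to sketch. A minimal Python illustration, using placeholder per-million-token prices (loosely modeled on the shape of the public pricing tiers; check the official Gemini pricing page for real numbers):

```python
# Sketch: a higher headline price can still mean a lower blended cost if
# the old pricing doubled above a context-length tier and the new pricing
# is flat. Dollar figures are placeholders, not official pricing.

def input_cost(prices: dict, short_m: float, long_m: float) -> float:
    """Input-token spend in dollars; token counts are in millions,
    split into short-context and long-context traffic."""
    return prices["short"] * short_m + prices["long"] * long_m

old = {"short": 0.075, "long": 0.15}  # tiered: 2x above a context cutoff
new = {"short": 0.10, "long": 0.10}   # flat at any context length

short_m, long_m = 6.0, 4.0  # a mix with 40% long-context traffic
print(round(input_cost(old, short_m, long_m), 2))  # 1.05
print(round(input_cost(new, short_m, long_m), 2))  # 1.0
```

Under this mix the flat price comes out cheaper overall even though the headline rate went from seven and a half cents to ten cents, which is roughly the shape of the "blended cost is actually less" claim.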
0:02:23 – Yeah, it’s just exciting for me as a developer.
0:02:27 Like I think ultimately you remove the cost barriers
0:02:29 and all these other things for people to build
0:02:30 really cool stuff.
0:02:31 And like that’s what enables the world
0:02:33 to make really cool products.
0:02:34 So I’m super excited.
0:02:36 So that’s the sort of headline for Flash
0:02:37 is better, faster, cheaper,
0:02:40 which continues to be sort of the tagline.
0:02:41 I need to get my Gemini t-shirts
0:02:43 that say better, faster, cheaper on them.
0:02:46 – I read that it’s better than GPT-4o,
0:02:47 but like 20 times cheaper.
0:02:49 Is that like roughly correct?
0:02:51 – Yeah, which is just crazy to me.
0:02:53 And I think like we’ve got a lot of work to do.
0:02:56 I think one of the dimensions of the Gemini story
0:02:58 is like we continue to put out really great models.
0:02:59 I think we need to do a great job as well
0:03:01 of like going and telling the world
0:03:03 about this technology that we’re building.
0:03:05 ‘Cause I don’t think people really understand
0:03:07 and actually for a lot of developers,
0:03:08 the cost is the reason in many cases
0:03:10 they don’t build the stuff that they want to build.
0:03:13 It’s like, I can’t afford to put this thing into production.
0:03:14 It’s too expensive.
0:03:17 So I think Flash is really important in that story.
0:03:18 But we also landed 2.0 Pro.
0:03:22 We also landed an even cheaper version of Flash, Flash-Lite,
0:03:23 which sort of has the capabilities
0:03:24 pared down a little bit,
0:03:26 but makes it so that we can keep delivering
0:03:30 on that like frontier cost-performance sort of trade-off.
0:03:31 And that model is in preview
0:03:33 and it’ll go GA in the next few weeks
0:03:35 as we iron out the last few bugs.
0:03:38 And I think Pro gives me a lot of excitement
0:03:43 about this whole narrative of pretraining being dead.
0:03:45 An interesting sort of realization
0:03:47 I had after our conversation with Jack Ray,
0:03:49 who’s one of the co-leads for the reasoning models
0:03:53 in DeepMind is there’s this non-linear amount
0:03:57 of extra effort it takes to make the models
0:03:59 continue to get better.
0:04:00 Like you look at like, okay,
0:04:03 what does 3% mean on some benchmark?
0:04:06 Like you think 3% and we think of like the normal world
0:04:10 where 3% is like actually 3% and like in the model world,
0:04:14 3% is actually like a 25% increase
0:04:16 in like the amount of effort that went into this.
0:04:19 But also that 3% is like the difference
0:04:22 between unlocking a bunch of capabilities
0:04:25 and a bunch of use cases that like just didn’t work before
0:04:28 because like the thing failing 3% to 4% of the time
0:04:31 versus not is the difference between you putting AI
0:04:33 into production at your company and not.
0:04:35 So it actually matters a lot.
0:04:36 And that’s why I think we continue to push
0:04:37 on that frontier.
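The point about a few benchmark points unlocking use cases can be made concrete with compounding failure rates. A simplified model, assuming each step of a workflow succeeds independently:

```python
# Why a few points of per-step reliability matter: over a multi-step
# workflow, per-step success compounds. Numbers are illustrative.

def workflow_success(per_step_success: float, steps: int) -> float:
    """Probability an n-step pipeline finishes with no step failing,
    assuming independent steps."""
    return per_step_success ** steps

# A model that fails 4% of the time vs. one that fails 3% of the time,
# across a 20-step agentic task:
print(round(workflow_success(0.96, 20), 3))  # 0.442
print(round(workflow_success(0.97, 20), 3))  # 0.544
```

One point of per-step accuracy becomes roughly a ten-point gap in end-to-end success over twenty steps, which is the difference between a workflow you can ship and one you can't.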
0:04:41 – So what’s the big difference between like these new models
0:04:44 and the last ones as far as like how they were created,
0:04:46 is it like more parameters that are being trained on?
0:04:49 Obviously, the big narrative like you just mentioned
0:04:52 with things like DeepSeek and o3 and things like that
0:04:55 from OpenAI are sort of what happens at inference, right?
0:04:58 When somebody enters a prompt, it does all of this thinking
0:05:00 and that’s really what they’re sort of like pushing on
0:05:02 is like the next sort of breakthrough.
0:05:03 Like what sort of breakthroughs,
0:05:06 what changed between the last models and this one
0:05:07 to make this one so much better?
0:05:09 – Yeah, there’s two dimensions of this.
0:05:12 One, it’s a story of the really difficult work
0:05:14 of doing algorithmic improvements and breakthroughs.
0:05:16 And I think like the team at DeepMind
0:05:17 does stuff way beyond my understanding
0:05:20 as far as how they’re able to make this continue to work.
0:05:23 So I think there’s like core fundamental research
0:05:24 advancements that are happening.
0:05:26 And there’s a lot of like data efficiency wins as well,
0:05:27 which is also exciting.
0:05:30 But as far as like new capabilities of these models,
0:05:34 I think the two big ones is when Gemini was first announced,
0:05:36 it was announced as this model that’s natively multimodal.
0:05:39 And it was natively multimodal in the sort of input sense
0:05:42 that it could really understand the videos,
0:05:45 audios, images that it was being given.
0:05:47 And that was one of the main differentiators.
0:05:50 Today, the model’s actually capable of doing that
0:05:51 on the output sense as well,
0:05:53 which I think was a huge jump for us.
0:05:54 And it actually requires again,
0:05:58 a bunch of like non-trivial amount of engineering work
0:06:00 in order to make the models capable of doing that.
0:06:02 I had an interesting conversation a few weeks ago
0:06:04 with someone on our research team,
0:06:05 which reminded me of this.
0:06:07 Someone asked a question of like,
0:06:09 why does it matter if the models are capable
0:06:12 of natively outputting these multimodal capabilities?
0:06:14 Like we have really great text-to-speech models.
0:06:16 We have great speech-to-text models.
0:06:18 We have great image generation models.
0:06:20 Like why is it cool that the model can do this natively?
0:06:23 And there’s all of these really great examples,
0:06:26 like a calculator versus like, I don’t know,
0:06:29 an AI model that has access to code execution.
0:06:31 Like the code execution version can really like solve
0:06:33 these problems in a really complicated way
0:06:36 that you wouldn’t otherwise be able to,
0:06:38 or at least without the effort that would otherwise be
0:06:40 required of you, the user of the models.
0:06:43 And I think that’s the world of these custom domain-specific
0:06:45 models, like image generation and audio generation,
0:06:48 versus the native capability really feels like
0:06:50 the model can just do the heavy lifting for you,
0:06:52 which is really interesting.
0:06:55 – So right now, can Gemini actually output like an image
0:06:57 if I give it a prompt to generate an image?
0:06:59 Does it generate an image right now?
0:07:00 – Not accessible to everyone yet.
0:07:01 And I think this is the gap.
0:07:03 So we have it internally and folks are using it
0:07:05 in our early access program
0:07:07 and we should get you both early access
0:07:08 to play around with it and test it out.
0:07:11 And we’ll roll it out more broadly soon,
0:07:12 which I’m excited about.
0:07:15 But that sort of same line of thinking
0:07:17 is what takes us to like native tool use as well.
0:07:19 And like native tool use is available to everyone.
0:07:22 And it’s like the model was trained,
0:07:25 knowing how to differentiate questions
0:07:27 that it should go and search the internet for
0:07:28 or questions that it needs to use a tool
0:07:30 like code execution for.
0:07:33 So you get like all of those like silly examples
0:07:34 where the model would be like,
0:07:36 let me try to solve this math problem,
0:07:38 which I know I’m not going to be able to solve
0:07:40 just because you asked me to with code execution,
0:07:42 like it knows it needs to use that tool.
0:07:44 And there’s a whole bunch of verticals
0:07:45 where like the performance goes up
0:07:46 significantly because of that.
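From the API side, "native tool use" roughly means the request just declares the tools and the model decides per-prompt when to invoke them. A minimal sketch of such a request body; the field names are assumptions based on the public Gemini REST `generateContent` API, so verify against the current docs:

```python
# Build a generateContent-style request body that declares tools; the
# model, not the caller, decides per-prompt whether to search or run code.
# Field names are assumptions based on the public Gemini REST API docs.

def build_request(prompt: str) -> dict:
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "tools": [
            {"google_search": {}},   # grounding via search
            {"code_execution": {}},  # let the model write and run code
        ],
    }

req = build_request("What is the 50th Fibonacci number?")
# A math question like this is one the model should route to code
# execution on its own, per the behavior described above.
```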
0:07:49 – So Gemini is actually generating the image.
0:07:51 It’s not going and calling upon like,
0:07:53 Imagen 3 to generate the image.
0:07:55 It’s actually Gemini who’s creating that image
0:07:57 when it does generate an image.
0:07:57 – Exactly.
0:07:59 And I’ll push on getting you both access
0:08:00 after this conversation
0:08:02 because I think the world knowledge piece
0:08:04 really highlights like why this matters.
0:08:06 And there’s like a bunch of examples
0:08:09 that I played around with of like pictures of a room
0:08:11 and like having the image change
0:08:15 based on these like really complex nuanced prompts
0:08:16 around moving objects in certain ways.
0:08:19 Like things that wouldn’t work if it didn’t have world knowledge,
0:08:21 like it understands physics
0:08:22 and understands all these things
0:08:25 that again require the world knowledge piece.
0:08:28 And I think it’s actually there’s some interesting trends
0:08:30 of what is the outcome of like being able
0:08:32 to take other domain specific models
0:08:35 and bring them into these LLMs that have world knowledge.
0:08:38 I think there’ll be some really cool capabilities
0:08:39 that like we’re not thinking of today
0:08:41 that this is going to enable,
0:08:43 which yeah, it gets me excited for people.
0:08:45 – Yeah, it seems like that’s kind of required
0:08:46 for this also to work in like everything
0:08:48 from like gaming to robotics
0:08:50 to it actually having an understanding of the world.
0:08:52 – Oh yeah, 100%.
0:08:54 – Yeah, yeah, I mean, I actually had the opportunity
0:08:57 to go out to London, go visit the DeepMind offices
0:08:59 and got to play with Astra on the phone.
0:09:02 And I mean, that was like the first taste I got
0:09:07 of like an actual useful AI assistant out in the real world.
0:09:10 And I don’t know if that was using Gemini 2.0 Flash
0:09:13 or if that was actually using the full Gemini 2.0 yet
0:09:17 at the time, but it was definitely a very, very impressive model
0:09:20 to actually see the tool use in real life
0:09:21 and just be able to walk around
0:09:24 and it understand images and understand video
0:09:26 and understand audio and understand text.
0:09:29 And it was all built from the ground up
0:09:33 to understand that stuff as opposed to like look at an image,
0:09:36 use OCR to figure out the text on the image
0:09:38 and then pull in the text or listen to the audio,
0:09:41 transcribe the audio to text and then use the text.
0:09:43 It’s actually understanding what it’s seeing,
0:09:46 what it’s hearing, which I think is like one
0:09:50 of the major differentiators about like what Gemini is doing
0:09:52 that you don’t see the other models doing yet.
0:09:54 So it’s super, super impressive.
0:09:56 – Yeah, I think the other piece of this is
0:09:58 other than just the raw capabilities
0:09:59 from a complexity standpoint,
0:10:01 it means that like developers building stuff,
0:10:04 you don’t have to go and like do a ton of scaffolding work
0:10:05 in order to like make this happen.
0:10:07 It’s like the overall complexity of your application
0:10:10 when the model is just able to sort of take in a bunch
0:10:12 of things and put out a bunch of things
0:10:13 makes life incredibly easy
0:10:16 and you don’t have to deal with like frameworks
0:10:17 on frameworks on frameworks.
0:10:19 So much of the agent world is like,
0:10:21 hey, the models actually aren’t that good
0:10:22 at doing some of these things.
0:10:24 And like the way that we supplement that
0:10:27 is by like building a bunch of scaffolding and frameworks,
0:10:29 that’s where a lot of developers are focused today.
0:10:30 And actually there’s going to be a moment
0:10:31 where like all of a sudden the model capabilities
0:10:33 are just like good enough that it kind of works.
0:10:35 And then people are going to be like,
0:10:36 well, why do I have all this scaffolding
0:10:38 that’s doing these things for me
0:10:40 that the models can just do out of the box now?
0:10:42 So it’ll be interesting to see how that plays out.
0:10:45 – Yeah, I know like 2025 is going to be sort of
0:10:46 the year of the agent, right?
0:10:49 That term has already been thrown around quite a bit.
0:10:50 But I also feel like everybody kind of has
0:10:52 a different definition of an agent.
0:10:54 You know, some people will look at something
0:10:56 that like you can build over on make.com or Zapier
0:10:58 where it’s tying different tools together
0:11:00 using APIs as an agent.
0:11:02 But I’m curious, does Google and DeepMind,
0:11:04 do they actually have like an internal definition
0:11:06 of an agent that they’re shooting for?
0:11:08 Do you think we actually have agents now
0:11:09 based on their definition?
0:11:11 Where does Google stand on agents?
0:11:14 – I actually don’t know what our like formal definition
0:11:15 of agents are.
0:11:18 I have tried a bunch of the agent products
0:11:20 and historically haven’t been super impressed
0:11:21 at what they’re capable of.
0:11:22 I think we’re just not there yet.
0:11:25 The thing that I want and I think a lot of users
0:11:28 want this as well is just the models to be proactive.
0:11:32 And like all of the products of today that build on AI
0:11:35 require me to basically change my workflows
0:11:38 or put in extra work or put in extra effort
0:11:41 like through this sort of guise of like,
0:11:42 oh, this is actually going to save you time
0:11:43 if you do this thing.
0:11:45 And like really what I want is like the models
0:11:47 to just be like looking at the stuff
0:11:48 that I give them access to
0:11:50 and like coming up with ways to be useful
0:11:51 and like save me time.
0:11:54 And like, yeah, it’s going to get some things wrong
0:11:56 but like I don’t want to have to be the one
0:11:57 in the driver’s seat all the time.
0:11:59 And it feels like today’s AI agents,
0:12:01 again, because the models aren’t good enough
0:12:03 like can’t be proactive.
0:12:05 Like it requires the proactiveness of the human.
0:12:08 And I think once that role reversal switches,
0:12:11 I think that’s where we see like billion-agent-scale
0:12:14 deployments like all of a sudden just like happening
0:12:14 and working.
0:12:17 And this is also where like things get crazy
0:12:17 again with compute.
0:12:19 Cause again, like actually if you look at
0:12:21 how compute is being used today,
0:12:23 it’s in a lot of cases like this one to one correlation
0:12:27 between like a human input and the token output.
0:12:30 And I think the future is thousands and thousands
0:12:32 of X more usage of AI happening
0:12:34 by the agents themselves than by humans,
0:12:36 which will be fascinating to see play out.
0:12:40 – Hey, we’ll be right back to the show.
0:12:42 But first I want to talk about another podcast
0:12:43 I know you’re going to love.
0:12:45 It’s called entrepreneurs on fire.
0:12:46 And it’s hosted by John Lee Dumas
0:12:49 available now on the HubSpot podcast network.
0:12:51 Entrepreneurs on Fire stokes inspiration
0:12:54 and shares strategies to fire up your entrepreneurial journey
0:12:56 and create the life you’ve always dreamed of.
0:12:59 The show is jam packed with unlimited energy,
0:13:00 value and consistency.
0:13:02 And really, you know, if you like fast-paced
0:13:04 and packed with value stories
0:13:07 and you love entrepreneurship, this is the show for you.
0:13:09 And recently they had a great episode
0:13:12 about how women are taking over remote sales
0:13:13 with Brooke Triplett.
0:13:15 It was a fantastic episode.
0:13:16 I learned a ton.
0:13:17 I highly suggest you check out the show.
0:13:19 So listen to Entrepreneurs on Fire
0:13:21 wherever you get your podcasts.
0:13:26 – Yeah, no, I think what you described
0:13:28 is sort of what I envision of an agent
0:13:29 is almost like predictive.
0:13:32 Like it sort of figures out what you need before you need it
0:13:34 and then make suggestions based on that.
0:13:36 So I think that’s a world that I’m really excited
0:13:37 to get into.
0:13:39 But I do want to touch on the word compute
0:13:41 that you just mentioned for a second
0:13:43 ’cause obviously there was a bit of like a, you know
0:13:45 a freak out, so to speak in the US
0:13:47 when that deep seek model came out
0:13:49 and everybody thought that, well,
0:13:52 this deep seek model uses a lot less compute
0:13:55 than these other models that have been trained.
0:13:57 So therefore, you know,
0:13:59 NVIDIA GPUs are no longer necessary.
0:14:02 And then we saw NVIDIA sort of lose some market share
0:14:03 as a result of it.
0:14:05 But I’m just curious, like what are your thoughts
0:14:06 on the compute?
0:14:07 Because I saw all of that happening
0:14:09 and I thought it was so bizarre.
0:14:11 I was like, this seems like a bullish sign
0:14:13 for NVIDIA to me, not a bearish sign.
0:14:15 Like what’s going on here?
0:14:16 But I’m curious, like what’s your take on
0:14:17 what happened there?
0:14:20 – Yeah, there’s a lot of complexity to that story
0:14:22 and parts of the story that I don’t want to touch on
0:14:25 but the thing that I do want to touch on is like
0:14:26 if you look around the world,
0:14:28 I think this example that I just gave of like
0:14:31 who is in the driver’s seat of using AI today?
0:14:34 And like today it’s humans that are in the driver’s seat
0:14:36 and like we’re just inherently bounded
0:14:38 by the amount of humans who are using AI
0:14:39 because, you know, it just takes a while
0:14:42 for technology to assimilate into culture
0:14:44 and real use cases and all that stuff.
0:14:46 You know, we’re on this exponential right now.
0:14:48 I think as soon as agents start to take off
0:14:51 that exponential becomes a straight line up into the right.
0:14:54 Like I think it’s gonna be pretty profound
0:14:56 because like again, the challenge is
0:14:59 the human process just doesn’t scale
0:15:02 and the agent process is going to scale
0:15:03 which is going to be really interesting.
0:15:05 Like I have 10,000 emails I haven’t read
0:15:06 in the last three months.
0:15:07 – Right, I was thinking emails.
0:15:08 First thing I was thinking of,
0:15:10 I wanted to handle all of that for me.
0:15:11 I don’t want to think about any of it.
0:15:12 – Yeah, it’s gonna be wonderful.
0:15:14 And really like go in and find the things
0:15:16 like I know there’s things that I should be doing
0:15:18 that would create value.
0:15:19 – Missed opportunities.
0:15:20 – 100%.
0:15:22 And there’s like so many of those things
0:15:24 where if you also like think about like
0:15:26 what’s the economic value of all that work?
0:15:29 Like one of the crazy frames of mind
0:15:31 that I look at the world through is like,
0:15:33 you look at the world and like the world is just filled
0:15:34 with all this inefficiency.
0:15:36 And like it’s beautiful in many ways,
0:15:39 but like it’s also this really cool opportunity
0:15:41 if you can be the one to create something
0:15:43 that sort of makes people more productive
0:15:46 and explicitly makes them more productive
0:15:47 at maybe the things they don’t want to do.
0:15:49 And I feel like that’s the other part
0:15:51 of this agent story and this compute story,
0:15:53 which is a lot of the products
0:15:55 that I see people building are actually going
0:15:59 after the things that people really like doing.
0:16:02 And like maybe like shopping is this sort of
0:16:03 tongue-in-cheek example of this.
0:16:05 ‘Cause some people really like shopping
0:16:05 and some people don’t.
0:16:07 But like if I told my girlfriend
0:16:09 that we could never go shopping again together
0:16:13 and we could never go try out different experiences
0:16:15 and go check out the vibe of different stores,
0:16:17 like there’s so much of this is like
0:16:19 such a fundamental part of the human condition actually
0:16:22 of like going and seeing these different places
0:16:24 and like it’s really baked into who we are.
0:16:27 And that’s such like a traditional example
0:16:29 of like what agents are going to do for people.
0:16:30 And it’s odd to me.
0:16:33 I feel like people have like kind of misconstrued
0:16:35 what the value creation is going to be
0:16:36 in some of these examples.
0:16:37 – I mean, I think that’s like the bias
0:16:39 from like a, you know, Silicon Valley nerds
0:16:40 building this stuff, right?
0:16:41 – Yeah, yeah.
0:16:42 – I don’t want to shop.
0:16:43 I just want it to be automated for me.
0:16:46 And it’s like, that’s just like a small subset
0:16:46 of the world.
0:16:47 Like, you know, when I shop,
0:16:49 I just like go in and get what I want.
0:16:50 There’s like two options.
0:16:51 Okay, I like this one better.
0:16:51 I get it.
0:16:54 But my wife, she just loves checking out stores
0:16:55 and different shops.
0:16:56 And like she would hate the idea of like skipping
0:16:58 all of that would just make no sense to her at all.
0:17:01 – Yeah, I think there’s an underlying story here,
0:17:02 which is like, first of all,
0:17:03 there’s a lot of variance in human preference,
0:17:07 but also there’s like ways about going about a certain task
0:17:09 that like make it interesting or not.
0:17:10 And like, I like shopping too,
0:17:13 if it’s like me getting to do it on the terms that I want.
0:17:15 And like, it will be interesting to see like,
0:17:17 how can agents and some of these products
0:17:19 like actually help create that experience?
0:17:21 And this goes back to this DeepSeek narrative
0:17:23 around like the value creation
0:17:24 happening at the application layer.
0:17:26 And it really does feel like this is true.
0:17:28 Like if you look back two years ago,
0:17:29 the narrative was, you know,
0:17:32 all these companies are just wrappers on top of AI.
0:17:33 There’s no value creation.
0:17:35 All the value creation is in the tokens,
0:17:38 like don’t spend your time thinking about these companies.
0:17:41 And it’s so funny how quickly this like flip-flops
0:17:43 back and forth between like,
0:17:45 now all the value creation is at the application layer
0:17:48 and like LLMs are this commodity thing
0:17:49 that no one should think about.
0:17:51 I enjoy watching it all play out.
0:17:51 It’s fun.
0:17:52 – So how do you think it’s gonna play out?
0:17:53 Because like right now,
0:17:55 Google’s kind of focused on developers, right?
0:17:57 More than consumers, correct?
0:17:58 – So on one hand,
0:18:01 like I spend all my time thinking about developer stuff.
0:18:02 Google’s got a ton of other people
0:18:04 who are doing consumer stuff.
0:18:05 I think a good example of this is like,
0:18:10 Gemini is both in search and through the Gemini app,
0:18:13 like, you know, deployed across like billion user scale.
0:18:15 Like there are literally billions of people
0:18:18 who are interacting with outputs and the models themselves,
0:18:20 which is crazy to think about.
0:18:22 And like it’s a very consumer forward use case
0:18:23 for those folks.
0:18:25 And I think it’s also like still incredibly early
0:18:27 for Gemini in search.
0:18:29 And there’s some interesting stories around that stuff.
0:18:32 But I mean, I personally am incredibly bullish
0:18:35 on the infrastructure layer and like the infra tooling.
0:18:38 And I think like actually a good example of this.
0:18:41 And you see this in some of Sam’s recent tweets
0:18:44 about sort of OpenAI and the Pro subscription
0:18:46 are sort of good examples of this,
0:18:48 which is builders at the application layer
0:18:52 have a lot of tension with building more AI.
0:18:53 And like actually back to the thread
0:18:56 of having AI be more proactive.
0:18:58 This is why I believe in flash so much.
0:19:00 And I believe in the direction that we’re going
0:19:02 as far as like reducing the cost for developers
0:19:04 while continuing to push the performance frontier
0:19:07 is because the story of AI to me is a story
0:19:09 of like the actual infrastructure,
0:19:11 incentivizing developers to not use it.
0:19:14 Like you literally have an economic incentive
0:19:16 not to use AI because it costs you money.
0:19:18 And like the more AI you build into your product,
0:19:19 the more expensive it is.
0:19:21 And like the more margin pressure you have
0:19:23 as an application builder.
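The margin-pressure point is simple arithmetic. A sketch with entirely made-up numbers, just to show the shape of the incentive:

```python
# Illustrative margin math for an app charging a flat subscription while
# paying per-token inference costs. All numbers are hypothetical.

def monthly_margin(price: float, requests: int, tokens_per_request: int,
                   cost_per_million_tokens: float) -> float:
    """Subscription revenue minus inference spend for one user per month."""
    inference = requests * tokens_per_request / 1e6 * cost_per_million_tokens
    return price - inference

# The more AI the product uses per user, the thinner the margin:
print(round(monthly_margin(20.0, 300, 2_000, 0.40), 2))     # 19.76
print(round(monthly_margin(20.0, 3_000, 20_000, 0.40), 2))  # -4.0
```

At a flat subscription price, a heavy user can literally cost more to serve than they pay, which is the "economic incentive not to use AI" being described.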
0:19:24 – It’s scary too, right?
0:19:25 To build an app and like all of a sudden
0:19:27 you get like a gigantic bill
0:19:28 because people are using your thing
0:19:30 and you haven’t figured out how to properly monetize it yet.
0:19:31 As you know, someone who creates companies,
0:19:33 it’s kind of intimidating.
0:19:33 – Yeah, 100%.
0:19:35 That is like the realist reaction
0:19:38 and like also just like the truest reaction developers
0:19:40 have to the cost of the technology.
0:19:43 So back to the point of like where the value creation happens.
0:19:46 I think the nice thing for infrastructure providers
0:19:48 is you have a fixed margin.
0:19:51 So like, you know exactly how much money you’re gonna make
0:19:54 by providing some infrastructure. At the application layer,
0:19:56 you’re like, you’re constantly incentivized
0:20:00 to almost like not add additional stuff.
0:20:01 And I think this has been the story
0:20:03 for like the ChatGPT Plus and Pro subscription
0:20:06 is like they built a subscription for $20 a month
0:20:08 and they realized, hey, we actually can’t give people
0:20:09 all of these things anymore.
0:20:11 We have to make something different.
0:20:13 And like even at $200 a month,
0:20:16 it’s like not a break even scenario for them yet.
0:20:19 So it’s super interesting to see that play out
0:20:21 and it’s a lot of food for thought
0:20:22 for people who are building stuff to be like,
0:20:25 are there new economic incentive mechanisms
0:20:28 that you can create as you’re building a product
0:20:31 more so than just like charging a $20 a month subscription.
0:20:33 And like the one example I can think of of this
0:20:35 that has made me think is,
0:20:37 I don’t know if you all are familiar with OpenRouter,
0:21:40 but it’s a service that lets you sort of swap
0:20:41 in and out different language models.
0:20:44 OpenRouter’s product leaderboard,
0:20:46 I’m pretty sure they give you discounts on tokens
0:20:49 and stuff like that for showing you in certain ways
0:20:51 and some metadata passing back and forth
0:20:53 so they can understand like how people are generally
0:20:54 using AI models.
0:20:57 So like some interesting things like that.
0:21:00 Alex Atallah, who’s the CEO, who previously worked on OpenSea,
0:21:03 he has this quote, which always rings in my head,
0:21:06 which is like usage is the ultimate benchmark.
0:21:08 How many people are using your model or your thing
0:21:10 is like the proof point of success,
0:21:11 not all these other benchmarks
0:21:12 that people are chasing after.
0:21:14 So super interesting platform.
0:21:15 – Do they actually publish like a leaderboard
0:21:17 similar to like what LMSYS does?
0:21:18 – Exactly. – Okay.
0:21:21 – Checking it out right now at openrouter.ai
0:21:23 and it’s a forward slash rankings.
0:21:24 – Oh, cool.
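For context on what "swap in and out different language models" looks like in practice: OpenRouter exposes an OpenAI-compatible chat endpoint, so switching models is mostly a one-string change. A sketch (the payload shape follows the OpenAI chat-completions convention, and the model identifiers are illustrative assumptions; check OpenRouter's docs):

```python
# Build OpenAI-style chat payloads where only the model string differs.
# Model identifiers here are illustrative examples, not guaranteed IDs.

def chat_request(model: str, prompt: str) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Same application code, different backing model:
a = chat_request("google/gemini-2.0-flash-001", "Summarize this article.")
b = chat_request("openai/gpt-4o", "Summarize this article.")
```

Being the routing layer is also what gives OpenRouter the usage data behind the rankings page mentioned above.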
0:21:26 I’m curious, a little bit of a topic shift here,
0:21:29 but I know you’re a proponent of open source
0:21:31 and Google obviously has their Gemma models.
0:21:34 Are there any updates, any idea of what’s going on with Gemma
0:21:35 and what we can expect next
0:21:37 out of the open source side of things?
0:21:39 – Yeah, I think that this is also the piece
0:21:42 that makes me excited about what we’re doing at Google,
0:21:45 which is it really is the exact same research
0:21:46 that powers the Gemini models
0:21:48 that ends up making the Gemma models.
0:21:53 And Gemma 2 was, I think it’s like the second most downloaded
0:21:55 open source model in existence, which is awesome to see.
0:21:57 And Gemma 3 is definitely going to happen.
0:21:59 I think the timeline is soon.
0:22:02 So you’ll hear more and y’all should do an episode
0:22:03 with some of the Gemma folks
0:22:04 ’cause there’s lots of cool stuff coming.
0:22:05 – Yeah, for sure.
0:22:07 – They’ve been doing a lot of like interesting fine tunes
0:22:08 for different use cases.
0:22:10 Like they have a version for RAG,
0:22:12 they have a version for vision.
0:22:16 I think they’re probably gonna do some agent stuff as well.
0:22:18 So like there’s lots of really cool explorations happening
0:22:20 on making those open models.
0:22:21 – Super cool.
0:22:23 I wanna talk about Imagen 3,
0:22:26 which I just learned is actually pronounced Imagen.
0:22:27 Like I’ve been calling it Imagine.
0:22:29 I always thought it was like Image Generator.
0:22:32 So like Image Gen, but I heard you pronounce it Imagen.
0:22:34 So now I’ll start saying it that way.
0:22:37 But you mentioned that there’s some updates with that as well.
0:22:38 What can you tell us about that?
0:22:39 – Yeah, and you’re in good company.
0:22:42 Don’t worry, I swear, like 50% of the meetings I’m in,
0:22:44 I hear Imagen, 50% I hear Image Gen.
0:22:47 So yeah, there’s no conclusive answer.
0:22:48 I think it’s Imagen,
0:22:50 but someone should correct me if that’s not the case.
0:22:54 So we released the Imagen 3 model
0:22:57 across a couple of services late December,
0:22:58 and then have been doing a bunch of work
0:23:00 in the last few months to bring that to developers.
0:23:03 So Imagen 3 should be available to developers
0:23:07 in the API and is the frontier image generation model
0:23:10 across like quality and a bunch of like human ranking
0:23:12 benchmarks, which as an aside comment,
0:23:15 it’s super interesting that if you look at text models,
0:23:18 I think one of the reasons the world has so much success
0:23:20 hill climbing on making models better
0:23:22 is like there’s a definitive source of truth
0:23:24 in some of the tasks that the models can perform.
0:23:26 I think with Image models, it’s like actually not the case.
0:23:28 It’s really hard to eval
0:23:30 and like you actually need humans in the loop
0:23:31 to do a lot of those evals,
0:23:34 or it’s like artistic and stylistic stuff
0:23:37 that’s like hard to put a finger on
0:23:38 which of these two things are better.
0:23:40 So a lot of those evals use human raters
0:23:41 and human benchmarks.
0:23:43 So there’s some degree of error,
0:23:46 but yeah, it’s been exciting to see like the models available
0:23:47 in the Gemini app.
0:23:50 It’s available to enterprise customers
0:23:51 and now it’ll be available to developers
0:23:52 to build with, which is awesome.
0:23:55 I think this like gen media future
0:23:56 is going to be super exciting
0:23:59 and Veo hopefully sometime in the future as well.
0:24:01 – Yeah, yeah, Veo is awesome.
0:24:02 I have access to it, early access to it.
0:24:04 It’s super fun to play with.
0:24:06 Is the best way to use Imagen 3
0:24:07 inside of ImageFX?
0:24:09 Is that still kind of like the easiest way
0:24:11 for a consumer to just go and play around with it?
0:24:12 – Exactly.
0:24:13 I think it’s also available for free
0:24:16 to folks in the Gemini app.
0:24:17 I think if you ask to generate an image,
0:24:19 it’ll just do it through the Gemini app,
0:24:22 but ImageFX gives you a little bit more controllability
0:24:23 and stuff like that.
0:24:24 So there’s a few more like features
0:24:26 that are built into ImageFX.
0:24:28 So that’s definitely a place that it’s publicly available.
0:24:29 – Super cool.
0:24:32 Yeah, I know you and Nathan, sort of before we hit record,
0:24:34 were nerding out a little bit about, you know,
0:24:37 the whole like sort of text-to-app concept.
0:24:38 He wished there was a way
0:24:40 that you could have Unity open on the screen
0:24:42 and then actually have an AI sort of like assist you
0:24:44 with like where to click and what to do next
0:24:46 while you’re building a game in Unity.
0:24:48 And I think both you and I in unison went,
0:24:51 “You can do that in AI studio right now.”
0:24:52 So I thought it might be kind of cool
0:24:55 to pick up where we left off on that conversation
0:24:57 and talk about some of the cool stuff
0:24:59 that’s available inside of AI studio
0:25:02 that maybe a lot of people don’t even realize exists
0:25:04 and probably definitely don’t realize
0:25:06 you could use most of it for free still right now too.
0:25:07 – Yeah, cool.
0:25:08 So everything in AI studio is free,
0:25:10 which I don’t think people realize.
0:25:12 Like the entire product experience,
0:25:14 there is no paid version of it.
0:25:15 There’s a paid version of the API,
0:25:17 which hopefully developers can scale with
0:25:18 and do all that fun stuff.
0:25:22 But all of our latest models end up in AI studio for free,
0:25:24 including the experience that powers
0:25:27 this like real-time multimodal live experience,
0:25:29 which if folks haven’t played around with it,
0:25:33 aistudio.google.com/live lets you do things
0:25:36 like share your screen or show your camera
0:25:38 and ask all these different questions
0:25:39 and interact with the model.
0:25:40 There’s a bunch of different voices.
0:25:43 There’s a bunch of different modalities to choose from.
0:25:45 But back to the conversation of like,
0:25:46 what will agents look like?
0:25:48 What do we want out of agents?
0:25:49 One of the limitations for agents
0:25:51 is you have to build all this scaffolding
0:25:54 for the agent to be able to see the things that you do.
0:25:55 Like to see my email and my text
0:25:57 and my et cetera, et cetera,
0:25:59 my personal laptop and my work laptop
0:26:01 or my phone and my watch,
0:26:03 like incredible amount of work to make that happen.
0:26:05 Except if you have a camera,
0:26:07 all of a sudden all of it just works.
0:26:10 And like you can sort of make the determination
0:26:12 of being able to show like the information you want to show
0:26:15 and share the stuff that you want to share.
0:26:18 It’s definitely more of a showcase of what’s possible.
0:26:20 And that’s why we put it in the API
0:26:22 because like, we don’t have the answers ultimately.
0:26:24 Like developers should go and build these products.
0:26:25 But I do think Matt,
0:26:27 you mentioned the Astra experience earlier.
0:26:29 I think like the multimodal live API
0:26:32 sort of gets to what Astra does at the core,
0:26:36 which is like be able to be this co-presence with tools.
0:26:38 And again, through a simple API, which is really exciting.
0:26:40 I think the piece that the multimodal live API
0:26:42 doesn’t have that will build,
0:26:44 that I think the Astra experience did have
0:26:46 is this notion of memory,
0:26:48 which again is like critical for agents.
0:26:51 Like I don’t want agents to just forget that,
0:26:54 I prefer sitting in window seats instead of aisle seats
0:26:55 or whatever it is.
0:26:57 Like you want all that context to be retained
0:26:59 as agents are making decisions for you in the future.
0:27:02 And I think that’s gonna require this sort of memory layer
0:27:04 which we’re working on building, which is exciting.
0:27:05 – Yeah, yeah.
0:27:07 And I mean, Project Astra even sort of remembered
0:27:10 between sessions too, that like I was in London, right?
0:27:12 So I was talking to it while in London,
0:27:16 asking for restaurants to go check out things like that,
0:27:19 close that session, started another session later on.
0:27:22 And it remembered that I was over in London,
0:27:24 it remembered all the previous conversations.
0:27:27 So it wasn’t just like memory in the terms that like,
0:27:29 you can plug into the custom instructions on OpenAI
0:27:32 and it’ll remember your name and stuff like that.
0:27:35 It was actually like remembering the past conversations
0:27:37 and bringing in that as additional context,
0:27:38 which I thought was really cool
0:27:40 ’cause that’s really helpful to just like remember
0:27:41 the past conversations you had.
0:27:43 – I think this is an infrastructure problem.
0:27:45 And I think we didn’t talk about this explicitly,
0:27:47 but like one of the other narratives
0:27:49 over the last like year and a half has been like,
0:27:51 not enough AI in production.
0:27:54 You know, it’s kind of this demo toy thing
0:27:55 and no one really uses it.
0:27:58 I think a lot of this is ’cause it’s just like taking a while
0:28:00 for companies to build the infrastructure
0:28:02 to actually put AI into production.
0:28:04 And I think memory is this example of,
0:28:05 there aren’t a bunch of companies
0:28:07 that are like building this memory as a service.
0:28:09 And if you are building this, like let’s talk,
0:28:11 I’d love to hear about it and hear about what you’re building,
0:28:14 but I think there’s a lot of opportunity still to be built
0:28:17 around that, around memory, like as a service for folks.
0:28:18 You could also start to think about like,
0:28:21 there’s so many interesting ways to explore this.
0:28:23 Like where does all your personal context
0:28:24 already live today?
0:28:26 Like how does that, whoever that provider is,
0:28:30 plug into the world of where all the other memory services
0:28:30 are going to be.
0:28:32 So I think there’s a lot of like really, really interesting
0:28:36 directions that need to be built for memory specifically.
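(Editor's note: a reduced sketch of what a "memory as a service" layer like the one Logan describes might look like. The class and method names below are invented for illustration and are not any real product's API; a production service would rank by embedding similarity rather than naive keyword overlap.)

```python
from dataclasses import dataclass, field

# Hypothetical "memory as a service" sketch. All names here are
# invented for illustration; this is not any real vendor's API.

@dataclass
class MemoryStore:
    """Keeps durable user facts and surfaces the ones relevant to a query."""
    facts: list[str] = field(default_factory=list)

    def remember(self, fact: str) -> None:
        """Persist a fact learned during a session."""
        self.facts.append(fact)

    def recall(self, query: str) -> list[str]:
        """Return stored facts sharing a keyword with the query.
        A real service would use embedding similarity instead."""
        words = set(query.lower().split())
        return [f for f in self.facts if words & set(f.lower().split())]

store = MemoryStore()
store.remember("prefers window seats over aisle seats")
store.remember("based in London this week")
print(store.recall("any window seat?"))  # the seating preference survives across sessions
```

The point of the sketch is the interface, not the retrieval: an agent calls `remember` as sessions end and prepends the `recall` results to its context when the next session starts, which is roughly the cross-session behavior described here.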
0:28:37 – Logan, I’m curious.
0:28:39 So you were talking about earlier the models
0:28:40 that a lot of people don't realize are all free,
0:28:41 like in AI Studio.
0:28:44 Like why do you guys hide it in AI Studio?
0:28:46 Recently, I talked to a bunch of different people
0:28:48 about DeepSeek and they were talking about how amazed
0:28:49 they were by it.
0:28:52 And I was like, yeah, but like you can get the same stuff
0:28:55 but better for free on AI studio right now.
0:28:57 And they didn’t know.
0:28:58 And it’s like, there’s a lot of people who don’t know.
0:29:00 And so I was like, you got to communicate that better
0:29:02 somehow or like, I think you guys should have like,
0:29:05 you know, its own website or something like outside of Google,
0:29:06 like a new product where you guys just like,
0:29:08 hey, here’s the new frontier
0:29:10 and here’s what we’re pushing and Google is still there
0:29:12 and it uses some of the tech, but we have a new thing.
0:29:14 And that’s that personal opinion, but you know.
0:29:15 – Yeah, yeah.
0:29:17 No, I think you’re spot on and for what it’s worth.
0:29:20 Like I think we get this feedback pretty consistently.
0:29:23 I think some of this is a factor of just like the state
0:29:27 of the world and the challenges that we have as a product.
0:29:30 Like I think in one hand, like we are a developer platform.
0:29:32 Like we’re not building the front door
0:29:33 for Google’s AI surfaces.
0:29:36 Like that’s not the product that I’m signed up to build.
0:29:37 That’s not the product that like we’re sort of
0:29:38 directionally building towards.
0:29:40 We’re really focused on like,
0:29:43 how do we enable builders to get the latest AI technology?
0:29:47 The Gemini app, formerly Bard, is the sort of front door
0:29:49 to Google’s AI technology.
0:29:51 And I think from a consumer standpoint
0:29:52 and also from an enterprise standpoint
0:29:55 and like workspace and other places,
0:29:57 there’s all this interesting organizational work
0:29:59 that’s happened at Google over the last couple of years.
0:30:02 I think like one of the cool stories is like operationalizing
0:30:05 Google DeepMind from doing this sort of foundational research
0:30:07 to being an organization that builds
0:30:09 the world’s strongest generative AI models
0:30:11 and like actually delivers those to the world
0:30:12 sort of as a product.
0:30:15 And then now bringing the product surfaces
0:30:18 that are the front line of delivering those to the world,
0:30:21 the Gemini app Google AI studio into DeepMind
0:30:23 so that we can sort of continue to accelerate.
0:30:24 Like all of those things to me are like
0:30:26 directly the right stuff for us to do.
0:30:27 I agree with you.
0:30:28 I think we need to put the models
0:30:30 in front of the world as soon as possible.
0:30:33 And I think having a single place to do that makes sense.
0:30:34 And it should probably be the Gemini app
0:30:37 that probably shouldn’t be AI studio.
0:30:39 But at the same time I say that like
0:30:40 we also want to be a surface
0:30:42 to sort of showcase what’s possible.
0:30:44 So there’s a lot of like tension points
0:30:46 but I do think, I fundamentally do think
0:30:47 we’re going to get there.
0:30:49 The Gemini app is moving super quickly
0:30:50 to like get the latest models.
0:30:52 Like they just shipped 2.0 Flash.
0:30:54 They shipped 2.0 Pro today.
0:30:55 They shipped the thinking model today.
0:30:57 So I think that delta between the Gemini app
0:30:59 and AI studio is sort of going away.
0:31:01 Which yeah, I’m excited about
0:31:03 because like the consistent feedback
0:31:04 is people don't like that delta.
0:31:06 They want to have a single place to go to
0:31:07 to sort of see the future.
0:31:09 – I always saw AI Studio as sort of like
0:31:12 the playground to test what the APIs are capable of.
0:31:15 In the same way like OpenAI has their OpenAI Playground
0:31:18 and you can kind of go and mess with some of the settings
0:31:20 and see what the output will look like
0:31:21 before using it in your own APIs.
0:31:24 That’s kind of how I always saw the AI studio.
0:31:26 Because like once you get into it
0:31:28 if you’re not very technically inclined
0:31:29 you might get a little overwhelmed
0:31:32 seeing things like what models should I be using?
0:31:34 What is the temperature?
0:31:35 You know, things like that.
0:31:36 People aren’t necessarily going to know
0:31:37 like how to play with that.
0:31:39 Like what should I set my token limit to?
0:31:40 Things like that.
0:31:42 I don’t really feel like general consumers
0:31:43 want to mess with that.
0:31:45 They just want to go to a chat bot
0:31:46 and ask their question, right?
0:31:50 It feels very tailored towards developers to me.
0:31:52 And then the Gemini app feels like, all right
0:31:53 this is their front user interface
0:31:55 that they want the general public
0:31:57 to go be using at least.
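(Editor's note: for readers wondering what those playground knobs correspond to, here is a hedged sketch of the sampling settings most LLM APIs expose. The field names and valid ranges below are illustrative and vary by SDK; check your provider's docs for the real parameters.)

```python
from dataclasses import dataclass

# Illustrative sketch of playground-style sampling settings.
# Names and ranges mirror common LLM APIs but are not tied to any one SDK.

@dataclass
class GenerationConfig:
    temperature: float = 1.0       # 0 = near-deterministic, higher = more varied output
    top_p: float = 0.95            # nucleus sampling: sample from tokens covering this probability mass
    max_output_tokens: int = 1024  # hard cap on the length of the response

    def __post_init__(self) -> None:
        # Typical bounds; individual APIs clamp or reject out-of-range values differently.
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be in [0, 2]")
        if not 0.0 < self.top_p <= 1.0:
            raise ValueError("top_p must be in (0, 1]")
        if self.max_output_tokens <= 0:
            raise ValueError("max_output_tokens must be positive")

# A consumer chat app hard-codes something like this; a playground exposes every field.
cfg = GenerationConfig(temperature=0.2, max_output_tokens=256)
print(cfg)
```

This is exactly the split described here: the playground surfaces every field for developers, while the chat app picks sensible values so consumers never see them.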
0:31:58 – Yeah, I got a ping yesterday actually
0:32:01 from someone asking why does the chat experience
0:32:02 have all this stuff in it?
0:32:04 Like why are there all these settings and stuff?
0:32:06 And I was like, you’re in the wrong place.
0:32:08 Like Gemini app is the place that you need.
0:32:09 And then they responded right away on Twitter
0:32:12 and they’re like, yes, this is much better for me.
0:32:13 I don’t want to see all that complexity.
0:32:15 But I do think about this a lot
0:32:17 is like how do people actually show up?
0:32:19 How are they finding their way into these products?
0:32:22 I do think the Gemini app is like this very large front door.
0:32:24 So it tends to capture most of these folks.
0:32:27 Like it’s literally built into the Google app on iOS
0:32:28 and all this other stuff.
0:32:29 Versus like you actually kind of have to do
0:32:31 a little bit of searching to find AI studio
0:32:34 which probably makes sense in some cases.
0:32:35 Awesome.
0:32:36 – Logan, you said you were super excited
0:32:38 about text-to-app creation.
0:32:39 You were talking about like Lovable, Bolt,
0:32:41 other companies like that.
0:32:42 Like what are you excited about in the space
0:32:44 and where do you think that kind of stuff is going?
0:32:48 – Yeah, I think just like being able to democratize access
0:32:51 to people building software and like creating things.
0:32:52 There’s a ton of people in my life
0:32:54 and I don’t live in the Bay Area.
0:32:56 So there’s a disproportionate amount of people
0:32:57 who aren’t in tech where I live.
0:33:00 But the proportion of like people with interesting ideas
0:33:01 I actually think is the same.
0:33:03 I think it’s just like the actual tools themselves
0:33:05 that they have to go and execute on those ideas
0:33:08 that I think is like much less distributed
0:33:10 in places outside the Bay Area, New York
0:33:11 and other places like that.
0:33:15 So I think this frontier of text to app creation
0:33:18 is gonna be so, so interesting to see play out.
0:33:19 And yeah, there’s a ton of companies
0:33:22 that are having like lots of like actual real early
0:33:24 commercial success and traction today.
0:33:28 Which I think, again, this is one of those examples
0:33:31 where like sometimes there’s use cases that don’t work
0:33:33 and then all of a sudden like the model quality
0:33:35 just gets good enough and you build the right
0:33:37 sort of couple of things from a product experience.
0:33:39 And then all of a sudden it clicks
0:33:40 and like now this thing is possible.
0:33:43 And to me it feels like text to app creation
0:33:46 like has had that moment and it’s now possible.
0:33:47 And I think it’ll take a while
0:33:50 and there’ll still be a bunch of other things to hill climb on.
0:33:52 But I think especially now with like reasoning models
0:33:54 and the ability for them to like keep thinking
0:33:57 and writing more code and doing all that work.
0:33:59 Like I think the complexity of the apps
0:34:01 is also going to continue to go up on this exponential.
0:34:03 And actually, Replicate just launched,
0:34:05 I think today or yesterday,
0:34:07 a similar sort of text-to-app product.
0:34:09 I think there’s more and more players showing up
0:34:10 in this space.
0:34:13 I would assume that like probably 50% of products
0:34:15 or something like that that are building with AI
0:34:17 have this type of experience.
0:34:19 And you could think about like, you know
0:34:20 how does that translate to someone
0:34:23 who’s doing something very, very domain specific?
0:34:25 I think there’s a lot of like companies
0:34:28 that try to build extension ecosystems or connectors
0:34:31 or like, you know, all these other like side cars
0:34:32 of their product.
0:34:34 You can imagine like you just let your users create those.
0:34:36 Like here’s the sort of generic set of APIs
0:34:39 that talk to, you know, your email client, for example.
0:34:42 And like here’s a text box and like go build
0:34:44 the sort of product experience you want.
0:34:45 Like it’s sort of in your hands.
0:34:48 And like that's a crazy world where you could totally customize
0:34:49 it however you want.
0:34:50 Yeah, it doesn’t feel that far away.
0:34:52 My new email client that’s got like, you know
0:34:54 80s style video game stuff.
0:34:56 You know, it's like mixed in with the email client.
0:34:57 That’d be so cool.
0:34:58 You know, I’ve been loving that concept.
0:35:00 We’ve talked about this a couple of times on the show
0:35:03 of like, I’ve gotten in the habit now of like,
0:35:05 when I have like a little problem or a bottleneck
0:35:07 that I need to solve instead of going and like searching out
0:35:10 if there’s like a SaaS company that already exists
0:35:11 that has that product for me.
0:35:14 If it’s simple enough, I’ll just go prompt
0:35:15 that software into existence.
0:35:17 And I have like a little Python script
0:35:20 that runs on my computer to solve the problem for me, right?
0:35:22 Like I made a little script where I can input
0:35:24 my one minute short form videos into it.
0:35:26 It automatically transcribes them
0:35:27 and then cleans up the transcription
0:35:30 and adds like proper punctuation and stuff.
0:35:31 I created another mini app
0:35:33 where I can drag and drop any image file.
0:35:35 Doesn’t matter what type of file format it is.
0:35:37 It’ll convert it to a JPEG for me
0:35:39 so I can use it instead of like my video editing.
0:35:42 And these are probably softwares that exist.
0:35:43 I could go and hunt them down on the internet
0:35:46 and maybe pay five bucks a month to use them,
0:35:49 but I could just go use an AI tool,
0:35:50 prompt the tool that I need.
0:35:53 And 15 minutes later, I have something on my desktop
0:35:56 that I don’t need to go pay anybody else for anymore.
0:35:57 Maybe it connects to an API,
0:36:00 like my transcription one connects to the OpenAI Whisper API.
0:36:03 So it is costing me like a penny every time I use it,
0:36:04 but so what?
0:36:06 I just love this concept of like
0:36:08 when I have a bottleneck in my business,
0:36:10 I can just go like prompt an app into existence
0:36:12 that solves that bottleneck.
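(Editor's note: a hedged sketch of the kind of "clean up the transcription" mini-app Matt describes. The speech-to-text call itself is omitted; only the local post-processing step is shown, and the filler-word heuristics are invented for illustration.)

```python
import re

# Sketch of a transcript clean-up step: drop filler words, normalize
# whitespace, capitalize sentences, and make sure the text ends with
# punctuation. Heuristics are illustrative, not from any real product.

FILLERS = {"um", "uh"}

def clean_transcript(raw: str) -> str:
    # Remove filler words, case-insensitively.
    words = [w for w in raw.split() if w.lower() not in FILLERS]
    # Collapse whitespace and ensure terminal punctuation.
    text = re.sub(r"\s+", " ", " ".join(words)).strip()
    if text and text[-1] not in ".!?":
        text += "."
    # Capitalize the first letter of each sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(s[:1].upper() + s[1:] for s in sentences if s)

print(clean_transcript("um so today we're uh looking at gemini flash. it's really fast"))
# → So today we're looking at gemini flash. It's really fast.
```

In the full mini-app, the raw string would come from a speech-to-text call (Matt mentions the OpenAI Whisper API) before this cleanup runs.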
0:36:16 – Yeah, I think that carried out one step farther
0:36:18 towards this like infinite app store
0:36:21 where like truly everyone is creating
0:36:24 and contributing to this thing and like remixing.
0:36:26 This is the stuff that gets me excited about the future
0:36:28 ’cause like there’s so much cool stuff to be created.
0:36:31 And really, I think that the lens of all of this is like,
0:36:33 how do you democratize access and make it
0:36:35 so that anyone can go and build this stuff?
0:36:36 And as someone who can program,
0:36:39 but also knows how painful it is in a lot of ways,
0:36:41 it’s just like so cool that more folks
0:36:43 are gonna be able to participate in that.
0:36:44 It’s gonna be awesome.
0:36:46 – Yeah, coincidentally, someone today
0:36:48 from my hometown in Alabama like messaged me like,
0:36:49 hey, I have this idea for an app.
0:36:50 I get this kind of stuff all the time.
0:36:52 I have an idea for an app and who can I hire
0:36:53 to build it and all that stuff?
0:36:55 And I'm like, I'm about to send him a link to like Replit.
0:36:56 Have you like tried this yet?
0:36:58 You know, it’s like, just go try that.
0:37:01 And instead of paying someone $5,000,
0:37:03 that’s probably like a ton of money for him, right?
0:37:05 Instead, just go try Replit and you know,
0:37:07 sign up for one month, cancel it if you don't like it,
0:37:10 and then just see what you can get.
0:37:11 And it’s gonna get better and better.
0:37:12 Like I've tried Replit and all of them
0:37:13 and like they’re pretty good.
0:37:15 It feels like there’s something that’s like slightly missing,
0:37:17 but every time I check it, it’s better than the last time.
0:37:20 And it feels like probably within the next year or two,
0:37:22 you’re just gonna make any kind of software you want
0:37:23 just by talking.
0:37:25 It’s just, that’s gonna be such a magical moment.
0:37:26 Like in the early days of the internet,
0:37:27 the internet I feel like was more fun.
0:37:29 ‘Cause people, there’s all these like different websites
0:37:31 and different kinds of things or like you’d have Winamp
0:37:32 and you put a skin on your Winamp.
0:37:33 But there’s all these different things
0:37:35 in terms of customization that was happening more
0:37:36 than there is now on the internet.
0:37:38 And it feels like this kind of stuff might bring that back
0:37:39 where like, yeah, the internet,
0:37:42 you can kind of customize how you interact with the internet
0:37:44 through creating your own custom software with AI.
0:37:45 – Yeah, I was just thinking about
0:37:47 as you were describing like a fun internet,
0:37:49 I was thinking of my personal website,
0:37:51 which is like a blank HTML page.
0:37:53 And like there’s no styling or anything like that.
0:37:56 But like, if I didn't have to shoulder the costs,
0:37:58 and the LLM ran on someone's computer,
0:37:59 I could just kind of like talk to it and say,
0:38:01 like, you know, remix this site
0:38:02 and do it in all types of crazy ways.
0:38:06 Like that would be so fun of like every time someone shows up,
0:38:08 it’s a different product or it’s a different experience
0:38:09 to see this content.
0:38:11 And I think there’s a lot of interesting
0:38:12 threads to pull on that.
0:38:15 – Is there anything else happening at Google right now?
0:38:17 Any other things that you’re working on
0:38:19 that you’re allowed to talk about?
0:38:21 Is there any avenues we haven’t gone down
0:38:22 that you really wanna talk about
0:38:24 that you’re allowed to talk about, I guess?
0:38:26 – Yeah, I think the only other thread,
0:38:28 and we alluded to it a couple of times
0:38:29 is reasoning model stuff.
0:38:31 It feels like, and I tweeted this the other day,
0:38:33 it feels like the GPT-2 era for these models.
0:38:36 Like there’s so much new capability
0:38:39 and so much progress being squeezed out of the models
0:38:40 in such a short time.
0:38:42 And we released our first reasoning model
0:38:42 back in December,
0:38:45 right after the Gemini 2.0 Flash moment,
0:38:46 one month later,
0:38:49 like a normal like six months worth of progress,
0:38:51 honestly, on like a bunch of the benchmarks
0:38:53 that matter for this stuff.
0:38:54 We released an updated version
0:38:57 like January 21st, a couple of weeks ago.
0:38:58 And if you look at the chart,
0:39:02 it's like literally linear progress up and to the right
0:39:03 across a bunch of the evals.
0:39:06 Like it’s just crazy to think that,
0:39:08 again, like a month ago, the narrative was like,
0:39:09 the models are hitting a wall,
0:39:11 there’s no more progress to be had.
0:39:14 And it’s funny like how much nuance
0:39:15 some of the conversation lacks
0:39:19 because these innovations are like deeply intertwined.
0:39:21 I was having a conversation earlier today
0:39:23 about like long context
0:39:25 and how long context is actually like
0:39:27 a fundamental enabler of the reasoning models
0:39:29 because like by themselves,
0:39:30 the long context innovation,
0:39:33 like the model’s okay at pulling out
0:39:34 certain pieces of information.
0:39:36 Like it can do needle in a haystack well,
0:39:39 it can find a couple of things in a million tokens,
0:39:41 but it’s really hard for the models to attend
0:39:43 to the context of like,
0:39:44 you know, find a hundred things
0:39:47 in this million token context window example.
0:39:49 Reasoning models are the unlock for this
0:39:50 because the reasoning models,
0:39:52 the model can really just like continue
0:39:53 to go through that process
0:39:55 and think through all the content
0:39:57 and like really do the due diligence.
0:40:00 And it’s almost uncanny how similar it is to like,
0:40:01 how would you go about this?
0:40:03 Like I couldn't watch a two-hour movie
0:40:06 and then if you quizzed me on a hundred random little things
0:40:08 from it, like, I'm not going to get those things, right?
0:40:10 Like it's going to be really hard to do that.
0:40:11 But if you let me go back through the movie,
0:40:13 you know, rewatch the movie
0:40:17 and like add little inserts and like clip things
0:40:17 and cut things and do all this,
0:40:19 then like I'd be able to find those things
0:40:21 if you asked me those questions again.
0:40:23 And it feels like that’s kind of what reasoning is doing
0:40:25 is actually being able to do that.
0:40:28 So I think we’re super early in this progress
0:40:29 and it’s going to be a lot of fun
0:40:32 to see both the progress continue for us.
0:40:35 But again, through this narrative of how all this innovation
0:40:38 trickles into the hands of people who are building stuff.
0:40:39 And like there’s going to be a ton of new products
0:40:42 that get built, like maybe text to app
0:40:44 just like it’s 10x better in the next year
0:40:45 because of reasoning models.
0:40:48 Like that’s possible, which is just crazy to think about.
0:40:49 – Yeah, yeah.
0:40:50 – But like the other models, they almost feel like they’re like
0:40:53 double checking triple checking themselves in real time.
0:40:56 It’ll be like sort of starting to give a response
0:40:58 and then be like, let me actually double check
0:40:58 what I just said.
0:41:02 And when it comes to coding, that seems like it’s the ideal
0:41:03 use case almost, right?
0:41:05 Cause it can almost look back at its code and be like,
0:41:07 oh, I think I made a mistake there.
0:41:09 And sort of continually fix its code
0:41:11 before it finally even gives you an output,
0:41:13 which I’ve just found to be really, really cool.
0:41:17 But also from what I understand, that’s where a lot of the cost
0:41:18 and the future is going to come in.
0:41:22 The cost of the inference to do all of this like analysis
0:41:25 in real time as it’s giving its output.
0:41:26 – Yeah, and the other thing to think about
0:41:29 which is interesting is we’re seeing all of this progress
0:41:33 with the reasoning models and they are doing
0:41:36 like the most naive version of thinking.
0:41:38 Like they really are like, if you were to think about
0:41:40 like the human example of this, like you’re sort of sitting
0:41:42 in a box like thinking to yourself.
0:41:44 Like you have no interaction with the outside world.
0:41:46 You’re not able to like test your hypothesis,
0:41:50 use a calculator, search the internet, any of those things.
0:41:52 And like you have to sort of form your thoughts
0:41:54 independent of the outside world.
0:41:55 And you imagine what starts to happen
0:41:57 when you give these things tools.
0:41:59 And like it really does feel like
0:42:01 that’s the agentic future that we’ve been promised
0:42:04 is like all of these tools in a sandbox interacting
0:42:06 with the model letting it sort of have that feedback loop
0:42:08 of trying things and seeing what doesn’t work.
0:42:10 So I couldn’t be more excited about that.
0:42:12 – Yeah, that’s really interesting to think about.
0:42:15 So like right now it’s just sort of thinking through things
0:42:17 and sort of double checking itself, but in the future
0:42:19 it could actually be working with other tools
0:42:21 that can also like assist in the double checking
0:42:25 and things like that and get even smarter in those ways.
0:42:26 – Yeah, and I think put a different way,
0:42:28 like to be more extreme, I think it has to do that.
0:42:30 Like I think the version of the future
0:42:31 that we’re going towards is like,
0:42:33 we’re not going to be able to see the progress
0:42:35 continue to scale unless the models can do that.
0:42:37 And again, this goes back to this thread of like,
0:42:39 there’s lots of hard problems to solve in the world,
0:42:41 like making it so the models can do that efficiently
0:42:45 and like securely and safely have that sort of sandbox
0:42:47 to do that type of thinking and work
0:42:48 is going to have to happen.
0:42:50 And it’s probably a lot of work that hasn’t been solved today,
0:42:53 which is interesting and opportunistic.
0:42:55 – Yeah, it’s crazy to think that like probably soon
0:42:57 like AI is going to be helping create all those tools as well.
0:42:59 So that’s when we’ll see things just go exponential.
0:43:00 – It already is.
0:43:02 – It’s like whether people with AI or AI itself
0:43:05 creating the tools and just move that back into the system.
0:43:06 And it’s going to be wild,
0:43:08 how fast things are going to get better.
0:43:10 – All the engineers being powered by Cursor.
0:43:11 It’s crazy, like it’s happening today.
0:43:14 Like so many people are, I feel this way for myself,
0:43:15 like I write more software now than I did
0:43:17 when I was a software engineer
0:43:20 because I have AI tools
0:43:22 and I can do all this crazy stuff.
0:43:22 – Yeah.
0:43:24 – How far do you think we are from like AI
0:43:26 actually being able to update its own weights
0:43:28 based on conversation.
0:43:31 So it actually learns based on new input
0:43:33 that it gets through conversations that it has.
0:43:36 – I think in the small scale example sense,
0:43:39 you could probably already do this to a certain extent.
0:43:43 I think in like the like real frontier use cases,
0:43:44 probably far from that.
0:43:47 Some of the open AI operator stuff was talking about this
0:43:51 around like, you know, the need for having evals
0:43:53 of like basically like creating economic value,
0:43:57 like actually creating money and where we are in that.
0:43:58 And like you probably don’t want the models
0:44:00 to do things that have a high cost today
0:44:02 because if they get it wrong, it costs you a lot of money.
0:44:05 And training frontier models is definitely
0:44:07 on the list of things that would cost you a lot of money.
0:44:09 If you got that wrong, like you don't want
0:44:12 a bunch of training runs that are just wasted compute.
0:44:14 Like that's, you know, millions of dollars
0:44:15 of potentially lost money.
0:44:17 So I think there’ll be a human in the driver’s seat
0:44:19 for those things for a while.
0:44:22 But I do think you can sort of accelerate this,
0:44:24 you know, small scale feedback loop.
0:44:26 And I think that’s why small models matter.
0:44:28 Like this like innovation that’s happening
0:44:31 of being able to compress the frontier capabilities
0:44:32 down into small models.
0:44:35 I think it enables that like rapid iteration loop
0:44:38 where maybe AI is more a co-pilot in that example.
0:44:39 – Gotcha.
0:44:42 Well, cool Logan, this has been absolutely amazing.
0:44:45 If people want to follow you, what’s the best platform
0:44:46 to pay attention to what you’re doing
0:44:49 and to keep up with what Google and DeepMind are up to?
0:44:51 – Yeah, yeah, I’m on Twitter, I’m on LinkedIn,
0:44:53 I’m on email.
0:44:57 So whichever one of those three is easiest to get ahold of me
0:45:00 would love to chat with folks about Gemini stuff or the like.
0:45:03 – Yeah, you’re pretty active over on X slash Twitter,
0:45:04 whatever you want to call it.
0:45:07 Whenever there’s a new like Google or DeepMind rollout,
0:45:08 you’re pretty much either tweeting about it
0:45:09 or retweeting about it.
0:45:11 So very, very good resource to keep up with
0:45:14 what’s going on in the world of AI with Google.
0:45:16 And Logan, thank you so much for hanging out again
0:45:17 with us today.
0:45:19 I’m sure we’ll have you back in the future if you want,
0:45:21 but this has been an absolutely fascinating conversation.
0:45:23 So thanks again for hanging out.
0:45:24 – Yeah, this is a ton of fun.
0:45:25 I’ll see you both at I/O, I hope.
0:45:27 I think hopefully we’ll get the gang back together
0:45:29 and we’ll spend time in person.
0:45:31 Hopefully at I/O, it’s gonna be fun.
0:45:31 – We’d love to do it.
0:45:32 Thanks.
0:45:33 – Thank you.
0:45:36 (upbeat music)

Episode 45: How is Google shaping the future of AI with its new Gemini models? Matt Wolfe (https://x.com/mreflow) and Nathan Lands (https://x.com/NathanLands) are joined by Logan Kilpatrick (https://x.com/OfficialLoganK), Senior Product Manager at Google DeepMind.

In this episode, delve into the details of Google’s latest Gemini 2.0 models — 2.0 Flash, 2.0 Flash-Lite, and 2.0 Pro — as Logan Kilpatrick breaks down the advancements and unique capabilities that set these models apart. They discuss the cost-efficiency that Gemini brings to the table, the concept of reasoning models, and how agents are paving the way for future AI applications. Whether you’re a developer or just intrigued by the progress in AI, this conversation offers insights into what Google’s innovations mean for the industry.

Check out The Next Wave YouTube Channel if you want to see Matt and Nathan on screen: https://lnk.to/thenextwavepd

Show Notes:

  • (00:00) Gemini 2.0 Launch Excitement
  • (03:18) Cheaper Flash-Lite Model Previewed
  • (08:50) Experiencing Gemini AI in London
  • (11:11) AI Agents: Need Proactive Models
  • (14:23) Embracing Inefficiency for Productivity
  • (17:09) AI Infrastructure and Consumer Impact
  • (21:31) Imagen 3 Model Update & Insights
  • (24:18) AI Studio: Free Multimodal Experience
  • (26:53) AI Production and Infrastructure Challenges
  • (31:56) Democratizing App Creation Tools
  • (34:09) DIY Software Solutions
  • (38:51) Reasoning Models Unlock Contextual Understanding
  • (42:45) AI Frontier: Risks and Costs
  • (44:12) AI Updates on Twitter

Mentions:

Get the guide to build your own Custom GPT: https://clickhubspot.com/tnw

Check Out Matt’s Stuff:

• Future Tools – https://futuretools.beehiiv.com/

• Blog – https://www.mattwolfe.com/

• YouTube- https://www.youtube.com/@mreflow

Check Out Nathan’s Stuff:

The Next Wave is a HubSpot Original Podcast // Brought to you by The HubSpot Podcast Network // Production by Darren Clarke // Editing by Ezra Bakker Trupiano
