Google Cloud Live: From the Next ‘26 main stage to the terminal

Google Cloud Live: From the Next ‘26 main stage to the terminal

Google Cloud Tech

0:00 [MUSIC PLAYING] JASON DAVENPORT: Google Cloud friends and family,

9:06 welcome to the live stream.

9:08 I'm Jason Davenport, an Area Technical Lead here with Google,

9:11 coming to you live from the center of action here at Google Cloud Next.

9:15 Over the next few days, myself and Stephanie Wong,

9:18 who will be joining me as a host— please give her a warm

9:21 welcome for all the cool stuff that we're going to do.

9:24 I am so excited to be talking to you here

9:26 and to celebrate all the things that are happening here at Next,

9:29 and I'm hopeful that we'll have something for everyone.

9:32 Now, because we're coming to you live with all these things,

9:35 I just want to cover a few housekeeping things

9:36 for all those folks that are listening out there.

9:39 Open up that live chat right now.

9:41 Drop the city you're watching from.

9:43 I want to see how global we have this community today.

9:46 And just so you know,

9:47 we've got some moderators standing by and we're going to be

9:49 pulling up your comments and questions throughout to the show.

9:52 Also, a big thank you to our crew working behind all the work here.

9:57 It is super awesome, and this is a team effort for all of us to be successful.

10:01 Thank you to everyone who's working through this stuff.

10:05 All right.

10:06 Let's talk about some fun stuff.

10:07 We're going to have all sorts of sessions on this today,

10:10 including covering deep dives for things that are launching or evolving.

10:14 We'll also have some other cool briefs for all

10:16 the different topics that we're going to be covering.

10:18 Here's what we've got lined up for you just to get started.

10:21 First off, just after the keynotes,

10:23 we'll be meeting from the stage right here to talk about folks

10:27 and the cool things that we're breaking down and getting right into it.

10:30 We'll break down what this means tangibly for each of your tech stack pieces,

10:34 what those announcements mean for your daily workflows,

10:36 and all the things that you can be

10:38 doing for developers or ITDMs and business folks.

10:41 Second, let's talk about tech, because we're at a tech conference.

10:44 It's the most fun thing to do.

10:46 We're going to see some awesome demos from some

10:48 of the folks that we have coming here today.

10:50 And I am excited to see some of these things and how

10:53 people are using AI to truly create innovation for their customers.

10:57 And last, it wouldn't be a live stream without all the cool people.

11:00 There are going to be many excellent people coming through on this show.

11:04 We've lined up unscripted chats with Google Developer experts, Google engineers,

11:07 even product leaders that are bringing all of this change

11:11 to us for the fun things that we have.

11:15 So as we dive in, we'll be taking questions from you, from the live stream.

11:19 Please remember to be excellent to each other as a part of this process.

11:23 We're going to be doing a ton here over the next few days.

11:26 Stay tuned for some surprises and for some

11:28 other great moments that you'll have from us here.

11:31 Get your coffee mugs or other mugs filled and ready,

11:33 and let's kick this thing off.

11:35 I am so stoked.

11:43 [MUSIC PLAYING] JASON DAVENPORT: Hello, everyone.

15:27 Great to be here live with our first folks

15:30 of the afternoon here— or I guess it's morning.

15:32 I'm on a different time zone here than where we're at Google Cloud Next.

15:36 I'd like to welcome two great friends of mine.

15:39 Starting from the left, Dave Elliott.

15:41 I've worked a long time with Dave in Developer Relations.

15:44 And then Director of Product Management, Addy Osmani.

15:48 Addy and Dave, great to have you here.

15:50 DAVE ELLIOTT: Great to be here.

15:51 JASON DAVENPORT: All right.

15:53 Gemini Enterprise Agent platform.

15:55 We talked about it.

15:57 What is it?

15:58 ADDY OSMANI: So Agent platform is our end-to-end platform for building,

16:03 scaling, governing, and optimizing your agents.

16:07 And we think that one of the big challenges people have had over the last year

16:11 or two when they've been trying to build agents

16:13 is that it's very easy to build a prototype.

16:15 JASON DAVENPORT: It is.

16:16 ADDY OSMANI: It's very,

16:16 very difficult to turn that into something you can put in production reliably.

16:20 And what we've been hearing from customers is that, yeah,

16:23 I can build something, but then I have to worry about identity.

16:27 I have to worry about governance, I've got to worry about memory.

16:30 And they ended up having to string together

16:32 lots of different services to get things working.

16:35 We thought it could be a little bit simpler,

16:37 so that's what Agent platforms are trying to solve.

16:39 DAVE ELLIOTT: Yeah, it's an end to end platform for building those agents,

16:43 scaling them, governing them— JASON DAVENPORT: And optimizing them?

16:46 DAVE ELLIOTT: And optimizing them.

16:48 Exactly, yes.

16:49 JASON DAVENPORT: That is super cool.

16:50 So let's break this apart.

16:51 So let's start with build.

16:53 Because I know so many folks out there,

16:56 we can vibe code a nation in realistically under

17:01 probably five minutes with most of our tools today.

17:04 We've just released some new features though, an Agent Development Genkit.

17:07 So how are we making it easier for folks to move from that zero

17:12 to one stage to now one to production

17:14 with their agents and their agent workloads?

17:16 DAVE ELLIOTT: Yeah.

17:17 I think it's worthwhile mentioning the core

17:20 to building the agent is Agent Development,

17:23 ADK, hence my shirt representing here.

17:26 Agent Platform.

17:27 JASON DAVENPORT: It's a very great logo, by the way.

17:28 DAVE ELLIOTT: Yeah, it is a great logo.

17:29 We need a name for the logo, though.

17:31 We don't have one yet.

17:32 JASON DAVENPORT: Bot.

17:32 DAVE ELLIOTT: Bot?

17:33 Bot, eh.

17:34 JASON DAVENPORT: Bot's, like, 2019.

17:35 DAVE ELLIOTT: That's not very creative.

17:37 Yeah.

17:37 So I think the heart of building an agent is Agent Development.

17:40 It's our framework.

17:42 So for folks who are looking to quickly scale,

17:46 quickly build, and get their agent in practice,

17:49 we announced it last year at Next, here, rolled out,

17:53 and we support the four main languages, Python, Go, TypeScript, and Java.

18:00 And it's really the core to getting an agent quickly built.

18:05 JASON DAVENPORT: That's super cool.

18:06 And adding maybe to one of the things

18:08 that Thomas talked about is how ADK— sorry,

18:12 Agent Development Kit really helps with governed workflows.

18:17 So can you maybe break that down for us?

18:19 What's so special about Agent Development Kit now?

18:22 If you think about regulated workloads, you want non-determinism.

18:26 But you have to prove that the workflow

18:30 actually did all the things you needed to.

18:32 ADDY OSMANI: Yeah.

18:32 Well, there's a few different parts to our governed story.

18:35 The first one is that we've got a gateway.

18:39 We've got identity.

18:41 We've got a registry, and then we've got anomaly detection.

18:44 And really, what you want is for your agents to be traceable.

18:48 You want to be able to understand, well, what agent did this thing?

18:52 What is the log of all the actions that they took?

18:56 And up until now, it's been difficult

18:58 for people to assign unique identities to their agents.

19:01 What we now do is we can give

19:03 you cryptographically generated identities for each of your agents.

19:07 If you're a member of my team and I want you to be able to log into a service,

19:12 I'm going to be giving you some credentials.

19:14 I'm not going to be just letting you reuse some tokens,

19:16 reuse my account details very fuzzily.

19:20 I want it to be secure.

19:21 And so one of the other things that we do is we enable

19:24 your agents to also get secure

19:26 credentialed access to different services and systems.

19:28 So you're able to manage that in a way where

19:30 you can feel good that it's going to be secure,

19:33 and you've got an audit trail if anything goes wrong.

19:35 I think for a lot of businesses it's important.

19:37 DAVE ELLIOTT: Yes, it's critical.

19:38 I would point out a couple of things.

19:40 One is the govern pillar in agent platform is not really part of ADK.

19:46 ADK really is the core to building.

19:48 It could make sense.

19:49 There's so much in Agent Platform for us to show the one slide.

19:53 It's a little bit of an overwhelming— there we go.

19:57 The one slide that summarizes all the things that are in build,

20:00 scale, govern, and optimized.

20:01 So Addy was just talking about the gateway.

20:03 That's new.

20:04 Agent identity, agent registry, anomaly detection,

20:06 which I think is one of the hidden gems.

20:10 We're fortunate, across the govern pillar,

20:13 where we have a large engineering team that's

20:15 been working on these issues for a long time.

20:18 This is not an AI thing.

20:21 This is really a Cloud thing, or really just an enterprise thing.

20:24 And so we have lots of people working on this for a long time.

20:27 We're applying those engineering talents,

20:29 those engineering innovations now to agent building.

20:33 JASON DAVENPORT: And there's a lot going on on this slide.

20:35 So, Dave, maybe let's use this as an anchor point.

20:38 So we talk about— obviously we're mentioning govern,

20:41 which is a huge, huge thing here.

20:44 One of the things I'm passionate about actually is sessions and memory bank.

20:48 So what are we doing— if you think of agents,

20:50 an agent that doesn't understand what happened before the agent came

20:55 on or did last time and doesn't understand what happens after,

20:58 how can someone think about sessions and memory management— if they're thinking

21:03 about building an agent that can actually learn from its behavior over time?

21:07 DAVE ELLIOTT: Yeah.

21:08 I think, well— Addy, sorry, were you going to say something?

21:10 ADDY OSMANI: Oh, no.

21:11 DAVE ELLIOTT: Yeah, I think you're exactly right.

21:14 Memory is— I think about six months ago maybe, maybe a year ago,

21:19 memory became something that was a major

21:22 issue blocking agents not from performing,

21:26 but from performing at a level that people want, that people expect.

21:31 And so memory management across sessions and across

21:35 time is something that we rolled out.

21:38 It's now generally available.

21:40 We rolled out probably about six months ago.

21:42 And you're exactly right.

21:43 That makes the agents more enterprise ready, more reliable than anything else.

21:50 ADDY OSMANI: One of the really cool things that builds

21:52 on top of that is the idea of long running agents.

21:55 That's one of— I think that's one of the other

21:58 hidden gems that people will hear about at Cloud Next,

22:01 long running agents or agents that can run for not just a few hours,

22:04 but potentially days, a week,

22:07 and persistence is a really important part of that.

22:10 You don't want your agent that's running for a couple of days to forget midway

22:13 what it was doing or half of the work that it was like digging into.

22:18 And so we're excited that that's now a first class thing in Gemini Enterprise.

22:22 Folks can go and check that out too.

22:25 But I think that memory bank— if people want to create their own

22:28 solutions on top of that, memory bank is also a great solution,

22:31 like a good LEGO brick for it.

22:32 JASON DAVENPORT: Yeah.

22:33 What I love about memory bank is that, especially if I'm starting off,

22:37 I don't have to be an expert in memory.

22:39 Memory bank will actually handle a lot of the, hey,

22:42 this might be something that's interesting to store.

22:45 Let's store it and then see if we come back to it,

22:47 and it'll manage itself over time,

22:49 which I think, if you think of agentic behavior, is super cool for that.

22:55 Dave, another piece if we pull back up the chart here.

22:59 So the other thing that I think about

23:01 a lot actually— obviously there's the scaling portion with runtime,

23:05 which is super interesting.

23:07 Let's talk about optimization.

23:10 How are we thinking about agent evaluation?

23:14 DAVE ELLIOTT: Yeah, this whole— JASON DAVENPORT:

23:15 Other than we're thinking about it, right?

23:16 DAVE ELLIOTT: We're thinking about it.

23:17 No, this whole pillar is new.

23:18 And I think this is really the cutting

23:22 edge of thinking on getting agents to be productive.

23:27 When we talked about this, it's funny because we talk about optimize,

23:30 and there's multiple ways what optimize might mean.

23:33 It's optimizing for tokens because we're just in a global shortage of capacity.

23:40 But it's also optimizing to make sure that it performs as you would expect.

23:44 So agent evaluation, are you— answering the question.

23:49 Are you sure that it's behaving the way you'd like your agent to behave?

23:55 That's really what we're solving with agent eval.

23:59 I think also the ability to do simulation is critical.

24:05 And then by creating a dashboard, it gives us your ability to see all

24:09 of your agents across your entire enterprise is another key thing.

24:13 JASON DAVENPORT: That's super cool.

24:14 And Addy, maybe for those who are a little bit newer to agents,

24:18 how do you think about agent evaluation versus what we were

24:21 probably doing 12 or 18 months ago with just pure LLM evaluation?

24:25 What are you looking for if you think about

24:28 agent behavior over time to actually build a good eval?

24:31 ADDY OSMANI: So one of the things that's probably not

24:33 a big surprise to people is that LLMs are not really deterministic.

24:37 DAVE ELLIOTT: What?

24:38 ADDY OSMANI: Shocker.

24:40 JASON DAVENPORT: I'm surprised.

24:41 ADDY OSMANI: We're all shocked up here.

24:43 But because they're not deterministic,

24:44 that also applies to your agent story too.

24:47 And given that you're seeing people who are stringing multiple agents together,

24:51 you've got orchestrators, you've got these fleets of agents,

24:53 you've got lots of different agents that are working together.

24:56 That fact that they're non-deterministic can mean that there's a risk associated

25:01 with trying to deploy something that is in the critical path for a business.

25:06 And so it's really important to have an agent eval story that allows

25:10 you to have some level of guarantee that throughout that whole flow,

25:13 throughout that whole process, it's actually accomplishing that goal,

25:16 even if aspects of it are not quite as deterministic as you would like.

25:20 So I think that piece is critical.

25:22 JASON DAVENPORT: I think that's super cool.

25:23 The other thing that I'm actually really excited about— so

25:26 we'll hearken back to the Cloud Observability days— is agent tracing.

25:32 Like, actually building a graph of things that agents are doing.

25:36 Dave, from a developer's perspective, how are we making observability easier?

25:42 Because it's not deterministic.

25:44 I think we can agree on that part.

25:46 DAVE ELLIOTT: Yes.

25:46 JASON DAVENPORT: Are we going to agree on one thing?

25:47 DAVE ELLIOTT: We're going to agree.

25:48 That will be the only thing we agree on today.

25:49 JASON DAVENPORT: That's totally fine.

25:51 So how are we thinking about bringing

25:53 observability for the developers using this stuff?

25:55 DAVE ELLIOTT: Yeah.

25:56 I mean, it really is, like, can you see what's happening?

26:00 And by having a dashboard that stands in line,

26:05 that reports in line on your agents, I think that is what's going to give you

26:11 the confidence that the agents are performing as you would,

26:14 especially in a world where we have

26:17 these long running agents that Addy was talking about,

26:19 where we have agents that are really running autonomously,

26:25 and you have to have that confidence, because if things go off the rails,

26:31 you want to be able to have a way to understand what happened,

26:35 to go back and look where the logic broke down, and to be able to fix it.

26:40 JASON DAVENPORT: Yeah.

26:41 And I think, for me,

26:43 the observability piece is probably one of the most critical ones.

26:47 Dave, you mentioned another thing.

26:48 And Addy, maybe over to you.

26:51 I know agent sandboxes, those are becoming a very needed thing.

26:56 How are we making sandboxes easier for devs and for agents

27:00 that are actually just running over time for these tasks?

27:03 ADDY OSMANI: Yeah.

27:03 So I mentioned earlier that over the last two years,

27:06 people had to string together so many different services.

27:09 And one of those LEGO bricks is also a sandbox.

27:11 As we're increasingly giving agents this autonomy and we're

27:14 giving them access to different tools, different services,

27:16 the ability to go inside your company's data and your different

27:20 services and be able to do what they need to do.

27:24 I think that sandbox has become critical because you want to have a level

27:27 of guardrails around what the agent is able to do to limit the blast radius.

27:31 So we know that there are going to be agents

27:33 that need to be a little bit more powerful than others,

27:35 but that doesn't mean that we don't have guardrails in place so that we know,

27:39 well, it's not going to completely empty our bank account or anything like that.

27:44 DAVE ELLIOTT: This is especially true in a world where you expect the only way

27:48 that the agents can be effective is them

27:51 using tools and interacting and engaging with other agents.

27:55 So that becomes a riskier world.

27:57 It also is a world where agents become

27:59 more powerful because they can use all these.

28:01 So you need to have that protection.

28:02 You need to have that sandbox.

28:04 JASON DAVENPORT: I love that stuff,

28:05 in particular for coding agents where I just want a few tools.

28:09 I don't need all this stuff in the sandbox.

28:12 Here's the things.

28:14 You have a hammer, you have some nails, and you're going to build— DAVE ELLIOTT:

28:17 You don't have a machine gun or a nuclear bomb, right?

28:19 JASON DAVENPORT: Yeah, no.

28:19 Exactly, right?

28:20 But we have— like, we're going to build a birdhouse.

28:22 ADDY OSMANI: You're not bypassing all permissions by default?

28:24 JASON DAVENPORT: No.

28:25 Well, I hope not.

28:26 You never know.

28:28 That's what makes the live stream great.

28:31 So fun fact.

28:33 Fun thing.

28:34 What's one cool thing that you're seeing?

28:37 Maybe one to each of you.

28:38 What's the cool thing you've seen in the community someone build with AI?

28:43 And specifically an agent that they've built.

28:45 DAVE ELLIOTT: Yeah.

28:45 I think here on the floor we've got a bunch of really fun demos,

28:48 a couple I'll mention.

28:50 One is we have a brain computer interface engaging with attendees,

28:56 and they basically put on this strap

29:00 on their forehead and it reads the brain, your brainwaves.

29:03 JASON DAVENPORT: I've seen this one.

29:05 DAVE ELLIOTT: Yes.

29:05 JASON DAVENPORT: It's super cool.

29:06 DAVE ELLIOTT: Yeah.

29:07 But where it gets fun is we built an agent that engages with when you work,

29:13 and it tells you— it can tell— it doesn't read your mind,

29:18 but obviously it can tell how you're thinking or your emotions.

29:23 And so it can prioritize the work that you should

29:26 do and let you know when you need a break,

29:28 or maybe you should work on something that's

29:31 more fun versus— because you're starting to wander.

29:34 So that's a really cool one.

29:35 It's an agent that understands you,

29:37 and then it's an interface that's pretty cool.

29:41 JASON DAVENPORT: What have you seen that's cool recently?

29:44 ADDY OSMANI: So I think that one of the challenges we've all had is staying

29:48 on top of agentic coding constantly changing

29:51 or anything that's an AI that's constantly changing.

29:54 There's a project called 30 Days that I love,

29:56 and the basic idea is that it uses agents to go and take a look

30:00 at a lot of popular— like Reddit and Twitter

30:03 and a lot of popular social networks and forums,

30:06 and it will just get an idea of, hey,

30:08 over the last 30 days, what have been the things that have been viral,

30:11 what have been the things that are important that you should

30:13 keep an eye on and maybe dig into and read up more?

30:16 For someone that doesn't have a lot of time

30:18 but wants to stay on top of stuff, that's huge.

30:20 So I've loved that just because it's very practical.

30:23 JASON DAVENPORT: Yeah.

30:24 I think these are— one of the things that I love about

30:26 the agent space right now is if you think of Cloud transformation,

30:31 in my mind, and having lived it, it's really an efficiency play.

30:38 Hey, you didn't have enough capacity on-prem, but we can go to the Cloud.

30:42 We can build your workloads to scale faster.

30:44 You have all these other characteristics.

30:47 AI really is about business transformation at the end of the day,

30:51 even as a developer, sometimes— I don't care about the business that much.

30:54 I do have to know the business.

30:57 I have to know the business processes.

30:59 I have to know what are the rules that we need to apply,

31:03 and how do we think about AI as continuing

31:07 to change the developers role in this transformation?

31:10 DAVE ELLIOTT: We had a great

31:12 roundtable with the Google Developer experts yesterday,

31:13 and we talked about that.

31:15 We had that exact question come up.

31:16 What's the role of a developer in the future in the world of AI?

31:20 And my thought is that developers are problem solvers.

31:24 At the heart, they're problem solvers.

31:26 And we use tools, and the tools today are typical things that we

31:32 think of, programming languages and IDEs and libraries and things like that.

31:37 And I don't think that changes with AI.

31:41 What doesn't change is that they're problem solvers.

31:44 What does change are the tools that developers use.

31:48 So I think the mindset needs to shift a little bit,

31:50 but at the end of the day, I think we all want to solve problems.

31:54 We get a rush when we solve that problem

31:56 and we see other people getting benefits from what we built.

32:00 And now we'll get maybe more of a rush

32:02 more quickly because we have different tools.

32:04 ADDY OSMANI: I would completely agree with that.

32:06 There's this great quote from Grady Booch who says that the history

32:10 of software engineering is a history of a rising set of abstractions.

32:14 And I feel like that's exactly what's happening right now.

32:17 Our roles are possibly going to shift

32:19 to ones where we are managing fleets of agents.

32:22 And at the end of the day, someone's still going to be responsible for quality,

32:26 for making sure they're doing the right things,

32:28 for making sure that they're following good architecture,

32:32 design system principles, all of that stuff.

32:34 And so I think that if you have

32:36 a good grounding in those things, you'll be fine.

32:39 DAVE ELLIOTT: And this is exactly what Agent Platform's designed for.

32:43 It is for the building scaling, optimizing,

32:46 governing these agents that we think that as they're

32:50 deployed through Gemini Enterprise or through other platforms,

32:53 those are the things that can make a difference in your daily life.

32:58 So I mean, it's an exciting time to be a developer.

33:01 JASON DAVENPORT: It's super cool.

33:02 Maybe here we got about four minutes left.

33:04 One more question I'll throw out there,

33:07 and this is something I've been thinking about a lot.

33:10 Machine learning in the traditional sense stays the same, grows, or shrinks.

33:18 And then why?

33:20 And I have my own opinion, but I'm curious what yours is.

33:22 DAVE ELLIOTT: Well, I mean, machine learning is— it's math,

33:26 I mean, at the end of the day.

33:27 And so, I mean, that's not going to change.

33:29 There's still going to be— if anything, it's going to improve.

33:33 It's going to accelerate, rather,

33:34 because as you make as breakthroughs in research,

33:39 in academia happen, there's more ways to monetize

33:45 it and for people to get benefit.

33:47 So I think, if anything, it's going to accelerate.

33:50 JASON DAVENPORT: I'm scared if you and I are in agreement,

33:52 by the way, but maybe— DAVE ELLIOTT: That's twice this year.

33:56 ADDY OSMANI: Shockingly, I agree with everything that they've said.

33:59 And I think that with more and more people just becoming aware

34:04 of ML through the fact that they are increasingly having touch points with AI,

34:09 I think we're going to see more

34:10 people interested in getting involved in research.

34:12 I think that this space is only going to get bigger and bigger.

34:15 JASON DAVENPORT: I'm super excited personally.

34:19 I've built a lot of machine learning models, and it was hard.

34:23 You look at— I was building an image recognition model with Gemini's help,

34:27 probably two or three months ago.

34:29 And it's like, well, one, synthetic data used to be hard.

34:32 I can generate synthetic data now

34:34 that actually helps build an image recognition model.

34:38 DAVE ELLIOTT: Yeah.

34:39 ADDY OSMANI: It's wild.

34:40 DAVE ELLIOTT: I've been at Google and working in AI since 2013,

34:44 and I lived through the TensorFlow days.

34:46 And TensorFlow, we were really happy with TensorFlow because we said, look,

34:49 we built this for our own software developers to be able to do ML.

34:53 And it's so easy, even software developers can do it now.

34:56 Of course, anybody who's worked with TensorFlow

34:58 knows that it's not all that easy.

35:00 I mean, you really have to have— JASON DAVENPORT: Yeah.

35:03 DAVE ELLIOTT: You really had to put the work

35:04 in to become good at building and managing models, to train models.

35:10 And we've talked about this from the very,

35:12 very beginning, back, I think in 2018.

35:14 It's about democratizing AI.

35:16 And I think we're really at that stage now

35:19 in the last maybe six months where we have democratized AI.

35:24 Almost anyone can go and build something.

35:26 ADDY OSMANI: Strong agree there.

35:28 I can't remember another time in history where a research paper comes out,

35:32 you can go and check out the model on Hugging Face or Kaggle or whatever,

35:35 and people can just start playing around with it.

35:38 I can't remember that being as accessible.

35:40 It's kind of awesome.

35:41 JASON DAVENPORT: You think about the machine

35:43 setup that was required, like, in 2018.

35:45 ADDY OSMANI: Oh, yeah.

35:46 Yeah.

35:47 JASON DAVENPORT: Gave up.

35:48 DAVE ELLIOTT: I think a lot of people talked about this, but it's the PC era.

35:53 It's the internet.

35:54 It's the mobile phone.

35:55 Those flash points, those moments really

35:57 changed how people can engage with technology.

36:00 I think that that's what we're seeing.

36:02 And again, maybe it started three or four years ago,

36:04 but I really think it's accelerated in the last six months.

36:08 JASON DAVENPORT: Totally.

36:09 100% agree.

36:09 All right.

36:10 We've got about a minute left, so one last question.

36:14 What's the topic— DAVE ELLIOTT: This worries me.

36:18 JASON DAVENPORT: You're like, where are we going with this?

36:20 No, no.

36:21 Fun question.

36:22 Where are you headed to next at Cloud Next?

36:25 What are you most excited about doing here on day one?

36:30 DAVE ELLIOTT: So we have 20— my team's working on 21 demos out here.

36:35 And so the Agent Hack Zone's ability to go and put fingers on keyboard

36:40 with five different demos where you can go and do some code lab.

36:45 So going out and seeing those demos and seeing finally developers,

36:48 attendees engaging with those demos is what I'm most excited about.

36:52 ADDY OSMANI: Yeah.

36:53 I'm excited about the demos.

36:55 We're also announcing an AI agents challenge for startups today.

36:59 Very excited to check that out.

37:01 And speaking of long running agents,

37:03 I'm going to be giving a talk about that in just a little while.

37:06 JASON DAVENPORT: That's super awesome.

37:07 And I hope to be able to attend that.

37:09 You all mentioned a couple things.

37:11 We have a ton of code labs that are out there.

37:13 They're under Google Code Labs for Developers with the Next '26 flag.

37:18 We have a bunch there and a bunch of other ones.

37:21 Addy, you mentioned the builders challenge.

37:24 DAVE ELLIOTT: And check out Agent Platform.

37:26 It went live at 5:00 AM this morning.

37:28 There's a repo.

37:28 You can go and play with it.

37:30 JASON DAVENPORT: It's super cool.

37:30 All right.

37:31 Dave, Addy, thank you so much for kicking us off

37:34 here on what's going to be a very exciting day.

37:37 And let's go celebrate some agents.

37:39 DAVE ELLIOTT: Yeah.

37:40 ADDY OSMANI: Sounds good.

37:41 JASON DAVENPORT: All right.

37:41 We'll catch everyone here in a bit.

37:54 Thanks, guys.

37:57 [MUSIC PLAYING] JASON DAVENPORT: Hey.

44:23 What's up, friends and family?

44:25 Back here on the Google live stream stage.

44:28 My name is Jason Davenport.

44:29 I work at Google Cloud and I am joined today by two very special guests.

44:34 I have Ben and David from the Acquired Podcast.

44:38 Super excited to have you guys on here.

44:40 BEN GILBERT: Thanks for letting us crash your party.

44:43 DAVID ROSENTHAL: What a party it is.

44:44 JASON DAVENPORT: I feel like I'm crashing the party

44:46 with you two on stage, to be very candid.

44:49 It's super fun to see the energy here.

44:51 Let's start off something here, get just right into it with our audience.

44:54 You've done three episodes on Google, right?

44:56 BEN GILBERT: Yeah.

44:57 It was like the majority of our last year, studying the company.

45:00 DAVID ROSENTHAL: I think 11 or 12 hours total saga.

45:03 So that's a solid audiobook in there.

45:05 BEN GILBERT: Yeah.

45:06 JASON DAVENPORT: What's that?

45:07 That's, like, 400 pages written?

45:08 BEN GILBERT: Yeah.

45:09 About right.

45:10 JASON DAVENPORT: Super cool.

45:11 All right.

45:12 So you've done so much so far.

45:14 You obviously watched today's opening keynote with Thomas.

45:18 How would you change what you've written so far?

45:21 BEN GILBERT: Yeah.

45:21 So the question is, do we need a part four here at some point?

45:25 For anyone who's watching that hasn't listened, we did this big saga.

45:29 Part one was the history of Search.

45:31 Part two, we called Alphabet,

45:33 the era of web applications and developing the web as a platform.

45:37 And then part three was this AI era, and the craziness of Google seeming very

45:42 behind when ChatGPT launched in 2022— DAVID ROSENTHAL:

45:46 Well, first Google inventing— BEN GILBERT: Inventing the transformer.

45:49 DAVID ROSENTHAL: Everybody in AI worked at Google.

45:51 JASON DAVENPORT: Just a small thing in 2018, right?

45:53 BEN GILBERT: That's right.

45:54 JASON DAVENPORT: A small thing.

45:54 BEN GILBERT: 2017.

45:55 JASON DAVENPORT: Oh, yeah.

45:56 You're right.

45:56 BEN GILBERT: And then the crazy comeback in '23, '24, '25, with Gemini,

46:01 with bringing together DeepMind and the core Google research

46:05 team that predated DeepMind and watching all the products evolve.

46:11 Today is probably a reasonable chapter of part four.

46:14 I expect it'll be, like, 10 years before we make the episode.

46:17 But I thought an underrated,

46:19 very interesting part of today's announcement was TPU V8,

46:24 the split to a training and an inference chip,

46:28 and just some of the specs on the generational change between V7 and V8 in just,

46:35 I think two years is the time frame between the last chip and this one.

46:42 DAVID ROSENTHAL: When we made our NVIDIA series in— what was that?

46:46 '23?

46:46 '24?

46:47 BEN GILBERT: Yeah.

46:48 DAVID ROSENTHAL: '25?

46:50 '23-24.

46:51 The dominant computing modality in AI was training.

46:55 Inference was like, oh, yeah, inference too.

46:58 But training was what everybody was focused on.

47:00 JASON DAVENPORT: Inference is fine.

47:01 DAVID ROSENTHAL: Yeah, inference is fine.

47:02 I guess you could run that on the same chips or whatever.

47:05 Worry about that later.

47:06 And the bets that the TPU team and that all of Google

47:10 made a couple years ago— it was a couple of years ago,

47:13 when TPU V8 was being developed.

47:16 It was still very much a training first market.

47:19 And now I don't know if it's inference first, but it will be soon.

47:24 BEN GILBERT: I think the compute loads

47:26 for inference have now eclipsed the line for training,

47:31 which makes sense when you look at every single Google product.

47:35 Let's just zoom in on Search.

47:37 Every query now has an AI overview.

47:40 Now granted, that's a really small model and it's really optimized.

47:43 But it's kind of unbelievable that at the scale of full Google Search,

47:47 you now are doing AI inference every time.

47:51 JASON DAVENPORT: Yeah.

47:51 It's fascinating to me.

47:53 You think of frontier models and there's three to five-ish

47:56 companies in the world that are really probably pushing that.

48:01 Everyone else is leveraging those frontier models or derivatives of those.

48:05 So I do think it makes a lot of sense

48:07 if you think about inference as that workload going forward.

48:12 The other thing I think is really fascinating with some of the stuff today,

48:15 if you think of some of the Agent Platform components,

48:18 I see a lot of folks struggle from zero to one with building.

48:24 And the lucky that you get to one,

48:27 then run into these hardware problems where it's OK, well,

48:30 I have one, but how do give this to a million users,

48:34 10 users, even, for some cases.

48:37 And solving that problem, I think,

48:39 is probably one of the next things over the next 24 months, if I'm being honest.

48:43 BEN GILBERT: Yeah.

48:44 DAVID ROSENTHAL: Yeah.

48:45 JASON DAVENPORT: What are your thoughts, though?

48:46 How are you looking at this across the market in terms of barriers

48:50 to that kind of last mile of adoption with AI and agents?

48:59 BEN GILBERT: There's this interesting question of, is AI

49:02 as useful as the hype sort of around it?

49:08 JASON DAVENPORT: Yeah.

49:09 BEN GILBERT: It's one way to frame the bubble question.

49:11 Are people getting enough value versus all the conversation around it?

49:14 And the fact that we're seeing inference

49:18 eclipse training in workload volume— DAVID ROSENTHAL:

49:20 Seems like a clear answer to that question.

49:22 BEN GILBERT: Yeah.

49:23 Check.

49:23 DAVID ROSENTHAL: Yeah.

49:24 BEN GILBERT: And are customers getting value out of it?

49:26 You're seeing revenue explosion, so OK.

49:28 Check.

49:29 Then the question becomes, well, what is holding AI back?

49:34 And I suspect it is most people still don't think,

49:37 oh, I should be using AI for this task.

49:40 We have very narrowly defined, oh, I should ask Gemini.

49:44 I should ask ChatGPT.

49:45 I should ask Claude when I have a query that is

49:48 more complicated than something I would just search the web for.

49:52 But I think most people for most tasks throughout the day,

49:54 it still doesn't occur to you,

49:56 even if you're an AI maximalist who's really trying to lean into the tools.

50:01 There's just not the muscle memory yet

50:04 and the obvious tools to use for each task.

50:07 And to your point, I think the biggest manifestation is around agentic.

50:11 Most people, 99 plus percent of people still don't think, oh,

50:14 I should create an agent to do that on my behalf.

50:17 JASON DAVENPORT: Yeah.

50:18 Yeah.

50:18 DAVID ROSENTHAL: And the UIs for this stuff are still getting figured out.

50:21 I mean— JASON DAVENPORT: Do you need a UI with an agent?

50:24 DAVID ROSENTHAL: I mean,

50:25 the fact that— it's getting better quickly, but until now,

50:29 you kind of needed to have a terminal up to do a lot of this.

50:31 Normal people are not going to have terminals up.

50:34 There needs to be some kind of way to widely accepted,

50:37 easily usable layer for people that the instant

50:40 the word terminal or terminal flashes on your screen,

50:43 it's like, nope, not for me.

50:45 BEN GILBERT: Yeah.

50:46 JASON DAVENPORT: I mean,

50:46 I do a lot of development and I still have that feeling some of the time.

50:50 It happens to everyone.

50:52 BEN GILBERT: Just don't type sudo.

50:53 JASON DAVENPORT: Oh, that's the first command that I run all the time.

50:56 Come on.

50:58 All right.

50:58 Is AI like the internet?

51:02 Is it like the manufacturing line?

51:03 Or is it like electricity in terms

51:05 of the timeline of adoption we're thinking about?

51:08 And maybe the scale, also.

51:09 BEN GILBERT: The internet but faster.

51:11 JASON DAVENPORT: Internet but faster?

51:12 BEN GILBERT: Yeah.

51:12 Actually, we had a great conversation last night with Amin,

51:16 who's on stage today announcing the new TPU.

51:20 And he made this comment to us when

51:23 we were talking about inference versus training workloads.

51:25 And his point was, training is a large, expensive, one-time task.

51:32 Inference is a cheap, over and over and over again, ongoing task.

51:37 There's an analog could make to the early web.

51:40 Google in 2000, the main use of the infrastructure was

51:44 to go and do a giant one time web crawl,

51:48 save that web crawl in the index— JASON DAVENPORT: Build the index.

51:50 Yeah.

51:51 BEN GILBERT: And then use the leftover

51:52 computers to actually serve the web pages.

51:55 And over time, by 2010, 2015, maybe even earlier than that, the vast

52:01 majority of Google's infrastructure was serving

52:03 pages for queries and the crawl was this small thing running in the background.

52:08 You see that exact same thing happening with training and inference,

52:11 but on a much more compressed time scale of call

52:14 it four years so far rather than 15 years.

52:18 DAVID ROSENTHAL: I think also,

52:20 the internet was so much about creating new business models

52:24 and creating revenue for— new revenue for companies and enterprises,

52:27 and that just takes longer to figure out.

52:29 And AI, I think, will be that too.

52:30 But AI is also— like, we saw the demo of the YouTube TV customer service today.

52:35 One of our partners is a company called Sierra that does this for companies.

52:39 Today, you can use AI to massively efficientize

52:44 your costs and so that is going to drive adoption.

52:47 That is driving adoption by enterprises so

52:50 much faster than something like the internet.

52:52 JASON DAVENPORT: Yeah.

52:53 It's interesting.

52:54 I think we're in an efficiency phase of AI right now.

52:57 I don't know how much longer it's going to last.

52:59 Candidly, I'm a little excited for when it's over and we

53:03 start to really think about business transformation as a result of AI,

53:07 because I think if you think of business models and where these next rounds

53:12 of companies are going to come from, it's

53:14 going to be from that type of question.

53:15 Hey, how am I building AI in a way

53:20 that integrates into what the customer actually wants?

53:23 Not like making the website two clicks easier for some customer to use.

53:28 DAVID ROSENTHAL: Yeah.

53:28 The next Googles, next YouTubes, the next— yeah.

53:31 BEN GILBERT: It's actually a good question I would,

53:33 much like how in the internet when web pages first went online,

53:39 the newspapers put the exact newspaper up.

53:42 But it was digital and you could read it via HTML.

53:45 And then over time, we realized, oh,

53:48 the scrollable feed is the way to consume information on phones.

53:53 I think you're right that we aren't really seeing

53:57 the killer native app of the agentic AI era.

54:03 We don't even know what it looks like.

54:04 We're trying to make all of our existing stuff

54:08 more efficient and put it into this AI modality,

54:11 but I don't think— DAVID ROSENTHAL: But it definitely will come.

54:13 It feels like there are things enabled by AI that it'll come.

54:16 JASON DAVENPORT: Like the YouTube support demo.

54:18 For those who are following along, we had a demo this morning of using

54:23 AI with YouTube support in a multilingual scenario.

54:27 And how many times does that happen?

54:28 BEN GILBERT: It happens all the time.

54:30 JASON DAVENPORT: That was a great demo, I loved that demo.

54:32 Patrick put a lot of love into that too.

54:36 It shows in terms of all the presentation for it.

54:40 That's starting to scratch that, because it's like, hey, I have a question.

54:45 Here's the things that I need.

54:47 I just want to watch the draft.

54:49 DAVID ROSENTHAL: Yep, yep.

54:50 The NFL draft.

54:51 JASON DAVENPORT: Make it easy for me to— DAVID ROSENTHAL:

54:53 And that's the kind of stuff— I mean, there are really, really,

54:56 really large enterprises that when you call

54:58 their 1-800 number today, AI picks up.

55:01 It's already happened.

55:03 It's not like, oh, this is a demo, like a proof of concept.

55:05 This is happening.

55:06 JASON DAVENPORT: Yeah.

55:07 And I think those are super cool.

55:11 All right, so you've obviously been following

55:12 a lot of Google and other companies.

55:15 How would you describe Cloud's transformation?

55:19 DAVID ROSENTHAL: Oh, man.

55:20 BEN GILBERT: I mean— JASON DAVENPORT: Loaded question.

55:22 Not a loaded question at all.

55:23 DAVID ROSENTHAL: It's not even a transformation.

55:25 It's like, I don't know, refounding or something.

55:29 Google Cloud today versus Google Cloud

55:31 10 years ago are completely different species.

55:34 JASON DAVENPORT: I mean,

55:35 I joined in 2020 and I feel like what it is today versus what I joined,

55:38 it's a completely different organization.

55:41 In a good way, to be clear.

55:42 DAVID ROSENTHAL: Yeah.

55:43 No, no.

55:43 In a great way.

55:43 Yeah.

55:44 BEN GILBERT: I read a Harvard Business School case study on Google Cloud,

55:47 and especially starting right around the time that Thomas Kurian came in.

55:53 And there was a quote— I hope I get the order of magnitude right,

55:56 because it was so extreme I almost couldn't believe it.

55:58 That Thomas came in and he was sort of surveying the organization.

56:03 And he said, how many salespeople do we have?

56:05 And he looked at the number and said,

56:07 we need 1,000 times more salespeople in order to run an effective,

56:12 customer-focused enterprise organization.

56:14 And it is literally true that Google 1,000x the headcount on Google Cloud.

56:19 DAVID ROSENTHAL: It was just not— Google's DNA

56:21 was not to run an enterprise first— JASON DAVENPORT:

56:24 It's a completely different animal, to be fair.

56:26 DAVID ROSENTHAL: The story of Snap is amazing.

56:29 I mean, Google Cloud started with Google App Engine,

56:32 which was like a platform as a service for mobile app developers.

56:37 And it just so happened that Snapchat got built on it.

56:39 And then I think Snapchat disclosed in their S-1

56:42 when they were going public that they were

56:45 Google Cloud's largest customer or something like

56:47 that, but they had no— there was no sales relationship.

56:50 It was, like, complete insanity.

56:51 BEN GILBERT: The one big shining example of a giant customer at that time,

56:55 and now— DAVID ROSENTHAL: It was all by accident.

56:56 BEN GILBERT: You watched the keynote today,

56:58 and it's just a dizzying number of giant Fortune 500 logos

57:02 that have staked a big part of their business on Google Cloud.

57:05 DAVID ROSENTHAL: And then the other thing that's happened is,

57:08 as we chronicled in our series,

57:12 the structural and product advantages that Google

57:15 Cloud has now versus the other hyperscalers,

57:17 because Google's the only company that has a chip,

57:20 a Cloud, and a model all integrated.

57:24 There are multi-billion dollar companies that have one of those things.

57:28 Yeah.

57:29 JASON DAVENPORT: Well, and what I do think is also pretty cool about that is if

57:34 you think of all the sovereign Cloud work that we're doing at Google Cloud,

57:38 there's real needs elsewhere.

57:40 If we step out of the bubble of, really, Silicon Valley or the United States,

57:46 being able to run Gemini actually on the edge,

57:50 in the Distributed Cloud sense, I think that's super cool.

57:55 It seems kind of like you're like, yeah, it's like a stack.

57:58 It's like, no, you have so many optimization pieces that you

58:01 have to do to go into that to give people that advantage.

58:05 It's pretty cool to watch.

58:07 What are you most excited about seeing from Google here in the next 12 months?

58:12 BEN GILBERT: So I don't think this was in the keynote,

58:15 but it is in some of the blog posts and press releases that Google put out.

58:20 The inference time is a 5x speed up on the TPU 8 inference chip.

58:31 That takes AI from this modality of I'll query and then I'll sit back and wait.

58:38 Maybe I'm only waiting 5 seconds or 10 seconds,

58:40 but maybe I'm waiting several minutes for a task to complete.

58:43 JASON DAVENPORT: I usually go golfing.

58:45 BEN GILBERT: Great.

58:46 That's great.

58:47 Yeah, just let a whole bunch of code be written while you're out on the links.

58:54 A lot more AI tasks will start feeling synchronous instead of async,

58:58 and I don't think we can yet predict what that unlocks.

59:03 There's such a big difference in a computing

59:06 application of something that feels instant versus not,

59:10 that there's all these interesting downstream effects from that.

59:12 DAVID ROSENTHAL: Yeah.

59:13 I mean, Google saw this with Search in two generations ago.

59:16 Why is the Google homepage so sparse?

59:18 It wasn't because Larry and Sergey were minimalist design acolytes.

59:23 BEN GILBERT: A little bit.

59:24 DAVID ROSENTHAL: I mean, sure, a little bit.

59:25 But it was about speed.

59:26 The faster you get from query to answer,

59:29 the more you're going to retain customers.

59:31 JASON DAVENPORT: Well, so it's funny you mentioned that.

59:35 I had the privilege of doing a couple

59:38 GDE meetups and GDG meetups in India in February, and one of the use cases,

59:43 or I guess one of the primary ones that everyone's

59:45 trying to unlock is voice-powered AI and real-time voice AI.

59:51 So if you think of you being able to reduce

59:53 the inference time on that, you can actually chat with AI.

59:57 And not, hey, support.

59:59 Well, I got to go look at your user account.

1:00:02 I need to look at your payments, need to look at your prior history.

1:00:05 Like, we accept that that's going to take a few seconds.

1:00:09 I have a 10-year-old at home.

1:00:11 He wants to talk to AI, he expects AI to be having a conversation with him.

1:00:16 And what that unlocks?

1:00:18 I think that's super cool.

1:00:19 DAVID ROSENTHAL: Yeah.

1:00:20 Another area that obviously made less sense for today and for Cloud Next,

1:00:24 but that I'm excited for see what happens with Google

1:00:27 and AI in the next couple of years is YouTube.

1:00:29 One of the things that YouTube is just this giant hiding in plain

1:00:33 sight that we spend a lot of time on in our episodes.

1:00:35 It's the biggest property in human history.

1:00:39 It's the biggest thing on the internet is the biggest consumption platform,

1:00:44 product, whatever you want to call it.

1:00:45 It is the biggest thing that humans have ever created.

1:00:47 BEN GILBERT: And it now has more revenue than any other media company,

1:00:50 including Disney, as of this quarter.

1:00:52 DAVID ROSENTHAL: Right, right.

1:00:54 And AI is going to do a lot there.

1:00:56 BEN GILBERT: I've actually been doing a ton of this where,

1:00:58 for acquired research, we go— I probably watch, I don't know,

1:01:02 50 to 100 YouTube videos of research per episode.

1:01:06 And sometimes I forget where I heard something,

1:01:08 and I now very regularly will go back through my sources list,

1:01:12 click on 10 videos that I think are the candidate ones,

1:01:15 and then use the little chat with the video or the ask AI,

1:01:18 the little Gemini logo on it.

1:01:20 And I'd say in the transcripts of this episode, where was this discussed?

1:01:24 And then I can go and figure out exactly

1:01:25 which video it was and exactly what the comment was.

1:01:27 It saved me hours and hours and hours of research.

1:01:30 JASON DAVENPORT: Well, and for me, AI and Workspace, interestingly,

1:01:35 the last— if it's like, where do you use AI the most right now?

1:01:39 Workspace, hands down.

1:01:41 In particular, the last two or three months, all the feature releases there.

1:01:45 I'm like, it has also learned my writing style,

1:01:48 and it has learned how to shorten my writing

1:01:50 style because I'm not a very good writer.

1:01:52 News flash for anyone who knows me well.

1:01:55 But it's fascinating to see those improvements,

1:01:59 and being able to use your Gemini with this stuff— DAVID ROSENTHAL:

1:02:03 That was a cool part of the demo— or cool demo today

1:02:06 was just generating a slide deck within Gemini for enterprise right there.

1:02:10 JASON DAVENPORT: Yeah.

1:02:11 I've used that before, and it's pretty cool.

1:02:13 The other thing that's cool about it is I gave it a very terrible prompt.

1:02:19 And I was like, I should have given it a better prompt.

1:02:22 Because it's like, how would you know what I'm looking for?

1:02:24 Does the stack, and I'm like, it's like 75% of the way there interestingly,

1:02:28 just with the context of me as a user and what I was working on.

1:02:32 Two shot after that and I'm like,

1:02:35 it's actually— it looks good and it's very close to what I need to use it for.

1:02:40 And I'm like, this is really cool.

1:02:42 And it's not like you generate me an email which is a four

1:02:45 page email that someone else is going to take and be like,

1:02:49 make this a one line thing.

1:02:50 What's that?

1:02:51 There's so many internet memes for this stuff.

1:02:53 DAVID ROSENTHAL: Yes.

1:02:53 BEN GILBERT: Yeah.

1:02:54 Please fix, thanks.

1:02:55 JASON DAVENPORT: Yeah, exactly.

1:02:56 Yeah.

1:02:57 That's one of my favorite ones.

1:02:58 But it's so cool to see it in action.

1:03:00 All right.

1:03:01 So you're here for today and probably a couple more.

1:03:05 Where are you going to next and what are you

1:03:06 most excited about doing while you're here at Cloud Next?

1:03:09 BEN GILBERT: So we get to spend, when we were doing our Google series,

1:03:13 some time with the legendary Jeff Dean and— JASON DAVENPORT:

1:03:18 I'm jealous, by the way.

1:03:19 BEN GILBERT: That guy— DAVID ROSENTHAL: Incredible human being.

1:03:21 JASON DAVENPORT: I know.

1:03:22 He also— yeah.

1:03:23 He's so awesome.

1:03:24 DAVID ROSENTHAL: He may actually be an AI

1:03:26 from the future sent back to— BEN GILBERT:

1:03:28 To give their gifts to all of us here in the present.

1:03:31 Yeah.

1:03:32 Do you want to tell him about what we're doing here?

1:03:33 DAVID ROSENTHAL: Yeah.

1:03:34 So this afternoon, we're doing a fireside chat,

1:03:36 extended conversation with Jeff and with Amin on the new TPUs

1:03:42 and on everything going on with Gemini and at Google DeepMind.

1:03:45 So really excited for that.

1:03:47 BEN GILBERT: It'll be fun to catch Amin after the keynote,

1:03:50 kind of coming off stage.

1:03:51 He's had a few hours to decompress, to really reflect on TPU V1 to now,

1:03:57 going from ASIC to this highly specialized two

1:04:02 sets of chips right on the cutting edge.

1:04:05 Excited to chat with him about it.

1:04:06 JASON DAVENPORT: It is super cool to see all of that.

1:04:08 And also a huge thank you for you guys coming on the live stream here.

1:04:12 It is so awesome to be able to spend even

1:04:14 20 minutes with you talking about all this cool tech innovation.

1:04:19 So with that, we're going to wrap up here.

1:04:21 Thank you, Ben and David, for everything that you've been gifting to us here.

1:04:26 Hope your session this afternoon is awesome.

1:04:29 And we're going to pass it on to the next group here coming up.

1:04:32 Thank you, everyone, for listening in.

1:04:33 We'll take a quick break and be right back.

1:04:36 DAVID ROSENTHAL: Thank you.

1:04:36 BEN GILBERT: Thank you.

1:04:41 [MUSIC PLAYING] JASON DAVENPORT: Hey.

1:13:10 What's up, friends and family?

1:13:11 Back here with Jason Davenport with Google Cloud,

1:13:14 and I am joined in this next session with Shubham Saboo,

1:13:18 an AI product manager here at Google Cloud.

1:13:22 And Shubham, super excited to have you on the show and to talk

1:13:25 about all the cool things that we're doing here at Cloud Next.

1:13:29 So let's get into it.

1:13:32 What is the most exciting things that we're doing

1:13:34 for developers here that we've launched here today at Google Cloud?

1:13:38 SHUBHAM SABOO: Yeah.

1:13:39 Thank you for having me, Jason.

1:13:41 Really excited to talk about it.

1:13:42 There are quite a few things that we did for developers here at Google Cloud,

1:13:46 but there are three things I'm particularly excited about.

1:13:49 JASON DAVENPORT: The power of three, right?

1:13:52 SHUBHAM SABOO: If you say that.

1:13:53 So the first one is Agent CLI in Agent Platform.

1:13:57 Now, what that is, it's a brand new CLI.

1:14:00 It's a combination of skills and commands that let you build,

1:14:05 scale, govern, optimize the entire Agent Development lifecycle.

1:14:09 That's your entry point to building anything

1:14:12 with Agent Development Kit and Gemini Enterprise Agent Platform.

1:14:15 It works with any of your coding agents,

1:14:18 so all you need to do is just install the CLI.

1:14:21 You can do a global install.

1:14:23 It gets picked up by any of your coding agent,

1:14:25 you can use Gemini CLI, Cloud Code,

1:14:28 Codex, whatever your preference of coding agent is.

1:14:31 And it just works.

1:14:33 That's the beauty of it.

1:14:34 It's the entry point.

1:14:35 You go in, you give it a command.

1:14:37 I want to build XYZ agents.

1:14:39 It knows everything about ADK.

1:14:41 It knows everything about platform.

1:14:43 It has all the context, all the skills needed to build the agent for you.

1:14:48 That's the beauty about it.

1:14:49 JASON DAVENPORT: That's super cool.

1:14:50 And yeah, before we come to maybe the lifecycle of Agent CLI,

1:14:54 I think one of the things that you talked about

1:14:56 there— and so maybe to do a quick off road.

1:14:59 You mentioned skills.

1:15:01 SHUBHAM SABOO: Yeah.

1:15:02 JASON DAVENPORT: What are skills?

1:15:04 And then how are we using skills here at Google

1:15:07 Cloud and in Agent CLI to make that development easier?

1:15:10 SHUBHAM SABOO: Awesome.

1:15:11 No, that's a great question.

1:15:13 So to summarize, skills is something

1:15:15 that makes the agent smarter and intelligent.

1:15:18 So it really provides agents with different kind

1:15:20 of capabilities that agents can execute and do.

1:15:24 Previously what you would do before skills, before the introduction of skills,

1:15:27 you would stuff everything that you would want your agent

1:15:29 to do in a single system prompt or a single prompt,

1:15:32 which would make your prompt be tens and hundreds of lines of code.

1:15:36 Your agent can get lost in context.

1:15:38 Now skills introduce this smarter— and this is not specific to Google Cloud.

1:15:42 Skills is a standard,

1:15:44 generic standard across every AI company, every AI product.

1:15:49 They now use agent skills.

1:15:51 What it does is it gives your AI agent different capabilities.

1:15:54 Your AI agent can decide at runtime which skills that they would want to use.

1:15:59 And that's what we are using for Agent CLI as well.

1:16:02 So Agent CLI has a combination of number

1:16:05 of skills around Agent Platform which helps you build, scaffold.

1:16:09 It has all the context of scaffolding an agent,

1:16:12 building an agent, deploying an agent

1:16:14 across the Gemini Enterprise Agent platform.

1:16:17 JASON DAVENPORT: One of the things I think

1:16:18 is super cool with all the skills— obviously,

1:16:22 we're launching skills for Google Cloud products.

1:16:25 We have a number that are out there today

1:16:27 which are making it essentially your agent and expert,

1:16:30 whatever product it's using.

1:16:31 The thing that I love in the CLI, though,

1:16:34 is that you need to know both the technical skill.

1:16:38 And then we also need to know some more process based skills.

1:16:42 So how do we use— or how does CLI

1:16:45 use skills to make that Agent Development so much easier?

1:16:48 SHUBHAM SABOO: Yeah.

1:16:48 So the way CLI is built is it is replicating how you

1:16:53 as an expert software engineer or an agent

1:16:56 engineer would go about building your agent,

1:16:59 deploying your agent, evaluating your agent, adding observability.

1:17:02 Now imagine all of those capabilities summed up in a skill

1:17:06 that an AI coding— just AI coding agent can just use through simple commands,

1:17:11 through simple English prompts.

1:17:13 How cool is that?

1:17:15 JASON DAVENPORT: I mean,

1:17:16 I've scaffolded a lot of this stuff before and I can tell you,

1:17:19 it does make it significantly easier, which I greatly appreciate.

1:17:24 Let's talk about some of the things now

1:17:26 that we're doing in evaluation with the CLI.

1:17:29 How is the CLI helping to make

1:17:31 evaluation easier for agents that folks are building?

1:17:34 SHUBHAM SABOO: Yeah.

1:17:35 So the great thing about it is in Agent Platform in the govern section,

1:17:38 if you look at the things that we have introduced,

1:17:41 we have introduced a number of ways for you to evaluate your agents.

1:17:45 Now, one way to use it would be to go read the docs, look at the blogs,

1:17:49 look at what we installed or what we released,

1:17:51 go to the platform, do it yourself.

1:17:54 Or you could just ask your coding agent with Agent CLI installed,

1:17:57 I want to evaluate my agents— JASON DAVENPORT:

1:17:59 Should I go install the CLI after this podcast?

1:18:03 I hope I have it already.

1:18:04 SHUBHAM SABOO: Yeah, yeah.

1:18:05 I think you should be doing it already.

1:18:09 Yeah.

1:18:09 So coming to that, I think what it really makes easy

1:18:14 for people is now you don't have to think about all these things.

1:18:18 So the question that you're asking me, can now ask it to your agent.

1:18:21 It has context of Agent Platform.

1:18:23 And the best way to get started with if

1:18:26 you're interested in evals is ask the coding agent,

1:18:30 what all eval options do I have?

1:18:32 How can I evaluate my agent?

1:18:34 And then it can walk you through that process.

1:18:37 You can do back and forth in the process.

1:18:40 You're not only just building those features

1:18:42 and adding those features to your agents, you're also learning in the process.

1:18:47 JASON DAVENPORT: So I think that last part that you hit on is super critical.

1:18:51 As I think of even my own journey with agents,

1:18:54 I've been using them pretty extensively now for probably

1:18:58 the better part of a year or two years.

1:19:01 It's super cool to think about my evolution and my learning with that, and to be

1:19:07 able to use something like CLI to start to get those tidbits.

1:19:10 Hey, how is agentic evaluation different than LLM evaluation?

1:19:15 Because they're kind of similar, but they're also really different.

1:19:18 One, I'm just looking at a prompt response.

1:19:21 The other I'm trying to manage the outcome.

1:19:23 Super cool to have just something where I can start

1:19:25 to get those tidbits and then learn along the way for it.

1:19:29 Let's talk about another thing you have to think

1:19:31 of learning and bringing that— I don't want to say determinism,

1:19:35 but outcome based learning agents.

1:19:38 What's new in Agent Development Kit?

1:19:40 So we just announced Agent Development Kit 2.0, if I'm not mistaken.

1:19:43 SHUBHAM SABOO: Yeah.

1:19:44 That's good.

1:19:44 Yeah, great question.

1:19:45 So in Agent Development Kit 2.0,

1:19:48 we introduced graph based workflows, which makes it really,

1:19:51 really easy for you to bring that kind of determinism,

1:19:54 that kind of reliability that you would

1:19:56 want in your production workflows into agents.

1:19:59 So one thing with agents was it was non-deterministic.

1:20:02 It could just, based on the prompt, based on what model would understand,

1:20:06 it could go in any of those directions.

1:20:07 But now you have precise control over the routing,

1:20:11 what the task execution would happen,

1:20:15 and how would you structure the entire system.

1:20:18 So that kind of determinism could come

1:20:21 into your agents through graph based workflows.

1:20:23 JASON DAVENPORT: And maybe for folks out there,

1:20:26 what— so you could obviously agentify almost any type of processor workload.

1:20:31 Where are you seeing— what types

1:20:33 of processes should folks be thinking about would

1:20:35 be a good use case for or candidate for this new graph based workflows approach?

1:20:39 SHUBHAM SABOO: Yeah.

1:20:40 The processes where you— in financial services, insurance claims,

1:20:44 processes where you need a little bit of reliability and determinism,

1:20:48 where you exactly know that there are some

1:20:51 deterministic logic which you don't want to agentify.

1:20:54 You don't want to bring the big guns for everything.

1:20:57 When you know it could just work with some small ammunition, just go with that.

1:21:01 You just have deterministic logic sit there,

1:21:04 make sure it gets executed in that specific way.

1:21:07 So all those applications where deterministic logic is

1:21:11 very specific and you don't need to identify everything,

1:21:13 that's where you can use this to route your agents to those specific

1:21:18 routes or those specific things where it

1:21:21 can execute those things with such determinism,

1:21:24 but still have that agent logic baked in.

1:21:26 JASON DAVENPORT: Yeah.

1:21:27 As I think about agent building, one of the things that I always think about

1:21:32 is you want the non-determinism when you really need it.

1:21:37 But I want determinism for all the other components.

1:21:40 With the graph based approach, I can see how that's really coming to life.

1:21:45 There are things that non-determinism is really desirable, and to your point,

1:21:50 if I have to route to five different steps, just route— just do each step,

1:21:56 and bringing that graph approach to it is super cool.

1:21:59 All right.

1:22:00 So all these things, what's the one maybe thing that we've launched as a part

1:22:06 of these that you're excited about that we haven't talked about?

1:22:10 SHUBHAM SABOO: I'd say long running AI agents.

1:22:12 JASON DAVENPORT: Long running?

1:22:13 SHUBHAM SABOO: Yeah.

1:22:14 Everyone is super excited about OpenClaw, Hermes.

1:22:17 That's all the hype.

1:22:19 But with agent runtime at Cloud Next, we have launched long running AI agents.

1:22:25 Now agents can maintain state up to several days.

1:22:28 JASON DAVENPORT: That's super cool.

1:22:30 SHUBHAM SABOO: You can just start running an agent today and will

1:22:32 still remember what you did two days ago or three days ago.

1:22:35 Now agents can maintain state up to seven days.

1:22:38 I'm really excited about— JASON DAVENPORT: Seven days, right?

1:22:41 SHUBHAM SABOO: Yeah.

1:22:41 Seven days.

1:22:42 So really excited about long running AI agents.

1:22:45 And to enable those long running AI agents in production,

1:22:48 we have also launched ambient agents and resume agents in ADK,

1:22:52 Agent Development Kit.

1:22:54 What ambient agents is.

1:22:57 So your agents is as smarter as the prompts that you would give them.

1:23:01 Normally, the way you would execute, you would just give some instructions.

1:23:03 Based on that, agent will trigger off.

1:23:05 But how about if your agents can trigger off based on events,

1:23:09 based on sending a text, based on schedules, based on prompts?

1:23:12 That's what ambient agent enables.

1:23:15 And when you have these long running agents, you would get into this problem.

1:23:19 There could be a network drop or there could be any kind of interruption.

1:23:24 Before, what used to happen is whenever— JASON DAVENPORT:

1:23:26 We called those coffee breaks when people were doing the work, by the way.

1:23:29 SHUBHAM SABOO: Yeah.

1:23:30 So whenever that used to happen before, your agent would have to restart.

1:23:35 So you would not like to take those coffee breaks.

1:23:38 But now you can easily take those coffee breaks.

1:23:43 Now you can easily take those coffee breaks by just

1:23:46 turning one parameter as true when you're defining your agents.

1:23:50 Your agents can resume from wherever they paused before,

1:23:54 and it also helps with all the human in the loop workflows.

1:23:58 We talked about the insurance claims processing agent.

1:24:01 For example, let's take an example of that.

1:24:03 When you build that, at times, you would want a human to approve.

1:24:07 There will be times you would not want to completely,

1:24:09 100% automate that process.

1:24:12 But that human in the loop requires some kind of pausing.

1:24:16 So you might not have time to approve it right then and right there.

1:24:19 You might approve it six hours later, eight hours later,

1:24:23 maybe two days later, depending on how long your weekend is.

1:24:25 JASON DAVENPORT: Well,

1:24:27 so what I love about that, if you think of stopping and resuming agents.

1:24:33 So the first thing we have is obviously context.

1:24:37 Hey, what does the agent have in the current context window?

1:24:40 So you have to manage that.

1:24:42 If you think of agents that have access to tools or sandboxes,

1:24:47 you have to maintain those.

1:24:48 So pausing and resuming is actually not a very easy problem to solve.

1:24:56 SHUBHAM SABOO: Yeah.

1:24:57 JASON DAVENPORT: Gotta make it easy.

1:24:58 SHUBHAM SABOO: Yeah.

1:24:59 So none of these things, even long running agents, they sound great.

1:25:02 But none of these were trivial problems to solve.

1:25:05 But the great thing about this is we have made it

1:25:08 super simple for people to build these things with Agent Development Kit.

1:25:12 And cherry on the top, now you can do all of those through Agent CLI.

1:25:17 And Agent CLI, again, is not just for building one agent.

1:25:21 So we talked about building agent, deploying, adding evals, observability.

1:25:26 So it has all the context around Agent

1:25:28 Platform but you can extend the functionalities of agents.

1:25:31 So you can add tools.

1:25:32 Again, similarly, if you don't know what tools exist, just ask Agent CLI.

1:25:37 It would know.

1:25:38 All you need to do is talk to your agents.

1:25:41 So we are in the phase where we are building agents with agents,

1:25:45 and all we are doing is talking to agents,

1:25:48 and it can do or it can complete the entire Agent Development lifecycle for you.

1:25:54 Just like having you, Jason, with me all the time.

1:25:58 It's like you're an expert and having you, OK?

1:26:01 But Agent CLI can now do that for me.

1:26:02 JASON DAVENPORT: I think the agent is probably more

1:26:04 of an expert than I am in most things.

1:26:06 SHUBHAM SABOO: I doubt that.

1:26:08 JASON DAVENPORT: So, yeah, one of the coolest examples,

1:26:11 talking about agents, that I saw in the keynote,

1:26:15 Sundar was talking about how we're using Agent

1:26:17 Teams for doing migrations here at Google Cloud.

1:26:20 We talk about planner agents, an orchestrator agent,

1:26:24 and then the worker agent or the executor agent.

1:26:28 What I like about that— if you take a step back,

1:26:32 that's kind of a good blueprint.

1:26:33 Whether it's coding or another team based approach,

1:26:36 you usually have those three roles.

1:26:38 What other types of teams are you seeing folks build with agents

1:26:41 and other kind of interesting or cool moments you're seeing so far?

1:26:45 SHUBHAM SABOO: Yeah.

1:26:46 Yeah.

1:26:46 So people are doing a lot of cool

1:26:48 stuff with multi-agent patterns or multi-agent teams.

1:26:52 And I would say a great place to get started for people is Agent Garden.

1:26:57 We haven't talked about Agent Garden yet.

1:26:59 JASON DAVENPORT: We have not talked about that yet.

1:27:01 SHUBHAM SABOO: Yeah.

1:27:01 So Agent Garden is a library of pre-built templates where you will find

1:27:05 all those cool patterns that you're talking to me about— you're asking me about,

1:27:10 where our team of experts have put together those pre-built templates.

1:27:14 All you have to do— and the GitHub source code is also out there.

1:27:17 All you have to do is just pull that in your coding agent,

1:27:22 give your specific requirements.

1:27:23 I want to— however you would want to customize

1:27:26 it for and can customize it for you.

1:27:28 You can also deploy it as is.

1:27:30 So Agent Garden has all those kind of patterns, use cases, multi-agent teams.

1:27:35 Coming back, in ADK, we support a number of multi-agent patterns or multi-agent

1:27:41 workflows that you can do— three simple ones,

1:27:44 a sequential loop agent and parallel agent.

1:27:47 So sequential, like in a pipeline you can execute

1:27:50 loop where you have an agent looping through different steps.

1:27:54 And parallel agents where you can have

1:27:56 multiple subagents doing parallel tasks because you always

1:28:00 want— would not want a single agent

1:28:02 to do sequentially things that could be done parallelly.

1:28:05 But there are a number of patterns depending on who you would ask.

1:28:10 But these are the fundamental patterns which build

1:28:13 up on top of a lot of patterns.

1:28:15 There's human in the loop pattern,

1:28:17 coordinator, dispatcher pattern, iterative refinement.

1:28:19 I can name names, but if you know

1:28:23 these fundamental patterns and think from first principles,

1:28:26 I would suggest work with your agent,

1:28:28 work your way up with your agent and ask them what

1:28:31 are different patterns that we can build depending on our use case.

1:28:34 What becomes really important is for you

1:28:36 to understand what your problem statement is.

1:28:38 Put it in simple English.

1:28:40 Install agent CLI.

1:28:41 Use any coding agent that you would like.

1:28:44 And yeah, just start talking.

1:28:45 That's all you need.

1:28:46 JASON DAVENPORT: Yeah.

1:28:47 One of the things I love— well, one,

1:28:49 I love Ralph Wiggum as a character, obviously, but the Wiggum loops.

1:28:54 If you think of learning agentic behavior,

1:28:58 Wiggum loops are one of the first times you look and you're like, OK,

1:29:02 well, I know that character from TV, which is highly entertaining to me,

1:29:07 but just the pattern of thought to doing that.

1:29:10 And then you move from that, you go into the two agent team.

1:29:15 OK, so can I have one agent do something, work with another agent to do it?

1:29:18 And then you start to get to even more so,

1:29:21 like Gas Town's architecture on agent teams and all that.

1:29:25 And those are super cool, right?

1:29:27 But if you start off trying to build that end state,

1:29:32 there's so much learning that we still have

1:29:34 to do as developers just to, I think,

1:29:36 be able to really use that system effectively.

1:29:38 And it's super cool to hear how we have these things in Agent

1:29:41 Garden that really make that easy for folks to go use in it.

1:29:44 SHUBHAM SABOO: Yeah.

1:29:45 Yeah.

1:29:46 These are templates.

1:29:47 So we talked about Agent CLI, which is your entry point.

1:29:51 We talked about ADK, the new developments in ADK.

1:29:54 It's all about the flexibility.

1:29:56 Now it is giving developers all

1:29:57 the flexibility around building graph based workflows.

1:30:00 If your use case involves something which is

1:30:03 deterministic and you would want to have specific routes,

1:30:05 we talked about the ambient agents, pause and resume agents in ADK.

1:30:11 It's all about flexibility, like giving you as a developer and your agents

1:30:15 more flexibility to go explore and build those production systems.

1:30:19 In the end, we talked about long running AI agents, which is,

1:30:23 OK, now you have all these pillars, you have all these building blocks.

1:30:27 What would you do with that?

1:30:28 You would want your agents to just keep running and keep asking questions.

1:30:32 Start with all the context baked in, not start fresh every time.

1:30:35 JASON DAVENPORT: Well, I mean, for me, I'm like,

1:30:37 I would build an agent that can analyze YouTube videos about golf analytics

1:30:42 and then tell me just constantly how I can improve my golf swing.

1:30:47 I should probably do that after the show,

1:30:49 just to actually make sure that you can do this.

1:30:51 SHUBHAM SABOO: You can add it to Agent Garden as well.

1:30:53 JASON DAVENPORT: I should put on Agent Garden.

1:30:55 That would be perfect actually.

1:30:57 This is a good challenge.

1:30:58 I might do this in the next month.

1:30:59 SHUBHAM SABOO: I'll be looking forward.

1:31:02 JASON DAVENPORT: It's multimodal.

1:31:03 It's all that fun stuff, too.

1:31:04 All right.

1:31:05 Let's do one last question here.

1:31:07 So obviously you're here at Cloud Next all week.

1:31:10 What's on your agenda?

1:31:11 What are you most excited about here as we look

1:31:14 into the end of day one and into day two and three?

1:31:17 SHUBHAM SABOO: Yeah.

1:31:18 I'm really excited about all the capabilities,

1:31:21 specifically in the govern and optimize section pillar

1:31:24 that we will be launching for Agent Platform.

1:31:27 Really excited about this narrative of building agents with agents

1:31:31 and the flexibility that you could have now with building production agents,

1:31:35 with things that we are introducing in Agent Development Kit.

1:31:38 So if you combine all of those, you

1:31:39 would have a system that is really intelligent,

1:31:42 that can run for days and days, not hours and not a few minutes.

1:31:47 So we are moving from demo to something that is production—

1:31:50 that can run in production reliably and can really solve real problems.

1:31:55 JASON DAVENPORT: And that's really— I've done some of this stuff manually

1:31:59 over the past probably 12 months and having the product to do,

1:32:03 honestly, it's a game changer because it's not— once you've done it,

1:32:08 then it's repetitive, but getting the system for it is very hard to get right

1:32:12 and cool to see what we're doing with Google as a part of that.

1:32:15 All right.

1:32:16 Thank you, Shubham, for joining us here on the live stream.

1:32:18 Thank you, everyone listening in.

1:32:20 We're going to cut over here,

1:32:21 and we'll keep on the fun stuff here in a few minutes.

1:32:24 SHUBHAM SABOO: Thank you for having me.

1:32:25 JASON DAVENPORT: Thank you.

1:32:32 [MUSIC PLAYING] JASON DAVENPORT: Hey.

1:47:28 What's up, friends and family?

1:47:30 Thank you to all of you for being patient here on the live stream.

1:47:33 We had a little bit of a technical issue here with one of our microphones,

1:47:37 and a big thank you to the crew for getting us back on track so quickly.

1:47:41 So thank you so much for that.

1:47:42 All right.

1:47:43 I am gathered here this afternoon with two of my favorite people.

1:47:47 First one, Jay Rodge with NVIDIA and Philip Kiely with Baseten.

1:47:53 So great to have you guys here on the show to talk about cool things with Cloud.

1:47:57 JAY RODGE: Yeah.

1:47:58 Really excited.

1:47:58 PHILIP KIELY: Thanks for having us.

1:47:59 JASON DAVENPORT: All right.

1:48:01 One of the themes that we've been hearing starting

1:48:03 off with Thomas's keynote today is really about inference.

1:48:07 And so, there's so much going on that— Jay, maybe I'll start with you.

1:48:11 What are some cool things that maybe NVIDIA and Google

1:48:14 Cloud are doing to make inference better for users?

1:48:18 Hint, hint.

1:48:20 JAY RODGE: Yeah, sure.

1:48:21 So I'd like to share two most important

1:48:24 announcements from today of Google Cloud partnering with NVIDIA.

1:48:29 So the first one is Google Cloud will be

1:48:32 one of the Cloud providers to have Vera Rubin,

1:48:36 which is the next generation NVIDIA hardware for inference and training.

1:48:41 So this will be coming later half of this year.

1:48:43 And Google Cloud are also adding Blackwell GPUs, which is RTX Pro 6000.

1:48:51 So the key benefit of that GPU is it has 96 gigabytes of VRAM,

1:48:56 which helps you deploy or add multiple models on a single GPU, which is insane.

1:49:02 JASON DAVENPORT: Can I have one of those for my basement computer?

1:49:05 JAY RODGE: Sure, sure.

1:49:06 We should talk later.

1:49:07 JASON DAVENPORT: Just, if I pay you for it.

1:49:09 If I pay Jensen for it.

1:49:11 Philip, so inference.

1:49:15 What does that mean to you?

1:49:17 PHILIP KIELY: To me, what inference means is being able to actually

1:49:21 deliver on the promise of AI applications.

1:49:23 It's one thing to train the world's best model or fine

1:49:27 tune a model that is amazing at a specific task.

1:49:29 But if you want to build a low latency, high reliability, user experience,

1:49:34 and scale it in the hypergrowth fashion that all

1:49:38 of these AI platforms are going through these days,

1:49:41 then you need really fast and really reliable inference to make that happen.

1:49:46 JASON DAVENPORT: I think it's cool just how we're seated.

1:49:49 We have infrastructure.

1:49:50 We have models or GPUs and we have the application on top of it.

1:49:55 It's super cool to see all this.

1:49:56 PHILIP KIELY: Full stack seating chart.

1:49:58 JAY RODGE: Yeah.

1:49:59 Full stack.

1:50:00 JASON DAVENPORT: We should trademark that.

1:50:01 That's a great comment there.

1:50:03 All right.

1:50:04 So let's bring this back.

1:50:05 So maybe starting with you, Philip.

1:50:07 You obviously are with Baseten.

1:50:10 How are you using the stack, if you walk back to this way?

1:50:14 How are we doing inference at scale for so many cool things?

1:50:17 Because you're running billions of inferences,

1:50:21 and I don't remember the specific time frame,

1:50:23 but it is a lot on customized things.

1:50:26 PHILIP KIELY: Yeah.

1:50:27 We like to be a little vague about it, but it's a lot of inference.

1:50:31 We are very closely partnered with NVIDIA on a bunch of things,

1:50:35 both on the hardware and the software side.

1:50:38 We were one of the first users of NVIDIA Dynamo for inference.

1:50:43 I actually remember, the week after that GTC keynote last year,

1:50:46 it was like we called them up.

1:50:48 We're like, hey, we're using it in production and we've got a couple questions.

1:50:51 And they were like, you're doing what in production?

1:50:55 And then obviously big adopters of Blackwell, for instance.

1:51:00 There's a lot of work that goes into migrating

1:51:02 an input system from Harpo to Blackwell on the software layer,

1:51:06 on the kernel layer.

1:51:07 We've been doing a lot of that, a lot of stuff with NVFP4.

1:51:10 I was just doing a talk at AI Engineer in Miami a couple of days ago,

1:51:14 where I was talking all about the different

1:51:16 quantization frameworks that we're using with NVIDIA.

1:51:19 Obviously, on the hardware side, any GPU we can get our hands on with stuffing

1:51:23 models on there and making them run fast.

1:51:25 And that get your hands on part comes through Google Cloud.

1:51:28 So Google Cloud is one of our cloud providers.

1:51:31 We sit on top of Google Cloud GPUs, use GKE to build our systems.

1:51:37 And one really cool thing that we do with Google

1:51:40 Cloud is that we are a sort of multi-region deployment.

1:51:43 JASON DAVENPORT: That's cool.

1:51:44 PHILIP KIELY: Yeah.

1:51:45 We can take GPUs in America, in Europe, around the world,

1:51:50 collect them into a single sort of unified pool of compute,

1:51:54 route user requests to minimize latency, and yeah,

1:51:58 just generally make sure that you're able

1:51:59 to access all of the compute in the world,

1:52:02 not just what's in whatever one region or one system that you're working in.

1:52:05 JASON DAVENPORT: That's super cool.

1:52:07 So one of the things as you think about running these things,

1:52:10 I know you also do a lot with Gemma,

1:52:12 which is one of our— essentially our premier open source model at this point,

1:52:18 which is super cool.

1:52:20 How are you training your Gemma with all the stack

1:52:23 here and what are you doing with it that's super cool?

1:52:26 Because you just mentioned a lot of things that we could dig into.

1:52:29 PHILIP KIELY: Yeah.

1:52:30 So we were day zero support partners for the Gemma 4 launch.

1:52:35 And I remember back when the first Gemma

1:52:38 model came out and then Gemma 2, Gemma 3.

1:52:41 What I've always appreciated about this family of models is a few things.

1:52:44 Number one, the multi-modality.

1:52:46 Having a model that does native image inputs is huge for a ton of use cases,

1:52:51 especially a lot of enterprise stuff,

1:52:53 like KYC and document extraction, all those kind of things.

1:52:57 And then the other thing that I really admire

1:53:00 about the Gemma family is the range of sizes.

1:53:02 We've increasingly been seeing companies fine

1:53:05 tune open models to build task-specific intelligence,

1:53:09 rather than relying on an out of the box model.

1:53:12 And the Gemma models, all the way from 2 to 30 billion parameters,

1:53:16 provide a really wide range of sizes

1:53:19 that sort of other high end models— a gpt-oss-120b,

1:53:23 for example, is a fantastic model, but it's too big for a lot of use cases.

1:53:26 And that's where having that range of smaller models is really,

1:53:30 really essential as basis for fine tuning.

1:53:32 STEPHANIE WONG: Yeah.

1:53:35 Played a little bit with— what's the Gemma 4?

1:53:37 It's the MOE model.

1:53:39 It's the 4 billion MOE.

1:53:40 It's crazy, the activation on that in terms of just what it can pull off.

1:53:45 Jay, you mentioned all these cool things that we're launching together.

1:53:49 How should users think about, different methods for changing model sizes based

1:53:55 on the GPUs that they're looking to get?

1:53:57 And then that scale factor on inference,

1:54:01 how to make it the fastest, most performant thing for them.

1:54:04 What do you usually coach your developers on as they think about these things?

1:54:08 JAY RODGE: Yeah, that's a good question.

1:54:10 So if you want to optimize inference,

1:54:13 we work closely with all the model providers,

1:54:15 including Google DeepMind for Gemma.

1:54:18 So what I usually recommend is them trying out TensorRT LLM,

1:54:22 which is an open source LLM optimization inference SDK.

1:54:28 So with just a couple of lines of code,

1:54:30 you can get the best performance available on any kind of NVIDIA hardware.

1:54:35 So we had day zero support for Gemma 4,

1:54:37 and we work closely with Baseten in order

1:54:40 to scale that— JASON DAVENPORT: That's super cool.

1:54:42 JAY RODGE: Yeah, with NVIDIA Dynamo which is also open source.

1:54:45 So those are the two things that I recommend.

1:54:48 There's also NVFP4 which is a precision format by NVIDIA.

1:54:52 So if you have that model checkpoint in NVFP4,

1:54:56 it will give you the best performance in addition to TensorRT LLM optimization.

1:55:01 And if you have a Blackwell GPU.

1:55:02 JASON DAVENPORT: That's super cool.

1:55:04 So Philip, I think you have a demo for us.

1:55:06 PHILIP KIELY: I do.

1:55:07 We're going to let some models.

1:55:08 JASON DAVENPORT: Let's do this, because we're talking full stack podcasts here.

1:55:12 Let's see a full stack in action.

1:55:14 PHILIP KIELY: All right.

1:55:15 Let's pull it up here.

1:55:17 All right.

1:55:18 JASON DAVENPORT: Jay and I are your demo quarterbacks for this.

1:55:21 PHILIP KIELY: All right.

1:55:21 Sounds good.

1:55:22 JASON DAVENPORT: I've got my chair ready.

1:55:23 PHILIP KIELY: I'm the quarterback.

1:55:24 I'm going to throw you some passes.

1:55:25 Let's score some touchdowns here.

1:55:27 So this is Baseten.

1:55:29 Baseten is a inference platform.

1:55:31 We focus on high reliability, low latency inference at scale.

1:55:36 And I have a little demo platform here.

1:55:39 Some of this stuff is real.

1:55:41 Some of it's simulated behind the scenes just because

1:55:44 I didn't bring a 1,000 B200's with me today.

1:55:47 But I'm going to show you how it works that when you have them— JASON DAVENPORT:

1:55:50 You don't have a thousand just in your back pocket?

1:55:53 PHILIP KIELY: I mean, we got them.

1:55:55 They just didn't give them to me.

1:55:56 Something like a customer needed it for a mission critical workload.

1:56:01 I don't know.

1:56:01 JASON DAVENPORT: Those customers, man.

1:56:02 PHILIP KIELY: Yeah.

1:56:03 So we're going to— JASON DAVENPORT: We love our customers out there.

1:56:05 I love the customers, too.

1:56:07 We're just making a joke.

1:56:08 JAY RODGE: We all love customers.

1:56:09 PHILIP KIELY: So this is a actual Baseten account that I

1:56:13 have here with a bunch of models deployed in it.

1:56:16 One of them is the new Gemma 4 model, and I actually have the smallest Gemma 4

1:56:21 model just to show how capable these models are.

1:56:24 So Gemma 4.

1:56:26 This is the E2B model.

1:56:28 It's pretty tiny, but it has a bunch of great capabilities,

1:56:31 including vision input.

1:56:33 And to take any open model from Hugging

1:56:36 Face— we actually have this one prepackaged, so we have it ready to go.

1:56:41 You can just do one click deploy.

1:56:44 And what that looks like,

1:56:45 here you just put in your Hugging Face access token, get a GPU.

1:56:50 I've got it on an L4 just to show,

1:56:52 like, you can scale these models down quite a bit.

1:56:55 And what that looks like when it's

1:56:57 deployed is you have your production deployment.

1:57:01 You can see your logs.

1:57:03 You can see everything that happened to spin up this model,

1:57:09 everything in terms of pulling the weights, building that optimized engine.

1:57:13 And then you can also just try running the model.

1:57:16 So I'm going to give it here this little dog picture and ask what it is.

1:57:21 JAY RODGE: It's cute.

1:57:22 PHILIP KIELY: Sure enough,

1:57:23 a black Labrador puppy with expressive look on its face.

1:57:26 I don't know.

1:57:27 I'd say that's a pretty expressive look on its face.

1:57:30 JASON DAVENPORT: I think that dog's hungry.

1:57:31 PHILIP KIELY: Yeah.

1:57:31 I mean, probably.

1:57:32 I'm hungry.

1:57:33 It's 12:30.

1:57:35 But yeah.

1:57:36 So you can basically run this model and get a OpenAI compatible API endpoint.

1:57:43 Now that's all pretty straightforward,

1:57:45 just being able to put a model on an L4 1,

1:57:48 1 prompt, that's not a groundbreaking revelation at all.

1:57:55 The point of this platform is scale.

1:57:58 So if you look at, I've got this gpt-oss-120b demo up.

1:58:02 The point is the autoscaling, being able to scale up to hundreds or thousands

1:58:07 of replicas of a model to handle large amounts of traffic.

1:58:10 When you have that, we add in the metrics page where you're able to see,

1:58:15 as your inference volume goes up and down,

1:58:17 your replica count goes up and down with it.

1:58:20 And that keeps your response time steady,

1:58:23 even as you double traffic and scale it back down over the course of an hour.

1:58:28 JAY RODGE: So as a developer, I don't have to worry about scaling.

1:58:30 PHILIP KIELY: Exactly, exactly.

1:58:32 With these dedicated deployments, you throw as much traffic as you have at it

1:58:36 and it scales up and down within the parameters of the system.

1:58:40 You can set things like your window,

1:58:42 your scale down delay, concurrency target, all that kind of stuff.

1:58:46 And it just does it for you.

1:58:48 JASON DAVENPORT: Yeah.

1:58:49 I love the notion of an application developer

1:58:53 moving up the stack in the metrics side.

1:58:56 Hey, does it work is great.

1:58:59 But can I actually maintain some level of SLA, with the application, the LLM?

1:59:04 I think that's critical.

1:59:06 PHILIP KIELY: And that's what we do.

1:59:09 We're in the business of meeting SLAs, both on latency and reliability.

1:59:14 And then the last thing I just wanted to show

1:59:16 really quick because I'm obviously here to talk about Gemma,

1:59:19 but I'm also a huge fan of the Nemotron models from NVIDIA.

1:59:23 So maybe you don't need 20 B200's for a given workload.

1:59:27 Maybe you just need a handful of tokens.

1:59:30 That's why you can hop over to a model APIs,

1:59:32 which are traditional like pay per token APIs.

1:59:35 Nemotron Super is a very efficient model,

1:59:37 so we're able to offload at a very, very affordable cost.

1:59:41 And you can, again, just get an API endpoint

1:59:45 that you copy into your application and run these models

1:59:48 on a per token basis with all of these inference

1:59:51 optimizations we've been talking about already built in.

1:59:55 So with Baseten, you can, of course, go in and configure your own things.

2:00:00 You can use TensorRT LLM or any

2:00:02 other inference engine within the Baseten inference stack.

2:00:06 You can configure your model and then deploy it

2:00:10 over to the thing we were just looking at.

2:00:13 JAY RODGE: How do you suggest GPUs?

2:00:15 Like, which GPUs should I use as a developer?

2:00:17 PHILIP KIELY: That's a great question.

2:00:18 So I showed these demos on a few different pieces of hardware.

2:00:24 I think that for me, the big thing is just matching it to the workload

2:00:28 and then making it as cheap as possible.

2:00:31 And the funny thing is that when you do large scale workloads,

2:00:34 as cheap as possible does not actually necessarily mean the cheapest GPU,

2:00:37 because if you have a GPU that costs twice

2:00:40 as much but it handles three times as much volume,

2:00:43 you don't need as many GPUs to run your workload,

2:00:46 and thus you actually lower your TCO at scale.

2:00:48 JASON DAVENPORT: Yeah, sharding— like model sharding and all that.

2:00:51 I mean, there's a whole other discipline of LLM engineering that's off that.

2:00:56 So one of the things I think you mentioned we

2:00:58 actually haven't talked about yet today is Google Kubernetes Engine.

2:01:01 PHILIP KIELY: Yes.

2:01:02 JASON DAVENPORT: Essentially our Managed Runtime for Kubernetes.

2:01:04 How are you using it and what cool things are

2:01:07 you doing or features are you specifically using within it?

2:01:11 PHILIP KIELY: Yeah, well,

2:01:12 I'm much more of the inside of the GPU guy versus between the GPUs guy.

2:01:16 But I know that one thing that we've really appreciated

2:01:19 about the Google Cloud infrastructure is the low latency between models.

2:01:24 A lot of times what our customers are building is

2:01:27 not just one prompt in response to a single model.

2:01:30 It's an agent.

2:01:31 It's a multi-model compound AI system.

2:01:34 And for that, having at the infrastructure level,

2:01:37 very short turns in between one model to the next to the next.

2:01:41 Solving the hairpinning problem— I know that GKE did a great job

2:01:44 of that— saves us a couple dozen milliseconds every turn between models.

2:01:49 And if you add that up over an agent

2:01:51 where you might talk between a model dozens of times,

2:01:55 that can actually lead to real latency savings.

2:01:58 So that's a great feature of GKE.

2:02:01 The other thing is just the flexibility and scale.

2:02:03 We are running massive, massive workloads for our customers.

2:02:07 This one that I had up with 22

2:02:09 B200's is actually a relatively small demo workload.

2:02:14 And being able to access that flexible

2:02:17 capacity via GKE is really massive for us.

2:02:20 JASON DAVENPORT: Yeah.

2:02:21 What I love— I have done admittedly less with Kubernetes over the years,

2:02:26 because I started off in Cloud Run.

2:02:28 And once you're there for doing a lot of DevRel work, I mean,

2:02:31 it's hard to go off of that just because it's

2:02:34 click a button and then it goes away and it runs.

2:02:37 But with LLMs, using Kubernetes and even setting it up,

2:02:42 it is so easy to have something like Gemini say, hey, here's your manifest.

2:02:47 Here's the model.

2:02:48 Oh, just go deploy it.

2:02:50 Yeah.

2:02:51 It's like magic.

2:02:52 And it's like, it's so fun to see that.

2:02:55 Philip, you wrote a book.

2:02:56 PHILIP KIELY: I did.

2:02:57 JASON DAVENPORT: I have it right here.

2:02:59 It's "Inference Engineering." I'm super excited to read it.

2:03:02 Maybe tell us, what's it about?

2:03:04 What are you covering?

2:03:05 And, what's one thing in here that if I read it,

2:03:09 I would be like, ah, that's pretty cool?

2:03:12 PHILIP KIELY: Yeah.

2:03:12 So I joined Baseten actually more than four years ago,

2:03:15 and I've been working in this space since

2:03:18 we were doing XGBoost models on T4's and Whisper,

2:03:23 the original one on A10G's and that kind of thing.

2:03:26 So I've seen this space evolve and had the privilege of learning

2:03:30 all of these topics in practice over the last few years.

2:03:35 I realized as I was helping new teammates onboard,

2:03:37 as I was out in the world talking to developers,

2:03:40 that there isn't a single centralized resource

2:03:43 for everything you need to know about inference.

2:03:45 And so I decided to sit down and write it because I like writing,

2:03:48 and I thought it would be cool to have a book that I could hand people.

2:03:53 I think that the main argument within

2:03:55 this book is that inference is not one thing.

2:03:58 It's not just an inference engine like vLLM or TensorRT LLM.

2:04:02 It's not just a autoscaling framework.

2:04:05 It's everything from CUDA to infrastructure from the on GPU

2:04:10 optimization to all the distributed systems problems all together

2:04:15 in a single stack with the most tight latency requirements

2:04:18 and the highest uptime requirements that you could possibly imagine.

2:04:22 That's what makes it so cool to work on.

2:04:24 And I just really wanted this book to share my love of inference

2:04:28 with the world and help people understand the complexity of these systems.

2:04:33 JAY RODGE: Yeah.

2:04:33 I wish I had this book when I

2:04:35 joined NVIDIA's inference product team five years ago.

2:04:37 JASON DAVENPORT: Yeah.

2:04:38 JAY RODGE: Better late than never.

2:04:40 JASON DAVENPORT: Yeah.

2:04:41 Well, I mean— PHILIP KIELY: At least your co-workers have it.

2:04:44 JAY RODGE: Yeah.

2:04:45 JASON DAVENPORT: All right, guys, last question.

2:04:47 So you're here obviously, for a few days.

2:04:49 What are you most excited about doing here after we hop off of the live stream?

2:04:53 PHILIP KIELY: Well, I'm going back to the booth,

2:04:55 and I'm going to give away some books.

2:04:57 That's been pretty fun.

2:04:58 And then I'm also just really excited to— I always, at these conferences,

2:05:03 meet developers from all over the world and learn what they're building.

2:05:06 And it's exciting to see the adoption of open models

2:05:09 in the real world when I'm talking to people here at this conference.

2:05:13 JASON DAVENPORT: That's super cool.

2:05:14 Jay, what about you?

2:05:15 JAY RODGE: Yeah.

2:05:15 Like Phil, I'll be also going to NVIDIA booth.

2:05:19 I have a demo, cool demo built on Gemma 4 and Google's ADK.

2:05:22 So that's one thing.

2:05:23 And the other thing is talking to developers

2:05:25 and understanding what they are excited about,

2:05:28 the agentic workflows and what are the pain points so

2:05:31 that I can close those using tutorials through my DevRel book.

2:05:36 JASON DAVENPORT: That's super cool.

2:05:37 And for those of you listening along,

2:05:39 please do check out our NVIDIA and Google Cloud community on Cloud Developers.

2:05:44 And with that, thank you, Philip.

2:05:46 Thank you, Jay.

2:05:47 This has been super fun to have you

2:05:49 on here and talk about all things inference for it.

2:05:52 Thank you, everyone.

2:05:53 JAY RODGE: Yeah.

2:05:53 Thank you.

2:05:54 PHILIP KIELY: Thank you.

2:05:54 JASON DAVENPORT: All right.

2:05:54 Talk to you guys soon.

2:05:55 PHILIP KIELY: Good stuff.

2:06:03 [MUSIC PLAYING] STEPHANIE WONG: Hey, everyone.

2:29:48 Welcome back and hello to everyone from day one of Google Cloud Next.

2:29:53 I'm Stephanie Wong, and I'm super excited

2:29:55 because right now I have Yasmeen Ahmad,

2:29:57 who is the Managing Director of Data Cloud here at Google Cloud.

2:30:02 Thank you so much for joining us.

2:30:03 YASMEEN AHMAD: I'm excited to be here, Stephanie.

2:30:05 STEPHANIE WONG: Amazing.

2:30:06 OK.

2:30:06 All things Data Cloud.

2:30:07 There's been a lot coming out now, and so I want to talk about what's changing.

2:30:11 So we keep hearing that the system of intelligence,

2:30:14 really, is changing and evolving into a system of action.

2:30:17 And so can you explain how the agentic Data Cloud is fundamentally

2:30:22 changing the way that our customers think about their own data strategies.

2:30:27 YASMEEN AHMAD: 100%.

2:30:28 We are seeing a rapid shift.

2:30:30 So if I think about the last decade of building data platforms,

2:30:33 we built them as systems of intelligence,

2:30:36 whether it was a dashboard or report giving you an insight on a KPI

2:30:41 or even a sophisticated data science model would give you a predictive score.

2:30:46 But actually, in the real world,

2:30:48 what you found is a lot of those data insights got left on the shelf.

2:30:53 Maybe 10%, 20% of them were made into production,

2:30:57 so they would get into action in the business.

2:30:59 But getting that action step, that productionalization was always very hard.

2:31:04 What we see with generative AI,

2:31:06 and in particular now these agentic systems is driving action is much easier.

2:31:12 So we're fundamentally building up this agentic Data Cloud

2:31:16 that supports getting from not just intelligent gen AI,

2:31:20 but through to true action.

2:31:22 So as we see customers adopting a system of action,

2:31:25 they are looking not just to be storing data.

2:31:28 They want data to be active in the reasoning loop, live, real time,

2:31:32 and then through MCP tools, through skills,

2:31:36 actually driving action into ledgers,

2:31:38 into operational systems, into marketing systems.

2:31:41 That's where true ROI comes to life.

2:31:43 STEPHANIE WONG: Yeah, agreed.

2:31:44 And it's such an exciting time right now.

2:31:46 I just gave a talk and we were just, again, talking about the same thing.

2:31:50 AI systems now can actually take action on behalf,

2:31:52 but it's still fundamental to have a strong data strategy.

2:31:56 Machine readable, structured, real time data that your AI agents can act on.

2:32:01 So now it seems like we're at this pivotal point where they can.

2:32:04 YASMEEN AHMAD: Yes.

2:32:05 And that data strategy is absolutely critical.

2:32:09 So if I just reflect on our journey over the last two, three years at Google,

2:32:14 building data platforms and data agents, we've learned a lot.

2:32:18 In fact, I would say this whole industry's strategy

2:32:22 to AI ready data was quite naive three years ago.

2:32:25 It was all focused on, do we have clean data?

2:32:28 Do we have lineage over our data?

2:32:30 Is there good data quality?

2:32:32 But actually, just focusing on the data layer

2:32:35 only got you to 50% accuracy with agents.

2:32:38 The rest of the 50% comes from great context.

2:32:42 So when I think about it,

2:32:44 I reflect on the data science teams I used to lead in EMEA.

2:32:48 My best data scientists weren't the ones who

2:32:51 could write the maths or the algorithms the best.

2:32:53 It was the data scientists who would speak to the business users.

2:32:56 They would get into the supply chain and really understand,

2:32:59 how does the business work?

2:33:01 What does this data actually mean?

2:33:03 When we're looking at a PDF file, what's that hidden code that's on page 10?

2:33:08 That context was never built into data platforms.

2:33:12 That context was what I call invisible work

2:33:14 that was outside the data platform in the human mind.

2:33:17 And so today, when we're thinking about data strategy,

2:33:21 the data strategy has to combine, yes, having really good solid data.

2:33:25 That data today has to be structured and unstructured data and its context.

2:33:30 It's the hidden meaning.

2:33:32 It's the business intuition that needs to be coded so an agent

2:33:36 can read that context and infer and reason over data accurately.

2:33:40 STEPHANIE WONG: Would you say that one of the things that's enabling

2:33:43 this transition is the fact that there is a semantic understanding of data now?

2:33:47 That AI agents can actually use contextual understanding.

2:33:51 So you don't have to fill in every gap,

2:33:53 but there is a certain level of inferred understanding of your data.

2:33:56 YASMEEN AHMAD: And that's critical is the inferred understanding.

2:33:59 If I take the traditional world of governance, even if it was column names,

2:34:05 description names, role descriptions, business glossaries,

2:34:10 they were all human coded.

2:34:12 It was a human who was spending

2:34:15 tedious amounts of time filling in descriptions which,

2:34:18 frankly, weren't necessarily that great because,

2:34:21 as humans, we don't like doing those jobs.

2:34:24 Now, if you take the power of gen AI and you give gen AI a sample table,

2:34:28 a sample data set, it will infer a lot

2:34:31 of that descriptive information much more accurately than a human did.

2:34:35 But we're also taking it a step further.

2:34:37 When we're thinking about in the knowledge catalog,

2:34:39 disaggregated data and enrichment,

2:34:42 the enrichment is just column names, table names, descriptions.

2:34:47 If we take unstructured data, PDF files,

2:34:51 unstructured data de facto doesn't have a schema.

2:34:55 And if you give a gen AI one PDF document, it'll reason fairly well.

2:35:00 Two PDFs?

2:35:01 Sure.

2:35:02 But actually, in an enterprise, you have thousands of these documents.

2:35:06 You physically can't fit a thousand documents

2:35:09 into the context window of a model.

2:35:10 But even if you could, it would be exponentially expensive.

2:35:14 What you need to do is create that inferred schema,

2:35:17 that inferred meaning across that unstructured data.

2:35:19 And that's what the knowledge catalog does.

2:35:21 It creates that inferred descriptions,

2:35:23 inferred meaning, inferred schema relationships.

2:35:27 And to your point, an agent can now access that context,

2:35:31 learn how to use that data, understand exactly which data it needs to leverage.

2:35:35 So not only is it higher trust, it's also lower cost and more efficient.

2:35:40 STEPHANIE WONG: Right.

2:35:41 Exactly.

2:35:41 And you just touched on some of the capabilities from Google Cloud.

2:35:44 So with Gemini Enterprise, which we just heard about in the keynote,

2:35:48 it's acting as this new front door for data.

2:35:50 So how are we enabling organizations to turn their existing,

2:35:54 let's say BigQuery and Looker assets,

2:35:56 into active, more helpful assistance for their employees?

2:36:00 YASMEEN AHMAD: Great question.

2:36:01 When I think about Gemini Enterprise as the front door, 100%.

2:36:06 It's that single entry place where a business user actually doesn't have

2:36:09 to think about the complexities of data

2:36:11 pipelines or data platforms under the covers.

2:36:14 And so next year, we're introducing even more

2:36:18 integration across the Data Cloud and Gemini Enterprise,

2:36:22 because a business user shouldn't have to worry

2:36:24 about what the data platform specifics are.

2:36:27 Gemini Enterprise is that front door.

2:36:29 They want to chat with their business data?

2:36:31 Well, yes you can.

2:36:33 We're enabling organizations to now create conversational agents in BigQuery,

2:36:39 in AlloyDB, in Looker, and publish them into Gemini Enterprise.

2:36:43 So for a business user who comes to Gemini Enterprise,

2:36:46 they just chat with their business agent.

2:36:48 They don't worry about which data system it's in.

2:36:51 Another really exciting integration that I think

2:36:55 is awesome is the Deep Research agent integration.

2:36:58 So we've had Deep Research agent for a while.

2:37:01 It does phenomenally well in investigating deeply web data,

2:37:06 document data, and giving you Deep Research answers.

2:37:10 What we've done is we've connected

2:37:12 that Deep Research agent to our knowledge catalog.

2:37:15 The knowledge catalog knows about all the enterprise data.

2:37:18 And now a Deep Research agent can

2:37:22 reason over enterprise data alongside web data, alongside documents.

2:37:27 So now you can get these really deep,

2:37:30 rich answers that are very precise and much more holistic.

2:37:34 So an organization can be looking at web patterns, weather, traffic,

2:37:40 and connecting that with their shipping information that is

2:37:44 in their data platforms and getting real time,

2:37:47 proactive strategies to optimize their shipping strategies.

2:37:51 So the ability now to, through Gemini and the Deep Research agent,

2:37:56 get to that level of insight?

2:37:59 It's in a matter of minutes, seconds and minutes,

2:38:02 which a business user traditionally would have had to go spend weeks

2:38:06 with an IT team who would have stitched together all of this information,

2:38:10 and it definitely wouldn't have been real time.

2:38:12 That's all available now.

2:38:13 STEPHANIE WONG: Yeah.

2:38:14 And I think this is a key unlock, because for the past several years,

2:38:17 we've been talking a lot about the foundation model and needing

2:38:21 to fine tune the foundation model according to your own data sets.

2:38:25 But this is a layer of the stack that I think is really powerful,

2:38:29 the AI agent to do things like Deep Research against your own data and the web.

2:38:33 It's a layer of the stack that you

2:38:35 have more flexibility of, more control over as well,

2:38:38 and you can change on the fly.

2:38:39 So it seems like we're just reaching more capabilities now with AI

2:38:43 agents coming into the play to do things like function calling,

2:38:46 RAG, all these other things.

2:38:48 YASMEEN AHMAD: Absolutely.

2:38:49 And we see with customers today, it's not just one agent, a monolithic agent.

2:38:54 It's actually swarms of agents that activate to complete an intent.

2:38:59 And so as a whole at Google,

2:39:01 we see this shift towards intent driven engineering, even as at Google,

2:39:06 when it was two years ago when we started building our first agents,

2:39:10 they were persona based agents.

2:39:12 It was a data science agent to help the data

2:39:14 scientist or a data engineering agent to help the data engineer.

2:39:17 Well, frankly, the models are just so good now,

2:39:20 you don't have to tell them to be one fixed persona like in the human world.

2:39:25 Human world, humans typically become experts in one domain

2:39:29 because we get very good at that one domain,

2:39:31 and we struggle to access multiple domains at the same time.

2:39:35 Well, these models are amazing.

2:39:37 If you give them the right tools and skills, they can actually do an end to end,

2:39:42 get the data, wrangle the data, find the right model,

2:39:46 build a visualization, even build an application and deploy it.

2:39:49 And so that shift in maturity of the models opens doors.

2:39:54 And so as we think about intent driven engineering,

2:39:57 what we see the future as is the data practitioners can focus

2:40:00 on the objectives and outcomes instead of the tasks that have to be done.

2:40:05 And we provide the agents with the right tools, the skills.

2:40:09 That's why we launched the Data Agent Kit here at Next,

2:40:12 because for us, that Data Agent Kit is the plugins, the extensions,

2:40:15 the tools, the skills so that agent can understand natively Google's Data Cloud,

2:40:21 build and optimize BigQuery pipeline, build and fine tune a Spark pipeline.

2:40:27 These agents can be super powerful against Google's agentic Data Cloud.

2:40:30 STEPHANIE WONG: Yeah.

2:40:31 These tools and skills is the action based

2:40:33 intelligence that we're moving towards that you talked about.

2:40:36 So it's awesome that we're coming out with these pre-built abilities

2:40:38 for the agent to just take action on your existing data sets.

2:40:42 I think the challenge, though,

2:40:43 is that data still can be scattered across many places and environments.

2:40:47 So how does our Cross-Cloud Lakehouse support

2:40:51 teams to do open standards like Apache Iceberg

2:40:55 to ensure that customers aren't leaving any

2:40:57 of their data clouds behind in this new agentic era?

2:41:00 YASMEEN AHMAD: This is a great question because I

2:41:03 feel like every customer I talk to, they are multi-cloud,

2:41:06 whether they chose multi-cloud or multi-cloud because they're running a SaaS

2:41:10 application and AWS and their data gravity is in Google.

2:41:15 And so for us, it's about embracing multi-cloud.

2:41:19 So I think one of the challenges around multi-cloud

2:41:21 has been— there's been many vendors spoken about multi-cloud.

2:41:25 It means they run on multiple clouds,

2:41:27 but you still have to choose a cloud and move wholesale to that cloud.

2:41:31 So we really wanted to turn that around.

2:41:34 For us, we believe customers should be able

2:41:36 to connect the data no matter where it lives.

2:41:39 So Cross-Cloud fundamentally is about reaching across clouds to AWS

2:41:43 and Azure through our Cross-Cloud

2:41:45 Interconnect or intelligent buffer and caching,

2:41:48 so customers can leave that data where it is and just see it universally.

2:41:52 But it's not just other clouds, it's also other data platforms.

2:41:55 So we can reach into Databricks with the Unity Catalog,

2:41:59 Snowflake, Polaris, AWS, S3 Glue.

2:42:01 And a critical piece of that is Iceberg.

2:42:05 So why couldn't we do this last year?

2:42:08 Well, the challenge was every single system

2:42:10 would have its own proprietary format of data.

2:42:13 So anytime you wanted to do any federation across clouds or across data systems,

2:42:17 you were building custom pipelines,

2:42:20 and those custom pipelines had to understand that vendor's

2:42:22 data format and how to ingest data in and out.

2:42:26 Iceberg and open standards blew the door open.

2:42:30 Now we have this open standard, universal standard.

2:42:34 We have the Iceberg REST catalog.

2:42:36 So if your data is in Iceberg, in AWS, S3, in Databricks,

2:42:40 in BigQuery, now you can connect and see all of that data.

2:42:45 So that's why we brought the Cross-Cloud Lakehouse to bear,

2:42:48 because we wanted that one single universal plane where users can see all

2:42:53 of their data and they don't have to worry about where it's sitting.

2:42:57 And the other big unlock I would say here is Cross-Cloud Interconnect.

2:43:01 Historically, the big challenge about moving

2:43:03 data across clouds was latency and egress.

2:43:07 Well, frankly, with Cross-Cloud Interconnect,

2:43:10 you can get subsecond latencies and egress is not a big issue anymore.

2:43:15 And so we have customers that are able to move

2:43:18 a petabyte of data and it's not a big challenge.

2:43:21 So those two things coming together,

2:43:23 the Cross-Cloud Interconnect technology with the Iceberg

2:43:27 open standard allowed us to create Cross-Cloud Lakehouse.

2:43:30 And we're just so excited about what customers will be able to do.

2:43:33 STEPHANIE WONG: Yeah, it is an exciting time.

2:43:35 And you just touched on something that I want to dive into, which is the cost,

2:43:38 the performance, the efficiency, egress.

2:43:41 As organizations move from human scale to now agent scale,

2:43:44 cost and performance are going to continue to become very critical.

2:43:48 So how is our AI optimized infrastructure

2:43:50 here at Google Cloud and our serverless approach?

2:43:53 For example, what are we doing with BigQuery Spark?

2:43:56 How is this all helping customers scale their AI ambitions efficiently?

2:44:02 YASMEEN AHMAD: You're 100% right

2:44:03 that as these agents come online, they are hungry.

2:44:07 And it's not just single agents, it's swarms of agents that we are seeing.

2:44:11 In fact, I was speaking in a session

2:44:13 earlier and I spoke about the stat that we're

2:44:15 seeing in the industry where actually the web

2:44:18 API gateways are seeing massive spikes in incoming traffic,

2:44:23 and that incoming traffic is not because

2:44:24 a human has learned to click the mouse faster.

2:44:27 It's because these agents are waking up, these swarms,

2:44:30 and are doing more calls than a human would.

2:44:34 And so typically, for one click of a human,

2:44:37 you're seeing 10 to 20 API calls from agents,

2:44:41 because an agent will go into a multi-step reasoning loop.

2:44:44 And that multi-step reasoning loop might have multiple

2:44:46 iterations as it's hitting the web for information.

2:44:50 So as we see that scale up happening, for us,

2:44:54 what's critical is you have to address the performance and cost complexity.

2:45:01 Your cost can't go up 10x because now

2:45:03 agents are running 10x more inferencing or queries.

2:45:08 At the individual engine level,

2:45:09 we are super focused on making sure each engine is as efficient as possible.

2:45:14 In fact, here at Next we're talking about how over the last year,

2:45:18 we have made BigQuery 35% more— the query processing

2:45:23 speeds have improved 35% while we have reduced costs 40%.

2:45:29 So amazing, amazing things that our engineering teams are doing there.

2:45:34 In our Apache Spark world,

2:45:36 our managed service for Apache Spark, now with the Lightning engine,

2:45:40 is five times faster than just plain vanilla Apache Spark,

2:45:44 two times better price performance than the market proprietary alternative.

2:45:49 So each engine is getting a boost.

2:45:52 But beyond the engines getting a boost,

2:45:53 I think you mentioned something really critical.

2:45:56 We see as an entire stack, because when an agent comes in and does a request,

2:46:02 that request has to go through multiple levels of the stack,

2:46:05 including the data layer,

2:46:06 including the model layer right down to the infrastructure layer.

2:46:10 And so for us, what's important is we actually optimize all parts of the stack.

2:46:15 So today in BigQuery you will see 230x reduction in token usage when running AI

2:46:22 inferencing over BigQuery data because of how we

2:46:25 are integrating and making the stack super efficient.

2:46:28 In addition, just this morning in the keynote,

2:46:30 we announced our next generation TPU.

2:46:33 And right at the infrastructure layer,

2:46:36 we're also doing things like separating the training and inferencing,

2:46:39 because on a single chip, the silicon,

2:46:42 we can't have a traffic jam where gen AI is trying

2:46:45 to read and write information and train and inference at the same time.

2:46:49 So we're driving innovation at every layer of the stack to ensure,

2:46:53 as that request comes in from an agent,

2:46:56 it moves up and down the stack seamlessly at high speed,

2:46:59 and each layer of the stack actually works with the next layer.

2:47:02 And I think that's the magic of Google.

2:47:04 Only Google is working on infrastructure,

2:47:07 the model innovation, the data innovation all together.

2:47:10 STEPHANIE WONG: Yeah.

2:47:11 Truly.

2:47:11 The vertical integration.

2:47:12 Exactly.

2:47:13 You're going to get optimizations that you can't anywhere else.

2:47:15 So absolutely.

2:47:17 So just going back to just understanding that an agent

2:47:20 is only as good as the data that it's grounded in.

2:47:24 So how does the agentic Data Cloud ensure

2:47:27 that these agents are using the most accurate,

2:47:30 real time context from across a company's entire data estate?

2:47:35 YASMEEN AHMAD: Agents having access to the right data

2:47:38 and the right context at the right time is the critical piece.

2:47:41 In fact, we see that if agents actually have too

2:47:44 much context or too much data, they also get lost.

2:47:47 So one of the key pieces of innovation for us has

2:47:50 been not just building a universal context engine with the knowledge catalog,

2:47:54 but it's actually working on the search and serving layer.

2:47:57 So as agents come in and make requests,

2:48:00 we're serving up the right context and the right data.

2:48:03 And so part of that has been actually us

2:48:04 taking the hybrid search stack that was built for Google

2:48:08 Search and bringing innovations from that Google Search into now

2:48:12 the semantic stack that we're building with the knowledge catalog.

2:48:16 So that search stack can not just search for the right semantic information,

2:48:21 but actually it has complex reranking algorithms

2:48:24 that ensure the right context is being ranked,

2:48:28 prioritized, and served back to agents.

2:48:31 So the search and retrieval is just as important to us as is the context.

2:48:36 And being at Google, we're lucky we can use all of the innovation from across

2:48:41 Google Cloud and bring that together now for an agent serving stack.

2:48:45 STEPHANIE WONG: I was going to say the same thing.

2:48:46 It's another thing that we have.

2:48:48 It's in our blood, and we can bring that innovation right over, right?

2:48:51 YASMEEN AHMAD: Yes.

2:48:51 STEPHANIE WONG: Super exciting.

2:48:53 I guess my last question is,

2:48:54 what are you most excited about in the Data Cloud world in this new agentic era?

2:48:58 I know you just touched on a lot, but just looking ahead.

2:49:02 YASMEEN AHMAD: I'm super excited to see

2:49:04 this system of intelligence move to systems of action.

2:49:08 And in particular here at Next, I've now heard three or four different

2:49:12 use cases from customers just this morning

2:49:14 of how they are engaging swarms of agents that are driving true action.

2:49:19 So working through getting the right data, the semantic context,

2:49:22 but also connecting with the Agent Data Kit to be able to take

2:49:27 action across systems and do things that they just weren't able to do before.

2:49:31 So I'm hearing about customers going from 45 minutes

2:49:34 of what took a human process down to a minute,

2:49:37 and frankly, unlocking ROI that seemed impossible before.

2:49:43 So it's those innovative use cases that I am super excited to see here at Next.

2:49:48 I'm excited that we have 18 Data Cloud sessions here,

2:49:52 all hosting at least one customer talking about what they're doing.

2:49:56 STEPHANIE WONG: Amazing.

2:49:57 Well, there's no better time to be a part of this industry

2:50:00 and just see all the actual ROI and impact that's happening actually today.

2:50:04 So I just want to thank you for taking

2:50:06 the time to come onto our live stream, Yasmeen.

2:50:08 YASMEEN AHMAD: Thank you.

2:50:09 Thank you.

2:50:10 STEPHANIE WONG: See you, everyone.

2:50:19 [MUSIC PLAYING] STEPHANIE WONG: Hey, everyone and welcome back.

3:12:17 Super excited we have Khulan Davaajav,

3:12:19 who is a Product Marketing Manager for Gen Media.

3:12:23 So hey, Khulan.

3:12:23 How are you?

3:12:24 KHULAN DAVAAJAV: Hi.

3:12:24 Good.

3:12:25 How are you doing?

3:12:26 STEPHANIE WONG: Good.

3:12:26 Excited to talk about gen media,

3:12:27 because I feel like we don't talk about it enough.

3:12:30 And you are one of the experts here.

3:12:32 So before we dive into the amazing demo that I know that you have planned,

3:12:36 let's set the stage for everyone about gen media.

3:12:39 We've seen a lot of rapid innovation here in this space.

3:12:43 Can you walk us through the full landscape of the different

3:12:47 generative media models that we currently have available at Google.

3:12:51 KHULAN DAVAAJAV: Yeah, of course.

3:12:52 So one thing I would like to point out is that oftentimes, especially at Google,

3:12:57 we say gen media, generative media,

3:12:59 and our customers and developers are like, what is gen media?

3:13:02 What is generative media?

3:13:03 So I wanted to lay the ground and introduce

3:13:05 everybody to the concept of generative media models.

3:13:08 So if you see on my screen now.

3:13:13 Generative media models is an overarching concept

3:13:15 that I like to call for Nano Banana,

3:13:17 which is our image generation model, image generation, editing.

3:13:20 Then we have Veo, our lovely video generation model.

3:13:24 And Gemini Audio is a family of models for transcription,

3:13:28 for text to speech generation.

3:13:31 And lastly, we have Lyria, which is for music generation.

3:13:34 And some of these models we are literally launching almost

3:13:37 every week or every two weeks we have a new update.

3:13:40 So there's a lot happening in this space.

3:13:42 So I just wanted to get everybody on the same board.

3:13:44 So when we do talk about generative media or gen media in a shortened way,

3:13:48 in all our sessions at Next and everywhere else,

3:13:51 everybody knows that it's for creative AI.

3:13:53 STEPHANIE WONG: Very cool.

3:13:54 And so what have you built here today?

3:13:56 What this aesthetic?

3:13:58 I love it.

3:13:59 The visual style is incredible.

3:14:01 We know that Nano Banana gives creators

3:14:03 a ton of control over their artistic direction.

3:14:06 So what exactly are you going to show us here?

3:14:08 KHULAN DAVAAJAV: OK.

3:14:09 So let's dive into a video demo that I

3:14:12 built today that's using all our generative media models,

3:14:15 from image to video to audio as well as music.

3:14:19 And then we're going to dissect step by step

3:14:21 what actually we created for each of these.

3:14:24 So let's play our demo.

3:14:26 So by the way, this is a story of really

3:14:30 me overeating with snacks when I'm working from home.

3:14:33 So just made a mini story about that.

3:14:36 Personal inspiration.

3:14:37 [VIDEO PLAYBACK]- This is the greatest mystery of working from home.

3:14:42 At 9:00 AM you tell yourself, just one jelly bean.

3:14:46 But as time passes, you get exhausted and snack,

3:14:49 not even realizing how much you've eaten.

3:14:51 Then, 5:00 PM hits and a miracle happens.

3:14:55 You shut down the laptop and suddenly this hidden

3:14:59 store of energy is powering you for the night.

3:15:02 But then you crash.

3:15:04 The inevitable sugar crash.

3:15:06 But you can't beat the commute.

3:15:09 This is the glamour of working from home,

3:15:11 and we'll most likely do it all again tomorrow.

3:15:15 [END PLAYBACK] STEPHANIE WONG: That was adorable.

3:15:19 Real life inspiration, right?

3:15:21 KHULAN DAVAAJAV: Exactly.

3:15:22 I think the biggest danger is when I work

3:15:24 from home because I just eat so many snacks.

3:15:26 But even when I come to the office, I'm still eating a lot of snacks.

3:15:29 STEPHANIE WONG: Yeah, it's kind of hard to avoid.

3:15:30 You really have to have self-control.

3:15:32 KHULAN DAVAAJAV: Yeah.

3:15:32 STEPHANIE WONG: OK, so this is amazing.

3:15:33 I love the artistic direction here.

3:15:35 What exactly did you have to put

3:15:37 into the prompt to get this specific texture and lighting?

3:15:40 KHULAN DAVAAJAV: Yes.

3:15:41 So as you see on my screen,

3:15:44 I essentially created with Nano Banana 2 all the frames.

3:15:47 So I kind of storyboarded actually with Gemini, what story do I want to say,

3:15:51 what kind of image do I want to produce.

3:15:54 And I was really going back and forth with Gemini to create the entire story,

3:15:57 including the visuals.

3:15:58 So you see every single image you see here has been animated.

3:16:02 And as an example, I really wanted to get down to this 3D render style,

3:16:08 very smooth, soft touch.

3:16:10 Everything is kind of rounded geometry.

3:16:13 And more than just the texture of the actual character and the video itself,

3:16:17 I wanted to focus also on the camera, what camera it was shot on.

3:16:21 So 33 millimeter film.

3:16:24 That was a huge thing,

3:16:25 that I wanted to have the glossy highlights, halation and luminance into it.

3:16:30 So that's kind of— I would say the one amazing thing about Nano Banana is

3:16:34 that you can really control it with your artistic

3:16:37 kind of decision you want to make.

3:16:39 And that's something that I know a lot of creatives have been really happy

3:16:42 with, that you can really go down

3:16:44 to the deep level of what camera type was used, what lens type was used.

3:16:48 So that's kind of a showcase of that essentially here.

3:16:51 STEPHANIE WONG: I feel like at least from my perspective,

3:16:53 it's hard to even come up with that specificity

3:16:55 as a creative but not that creative person.

3:16:59 So I feel like there are some tricks to coming up with these prompts.

3:17:01 What do you do?

3:17:02 KHULAN DAVAAJAV: OK, so what do I do?

3:17:04 Essentially, I love just going through different creative work.

3:17:10 So I use oftentimes like Behance or Instagram

3:17:12 or other different places just to get inspiration.

3:17:15 I'm like, what do I actually want to create today?

3:17:17 And I feel like as soon as you see hundreds of images, you kind of feel like,

3:17:22 OK, I think for the story I'm going for, I need it to be comedic.

3:17:26 I wanted to be happy and positive.

3:17:29 So for that, I was like, I need things to be rounded, look like 3D render.

3:17:32 I don't want claymation, really.

3:17:34 So it was really me getting inspiration,

3:17:37 then to use that inspiration to give it to Gemini in the Gemini app.

3:17:41 And I was like, hey, help me please distill the artistic style,

3:17:45 the photography kind of terminology in terms of the camera

3:17:49 lens and what type of lighting we want and all that.

3:17:52 And it really helped me dissect every single image

3:17:54 that I gave it to from a texture perspective,

3:17:56 but also from the camera setting perspective.

3:17:59 STEPHANIE WONG: So you can use AI to help you.

3:18:01 And you should.

3:18:01 KHULAN DAVAAJAV: Yes.

3:18:02 You should.

3:18:02 Yeah.

3:18:03 STEPHANIE WONG: So you have

3:18:04 these beautiful static keyframes that you've created.

3:18:06 You now need to bring them to life.

3:18:08 So you're using our newly launched Veo model as I— Veo Light model for this.

3:18:14 So for developers and some of the marketers watching,

3:18:17 why is this Veo Light the right tool for a campaign like this?

3:18:22 KHULAN DAVAAJAV: So Veo 3.1 Light, just launched it about a couple weeks ago.

3:18:27 And it's amazing because it's our most

3:18:29 cost effective model on the market right now.

3:18:31 And, when we say Light, it's not really sacrificing the quality.

3:18:35 As you saw, the quality is really good here too.

3:18:38 And one of the other things I really like

3:18:39 about it is that it has a fast generation speed.

3:18:43 So I think most of the frames are generated under 60 seconds,

3:18:47 which is pretty impressive for a video model.

3:18:50 And I believe— I don't remember the recent pricing,

3:18:53 but it's one of the most cost effective models out there.

3:18:56 So yeah, that's kind of what I did.

3:18:57 And I can show you how I use the model

3:19:00 now to generate the exact frames that you saw.

3:19:03 So in this demo screen,

3:19:06 so you see that the top images you saw them before in the Nano

3:19:12 Banana section that I gave you all the frames that are generated.

3:19:15 So I put the image as the first frame

3:19:17 and then the second image as the last frame.

3:19:19 And one of the prompts I used was just make

3:19:21 the bottom text appear like a puff or a magic.

3:19:26 And it understood my intent, because honestly, I was just typing.

3:19:29 I was like, let's do something.

3:19:30 Let's figure something out.

3:19:31 And it somehow understood that puff is exactly what you see on the screen.

3:19:35 STEPHANIE WONG: Yes.

3:19:36 KHULAN DAVAAJAV: So it really understands intent really well.

3:19:38 And so this is another example where I had the first

3:19:42 frame of the character turning on the radio and then dancing.

3:19:44 When you see it, the way he finishes dancing is exactly in that position.

3:19:49 And then more than that, you don't have to always do first frame, last frame.

3:19:53 I think first frame,

3:19:54 last frame is amazing if you really want to control the beginning and end

3:19:57 of everything that happens in between

3:19:59 of the video that you're trying to generate.

3:20:01 But if you don't really care about the kind

3:20:04 of first and last from how they begin and end, you can also just do first frame.

3:20:08 So here, as an example, I just did an image,

3:20:11 first frame, and then I said, pan the camera around the character.

3:20:15 This was the result of that.

3:20:17 So you don't always have to do two

3:20:19 images to generate for first frame, last frame video.

3:20:22 STEPHANIE WONG: Yeah.

3:20:22 And it still did a really good job at understanding

3:20:25 that— really inferring what it should look like behind.

3:20:28 KHULAN DAVAAJAV: Yeah.

3:20:29 STEPHANIE WONG: It's really cool.

3:20:30 KHULAN DAVAAJAV: And I would say one thing for developers who are watching.

3:20:34 When you are building a video application, let's say with Veo 3.1 Light,

3:20:37 really try to think about how are creatives using

3:20:40 these models and think about the prompts they're inputting.

3:20:42 So, for example, you'll very often hear creatives

3:20:45 using the prompt Dolly zoom or camera panning.

3:20:49 And then you're like, OK,

3:20:50 instead of having now my creatives that have to prompt that all the time,

3:20:53 why don't I build that as a feature inside my application,

3:20:57 where it becomes part of the prompt when a user clicks on that button of 360,

3:21:01 I don't know, degree of camera movement, things like that.

3:21:05 STEPHANIE WONG: OK.

3:21:06 Now I want to talk a little bit about sound,

3:21:08 because a silent video only gets you so far.

3:21:11 And music here, as I understand it, it's been really timed well.

3:21:16 And so you see this sudden shift of the music

3:21:18 changing when there's the disco party at 5:00 PM.

3:21:21 And I know that we just released this Lyria music model late last month.

3:21:25 So how are you actually directing the music

3:21:28 to change exactly when you need it to?

3:21:30 KHULAN DAVAAJAV: Yeah.

3:21:31 So I'm going to show you here.

3:21:33 Basically, Lyria 3 Pro is amazing at really understanding your prompt

3:21:39 when you add these timestamps that you see on the screen.

3:21:42 So one thing I realized is Lyria 3

3:21:44 Pro understands exactly the musical composition of a song.

3:21:48 It has vocals as well, but for this demo,

3:21:50 I only did the instrumental because I was like,

3:21:52 I don't need it to be distracted by a vocal.

3:21:55 But it's amazing at understanding timestamps.

3:21:59 So for example, in the video, you saw that when he starts dancing because

3:22:03 he's turning on the radio— by the way,

3:22:05 all the sound effects in that video were created using Veo 3.1 Light.

3:22:11 So essentially when he's yawning, when he's turning the radio,

3:22:15 it's all created with the Veo 3.1 Light.

3:22:17 And that's amazing because I don't have to go

3:22:19 and find sound effects now because it's part of the video.

3:22:22 STEPHANIE WONG: Or another model.

3:22:23 KHULAN DAVAAJAV: Exactly.

3:22:23 Or another model.

3:22:24 But amazing thing with Lyria 3 Pro is that you see that when he was dancing,

3:22:29 the music all of a sudden went up because I literally prompted that here.

3:22:34 And as well as when he has a sugar crash and is lying down on the sofa,

3:22:37 you see there is a lullaby box sound playing.

3:22:41 And that exactly happened in 0.25 second.

3:22:44 So that's something that's amazing with Lyria 3 Pro,

3:22:47 that it can control— and let me put it this way.

3:22:50 In the past, I would have to go find

3:22:52 different soundtracks or background music from anywhere like online,

3:22:56 and then stitch them together.

3:22:57 Now I'm like, oh, I can just give Gemini my video and ask,

3:23:02 I would like to generate with Lyria 3 Pro a sound,

3:23:05 like a background music, and I need it to be timed exactly based on the video.

3:23:11 Because Gemini has a great multimodal understanding,

3:23:14 it understood that obviously when he's falling asleep, that means lullaby.

3:23:20 I did not actually write this prompt by myself.

3:23:22 Let me put it this way.

3:23:23 Gemini helped me write it.

3:23:24 I just told it, I wanted to be comedic and fun,

3:23:26 and it then wrote me this prompt, which then I put into Lyria.

3:23:29 STEPHANIE WONG: Yeah.

3:23:30 That's incredible, just the amount of control that you can have now.

3:23:34 But it also is AI assisted, so you don't want to fill in all the gaps.

3:23:37 It's like, well, we have a contextual

3:23:39 understanding of what's going on in the scene.

3:23:41 And we can help you fill in those gaps.

3:23:43 KHULAN DAVAAJAV: Yeah, exactly.

3:23:44 And I definitely must say Gemini— the multimodal capabilities

3:23:47 of Gemini are amazing because it really goes frame by frame,

3:23:50 understanding the video, and then helping me generate prompts for music.

3:23:53 STEPHANIE WONG: Oh my gosh, a really cool space right now that's happening.

3:23:56 Now the voiceover, I want to shift to talk about that, because

3:23:59 that's kind of tying all of this together and has so much personality.

3:24:04 And so it doesn't sound very robotic at all.

3:24:07 I know that we literally just launched

3:24:09 Gemini 3.1 Flash text to speech last week.

3:24:12 So how do you direct an AI voice to actually emote and sound human?

3:24:17 KHULAN DAVAAJAV: OK, this is my favorite launch of all time,

3:24:20 I must say, last week.

3:24:21 And I'm going to show you here.

3:24:22 So Gemini 3.1 Flash Live— oh, sorry.

3:24:25 Flash TTS has around 200 tags that you

3:24:29 can control the expressiveness of the voice.

3:24:32 So here, you see I use this prompt.

3:24:34 But again, by the way, I just gave Gemini the 200 tags that it can use,

3:24:40 these square bracket tags, and it helped me build up this prompt.

3:24:44 So essentially, the way you can control

3:24:47 the expressiveness is using one of those 200 tags

3:24:50 and put them in a square bracket that you

3:24:52 see here in the blue here on the screen.

3:24:55 So the positive or panicked.

3:24:57 So every time the model reads it out,

3:24:59 it will read it out as positive or as panicked.

3:25:02 And so you may have also heard him laugh at the end.

3:25:06 I have that tag there as well.

3:25:08 And the thing is some of the other sound effects that— let's say yawning,

3:25:12 you can also generate it through here and then

3:25:14 just clip it and add it into the video.

3:25:16 But one thing that I really like is that you can change the style of the voice.

3:25:21 So for this example, I want it to be comedic, British accent, casual.

3:25:26 So I was like, hey, let me make it British English, first of all.

3:25:29 Then a comedy narrative style and have the person have a British casual accent,

3:25:34 because before it was giving me a very Queen's English,

3:25:37 and I'm like, this is not the video for that.

3:25:39 STEPHANIE WONG: Not the vibe.

3:25:40 KHULAN DAVAAJAV: Not the vibe.

3:25:41 Exactly.

3:25:42 So I really like that you can control the kind of instruction,

3:25:45 add instructions to the style of the voice,

3:25:47 and then on top of that, this expressiveness,

3:25:49 which makes it really fun to play with.

3:25:51 STEPHANIE WONG: It makes you realize how far

3:25:53 we've come from the monotone AI voice generation days.

3:25:57 KHULAN DAVAAJAV: Exactly.

3:25:57 STEPHANIE WONG: Now you can actually control the exact

3:26:00 emotion and exact pinpoint of the video that you want.

3:26:03 KHULAN DAVAAJAV: Exactly.

3:26:04 STEPHANIE WONG: Really cool.

3:26:04 KHULAN DAVAAJAV: And one last thing I want

3:26:06 to mention here is one of our colleagues, actually,

3:26:08 he built a demo where he has Gemini 3,

3:26:11 I think Flash Live adding these tags programmatically onto,

3:26:15 let's say 200 page audiobook.

3:26:18 So you don't have to manually add these tags, page by page.

3:26:21 So just using all our models together, that's where you get really the power.

3:26:25 STEPHANIE WONG: Yeah.

3:26:26 Truly.

3:26:26 It's like a team.

3:26:27 KHULAN DAVAAJAV: Yes.

3:26:28 Exactly.

3:26:28 STEPHANIE WONG: OK, so I hear that you

3:26:29 have one more really cool thing to show us.

3:26:31 KHULAN DAVAAJAV: Yes, I do.

3:26:32 I do.

3:26:33 OK.

3:26:33 So first of all, I just want to summarize,

3:26:36 generative media model that you see now in the video,

3:26:38 we use Nano Banana, Veo, Gemini Audio, and Lyria.

3:26:42 And we have one more thing to show you today,

3:26:44 which is— we just launched this today in preview.

3:26:47 And that is Gemini 3.1 Flash Live with the live avatar feature.

3:26:51 And we're going to actually demo this right now.

3:26:54 Firstly, I just want to show you something.

3:26:57 So currently, the weather in Las Vegas is 77 Fahrenheit high and then low is 56.

3:27:03 The amazing thing about this model is that it connects to Google Search,

3:27:08 so it can answer you with the live data from Google Search.

3:27:11 And so let's try a demo.

3:27:15 I'm going to start a conversation.

3:27:21 Hi.

3:27:21 What is the weather like now in Las Vegas?

3:27:24 The highs and the lows.

3:27:28 [VIDEO PLAYBACK]- In Las Vegas today, you can expect a high of around 78

3:27:34 degrees Fahrenheit and a low of about 56 degrees.

3:27:37 It looks like it'll be mostly sunny during the day and clear at night.

3:27:41 Anything specific you were planning?

3:27:43 [END PLAYBACK] KHULAN DAVAAJAV: No, thank you.

3:27:45 We are now at Google Cloud Next, so we're just living our best lives here.

3:27:55 STEPHANIE WONG: I love it.

3:27:56 KHULAN DAVAAJAV: Yeah.

3:27:57 I need to scream louder.

3:27:58 STEPHANIE WONG: It is a loud environment, but that was awesome.

3:28:01 So it literally pulled live data from Google Search.

3:28:04 And you can create this at any point.

3:28:06 Just create an avatar for yourself.

3:28:07 KHULAN DAVAAJAV: Yeah, exactly.

3:28:09 And there are so many use cases here, like education.

3:28:11 Let's say if you want to learn something and you

3:28:14 don't want to just read a book or just read text,

3:28:16 you can talk and have a conversation back and forth.

3:28:19 And I just feel like our models are really helping people rethink,

3:28:22 how do we actually learn, build, create.

3:28:25 And this is one of the most exciting things I've seen this week where I'm like,

3:28:30 oh, I can see so many different use

3:28:32 cases for this for myself in my personal life.

3:28:34 STEPHANIE WONG: Yeah.

3:28:35 KHULAN DAVAAJAV: Yeah.

3:28:35 STEPHANIE WONG: I think one of the biggest differences is

3:28:37 we've seen AI avatars on the market, but this is live.

3:28:41 So it's not a static video or image.

3:28:44 You can actually interact with it.

3:28:46 You can use it for things like education or training or whatever,

3:28:50 even live video streaming if you wanted to, like we just did.

3:28:54 So that's really incredible.

3:28:55 Are there any aesthetics that you— like, different aesthetics?

3:28:57 Can you kind of prompt it, or how can you— KHULAN DAVAAJAV: Yeah.

3:29:00 So the team has built actually a bunch of different pre-built avatars.

3:29:03 So you can pick one of them.

3:29:04 You can change the voice as well.

3:29:07 And then yeah, you can do a lot with this.

3:29:11 And it's really exciting that it's like live audio, audio to audio.

3:29:14 That's the highlight of Gemini 3.1 Flash Live.

3:29:18 It's audio to audio.

3:29:19 Yeah.

3:29:20 STEPHANIE WONG: So there's a lot going on behind the scenes,

3:29:21 but it's all abstract, in a way.

3:29:23 And that multi-layered, multi-model approach is kind of— KHULAN DAVAAJAV:

3:29:27 Like, lip syncing, for example.

3:29:28 It's really accurate.

3:29:30 STEPHANIE WONG: Yeah, yeah.

3:29:31 OK.

3:29:31 So you just covered a lot.

3:29:32 This is super exciting.

3:29:33 You've been playing around with both generative image models, also video models.

3:29:38 We have this new avatar.

3:29:39 What are you most excited about as a creative today, and what's changing?

3:29:44 What are you excited about?

3:29:45 KHULAN DAVAAJAV: OK.

3:29:46 That's a really good question.

3:29:48 OK.

3:29:49 I'm excited about, I would say two things the most.

3:29:53 Number one is world models.

3:29:55 So as you know, we have Genie 3.

3:29:57 It's not on Cloud yet,

3:29:59 but that's kind of— a world model is something that is really

3:30:02 going to change how people just see and interact with the world,

3:30:05 number one, from a robotics perspective.

3:30:06 But number two, as a creative,

3:30:08 I can now be the camera operator inside a world and I can

3:30:13 just actually go inside the world and be the person kind of doing things.

3:30:19 Whereas with normal image and video generation,

3:30:21 just creating frame by frame, like assets, individual assets.

3:30:24 But in the world model,

3:30:26 you are the camera operator just moving around, which is amazing.

3:30:30 And then there are so many use cases around world models and creativity.

3:30:34 So that's that.

3:30:34 And the second thing I'm really excited about is for the day when

3:30:38 the latency of the time it takes to generate images and videos will lower,

3:30:42 because a lot of creatives will tell you you're in a flow state.

3:30:47 You're creating these amazing things.

3:30:49 And then you have to wait for two minutes or something to generate.

3:30:52 So I think a lot of people are excited in the space around lowering the kind

3:30:56 of time that it takes to generate images

3:30:58 and videos and role models and things like that.

3:31:01 STEPHANIE WONG: Yes.

3:31:01 And that will continue to be the case.

3:31:03 So it is a really exciting time.

3:31:04 I feel like you're becoming more and more of your own director,

3:31:07 but also, as you said, it's like you're taking that first person

3:31:10 POV as an operator to this world model.

3:31:13 So a lot of changing parts in terms of how we were positioned as creatives.

3:31:18 But thank you so much for joining and giving us this awesome demo.

3:31:21 Yeah.

3:31:21 Latest stuff coming from gen media from Khulan.

3:31:24 And just to let you know, everyone,

3:31:25 if you want to try out anything that you've seen in the live stream today,

3:31:28 we are going to link code labs and things that you can actually play with.

3:31:32 So be sure to check out the links that we include in the comments.

3:31:36 All right.

3:31:36 See you all on the next one.

3:31:42 [MUSIC PLAYING] STEPHANIE WONG: Hey, everyone and welcome back.

3:52:26 We are here now with Katie Nguyen,

3:52:28 who is a developer relations engineer here at Google Cloud.

3:52:32 Welcome to the show, Katie, first of all.

3:52:34 KATIE NGUYEN: Yeah.

3:52:34 Thank you so much for having me.

3:52:36 I'm so excited to be here.

3:52:37 STEPHANIE WONG: Yeah.

3:52:37 Well, we just wrapped up a conversation with Khulan about gen media.

3:52:41 She's the creative, crushing it with the prompts.

3:52:43 But how can we actually maybe automate some of the workflows?

3:52:46 And that's what you're going to talk about, right?

3:52:48 KATIE NGUYEN: Yeah, absolutely.

3:52:49 So I'm a DevRel engineer for gen media.

3:52:51 And Khulan is great.

3:52:53 And if you saw her session,

3:52:54 she's really into prompting and a lot of creative terminology that helps you

3:52:58 get really high quality visuals and music and audio from these prompt models.

3:53:02 But I'm going to show you a little bit more

3:53:04 of the programmatic side and how you can maybe automate some

3:53:06 of this, maybe use Gemini to help with some of the creativity

3:53:09 so that you can ideate faster and make it more agentic.

3:53:13 STEPHANIE WONG: Yeah.

3:53:13 OK.

3:53:14 Well, what are some of the benefits of using

3:53:16 an agent to generate these kind of media assets?

3:53:18 KATIE NGUYEN: Yeah.

3:53:19 I think that when you're doing this and you want to create a full story,

3:53:23 with images, with videos,

3:53:24 with music, with narration layered in, sometimes keeping track

3:53:27 and making sure that everything is consistent can be a challenge.

3:53:30 But the benefit of doing this in an agentic workflow is that Gemini is

3:53:34 going to keep track of all of this, and the agent has that memory.

3:53:37 It's going to be able to reference and use the previous assets that it

3:53:40 created to really create a whole story and make sure it's really cohesive.

3:53:44 STEPHANIE WONG: All right.

3:53:45 And so you're going to show us a little

3:53:47 bit about what that looks like on Google Cloud, right?

3:53:49 KATIE NGUYEN: Yes, absolutely.

3:53:50 I'm going to be doing a live demo, so bear with me on the live stream.

3:53:54 It's a lot of live going on.

3:53:55 So if we jump right into it,

3:53:58 we're going to start by running— so I've built this agent.

3:54:00 We can talk about the code in a little bit.

3:54:02 But it uses ADK, which is our Agent Development Kit framework.

3:54:06 So I'm going to start by running ADK Web,

3:54:08 which will launch the web UI where we can test some of this stuff out.

3:54:13 So we can see now that it's up and running.

3:54:15 So I'll flip over here.

3:54:16 We can see in the top, I have our character story agent in a new session.

3:54:20 So just to make sure that everything's running I'll just say,

3:54:23 hello, make sure it's online.

3:54:25 It's thinking, so so far, so good.

3:54:28 All right.

3:54:28 So it says, I'm here to help you bring a character

3:54:30 to life and tell their story through images, video, and sound.

3:54:34 So to get started, do you have a character idea, or should I make one up?

3:54:38 STEPHANIE WONG: Ooh.

3:54:39 KATIE NGUYEN: I've been doing things like a friendly

3:54:41 neon robot or a dog getting into trouble.

3:54:44 STEPHANIE WONG: Oh, I like the dog getting into trouble.

3:54:46 Yeah.

3:54:47 Let's go with that.

3:54:47 KATIE NGUYEN: Let's do it.

3:54:49 So let's generate a story about a dog

3:54:56 that's home alone and getting into trouble.

3:55:01 STEPHANIE WONG: Classic.

3:55:02 KATIE NGUYEN: Yeah.

3:55:03 Do you have a dog?

3:55:04 STEPHANIE WONG: I wish.

3:55:06 Oh, I wish.

3:55:08 KATIE NGUYEN: OK, so Gemini's back.

3:55:09 That sounds like a fun and relatable concept.

3:55:12 Let's nail down some details.

3:55:13 Oh, it wants a lot of things.

3:55:14 But you know what?

3:55:15 Let's say— I have a Shih Tzu at home, so let's maybe use that.

3:55:21 The dog is a Shih Tzu.

3:55:27 I think I spelled that all right.

3:55:28 Named Lulu.

3:55:31 That's not my dog's name.

3:55:32 I don't know where that came from, but we'll go with it.

3:55:39 So the first step of this framework is it's going to generate

3:55:43 a sample character image based on this description using Nano Banana 2,

3:55:46 which I think Khulan talked a little bit about.

3:55:49 And the real benefit of this is it's able to take a lot of the natural language

3:55:52 and preserve those details when generating outputs.

3:55:55 So here we're going from a pure text to an image prompt,

3:55:58 and you can see that I didn't give it a lot of storytelling on the background,

3:56:02 but it's gone ahead and filled in a lot

3:56:04 of the blanks to help with the output a little bit more.

3:56:10 OK.

3:56:10 And it looks like it's jumped right into the scene.

3:56:13 So essentially, we'll take a look at the code.

3:56:15 But what I did here is I asked

3:56:17 it to create a storyline of three different scenes,

3:56:20 generate images for all of those scenes,

3:56:23 animate all of those scenes into videos,

3:56:25 and then go ahead and layer in narration.

3:56:29 OK.

3:56:29 And it's back.

3:56:31 STEPHANIE WONG: Amazing.

3:56:33 KATIE NGUYEN: Meet Lulu.

3:56:34 OK.

3:56:34 And let's see.

3:56:36 Where is this image stored?

3:56:41 Which is the other added benefit of agents is

3:56:44 that you can interact with them as a creative collaborator.

3:56:47 Stored in a folder named lulu_story.

3:56:50 So let's go check that out right here.

3:56:52 We can see in my local folder where the agent is running, we have lulu_story.

3:56:57 And it looks like this is one of the sample images.

3:57:00 STEPHANIE WONG: Aw.

3:57:00 KATIE NGUYEN: Lulu has a bow and everything.

3:57:03 STEPHANIE WONG: Does she look like she's up to no good?

3:57:05 KATIE NGUYEN: She doesn't, actually.

3:57:07 She looks like she's like a perfect little— STEPHANIE WONG:

3:57:10 The second you walk out the door, a different story.

3:57:13 KATIE NGUYEN: So let's just say for the sake of time, that looks great.

3:57:22 OK.

3:57:23 So now that it's gone ahead and it's

3:57:25 generated a character that we think is good,

3:57:27 now it's going to go into that story arc

3:57:29 that I was talking about where it generates the three scenes,

3:57:31 where it animates those into videos.

3:57:34 It layers in audio with Lyria and with Gemini text to speech,

3:57:38 which is the new 3.1 model we just launched that Khulan was talking about where

3:57:42 you're able to add in wonderful layers

3:57:44 of audio and nuance with those audio tags.

3:57:49 All right.

3:57:50 And we can see that, from everything that I've done,

3:57:53 I've just basically prompted with really simple prompts.

3:57:56 I've basically said all I want is a certain dog.

3:57:58 I've given the type of breed and a name,

3:58:00 and it's gone ahead and completely reimagined her story from there.

3:58:03 So you can see the prompt that it's using to generate the first video.

3:58:07 Lulu the Shih Tzu is in a kitchen standing on her hind legs,

3:58:10 reaching for a plate of cookies.

3:58:12 Trouble.

3:58:14 But you can see that using an agent kind

3:58:16 of handles a lot of the prompting for you,

3:58:17 where you're able to just focus on maybe your creativity or maybe your output,

3:58:21 where you don't want to have to fine tune the prompts quite so much.

3:58:24 Gemini is able to help you in that process,

3:58:26 which is an added benefit of doing this a little more objectively.

3:58:29 But while this is generating, we can pop over and take a look at the code.

3:58:36 Still see Lulu.

3:58:37 So this is our agent using ADK.

3:58:41 We defined some tools, all using these on Google Cloud.

3:58:45 So we're generating the Veo video first and we're

3:58:49 doing this with the Google gen AI SDK.

3:58:51 You can see we're able to configure things like resolution,

3:58:54 aspect ratio, number of videos.

3:58:57 For this step, we're going to make sure that every

3:58:59 video is six seconds long just for the sake of time.

3:59:01 But we could, of course, make this a little bit longer.

3:59:06 And then for these other ones,

3:59:07 we actually have these really cool MCP servers built

3:59:10 to access our gen media models on Google Cloud.

3:59:13 And so what we've done here is we've used

3:59:15 those MCP servers for things like contacting Gemini on Google Cloud,

3:59:19 which is going to access the text to speech models,

3:59:22 and Nano Banana, which is our image generation, and Lyria for music.

3:59:26 And then we have different AV tools like FFmpeg and open source tools,

3:59:30 where they can combine all of those things together so

3:59:32 that we don't have to worry about the video editing process,

3:59:35 which is another benefit of doing a lot of this from the command line.

3:59:39 STEPHANIE WONG: I see.

3:59:40 Yeah.

3:59:40 Because from the prompting perspective,

3:59:42 you really didn't have to provide much detail,

3:59:44 but you've already set up basically the logic and the structure that you need

3:59:47 for the agent to kick off this automated

3:59:50 process from image to video and include sound.

3:59:52 KATIE NGUYEN: Exactly.

3:59:53 And fingers crossed we can check back in that this is

3:59:56 still running in the background, which it is.

3:59:58 We're able to talk about other things and iterate new ideas rather than looking

4:00:02 at the model output super individually

4:00:04 and making sure that the outputs are concise,

4:00:06 because the agent's going to handle a lot of that, or consistent.

4:00:10 STEPHANIE WONG: Right.

4:00:11 Now just kind of talking about the story and the character consistency,

4:00:13 how is the agent able to keep

4:00:16 the story and the characters consistent through this?

4:00:18 KATIE NGUYEN: Yeah.

4:00:19 I think one of the really awesome parts

4:00:21 of this is that for a lot of these media models,

4:00:23 you can start with an input image.

4:00:25 You can do that for music.

4:00:26 You can do that for image analysis.

4:00:28 If you're generating narration variation,

4:00:30 you could do that for editing images and also for image to video with Veo.

4:00:35 And so I think that by generating the first image here of Lulu,

4:00:38 which I'll bring her back up.

4:00:41 Generating the first image here,

4:00:43 that it's really able to take that image and use

4:00:45 that as a baseline for all the creation that it's doing going forward.

4:00:48 So you're able to say, regenerate this dog in a different context.

4:00:52 And then from there,

4:00:53 you make sure you have the same dog and familiar settings and the home

4:00:56 scenario to go ahead and create a video from that single reference image.

4:01:03 We can go ahead and check on some of the outputs here.

4:01:06 So you can see all of these calls right here

4:01:09 are all tool calls that we have from the MCP servers.

4:01:13 So we're using Gemini image generation, which is Nano Banana,

4:01:17 and then we're using Veo, Gemini, and Veo again.

4:01:23 So we can look in our local folder here.

4:01:26 We close out of this.

4:01:31 And in lulu_story, we can see a lot of what's happening.

4:01:34 So we have scene one.

4:01:35 We've already generated the images for that.

4:01:37 So talking about that consistency,

4:01:39 we see the same dog right here going for that plate of cookies in a setting

4:01:44 that looks really familiar to the initial living room that we had her in.

4:01:49 And so that's scene one.

4:01:51 We can see that in real time, we generated scene two.

4:01:54 Oh.

4:01:55 She got into even more trouble.

4:01:57 And that's the fun part about using Gemini as a creative collaborator,

4:02:01 because Khulan is so great at this stuff.

4:02:03 But my background is in engineering,

4:02:04 and so maybe some of the creativity doesn't come super naturally.

4:02:07 And that's where you can use Gemini

4:02:09 to generate ideas and work with it, play around.

4:02:12 And maybe I liked some of them better than others,

4:02:14 so we can play around with that.

4:02:16 STEPHANIE WONG: Nice.

4:02:17 Yeah, it's good for inspiration and just kind

4:02:19 of letting it come out with the outputs.

4:02:21 If you're like, I don't know where to start.

4:02:23 It's like the blank slate problem.

4:02:24 KATIE NGUYEN: Yes.

4:02:25 Yes.

4:02:26 Absolutely.

4:02:27 Let's see what else we have here.

4:02:28 Oh, now she's done.

4:02:31 STEPHANIE WONG: Trouble has been complete.

4:02:32 KATIE NGUYEN: Trouble has ended.

4:02:34 She ate a chocolate cookie, so that's really not good.

4:02:37 STEPHANIE WONG: Yeah.

4:02:38 KATIE NGUYEN: But it looks like she's sleeping peacefully.

4:02:41 STEPHANIE WONG: Yeah, she's satisfied.

4:02:42 KATIE NGUYEN: Yes.

4:02:43 Exactly.

4:02:44 We can see— so right now if we flip back over,

4:02:48 we can see that the three scenes have all been

4:02:50 generated from input images using the OK Gemini image generation tool.

4:02:56 And then we generated all of the subsequent

4:02:58 Veo video clips with the Veo generate tool.

4:03:01 And then we've gone through and generated Lyria.

4:03:04 You can see it's calling Lyria,

4:03:06 generate music and audio with the Gemini text to speech tool.

4:03:09 And it's all done here.

4:03:12 So Lulu's mischievous adventure is complete.

4:03:16 It kind of tells about it.

4:03:17 So we saw in scene one, reaching for the cookies in the kitchen.

4:03:21 Scene two, a snowy mess of shredded toilet paper.

4:03:24 Scene three, a sleepy, innocent Lulu on the sofa.

4:03:27 And it saved it right here to lulu_home_alone_story.

4:03:31 So let's go check that out.

4:03:35 STEPHANIE WONG: Can't wait to see the output.

4:03:37 KATIE NGUYEN: Lulu_home_alone.

4:03:39 OK, let's make sure I have the right one.

4:03:40 All right.

4:03:42 So this is the first time I'm hearing this too.

4:03:44 So if we need to iterate, we have Gemini as an agent to help us.

4:03:48 STEPHANIE WONG: There we go.

4:03:49 KATIE NGUYEN: Let's cut to the video.

4:03:51 [VIDEO PLAYBACK]- Lulu the Shih Tzu was home alone,

4:03:55 and the trouble began in the kitchen with some cookies.

4:04:00 Then she turned the hallway into a toilet paper winter wonderland.

4:04:06 When her humans returned,

4:04:08 they found a perfectly— [END PLAYBACK] STEPHANIE WONG: OK.

4:04:12 Well, it did the six seconds, though.

4:04:14 KATIE NGUYEN: Yeah.

4:04:14 So it cut off a little bit at the end,

4:04:16 but you can see it matched the narration perfectly to the scenes it

4:04:19 generated and added them in and fit different voices and stuff like that.

4:04:24 We could even go back to the agent now and say,

4:04:26 maybe the music was a little too quiet.

4:04:30 That's great, but the music was a little too quiet.

4:04:35 Can you make it a little louder in the final video?

4:04:41 And that's also what's great about using agents is you can

4:04:44 prompt with natural language to do a lot of these things.

4:04:46 Instead of remembering all the API calls or writing all

4:04:49 the code for this, you can give agent the tools to do

4:04:52 this, and then you can prompt it through natural language

4:04:54 to go off and make all these decisions on its own.

4:04:57 STEPHANIE WONG: Yeah.

4:04:57 I was going to ask about that.

4:04:58 Just like, what's the benefit of using

4:05:00 a skill or adding a skill to this workflow?

4:05:03 KATIE NGUYEN: Yeah, that's a really great question.

4:05:06 So if we go over back here, close out of Lulu,

4:05:11 to the actual LLM agent code that we built using ADK,

4:05:17 essentially we give a really long instruction

4:05:19 prompt about everything we want it to do.

4:05:21 So we covered basically everything I articulated where

4:05:23 we want to use image generation with Nano Banana.

4:05:27 We want to generate each scene, generate a voiceover,

4:05:30 generate background music, and do all of that.

4:05:33 But this is a lot of text in an instruction.

4:05:35 And so every time we're sending that call,

4:05:37 we're sending this whole instruction text in the context to the LLM.

4:05:41 And so what we can do instead is separate some

4:05:43 of this logic out and make it even more robust agent skill.

4:05:47 And so then we can load that into this framework,

4:05:50 which would help clean up some of this and maybe

4:05:52 give it even more resources to make the audio even better.

4:05:55 So speaking of audio, if we go over here to gen media voice director,

4:06:02 which is another open source skill that's actually all available through

4:06:06 this repository if you want to go ahead and check that out.

4:06:11 So we have right here in this experiments folder, mcp-genmedia,

4:06:14 where we have all the MCP servers I used in this demo,

4:06:18 and then some sample agents and then all

4:06:20 of these skills that we're talking about now, which is really awesome.

4:06:23 But for the gen media voice director skill,

4:06:26 essentially it's a— let's reformat this here, or look at it in a markdown file.

4:06:35 But essentially what you can see is it talks about giving even

4:06:38 more information to the model about how to construct some of these things.

4:06:42 So the voice director, it has this explanation.

4:06:44 You're an expert audio director.

4:06:46 It talks about some of the core capabilities,

4:06:48 and basically is giving the agent all this information.

4:06:51 So Khulan talked a lot about the expressive audio tags,

4:06:54 and you can see that it tells the agent how

4:06:56 to do that, because we didn't provide that necessarily in the instruction.

4:06:59 So putting this information in a skill that it can

4:07:02 just call when it needs to generate that audio part is

4:07:04 really beneficial to the agentic framework and gives it even more

4:07:08 foundation in how to get the most out of these models.

4:07:11 It talks about the tools it needs to call,

4:07:13 which it is doing, but how it can improve these even more.

4:07:17 The model names, the voice names.

4:07:18 That's something else we didn't touch on is with these models

4:07:21 you have access to choose from a bunch of pre-built voices.

4:07:25 And I think agentically when it was in this framework,

4:07:27 since we didn't give it information on that, it

4:07:29 was kind of just picking which one.

4:07:30 But we could say, give a voice profile of each of the pre-built voices and say,

4:07:35 this is a sweet, innocent voice.

4:07:37 This is one that would maybe fit more into a calm and soothing demeanor.

4:07:41 So you can make your agentic framework and your creative assets even

4:07:45 more cohesive by providing even more information through these types of skills.

4:07:49 STEPHANIE WONG: Right.

4:07:49 And Khulan kind of gave us a sneak

4:07:51 peek of that because hers was very human-like.

4:07:53 It had emotions, but it's about combining a description

4:07:56 of the type of voice with some of these emotion tags,

4:07:58 and it's all something that you can actually automate and abstract

4:08:01 away so you don't have to keep prompting it every time.

4:08:03 KATIE NGUYEN: Exactly, exactly.

4:08:05 And we can offload this to a lot of Gemini,

4:08:08 to be able to try to understand this a little bit better,

4:08:10 and you can do this with a whole bunch of different skills too.

4:08:13 Like I just showed the audio one,

4:08:14 but if you go back in here into this GitHub repository, we have different ones.

4:08:19 If you're a gen media image artist and then this skill,

4:08:24 you can basically see how it talks about how

4:08:26 to get the most out of Nano Banana image generation.

4:08:28 And it gives you different kind of narrative descriptions.

4:08:31 It shows you how to render text with an image,

4:08:33 so you can control even more of these things

4:08:35 and help the prompts to be even more robust,

4:08:37 so that you're able to evaluate some of this content,

4:08:40 and then possibly even have the agent help you evaluate some of it as well.

4:08:44 STEPHANIE WONG: Right, Well, I want to dig into that.

4:08:46 How do you actually do evaluation and evaluate

4:08:49 content in this type of agentic loop?

4:08:51 KATIE NGUYEN: Yeah, that's a great question.

4:08:53 So in this example that we did right here, we didn't really touch on that.

4:08:58 But another really awesome way that we can do is create

4:09:00 an image evaluator agent in addition to the storyboard creative agent.

4:09:04 And we could have it take in and use LLM

4:09:07 as a judge in a way to take in the media,

4:09:09 compare it against the original prompt.

4:09:11 Did it adhere and create an asset that did everything it was supposed to?

4:09:15 Did it stick to the character really true to size?

4:09:17 Is Lulu's bow exactly where it's supposed to be from the initial image?

4:09:21 So a lot of those fine tuned details, that is a lot of human preference

4:09:25 and still requires human evaluation to make sure everything's

4:09:29 exactly the way you want it and make sure that you brought your vision to life.

4:09:32 But also you can do some of this evaluation

4:09:35 and have it as a separate agent within your agentic framework.

4:09:41 Basically you could have the agent fact check every image

4:09:43 that it generated before it turned that into a scene with Veo,

4:09:46 so that we make sure that there's nothing— or for example,

4:09:49 in the audio that we saw,

4:09:51 how it ran over slightly because it was only six seconds.

4:09:53 We could say, oh, that— the agent would be able to detect that and then

4:09:57 regenerate the audio and recreate a script based on the given time frame.

4:10:01 And you could use some of that so that we don't have

4:10:03 to go back and continuously run this but the agent could self-correct that.

4:10:07 STEPHANIE WONG: OK.

4:10:07 And so it sounds like— I think, when it comes to LLMs,

4:10:10 it feels like evaluation is possible because it's an LLM.

4:10:13 It's a language model.

4:10:14 With image, you can even do that.

4:10:16 An AI model can still help you evaluate whether or it's

4:10:20 aligning with the instructions for even an image or a video,

4:10:23 which is kind of mind blowing, actually.

4:10:25 KATIE NGUYEN: Yeah.

4:10:26 STEPHANIE WONG: It's cool.

4:10:26 KATIE NGUYEN: Absolutely.

4:10:27 We have a lot of frameworks to do this.

4:10:29 We have to make sure that the output is adhering to the prompt.

4:10:31 It'll generate a bunch of questions based on the prompt

4:10:34 to make sure that the image actually reasons across those.

4:10:38 Gemini is really awesome at multi-modality,

4:10:40 so we're able to analyze those images even with Gemini,

4:10:43 and then fact check a lot

4:10:44 of these questions to make sure that everything's aligned.

4:10:47 STEPHANIE WONG: Yeah.

4:10:48 The contextual understanding of a multimodal output.

4:10:50 KATIE NGUYEN: Yeah.

4:10:50 Absolutely.

4:10:51 STEPHANIE WONG: That's one of the unique capabilities.

4:10:52 KATIE NGUYEN: Yeah.

4:10:52 STEPHANIE WONG: Very cool.

4:10:54 Anything else you'd like to show us before we wrap up?

4:10:56 KATIE NGUYEN: Well, let's see how Lulu's final image

4:11:00 or video did if we were able to switch the music.

4:11:03 Let's see.

4:11:04 So this said lulu home alone story louder music.

4:11:08 Very creative title.

4:11:15 Louder music.

4:11:16 I just can't see it right there.

4:11:18 All right.

4:11:19 Let's see if we can play this one and see if the music's any louder.

4:11:22 [VIDEO PLAYBACK]- Lulu the Shih Tzu was home alone,

4:11:26 and the trouble began in the kitchen with some cookies.

4:11:30 Then she turned the hallway into a toilet paper winter wonderland.

4:11:36 When her humans returned, they found a perfectly— [END PLAYBACK] STEPHANIE WONG:

4:11:43 I hear that violins rising, so.

4:11:44 I know it's noisy here, but yes, I hear the violins.

4:11:47 KATIE NGUYEN: Exactly.

4:11:48 And so it is able to generate music based on the scene,

4:11:50 but you can use different tools too to adjust that, which is awesome.

4:11:53 You can just prompt that in natural language and this kind

4:11:56 of an agent will be able to assist with that.

4:11:58 STEPHANIE WONG: Very cool.

4:11:59 Well, thanks for showing us how you

4:12:01 can actually bring agent workflows into a gen

4:12:04 media type use case for us creatively

4:12:07 non-creatives that might need some help there.

4:12:10 KATIE NGUYEN: I love that title.

4:12:11 Yeah.

4:12:12 Absolutely.

4:12:13 STEPHANIE WONG: Well, thanks for joining us for the live stream.

4:12:14 And for everyone else,

4:12:15 if you are interested in getting your hands on this and learning,

4:12:19 then we will share the code labs

4:12:21 with you in the descriptions and the comments later.

4:12:24 Thanks again, Katie.

4:12:25 KATIE NGUYEN: Thanks so much, Stephanie.

4:12:26 STEPHANIE WONG: And see you all soon.

4:12:27 KATIE NGUYEN: Bye.

4:12:32 [SIDE CONVERSATION] [MUSIC PLAYING] AJA HAMMERLY: Hey, folks.

4:24:56 I'm Aja Hammerly from the builder relations team,

4:24:58 and I'm here today to talk a bit with one of our GDE experts about AI DevTools.

4:25:03 So let's start.

4:25:04 Can you introduce yourself and tell us a bit about what

4:25:06 you do and how you've been using these products a bit.

4:25:08 TOMEK POROZYNSKI: Hi, Aja.

4:25:09 Thank you for having me here.

4:25:12 My name is Tomek Porozynski.

4:25:13 I'm from Poland, from Bydgoszcz.

4:25:16 I work as a staff engineer at deepsense.ai,

4:25:19 and I'm also Google Developer Expert for Cloud AI.

4:25:22 AJA HAMMERLY: Awesome.

4:25:23 So I'm sure you've built some amazing things recently.

4:25:27 Can you tell me about just one

4:25:28 of them that you're really excited about right now?

4:25:30 TOMEK POROZYNSKI: Oh, just one?

4:25:31 Let me think.

4:25:33 I'm really excited about Google Gemini text to speech APIs.

4:25:38 So I created a workflow where you can change the text to multi-voice audiobook.

4:25:45 And it's pretty neat because there are a couple of things.

4:25:49 When you think about the application like that, you have to see

4:25:53 if the narrator is the first person narrator in the story,

4:25:57 or the first person narrator in the story, try to assign the proper voice.

4:26:01 Then you have to choose,

4:26:03 try to estimate which voice fit perfectly to that character.

4:26:07 Make sure that each character have different voices,

4:26:10 and then split the text into chunks to know which character is saying what,

4:26:16 and then connect everything together.

4:26:18 And then there is a way to also add

4:26:21 some additional noises and extra special effects to the audio,

4:26:24 so that the end effect is really nice.

4:26:27 AJA HAMMERLY: OK, so you've got

4:26:30 the multi-voice— multi-voice storybook with text to speech,

4:26:32 and now you've added special effects to make it even cooler for kids.

4:26:36 TOMEK POROZYNSKI: So honestly, the special effects is still on to-do list.

4:26:39 You know how it is with those kinds of solutions.

4:26:41 AJA HAMMERLY: I know how it is with those kinds of projects.

4:26:42 So can you tell us a little bit about what tools you use

4:26:45 to build this, and then also potentially what tech you use to build this.

4:26:48 TOMEK POROZYNSKI: All right.

4:26:49 So a very useful Gemini CLI,

4:26:52 especially when you combine that with a proper skill.

4:26:55 So there is a proper skill for Live API, Gemini Live API,

4:26:59 as well as Vertex AI or Gemini API Journal API development.

4:27:07 So using those skills,

4:27:09 Gemini CLI is awesome of creating the small proof of concepts.

4:27:15 So the way I wanted to create that, I started

4:27:19 with small chunks and checked if all the chunks worked correctly.

4:27:23 And once I got all the chunks ready,

4:27:25 then I connected them together into one bigger solution.

4:27:28 AJA HAMMERLY: OK, so let's talk a little bit more about the architecture.

4:27:31 So what kinds of technologies did you bring in?

4:27:33 Are there agents involved?

4:27:35 Are there microservices?

4:27:36 How did you build this up?

4:27:37 Because it sounds pretty complicated.

4:27:38 TOMEK POROZYNSKI: Oh, yeah.

4:27:40 I tried not to complicate it too much because

4:27:42 I open source the solution and— AJA HAMMERLY: Oh, awesome.

4:27:44 TOMEK POROZYNSKI: Yeah.

4:27:45 I use that on the DevFest talks to actually show people how that can be done.

4:27:50 And the end product I created after having the proof

4:27:54 of concept was moving the solution to Google Colab.

4:27:58 So you can actually run each separate cells,

4:28:02 separately, understand the code, see the nodes, what's happening,

4:28:06 and try to change that so everyone could get the notebook

4:28:09 and play with that and adjust that to their needs.

4:28:12 And I think that's a cool way of learning and cool way of touching

4:28:17 the text so you can actually start with something and build on top of it.

4:28:21 AJA HAMMERLY: So is the way it's laid out that you have,

4:28:24 one Gemini, one piece of code that parses through the story and figures out?

4:28:29 Is it multiple calls to Gemini?

4:28:30 Is it one call to Gemini, like the Gemini API and another call to text speech?

4:28:34 Tell me a little bit more about how it works.

4:28:36 TOMEK POROZYNSKI: So some parts have to go one by one.

4:28:40 For instance, you have to understand how many characters there are in the story,

4:28:44 then to split the text.

4:28:45 But once you've got the text split,

4:28:46 then that you know that this character has one, two, three,

4:28:49 or four different pieces and the other one has another one,

4:28:54 you can send the request at the same time to speed up the process a little bit.

4:28:57 So it's a bit of combination.

4:28:59 So part of the program has to go in the linear way,

4:29:01 and the other part of the program can

4:29:03 be streamlined into multiple calls at the same time.

4:29:06 AJA HAMMERLY: So you've got a part of it that has to be done in series,

4:29:09 but that last step can be done in parallel.

4:29:11 TOMEK POROZYNSKI: Exactly.

4:29:11 AJA HAMMERLY: Oh, that's really, really cool.

4:29:13 So you said you built this with Gemini CLI

4:29:14 and you talked a little bit about the agent skills.

4:29:16 Can you tell me a little bit about how the agent

4:29:18 skills were helpful for you and which ones you found particularly helpful?

4:29:21 TOMEK POROZYNSKI: Oh, right.

4:29:23 The challenge with building the applications that are actually using

4:29:28 generative AI models and generative AI SDKs is that those technologies

4:29:34 is evolving so fast that sometimes the model itself is

4:29:38 not aware of up to speed or up to date information.

4:29:42 That's where the skills come very useful,

4:29:45 because the skill can actually give the links to additional resources so

4:29:50 the model knows where to look for the up to date information.

4:29:53 And you have the guarantee that the model will actually use

4:29:56 the best solution or the best approach to solve the problem at hand.

4:30:03 AJA HAMMERLY: So did you use some skills for the Gemini APIs so that you could

4:30:06 make sure you had the up to date API interfaces and the model names and stuff?

4:30:09 TOMEK POROZYNSKI: Yes, exactly.

4:30:10 Yes.

4:30:10 Gemini APIs, and also for Gemini Live API as well,

4:30:14 because I tried to also incorporate the Live session as well

4:30:18 as a kind of proof of concept while doing the experimental phase.

4:30:22 But at the end, I decided that Gemini text

4:30:25 to speech is good enough for that particular solution.

4:30:27 AJA HAMMERLY: Awesome.

4:30:28 OK, so I've built some stuff with AI.

4:30:31 You've built some stuff with AI.

4:30:32 I don't know what your experience has been like.

4:30:35 Doesn't always go exactly the way I planned.

4:30:37 I always run into something.

4:30:38 So can you tell us a little bit about maybe something you learned

4:30:41 or something that was a little more challenging and how you worked through that?

4:30:44 TOMEK POROZYNSKI: Yeah.

4:30:45 So the thing with working with generative AI is

4:30:49 that not everything works out of the box always.

4:30:53 So sometimes you create your perfect bomb, send it,

4:30:55 and the solution is not necessarily perfect of the first time.

4:30:59 But that's good, right?

4:31:00 You have to just go through it, try again.

4:31:03 Maybe ask model or ask AI agent to fix it,

4:31:07 try to run it, depending on the AI tool you're using.

4:31:10 For instance, if you use Antigravity,

4:31:13 the Antigravity can actually spin up the Chrome browser

4:31:17 and actually try to click through the solution if

4:31:19 your solution has the graphical interface or component

4:31:21 of that, which speed up the troubleshooting process a lot,

4:31:26 or it can be fully or almost fully automated.

4:31:30 So I guess my advice would be not

4:31:34 to give up when the first version is not perfect.

4:31:39 Just kept working on that a bit more.

4:31:40 AJA HAMMERLY: OK, so you talked about an idea there that I

4:31:42 think is really cool that I don't think a lot of folks realize,

4:31:45 that you can actually ask the model to help you improve your code,

4:31:49 and ask the model to help you improve your prompts.

4:31:51 Do you do that?

4:31:52 TOMEK POROZYNSKI: Yeah, definitely.

4:31:54 So one thing that I think is sometimes

4:31:56 overlooked is the plan phase or the brainstorming phase.

4:31:59 So before even starting writing any type of code,

4:32:02 regardless if I'm writing the code or the AI agent is writing the code,

4:32:06 I like to have a fairly long conversation about my idea, the tech stack,

4:32:11 and then ask it to do the research if maybe

4:32:14 my idea is not perfect or try to optimize it.

4:32:17 And even when the program or the application

4:32:21 creating is done and something is not perfect, you can still change it.

4:32:27 Sometimes it takes a bit longer,

4:32:29 but it's better to invest a bit more time and create

4:32:32 something better than to have something partially working, partially not.

4:32:35 AJA HAMMERLY: Yeah, I do the same thing.

4:32:37 I spend a lot of time at the plan phase, having a conversation with the agent,

4:32:41 thinking through, talking through different ideas,

4:32:43 thinking through different ideas myself, getting the feedback,

4:32:45 asking if the agent thinks that my idea

4:32:47 was any good before I start writing any code,

4:32:49 because I want to make sure that the idea

4:32:51 is sound before I start with that coding phase.

4:32:53 So that's really, really smart.

4:32:55 So one of the things that I keep getting asked,

4:32:57 so I'm going to ask you, is, how do you get started?

4:33:00 Where do you start with this stuff?

4:33:02 There's agent skills.

4:33:03 There's MCPs.

4:33:03 There's a million AI DevTools.

4:33:06 There's so many models.

4:33:07 Each of the models has a bunch of different features and different APIs.

4:33:11 Where do you get started if you have a cool

4:33:13 idea and you want to turn it into an app,

4:33:16 and now you're trying to navigate this giant— it's just so much.

4:33:21 So how would you help folks get started?

4:33:23 TOMEK POROZYNSKI: That's true.

4:33:24 The problem is that it's very easy nowadays

4:33:27 to be in the feeling of fear of missing out,

4:33:31 because every day you open your social media and you see this new solution,

4:33:35 this new approach, et cetera, et cetera.

4:33:37 And I think you just have to leave that behind.

4:33:40 Just slow down.

4:33:41 Actually slow down and start small.

4:33:43 If you've got an idea and you've got already some coding skills,

4:33:48 I think that the good idea is to give it a try with Antigravity as an IDE.

4:33:53 If you like to work with IDE, Antigravity is a perfect place to start.

4:33:57 If you don't have coding skills,

4:34:00 I think the great idea is to check Google AI Studio,

4:34:03 because— AJA HAMMERLY: I love AI Studio.

4:34:04 TOMEK POROZYNSKI: Yeah,

4:34:04 with Google AI Studio you can just create a simple prompt,

4:34:07 go for the build option, and the solution will be uploaded for you,

4:34:12 deployed for you so you can actually play around and see how good that is.

4:34:15 And again, you can talk with your AI agent and ask

4:34:19 to change that, even take the snapshot of your application,

4:34:22 like the screenshot of the application,

4:34:23 select what you want to change, and it will actually update that.

4:34:27 And it takes a couple of minutes.

4:34:28 It's really, really impressive and really simple.

4:34:31 I mean, really simple.

4:34:32 And once you are ready, you can even integrate that with Firebase Auth.

4:34:37 So you don't need to worry about authentication,

4:34:39 and Firestore for the backend information.

4:34:42 So if you want to have stateful application, it's super simple to do.

4:34:46 And once you are happy with your solution

4:34:48 and you want to share that with the world,

4:34:51 with a single button you can deploy that on Cloud Run.

4:34:53 AJA HAMMERLY: Yeah.

4:34:54 TOMEK POROZYNSKI: And with Cloud Run,

4:34:55 with, I don't know, maybe two or three additional clicks,

4:34:58 you can assign your custom domain,

4:35:00 and suddenly you've got almost production ready application.

4:35:03 It's pretty neat.

4:35:04 AJA HAMMERLY: You've gone from an idea to something

4:35:07 that's actually running in the Cloud really, really quickly.

4:35:09 TOMEK POROZYNSKI: Yeah.

4:35:09 AJA HAMMERLY: Well, I want to thank you so much for sharing

4:35:11 your insights and your ideas and telling us about your project.

4:35:14 It's an open source, so hopefully people can find it on GitHub.

4:35:16 TOMEK POROZYNSKI: Yes.

4:35:17 AJA HAMMERLY: Awesome.

4:35:17 TOMEK POROZYNSKI: Yes, definitely.

4:35:18 AJA HAMMERLY: So thank you so much.

4:35:19 I really, really was glad we got to have a chance to talk today.

4:35:22 TOMEK POROZYNSKI: Sure.

4:35:22 Thank you for having me.

4:35:29 [MUSIC PLAYING] MUHAMMAD FAROOQ: Hey, everyone, this is Muhammad.

4:51:02 And today, I have Omar with me from Google DeepMind.

4:51:05 And we're going to be talking about all things Gemma.

4:51:07 Omar, how are you doing today?

4:51:09 OMAR SANSEVIERO: Hi.

4:51:09 Very good.

4:51:10 Thank you.

4:51:10 MUHAMMAD FAROOQ: All right, so you guys had Gemma 4 launch recently,

4:51:14 and it's been a pretty successful launch.

4:51:16 How do you feel about it?

4:51:18 OMAR SANSEVIERO: Yeah, we are super excited.

4:51:19 So this has been our largest open model release ever.

4:51:22 The community reaction has been amazing.

4:51:24 The model was actually launched just three weeks ago,

4:51:27 and we have over 40 million downloads already.

4:51:30 So we are seeing lots of community excitement around it.

4:51:32 MUHAMMAD FAROOQ: That's pretty impressive.

4:51:34 Can you tell me about what different models were released?

4:51:37 Because I think there is a family of models, right?

4:51:39 OMAR SANSEVIERO: Exactly.

4:51:40 Yeah.

4:51:40 So the models go from a very, very small,

4:51:42 so 2 billion parameters, all the way to 31 billion parameters.

4:51:47 When we designed Gemma,

4:51:48 we made sure that these were models that are developer friendly.

4:51:51 That means that people can actually run these models in their own devices.

4:51:54 So the smallest models can run in a phone.

4:51:57 The largest model is still small enough that it can run in a consumer GPU.

4:52:01 So if you have a workstation, a gaming computer,

4:52:04 you can most likely run these models.

4:52:07 The largest ones are super good,

4:52:09 the most intelligence per parameter, per watt that you can get.

4:52:13 So highly capable models at very developer friendly size.

4:52:17 MUHAMMAD FAROOQ: And what are the different capabilities?

4:52:19 Because I think the smaller models have

4:52:21 a little different capabilities than the bigger ones.

4:52:24 Can you talk about that?

4:52:25 OMAR SANSEVIERO: Yeah, so these are multimodal thinking models.

4:52:28 So the smallest ones are designed for mobile use cases,

4:52:31 so for running directly in the phone.

4:52:33 And they can understand audio.

4:52:36 So you can speak to the model.

4:52:37 It can do speech to translate the text as well.

4:52:40 They can also understand videos and images.

4:52:43 The largest models don't accept audio input,

4:52:45 but they are still extremely capable models that can

4:52:48 do very advanced things on the vision side of things.

4:52:51 So it can understand images, it can understand videos as well.

4:52:54 MUHAMMAD FAROOQ: And these are multimodal as well and multilingual as well.

4:52:57 OMAR SANSEVIERO: Yeah, yeah, yeah.

4:52:59 So the models were trained in over 140 languages.

4:53:03 These are extremely good models.

4:53:04 And one part that is important for Gemma is

4:53:06 that it's not just a model for the US, it's a model for the whole world.

4:53:11 So we really want to make sure

4:53:12 that these models are accessible for the ecosystem.

4:53:14 And there's this thing which we call the Gemmaverse.

4:53:16 So the Gemmaverse is this whole family

4:53:19 of models built around Gemma by the community.

4:53:21 So the community is, for example, picking Gemma, and as it's an open model,

4:53:25 they can go and fine tune the model for a different language.

4:53:29 I come from Peru.

4:53:31 An Indigenous language there is called Quechua,

4:53:33 and people are training Gemma even further

4:53:36 to make it stronger for Quechua to Spanish translation.

4:53:39 MUHAMMAD FAROOQ: Oh, interesting.

4:53:40 Interesting.

4:53:41 What are the different, I guess interesting applications that you have seen

4:53:45 that developers have built in the last three weeks?

4:53:47 OMAR SANSEVIERO: Yeah.

4:53:47 So it's, again, just been three weeks.

4:53:49 I've been seeing people combining Gemma with other models.

4:53:53 So it's an agentic model as well, so it can do function calling.

4:53:56 It can pick different APIs and use them to solve different tasks.

4:54:00 So we have seen people using Gemma combined with other models,

4:54:04 using it as a router or using it to call

4:54:07 maybe a segmentation model that is a very specific use.

4:54:10 So that's been one part that I find exciting.

4:54:13 The other part is that the model is really small.

4:54:15 So we are seeing that the community is able to put the models in very,

4:54:20 very hardware constraint features.

4:54:22 So that's quite exciting to see,

4:54:23 because the reality is that most people out there don't have lots of GPUs.

4:54:27 They don't have lots of TPUs.

4:54:28 So it's very exciting to see what the community

4:54:30 is able to use directly at home in Raspberry Pis,

4:54:34 in Jetson Nanos and very, very small devices.

4:54:36 MUHAMMAD FAROOQ: Yeah,

4:54:37 the community can get really creative with different applications.

4:54:40 Yeah.

4:54:41 You touched upon routing.

4:54:43 So how do you see that?

4:54:45 Because normally when people talk about local models,

4:54:47 they either are talking about situations where they care about privacy,

4:54:53 or maybe even the compute resources that are in.

4:54:57 But it comes with a cost of intelligence.

4:55:02 So how do you balance both of these API providers or these high

4:55:08 intelligence models that are behind an API and a local model?

4:55:11 OMAR SANSEVIERO: Yeah, yeah.

4:55:12 That's a great question.

4:55:14 You're totally right.

4:55:15 If you want the most intelligence, you would use the largest model.

4:55:18 So you would use Gemini.

4:55:19 But there are many things for which Gemma is capable enough.

4:55:22 Agentic things, I do think we are in a world in which 70%,

4:55:25 80% of the things that you want to do in your day

4:55:28 to day may be able to be fulfilled directly locally.

4:55:32 So I think there are a couple of things you can do.

4:55:34 There's a startup from YC called Cactus Compute.

4:55:37 Cactus Compute is doing this thing which is called hybrid inference.

4:55:40 So pretty much you have a local router

4:55:43 that, based on the complexity of the input prompt,

4:55:46 the prompt may be fulfilled locally by a model such as Gemma,

4:55:50 or maybe sent to a server for a model such as Gemini.

4:55:53 So there will be many use cases where you will still want to do

4:55:56 an API call to a proper model that is larger and more capable.

4:56:00 That said, there are many things for which a local model is just good enough,

4:56:03 which is something quite interesting.

4:56:05 So I think that it's also important to note is Gemma is very small.

4:56:08 It has very few parameters compared to much larger models.

4:56:12 And that means that you have a bit of a restriction

4:56:14 in how much knowledge you can put in a single model.

4:56:17 If you use a very small model,

4:56:19 you should not expect that model to know all the facts about the world.

4:56:23 You should use a larger model for that.

4:56:26 So there's still a world in which you will have these larger models or API

4:56:30 calls to specific tools to get more

4:56:32 information about the world or about specific facts.

4:56:35 MUHAMMAD FAROOQ: Yeah.

4:56:36 OK.

4:56:37 Nice.

4:56:37 And for local models,

4:56:38 I want to go back and talk about the type of applications that you see.

4:56:43 So why do we need local models or open source models, for that matter?

4:56:48 OMAR SANSEVIERO: Yeah,

4:56:49 so open access models have a couple of different use cases.

4:56:52 So privacy first or highly regulated areas.

4:56:55 So if you work in the health care domain or in sovereign

4:56:58 AI efforts where the data cannot live on servers at all,

4:57:01 like all the data needs to stay in the local network,

4:57:04 you will need an on-device model.

4:57:06 The second one is if you don't have an internet connection.

4:57:08 There are so many communities in the world

4:57:11 that simply don't have a Wi-Fi connection.

4:57:13 They don't have a satellite connection, but they can still have the model,

4:57:16 maybe in a phone or in a device, and have access to LLM,

4:57:21 a powerful LLM directly locally without having to have this internet connection.

4:57:25 So privacy is sovereign AI use cases,

4:57:29 highly regulated fields, a fully offline setup.

4:57:32 So those are the main ones, I think.

4:57:35 But the other one is around fine tuning.

4:57:37 So as an open model, you have full control of the model and you

4:57:40 can specialize the model for different use cases.

4:57:43 So for example, let's say that you want

4:57:45 the model to become better at a specific domain.

4:57:48 Let's say that in your company, the open models don't work out of the box

4:57:53 for certain specific domains such as finance, for example.

4:57:57 You may still have a data set.

4:57:59 You may still be able to train the model and improve

4:58:01 the capabilities of that model and use that model locally,

4:58:05 or maybe in your own infrastructure, or maybe in Cloud Services.

4:58:09 But you do have the option to have more control over how the model behaves.

4:58:12 MUHAMMAD FAROOQ: So you talked about fine tuning.

4:58:14 And I think a lot of people are

4:58:16 going to be building applications on top of Gemma.

4:58:19 And you guys, I think for the first time, released it under Apache 2.0.

4:58:23 So how did that go and what was the logic behind it?

4:58:30 OMAR SANSEVIERO: Yeah.

4:58:31 So the previous Gemma models had an open

4:58:34 license called— it was a custom Google Gemma license.

4:58:38 It was commercially permissive.

4:58:39 So it was actually quite permissive.

4:58:41 But over the last year and a half we have

4:58:43 been gathering as much feedback as we could from the community.

4:58:45 So we are talking with developers, we are talking with startups,

4:58:48 we are talking with NGOs, we are talking with institutions,

4:58:52 we are talking with enterprises and getting as much

4:58:54 feedback as we can about the model capabilities,

4:58:57 about how the model behaves, about which are the friction points that take up.

4:59:02 A very common friction point was the license.

4:59:04 Many companies have a set of licenses they can use.

4:59:07 They have many restrictions about what they could use or not,

4:59:11 and we saw quite a bit of skepticism with the previous license.

4:59:15 So we wanted to make sure that everyone felt like

4:59:17 they could actually use the model for their use cases.

4:59:20 So yeah, we changed the license to Apache.

4:59:22 It's been just three weeks,

4:59:23 but we have seen lots of excitement around the new license.

4:59:25 So it's still early stages,

4:59:27 but we want to see what the community feels with this model.

4:59:30 So we're really looking forward.

4:59:31 MUHAMMAD FAROOQ: Yeah.

4:59:31 I think that's really great because I think not only in terms of capabilities,

4:59:36 but just having very— clarity on the license itself,

4:59:40 I think, as you said, it makes life easy for everyone.

4:59:44 I do want to touch upon the agentic capabilities, right?

4:59:49 Especially for smaller models.

4:59:50 Where do you see that going?

4:59:52 OMAR SANSEVIERO: Yeah.

4:59:52 So I think small agentic models can actually be extremely capable.

4:59:57 I would not expect this model to be able to pick

5:00:00 a whole code base and refactor the whole code base,

5:00:03 but if you want the model to pick between 5,

5:00:05 10 different tools and make different APIs

5:00:07 calls or tool uses directly on device,

5:00:10 a local model most likely will be good enough.

5:00:13 You may even still fine tune a bit model for your own use cases.

5:00:17 So there is the Android AIx gallery you

5:00:21 can download it directly in your Android phone.

5:00:23 You can play with the model directly locally,

5:00:25 and it even has an option to play with different skills directly locally,

5:00:30 so you can play with a fully on-device experience using Gemma

5:00:34 as an agent and it can do things such as control the device,

5:00:37 turn on the flashlight, do different draft and email and that kind of stuff,

5:00:42 all using tools, all locally.

5:00:44 Of course, if you need to do a web search for a tool,

5:00:47 then you may still need to do an API call.

5:00:49 But I do think we're in a world in which

5:00:51 on-device models are extremely capable for these kind of things,

5:00:54 in which you would need to rewrite the text,

5:00:56 or you need to summarize or translate

5:00:59 or do certain basic to somewhat complex agentic codes.

5:01:05 MUHAMMAD FAROOQ: All right.

5:01:06 That's great.

5:01:07 Thanks for the awesome release.

5:01:09 Congratulations, and thanks for the time.

5:01:12 That is a wrap.

5:01:13 OMAR SANSEVIERO: OK.

5:01:14 Thank you so much.

5:01:15 MUHAMMAD FAROOQ: Thank you.

5:01:28 [MUSIC PLAYING] SAM WITTEVEEN: Hello, and welcome to Google Cloud Next.

5:12:25 And I'm joined by Logan Kilpatrick.

5:12:28 Logan, great to have you here.

5:12:29 LOGAN KILPATRICK: Thank you.

5:12:29 I'm excited.

5:12:30 This is a super cool setup.

5:12:33 It's awesome to be literally live in the venue right now.

5:12:35 SAM WITTEVEEN: Exactly.

5:12:36 We're sitting right with hundreds of people around us and stuff like that.

5:12:42 Walking around, what have you seen today that's been interesting?

5:12:45 LOGAN KILPATRICK: Yeah.

5:12:46 It feels like the era of agents is upon us.

5:12:48 I think this morning's keynote you heard from TK and from many

5:12:51 other folks about just how much— on the Google Cloud side,

5:12:56 how much progress has been made from a platform

5:12:58 perspective to bring agents in all these new ways.

5:13:00 And so it feels like it's being embraced by the ecosystem.

5:13:03 It's being embraced by Google and our partners.

5:13:07 And it's interesting to see it actually work.

5:13:09 I think there was— SAM WITTEVEEN: That's what I was about to say.

5:13:11 LOGAN KILPATRICK: 12 months ago we were at Next,

5:13:13 and it was like people hypothesizing about what the future might look like.

5:13:16 SAM WITTEVEEN: There was hype, but no— LOGAN KILPATRICK:

5:13:18 It's delivering, which I think is really, really exciting.

5:13:20 And I think we're at of still inning

5:13:23 or chapter number one of that actually playing out

5:13:25 of, what does the world look like when agents

5:13:28 actually deliver on the mission that I think they can.

5:13:31 SAM WITTEVEEN: It is really fascinating, yeah, talking to people.

5:13:34 And one of the things I'm hearing, and I'm sure you're the same,

5:13:36 is that there are just lots of these use cases that I never would have

5:13:40 thought of and people are getting Gemini

5:13:42 to do all this amazing stuff because it's multimodal,

5:13:44 because it can— stringing with a simple harness,

5:13:47 putting these things together, maybe with a sandbox and stuff like that.

5:13:50 So you're one of the key people at AI Studio.

5:13:55 I think we agreed that we'll say it's your baby, right?

5:14:01 That's been a really interesting ride.

5:14:04 It's been around.

5:14:06 I think originally it was called Maker Suites.

5:14:08 And you've been adding a lot to it.

5:14:11 Do you want to tell us a little bit about of what

5:14:13 some of the big things you've been working on recently there?

5:14:15 LOGAN KILPATRICK: Yeah.

5:14:16 So AI Studio has been— I think we've sort of— I need to come up

5:14:20 with a better way to describe this, but we

5:14:22 have the different eras of the journey.

5:14:24 I think it was originally, it was like prompt to prototype,

5:14:27 get your API key, get off to the races, test the models a little bit.

5:14:32 I think we crossed the chasm probably 18 months ago

5:14:35 to really help people go to production with what they're building.

5:14:39 And I think the takeaway from that was,

5:14:40 we can help so many people do more than just get an API

5:14:46 key and kick around the models and then go off and build.

5:14:49 Why not actually help them build the thing that they want directly in AI Studio?

5:14:54 And I think the whole vibe coding wave that's happened.

5:14:57 So we added this build tab in AI Studio actually last year at I/O,

5:15:00 and we're coming up on a year now of pushing

5:15:03 in that direction to see what's the interesting thing.

5:15:06 SAM WITTEVEEN: So why don't you describe that?

5:15:08 I'm amazed that I still find people that they

5:15:10 don't really understand what the Build tab is,

5:15:13 and you've been adding amazing stuff to it later on.

5:15:15 So how about you start off telling us a little bit about it,

5:15:18 and then we can talk about what's new and stuff?

5:15:20 LOGAN KILPATRICK: For sure.

5:15:20 So it's a full on vibe coding experience.

5:15:22 So you can go literally go from prompt to a working app to adding

5:15:25 a database to actually deploying it using Cloud Run behind the scenes in, like,

5:15:32 minutes, which is really cool.

5:15:33 And a lot of it is actually free as well, which is really exciting.

5:15:36 So we have a huge— millions of people

5:15:38 building apps in AI Studio using all these services,

5:15:42 and they're not spending a ton of money in order to make it happen,

5:15:44 which is something that we think a lot about is trying to make sure

5:15:47 that the technology is accessible in the hands

5:15:48 of this next generation of builders, which is really exciting.

5:15:53 And to your point, the team is shipping lots of stuff.

5:15:55 So we just landed a bunch of really cool things like design previews.

5:16:00 So as you ask for your app— and you can

5:16:02 kick it off right now if folks go to ai.studio/build.

5:16:06 Ask for your app and you can see a bunch of different

5:16:09 iterations of the UI and click through those while it's building,

5:16:11 and then choose which direction you want to go in.

5:16:14 Another one.

5:16:15 This is the classic reimagination of the I'm

5:16:18 Feeling Lucky button in Google Search.

5:16:20 So we're trying to solve this inspiration problem,

5:16:22 which is, people want to build.

5:16:24 They're like, I'm so excited.

5:16:25 The tech are here.

5:16:26 The technology can do all these cool things.

5:16:28 I don't know where to get started.

5:16:30 You can come to AI Studio,

5:16:31 you can go to the Build tab, you can click one button,

5:16:34 and we'll come up with your first app idea for you,

5:16:36 have it connected to the Google ecosystem,

5:16:38 and then you can go and build that thing.

5:16:40 SAM WITTEVEEN: So what I love is that you can also say,

5:16:43 I'm going for the I'm Lucky,

5:16:45 but I want it to have Nano Banana— I want an image model,

5:16:48 I want a database, I want some auth.

5:16:50 LOGAN KILPATRICK: Exactly.

5:16:50 SAM WITTEVEEN: And it will just take all of those things and run with it.

5:16:53 LOGAN KILPATRICK: And we've gone one step further too,

5:16:54 which is if you start typing out your idea— we just landed this.

5:16:58 It's called this feature called Tap Tap Tap.

5:17:01 And as you start to type out your prompt, we'll use of traditional autocomplete,

5:17:06 except it's using Flash behind the scenes,

5:17:08 to come up with the extension of your idea.

5:17:10 So you say, I want an app that uses AI to help me organize,

5:17:15 and then AI will complete that thing.

5:17:18 You can press Tab and then it'll go the next step,

5:17:20 and then it'll go the next step.

5:17:22 And that's where the Tap Tap Tap comes in, because

5:17:24 you just keep— the model will help create your prompt generatively,

5:17:27 which is really exciting,

5:17:28 because I think people have a hard time actually articulating

5:17:31 the breadth of what they can actually do as you're building apps.

5:17:35 And so I think this helps meet people where they are as far

5:17:38 as trying to get their idea into a text box and bring it to life.

5:17:43 SAM WITTEVEEN: It's changing so quickly.

5:17:44 One of the things I find is that stuff that just wouldn't work last year,

5:17:48 perhaps with the previous generation models or something like

5:17:51 that, people went and tried it last year and they're like,

5:17:53 oh, well, that doesn't work.

5:17:54 I'm never going to use it.

5:17:55 It's kind of like, hey, did you use it in the last two weeks?

5:17:58 If not, go and check it out.

5:18:00 That's what I find myself telling people.

5:18:03 And also just what seemed to take— you needed a 20 page PID before.

5:18:09 The model now is much more better at intuiting what it is that you want.

5:18:13 The fact that you can add in that, oh yeah, I want Firestore,

5:18:16 I want a database, or I want some auth and stuff like that.

5:18:19 Is that what you guys are also seeing?

5:18:21 LOGAN KILPATRICK: For sure.

5:18:22 And yeah, so the model in AI Studio

5:18:24 has all this context about how to use Gemini,

5:18:27 how to use Firebase, how to use Cloud Run to deploy things.

5:18:31 And so as you ask for your idea,

5:18:32 it has that context in the background and it knows all

5:18:35 the things that it's capable of and can intelligently put them into place.

5:18:40 I like to say we've given a more opinionated

5:18:44 take on what it looks like to build an app.

5:18:46 And I think there's trade-offs, but it's helpful because it gets people

5:18:51 to something that actually works and uses the technology

5:18:53 faster than they would otherwise be able

5:18:55 to if you had to literally start from scratch, no context, et cetera.

5:18:58 So I think we continue to— AI Studio is a story of we're

5:19:03 taking an opinionated look at what it means to build a vibe coding platform,

5:19:07 and we'll keep pushing in that direction.

5:19:09 SAM WITTEVEEN: That seems to be really smart, though.

5:19:12 I find one of the things that— I guess

5:19:14 a few months ago when it didn't support Next.js.

5:19:16 So now just adding Next.js where— opinionated, I can say, look,

5:19:20 I want you to build something, but I want it in Next.js.

5:19:23 I want it to have a database.

5:19:24 I want some auth.

5:19:26 OK.

5:19:27 I kind of know now that I'm getting best practices on those things,

5:19:30 and I can go wild with it.

5:19:33 Or more importantly, not so much me,

5:19:35 but I find other people that can't even code can do this.

5:19:40 This is a whole thing that we can touch on as well

5:19:43 is that— but I find developers that have never done front end.

5:19:47 Go in there, boom.

5:19:48 They're now amazed that they can build a pretty nice looking front end app.

5:19:54 I think you're also integrating, like you said, the new design stuff.

5:19:57 What's— LOGAN KILPATRICK: Lots of design stuff in the works.

5:20:00 I think we have a whole slew of different things.

5:20:03 The first one is design preview.

5:20:05 So as you're building your app, you can actually iterate,

5:20:08 and it'll show you a bunch of different options.

5:20:10 Actually, once you are— and that's while

5:20:12 the app is doing the initial generation.

5:20:14 We'll be rolling out soon the ability

5:20:15 that, once you actually already have an app generated,

5:20:18 give you multiple different options as far as other themes of what

5:20:22 could this app look if you change a bunch of stuff.

5:20:26 We have a bunch of targeted edits rolling out.

5:20:31 We have this edit mode where you can draw on the preview of your app,

5:20:35 or you can directly select elements and then

5:20:38 regenerate images or change different filters around things.

5:20:41 So that will actually hopefully roll out,

5:20:43 I think this week or early next week, something like that.

5:20:45 SAM WITTEVEEN: It seems like every week you've got something new.

5:20:47 LOGAN KILPATRICK: I mean, honestly, I have a hard time— credit to our team

5:20:50 for pushing so hard and making all this stuff happen.

5:20:52 I genuinely have a hard time keeping up with all

5:20:54 the things that are happening because we're doing so much stuff,

5:20:57 which is a good place to be in.

5:20:59 SAM WITTEVEEN: So that's good to hear for us outside.

5:21:02 If you're having a hard time keeping up, that's good.

5:21:05 Another thing, actually, that you didn't even mention was you've added voice,

5:21:09 so the ability to actually talk to it and I guess

5:21:13 prompt it and get stuff going just purely with voice now.

5:21:17 LOGAN KILPATRICK: Yeah.

5:21:18 I mean, the voice models, which we should also talk about,

5:21:20 the text and speech models, the live models are incredible.

5:21:23 And I think Gemini's really pushed the state of the art as far

5:21:27 as audio capabilities and we have a new sort of a Lyria music playground.

5:21:31 We have a new text to speech playground.

5:21:33 We have all the live models rolling out across the ecosystem.

5:21:36 But we also, in vibe coding experience,

5:21:39 we have this feature which I've come up with all these goofy names for things.

5:21:43 It's called Yap to App.

5:21:45 Yap to App.

5:21:46 So you can go in and you can just say

5:21:48 whatever random garble of words and ideas that you have.

5:21:53 And then the thing that we've done is we've put

5:21:55 Gemini— it's not just a pure text to speech model.

5:21:58 We've put Gemini in there to actually formulate

5:22:01 the idea that you are just putting together in a coherent way so the model can

5:22:05 take action and actually bring that idea to life,

5:22:07 fill in any blanks that you forgot to mention

5:22:10 or shape it in a way that's going to work.

5:22:13 And so we've seen a ton of people using this, which is really exciting.

5:22:16 I think beyond the I'm Feeling Lucky button,

5:22:18 this Yap to App experience is so popular.

5:22:21 It's like the number two most popular thing, which is awesome.

5:22:24 SAM WITTEVEEN: So what about mobile?

5:22:25 One of the things that a lot of people are asking about is,

5:22:28 OK, this is great, but I don't really need to make a website.

5:22:31 I want to make an app.

5:22:31 I want to put it up on the iOS App Store or the Android Google Play Store.

5:22:35 LOGAN KILPATRICK: iOS App Store is contentious— SAM WITTEVEEN: I'm sure, yes.

5:22:39 LOGAN KILPATRICK: There's lots of conversations going on around there.

5:22:41 We definitely have a lot of exciting stuff coming in the mobile space.

5:22:45 The team's been pushing super hard to bring AI Studio to life on mobile,

5:22:49 so we'll have more to share there soon.

5:22:52 And it'll definitely be the first cut.

5:22:54 I think the future for mobile for us is we want to let people build anywhere,

5:22:59 wherever they are, for whatever platform they want.

5:23:03 And so mobile is obviously super important.

5:23:04 Also, as you think about reaching this next 100 million user,

5:23:09 next gen developer audience, a lot of those folks are on mobile.

5:23:13 They're not on a web, laptop, desktop, whatever it is.

5:23:16 So we need to go there as far as reaching those users.

5:23:20 Yeah.

5:23:20 Lots of cool stuff on the Android side as well, collaborating with those teams.

5:23:23 Lots of interesting things on potential on-device models to enable,

5:23:27 which is also really exciting.

5:23:28 SAM WITTEVEEN: Gemma.

5:23:29 LOGAN KILPATRICK: The Gemma 4 models are incredible.

5:23:31 I saw a demo from our team of AI Studio mobile running local models,

5:23:35 which is really exciting.

5:23:37 So don't know if it'll make the cut for the initial release,

5:23:40 but lots of cool things that we're poking around, which is exciting.

5:23:44 SAM WITTEVEEN: So I'm curious to chunk up a bit and ask you,

5:23:46 OK, so we've got vibe coding.

5:23:49 And we know that more traditional or older

5:23:52 developers were against that at the start.

5:23:55 There was a lot— and now we're hearing the term agentic engineering,

5:23:59 which sounds a nice rebranding of vibe coding.

5:24:03 And I can get it, why people are against it last year,

5:24:06 because you would get to a certain point and then

5:24:09 you would reach bugs and you couldn't fix it or whatever.

5:24:12 It does seem that's changed, though,

5:24:14 especially in the last few months, since Gemini 3 came out.

5:24:17 What's your take on that?

5:24:18 LOGAN KILPATRICK: Yeah, you're right.

5:24:20 I think folks, for the right reasons— and actually,

5:24:24 we feel this inside Google as well.

5:24:25 There's a high bar for landing code, especially in production systems.

5:24:30 And even for AI Studio specifically,

5:24:32 what we've actually ended up doing is there's a lot of folks, me, Amar,

5:24:36 others on the team who are vibe coding things

5:24:39 in the actual AI Studio production agentic engineering, not vibe coding.

5:24:44 And we're making changes to the product.

5:24:46 And what we've had to do is have a deeper partnership with our engineering team.

5:24:50 So we actually have somebody now whose job is our product or our members

5:24:56 of the technical staff team goes and makes changes to AI Studio,

5:25:01 gets it so that all the CI is passing.

5:25:03 It's all green.

5:25:04 We run the tests, things look good.

5:25:06 And then we actually hand off to the engineering team.

5:25:08 They take a bunch of these changes, they get them over the line,

5:25:11 and they become the owner and the steward of getting that code

5:25:15 over the line and directly into the actual AI Studio code base.

5:25:18 And I think this partnership model between folks who are trying to vibe code

5:25:23 and the actual senior engineers who want to make

5:25:27 sure that this thing is reliable and scalable,

5:25:29 et cetera, I think works really well.

5:25:31 And then the best part is like, that person who's on the hook to get

5:25:35 a bunch of these agentically engineered changes

5:25:37 into the code base is also responsible for, how

5:25:40 do we make sure that cycle is better?

5:25:44 So, what are the skills that we need to put in place?

5:25:46 What is the infrastructure we need to build

5:25:48 so that we have better test coverage, we have all those things?

5:25:50 So I think we should probably do a blog post at some point because I feel

5:25:56 like there's lots of conversation about this, and I

5:25:58 feel like our team's found a really nice,

5:26:00 sweet spot of letting in new contributors

5:26:02 to a production cut base while also, it's Google.

5:26:05 We're holding the bar for quality really high

5:26:08 with millions of paying customers using this platform.

5:26:12 SAM WITTEVEEN: It also sounds like the lessons that you

5:26:14 learned from that are going to just improve for everyone,

5:26:17 not just for you guys doing it yourself.

5:26:19 As you learn that, oh,

5:26:20 the models maybe guide not well here or that kind of thing,

5:26:23 it allows you to course correct and make

5:26:25 things even better for higher level quality code.

5:26:28 LOGAN KILPATRICK: 100%.

5:26:29 SAM WITTEVEEN: That's really fascinating to hear.

5:26:31 One of the things— so we were both at an event, private event yesterday.

5:26:35 One of the things you said there really struck a note with me.

5:26:38 You talked about— and I realized when you said this that I felt the same thing.

5:26:43 And I thought this is a really nice— you talk about having more ambition.

5:26:49 I really like the way— I've forgotten exactly how you languaged it.

5:26:52 Maybe you can remember.

5:26:53 But you were saying that, just trying these things out,

5:26:56 you're constantly having to push yourself to have more ambition

5:26:58 to do stuff that you wouldn't have thought was possible.

5:27:02 And then time and time again, at least for me,

5:27:04 time and time again, I'm blown away that, wow, it works.

5:27:08 It can do it.

5:27:10 I kind of thought, OK,

5:27:12 because the last model— because at some point I tried something.

5:27:15 It didn't work.

5:27:16 And this whole idea of ambition.

5:27:17 You want to talk to that.

5:27:18 LOGAN KILPATRICK: Yeah, no.

5:27:20 I think about this all the time, which is the models— to your point from before.

5:27:25 If you haven't tried the thing in the last six months— SAM WITTEVEEN:

5:27:28 Even in the last two weeks.

5:27:29 LOGAN KILPATRICK: Even the last two weeks.

5:27:30 Yeah.

5:27:31 It's like, you historically had to be,

5:27:33 very precise about what you wanted AI to do for you.

5:27:35 And I think the models have crossed

5:27:37 the chasm where instead of asking for one thing,

5:27:39 you can now ask for 30 things and the model can actually do that.

5:27:42 You don't need to— we're artificially hampering what the technology is

5:27:47 capable of because we're trying to not let it fumble over itself,

5:27:50 which it used to do in certain cases.

5:27:52 And so I think about this all the time.

5:27:54 And I actually think the other threat of that story is like,

5:27:56 I now feel this weight on my shoulders because

5:27:59 it's not that the model— I can't be like,

5:28:02 oh, well, the model can't do it so it's fine.

5:28:04 I'm just not going to— now the weight is on me.

5:28:07 The onus is on me to be like, I really could build this.

5:28:11 And so now there's an interesting—

5:28:14 at least for actually for internal work stuff,

5:28:17 I'm like, oh shit, there's some bug in AI Studio.

5:28:20 I should be the one to go fix it.

5:28:21 I can't be like, oh sorry.

5:28:23 I'll just bank on the engineering team going and solving this problem.

5:28:26 It feels like the responsibility falls on me.

5:28:28 The same thing is true for external projects where I'm now like,

5:28:32 the bar is high in my mind for what I will actually go build as a side project.

5:28:39 And it just means I almost need more time than before.

5:28:41 It's an interesting dichotomy, which is like,

5:28:44 you'd expect, because the models are so capable,

5:28:46 now I'm just doing something in an hour over the weekend.

5:28:50 Instead, I'm like, my idea is 20 times as ambitious.

5:28:53 I'm like, OK, I'm going to— SAM WITTEVEEN: I need to take a week off.

5:28:55 LOGAN KILPATRICK: Yeah.

5:28:55 I got 20 ideas.

5:28:56 I'm going to take a week off of work in order

5:28:58 to pull this thing off because I'm able to be so ambitious.

5:29:00 And I actually know— it's another thing that's changed is like,

5:29:04 I know in my heart of hearts that I can actually pull it off.

5:29:09 I think before, I always had this question of, am

5:29:12 I going to go down and try to build this thing and then realize 30% of the way

5:29:17 in that I don't have the technical capability to pull this off?

5:29:20 I don't feel that way anymore.

5:29:22 And so it's like I'm kind of— yeah,

5:29:24 it's an interesting experience to go through this and turn

5:29:27 that chapter and I think we'll all have

5:29:30 more of these types of experiences over the next

5:29:32 12 to 18 months as the models become more capable.

5:29:35 SAM WITTEVEEN: I definitely find that myself, too,

5:29:37 that just things like multiplayer games or multiplayer

5:29:43 apps and stuff like that, you think, OK, that's a pretty hard thing to code.

5:29:46 I'm going to have to read up stuff.

5:29:48 LOGAN KILPATRICK: That's one prompt in AI Studio.

5:29:50 One prompt, literally.

5:29:52 SAM WITTEVEEN: One of the other things I wanted to ask— and this is going back,

5:29:54 I guess to something you talked about.

5:29:55 The whole next billion users.

5:29:58 So I have some people on my team who are not coders.

5:30:05 But I make sure that— build is awesome for that.

5:30:08 I basically say, here, open this.

5:30:10 Build something.

5:30:11 Try something out.

5:30:12 I find that they're making software that I would have never thought to make.

5:30:17 This is one of the things that's fundamental, at least to me.

5:30:22 And then I find myself sort of constantly going, how did you make that?

5:30:26 LOGAN KILPATRICK: Yeah.

5:30:26 SAM WITTEVEEN: And then I find that, OK, actually, it wasn't that complicated.

5:30:30 They had a conversation that described what they want.

5:30:33 Are you seeing things like that?

5:30:35 LOGAN KILPATRICK: 100%.

5:30:36 I think there's better quotes than I'll be able to remember,

5:30:40 but there's so many things about intelligence is so distributed

5:30:45 across the globe and great ideas are so distributed across.

5:30:50 The thing that hasn't been distributed is opportunity.

5:30:52 And I think what is exciting to me is,

5:30:56 I fundamentally believe the reason that it's important

5:30:58 to build these types of products is because we're putting

5:31:01 the means of opportunity in the hands of people

5:31:03 who wouldn't otherwise have been able to build this thing.

5:31:05 SAM WITTEVEEN: Totally agree.

5:31:06 LOGAN KILPATRICK: It's going to be like a— the creation of software,

5:31:09 at least today, is fundamentally the most

5:31:12 economically empowering thing that you could possibly do.

5:31:15 That's why people learn how to write code historically, all that stuff.

5:31:18 And now putting that in the hands of this cohort

5:31:21 that couldn't before is going to be— we're

5:31:23 already seeing the impact of this, millions of these people

5:31:26 using AI Studio to do this in other products.

5:31:29 And I think we're chapter 1 of that story, which is really exciting.

5:31:33 And AI Studio, I think, has this responsibility to— now,

5:31:37 as the ability to create software actually gets solved,

5:31:41 I'm fully convinced in the next 12 months

5:31:44 you'll be able to build whatever software you want.

5:31:47 What are the next set of challenges to actually

5:31:49 help you build that thing that you want to build?

5:31:52 Actually, it won't be software.

5:31:53 It'll be 15 other things.

5:31:55 And so we're already starting to think about,

5:31:57 what are the next 15 other things that we need to make

5:31:59 sure that you can do and do them in AI Studio.

5:32:01 Use the Google ecosystem and make that all super seamless.

5:32:04 SAM WITTEVEEN: How do you design— for me, AI Studio is ai.dev.

5:32:08 So if people don't know the shortcut,

5:32:10 that's what I always— LOGAN KILPATRICK: Ai.studio as well.

5:32:12 SAM WITTEVEEN: Right.

5:32:12 Ai.studio too.

5:32:13 Yes.

5:32:14 But the funny thing is,

5:32:16 I'm reminded by the ai.dev is that, what is the dev nowadays?

5:32:19 LOGAN KILPATRICK: Yeah.

5:32:20 SAM WITTEVEEN: Right?

5:32:21 LOGAN KILPATRICK: That's the tension.

5:32:23 And actually, I think we feel this tension a lot in AI Studio as a product,

5:32:27 because AI Studio is the front door to the APIs as well.

5:32:30 So if you want to build a Gemini,

5:32:32 you actually don't have to vibe code in AI Studio.

5:32:34 You can just take the API, go build off-platform if you want,

5:32:37 use whatever coding models use, use whatever tech stack you want.

5:32:41 And we have this dual identity now

5:32:44 as a builder product and as a developer— like, a deeply developer product.

5:32:48 There's truly millions of actual businesses built

5:32:51 on top of the APIs that we have.

5:32:53 And so we're always walking this fine line.

5:32:56 I think it's definitely a transition.

5:32:59 And so we're trying to walk the line as much as possible during this transition,

5:33:04 but it's tough because we want to build a powerful tool.

5:33:09 To your point before,

5:33:10 developers who don't know how to do front end stuff or don't want to do it,

5:33:13 you can come do that in AI Studio and we want to meet them where they are.

5:33:15 We also want to enable this generation of people who don't have any code,

5:33:19 don't even ever want to look at a single line of code.

5:33:22 It's really tough.

5:33:23 It's hard work to try to find the balance.

5:33:26 SAM WITTEVEEN: I can see that's definitely, definitely a challenge.

5:33:29 So let's talk about some models.

5:33:31 My background is in deep learning.

5:33:34 Big believer in LLMs.

5:33:38 It's now about four months since Gemini 3 came out.

5:33:41 LOGAN KILPATRICK: Yeah.

5:33:42 December 2025.

5:33:44 SAM WITTEVEEN: We've seen 3.1 come along.

5:33:49 Definitely the powerhouse of those seems

5:33:51 to be getting better and better and better.

5:33:52 The other thing that's fascinating too, though,

5:33:55 is that— and I think this goes back even to the original Gemini.

5:34:00 People didn't really understand what multimodal meant,

5:34:02 and they didn't understand the consequences of that.

5:34:05 I feel now with the new live voice mode, all these other things— do you want

5:34:10 to talk a little bit about perhaps live voice?

5:34:12 What's the state of that now?

5:34:13 LOGAN KILPATRICK: Yeah.

5:34:14 So we rolled out— I think it was last year actually at I/O,

5:34:18 our first live model.

5:34:19 And it gives the ability to stream audio and video and text

5:34:24 directly to the model and get responses back in real time.

5:34:27 And I think this enables a bunch of these real time omnipresent use cases.

5:34:32 It's actually really cool.

5:34:34 I think it's like ai.studio/live and you can try out that experience.

5:34:39 But I think the use case that I think

5:34:41 we were seeing a ton of traction with initially,

5:34:43 which is interesting to think about, is literally screen sharing with an agent.

5:34:49 It sees everything that you see, and then you can just ask it questions.

5:34:52 And we've seen customers, like,

5:34:54 I have a really complicated product experience and I'm

5:34:57 a non-technical user trying to use an e-commerce platform.

5:35:01 How do I set up a custom domain name?

5:35:03 And I don't know how to navigate through the product and find it.

5:35:06 And literally the live model can walk you through and be like, click.

5:35:08 And you can do cool things like add overlays and stuff like that to be like,

5:35:12 click here, and then go to this section and do that.

5:35:15 And you can imagine,

5:35:17 I think that omnipresent tutor or helper is really exciting.

5:35:22 I actually think we're going to do some interesting

5:35:24 stuff around vibe coding too to help you— your one

5:35:27 click thought partner to see everything that you see

5:35:30 and help you debug problems is something that we're thinking about.

5:35:33 But the model is so capable.

5:35:35 SAM WITTEVEEN: It definitely seems amazing,

5:35:37 just the personalization of where it can meet you, where you're at.

5:35:40 So it's not like I'm watching a YouTube video where

5:35:42 I have to get through the first 30 minutes to go, yeah, I know all that stuff.

5:35:45 And now, oh, OK, this is the three minutes.

5:35:47 Like you said, you can just power it up, say, hey,

5:35:49 I don't know which button does this, and it will tell you.

5:35:52 I had a really interesting experience with that last year just after I/O

5:35:56 when it first came out that where I was staying in San Francisco,

5:36:00 the refrigerator broke down because of a filter.

5:36:04 I'm like, I have no idea about this.

5:36:06 So I opened this thing up and it knew exactly what refrigerator it was.

5:36:12 It knew what the warning button was.

5:36:15 It told me I had to change the filter, told me where the filter was.

5:36:19 And sure enough, within 10 minutes, I'd ordered one from Amazon.

5:36:23 The next day I had it changed and it was done.

5:36:25 And that would have— that was just amazing for me at the time.

5:36:28 LOGAN KILPATRICK: I'll share my experience.

5:36:30 Actually, JD, who's walking past right now, who's on our team at Google.

5:36:34 We were doing a bunch of Astra demos, which was the precursor to Gemini Live.

5:36:41 And this was, I think,

5:36:43 at I/O 2024 maybe when we first started showcasing Astra in early forms.

5:36:48 And I don't know if folks know those very fancy coffee

5:36:51 machines where there's handles and you do all the things separately.

5:36:56 I'm a very simple coffee drinker,

5:36:58 and every time I see those machines, I'm like, OK,

5:37:00 I'll go find a bottle of coffee

5:37:02 or something because I don't know how this works.

5:37:04 I opened up Astra, the precursor to this, to Gemini Live,

5:37:08 and was asking, what do I actually do?

5:37:10 And I'm showing the machine.

5:37:11 It's like, OK, put your hand on this handle, twist it, pull it down, go here.

5:37:16 And like, literally, zero shot, just worked.

5:37:19 And I was like, OK, I actually know how to do this thing now.

5:37:21 SAM WITTEVEEN: It really is insane.

5:37:22 LOGAN KILPATRICK: It's very cool.

5:37:23 SAM WITTEVEEN: I find so many people don't know it too.

5:37:26 When you pull it out and just suddenly show them, they're like, whoa.

5:37:29 LOGAN KILPATRICK: It's magic.

5:37:29 SAM WITTEVEEN: It is magic.

5:37:30 LOGAN KILPATRICK: It's interesting you say that.

5:37:31 I think we should talk about our text to speech model as well,

5:37:34 which I think is more widely adopted because

5:37:37 I think developers really understand that use case.

5:37:39 This generate audio synthetically, I think makes sense.

5:37:43 There's a ton of use cases.

5:37:44 I think the challenge for live has been— it's a new paradigm,

5:37:49 and so there isn't an existing— there

5:37:51 wasn't something else you were using before.

5:37:53 What you were using before was like a combination of 50 different things.

5:37:57 And so I think from a product market fit perspective,

5:38:00 it's been interesting to see who is adopting this thing.

5:38:03 We've had to do a lot of education to help people understand those use cases.

5:38:09 But I think we're finally— it's starting to make its way into everyday products,

5:38:14 and I think people will understand.

5:38:16 SAM WITTEVEEN: I find that people, developers especially, yeah,

5:38:19 don't have no clue that it can do function calling,

5:38:21 that it can use— LOGAN KILPATRICK: Search.

5:38:23 Yeah.

5:38:24 Maps, everything.

5:38:24 SAM WITTEVEEN: All those things that suddenly, even though it's voice,

5:38:28 it's still able to then do— the model's voice

5:38:31 is still able to do function calling, return things back.

5:38:33 LOGAN KILPATRICK: Yeah.

5:38:33 That's the magic is because can really use it as a home assistant product.

5:38:37 I think that's how a lot of folks do it.

5:38:39 My favorite thing is like an omnipresent help button.

5:38:43 You're somebody who's just like, you're stuck somewhere on the internet.

5:38:46 You click that button all of a sudden, there was a person over your shoulder

5:38:50 with your consent that it can see what you see,

5:38:52 you can talk to it, and help guide you through whatever you're trying to do.

5:38:56 I think that that's a magical experience that I don't think is

5:38:59 widely diffused into the ecosystem and there's lots of alpha in building this.

5:39:03 SAM WITTEVEEN: For sure.

5:39:05 Another magic model.

5:39:06 Nano Banana.

5:39:07 LOGAN KILPATRICK: Yeah.

5:39:07 SAM WITTEVEEN: Huge moment.

5:39:09 It's only gotten better since it's first sort of come along.

5:39:13 Where do you see models like that going?

5:39:17 LOGAN KILPATRICK: Yeah, I think the gen media portfolio— SAM WITTEVEEN: Yeah,

5:39:20 and then actually we can talk about Veo and stuff as well.

5:39:22 LOGAN KILPATRICK: Veo, Lyria, everything else has just been incredible.

5:39:24 I think it's been a shining light of the Gemini story.

5:39:27 It's actually, back to what you mentioned before,

5:39:29 I think it's tied to the multimodal story,

5:39:31 because when you have really great multimodal understanding,

5:39:33 it lets you actually build a really great multimodal generation model.

5:39:38 I think the direction of travel is consolidation

5:39:40 in the offerings as far as gen media goes.

5:39:43 So I think we've done a ton of these bespoke gen media models.

5:39:47 I think hopefully we'll see a lot of that come

5:39:49 into the mainline version of Gemini in the future, which is really exciting.

5:39:53 And I think, yeah, we've got a lot of different— we have the live model,

5:39:57 we have the TTS model, we have Nano Banana, we have Veo, we have Lyria.

5:40:00 So there's a lot of different models, and I think it introduces some complexity.

5:40:04 SAM WITTEVEEN: Even though there's a lot of different models,

5:40:06 from what I understand— and correct me if I'm

5:40:08 wrong— it's still fundamentally the Gemini— LOGAN KILPATRICK: It is.

5:40:12 SAM WITTEVEEN: --key models, and its knowledge is what makes things like being

5:40:15 able to reason over images in Nano Banana so cool.

5:40:18 LOGAN KILPATRICK: Exactly.

5:40:19 Yeah.

5:40:20 It is.

5:40:20 I think there's a slightly different training

5:40:22 mixture and a slightly different— SAM WITTEVEEN:

5:40:24 And a different hand or something when it comes— LOGAN KILPATRICK:

5:40:25 Yeah, slightly different architecture.

5:40:26 But all the base— which is actually great.

5:40:28 It's what makes it possible for us to take a number of research bets

5:40:31 like this is conceptually what we're doing

5:40:34 for these models is actually really, really similar.

5:40:37 There are changes, and then the tension becomes, like,

5:40:39 what happens when you fuse these things together into a single model?

5:40:43 There's some gains in certain places.

5:40:44 There's definitely some losses in others.

5:40:46 And so there's been a huge amount of research as far

5:40:48 as, how do you actually bring it all together, which is exciting.

5:40:50 SAM WITTEVEEN: It's super interesting.

5:40:52 Any other new models that you want to announce for us on this live stream?

5:40:56 Just casually drop it.

5:40:58 LOGAN KILPATRICK: Yeah, no.

5:40:59 There's lots of great things.

5:41:02 I see lots of— I see all the tweets.

5:41:04 Lots of people looking for new models, which is exciting.

5:41:07 I think we're pushing the rock up the hill on coding.

5:41:09 I think there's lots of— huge amount of investment

5:41:12 happening from us to make a great coding model.

5:41:14 I think we've got tons of products that will benefit from us.

5:41:17 Our customers want this, so lots of stuff coming.

5:41:19 Hopefully we'll have more to share on coding stuff soon.

5:41:23 SAM WITTEVEEN: Coming soon.

5:41:24 LOGAN KILPATRICK: Yeah.

5:41:25 Hopefully.

5:41:25 Fingers crossed.

5:41:26 SAM WITTEVEEN: So speaking of coding, Antigravity also came out with 3.0,

5:41:31 has really had a lot of good feedback around it,

5:41:34 and they're just getting— I'm amazed that that team has just sort

5:41:36 of getting started when I've seen some of the things that they're doing.

5:41:40 Do you want to talk a little bit about

5:41:41 that and how that's influenced maybe AI Studio and vice versa?

5:41:46 You're in this interesting sort of situation where you've

5:41:50 got these amazing teams working on very unique things,

5:41:54 but the lessons that they learned go across multiple areas.

5:41:57 LOGAN KILPATRICK: And I think this is— actually to your point,

5:42:00 this is the beauty for Google is I

5:42:02 think the Antigravity team is doing an incredible job,

5:42:04 not only in that product side,

5:42:06 but actually really close to research and helping push

5:42:08 the rock up the hill from a coding model perspective,

5:42:10 from an infrastructure perspective, and the benefits and the things that they're

5:42:14 learning are diffusing across many Google products.

5:42:16 So AI Studio is a great example of this.

5:42:18 The coding engine harness is actually powered by the same Antigravity harness.

5:42:24 So behind the scenes,

5:42:25 we're using literally the same binary as what's being used.

5:42:29 So I think you'll see more of this cross Google diffusion of the technology.

5:42:34 And Google and AI Studio and DeepMind are making a huge bet on the Antigravity

5:42:38 team and the work that they're doing in order to bring this to more developers,

5:42:41 which is super exciting.

5:42:43 So I think to your point, it's like it feels like chapter 1 of that story still,

5:42:46 and it's going to be fun to see all the new things that they end up landing.

5:42:52 SAM WITTEVEEN: Rolling out.

5:42:54 Just speaking quickly,

5:42:55 one of the things that the community is frustrated, I think,

5:43:00 with Antigravity is just like the fact that the quota,

5:43:04 the lack of— LOGAN KILPATRICK: I see the tweets.

5:43:07 SAM WITTEVEEN: I'm sure you do.

5:43:08 So we saw new TPUs announced last night and this morning.

5:43:12 Pretty amazing architecture.

5:43:15 About 3x more for the inference and stuff like that.

5:43:19 Is that going to help to be able

5:43:21 to serve more of Gemini models for Antigravity and stuff?

5:43:23 LOGAN KILPATRICK: Yeah.

5:43:24 There's definitely— I mean, first of all,

5:43:27 I acknowledge the tension that folks have— SAM WITTEVEEN:

5:43:31 This is the problem with being so popular.

5:43:32 LOGAN KILPATRICK: It is.

5:43:33 I mean, it actually is.

5:43:34 It's a death by success story,

5:43:35 which I think we have way more demand across— actually,

5:43:39 not even just Antigravity,

5:43:41 but across because every Google product surface that's landing AI stuff.

5:43:45 There's just way more demand than there is supply in order to do this.

5:43:48 So trying to be really intentional, trying to be transparent with people,

5:43:51 trying to make sure that we get the models in the hands of our paying customers.

5:43:55 There's a bunch of trade-offs.

5:43:57 It's not going to be perfect in every instance,

5:43:59 but we really are trying to make sure that it lands.

5:44:02 The reason we do all of this is to make

5:44:04 sure that the models end up in the hands of people, and that it benefits them.

5:44:08 SAM WITTEVEEN: I get frustrated reading some of those tweets

5:44:10 where I think you don't think if they could,

5:44:12 that they would actually— LOGAN KILPATRICK: We're trying.

5:44:14 Yeah.

5:44:15 It's tough.

5:44:15 And I think there's a lot of prioritization trade-off questions.

5:44:19 And I think that's where a lot of the tension is.

5:44:21 And I think there's actually a macro— stepping back, separate from our products,

5:44:26 but just as an ecosystem, I think over the next two to three years,

5:44:28 we're going to go through this challenge, which is like,

5:44:31 there's going to be so much demand for this technology, and there already is.

5:44:35 There's so much pent up demand.

5:44:37 How is everyone going to go through this trade-off exercise

5:44:40 of, you probably will end up with a fixed amount of tokens,

5:44:44 and where do you deploy the tokens in your own life, in your business?

5:44:48 All of those, that's very much the reality of what's going to happen.

5:44:52 And so finding those high value use cases,

5:44:54 finding those examples of where you can actually move

5:44:56 the needle and get a huge amount of leverage,

5:44:59 I think is going to be the next era of this.

5:45:01 Instead of just throwing AI at everything,

5:45:03 I think you're going to have to be more intentional about doing it

5:45:06 in the highest value cases because there's so much demand for the models.

5:45:11 SAM WITTEVEEN: Interesting.

5:45:13 LOGAN KILPATRICK: Even with Google spending tons of money to build TPUs.

5:45:16 Like, we can— SAM WITTEVEEN: Announced 100 and something million.

5:45:19 LOGAN KILPATRICK: All the money is being spent

5:45:21 on TPUs and there's still a huge amount of demand.

5:45:24 SAM WITTEVEEN: Wow.

5:45:25 You mentioned coding does seem to be where it's at the moment, right?

5:45:32 And I'm curious to know, OK, what's the second thing going to be?

5:45:36 So coding, I get, because all the labs benefit from being—

5:45:40 getting better coding models means everyone can improve the models,

5:45:44 can improve what they're building, that kind of thing.

5:45:46 It does seem really interesting to see what other

5:45:48 areas of society are going to have these big impacts.

5:45:52 And I think people have proposed ideas.

5:45:55 What do you see?

5:45:57 What do you think about it?

5:45:58 LOGAN KILPATRICK: Yeah.

5:45:59 I think robotics is at this crossroads that I think

5:46:02 a lot of the coding models were probably at 18 months ago.

5:46:07 So I think we're probably like 18 months,

5:46:09 maybe 12 months from a bunch of these significant breakthroughs.

5:46:12 And I think it's just because the intelligence

5:46:14 that we're packing in in some of these new

5:46:17 systems is just so high that it just covers

5:46:19 a lot of the edge cases that existed before.

5:46:22 So I'm excited for that.

5:46:24 I think there's— as a customer who's pre-ordered

5:46:26 a bunch of these robotics products, I'm very excited.

5:46:29 I hope those teams all pull it off.

5:46:31 And there's a ton of stuff actually happening in the Gemini side from robotics.

5:46:34 We have a huge amount of partnerships with Boston Dynamics

5:46:36 and others to power a bunch of these next gen experiences.

5:46:39 So I think that field— I could be off by another six months

5:46:44 or something like that, but it feels like it's ripe to actually happen.

5:46:48 I think the other one is long running agents.

5:46:51 And I sat down with Jeff Dean earlier today.

5:46:53 If you look at the frontier of how long you can

5:46:57 let an agent run without a human in the loop today,

5:47:00 it's like— depending on the use case,

5:47:01 it looks different, but it's on the order of hours.

5:47:04 I think as we look in the next 12 months,

5:47:06 it is truly going to be on the order of days or weeks.

5:47:09 And there's a huge amount of work that needs to be done in order to enable that.

5:47:13 But I think that is the direction that we're going.

5:47:15 The models will go out and do the things they need

5:47:17 to do a lot longer without human intervention, which is exciting.

5:47:20 SAM WITTEVEEN: What are the big things?

5:47:22 I'm guessing research is one of the big ones,

5:47:25 and that's the first agent on Gemini actually is Deep Research.

5:47:29 You just updated it this week.

5:47:31 LOGAN KILPATRICK: This week.

5:47:31 SAM WITTEVEEN: I still haven't had a chance to read the blog post.

5:47:33 LOGAN KILPATRICK: It's good, it's good.

5:47:34 Read the blog post.

5:47:35 So back in December, we released the Interactions API.

5:47:38 And as part of the Interactions API,

5:47:40 we brought this concept of models and agents

5:47:43 both being first class citizens of the API experience.

5:47:46 And so we launched with Deep Research initially.

5:47:49 Last week, we just rolled out with two new versions of Deep Research,

5:47:53 Deep Research and Deep Research Max,

5:47:55 which sort of pushes even further and goes deeper and more rigorous.

5:47:59 And I think all of this is laying the groundwork

5:48:02 for letting developers create their own agents in the Gemini API,

5:48:05 in addition to a bunch of new hosted

5:48:07 agents from Google which we'll bring to the world, which is really exciting.

5:48:11 So I think we're seeing that.

5:48:12 We've been laying the groundwork to make this happen.

5:48:15 And the thing that I really am excited about

5:48:17 is using the same underlying API for models and agents.

5:48:22 I think as developers make this transition from interacting

5:48:25 with raw models to actually working with agents,

5:48:28 we're trying to make that seamless.

5:48:29 And I think by having it sort of side by side

5:48:32 in the API and literally side by side in the UI now,

5:48:34 you can see models and agents trying to meet folks where they are.

5:48:38 SAM WITTEVEEN: Do you think that that's this generation's Gmail?

5:48:41 Like, Google became really famous because of Gmail.

5:48:43 I'm not talking about just the communication aspect, but I mean,

5:48:45 as a product that was sort of a boom.

5:48:47 Everyone had Gmail within a year or so.

5:48:50 And it does seem to me that as some of these agent things start to roll out,

5:48:55 people are going to be talking about, oh, my Google agent or my Gemini agent

5:49:01 that do this, this, and this and whatever,

5:49:03 or maybe people give them names whatever.

5:49:05 It does seem like there's something that changed there.

5:49:08 LOGAN KILPATRICK: Indeed.

5:49:09 Yeah.

5:49:09 I agree with you.

5:49:10 I think that the corollary is maybe every product is going to become a agentic.

5:49:15 And so I think that will be maybe the only reason

5:49:18 what you're saying is not true is like Gmail— SAM WITTEVEEN:

5:49:20 Just Gmail becomes an agent.

5:49:22 LOGAN KILPATRICK: --just becomes an agent,

5:49:23 and just all the other stuff becomes— like Google

5:49:25 Search or Google is an agent or many agents, obviously.

5:49:29 So it'll be interesting to see the foundational pieces all become agentic,

5:49:34 which I think is the direction that we're going.

5:49:37 SAM WITTEVEEN: That's fascinating.

5:49:38 OK.

5:49:39 Let's start to wrap it up.

5:49:40 What I wanted to ask you also was like, OK,

5:49:42 so you've now experienced working in a couple

5:49:45 of different companies in this field.

5:49:47 You've been at the forefront though probably for at least the last six years,

5:49:52 seven years, maybe longer, a little bit longer.

5:49:55 What are you most excited about?

5:49:58 You mentioned the robotics stuff.

5:50:01 The point I wanted to make about that is,

5:50:02 in many ways, that's just another modality.

5:50:05 This is where Gemini is being used,

5:50:07 and there are some really good fine tunes of Gemini for robots and stuff.

5:50:11 What are the things that— and really, what are the things that three years ago,

5:50:17 you thought, we'll never do that in my lifetime,

5:50:21 I find myself constantly looking at things that were research problems six,

5:50:26 seven years ago, and they're just done now.

5:50:30 Right?

5:50:30 And they're just solved to the point where

5:50:33 there's really not that much point in even

5:50:35 doing research in that particular— a specific kind

5:50:38 of NLP or something like that kind of thing.

5:50:41 LOGAN KILPATRICK: I think I have a hypothesis,

5:50:44 and I think it's what our team is spending a bunch of time thinking about,

5:50:47 which is similar to what happened with YouTube in the internet era,

5:50:52 the sort of early internet area where everyone was able to become a creator.

5:50:58 I think that's happening with software.

5:51:00 So we're thinking about this similar to how YouTube actually

5:51:03 thought about building a platform and an ecosystem for creators.

5:51:06 Now, everyone is a— now everyone's a builder.

5:51:09 Everyone can build, and the technology is enabling that to happen.

5:51:14 There's so much breadth in such a long tail of actually pulling that story off.

5:51:19 But it's what we're spending a lot of time thinking about.

5:51:21 And it's also, emotionally, the thing that gets me most excited,

5:51:25 because I think you look at this group of people who

5:51:29 have not had the means to contribute in this software economy,

5:51:33 and bringing those— there's so many.

5:51:34 I actually work with lots of these people.

5:51:36 So many smart people at Google, even as an example,

5:51:39 who are right next to the software who haven't historically created software.

5:51:44 And we have someone on my team, Harrison,

5:51:47 who is incredible and does a bunch of our growth stuff.

5:51:49 He's now like the number two token consumer and is building all of these really

5:51:55 cool internal— had never written a single line

5:51:56 of code in his life six months ago, and is now building all of these really

5:52:00 cool internal tools like shipping and landing pages,

5:52:03 making product experiences in AI Studio.

5:52:05 And I think we're just scratching the surface

5:52:08 of that becoming widely distributed in the economy.

5:52:11 And I think when that happens,

5:52:13 we actually will see this phase change transition of what

5:52:18 the world looks like as more people build software,

5:52:21 and I think it's going to be for the better, which is really exciting.

5:52:25 And we'll also— I think I'll make my last comment,

5:52:27 which is I think it will also increase the demand

5:52:29 for traditional developers I think is the other thing.

5:52:32 I think a lot of people look at these things as mutually exclusive.

5:52:34 I think it's very positive sum.

5:52:36 As this total addressable market

5:52:38 of the number of people making software increases,

5:52:41 it increases the demand for developers because

5:52:43 there will be a stopping point of how far you can go if you don't understand

5:52:47 all the detail of how this technology works.

5:52:49 And so there'll be a lot more cases where

5:52:52 people want to go farther and they need somebody,

5:52:54 they need a developer to partner with, and they

5:52:56 need that technical co-founder or whatever it is.

5:52:59 So I think that direction is really positive and exciting.

5:53:02 SAM WITTEVEEN: How do we deal with the doomers?

5:53:05 From what you're describing, society is going to change.

5:53:08 I think it's changed pretty much all the time for the last.

5:53:12 At least hundred years or so.

5:53:15 It's clearly going to change.

5:53:16 Things like education are going to have to change,

5:53:19 the whole sort of systems around those things.

5:53:22 How do we as professionals in this AI industry or something,

5:53:29 convince people that it's not like everyone's going to die.

5:53:33 It's not like everyone's sort of— so you get these really hardcore doomers,

5:53:38 which seem to get a lot of publicity and stuff like that.

5:53:42 But they're not seeing all these opportunities,

5:53:44 they're not seeing all these things going on.

5:53:47 LOGAN KILPATRICK: Yeah.

5:53:48 I mean, I feel this sense of responsibility that our team has.

5:53:51 The DeepMind mission is build AI and make sure it benefits humanity.

5:53:55 And I think the way that we do that is deploying the technology.

5:53:58 And I think the way that anybody— people should rightfully

5:54:01 be skeptical of many things that are happening in the world.

5:54:04 The way that you meet people where

5:54:05 they are if they're skeptical is build technology,

5:54:08 deploy it, show it, and let them use it.

5:54:10 There's no better, way to have the conversation than

5:54:14 being able to put your hands on the technology.

5:54:16 I think we have that deployment-first sort of mindset

5:54:19 as far as what our team is doing in AI Studio, and doing it with urgency,

5:54:23 because I think there is— if you wait too long,

5:54:27 society goes through that transition and then people didn't

5:54:31 have the chance to have that conversation and understand.

5:54:33 So I think it's super important.

5:54:35 It's why we have an incredible developer advocacy

5:54:38 relations team who's trying to— and now builder relations,

5:54:41 trying to bring the technology to the world

5:54:44 so that folks really understand what's happening.

5:54:47 It's super important.

5:54:48 SAM WITTEVEEN: On that note, I think we'll finish up.

5:54:50 That's awesome.

5:54:52 Thank you for the conversation.

5:54:53 It's fascinating to see where things are going.

5:54:56 Fascinating to hear what you're working on.

5:54:58 Sounds like you've got a lot of cool stuff coming soon.

5:55:00 LOGAN KILPATRICK: Lots of cool stuff coming.

5:55:00 SAM WITTEVEEN: Anything else you want to leave the people with?

5:55:03 LOGAN KILPATRICK: I think the only last thing I'll say is,

5:55:04 if you have feedback, please send it to us.

5:55:06 We're trying to push the rock up the hill.

5:55:08 If you need— SAM WITTEVEEN: Paper cuts, anything.

5:55:10 LOGAN KILPATRICK: Anything.

5:55:11 Paper cuts, features, whatever it is, ping us,

5:55:13 email, Twitter, LinkedIn, whatever, however you best mail.

5:55:19 Send us a fax and we'll make AI Studio better for your use case.

5:55:22 So please keep the feedback coming.

5:55:24 Thank you for doing this, Sam.

5:55:25 Thanks for coming to Next and flying all the way.

5:55:27 SAM WITTEVEEN: Thank you for joining us today.

5:55:28 It's been fascinating.

5:55:29 And thank you to everyone who's been listening.

5:55:32 I'm sure you'll see Logan at future things going forward,

5:55:35 and there's a lot more content coming in Next as well.

5:55:38 LOGAN KILPATRICK: I love it.

5:55:38 Thank you all.

5:55:39 SAM WITTEVEEN: Bye.

5:55:46 [MUSIC PLAYING]

Study with Looplines Download Captions Watch on YouTube