AI transcript
0:00:13 >> Hello, and welcome to the Nvidia AI Podcast.
0:00:15 I’m your host, Noah Kravitz.
0:00:18 Zoom became a household name in 2020,
0:00:19 as it rose to prominence as
0:00:23 the go-to video conference platform during the COVID pandemic.
0:00:26 Since then, the company has not only been refining their video technology,
0:00:30 but also helping us all rethink the way we approach work in
0:00:33 the era of digital communications and AI.
0:00:35 At Zoomtopia this past October,
0:00:39 Zoom took the wraps off of a number of new AI-first products and initiatives,
0:00:42 all in service of the company’s mission to deliver
0:00:45 an AI-first work platform for human connection.
0:00:49 Here to discuss everything from Zoom’s approach to federated AI and
0:00:55 AI agents to the future of how we all live and work with technology is Dr. XD Huang.
0:00:58 XD is Zoom’s Chief Technology Officer and has
0:01:01 a prolific background in artificial intelligence coming to Zoom from
0:01:05 Microsoft where he founded the Speech Technology Group in 1993,
0:01:10 and most recently served as Azure AI CTO and Technical Fellow.
0:01:13 XD is an IEEE and ACM Fellow and
0:01:16 an elected member of the National Academy of Engineering,
0:01:19 and the American Academy of Arts and Sciences.
0:01:22 Most importantly, he’s with us right now, so let’s get to it.
0:01:27 XD Huang, welcome, and thank you so much for joining the NVIDIA AI podcast.
0:01:29 Thank you, I’m glad to be here.
0:01:33 So we’re recording this on the Friday immediately following Zoomtopia,
0:01:37 where Zoom basically announced to the world that you’re going all in on AI.
0:01:39 We want to hear all about the new stuff of course,
0:01:43 but first maybe we can set the scene a little for the audience.
0:01:48 Can you tell us broadly about Zoom’s approach to AI and AI in the workplace?
0:01:52 Yes, I think this is really the most exciting time.
0:01:57 I started working on AI when I was a graduate student.
0:02:00 That was over 40 years ago.
0:02:06 Now gen AI is really, really transforming what productivity is going to be.
0:02:08 So Zoom is at the forefront.
0:02:13 We have provided amazing media conferencing for the whole world.
0:02:18 It’s a household name; everyone understands what Zoom is about.
0:02:21 – It’s a verb even at this point, right? – Yes, that’s right.
0:02:28 So we’re now facing even more exciting opportunities in the world.
0:02:32 So meetings are one of the most important business functions,
0:02:39 but we want to expand that capability so people can work happily on the Zoom platform.
0:02:44 So Zoom Workplace is going to take advantage of gen AI capabilities.
0:02:49 We believe that gen AI is going to provide a really exciting opportunity.
0:02:55 I reflect on my own journey: I started writing my master’s thesis
0:03:00 at Beijing’s Tsinghua University, and for that first paper I was using a typewriter.
0:03:03 I loved the expensive Liquid Paper.
0:03:09 In China, at Beijing’s Tsinghua University, that was 1982, 1983.
0:03:12 I remember Liquid Paper was expensive; it was a luxury.
0:03:17 If I mistyped any letter, I had to use the Liquid Paper.
0:03:19 I could rather interrupt that.
0:03:26 And when I wrote my book, Spoken Language Processing, with my colleagues at Microsoft,
0:03:30 I was fortunate we had Microsoft Word.
0:03:36 And even at that time, Word couldn’t quite accommodate an 800-page document.
0:03:37 – Too big? – Too big.
0:03:43 So we had to split it into a separate file for each chapter,
0:03:45 but Microsoft Word did a wonderful job.
0:03:48 It’s hard for me to imagine, without Microsoft Word,
0:03:51 having to use a typewriter to write that.
0:03:56 How hard it would have been, because we had a lot of graphs, math, and references.
0:03:59 – Right. – Time passed quickly.
0:04:04 One of my colleagues at Microsoft wrote his book
0:04:07 with the assistance of GPT-4 and ChatGPT.
0:04:09 It’s just amazing.
0:04:13 For productivity, it really pushed everything to the next level.
0:04:15 So you can see that journey reflected.
0:04:18 To stop you for a quick second,
0:04:22 if you can think back to when you were writing on the typewriter,
0:04:24 could you have imagined where we’d be now?
0:04:27 You’ve been in this field for a long time, so perhaps you could.
0:04:30 But I’m just curious, you know, if 30 years ago,
0:04:34 where we’re sitting today, your colleague using GPT-4 to help write a book,
0:04:36 is that something you thought about back then?
0:04:40 Yes, I selected speech recognition as my thesis topic,
0:04:42 and the pages should be in the bookstale.
0:04:45 At that time, I had only IBM PCs.
0:04:47 I don’t know if you know what that means.
0:04:52 – Yeah, I remember. – Plus, a few Apple II computers.
0:04:56 – Yeah, I had an Apple IIe growing up. – Yeah, that was all we had.
0:05:00 – Right. – And I actually told myself,
0:05:03 “If I could let the computer understand spoken language,
0:05:06 I could retire.” – Right.
0:05:09 – Forty years have passed. I’m more excited than ever.
0:05:12 So I’m not retiring. The frontier keeps getting pushed.
0:05:15 – Yes. – It’s not about speech recognition.
0:05:20 At Microsoft, we were the first to reach human parity
0:05:24 on the most difficult speech test, Switchboard, in 2016.
0:05:26 Most people didn’t believe we could have done that.
0:05:27 Yes, we did.
0:05:32 Now, ChatGPT really redefined and opened up
0:05:34 the imagination for the whole world.
0:05:36 – Right. – I think OpenAI did a great job
0:05:40 to really redefine the new frontier.
0:05:42 – We shared a story in YZOOM. – Yes.
0:05:45 Zoom is going to grab this opportunity
0:05:48 to redefine productivity. – Yeah.
0:05:52 Every era of computing created a productivity leader.
0:05:55 Microsoft revolutionized desktop computing;
0:05:58 Office is, unquestionably,
0:06:01 the productivity leader for desktop computing.
0:06:02 That’s why I shared with you.
0:06:04 When I wrote the book
0:06:07 Spoken Language Processing with my colleagues at Microsoft,
0:06:09 we loved Microsoft Word.
0:06:13 When the web came along, Google took advantage of it.
0:06:16 They enhanced productivity
0:06:19 to support multiple people working on the same document.
0:06:23 As we know, Google Docs, Sheets, and Slides
0:06:27 all really supported multi-person collaboration.
0:06:29 Right, the collaboration, yeah.
0:06:32 That took notice from almost everyone.
0:06:34 – Right. – The web era
0:06:37 was an incremental improvement, in my opinion.
0:06:39 – How so? – Now, with gen AI,
0:06:41 we are all on the same level playing field,
0:06:44 whether it’s Microsoft, Google, or Zoom.
0:06:47 Of course, Zoom has a unique advantage.
0:06:48 Meetings are the most important business function
0:06:51 to connect people.
0:06:52 We are the leader.
0:06:55 But just having that meeting capability
0:06:56 would be insufficient.
0:06:58 If you think about the work,
0:07:01 we have probably a few key functions.
0:07:05 Like, one is to consume information.
0:07:06 We need to learn.
0:07:09 This is using the human ability to read,
0:07:13 to satisfy our own curiosity.
0:07:15 Gen AI can help you read
0:07:20 500- or 800-page books, like the one my colleagues and I published:
0:07:22 800 pages in one page.
0:07:26 So, magically, gen AI can do this
0:07:31 and create an amazing amount of learning for almost everyone.
0:07:34 But we do not just consume information.
0:07:35 That’s one of the important functions.
0:07:37 We also need to communicate.
0:07:39 To influence, to bring people along.
0:07:43 So, gen AI can help you compose the draft
0:07:44 because it understands your needs.
0:07:49 So, those two most fundamental human capabilities,
0:07:51 reading on the one hand,
0:07:54 and writing and speaking, grouped together, on the other,
0:07:57 Right. Consuming information and communicating, yeah.
0:08:01 They are going to be really, really fundamentally helped.
0:08:05 So, magically, we can take that capability
0:08:08 and redesign productivity,
0:08:12 not just bolt those capabilities onto the existing software.
0:08:15 That’s an opportunity we really, really want to pursue.
0:08:18 Right. In the way we approach the
0:08:22 productivity suite and the way we are addressing AI,
0:08:24 there are three key ways I want to highlight.
0:08:27 I talked about this at Zoomtopia.
0:08:30 Let me explain them in detail.
0:08:32 The first thing I want to really highlight is
0:08:34 our federated AI stack.
0:08:38 So, we integrate the best from leading AI companies,
0:08:42 like Anthropic, OpenAI, Meta,
0:08:45 (indistinct)
0:08:47 and leading open source options.
0:08:51 We also integrate with web search leaders,
0:08:53 like Perplexity.
0:08:56 Okay. So, we federate all of them together,
0:09:01 in addition to our own proprietary small language model
0:09:03 that we have trained and are developing,
0:09:06 which is reaching amazing capability.
0:09:10 We appreciate that the small language model,
0:09:11 because of the scaling law,
0:09:14 needs to work together
0:09:17 with these amazing cloud-based large language models.
0:09:19 So, we have this unique approach
0:09:22 that combines them seamlessly behind the scenes
0:09:27 to support the productivity of each individual user.
0:09:31 What does the small language model do in the stack?
0:09:32 How does it differ from what
0:09:34 you’re tasking the large language models with?
0:09:38 We are training the small language model
0:09:40 like everyone else trains large language models.
0:09:42 It’s just a whole other task.
0:09:44 Okay. In addition to that,
0:09:49 we’re also incorporating each individual’s unique context.
0:09:52 So, we can really make it that personalized.
0:09:56 In addition to consuming this massive amount of tokens,
0:09:58 right, always somewhere.
0:10:02 So, if I’m using Zoom’s AI features
0:10:04 and I give permission, Zoom can basically
0:10:07 just ingest all of my conversations,
0:10:11 all the meetings I have, the voice conversations,
0:10:14 the documents, the chats, all of that, and use it all
0:10:18 as context for the generative AI going forward?
0:10:21 This is the feature that is coming together
0:10:23 through our AI companion.
0:10:24 Oh, got it.
0:10:27 AI companion 2.0 is horizontal, generic,
0:10:29 not personalized.
0:10:29 Okay.
0:10:31 It’s the custom AI companion
0:10:35 that we will introduce next year.
0:10:36 Next year, okay.
0:10:38 It will actually incorporate the ability
0:10:42 for anyone to customize and personalize it.
0:10:43 Right.
0:10:45 This is actually a very powerful opportunity
0:10:49 for the small language model running on the devices
0:10:53 to really augment what the large language models cannot offer
0:10:56 because they don’t understand your personal needs,
0:11:00 your learning pattern, your writing pattern, etc.
0:11:02 So, I just want to really first highlight that
0:11:07 our federated AI stack is unique, very unique in the industry,
0:11:09 unlike many other productivity companies,
0:11:10 which use only one model.
0:11:11 Right.
0:11:13 And so for the audience who might not be familiar,
0:11:16 federated AI, a federated stack,
0:11:19 does that essentially just mean that the system can choose
0:11:23 which LLM to prompt depending on the situation?
0:11:25 Or what does federated mean, the way you’re using it?
0:11:28 There are multiple ways to federate.
0:11:30 The way to federate the large language models
0:11:33 and the small language model is a new frontier.
0:11:37 The way we are federating this is different from federated
0:11:37 learning.
0:11:38 Okay.
0:11:40 That’s where you’re trying to combine
0:11:45 multiple models together to form this powerful capability
0:11:48 that can preserve client privacy, obviously.
0:11:51 What we’re doing is we can choose based on different workloads.
0:11:56 Because AI companion 2.0 is almost like a super agent
0:11:59 that is trying to understand different modalities,
0:12:03 different memory spans, etc.
0:12:08 So, we can choose what model is best for different tasks.
0:12:11 We can also combine different models together.
0:12:14 And we can reflect like a chain of thought, we think.
0:12:16 And we perform the same tasks based
0:12:19 on what we have learned from a small language model,
0:12:20 for example.
0:12:23 So, if a small language model can perform a task
0:12:25 with very little work, then we start there.
0:12:26 It’s sufficient.
0:12:28 So, it’s a very sophisticated system
0:12:32 that can actually orchestrate multiple models together.
0:12:36 This has been developed and pushed by Zoom’s AI talent.
0:12:39 So, this is a very unique approach
0:12:41 that sets us apart from almost anyone else.
0:12:42 Yeah.
0:12:43 You use the word agent.
0:12:47 And as we’re recording this, AI agents are…
0:12:48 There’s a lot of buzz.
0:12:50 I’m hearing a lot of buzz around the word agent
0:12:54 and the concept of agentic AI, which isn’t new.
0:12:56 But as it’s come to the fore lately,
0:12:58 can you talk a little bit about what that means,
0:13:01 what the idea of an AI agent is, kind of broadly,
0:13:04 but then specifically to how Zoom is using it?
0:13:06 And I want to come to that later.
0:13:07 You want to come to that later?
0:13:08 OK.
0:13:11 Let me explain why we approach AI differently,
0:13:14 in three key ways that are different
0:13:16 from the traditional approach, right?
0:13:20 So, the first one, if you think about traditional
0:13:22 or the typical suite, most of the companies
0:13:26 are using one model, either OpenAI or Gemini,
0:13:29 to augment what they do.
0:13:33 They bolted that capability onto the existing software.
0:13:34 Right.
0:13:37 So, on the back end, they are mostly using
0:13:40 one very good generative model,
0:13:42 either Gemini or OpenAI.
0:13:43 Our approach is different.
0:13:48 We combine OpenAI, Anthropic, Gemini, Meta,
0:13:51 and our own small model together
0:13:53 to offer unmatched performance.
0:13:55 So, that’s the number one I want to really highlight.
0:13:58 Of course, we’re also integrating our partner
0:14:00 Perplexity for the amazing web search results.
0:14:01 For the web search.
0:14:05 So, whether it’s web questions or work questions,
0:14:07 and in the future, personal questions,
0:14:08 we can combine them together.
0:14:11 That’s what we are pushing to differentiate
0:14:14 Zoom AI through our federated approach.
0:14:15 That’s the number one.
0:14:16 OK.
0:14:20 Number two, our user experience is AI first.
0:14:22 This is what I call AUI.
0:14:27 We all know the screen-optimized graphical user interface,
0:14:31 as defined by Xerox many, many, many years ago.
0:14:32 Yes.
0:14:34 You can go back.
0:14:38 It was popularized by the Mac and Microsoft Windows.
0:14:43 So, both Office and Google Docs are examples
0:14:46 of taking advantage of the graphical user interface.
0:14:49 So, that’s the way we understand it.
0:14:54 And ChatGPT defined the conversational user interface.
0:14:57 It reached 100 million users amazingly fast.
0:14:59 Right, faster than anyone.
0:15:02 So, what the Zoom is doing is developing AUI.
0:15:07 That will seamlessly combine GUI and the CUI together.
0:15:10 What that means in Zoom Workplace:
0:15:14 AI companion 2.0 will be a persistent panel on the right.
0:15:15 OK.
0:15:18 And the traditional graphical user interface services,
0:15:20 whether it’s scheduling a meeting
0:15:22 or having a meeting with someone.
0:15:24 Right, you’re in your meeting.
0:15:25 Yeah.
0:15:26 It’s on the left.
0:15:27 OK.
0:15:31 Information flows seamlessly between those two in the AUI.
0:15:33 So, we are trying to take advantage of both the
0:15:37 conversational user interface and the screen-optimized
0:15:39 user interface seamlessly.
0:15:44 The future AI vision is one where the technology
0:15:47 intuitively adapts to your own needs.
0:15:48 That’s getting more personal.
0:15:50 That’s what we are coming out with in the custom AI companion.
0:15:51 Sure.
0:15:56 When you say adapts, do you mean that the user interface changes
0:15:59 or that it can create a conversational window
0:16:01 sort of in context when you mean it?
0:16:07 Or could AI potentially just redesign the UI on the fly
0:16:08 to match what you’re doing?
0:16:09 How do you envision that?
0:16:11 AI has the vision.
0:16:13 We’re leading the point.
0:16:17 And with that, what information you want to consume,
0:16:18 how you want to consume it.
0:16:19 OK.
0:16:20 I want to explain AUI a bit.
0:16:24 It’s not just the conversational user interface as ChatGPT
0:16:26 has defined it today,
0:16:28 or just the graphical user interface,
0:16:30 like a Zoom meeting as defined today.
0:16:31 Right.
0:16:34 It combines both seamlessly in a multimodal environment.
0:16:38 We learn based on your individual needs.
0:16:41 But right now, we’re trying to combine those two categories
0:16:42 into one.
0:16:46 GUI, traditional, is so massive.
0:16:48 Most of the world’s services and applications
0:16:50 are GUI-optimized.
0:16:50 Yeah.
0:16:55 ChatGPT, the conversational user interface, is a new category.
0:16:58 And we just kind of have that to be the only one.
0:17:01 We’re bringing those two together with information flowing
0:17:04 across those two categories seamlessly.
0:17:05 Right.
0:17:07 And we’re trying to understand the user’s needs
0:17:09 and adapt on the fly.
0:17:09 OK.
0:17:11 That is what AUI is going to be.
0:17:11 Got it.
0:17:14 I really want to coin this word, AUI.
0:17:16 So this is on record.
0:17:19 This is the first time I’m telling you in detail
0:17:21 what the future user interface is going to be.
0:17:21 World premiere.
0:17:22 I love it.
0:17:22 Yeah.
0:17:26 So that is the principle of the approach
0:17:31 Zoom is taking, embracing AI natively.
0:17:32 That’s why we call it AI first.
0:17:35 You joined Zoom about a year and a half ago,
0:17:36 a little less.
0:17:36 Yes.
0:17:39 When you joined, did you–
0:17:43 I know Zoom has had AI functionality, AI companion,
0:17:44 version one.
0:17:48 And you can use third party apps for transcription
0:17:49 and et cetera, et cetera.
0:17:51 That’s been around for a little while.
0:17:55 But when you joined, is this sort of you came in and thought,
0:17:57 OK, let’s rebuild this from the ground up.
0:17:58 AI centric.
0:18:01 Was that sort of already happening when you joined?
0:18:05 Just kind of wondering, as you stepped into the role,
0:18:09 sort of what was envisioned and how much you’ve
0:18:10 shaped things since then?
0:18:14 So Eric, Zoom’s CEO, has this great vision.
0:18:17 So Zoom has invested in AI before.
0:18:18 Yes.
0:18:19 Yeah.
0:18:22 Since I came, I worked with Eric and the leadership team
0:18:23 together.
0:18:24 We defined AI first.
0:18:25 OK.
0:18:26 Great.
0:18:29 Before I came, it was just adding AI,
0:18:31 bolting on AI, like almost every other company.
0:18:32 Right.
0:18:32 Yeah.
0:18:36 We have transformed that by building consensus
0:18:39 and pushing AI first across the platform.
0:18:41 So what does AI first mean?
0:18:43 That means three things.
0:18:45 The first is the AI technology back end:
0:18:50 both the edge small language model,
0:18:54 and build on the shoulders of great AI companies out there,
0:18:59 whether it’s OpenAI or Anthropic or Meta or other open source
0:19:01 companies like Mistral, et cetera.
0:19:03 There are a lot of them.
0:19:06 It will be a mistake not to take advantage of all of them.
0:19:07 Of course.
0:19:07 Right.
0:19:10 So it’s like we form a committee working
0:19:12 to support our workloads.
0:19:14 That’s always better than just using
0:19:18 one single model trying to really perform the same task.
0:19:18 Right.
0:19:19 Two brains are better than one.
0:19:20 Yeah.
0:19:23 So you see how inclusive we are.
0:19:26 On the models, we’re trying to combine parts of them together.
0:19:28 And on the user interface:
0:19:31 some companies treat the chat user interface as the only way,
0:19:34 or the graphical user interface as the only way,
0:19:36 just adding a button here and there.
0:19:38 We are combining those two categories
0:19:42 of user interfaces into one
0:19:44 that will adapt to your own needs,
0:19:47 with information flowing between those two categories
0:19:51 seamlessly, as the second important advance.
0:19:54 I want to use AUI as the phrase
0:19:57 to characterize this design principle.
0:19:59 So the third thing I want to talk about is,
0:20:03 what is the work productivity suite
0:20:06 in the gen AI era?
0:20:11 I would say it’s all about creating a true system of action.
0:20:14 We work because we have a task to do.
0:20:16 We take action.
0:20:19 Of course, you can say you want to entertain people,
0:20:22 but that’s not a productivity suite.
0:20:24 That’s more for entertainment software.
0:20:24 Sure.
0:20:26 AI will do that.
0:20:30 So when we say we are an AI-first work platform,
0:20:33 this means AI companion is designed
0:20:35 to understand your workflow.
0:20:38 It can learn from your patterns.
0:20:39 Everyone has a different workflow.
0:20:44 Everyone has a different selection of services and software.
0:20:48 And we use AI to anticipate your personal needs;
0:20:50 I emphasize that, personal needs.
0:20:54 And it can take action on your behalf with your permission,
0:20:58 or with you co-participating to make a better decision
0:21:01 than what you could just do by yourself.
0:21:06 Those are the soul and the spirit of AI first productivity.
0:21:10 That’s very different from just replacing paper
0:21:13 with word processing.
0:21:18 Or just supporting three people co-editing the same document.
0:21:23 Or just formatting this document with nice fonts.
0:21:25 It’s about those three things.
0:21:29 It’s about learning from your own pattern,
0:21:32 anticipating your own personal needs,
0:21:34 and taking action on your behalf.
0:21:37 Whether it’s tracking tasks or managing action items,
0:21:40 it’s always one step ahead of you,
0:21:44 ensuring that productivity flows seamlessly,
0:21:49 effortlessly throughout the whole ecosystem,
0:21:51 in Zoom Workplace and third-party solutions.
0:21:52 Right.
0:21:57 So if the AI companion understands my workflow
0:22:00 and then can suggest to me actions to take,
0:22:02 either now or going forward, is it
0:22:06 a case of imagining the AI would say to me,
0:22:09 hey, you should do these things in this order?
0:22:13 Or will it actually call up an additional tool
0:22:16 to help facilitate getting these things done?
0:22:17 Like, how does that?
0:22:19 Or how do you envision that working?
0:22:25 Just envision AI companion proactively informing you
0:22:29 that you are not answering the question right, in the meeting.
0:22:31 Only you can see that, right?
0:22:32 Right.
0:22:33 Just imagine how powerful that is going to be.
0:22:34 Right.
0:22:36 It’s in real time, as you start to give the wrong answer.
0:22:40 AI companion is always enhancing your ability
0:22:43 to influence others, to make others like you better.
0:22:47 So this is where I want to coin another phrase.
0:22:51 So I talked about the federated AI stack, which is unique.
0:22:51 Yes.
0:22:54 I talked about the user interface, that’s AUI.
0:22:55 The AUI, yes.
0:22:59 This is about action-oriented task flow.
0:23:00 Action-oriented task flow, OK.
0:23:02 This will flow through every corner
0:23:05 for the whole life cycle of what you need to do.
0:23:05 Because it’s almost like you have a very
0:23:09 expansive ecosystem.
0:23:10 Yes, right.
0:23:13 The most important task that you need to pay attention to
0:23:16 for the life cycle of the whole project,
0:23:19 until you get that project done beautifully
0:23:21 in a time-sensitive manner.
0:23:25 And in a way that delights your co-workers,
0:23:29 your family members, for better human connection.
0:23:34 This is the goal of Zoom’s AI-first work platform.
0:23:36 It’s action-oriented information flow.
0:23:39 If there’s something you don’t need to take action on,
0:23:42 we can still accumulate those tasks for you.
0:23:43 That’s OK.
0:23:45 And you can decide.
0:23:48 And if you do not want to actually track those actions,
0:23:50 We learn that pattern from you.
0:23:50 Yeah.
0:23:53 And we improve our ability to track.
0:23:57 But if you tell AI companion,
0:23:58 “This action is important,”
0:24:00 and you check that,
0:24:01 AI companion will work harder
0:24:03 and keep its eyes open.
0:24:06 So a week later, if you receive an e-mail
0:24:08 that is relevant to the task you’re tracking,
0:24:13 AI companion will work 24/7 to update what you need to do,
0:24:17 to give you tips, and to advise you on what you can do better
0:24:19 to accomplish that task.
0:24:23 This is what I mean by action-oriented information flow.
0:24:23 Right.
0:24:27 And so is the AI companion an example of an AI agent
0:24:28 working on your behalf?
0:24:29 Absolutely.
0:24:34 AI companion 2.0 already brings agent-like capabilities.
0:24:38 Like in a meeting, it will not just
0:24:40 use speech recognition to understand
0:24:42 what is being talked about.
0:24:46 If you present your slides, AI companion 2.0 today
0:24:48 understands what is presented in the slides,
0:24:52 or what you wrote on the paper that you shared
0:24:55 through screen sharing, with that multimodal capability.
0:25:00 Or if you posted your points in the side panel chat,
0:25:02 we take that into account as well.
0:25:03 That’s amazing.
0:25:06 It’s almost like an agent really participating in the meeting
0:25:06 as you do.
0:25:09 Then we present a meeting recap.
0:25:12 In that meeting recap, the most powerful part for us
0:25:16 is identifying the next steps you need to pay attention to
0:25:19 or your colleagues need to pay attention to.
0:25:21 Our next steps are unique.
0:25:23 We offer higher quality.
0:25:27 We worked on this so hard in the past year and a half
0:25:30 to improve next steps to reduce hallucination,
0:25:33 to assign the right task to the right person.
0:25:36 Right now we are probably roughly 80% accurate.
0:25:36 OK.
0:25:39 So we’re not done with– it’s not perfect.
0:25:41 But 80% is really impressive.
0:25:44 I was going to say, in my experiences with LLMs
0:25:48 and hallucinations and accuracy, 80% sounds pretty good.
0:25:50 So we do not stop here.
0:25:54 So let’s say you have a meeting and a discussion.
0:25:58 AI companion identifies the task.
0:26:03 And that task will show up in the upcoming
0:26:04 Tasks panel.
0:26:07 And a week later, through the whole lifecycle,
0:26:09 that’s something you want to track.
0:26:10 You want to track it.
0:26:13 You receive a piece of email in Zoom,
0:26:16 and AI companion creates an update on your behalf.
0:26:19 And if you want to write a brief about that action item
0:26:23 or a status report for your colleague,
0:26:25 that information flows into Zoom Docs,
0:26:27 which will draft the status report on your behalf.
0:26:29 You’ll feel pretty happy after a few changes,
0:26:32 without doing everything using liquid paper
0:26:34 or Microsoft Word to format everything.
0:26:37 And you want to say, hey, change this into the form
0:26:42 that I can present in that next status update meeting.
0:26:42 Done.
0:26:43 One command.
0:26:43 Yes.
0:26:44 Right.
0:26:48 And it’s not as beautiful as what PowerPoint can
0:26:50 do with beautiful pictures.
0:26:52 But the key point is, it’s very much like when
0:26:56 I was at Carnegie Mellon, when we presented information
0:26:58 in black and white on transparencies,
0:27:03 printed on transparency paper,
0:27:08 and just projected them to talk about the key points.
0:27:10 We still actually reflect it.
0:27:11 It still works, yeah.
0:27:15 The information still flows without that beautiful color
0:27:16 or animation.
0:27:18 So that’s the point I want to make.
0:27:23 Zoom Docs alone is actually performing most of the functions
0:27:25 because of gen AI.
0:27:27 With AI companion, you can instruct it:
0:27:30 you can summarize in the form of a status report,
0:27:33 you can publish this as a blog that’s
0:27:36 more ready to be consumed by the public,
0:27:39 or in the simple form of slides, so that you
0:27:41 can communicate your points in your next meeting.
0:27:45 You do not need the last-generation productivity suite
0:27:46 as we know it.
0:27:48 So those are the three high-level key points:
0:27:50 federated AI stack;
0:27:54 AI-first user interface, AUI; and action-oriented information
0:27:55 flow for productivity.
0:27:57 That is really the hallmark
0:28:01 of how an AI-first productivity suite
0:28:04 differs from a web-centric productivity suite
0:28:06 or a desktop-centric one.
0:28:09 Of course, both web-centric and desktop-centric suites
0:28:12 can add AI capability.
0:28:13 That’s bolted on.
0:28:13 That’s not–
0:28:14 Sure.
0:28:14 Yeah.
0:28:16 What Zoom is defining is native,
0:28:17 Right.
0:28:19 with action-oriented information flow
0:28:23 into every corner of the productivity suite.
0:28:24 I hope I made myself clear.
0:28:25 Only three things.
0:28:27 [LAUGHTER]
0:28:29 Our guest today is XD Huang.
0:28:32 XD is the CTO of Zoom, a position
0:28:34 he’s held for going on a year and a half now.
0:28:39 Before that, he was with Microsoft for quite some time.
0:28:41 And really, it’s just a continuation
0:28:44 of an illustrious career that started way back,
0:28:46 as XD was talking about in the days of typewriters
0:28:48 and liquid paper, as we both know.
0:28:50 XD, I want to switch gears here.
0:28:51 We have a few minutes left to talk.
0:28:55 I want to look at things from the public standpoint,
0:28:57 and specifically from business users
0:29:00 and the types of customers who Zoom has been working with
0:29:01 for a while now.
0:29:06 When Zoom is talking to business customers about AI,
0:29:09 about adopting Zoom’s products and all these wonderful things
0:29:13 you’re building, but also more broadly about using generative AI
0:29:16 and making the investment and spending the time to upskill
0:29:20 workers and figure out, how are we using these things?
0:29:24 How do you help your customers think about both adopting AI
0:29:27 and also how to measure return on investment?
0:29:30 There’s a lot of conversation we’ve had on the podcast
0:29:33 and just generally in the world about being kind of,
0:29:36 for exciting as the past couple of years have been,
0:29:39 still being in the early days of figuring out,
0:29:41 what can Gen AI do?
0:29:42 How do we use it?
0:29:45 How do we rethink things like productivity
0:29:47 suites from the ground up with AI?
0:29:50 So when you’re talking to the customer companies,
0:29:52 how do you educate them about getting started
0:29:54 and measuring performance?
0:29:58 There are a few things our customers absolutely love.
0:30:03 First, Zoom Workplace as a whole offers ease of use
0:30:07 that’s just unmatched, whether it’s meetings, Docs, or chat.
0:30:11 The second thing is really, Zoom AI companion 2.0
0:30:14 is offered at no additional cost.
0:30:17 That’s just stunning for most of the customers.
0:30:21 Because they are used to having to pay $30 a month
0:30:22 per person, right?
0:30:27 For high-end customers,
0:30:29 we are going to offer a custom AI companion
0:30:32 where we charge $12 per person per month.
0:30:33 So you can bring your own data,
0:30:36 get your own patterns into the AI companion,
0:30:39 a companion that can act on your behalf
0:30:41 with fine-tuned, customized capabilities.
0:30:45 So Zoom offers this amazing horizontal,
0:30:48 absolutely game-changing capability
0:30:53 to make Zoom Workplace a very viable productivity candidate
0:30:54 for just this.
0:30:55 And for the high-end,
0:30:58 we offer you an unmatched custom AI companion
0:31:01 for $12 per person per month.
0:31:04 It still offers the best TCO.
0:31:08 So: ease of use, cost-effective, unmatched quality.
0:31:11 That’s what we know of our customers.
0:31:14 – And is the goal, or sort of the vision,
0:31:19 that customers who use Zoom come away with, that the AI
0:31:22 Companion is just, over time, going to learn more and more
0:31:26 about how you work and what your workflows are like
0:31:29 and how you sequence tasks and the people,
0:31:31 the colleagues you’re working with,
0:31:33 and the companion will just be there
0:31:36 to help you think a couple of steps ahead,
0:31:39 help you maximize your own efficiency,
0:31:40 whatever the word is.
0:31:43 Is that, ’cause that’s a different conversation
0:31:45 than conversations I’ve had
0:31:46 or I’ve read about or listened into
0:31:48 where companies are saying,
0:31:52 “Okay, we need to start with wrangling all our data
0:31:54 “and then we need to figure out how to clean the data
0:31:55 “and how to…”
0:31:58 And it’s kind of this big, deep investment process
0:32:00 where it sounds like with Zoom,
0:32:02 it’s more like, “Hey, you’re already using it
0:32:05 “for video calls, and now we’re gonna give you
0:32:09 “this groundbreaking, change-the-way-you-do-everything
0:32:11 “companion.”
0:32:12 And it’s just kind of gonna be there
0:32:15 and there’s not a lot you have to do as the user.
0:32:18 – Yeah, so we offer the choice to our customers.
0:32:21 If they're comfortable,
0:32:25 yes, they can have improved, customized capability
0:32:27 to suit their needs.
0:32:29 If they're not, they can decide
0:32:31 how much data they want to share,
0:32:32 or whether they want to turn it off
0:32:36 for some sensitive meetings.
0:32:41 That control is in the hands of the customer.
0:32:43 They can control it themselves directly.
0:32:44 – So giving them the choice.
0:32:46 – Yeah, on top of that,
0:32:47 I want to really emphasize,
0:32:50 Zoom never takes any customer data
0:32:53 from meetings to train our own models.
0:32:56 – Let’s end on a kind of looking ahead note,
0:32:57 if that’s all right.
0:32:59 As you envision the next,
0:33:01 I’m gonna say three years, you can change that.
0:33:04 Two years, five years, whatever you think it is.
0:33:06 Both in terms of Zoom’s mission
0:33:08 and using AI and generative AI
0:33:11 to help people do things smarter,
0:33:13 better, faster in the workplace.
0:33:15 And then more broadly,
0:33:18 as Gen AI and other forms of AI,
0:33:19 and machine learning and deep learning
0:33:22 just continue to impact the world more.
0:33:25 What are you most excited about
0:33:27 in the short term?
0:33:28 Again, three years, however long it is.
0:33:30 What are you really excited about
0:33:34 and see coming down the pike that may,
0:33:36 I don’t know if it’s the next transformational moment
0:33:38 or just kind of a trend
0:33:39 that’s gonna really take fire
0:33:40 and change the way we do things.
0:33:42 What are you looking forward to?
0:33:43 – I’m really thinking about this
0:33:45 action, like the agentic flow,
0:33:47 to be your companion.
0:33:48 – Yeah.
0:33:50 – This is really just a game changer,
0:33:54 because none of us ever have enough time.
0:33:57 So if a companion can really help you
0:33:59 get the job done quickly,
0:34:02 you have additional time to do whatever you want.
0:34:04 That's an opportunity, a capability,
0:34:05 some entertainment, or–
0:34:06 – Whatever it is, yeah.
0:34:08 – Yeah, and it will also make
0:34:10 this a better place.
0:34:11 That’s what Gen AI is for.
0:34:13 It will make your work happier,
0:34:17 so you can be happy and do whatever you want.
0:34:18 – Be happy, and in between,
0:34:19 you’re doing whatever you want.
0:34:22 – So delighting customers is our core mission.
0:34:23 – Excellent.
0:34:26 XD, for people who would like to learn more
0:34:28 about what Zoom’s doing,
0:34:30 announcements at Zoomtopia,
0:34:31 perhaps some of the,
0:34:33 I don’t know if there’s a technical blog
0:34:36 for developers and people more technically inclined
0:34:38 to learn more about how you’re approaching everything,
0:34:41 federated AI and everything else we’ve discussed.
0:34:44 Where’s a good place or some good places online
0:34:46 for people to get started to learn more?
0:34:50 – Yeah, you can check out the Zoom blog on Zoom.com.
0:34:53 That’s actually probably the best place to learn.
0:34:54 – Best place to start.
0:34:55 – But even better,
0:35:00 really start by turning on AI Companion in Zoom Workplace.
0:35:03 Without using it, you won't know how powerful–
0:35:06 – You gotta use it, yeah, absolutely, fantastic.
0:35:08 XD, thank you so much for taking the time,
0:35:09 particularly at the end of this,
0:35:11 what I’m sure was a busy week, a crazy week for you.
0:35:14 But congratulations on Zoomtopia,
0:35:15 on the work you’ve done so far,
0:35:19 and I, for one, am excited to use Companion 2.0.
0:35:22 If I could have a panel on the side of my screen
0:35:25 that’s always telling me the next best thing I should do,
0:35:27 that would be a game changer for me personally,
0:35:28 so I’m excited to go ahead and start–
0:35:30 – Absolutely, I even have AI Companion as
0:35:33 my productivity software, seriously.
0:35:35 – Yes, fantastic.
0:35:36 Well, thank you again,
0:35:38 and perhaps we can catch up somewhere down the line
0:35:40 to see what’s going on at Zoomtopia next year.
0:35:42 – Absolutely, thank you.
0:35:43 Glad to be here.
0:35:46 (somber music)
0:35:49 (inspirational music)
0:36:33 (upbeat music)
0:36:43 [BLANK_AUDIO]
Zoom, a company that helped change the way people work during the COVID-19 pandemic, is continuing to reimagine the future of work by transforming itself into an AI-first communications and productivity platform.
In this episode of NVIDIA’s AI Podcast, Zoom CTO Xuedong (XD) Huang shares how the company is reshaping productivity with AI, including through its Zoom AI Companion 2.0, unveiled recently at the Zoomtopia conference.
Designed to be a productivity partner, the AI companion is central to Zoom’s “federated AI” strategy, which focuses on integrating multiple large language models.
Huang also introduces the concept of “AUI,” combining conversational AI and graphical user interfaces (GUIs) to streamline collaboration and supercharge business performance.