Roboflow Simplifies Computer Vision for Developers and the Enterprise – Ep. 248

AI transcript
0:00:10 [MUSIC]
0:00:13 Hello, and welcome to the NVIDIA AI podcast.
0:00:15 I’m your host, Noah Kravitz.
0:00:19 90% of the information transmitted to human brains is visual.
0:00:22 So while advances related to large language models and
0:00:26 other language processing technology have pushed the frontier of AI forward in
0:00:28 a hurry over the past few years,
0:00:32 visual information is integral for AI to interact with the physical world,
0:00:34 which is where computer vision comes in.
0:00:37 Roboflow empowers developers of all skill sets and
0:00:41 experience levels to build their own computer vision applications.
0:00:44 The company's platform addresses the universal pain points developers face
0:00:47 when building CV models, from data management to deployment.
0:00:51 Roboflow is currently used by over 16,000 organizations and
0:00:55 half the Fortune 100, totaling over 1 million developers.
0:00:59 And they’re a member of NVIDIA’s Inception program for startups.
0:01:03 Roboflow co-founder and CEO, Joseph Nelson, is with us today to talk about
0:01:08 his company’s mission to transform industries by democratizing computer vision.
0:01:09 So let’s jump into it.
0:01:13 Joseph, welcome, and thank you for joining the NVIDIA AI podcast.
0:01:14 >> Thanks so much for having me.
0:01:19 >> I’m excited to talk CV, there’s nothing against language, love language.
0:01:22 But there’s been a lot of language stuff lately, which is great.
0:01:25 But I’m excited to hear about Roboflow.
0:01:26 So let’s jump into it.
0:01:30 Maybe you can start by talking a little bit more about your mission and
0:01:34 what democratizing computer vision means and making the world programmable.
0:01:37 >> At the highest level, as you just described,
0:01:40 the vast majority of information that humans process happens to be visual
0:01:41 information.
0:01:45 In fact, I mean, humans, we had our sense of sight before we even created language.
0:01:46 It’s how we understand the world.
0:01:50 It’s how we understand the things around us, it’s how we engage with the world.
0:01:56 And because of that, I think there’s this massive untapped potential
0:01:59 to have technology and systems have visual understanding in
0:02:04 a similar way that humans do, all across, really, we say, the universe.
0:02:07 So when we say our North Star is to make the world programmable,
0:02:09 what we really mean is that any scene and
0:02:13 anything will have software that understands it.
0:02:14 And when you have software that understands something,
0:02:16 you can improve that system.
0:02:18 You can make it more efficient, you can make it more entertaining,
0:02:20 you can make it more engaging.
0:02:23 I mean, at Roboflow, we’ve seen folks build things from understanding
0:02:27 cell populations under a microscope all the way to discovering new galaxies
0:02:28 through a telescope.
0:02:31 And everything in between is where vision and video and
0:02:33 understanding comes into play.
0:02:36 So if this AI revolution is to reach its full potential,
0:02:38 it needs to make contact with the real world.
0:02:42 And it turns out the real world is one that is very visually rich and
0:02:43 needs to be understood.
0:02:47 So we build the tools, the platform, and the community to accelerate that transition.
0:02:51 So maybe you can tell us a little bit about the platform and
0:02:55 kind of within that how your mission and North Star kind of shape the way you
0:02:57 develop products and build out user experiences.
0:03:02 And I should mention a great shout out from Jensen in the CES keynote earlier
0:03:04 this year for you guys.
0:03:06 And you raised Series B late last year.
0:03:10 So I want to congratulate you on that as well before I forget.
0:03:10 >> I appreciate it.
0:03:12 Yeah, it’s a good start.
0:03:15 I mean, as you said, a million developers, but there’s 70 million developers out there.
0:03:18 There’s billions that will benefit from having visual understanding.
0:03:21 I mean, in fact, in that Jensen shout out, just like maybe the sentence or
0:03:25 two before he described some of the visual partners that are fortunate to work
0:03:26 with the NVIDIA team.
0:03:30 He described that the way NVIDIA sees it is that global GDP is $100 trillion,
0:03:35 and he describes that visual understanding is like $50 trillion of that opportunity.
0:03:39 So basically half of all global GDP is predicated on these
0:03:43 operationally intensive, vision-centric, autonomy-based sorts of use cases.
0:03:47 And so, the level of impact that visual understanding will have in the world is
0:03:50 just a fraction of what it will look like as we progress.
0:03:53 Now, in terms of like how we think about doing that, so
0:03:56 it’s really about empowering the builders and giving the certainty and
0:03:58 capability to the enterprise.
0:04:02 So for example, anyone that’s building a system for visual understanding often
0:04:07 needs to have some form of visual input, camera, video, something like this.
0:04:08 >> Sure.
0:04:10 >> You need to have a model because the model is going to help you act,
0:04:15 understand, and react to whatever actionable insight you want to understand.
0:04:17 And then you want to run that model somewhere, you want to deploy it.
0:04:21 And commonly, you even want to chain together models or
0:04:23 have a system that triggers some alerts or
0:04:26 some results based on information that it’s understanding.
0:04:29 So Roboflow provides the building blocks, the platform, and
0:04:34 the solutions so that over a million developers and half the Fortune 100
0:04:37 have what they need to deploy these tools to production.
0:04:41 And you're doing it, as I kind of mentioned in the intro.
0:04:45 Trying to make the platform available for folks who are deep into this,
0:04:48 have been doing CV and working with machine learning for a while.
0:04:51 And then also folks who might be new to this, they can get up and
0:04:55 running and work with CV, build that into their toolkit.
0:05:00 >> Yeah, the emphasis has always been kind of on someone that wants to be a builder.
0:05:04 That definition is expanding with the capabilities of code generation,
0:05:05 prompting to apps.
0:05:08 We've always kind of been bent on this idea of serving those that
0:05:11 want to create, use, and distribute software.
0:05:14 What’s funny is that when we very first launched some of our first products,
0:05:16 ML teams initially were kind of like, I don’t know,
0:05:18 this seems pretty pedestrian, I know exactly what to do.
0:05:21 And fast forward now, and it's like, whoa, a platform that's fully
0:05:25 featured that has immediate access to the latest models to use on my data in
0:05:28 contexts I couldn't even anticipate.
0:05:31 So it's been kind of, as the platform's become more feature-rich,
0:05:35 we've been able to certainly enable a broader swath of users, maybe
0:05:37 domain experts new to the capability.
0:05:41 But I think broadly speaking, the places that we see the rapid,
0:05:44 most impactful adoption in some ways is actually bringing vision to others that
0:05:46 otherwise may not have had it.
0:05:51 Like what used to be maybe like a multi-quarter PhD-thesis-level investment
0:05:54 now can be something that a team spins up in an afternoon.
0:05:58 And that really has this kind of demand-begets-demand paradigm.
0:06:01 I mean, for example, one of our customers, they produce electric vehicles.
0:06:05 And when you produce an EV inside their general assembly facility,
0:06:08 there’s all sorts of things that you need to make sure you do correctly as you
0:06:12 produce that vehicle, from the safety of the workers doing the work to
0:06:16 the machines that are outputting, say, when you do what’s called stamping,
0:06:19 where you take a piece of steel or aluminum and you press it into the shape
0:06:23 of the outline of the vehicle, and you get potential tears or fissures, or
0:06:26 when you actually assemble the batteries with the correct number of screws.
0:06:30 Basically, every part of building a car is about visually
0:06:33 validating that the thing has been built correctly so that when a customer
0:06:36 drives it, they can do so without any cause for pause.
0:06:40 And just a little bit of computer vision goes a really long way in enabling
0:06:44 that company and many others to very quickly accelerate their goals.
0:06:47 In fact, this company had the goal of producing 1,000 vehicles three years ago,
0:06:50 and they barely did it at 1,012 in that year.
0:06:53 And then they scaled up to 25,000 and now 50,000.
0:06:56 And a lot of that is on the backs of having things that they know they’re
0:06:57 building correctly.
0:07:01 And so we’re really fortunate to be a part of enabling things like this.
0:07:04 So it's kind of like, you could think about it: for a system that doesn't have
0:07:07 a sense of visual understanding, adding even just a little bit of visual
0:07:11 context totally rewrites the way by which you manufacture a car.
0:07:14 And that same revolution is going to take place for
0:07:16 lots of operationally intensive processes, but really
0:07:19 any kind of place where you interact with the world each day.
0:07:23 >> So kind of along those lines, to use the phrase, what untapped opportunities
0:07:27 do you see out there? What's the low-hanging fruit, or maybe the high-hanging fruit,
0:07:30 that you're just excited about when it comes to developing and
0:07:33 deploying computer vision applications.
0:07:37 And we've been talking about it, but obviously talk about Roboflow's
0:07:40 work not just supporting developers, but helping
0:07:42 builders unlock what's next.
0:07:46 >> The amazing thing is actually the expanse of the creativity of developers
0:07:49 and engineers, it’s like if you give someone a new capability,
0:07:53 you almost can’t anticipate all the ways by which they’ll bring that capability
0:07:53 to bear.
0:07:57 So for example, I mean, we have hobbyist folks that'll make things,
0:08:00 like the number of folks that make things that measure the size of fish.
0:08:03 Cuz I think they’re trying to prove to their friends that they caught the biggest
0:08:06 fish, and then you separately have like government agencies that have wanted to
0:08:09 validate the size of salmon during migration patterns.
0:08:12 And so this primitive of like understanding size of fish both has
0:08:14 what seems to be fun and very serious implications.
0:08:17 Or folks that I don’t know, like a friend of mine recently was like,
0:08:21 hey, I wonder how many cars out in San Francisco are actually Waymo’s,
0:08:22 versus like other sorts of cars.
0:08:23 And what does that look like?
0:08:24 What does that track over time?
0:08:28 And so they had a pretty simple Raspberry Pi camera parked on their
0:08:31 windowsill, and in an afternoon, now they have a thing that's counting,
0:08:34 tracking, and keeping a tabulation on how many self-driving cars are making
0:08:37 their way on the road, at least sampled in front of their house each day.
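To make the windowsill project concrete, here is a minimal sketch using Roboflow's open-source inference package. The model ID, camera reference, and API key are placeholders, not the hobbyist's actual setup, and the exact callback shape may vary by package version.

```python
from collections import Counter

from inference import InferencePipeline  # pip install inference

counts = Counter()

def tally(predictions: dict, video_frame) -> None:
    # Tally detections by class label as frames stream in.
    for det in predictions.get("predictions", []):
        counts[det["class"]] += 1
    print(dict(counts))

pipeline = InferencePipeline.init(
    model_id="waymo-spotter/1",       # hypothetical project/version ID
    video_reference=0,                # 0 = local camera; an RTSP URL also works
    on_prediction=tally,
    api_key="YOUR_ROBOFLOW_API_KEY",  # placeholder
)
pipeline.start()
pipeline.join()
```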
0:08:40 >> Right, no, I don't wanna call anybody out, but that's not the same person
0:08:45 who had the video of all the Waymos in the parking lot in the middle of the
0:08:46 night in San Francisco, driving in circles.
0:08:47 No, okay.
0:08:49 >> It wasn’t that guy.
0:08:53 >> Yeah, but I mean, like the use cases are expansive because I don’t know,
0:08:57 like the way we got into this, right, is like we were making AR apps, actually.
0:09:00 And computer vision was critical to the augmented reality,
0:09:02 understanding the system around us.
0:09:04 And we’ve since had folks that make, you know,
0:09:08 board game understanding technology, D&D dice counters,
0:09:10 telling you the best first move you should play in Catan.
0:09:12 And so basically, like you have this like creative population of folks.
0:09:15 Or this one guy during the pandemic, you know, he’s really bored, locked inside.
0:09:19 And he thought maybe his cat needed to get some more exercise.
0:09:22 And he created this system that like attached a laser pointer to a robotic arm.
0:09:26 And with a little bit of vision, he made it so the robotic arm consistently points
0:09:28 the laser pointer 10 feet away from the cat
0:09:31 that's jumping around the living room, and he makes this whole YouTube tutorial.
0:09:33 But then like the thing that's really interesting, right, is that like,
0:09:37 you know a technology has arrived when a hacker can just build
0:09:40 something in a single sitting or maybe in a weekend.
0:09:46 You know that what used to be this very difficult-to-access capability is now
0:09:50 broadly accessible. And that fuels a lot of the similar sort of enterprise
0:09:53 use cases. Like we've got a joke at Roboflow that one person's
0:09:56 hobbyist project is another person's entire business.
0:09:59 So the low hanging fruit, frankly, is like everywhere around us.
0:10:02 Like any sort of visual feed is untapped.
0:10:05 These sorts of images that someone might be collecting and gathering.
0:10:07 I mean, the similar things that give rise to machine learning certainly apply
0:10:11 to vision, where the amount of visual input is doubling year on year, with petabytes
0:10:14 of visual information to be extracted.
0:10:16 So it’s kind of like, if you think about it, you can do it.
0:10:21 That makes me think of an episode we did recently with a surgeon who founded
0:10:26 a surgical data collective and they were using just these, I say stacks.
0:10:28 They weren’t actually videotapes, I’m sure.
0:10:33 But all of this unwatched unused footage from surgeries to train a model
0:10:36 to help train surgeons how to do their jobs better.
0:10:40 But I wanted to ask, the visual inputs that can go
0:10:43 into Roboflow, it doesn't have to be a live camera stream?
0:10:46 You can also use, you know, archival footage and images.
0:10:47 That’s correct.
0:10:47 Yep.
0:10:48 Yep.
0:10:51 So someone may have like a backlog of a bunch of videos.
0:10:54 I mean, for example, we actually had a professional baseball team where
0:10:58 they had a history of a bunch of their videos of pitching sessions and they
0:11:03 wanted to run through and do a key point model to identify various poses and time
0:11:06 of release of the pitch and how does that impact someone’s performance over time.
0:11:09 And so they had all these videos that from the past that they wanted to run through.
0:11:12 And then pretty soon they started to do this for like minor leagues where you
0:11:15 might not have scouts omnipresent, and you certainly don't have broadcasts.
0:11:16 Right, right, right.
0:11:19 You just have like this like kind of low quality footage on like cameras from
0:11:22 various places and being able to produce like sports analytics out of, you know,
0:11:26 this information that’s just locked up otherwise and this unstructured visual
0:11:30 capture is now available for, in this case, building a better baseball team.
0:11:31 Yeah, that’s amazing.
0:11:36 Building community, you know, is something that is both vital to a lot
0:11:39 of companies like tech companies and developer platforms and such.
0:11:43 But it also can be a really hard thing to do, especially to build, you know,
0:11:48 an organic, genuine, robust community serving enterprise clients, let alone
0:11:54 across this seemingly endless sort of swath of industries and use cases and such.
0:11:57 You know, also pretty resource intensive.
0:11:58 So how do you balance both?
0:12:02 How is Roboflow approaching, you know, building that community that you’re
0:12:08 talking about just now with serving, you know, these I’m sure demanding in a good
0:12:10 way, but, you know, demanding enterprise clients.
0:12:13 I think the two actually go hand in hand more than many would anticipate.
0:12:14 Okay.
0:12:18 When you build a community and you build a large set of people that are
0:12:23 interested in creating and using a given platform, you actually give a company
0:12:26 leverage basically like the number of people that are building, creating and
0:12:30 sharing examples of Roboflow from a very early day made us seem probably much
0:12:33 bigger than maybe we were or are.
0:12:36 And so that gives a lot of trust to enterprises.
0:12:40 Like, you know, you want to use something that has gone through its paces and
0:12:43 been battle-tested, something that might be like an industry standard.
0:12:47 And you don’t become an industry standard by only limiting your technology to a
0:12:49 very small swath of people.
0:12:53 You enable anyone to kind of build, learn the paradigm and create.
0:12:58 Now, you’re right that both take a different type of thoughtfulness to be
0:12:59 able to execute on.
0:13:03 So in the context of community building and making products for developers,
0:13:06 a lot of that I think stems from, you know, as an engineer, there's
0:13:09 products that I like using in the ways that I like to use those products.
0:13:12 And I want to enable others to be able to have a similar experience of the
0:13:12 products that we make.
0:13:16 So it's things like providing value before asking for value, having a very
0:13:19 generous free tier, having the ability to highlight the top use cases.
0:13:22 I mean, we have a whole research program where if someone's doing stuff on a
0:13:26 .edu domain, then they have increased access to GPUs.
0:13:30 Roboflow has actually given away over a million dollars of compute and GPU usage
0:13:33 for open source computer vision projects.
0:13:37 And, you know, we actually have this, it’s kind of a funny stat, but 2.1
0:13:40 research papers are published every day, citing Roboflow.
0:13:43 And those are things like people are doing all these sorts of things.
0:13:44 That’s a super cool stat, I think.
0:13:48 Yeah, I mean, it just gives you the context of like, yeah,
0:13:52 that is someone's maybe six or 12 month thesis that they've invested,
0:13:57 and to be able to empower folks to realize what's possible.
0:14:01 And it's really kind of the fulfillment of our mission at its core; like
0:14:05 the impact of visual understanding is bigger than any one company, and anything
0:14:09 that we can do to allow the world to see, expose, and deploy that is important.
0:14:12 Now, on the enterprise side, what we were really talking about is building
0:14:15 a successful kind of go to market motion and making money to invest
0:14:19 further in our mission and enterprises, as you alluded, are very resource
0:14:23 intensive in terms of being able to service those needs successfully.
0:14:26 Even there, though, you actually get leverage by seeing the sorts of
0:14:30 problems, seeing the sorts of fundamental building blocks and then productizing
0:14:33 those building blocks, you know, there have been companies that have come
0:14:37 before Roboflow who have done a great job of be very hands on with enterprise
0:14:40 customers and productizing those capabilities, company like Pivotal or
0:14:45 Palantir or these large companies that have gone from, hey, let’s kind of do
0:14:48 like a bespoke way of making something possible and deploy it more broadly.
0:14:52 Now, we’re not fully, you know, like for like with those businesses.
0:14:56 I more give that as an example to show as someone that is building tooling
0:15:00 and capabilities, worst case is you’re giving the enterprise substantially
0:15:03 more leverage and certainly best case is there’s actually a symbiotic
0:15:06 relationship between enterprises being able to discover how to use the
0:15:10 technology, be able to find guides from the community, be able to find models
0:15:11 they want to start from.
0:15:15 I mean, Roboflow Universe, which is the open source collection of data sets
0:15:18 and models, is the largest collection of computer vision projects on the web.
0:15:22 There’s about 500 million user labeled and shared images and over 200,000
0:15:23 pre-trained models.
0:15:26 And that’s used for the community just as much as enterprise, right?
0:15:28 Like when you say enterprise, like enterprise is people.
0:15:31 And so there’s people inside those companies that are creating and building
0:15:32 some of those capabilities.
0:15:35 Now, operationalizing and ensuring that we deliver the service quality, it’s
0:15:39 just the types of teams you build and the way that you prioritize companies
0:15:39 to be successful.
0:15:42 But we’re really fortunate that, you know, we’re not writing the playbook here.
0:15:47 There’s been a lot of companies that, you know, Mongo or Elastic or Twilio or
0:15:51 lots of post IPO businesses that have shown the pathway to both building
0:15:55 really high quality products that developers and builders love to use
0:15:59 and ensuring that they’re enterprise ready and meeting the needs of high
0:16:02 scale, high complexity, high value use cases.
0:16:04 So you used the word complexity.
0:16:09 And, you know, one of the things that I hear all the time, and I'm
0:16:12 sure you more than me, from people who are trying to build anything, is sort of how
0:16:17 do you balance, you know, creativity and coming up with ways to solve problems.
0:16:20 And particularly if you get into kind of a unique situation and you need to find
0:16:24 a creative answer with things, with not letting things get too complex.
0:16:27 And, you know, in something like computer vision, I’m sure the technical
0:16:30 complexities can, can spin up in a hurry.
0:16:33 What's been, you know, your approach, and what has worked?
0:16:35 How have you found success in balancing that?
0:16:40 Complexity for a product like Roboflow is always a balance.
0:16:44 You want to offer users the capability in the advanced settings and the ability
0:16:46 to make things their own.
0:16:50 While also part of the core value proposition is simplifying.
0:16:53 And so you often think, oh man, those two things must be at odds.
0:16:55 How do you simplify something, but also serve complexity?
0:17:00 And in fact, they're not, especially for products like Roboflow, where it is
0:17:03 for builders; you make it very easy to extend.
0:17:07 You make it very interoperable, meaning there’s open APIs and open SDKs, where
0:17:11 if there’s a certain part of the tool chain that you want to integrate with or
0:17:13 there’s a certain specific enterprise system where you want to read or write
0:17:16 data to, that’s all supported on day one.
0:17:19 And so if you try to kind of boil the ocean of being everything in the
0:17:24 platform all at once on day one, then you can find yourself in a spot where you
0:17:28 may not be able to service the needs of your customers well.
0:17:30 In fact, it’s a bit more step by step.
0:17:33 And that’s where, you know, the devil’s in the details of execution of which
0:17:36 steps you pick first, which sort of problems you best nail for your customers.
0:17:39 But philosophically, it’s really important to us that, for example, when
0:17:43 someone is building a workflow, which is, you know, the combination of a model
0:17:47 and some inputs and outputs, you might ingest maybe like an RTSP stream from
0:17:49 like a live feed of a camera.
0:17:51 Then you might have like a first model. Let's say the problem that
0:17:57 we're solving is we're an inventory company and we're concerned about worker safety.
0:18:00 You might have a first model that’s just like constantly watching all frames to
0:18:03 see if there’s a presence of a person, a very lightweight model kind of run in
0:18:04 the edge.
0:18:07 And then maybe a second model of when there’s a person, you ask a large vision
0:18:10 language model, large VLM, Hey, is there any risks here?
0:18:11 Is there anything to consider?
0:18:13 Should we like look more closely at this?
0:18:17 And then after the VLM, you might have another specific model that's going to
0:18:22 do validation of the type of danger that is interesting, or maybe the specific
0:18:25 area within your store; maybe you're going to connect to another
0:18:26 database that exists.
0:18:29 And then like, based on that, you’re going to write some results somewhere.
0:18:32 And maybe you’re going to write that result to have insights of how frequently
0:18:37 there was a cause for concern within the process that you’re monitoring, just as
0:18:40 much as maybe you're flagging an alert and maybe sending a text or an email or
0:18:43 writing to an enterprise system like SAP to keep track.
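As a rough sketch of that cascade in plain Python: the helpers detect_people, ask_vlm, and flag_for_review below are hypothetical stand-ins for the three stages described above, not Roboflow APIs; only the control flow is the point.

```python
import cv2  # pip install opencv-python

# Hypothetical stand-ins for the stages described above; a real system
# would wrap actual models and an alerting or database integration.
def detect_people(frame) -> bool: ...                 # lightweight edge detector
def ask_vlm(frame, question: str) -> str: ...         # large vision language model
def flag_for_review(frame, reason: str) -> None: ...  # alert / write results

def monitor(rtsp_url: str) -> None:
    cap = cv2.VideoCapture(rtsp_url)  # e.g. "rtsp://camera.local/stream"
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        if not detect_people(frame):   # stage 1: cheap, always-on check
            continue
        answer = ask_vlm(frame, "Are there any safety risks in this scene?")
        if "yes" in answer.lower():    # stage 2: VLM gate on flagged frames
            flag_for_review(frame, answer)  # stage 3: validate, alert, record
    cap.release()
```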
0:18:50 And at each step of that juncture, any one of those nodes, since it's built
0:18:54 on our open source platform, which we call Inference, you can actually mix
0:18:58 and match, write your own custom Python, or call an API one way or another.
0:19:02 And so let’s imagine like a future where someone wanted the ability to write
0:19:04 to a system that we didn’t support yet, like first party.
0:19:05 You’re actually not out of luck.
0:19:09 As long as that system accepts a POST request, you're fine.
0:19:11 And so you have the ability to extend the system.
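A minimal sketch of that escape hatch: a custom step that forwards workflow outputs to an arbitrary HTTP endpoint with a POST request. The endpoint URL and payload shape are illustrative assumptions, and a callback like this could serve as the on_prediction handler in a pipeline like the earlier sketch.

```python
import requests  # pip install requests

def post_results(predictions: dict, video_frame) -> None:
    # Forward each batch of detections to any JSON-accepting endpoint.
    resp = requests.post(
        "https://internal.example.com/vision-events",  # hypothetical system
        json={"detections": predictions.get("predictions", [])},
        timeout=5,
    )
    resp.raise_for_status()  # surface failures instead of silently dropping events
```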
0:19:15 Yeah. And so it's this sort of paradigm of interoperability and making
0:19:17 it easy to use alongside other tools.
0:19:20 And it gets back to your point around servicing builders, just as much as the
0:19:25 enterprise, I actually think those things are really closely interlinked because
0:19:29 you provide the flexibility and choice and ability to make something mine and
0:19:33 build a competency inside the company of what it is I wanted to create and deploy.
0:19:37 Right. The way you frame that makes a lot of sense and makes that link very clear.
0:19:40 We’re speaking with Joseph Nelson.
0:19:43 Joseph is the co-founder and CEO of Roboflow.
0:19:48 And as he's been detailing, Roboflow provides a platform for builders to use
0:19:50 computer vision in what they’re building.
0:19:55 Joseph, you know, in the intro I sort of alluded to all the advances
0:19:58 and buzz around large language models and that kind of thing over the past couple
0:20:02 of years. And I meant to ask, Roboflow was founded in 2020?
0:20:05 Roboflow Inc. was incorporated in 2020.
0:20:06 That’s right. Got it.
0:20:09 And so anyway, kind of fast forwarding to, you know, more recently, the past, I
0:20:15 don’t know, six months, year, whatever it’s been, a lot of buzz around agents,
0:20:17 the idea of agentic AI.
0:20:21 And then, you know, there was buzz, I guess, that the word multimodal was being
0:20:26 flung around kind of more frequently, at least in circles I run in for a while.
0:20:29 And then it sort of dropped off just as, you know, the
0:20:33 consumer models, the Claudes and ChatGPTs and Geminis and what have you
0:20:37 in the world, just started incorporating visual capabilities, both to, you know,
0:20:43 ingest and understand and then to create visual output, voice models, you know,
0:20:45 now getting into short video clips, all that kind of stuff.
0:20:51 What’s your take on the role of multimodal AI integration when it comes to advancing
0:20:55 CV, you know, how is Roboflow kind of positioned to support this?
0:21:02 So multimodality allows an AI system to have even more context than from a
0:21:03 single modality, right?
0:21:07 So if one of our customers is monitoring an industrial process, and let’s say
0:21:12 they’re looking for potentially a leak, maybe in an oil and gas facility, that
0:21:16 leak can manifest itself as, yes, you see something, a product that’s dripping
0:21:17 out and you didn’t expect it to.
0:21:23 It also can manifest itself as you heard a noise or maybe there’s something
0:21:27 about the time dimension of the video that you’re watching as another modality
0:21:28 beyond just the individual images.
0:21:33 Right. And those additional dimensions of data allow the system that you’re
0:21:35 building to have more intelligence.
0:21:39 And so that’s why you see kind of like all these modalities crashing together.
0:21:42 And what it does is it enables our customers to have even more context.
0:21:47 The way we've thought about that is we've actually been building on and using
0:21:50 multimodality since as early as 2021.
0:21:54 So in 2021, there was a model that came out from OpenAI called CLIP,
0:21:57 Contrastive Language-Image Pre-training, which introduced this idea of training
0:21:59 on 400 million image-text pairs.
0:22:03 Can we just associate some words of text with some images?
0:22:06 What this really unlocked for our customers was the ability to do semantic
0:22:10 search, like I could just describe a concept and then I can get back the images
0:22:11 from a video frame or from a given image.
0:22:15 Now that might be interesting for the purposes of building out my model.
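A minimal sketch of that kind of semantic search, using the openly released CLIP weights through Hugging Face transformers; the frame file names and the query string are placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder frames pulled from a video or dataset.
frames = [Image.open(p) for p in ["frame_001.jpg", "frame_002.jpg"]]
inputs = processor(text=["a forklift near a worker"], images=frames,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Rank frames by cosine similarity between image and text embeddings.
img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
scores = (img @ txt.T).squeeze(-1)
print("best match:", scores.argmax().item())
```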
0:22:20 Increasingly, we've been excited by the rise of models that have more
0:22:22 multimodal capabilities on day one.
0:22:26 That comes with its own form of challenges, though, the data preparation,
0:22:30 the evaluation systems, the incorporation of those systems into the other parts
0:22:32 of the pipeline that you’re building.
0:22:36 And so where there’s opportunity to have even more intelligence, there’s also
0:22:41 challenge to incorporating that intelligence, adapting it to your context,
0:22:42 passing it to other sorts of systems.
0:22:47 And so Roboflow, being deep believers in multimodal capabilities very early
0:22:53 on, has continued to make it so that users can capture, use, and process other
0:22:54 modalities of data.
0:22:58 So for example, we support the ability for folks to use vision language models,
0:23:01 VLMs, in the context of problems they're working on, which is typically like
0:23:02 an image text pair.
0:23:08 So if you're using, you know, Qwen2.5-VL, which came out last week, or
0:23:11 Florence-2 from Microsoft, which came out maybe about six months ago, or PaliGemma
0:23:16 2 from Google, these are all multimodal models that have very rich text
0:23:20 understandings and have visual understandings, which makes them very good at,
0:23:22 for example, document understanding.
0:23:26 Like if you just pass a document, there’s both text in the document and a
0:23:27 position in the document.
0:23:30 And so Roboflow is one of the only places, maybe the only place where you can
0:23:35 fine tune and adapt, say, Qwen VL today, which means preparing the data and running
0:23:37 it in the context of the rest of your systems.
0:23:40 And those sorts of capabilities, I think, should only increase and enable our
0:23:44 customers to get more context more quickly from the types of problems that
0:23:44 they’re solving.
0:23:44 Right.
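For a flavor of VLM document understanding, here is a sketch using Microsoft's Florence-2 weights from Hugging Face, assuming its OCR-with-region task prompt suits the document; the file name is a placeholder, and this is separate from Roboflow's own fine-tuning flow.

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base", trust_remote_code=True)
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-base", trust_remote_code=True)

image = Image.open("invoice.png")  # placeholder document image
task = "<OCR_WITH_REGION>"         # extract text plus where it sits on the page
inputs = processor(text=task, images=image, return_tensors="pt")
ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
)
raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
# Convert the raw token string into structured text and bounding regions.
print(processor.post_process_generation(raw, task=task, image_size=image.size))
```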
0:23:48 So I think a lot of these things kind of like are crashing together into just
0:23:52 like AI, like amorphous AI that like has all these capabilities, like you’d expect
0:23:57 it, but as that happens, what’s important is there’s actually still unique parts
0:24:00 of visual needs, right?
0:24:03 Like visual needs require visual tooling in our opinion.
0:24:05 Like you want to see, you want to validate, you need to debug.
0:24:09 You know, the famous adage of a picture being worth a thousand
0:24:11 words is extremely instructive here.
0:24:14 Like you almost can’t anticipate all the ways that the world’s going to look
0:24:17 different than how you drew it up.
0:24:21 Like self-driving cars are kind of the 101 example where, yeah, you think
0:24:24 you can drive, like you have a very simple way of describing what the world’s
0:24:25 going to look like, but I don’t know.
0:24:29 Like let’s take a very narrow part of a self-driving car, stop signs, right?
0:24:31 So, stop signs look universal.
0:24:32 They’re always octagons.
0:24:35 They’re red and they’re really well-mounted on the right side of streets.
0:24:39 Well, what about a school bus, where the stop sign kind of flips out when it
0:24:43 comes on, or what about a gate, where the stop sign's mounted on a
0:24:44 gate and the gate could open and close?
0:24:48 And pretty soon you’re like, wait a second, there’s a lot of cases where a
0:24:50 stop sign isn’t really just a stop sign.
0:24:55 And seeing those cases and triaging and debugging and validating, we think
0:24:59 inherently calls for some specific needs for processing the visual information.
0:25:04 And so we’re laser focused on enabling our customers to benefit from as many
0:25:08 modalities as help them solve their problem while ensuring the visual
0:25:11 dimension in particular is best capitalized on.
0:25:12 Right.
0:25:17 And does that, and I may be showing the limits of my technical understanding here,
0:25:19 so, you know, bear with me if so.
0:25:25 But does that exist as, you know, Roboflow creating these, you know, sort of, as
0:25:30 you said, amorphous, all-crashed-together AI models that have this focus and these
0:25:31 sort of advanced visual capabilities?
0:25:38 Or is it more like chaining a Roboflow-specific model, you know, onto other models?
0:25:43 Commonly you're in a position where you're chaining things together, or you
0:25:46 want things to work in your context, or you want to work in a compute
0:25:47 constrained environment.
0:25:51 Okay, so vision's pretty unique in that, unlike language and a lot of
0:25:55 other places where AI exists, actually vision is almost where humans are not.
0:25:58 Basically, like you want to observe parts of the world where a person isn't present.
0:26:02 Like if you return to our example of like an oil and gas facility where
0:26:06 you’re monitoring pipelines, I mean, there’s tens of thousands of miles of pipeline
0:26:09 and you're certainly not going to have a person stationed every hundred yards
0:26:11 along it; it's just an asinine idea.
0:26:15 And so instead you could have a video feed or visual understanding of maybe key
0:26:20 points where you're most likely to have pressure changes. And to monitor those
0:26:23 key points, you know, you're not necessarily in an internet-connected
0:26:27 environment, you're in an operationally intensive environment where even if you
0:26:30 did have internet, it might not make sense to stream the video to the cloud.
0:26:33 So basically where you get to is you’re probably running something at the edge
0:26:36 because it makes sense to co-locate your compute and that’s where like a lot of
0:26:39 our customers, for example, using NVIDIA Jetsons, they're very excited about
0:26:44 Project DIGITS that was announced at CES, to make it so that you can bring these highly
0:26:48 capable models to co-locate alongside where their problem kind of exists.
0:26:50 Now, why does that matter?
0:26:54 That matters because you can’t always have the largest, most general model
0:26:56 running in those environments at real time.
0:27:00 I think this is part of, you know, a statement of the way the world
0:27:04 looks today versus how it'll look at 24, 36, and 48 months.
0:27:07 But I do think that over time, even as model capabilities advance and you can
0:27:10 get more and more distilled at the edge, there’s I think always going to be
0:27:14 somewhat of a lag between if I’m operating in an environment where I’m
0:27:17 fully compute unbounded, or at least comparatively unbounded in the cloud,
0:27:20 versus an environment where I am a bit more compute bounded.
0:27:25 And so that capability gap requires specialization and capability to work best
0:27:27 for that domain context problem.
0:27:31 So a lot of Roboflow users and a lot of customers and a lot of deployments tend
0:27:34 to be in environments like those, not all, but certainly some.
0:27:39 All right, shift gears here for a moment before we wrap up.
0:27:41 Joseph, you’re a multi-time founder, correct?
0:27:45 Yeah, maybe to kind of set this up, you can just kind of run through a little
0:27:48 bit your experience as an entrepreneur.
0:27:49 What was the first company you founded?
0:27:53 Well, the very first company was a T-shirt business in high school.
0:27:56 Nice. I don’t know that it was founded, there’s never an LLC.
0:27:58 I don’t even know my parents knew about it.
0:28:02 But there is that in a university.
0:28:08 I ran a satirical newspaper and sold ads on the ad space for it and date myself here.
0:28:11 But Uber was just rolling out to campuses at that time.
0:28:14 So I had my Uber referral code and had like free Ubers for a year for like all
0:28:16 the number of folks that discovered it.
0:28:20 I kind of joked my first company that maybe the closest thing to a real business
0:28:24 beyond these side projects was a business that I started my last year of university
0:28:27 and ran for three years before a larger company acquired it.
0:28:29 And I went to school in Washington, D.C.
0:28:34 I had interned on Capitol Hill once upon a time and I was working at Facebook
0:28:38 my last year of university and was brought back to Capitol Hill and realized
0:28:41 that like a lot of the technical problem or a lot of the problems,
0:28:44 operational problems that could be solved with technology still existed.
0:28:48 One of those is Congress gets 80 million messages a year and interns sort through
0:28:51 that mail. And this was, you know, 2015.
0:28:55 So we said, Hey, what if we use natural language processing to accelerate
0:28:57 the rate at which Congress hears from its constituents?
0:29:01 And in doing so, we improve the world's most powerful democracy's
0:29:02 customer success center.
0:29:07 And so that grew into a business that I ran for about three years and we had a tight
0:29:10 integration with another product that was a CRM for these congressional offices
0:29:13 and that company, called Fireside 21, acquired the business and rolled it out
0:29:15 to all of their customers.
0:29:19 That was a bootstrapped company, you know, nine employees at peak, a relatively
0:29:23 mission-driven thing that we wanted to build, and solve a problem that we knew
0:29:26 should be solved, which is improving the efficacy of Congress.
0:29:28 How big is Roboflow?
0:29:28 How many employees?
0:29:31 Well, I tell the team, whenever I answer that question, I start with,
0:29:33 we’ve helped a million developers so far.
0:29:37 So that's how big we are, team-wise.
0:29:40 Team-wise doesn't necessarily mean, you know, it can mean any number of things.
0:29:43 Yeah, yeah, we’re growing quickly.
0:29:43 Excellent.
0:29:49 As we're recording this, this one's going to get out before GTC 2025, coming
0:29:52 up in mid-March down in San Jose, as always.
0:29:54 And Joseph, Roboflow is going to be there.
0:29:55 Yeah, we’ll be there.
0:29:56 I mean, GTC has become the Super Bowl of AI.
0:29:57 Right.
0:30:01 Any hints, any teasers you can give of what you’ll be showing off?
0:30:05 We have a few announcements of some things that we’ll be releasing.
0:30:08 I can give listeners a sneak peek to a couple of them.
0:30:13 One thing that we’ve been working pretty heavily on is the ability to chain models
0:30:16 together, understand their outputs, connect to other systems.
0:30:21 And from following our customers, it turns out what we kind of built is a system
0:30:26 for building visual agents and increasingly as there’s a strong drive around
0:30:29 agentic systems, which is, you know, more than just a model.
0:30:33 It’s also memory and action and tool use and loops.
0:30:38 Users can now create and build and deploy visual agents to monitor a camera feed
0:30:42 or process a bunch of images or make sense of any sort of visual input in a
0:30:45 very streamlined, straightforward way using our open source tooling in a
0:30:46 loginless way.
0:30:50 And so that’s one area that we’re excited to show more about soon.
0:30:55 In partnership with NVIDIA and the inception program, we’re actually
0:30:59 releasing a couple of new advancements in the research field.
0:31:03 So without giving exactly what those are, I'll give you some parameters of what
0:31:09 to expect. At CVPR in 2023, Roboflow released something called RF100, where
0:31:12 the premise is that for computer vision to realize its full potential, the models
0:31:15 need to be able to understand novel environments.
0:31:15 Right.
0:31:18 So if you think about a scene, you think about maybe people in a restaurant,
0:31:21 or you think about like a given football game or something like this.
0:31:22 Yeah, yeah.
0:31:24 But the world is much bigger than just where people are.
0:31:25 Like you have like documents to understand.
0:31:27 You have aerial images.
0:31:28 You have things under microscopes.
0:31:30 You have agricultural problems.
0:31:31 You have galaxies.
0:31:36 You have digital environments. And RF100, which we released, is sampling
0:31:40 from the Roboflow Universe a basket of a hundred datasets that allows
0:31:45 researchers to benchmark: how well does my model do in novel contexts?
0:31:47 And so we released that in '23.
0:31:52 And since then, labs like Facebook, Apple, Baidu, Microsoft, and NVIDIA's Omniverse
0:31:55 team have benchmarked what is possible.
0:31:59 Now, the Roboflow Universe has grown precipitously since then, as
0:32:02 have the types of challenges that people are trying to solve with computer
0:32:07 vision. And so we’re ready to show what the next evolution of advancing
0:32:11 visual understanding and benchmarking understanding might look like at GTC.
0:32:16 And then a second thing we've been thinking a lot about is that the advent of
0:32:21 transformers and the ability for models to have really rich pre-trainings
0:32:24 allows you to kind of start at the end, so to speak, with a model and its
0:32:31 understanding, but that hasn't fully made its way, as impactfully as it can, to vision,
0:32:35 meaning like how can you bring a lot of the pre-trained capabilities, especially
0:32:37 to vision models running on the edge.
0:32:41 And so we've been pretty excited about how you marry the benefits of
0:32:45 pre-trained models, which allow you to generalize better, with the benefits of
0:32:46 running things in real time.
0:32:51 And so actually this is where NVIDIA and Roboflow have been able to pair up
0:32:55 pretty closely on something that we'll introduce, and I'll leave it at that for
0:32:57 folks to see if they're interested to learn more.
0:33:00 All right, I’m signed up.
0:33:01 I’m interested, can’t wait.
0:33:05 So you’ve done this a few times and you know, one way or another, I’m sure
0:33:09 you’ll do it again going forward and you know, scaled up and all that good stuff.
0:33:14 Lessons learned advice you can share, you know, for founders, for people out
0:33:18 there thinking about and you know, whether it’s CV related or not.
0:33:19 What does it take?
0:33:22 What goes into being, you know, being a good leader, building a business,
0:33:27 taking an idea, seeing it through to a product that, you know, serves humans
0:33:29 as well as solving a problem.
0:33:32 What wisdom can you drop here on listeners thinking about their own
0:33:34 entrepreneurial pursuits?
0:33:36 One thing that I'll note is you said you'll do it again.
0:33:40 I'm actually very vocal about the fact that Roboflow is the last company
0:33:44 that I'll ever need to start; it's like a lifetime's worth of work by itself.
0:33:48 As soon as I said it, I was like, I don’t know that.
0:33:49 He doesn’t know that.
0:33:50 And what if that comes off?
0:33:52 Like Roboflow is not going to, I was thinking about, oh, your last
0:33:55 company got acquired and so on and so forth, but that’s great.
0:33:59 I mean, that’s like in and of itself, you know, I suppose that could be
0:34:03 turned into something of a motto for aspiring entrepreneurs or what have you.
0:34:07 But that’s instructive actually for your question because I think a lot of people,
0:34:11 you know, you should think about the mission and the challenge that you're taking on.
0:34:13 You know, people say commonly like, oh, you're marrying yourself to it
0:34:17 for 10 years, but I think even that is perhaps too short of a time horizon.
0:34:21 It's: what is something, like a problem space, that you can work on
0:34:25 excitedly where the world is different as a result of your efforts?
0:34:28 I will also note that, you know, what does it take?
0:34:29 How do you figure it out?
0:34:30 I’m still figuring it out myself.
0:34:32 There’s like new stuff to learn every single day.
0:34:37 And I can’t wait for like every two years when I look back and just sort
0:34:40 of cringe at the ways that I did things at that point in time.
0:34:44 But I think that, you know, the attributes that allow people to do well in startups,
0:34:49 whether they’re working in one, starting one, interacting with one is a deep sense
0:34:55 of grit and diligence and passion for the thing that you're working on.
0:35:00 Like the world doesn't change by itself, and it's also quite a malleable place.
0:35:05 And so having the wherewithal and the aptitude and the excitement and vigor
0:35:12 to shape the world the way by which one thinks is possible requires a lot of drive
0:35:16 and determination. And so, you know, it's: work with people, work in environments,
0:35:22 work on problems where, if you have that problem changed with that team, and the
0:35:27 result of that company that you're working with continues to be realized,
0:35:28 what does that world look like?
0:35:29 Does that excite you?
0:35:33 And does it give you the ability to say independently, I would want to day in
0:35:37 and day out, give it my best to ensure and realize the full potential here.
0:35:41 And when you start to think about your time that way of something that is a
0:35:46 mission and important and time that you want to enjoy with the team, with the
0:35:50 customers, with the problems to be solved, the journey becomes the destination
0:35:51 in a lot of ways.
0:35:53 And so that allows you to play infinite games.
0:35:57 It allows you to just be really focused on the key things that matter and
0:36:01 delivering customer value and making products people love to use.
0:36:03 And so I think that’s fairly universal.
0:36:06 Now, in terms of specific advice, one thing or another, there's a funny
0:36:11 paradox where advice needs to be adjusted to the priors of one's situation.
0:36:14 It's almost like the more universally useful the piece of advice is, perhaps
0:36:17 the less novel and insightful it might be.
0:36:17 Right.
0:36:21 Here I’ll note that I pretty regularly learn from those that are a few stages
0:36:23 ahead of me and aim to pay that favor forward.
0:36:27 So I’m always happy to be a resource for folks that are building or navigating
0:36:30 career decisions or thinking about what to work on and build next.
0:36:32 So I’m pretty findable online and welcome that from listeners.
0:36:33 Fantastic.
0:36:38 So let’s just go with that segue then for folks listening who want to learn
0:36:43 more about Roboflow, want to try Roboflow, want to hit you up for advice
0:36:45 on working at or with a startup.
0:36:47 Where should they go online?
0:36:50 Company sites, social medias, where can listeners go to learn more?
0:36:54 Roboflow.com is where you can sign up and build on the platform.
0:36:55 We have a careers page.
0:36:59 If you're generally interested in startups, workatastartup.com is YC's
0:37:02 job board and we've hired a lot of folks from there.
0:37:03 So that’s a great resource.
0:37:10 I'm accessible online on Twitter, or X, @JosephofIowa, and regularly share
0:37:11 a bit about what we’re working on.
0:37:13 And I’m very happy to be a resource.
0:37:16 If you’re in San Francisco and you’re listening to this, you might be surprised
0:37:18 that sometimes I’ll randomly tweet out when we’re welcoming folks to come
0:37:21 co-work out of our office on some Saturdays and Sundays.
0:37:23 So feel free to reach out.
0:37:24 Excellent.
0:37:25 Joseph Nelson, Roboflow.
0:37:27 This is a great conversation.
0:37:29 Thank you so much for taking the time.
0:37:33 And, you know, as you well articulated, the work that you and your teams are
0:37:39 doing is not only fascinating, but it applies to so much of what we do on
0:37:40 the earth, right, and beyond the earth.
0:37:45 So all the best of luck in everything that you and your growing community are doing.
0:37:46 Really appreciate it.
0:37:48 [MUSIC]

Joseph Nelson, co-founder and CEO of Roboflow, discusses how the company is making computer vision accessible to millions of developers and industries, from manufacturing to healthcare and more. 
