Roboflow Simplifies Computer Vision for Developers and the Enterprise – Ep. 248

AI transcript
0:00:10 [MUSIC]
0:00:13 Hello, and welcome to the NVIDIA AI podcast.
0:00:15 I’m your host, Noah Kravitz.
0:00:19 90% of the information transmitted to human brains is visual.
0:00:22 So while advances related to large language models and
0:00:26 other language processing technology have pushed the frontier of AI forward in
0:00:28 a hurry over the past few years,
0:00:32 visual information is integral for AI to interact with the physical world,
0:00:34 which is where computer vision comes in.
0:00:37 Roboflow empowers developers of all skill sets and
0:00:41 experience levels to build their own computer vision applications.
0:00:44 The company's platform addresses the universal pain points developers face
0:00:47 when building CV models, from data management to deployment.
0:00:51 Roboflow is currently used by over 16,000 organizations and
0:00:55 half the Fortune 100, totaling over 1 million developers.
0:00:59 And they’re a member of NVIDIA’s Inception program for startups.
0:01:03 Roboflow co-founder and CEO, Joseph Nelson, is with us today to talk about
0:01:08 his company’s mission to transform industries by democratizing computer vision.
0:01:09 So let’s jump into it.
0:01:13 Joseph, welcome, and thank you for joining the NVIDIA AI podcast.
0:01:14 >> Thanks so much for having me.
0:01:19 >> I’m excited to talk CV, there’s nothing against language, love language.
0:01:22 But there’s been a lot of language stuff lately, which is great.
0:01:25 But I’m excited to hear about Roboflow.
0:01:26 So let’s jump into it.
0:01:30 Maybe you can start by talking a little bit more about your mission and
0:01:34 what democratizing computer vision means and making the world programmable.
0:01:37 >> At the highest level, as you just described,
0:01:40 the vast majority of information that humans process happens to be visual
0:01:41 information.
0:01:45 In fact, I mean, humans, we had our sense of sight before we even created language.
0:01:46 It’s how we understand the world.
0:01:50 It’s how we understand the things around us, it’s how we engage with the world.
0:01:56 And because of that, I think there’s this massive untapped potential
0:01:59 to have technology and systems have visual understanding in
0:02:04 a similar way that humans do, all across, really, we say, the universe.
0:02:07 So when we say our North Star is to make the world programmable,
0:02:09 what we really mean is that any scene and
0:02:13 anything will have software that understands it.
0:02:14 And when you have software that understands something,
0:02:16 you can improve that system.
0:02:18 You can make it more efficient, you can make it more entertaining,
0:02:20 you can make it more engaging.
0:02:23 I mean, at Roboflow, we’ve seen folks build things from understanding
0:02:27 cell populations under a microscope all the way to discovering new galaxies
0:02:28 through a telescope.
0:02:31 And everything in between is where vision and video and
0:02:33 understanding comes into play.
0:02:36 So if this AI revolution is to reach its full potential,
0:02:38 it needs to make contact with the real world.
0:02:42 And it turns out the real world is one that is very visually rich and
0:02:43 needs to be understood.
0:02:47 So we build the tools, the platform, and the community to accelerate that transition.
0:02:51 So maybe you can tell us a little bit about the platform and
0:02:55 kind of within that how your mission and North Star kind of shape the way you
0:02:57 develop products and build out user experiences.
0:03:02 And I should mention a great shout out from Jensen in the CES keynote earlier
0:03:04 this year for you guys.
0:03:06 And you raised Series B late last year.
0:03:10 So I want to congratulate you on that as well before I forget.
0:03:10 >> I appreciate it.
0:03:12 Yeah, it’s a good start.
0:03:15 I mean, as you said, a million developers, but there’s 70 million developers out there.
0:03:18 There’s billions that will benefit from having visual understanding.
0:03:21 I mean, in fact, in that Jensen shout out, just like maybe the sentence or
0:03:25 two before he described some of the visual partners that are fortunate to work
0:03:26 with the NVIDIA team.
0:03:30 He described that the way NVIDIA sees it is that global GDP is $100 trillion,
0:03:35 and he describes that visual understanding is like $50 trillion of that opportunity.
0:03:39 So basically half of all global GDP is predicated on these
0:03:43 operationally intensive, vision-centric, autonomy-based sorts of use cases.
0:03:47 And so, the level of impact that visual understanding will have in the world is
0:03:50 just a fraction of what it will look like as we progress.
0:03:53 Now, in terms of like how we think about doing that, so
0:03:56 it’s really about empowering the builders and giving the certainty and
0:03:58 capability to the enterprise.
0:04:02 So for example, anyone that’s building a system for visual understanding often
0:04:07 needs to have some form of visual input, camera, video, something like this.
0:04:08 >> Sure.
0:04:10 >> You need to have a model because the model is going to help you act,
0:04:15 understand, and react to whatever actionable insight you want to understand.
0:04:17 And then you want to run that model somewhere, you want to deploy it.
0:04:21 And commonly, you even want to chain together models or
0:04:23 have a system that triggers some alerts or
0:04:26 some results based on information that it’s understanding.
0:04:29 So Roboflow provides the building blocks, the platform, and
0:04:34 the solutions so that over a million developers and half the Fortune 100
0:04:37 have what they need to deploy these tools to production.
0:04:41 And you're doing it, as I kind of mentioned in the intro.
0:04:45 Trying to make the platform available for folks who are deep into this,
0:04:48 have been doing CV and working with machine learning for a while.
0:04:51 And then also folks who might be new to this, they can get up and
0:04:55 running and work with CV, build that into their toolkit.
0:05:00 >> Yeah, the emphasis has always been kind of on someone that wants to be a builder.
0:05:04 That definition is expanding with the capabilities of code generation,
0:05:05 prompting to apps.
0:05:08 We've always kind of been bent on this idea of serving those that
0:05:11 want to create, use, and distribute software.
0:05:14 What’s funny is that when we very first launched some of our first products,
0:05:16 ML teams initially were kind of like, I don’t know,
0:05:18 this seems pretty pedestrian, I know exactly what to do.
0:05:21 And fast forward now, and it's like, whoa, a platform that's fully
0:05:25 featured that has immediate access to the latest models to use on my data in
0:05:28 contexts I couldn't even anticipate.
0:05:31 So it's been kind of, as the platform's become more feature-rich,
0:05:35 we've been able to certainly enable a broader swath of users, maybe
0:05:37 domain experts new to the capability.
0:05:41 But I think broadly speaking, the places that we see the rapid,
0:05:44 most impactful adoption in some ways is actually bringing vision to others that
0:05:46 otherwise may not have had it.
0:05:51 Like what used to be maybe like a multi-quarter PhD-thesis-level investment
0:05:54 now can be something that a team spins up in an afternoon.
0:05:58 And that really has this kind of demand-begets-demand paradigm.
0:06:01 I mean, for example, one of our customers, they produce electric vehicles.
0:06:05 And when you produce an EV inside their general assembly facility,
0:06:08 there’s all sorts of things that you need to make sure you do correctly as you
0:06:12 produce that vehicle, from the safety of the workers doing the work to
0:06:16 the machines that are outputting, say, when you do what’s called stamping,
0:06:19 where you take a piece of steel or aluminum and you press it into the shape
0:06:23 of the outline of the vehicle, and you get potential tears or fissures, or
0:06:26 when you actually assemble the batteries with the correct number of screws.
0:06:30 Basically, every part of building a car is about visually
0:06:33 validating that the thing has been built correctly so that when a customer
0:06:36 drives it, they can do so without any cause for pause.
0:06:40 And just a little bit of computer vision goes a really long way in enabling
0:06:44 that company and many others to very quickly accelerate their goals.
0:06:47 In fact, this company had the goal of producing 1,000 vehicles three years ago,
0:06:50 and they barely did it at 1,012 in that year.
0:06:53 And then they scaled up to 25,000 and now 50,000.
0:06:56 And a lot of that is on the backs of having things that they know they’re
0:06:57 building correctly.
0:07:01 And so we’re really fortunate to be a part of enabling things like this.
0:07:04 So it's kind of like, you could think about it: for a system that doesn't have
0:07:07 a sense of visual understanding, adding even just a little bit of visual
0:07:11 context totally rewrites the way by which you manufacture a car.
0:07:14 And that same revolution is going to take place for
0:07:16 lots of operationally intensive processes, but really
0:07:19 any kind of place where you interact with the world each day.
0:07:23 >> So kind of along those lines, to use the phrase, what untapped opportunities
0:07:27 do you see out there? What's the low-hanging fruit, or maybe the high-hanging fruit,
0:07:30 that you're just excited about when it comes to developing and
0:07:33 deploying computer vision applications.
0:07:37 And we've been talking about it, but obviously talk about Roboflow's
0:07:40 work not just supporting developers, but helping
0:07:42 builders unlock what's next.
0:07:46 >> The amazing thing is actually the expanse of the creativity of developers
0:07:49 and engineers, it’s like if you give someone a new capability,
0:07:53 you almost can’t anticipate all the ways by which they’ll bring that capability
0:07:53 to bear.
0:07:57 So for example, I mean, we have hobbyist folks that'll make things,
0:08:00 like the number of folks that make things that measure the size of fish.
0:08:03 Cuz I think they’re trying to prove to their friends that they caught the biggest
0:08:06 fish, and then you separately have like government agencies that have wanted to
0:08:09 validate the size of salmon during migration patterns.
0:08:12 And so this primitive of like understanding size of fish both has
0:08:14 what seems to be fun and very serious implications.
0:08:17 Or folks that I don’t know, like a friend of mine recently was like,
0:08:21 hey, I wonder how many cars out in San Francisco are actually Waymo’s,
0:08:22 versus like other sorts of cars.
0:08:23 And what does that look like?
0:08:24 What does that track over time?
0:08:28 And so they had a pretty simple Raspberry Pi camera parked on their
0:08:31 windowsill, and in an afternoon, now they have a thing that's counting,
0:08:34 tracking, and keeping a tabulation on how many self-driving cars are making
0:08:37 their way on the road, at least sampled in front of their house each day.
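To make the windowsill project concrete, here is a minimal sketch using Roboflow's open-source inference package. The model ID, camera reference, and API key are placeholders, not the hobbyist's actual setup, and the exact callback shape may vary by package version.

```python
from collections import Counter

from inference import InferencePipeline  # pip install inference

counts = Counter()

def tally(predictions: dict, video_frame) -> None:
    # Tally detections by class label as frames stream in.
    for det in predictions.get("predictions", []):
        counts[det["class"]] += 1
    print(dict(counts))

pipeline = InferencePipeline.init(
    model_id="waymo-spotter/1",       # hypothetical project/version ID
    video_reference=0,                # 0 = local camera; an RTSP URL also works
    on_prediction=tally,
    api_key="YOUR_ROBOFLOW_API_KEY",  # placeholder
)
pipeline.start()
pipeline.join()
```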
0:08:40 >> Right, no, I don't wanna call anybody out, but that's not the same person
0:08:45 who had the video of all the Waymos in the parking lot in the middle of the
0:08:46 night in San Francisco, driving in circles.
0:08:47 No, okay.
0:08:49 >> It wasn’t that guy.
0:08:53 >> Yeah, but I mean, like the use cases are expansive because I don’t know,
0:08:57 like the way we got into this, right, is like we were making AR apps, actually.
0:09:00 And computer vision was critical to the augmented reality,
0:09:02 understanding the system around us.
0:09:04 And we’ve since had folks that make, you know,
0:09:08 board game understanding technology, D&D dice counters,
0:09:10 telling you the best first move you should play in Catan.
0:09:12 And so basically, like you have this like creative population of folks.
0:09:15 Or this one guy during the pandemic, you know, he’s really bored, locked inside.
0:09:19 And he thought maybe his cat needed to get some more exercise.
0:09:22 And he created this system that like attached a laser pointer to a robotic arm.
0:09:26 And with a little bit of vision, he made it so the robotic arm consistently points
0:09:28 the laser pointer 10 feet away from the cat
0:09:31 that's jumping around the living room, and he makes this whole YouTube tutorial.
0:09:33 But then like the thing that's really interesting, right, is that like,
0:09:37 you know a technology has arrived when a hacker can just build
0:09:40 something in a single sitting or maybe in a weekend.
0:09:46 You know that what used to be this very difficult-to-access capability is now
0:09:50 broadly accessible. And that fuels a lot of the similar sort of enterprise
0:09:53 use cases. Like we've got a joke at Roboflow that one person's
0:09:56 hobbyist project is another person's entire business.
0:09:59 So the low hanging fruit, frankly, is like everywhere around us.
0:10:02 Like any sort of visual feed is untapped.
0:10:05 These sorts of images that someone might be collecting and gathering.
0:10:07 I mean, the similar things that give rise to machine learning certainly apply
0:10:11 to vision, where the amount of visual input is doubling year on year, with petabytes
0:10:14 of visual information to be extracted.
0:10:16 So it’s kind of like, if you think about it, you can do it.
0:10:21 That makes me think of an episode we did recently with a surgeon who founded
0:10:26 a surgical data collective and they were using just these, I say stacks.
0:10:28 They weren’t actually videotapes, I’m sure.
0:10:33 But all of this unwatched unused footage from surgeries to train a model
0:10:36 to help train surgeons how to do their jobs better.
0:10:40 But I wanted to ask, the visual inputs that can go
0:10:43 into Roboflow, it doesn't have to be a live camera stream?
0:10:46 You can also use, you know, archival footage and images.
0:10:47 That’s correct.
0:10:47 Yep.
0:10:48 Yep.
0:10:51 So someone may have like a backlog of a bunch of videos.
0:10:54 I mean, for example, we actually had a professional baseball team where
0:10:58 they had a history of a bunch of their videos of pitching sessions and they
0:11:03 wanted to run through and do a key point model to identify various poses and time
0:11:06 of release of the pitch and how does that impact someone’s performance over time.
0:11:09 And so they had all these videos that from the past that they wanted to run through.
0:11:12 And then pretty soon they started to do this for like minor leagues where you
0:11:15 might not have scouts omnipresent, and you certainly don't have broadcasts.
0:11:16 Right, right, right.
0:11:19 You just have like this like kind of low quality footage on like cameras from
0:11:22 various places and being able to produce like sports analytics out of, you know,
0:11:26 this information that’s just locked up otherwise and this unstructured visual
0:11:30 capture is now available for, in this case, building a better baseball team.
0:11:31 Yeah, that’s amazing.
0:11:36 Building community, you know, is something that is both vital to a lot
0:11:39 of companies like tech companies and developer platforms and such.
0:11:43 But it also can be a really hard thing to do, especially to build, you know,
0:11:48 an organic, genuine, robust community serving enterprise clients, let alone
0:11:54 across this seemingly endless sort of swath of industries and use cases and such.
0:11:57 You know, also pretty resource intensive.
0:11:58 So how do you balance both?
0:12:02 How is Roboflow approaching, you know, building that community that you’re
0:12:08 talking about just now with serving, you know, these I’m sure demanding in a good
0:12:10 way, but, you know, demanding enterprise clients.
0:12:13 I think the two actually go hand in hand more than many would anticipate.
0:12:14 Okay.
0:12:18 When you build a community and you build a large set of people that are
0:12:23 interested in creating and using a given platform, you actually give a company
0:12:26 leverage basically like the number of people that are building, creating and
0:12:30 sharing examples of Roboflow from a very early day made us seem probably much
0:12:33 bigger than maybe we were or are.
0:12:36 And so that gives a lot of trust to enterprises.
0:12:40 Like, you know, you want to use something that has gone through its paces and
0:12:43 been battle-tested, something that might be like an industry standard.
0:12:47 And you don’t become an industry standard by only limiting your technology to a
0:12:49 very small swath of people.
0:12:53 You enable anyone to kind of build, learn the paradigm and create.
0:12:58 Now, you’re right that both take a different type of thoughtfulness to be
0:12:59 able to execute on.
0:13:03 So in the context of community building and making products for developers,
0:13:06 a lot of that I think stems from, you know, as an engineer, there's
0:13:09 products that I like using in the ways that I like to use those products.
0:13:12 And I want to enable others to be able to have a similar experience of the
0:13:12 products that we make.
0:13:16 So it's things like providing value before asking for value, having a very
0:13:19 generous free tier, having the ability to highlight the top use cases.
0:13:22 I mean, we have a whole research program where if someone's doing stuff on a
0:13:26 .edu domain, then they have increased access to GPUs.
0:13:30 Roboflow has actually given away over a million dollars of compute and GPU usage
0:13:33 for open source computer vision projects.
0:13:37 And, you know, we actually have this, it’s kind of a funny stat, but 2.1
0:13:40 research papers are published every day, citing Roboflow.
0:13:43 And those are things like people are doing all these sorts of things.
0:13:44 That’s a super cool stat, I think.
0:13:48 Yeah, I mean, it just gives you the context of like, yeah,
0:13:52 that is someone's maybe six or 12 month thesis that they've invested,
0:13:57 and to be able to empower folks to realize what's possible.
0:14:01 And it's really kind of the fulfillment of our mission at its core; like
0:14:05 the impact of visual understanding is bigger than any one company, and anything
0:14:09 that we can do to allow the world to see, expose, and deploy that is important.
0:14:12 Now, on the enterprise side, what we were really talking about is building
0:14:15 a successful kind of go to market motion and making money to invest
0:14:19 further in our mission and enterprises, as you alluded, are very resource
0:14:23 intensive in terms of being able to service those needs successfully.
0:14:26 Even there, though, you actually get leverage by seeing the sorts of
0:14:30 problems, seeing the sorts of fundamental building blocks and then productizing
0:14:33 those building blocks, you know, there have been companies that have come
0:14:37 before Roboflow who have done a great job of be very hands on with enterprise
0:14:40 customers and productizing those capabilities, company like Pivotal or
0:14:45 Palantir or these large companies that have gone from, hey, let’s kind of do
0:14:48 like a bespoke way of making something possible and deploy it more broadly.
0:14:52 Now, we’re not fully, you know, like for like with those businesses.
0:14:56 I more give that as an example to show as someone that is building tooling
0:15:00 and capabilities, worst case is you’re giving the enterprise substantially
0:15:03 more leverage and certainly best case is there’s actually a symbiotic
0:15:06 relationship between enterprises being able to discover how to use the
0:15:10 technology, be able to find guides from the community, be able to find models
0:15:11 they want to start from.
0:15:15 I mean, Roboflow Universe, which is the open source collection of data sets
0:15:18 and models, is the largest collection of computer vision projects on the web.
0:15:22 There’s about 500 million user labeled and shared images and over 200,000
0:15:23 pre-trained models.
0:15:26 And that’s used for the community just as much as enterprise, right?
0:15:28 Like when you say enterprise, like enterprise is people.
0:15:31 And so there’s people inside those companies that are creating and building
0:15:32 some of those capabilities.
0:15:35 Now, operationalizing and ensuring that we deliver the service quality, it’s
0:15:39 just the types of teams you build and the way that you prioritize companies
0:15:39 to be successful.
0:15:42 But we’re really fortunate that, you know, we’re not writing the playbook here.
0:15:47 There’s been a lot of companies that, you know, Mongo or Elastic or Twilio or
0:15:51 lots of post IPO businesses that have shown the pathway to both building
0:15:55 really high quality products that developers and builders love to use
0:15:59 and ensuring that they’re enterprise ready and meeting the needs of high
0:16:02 scale, high complexity, high value use cases.
0:16:04 So you used the word complexity.
0:16:09 And, you know, one of the things that I hear all the time, and I'm
0:16:12 sure you more than me, from people who are trying to build anything, is sort of how
0:16:17 do you balance, you know, creativity and coming up with ways to solve problems.
0:16:20 And particularly if you get into kind of a unique situation and you need to find
0:16:24 a creative answer with things, with not letting things get too complex.
0:16:27 And, you know, in something like computer vision, I’m sure the technical
0:16:30 complexities can, can spin up in a hurry.
0:16:33 What's been, you know, your approach, and what has worked?
0:16:35 How have you found success in balancing that?
0:16:40 Complexity for a product like Roboflow is always a balance.
0:16:44 You want to offer users the capability in the advanced settings and the ability
0:16:46 to make things their own.
0:16:50 While also part of the core value proposition is simplifying.
0:16:53 And so you often think, oh man, those two things must be at odds.
0:16:55 How do you simplify something, but also serve complexity?
0:17:00 And in fact, they're not, especially for products like Roboflow, where it is
0:17:03 for builders; you make it very easy to extend.
0:17:07 You make it very interoperable, meaning there’s open APIs and open SDKs, where
0:17:11 if there’s a certain part of the tool chain that you want to integrate with or
0:17:13 there’s a certain specific enterprise system where you want to read or write
0:17:16 data to, that’s all supported on day one.
0:17:19 And so if you try to kind of boil the ocean of being everything in the
0:17:24 platform all at once on day one, then you can find yourself in a spot where you
0:17:28 may not be able to service the needs of your customers well.
0:17:30 In fact, it’s a bit more step by step.
0:17:33 And that’s where, you know, the devil’s in the details of execution of which
0:17:36 steps you pick first, which sort of problems you best nail for your customers.
0:17:39 But philosophically, it’s really important to us that, for example, when
0:17:43 someone is building a workflow, which is, you know, the combination of a model
0:17:47 and some inputs and outputs, you might ingest maybe like an RTSP stream from
0:17:49 like a live feed of a camera.
0:17:51 Then you might have like a first model. Let's say the problem that
0:17:57 we're solving is we're an inventory company and we're concerned about worker safety.
0:18:00 You might have a first model that’s just like constantly watching all frames to
0:18:03 see if there’s a presence of a person, a very lightweight model kind of run in
0:18:04 the edge.
0:18:07 And then maybe a second model of when there’s a person, you ask a large vision
0:18:10 language model, large VLM, Hey, is there any risks here?
0:18:11 Is there anything to consider?
0:18:13 Should we like look more closely at this?
0:18:17 And then after the VLM, you might have another specific model that's going to
0:18:22 do validation of the type of danger that is interesting, or maybe the specific
0:18:25 area within your store; maybe you're going to connect to another
0:18:26 database that exists.
0:18:29 And then like, based on that, you’re going to write some results somewhere.
0:18:32 And maybe you’re going to write that result to have insights of how frequently
0:18:37 there was a cause for concern within the process that you’re monitoring, just as
0:18:40 much as maybe you're flagging an alert and maybe sending a text or an email or
0:18:43 writing to an enterprise system like SAP to keep track.
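As a rough sketch of that cascade in plain Python: the helpers detect_people, ask_vlm, and flag_for_review below are hypothetical stand-ins for the three stages described above, not Roboflow APIs; only the control flow is the point.

```python
import cv2  # pip install opencv-python

# Hypothetical stand-ins for the stages described above; a real system
# would wrap actual models and an alerting or database integration.
def detect_people(frame) -> bool: ...                 # lightweight edge detector
def ask_vlm(frame, question: str) -> str: ...         # large vision language model
def flag_for_review(frame, reason: str) -> None: ...  # alert / write results

def monitor(rtsp_url: str) -> None:
    cap = cv2.VideoCapture(rtsp_url)  # e.g. "rtsp://camera.local/stream"
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        if not detect_people(frame):   # stage 1: cheap, always-on check
            continue
        answer = ask_vlm(frame, "Are there any safety risks in this scene?")
        if "yes" in answer.lower():    # stage 2: VLM gate on flagged frames
            flag_for_review(frame, answer)  # stage 3: validate, alert, record
    cap.release()
```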
0:18:50 And at each step of that juncture, any one of those nodes, since it's built
0:18:54 on our open source platform, which we call Inference, you can actually mix
0:18:58 and match, write your own custom Python, or call an API one way or another.
0:19:02 And so let’s imagine like a future where someone wanted the ability to write
0:19:04 to a system that we didn’t support yet, like first party.
0:19:05 You’re actually not out of luck.
0:19:09 As long as that system accepts a POST request, you're fine.
0:19:11 And so you have the ability to extend the system.
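A minimal sketch of that escape hatch: a custom step that forwards workflow outputs to an arbitrary HTTP endpoint with a POST request. The endpoint URL and payload shape are illustrative assumptions, and a callback like this could serve as the on_prediction handler in a pipeline like the earlier sketch.

```python
import requests  # pip install requests

def post_results(predictions: dict, video_frame) -> None:
    # Forward each batch of detections to any JSON-accepting endpoint.
    resp = requests.post(
        "https://internal.example.com/vision-events",  # hypothetical system
        json={"detections": predictions.get("predictions", [])},
        timeout=5,
    )
    resp.raise_for_status()  # surface failures instead of silently dropping events
```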
0:19:15 Yeah. And so it's this sort of paradigm of interoperability and making
0:19:17 it easy to use alongside other tools.
0:19:20 And it gets back to your point around servicing builders, just as much as the
0:19:25 enterprise, I actually think those things are really closely interlinked because
0:19:29 you provide the flexibility and choice and ability to make something mine and
0:19:33 build a competency inside the company of what it is I wanted to create and deploy.
0:19:37 Right. The way you frame that makes a lot of sense and makes that link very clear.
0:19:40 We’re speaking with Joseph Nelson.
0:19:43 Joseph is the co-founder and CEO of Roboflow.
0:19:48 And as he's been detailing, Roboflow provides a platform for builders to use
0:19:50 computer vision in what they’re building.
0:19:55 Joseph, you know, in the intro I sort of alluded to all the advances
0:19:58 and buzz around large language models and that kind of thing over the past couple
0:20:02 of years. And I meant to ask, Roboflow was founded in 2020?
0:20:05 Roboflow Inc. was incorporated in 2020.
0:20:06 That’s right. Got it.
0:20:09 And so anyway, kind of fast forwarding to, you know, more recently, the past, I
0:20:15 don’t know, six months, year, whatever it’s been, a lot of buzz around agents,
0:20:17 the idea of agentic AI.
0:20:21 And then, you know, there was buzz, I guess, that the word multimodal was being
0:20:26 flung around kind of more frequently, at least in circles I run in for a while.
0:20:29 And then it sort of dropped off just as, you know, the
0:20:33 consumer models, the Claudes and ChatGPTs and Geminis and what have you
0:20:37 in the world, just started incorporating visual capabilities, both to, you know,
0:20:43 ingest and understand and then to create visual output, voice models, you know,
0:20:45 now getting into short video clips, all that kind of stuff.
0:20:51 What’s your take on the role of multimodal AI integration when it comes to advancing
0:20:55 CV, you know, how is Roboflow kind of positioned to support this?
0:21:02 So multimodality allows an AI system to have even more context than from a
0:21:03 single modality, right?
0:21:07 So if one of our customers is monitoring an industrial process, and let’s say
0:21:12 they’re looking for potentially a leak, maybe in an oil and gas facility, that
0:21:16 leak can manifest itself as, yes, you see something, a product that’s dripping
0:21:17 out and you didn’t expect it to.
0:21:23 It also can manifest itself as you heard a noise or maybe there’s something
0:21:27 about the time dimension of the video that you’re watching as another modality
0:21:28 beyond just the individual images.
0:21:33 Right. And those additional dimensions of data allow the system that you’re
0:21:35 building to have more intelligence.
0:21:39 And so that’s why you see kind of like all these modalities crashing together.
0:21:42 And what it does is it enables our customers to have even more context.
0:21:47 The way we've thought about that is we've actually been building on and using
0:21:50 multimodality since as early as 2021.
0:21:54 So in 2021, there was a model that came out from OpenAI called CLIP,
0:21:57 Contrastive Language-Image Pre-training, which introduced this idea of training
0:21:59 on 400 million image-text pairs.
0:22:03 Can we just associate some words of text with some images?
0:22:06 What this really unlocked for our customers was the ability to do semantic
0:22:10 search, like I could just describe a concept and then I can get back the images
0:22:11 from a video frame or from a given image.
0:22:15 Now that might be interesting for the purposes of building out my model.
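A minimal sketch of that kind of semantic search, using the openly released CLIP weights through Hugging Face transformers; the frame file names and the query string are placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder frames pulled from a video or dataset.
frames = [Image.open(p) for p in ["frame_001.jpg", "frame_002.jpg"]]
inputs = processor(text=["a forklift near a worker"], images=frames,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Rank frames by cosine similarity between image and text embeddings.
img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
scores = (img @ txt.T).squeeze(-1)
print("best match:", scores.argmax().item())
```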
0:22:20 Increasingly, we've been excited by the rise of models that have more
0:22:22 multimodal capabilities on day one.
0:22:26 That comes with its own form of challenges, though, the data preparation,
0:22:30 the evaluation systems, the incorporation of those systems into the other parts
0:22:32 of the pipeline that you’re building.
0:22:36 And so where there’s opportunity to have even more intelligence, there’s also
0:22:41 challenge to incorporating that intelligence, adapting it to your context,
0:22:42 passing it to other sorts of systems.
0:22:47 And so Roboflow, being deep believers in multimodal capabilities very early
0:22:53 on, has continued to make it so that users can capture, use, and process other
0:22:54 modalities of data.
0:22:58 So for example, we support the ability for folks to use vision language models,
0:23:01 VLMs, in the context of problems they're working on, which is typically like
0:23:02 an image text pair.
0:23:08 So if you're using, you know, Qwen2.5-VL, which came out last week, or
0:23:11 Florence-2 from Microsoft, which came out maybe about six months ago, or PaliGemma
0:23:16 2 from Google, these are all multimodal models that have very rich text
0:23:20 understandings and have visual understandings, which makes them very good at,
0:23:22 for example, document understanding.
0:23:26 Like if you just pass a document, there’s both text in the document and a
0:23:27 position in the document.
0:23:30 And so Roboflow is one of the only places, maybe the only place where you can
0:23:35 fine tune and adapt, say, Qwen VL today, which means preparing the data and running
0:23:37 it in the context of the rest of your systems.
0:23:40 And those sorts of capabilities, I think, should only increase and enable our
0:23:44 customers to get more context more quickly from the types of problems that
0:23:44 they’re solving.
0:23:44 Right.
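For a flavor of VLM document understanding, here is a sketch using Microsoft's Florence-2 weights from Hugging Face, assuming its OCR-with-region task prompt suits the document; the file name is a placeholder, and this is separate from Roboflow's own fine-tuning flow.

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base", trust_remote_code=True)
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-base", trust_remote_code=True)

image = Image.open("invoice.png")  # placeholder document image
task = "<OCR_WITH_REGION>"         # extract text plus where it sits on the page
inputs = processor(text=task, images=image, return_tensors="pt")
ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
)
raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
# Convert the raw token string into structured text and bounding regions.
print(processor.post_process_generation(raw, task=task, image_size=image.size))
```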
0:23:48 So I think a lot of these things kind of like are crashing together into just
0:23:52 like AI, like amorphous AI that like has all these capabilities, like you’d expect
0:23:57 it, but as that happens, what’s important is there’s actually still unique parts
0:24:00 of visual needs, right?
0:24:03 Like visual needs require visual tooling in our opinion.
0:24:05 Like you want to see, you want to validate, you need to debug.
0:24:09 You know, the famous adage of a picture being worth a thousand
0:24:11 words is extremely instructive here.
0:24:14 Like you almost can’t anticipate all the ways that the world’s going to look
0:24:17 different than how you drew it up.
0:24:21 Like self-driving cars are kind of the 101 example where, yeah, you think
0:24:24 you can drive, like you have a very simple way of describing what the world’s
0:24:25 going to look like, but I don’t know.
0:24:29 Like let’s take a very narrow part of a self-driving car, stop signs, right?
0:24:31 So, stop signs look universal.
0:24:32 They’re always octagons.
0:24:35 They’re red and they’re really well-mounted on the right side of streets.
0:24:39 Well, what about a school bus, where the stop sign kind of flips out when it
0:24:43 comes on, or what about a gate, where the stop sign's mounted on a
0:24:44 gate and the gate could open and close?
0:24:48 And pretty soon you’re like, wait a second, there’s a lot of cases where a
0:24:50 stop sign isn’t really just a stop sign.
0:24:55 And seeing those cases and triaging and debugging and validating, we think
0:24:59 inherently calls for some specific needs for processing the visual information.
0:25:04 And so we’re laser focused on enabling our customers to benefit from as many
0:25:08 modalities as help them solve their problem while ensuring the visual
0:25:11 dimension in particular is best capitalized on.
0:25:12 Right.
0:25:17 And does that, and I may be showing the limits of my technical understanding here,
0:25:19 so, you know, bear with me if so.
0:25:25 But does that exist as, you know, Roboflow creating these, you know, sort of, as
0:25:30 you said, amorphous, all-crashed-together AI models that have this focus and these
0:25:31 sort of advanced visual capabilities?
0:25:38 Or is it more like chaining a Roboflow-specific model, you know, onto other models?
0:25:43 Commonly you're in a position where you're chaining things together, or you
0:25:46 want things to work in your context, or you want to work in a compute
0:25:47 constrained environment.
0:25:51 Okay, so vision's pretty unique in that, unlike language and a lot of
0:25:55 other places where AI exists, actually vision is almost where humans are not.
0:25:58 Basically, like you want to observe parts of the world where a person isn't present.
0:26:02 Like if you return to our example of like an oil and gas facility where
0:26:06 you’re monitoring pipelines, I mean, there’s tens of thousands of miles of pipeline
0:26:09 and you're certainly not going to have a person stationed every hundred yards
0:26:11 along it; it's just an asinine idea.
0:26:15 And so instead you could have a video feed or visual understanding of maybe key
0:26:20 points where you're most likely to have pressure changes. And to monitor those
0:26:23 key points, you know, you're not necessarily in an internet-connected
0:26:27 environment, you're in an operationally intensive environment where even if you
0:26:30 did have internet, it might not make sense to stream the video to the cloud.
0:26:33 So basically where you get to is you’re probably running something at the edge
0:26:36 because it makes sense to co-locate your compute and that’s where like a lot of
0:26:39 our customers, for example, using NVIDIA Jetsons, they're very excited about
0:26:44 Project DIGITS that was announced at CES, to make it so that you can bring these highly
0:26:48 capable models to co-locate alongside where their problem kind of exists.
0:26:50 Now, why does that matter?
0:26:54 That matters because you can’t always have the largest, most general model
0:26:56 running in those environments at real time.
0:27:00 I think this is part of, you know, a statement of the way the world
0:27:04 looks today versus how it'll look at 24, 36, and 48 months.
0:27:07 But I do think that over time, even as model capabilities advance and you can
0:27:10 get more and more distilled at the edge, there’s I think always going to be
0:27:14 somewhat of a lag between if I’m operating in an environment where I’m
0:27:17 fully compute unbounded, or at least comparatively unbounded in the cloud,
0:27:20 versus an environment where I am a bit more compute bounded.
0:27:25 And so that capability gap requires specialization and capability to work best
0:27:27 for that domain context problem.
0:27:31 So a lot of Roboflow users and a lot of customers and a lot of deployments tend
0:27:34 to be in environments like those, not all, but certainly some.
0:27:39 All right, shift gears here for a moment before we wrap up.
0:27:41 Joseph, you’re a multi-time founder, correct?
0:27:45 Yeah, maybe to kind of set this up, you can just kind of run through a little
0:27:48 bit your experience as an entrepreneur.
0:27:49 What was the first company you founded?
0:27:53 Well, the very first company was a T-shirt business in high school.
0:27:56 Nice. I don’t know that it was founded, there’s never an LLC.
0:27:58 I don’t even know my parents knew about it.
0:28:02 But there is that in a university.
0:28:08 I ran a satirical newspaper and sold ads on the ad space for it and date myself here.
0:28:11 But Uber was just rolling out to campuses at that time.
0:28:14 So I had my Uber referral code and had like free Ubers for a year for like all
0:28:16 the number of folks that discovered it.
0:28:20 I kind of joked my first company that maybe the closest thing to a real business
0:28:24 beyond these side projects was a business that I started my last year of university
0:28:27 and ran for three years before a larger company acquired it.
0:28:29 And I went to school in Washington, D.C.
0:28:34 I had interned on Capitol Hill once upon a time and I was working at Facebook
0:28:38 my last year of university and was brought back to Capitol Hill and realized
0:28:41 that like a lot of the technical problem or a lot of the problems,
0:28:44 operational problems that could be solved with technology still existed.
0:28:48 One of those is Congress gets 80 million messages a year and interns sort through
0:28:51 that mail. And this was, you know, 2015.
0:28:55 So we said, Hey, what if we use natural language processing to accelerate
0:28:57 the rate at which Congress hears from its constituents?
0:29:01 And in doing so, we improve the world's most powerful democracy's
0:29:02 customer success center.
0:29:07 And so that grew into a business that I ran for about three years and we had a tight
0:29:10 integration with another product that was a CRM for these congressional offices
0:29:13 and that company, called Fireside 21, acquired the business and rolled it out
0:29:15 to all of their customers.
0:29:19 That was a bootstrapped company, you know, nine employees at peak, a relatively
0:29:23 mission-driven thing that we wanted to build, and solve a problem that we knew
0:29:26 should be solved, which is improving the efficacy of Congress.
0:29:28 How big is Roboflow?
0:29:28 How many employees?
0:29:31 Well, I tell the team, whenever I answer that question, I start with,
0:29:33 we’ve helped a million developers so far.
0:29:37 So that's how big we are, team-wise.
0:29:40 Team-wise doesn't necessarily mean, you know, it can mean any number of things.
0:29:43 Yeah, yeah, we’re growing quickly.
0:29:43 Excellent.
0:29:49 As we're recording this, this one's going to get out before GTC 2025, coming
0:29:52 up in mid-March down in San Jose, as always.
0:29:54 And Joseph, Roboflow is going to be there.
0:29:55 Yeah, we’ll be there.
0:29:56 I mean, GTC has become the Super Bowl of AI.
0:29:57 Right.
0:30:01 Any hints, any teasers you can give of what you’ll be showing off?
0:30:05 We have a few announcements of some things that we’ll be releasing.
0:30:08 I can give listeners a sneak peek to a couple of them.
0:30:13 One thing that we’ve been working pretty heavily on is the ability to chain models
0:30:16 together, understand their outputs, connect to other systems.
0:30:21 And from following our customers, it turns out what we kind of built is a system
0:30:26 for building visual agents and increasingly as there’s a strong drive around
0:30:29 agentic systems, which is, you know, more than just a model.
0:30:33 It’s also memory and action and tool use and loops.
0:30:38 Users can now create and build and deploy visual agents to monitor a camera feed
0:30:42 or process a bunch of images or make sense of any sort of visual input in a
0:30:45 very streamlined, straightforward way using our open source tooling in a
0:30:46 loginless way.
0:30:50 And so that’s one area that we’re excited to show more about soon.
0:30:55 In partnership with NVIDIA and the inception program, we’re actually
0:30:59 releasing a couple of new advancements in the research field.
0:31:03 So without giving exactly what those are, I'll give you some parameters of what
0:31:09 to expect. At CVPR in 2023, Roboflow released something called RF100, where
0:31:12 the premise is that for computer vision to realize its full potential, the models
0:31:15 need to be able to understand novel environments.
0:31:15 Right.
0:31:18 So if you think about a scene, you think about maybe people in a restaurant,
0:31:21 or you think about like a given football game or something like this.
0:31:22 Yeah, yeah.
0:31:24 But the world is much bigger than just where people are.
0:31:25 Like you have like documents to understand.
0:31:27 You have aerial images.
0:31:28 You have things under microscopes.
0:31:30 You have agricultural problems.
0:31:31 You have galaxies.
0:31:36 You have digital environments. And RF100, which we released, is sampling
0:31:40 from the Roboflow Universe a basket of a hundred datasets that allows
0:31:45 researchers to benchmark: how well does my model do in novel contexts?
0:31:47 And so we released that in '23.
0:31:52 And since then, labs like Facebook, Apple, Baidu, Microsoft, and NVIDIA's Omniverse
0:31:55 team have benchmarked what is possible.
0:31:59 Now, the Roboflow Universe has grown precipitously since then, as
0:32:02 have the types of challenges that people are trying to solve with computer
0:32:07 vision. And so we’re ready to show what the next evolution of advancing
0:32:11 visual understanding and benchmarking understanding might look like at GTC.
0:32:16 And then a second thing we've been thinking a lot about is that the advent of
0:32:21 transformers and the ability for models to have really rich pre-trainings
0:32:24 allows you to kind of start at the end, so to speak, with a model and its
0:32:31 understanding, but that hasn't fully made its way, as impactfully as it can, to vision,
0:32:35 meaning like how can you bring a lot of the pre-trained capabilities, especially
0:32:37 to vision models running on the edge.
0:32:41 And so we've been pretty excited about how you marry the benefits of
0:32:45 pre-trained models, which allow you to generalize better, with the benefits of
0:32:46 running things in real time.
0:32:51 And so actually this is where NVIDIA and Roboflow have been able to pair up
0:32:55 pretty closely on something that we'll introduce, and I'll leave it at that for
0:32:57 folks to see if they're interested to learn more.
0:33:00 All right, I’m signed up.
0:33:01 I’m interested, can’t wait.
0:33:05 So you’ve done this a few times and you know, one way or another, I’m sure
0:33:09 you’ll do it again going forward and you know, scaled up and all that good stuff.
0:33:14 Lessons learned advice you can share, you know, for founders, for people out
0:33:18 there thinking about and you know, whether it’s CV related or not.
0:33:19 What does it take?
0:33:22 What goes into being, you know, being a good leader, building a business,
0:33:27 taking an idea, seeing it through to a product that, you know, serves humans
0:33:29 as well as solving a problem.
0:33:32 What wisdom can you drop here on listeners thinking about their own
0:33:34 entrepreneurial pursuits?
0:33:36 One thing that I'll note is you said you'll do it again.
0:33:40 I'm actually very vocal about the fact that Roboflow is the last company
0:33:44 that I'll ever need to start; it's like a lifetime's worth of work by itself.
0:33:48 As soon as I said it, I was like, I don’t know that.
0:33:49 He doesn’t know that.
0:33:50 And what if that comes off?
0:33:52 Like Roboflow is not going to, I was thinking about, oh, your last
0:33:55 company got acquired and so on and so forth, but that’s great.
0:33:59 I mean, that’s like in and of itself, you know, I suppose that could be
0:34:03 turned into something of a motto for aspiring entrepreneurs or what have you.
0:34:07 But that’s instructive actually for your question because I think a lot of people,
0:34:11 you know, you should think about the mission and the challenge that you're taking on.
0:34:13 You know, people say commonly like, oh, you're marrying yourself to it
0:34:17 for 10 years, but I think even that is perhaps too short of a time horizon.
0:34:21 It's: what is something, like a problem space, that you can work on
0:34:25 excitedly where the world is different as a result of your efforts?
0:34:28 I will also note that, you know, what does it take?
0:34:29 How do you figure it out?
0:34:30 I’m still figuring it out myself.
0:34:32 There’s like new stuff to learn every single day.
0:34:37 And I can’t wait for like every two years when I look back and just sort
0:34:40 of cringe at the ways that I did things at that point in time.
0:34:44 But I think that, you know, the attributes that allow people to do well in startups,
0:34:49 whether they’re working in one, starting one, interacting with one is a deep sense
0:34:55 of grit and diligence and passion for the thing that you're working on.
0:35:00 Like the world doesn't change by itself, and it's also quite a malleable place.
0:35:05 And so having the wherewithal and the aptitude and the excitement and vigor
0:35:12 to shape the world the way by which one thinks is possible requires a lot of drive
0:35:16 and determination. And so, you know, it's: work with people, work in environments,
0:35:22 work on problems where, if you have that problem changed with that team, and the
0:35:27 result of that company that you're working with continues to be realized,
0:35:28 what does that world look like?
0:35:29 Does that excite you?
0:35:33 And does it give you the ability to say independently, I would want to day in
0:35:37 and day out, give it my best to ensure and realize the full potential here.
0:35:41 And when you start to think about your time that way of something that is a
0:35:46 mission and important and time that you want to enjoy with the team, with the
0:35:50 customers, with the problems to be solved, the journey becomes the destination
0:35:51 in a lot of ways.
0:35:53 And so that allows you to play infinite games.
0:35:57 It allows you to just be really focused on the key things that matter and
0:36:01 delivering customer value and making products people love to use.
0:36:03 And so I think that’s fairly universal.
0:36:06 Now, in terms of specific advice, one thing or another, there's a funny
0:36:11 paradox where advice needs to be adjusted to the priors of one's situation.
0:36:14 It's almost like the more universally useful the piece of advice is, perhaps
0:36:17 the less novel and insightful it might be.
0:36:17 Right.
0:36:21 Here I’ll note that I pretty regularly learn from those that are a few stages
0:36:23 ahead of me and aim to pay that favor forward.
0:36:27 So I’m always happy to be a resource for folks that are building or navigating
0:36:30 career decisions or thinking about what to work on and build next.
0:36:32 So I’m pretty findable online and welcome that from listeners.
0:36:33 Fantastic.
0:36:38 So let’s just go with that segue then for folks listening who want to learn
0:36:43 more about Roboflow, want to try Roboflow, want to hit you up for advice
0:36:45 on working at or with a startup.
0:36:47 Where should they go online?
0:36:50 Company sites, social medias, where can listeners go to learn more?
0:36:54 Roboflow.com is where you can sign up and build on the platform.
0:36:55 We have a careers page.
0:36:59 If you're generally interested in startups, workatastartup.com is YC's
0:37:02 job board and we've hired a lot of folks from there.
0:37:03 So that’s a great resource.
0:37:10 I'm accessible online on Twitter, or X, @JosephofIowa, and regularly share
0:37:11 a bit about what we’re working on.
0:37:13 And I’m very happy to be a resource.
0:37:16 If you’re in San Francisco and you’re listening to this, you might be surprised
0:37:18 that sometimes I’ll randomly tweet out when we’re welcoming folks to come
0:37:21 co-work out of our office on some Saturdays and Sundays.
0:37:23 So feel free to reach out.
0:37:24 Excellent.
0:37:25 Joseph Nelson, Roboflow.
0:37:27 This is a great conversation.
0:37:29 Thank you so much for taking the time.
0:37:33 And, you know, as you well articulated, the work that you and your teams are
0:37:39 doing is not only fascinating, but it applies to so much of what we do on
0:37:40 the earth, right, and beyond the earth.
0:37:45 So all the best of luck in everything that you and your growing community are doing.
0:37:46 Really appreciate it.
0:37:48 [MUSIC]

Joseph Nelson, co-founder and CEO of Roboflow, discusses how the company is making computer vision accessible to millions of developers and industries, from manufacturing to healthcare and more. 
