AI transcript
0:00:10 pushkin this is an iHeart podcast
0:00:18 run a business and not thinking about podcasting think again more americans listen to podcasts
0:00:22 than ad-supported streaming music from spotify and pandora and as the number one podcaster
0:00:26 iHeart’s twice as large as the next two combined learn how podcasting can help your business call
0:00:35 844-844-iHeart in a metaphorical sense ai is everywhere it can write essays it can do your
0:00:44 taxes it can design drugs it can make movies but in a literal sense ai is not everywhere you know
0:00:49 a large language model can tell you whatever 27 ways to fold your shirts and put them in the drawer
0:00:55 but there’s no robot that you can buy that can actually fold your shirts and put them in the
0:01:02 drawer at some point though maybe at some point in the not that distant future there will be a robot
0:01:08 that can use ai to learn how to fold your shirts and put them in the drawer or you know cook lasagna
0:01:15 pack boxes plug in cables in other words there will be a robot that can use ai to learn how to do
0:01:20 basically anything
0:01:27 i’m jacob goldstein and this is what’s your problem the show where i talk to people who are trying to make
0:01:33 technological progress my guest today is chelsea finn she’s a professor at stanford and the co-founder of a
0:01:42 company called physical intelligence aka pi chelsea’s problem is this can you build an ai model that will
0:01:48 bring ai to robots or as she puts it we’re trying to develop a model that can control
0:01:56 any robot to do any task anywhere physical intelligence was founded just last year but the company has already
0:02:05 raised over 400 million dollars investors include jeff bezos and open ai the company has raised so much money in part
0:02:11 because what they’re trying to do is so hard motor skills the ability to move in fine ways to fold
0:02:19 a shirt to plug in a cable they feel simple to us easy basic but chelsea told me basic motor skills are in fact
0:02:27 wildly complex all of the motor control that we do with our body with our hands with our legs our feet
0:02:33 a lot of it we don’t think about when we do it it actually is incredibly complicated what we do
0:02:40 this is actually like a really really hard problem to develop in ai systems and robots uh despite it
0:02:44 being so simple and the reasons for that are first that it actually is inherently very complex
0:02:52 and second that we don’t have tons and tons of data of doing this in part because it’s so basic to humans
0:02:59 right let’s talk about the data side because that seems like really the story right the big challenge
0:03:06 and it’s particularly interesting in the context of large language models and computer vision which
0:03:13 really seem to have emerged in a weird way as a consequence of the internet right just because we
0:03:20 happen to have this crazy amount of data of words and pictures on the internet we were able to train
0:03:27 language models and computer vision models but we don’t have that for uh for robots right there is no
0:03:33 data set of of training data for robots which is like the big challenge for for you and for robotics in
0:03:39 general it seems yeah so we don’t have an open internet of how to control motors to to do like even really
0:03:45 basic things maybe the closest thing we have is we have videos of people doing things and perhaps that could
0:03:50 be useful but at the same time if i watch like videos of like roger federer playing tennis
0:03:55 you can’t just become an amazing tennis player as a result of that and likewise just with videos of
0:04:00 people doing things um it’s very hard to actually extract the motor control behind that and so that
0:04:08 lack of data that scarcity of data makes it a in some ways a very different problem uh than in language and
0:04:11 computer vision and i think that we should still learn a lot of things from language and computer vision
0:04:18 and collect large data sets like that it opens up new new challenges new possibilities on that front and i think
0:04:23 that in the long run we should be able to get large amounts of data uh just like how in autonomous driving
0:04:29 we have lots of data of cars driving around very effectively robots too could be in the world collecting
0:04:35 data learning about how to pick up mustard and put it on a hot dog bun or learning how to open a cabinet
0:04:40 to put some objects away uh we can get that sort of data but it’s not given to us for free
0:04:52 um you still have this core problem which is there is no giant trove of physical reality data that you
0:04:57 can train your model on right that’s the great big challenge it seems what do you do about that how do
0:05:04 you start to approach that yeah so we’re starting off by collecting data through teleoperation where
0:05:11 people are controlling the robot to do tasks and then you don’t just get video data you get the
0:05:16 videos alongside what are the actions or the motor commands needed to actually accomplish those tasks
0:05:23 uh we’ve collected data in our own office we’ve also collected data in homes across san francisco
0:05:30 and we also have a very modest warehouse uh in some ways actually like our current operation
0:05:36 is rather small given that we’re a little over a year old at this point like what what’s actually
0:05:39 happening like if i went into your warehouse and somebody was doing teleoperation what would i see
0:05:47 what would it look like yeah so we it’s a little bit like controlling a puppet so the the person who’s
0:05:53 operating the robot they are holding um in some ways a set of robot arms but they’re very
0:05:58 lightweight robot arms and we use those to measure the positions of joints it’s almost like an
0:06:03 elaborate controller for a video game or something it’s like that it’s it’s not actually a robot arm
0:06:07 right it’s a thing you yeah control to sort of play the robot to be to make the robot move yeah
0:06:14 exactly exactly and then uh we record that and then directly translate those controls over to the
0:06:19 robot we have some robots that are just robot arms where you’re only just controlling the robot arm
0:06:23 it’s mounted to a table or something like that but we also have what we call mobile
0:06:28 manipulators that have wheels and robot arms and you can control both how the robot drives around as
0:06:36 well as how the arms move and we’re doing tasks like wiping down counters folding laundry putting
0:06:43 dishes into dishwashers plugging cables into like data center racks assembling cardboard boxes lots and
0:06:49 lots of different tasks that might be useful for robots to do and recording all the data so we have
0:06:55 cameras on the robots there are sensors on the joints and on the motors of the robots as well um and we record
0:07:01 that in like a synchronized way across time
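Pi hasn't shared its actual recording stack in this conversation, but the setup described, cameras plus joint sensors captured in a synchronized way alongside the operator's commands, maps onto a simple logging pattern. A minimal sketch in Python; the step rate, the frame size, and every function name (read_cameras, read_follower_joints, read_leader_arms) are hypothetical stand-ins for real drivers:

```python
import time
from dataclasses import dataclass, field

import numpy as np

@dataclass
class TeleopStep:
    """One synchronized record: camera frames, measured joint angles,
    and the operator's commanded action, all stamped with one time."""
    stamp: float
    images: dict          # camera name -> HxWx3 uint8 frame
    joints: np.ndarray    # measured follower-arm joint angles (radians)
    action: np.ndarray    # joint targets read off the leader "puppet" arms

@dataclass
class EpisodeLogger:
    steps: list = field(default_factory=list)

    def record(self, images, joints, action):
        # copy arrays so later driver reads can't mutate logged data
        self.steps.append(TeleopStep(time.time(), images, joints.copy(), action.copy()))

def read_cameras():
    return {name: np.zeros((64, 64, 3), np.uint8) for name in ("wrist", "base")}

def read_follower_joints():
    return np.zeros(6)   # six joints per arm, as described in the interview

def read_leader_arms():
    return np.zeros(6)   # the lightweight arms the operator holds

logger = EpisodeLogger()
for _ in range(20):      # a short episode at roughly 10 Hz
    logger.record(read_cameras(), read_follower_joints(), read_leader_arms())
    time.sleep(0.1)
print(f"logged {len(logger.steps)} synchronized steps")
```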
0:07:06 so when you do it it’s kind of like a real world video game like you’re moving your arms in these things and in basically real time the robot arm
0:07:12 is moving and picking up the thing you want it to pick up and like what’s it like is there like a curve
0:07:18 like at the beginning it’s really bad sort of tell me talk me through an instance it actually depends on
0:07:22 the person so some people can pick it up really really quickly some people are a bit slower to pick
0:07:29 it up i pride myself in being a pretty good operator okay and so i have done tasks as complex as peeling a
0:07:34 hard-boiled egg with the robot no uh which is how are you how are you at peeling a hard-boiled egg
0:07:41 a hard-boiled egg with your hands uh it’s pretty hard with my own hands too yeah and with the robot
0:07:44 it’s even harder tell me about the robot peeling a hard-boiled egg because that sounds like a hard one
0:07:50 yeah so the the robots basically all the robots that we’re using are like kind of pincher grippers
0:07:53 they’re called parallel jaw grippers yeah where you there’s just one degree of freedom like open
0:07:59 close two pincers it’s basically two pincers like two pincers two arms yeah exactly and and i’ve i’ve used
0:08:06 that exact setup um there’s six different joints on the arm so it can move as kind of full basically
0:08:12 full range of motion in 3d space and 3d rotation and you can use that to peel a hard-boiled egg you
0:08:16 don’t have any tactile feedback so you can’t actually feel the egg and that’s actually one of the things
0:08:23 that makes it more difficult but you could actually you can use visual feedback to compensate for that and
0:08:28 so just by looking at the egg myself i’m able to figure out if i’m like in contact with something
0:08:33 and you just use one prong of the claw like what would you say you squeeze it a little to crack it and
0:08:39 then use like one prong of the claw to get the shell off yeah exactly so you can you you want to
0:08:45 crack it initially and then hold it with one gripper and then use basically one of the two fingers in the
0:08:52 gripper to get pieces of shell off when we did this we hard-boiled only two eggs and uh the first egg
0:08:57 this was actually at stanford the first egg a graduate student ended up breaking and so then i did the
0:09:02 second egg and i was able to successfully not break it and and fully peel it it took some patience
0:09:07 certainly and i wasn’t able to do it as quickly as with my own hands but it i guess goes to show the
0:09:15 extent to which we’re able to control robots to do pretty complicated things yeah and so obviously i mean
0:09:21 that is a a stunt or a game or something fun to do with the robot but presumably in that instance as in
0:09:28 other instances of uh folding clothes and vacuuming like there is learning right the idea is that
0:09:33 you do it some number of times and then the robot can do it and then presumably there’s also generalization
0:09:40 but just to start with learning like you know reductively how many times you got to do it for the robot to learn
0:09:48 yeah so it really depends on the extent to which you want the robot to handle different conditions so
0:09:55 uh in some of our research we’ve been able to show the robot how to do something like 30 times or 50 times
0:10:00 and just with that maybe that sounds like a lot but you can do that in like typically less than an hour if
0:10:06 it’s a simple task and from that if you only kind of demonstrate it
0:10:12 in a narrow set of circumstances like a single environment a single particular object the robot
0:10:18 can learn just from like less than an hour of data what is an example of a thing that the robot learned
0:10:24 in less than an hour of data oh um yeah we put a shoe on a foot we tear off a piece of tape
0:10:32 and put it on a box uh we’ve also hung up a shirt on a hanger so that’s not that much i mean especially
0:10:37 especially because you say the robot but what you really mean is the model so every robot right
0:10:42 presumably or every robot that’s built more or less like that one right like that’s one of the key things
0:10:47 it’s like you’re not teaching one robot you’re teaching every robot ever because it’s it’s software
0:10:53 fundamentally it’s an ai model it’s not hardware yeah yes with the caveat that if you want to be this data
0:10:59 efficient um it works best if it’s like in the same like the same color of the table the same kind of
0:11:03 rough initial conditions of where the objects are starting right and the same shirt for example so
0:11:08 this is just with like a single shirt uh and not like any shirt so so there’s there’s like concentric
0:11:13 circles of generalizability right like exact same shirt exact same spot exact same table versus like
0:11:23 fold a shirt versus fold clothes right and versus and so is that just infinitely harder like how does that work
0:11:28 that’s your big that’s your big challenge at some level right yeah so generalization is one of the big
0:11:34 one of the big challenges not the only one but it’s one of the big challenges and in some ways i mean the
0:11:38 the first unlock there is just to make sure that you’re collecting data not just for one shirt but
0:11:43 collecting it for lots of shirts or collecting for lots of clothing items and ideally also collecting
0:11:49 data with lots of tables with different textures and and also like not just visual like appearances but
0:11:54 also like if you’re folding on a surface that has very low friction like it’s very smooth versus a
0:11:59 surface that like maybe on top of carpet or something that’s going to behave differently uh when
0:12:06 you’re trying to move the shirt across the table so having variability in the scenarios that the robot is
0:12:13 experiencing in the data set is important and we’ve seen evidence that if you set things up
0:12:18 correctly and collect data under lots of scenarios you can actually generalize to completely new scenarios
0:12:25 and in like the pi 0.5 release for example we found that if we collected data in roughly like 100
0:12:33 different rooms then the robot is able to do some tasks in rooms that it’s never been in before
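That holdout claim implies a particular evaluation discipline: split at the level of whole rooms, not individual episodes, so that test rooms are genuinely unseen. A toy illustration of that split; the room count, the 90/10 ratio, and the simulated 0.8 success probability are all invented:

```python
import random

random.seed(0)
rooms = [f"room_{i:03d}" for i in range(100)]   # ~100 data-collection rooms
random.shuffle(rooms)
train_rooms, test_rooms = rooms[:90], rooms[90:]  # hold out whole environments

def success_rate(room, task, trials=10):
    # stand-in for running the real robot in this room; here a coin flip
    return sum(random.random() < 0.8 for _ in range(trials)) / trials

# train only on train_rooms (not shown), then evaluate on the unseen rooms
rates = [success_rate(room, "put the dishes in the sink") for room in test_rooms]
print(f"{len(test_rooms)} held-out rooms, mean success {sum(rates) / len(rates):.2f}")
```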
0:12:40 so so you mentioned pi 0.5 that’s your that’s your latest model that you’ve released right
0:12:48 um tell me about that like what what does that model allow robots to do like what robots in what settings
0:12:55 and what tasks yeah yeah definitely so we we’re focusing on generalization so the previous um model
0:13:00 we were focusing on capability and we did a really complicated task of laundry folding from there we
0:13:05 wanted to answer like okay that model worked in one environment it’s fairly brittle if you put it in a
0:13:09 new environment it wouldn’t work and so we wanted to see if we put robots in new environments with new
0:13:16 objects new lighting conditions new furniture can the robot be successful and to do that we collected
0:13:24 data on these mobile manipulators which feels like a terrible name but uh robots with two arms and
0:13:29 wheels that can drive around kind of like a humanoid but we’re using wheels instead of legs a bit more
0:13:38 practical in that regard and we train the robot to do things like tidying a bed or wiping spills off of a
0:13:44 surface or putting dishes into a sink or putting away items into drawers taking items of clothing
0:13:49 dirty clothing off the floor and putting them into a laundry basket things like that and then we tested
0:13:55 whether or not after collecting data like that in lots of environments aggregated with other data
0:14:01 including data on the internet can the robot then do those things in a home that it has never been in
0:14:10 before and in some ways that sounds kind of basic like people have no problem with this if you can do
0:14:14 something in like one home you probably could do the same thing in another home it doesn’t really
0:14:19 seem like a complicated thing for humans but for robots that are trained on data if they’re only trained
0:14:24 on data in one place their whole universe is that one place they haven’t ever seen in the other place
0:14:29 this is actually kind of a big challenge for existing methods and yeah it was a step forward
0:14:35 we were able to see that it definitely isn’t perfect by any means and that kind of comes to another
0:14:40 challenge which is reliability but we’re able to see the robot do things in homes it’s never been in
0:14:45 before where we set it up ask it to do things and it does some things that are useful so like in the
0:14:50 classical setting where a robot is trained in one room like it doesn’t even know that room is a room
0:14:55 that’s just like the whole world to the robot is that world right and if you put it in another room
0:15:01 it’s in a completely unfamiliar world exactly and so for example what we were talking about like
0:15:07 hanging up a shirt its whole world was like that one smooth black tabletop that one
0:15:12 blue shirt that one coat hanger and it doesn’t know about this like entire universe of other shirts and other
0:15:16 it doesn’t know that there is a category called shirt it only knows yeah it doesn’t even know what
0:15:21 shirts are yeah it doesn’t even know what shirts are and for pi 0.5 like what did you ask the robot to do
0:15:28 and how well did it work yeah so we we trained a model uh we took actually a pre-trained language model
0:15:35 uh with also like a vision component and we fine-tuned it on a lot of data including data from different
0:15:40 homes across san francisco but actually a lot of other data too so actually only two percent of the data was
0:15:46 on these like mobile robots with arms so we can store how the motors were all moving in all of our
0:15:52 previous data yeah um and then train the model to mimic that data that we’ve stored it’s like it’s
0:15:55 like predicting the next word but instead of predicting the next word it’s like predicting the
0:16:01 next movement or something like yes exactly um we’ve kind of trained it to predict next actions or next
0:16:07 motor commands instead of next words
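The "next actions instead of next words" idea has a standard minimal form: behavior cloning, where a network regresses the operator's logged actions from the robot's observations and a language command. Pi's actual models fine-tune a pretrained vision-language model, as she describes, and are far richer than this; the sketch below (PyTorch, all dimensions and names invented) only shows the shape of the training signal:

```python
import torch
import torch.nn as nn

# Toy behavior cloning: observation features plus a language embedding
# in, a short "chunk" of future joint commands out.
OBS_DIM, LANG_DIM, ACT_DIM, CHUNK = 512, 128, 6, 8

policy = nn.Sequential(
    nn.Linear(OBS_DIM + LANG_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, ACT_DIM * CHUNK),   # predict the next CHUNK actions at once
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

def train_step(obs_feats, lang_embed, expert_actions):
    """expert_actions: (B, CHUNK, ACT_DIM) slices taken from teleop logs."""
    pred = policy(torch.cat([obs_feats, lang_embed], dim=-1))
    loss = nn.functional.mse_loss(pred.view(-1, CHUNK, ACT_DIM), expert_actions)
    opt.zero_grad()
    loss.backward()   # imitate the operator, like next-token prediction
    opt.step()
    return loss.item()

B = 32  # fake batch just to exercise the shapes
print(train_step(torch.randn(B, OBS_DIM), torch.randn(B, LANG_DIM),
                 torch.randn(B, CHUNK, ACT_DIM)))
```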
0:16:13 uh we do an additional training process to have it focus on and be good at the mobile robot data in homes then we set up the robot in a new home and
0:16:19 we give it language commands so uh we can give it low level language commands or we can actually
0:16:25 also give it higher level commands so the highest level of command might be clean the bedroom and one
0:16:28 of the things that we’ve also been thinking about more recently is can you give it a more detailed
0:16:32 description of how you want it to clean the bedroom but we’re not quite there yet so we can say clean
0:16:37 the bedroom we can also tell it put the dirty clothes in the laundry basket uh so that would be
0:16:44 kind of a a subtask or we can tell it like commands like pick up the shirt put the shirt in the laundry
0:16:52 basket then after we tell it that command then it will go off and follow that command and actually
0:17:00 in most cases realize that command successfully in the real world how did it do so it depends
0:17:07 the average success rate was around 80 percent so definitely room for improvement uh and in many scenarios it was
0:17:13 able to be quite successful we also saw some some failure modes where uh for example if you’re trying
0:17:18 to put dishes into a sink sometimes one of the dishes was a cutting board and picking up a cutting
0:17:22 board is actually pretty tricky for the robot because you either need to slide it to the edge of the
0:17:29 counter and then grasp it or somehow get the kind of get the finger underneath the cutting board and so
0:17:34 sometimes it was able to do that successfully sometimes it struggled and got stuck the exciting
0:17:38 thing though was that we were able to kind of drop it in places that it had never been
0:17:44 before and it was doing things that were quite reasonable so what are you doing now like what’s the
0:17:50 next thing you’re trying to get to yeah absolutely so the next thing we’re focusing on is reliability
0:17:58 and and speed so i mentioned like around 80 percent for these tasks uh how do we get that to 99 percent and i think that
0:18:05 if we can get the reliability up that’s kind of in my mind the main missing ingredient before we can
0:18:13 like really have these being like useful in real world scenarios so getting to 99 percent is interesting i mean
0:18:21 i think of self-driving cars right where it seemed some time ago i don’t know 10 years ago 15 years ago
0:18:26 like they were almost there and i know they’re more almost there now i know in san francisco there
0:18:32 really are self-driving cars but they’re still very much at the margin of cars in the world right and
0:18:37 it does seem like almost there means different things in different settings but
0:18:45 i don’t know is it super hard to get from 80 to 99 does the self-driving car example
0:18:54 teach us anything for your uh work the self-driving car analogy is is pretty good uh i do think that
0:19:00 fortunately there are scenarios where we may not need it to be quite as reliable
0:19:08 as cars with cars there’s a much much higher safety risk it’s much easier to hurt people and in robots there
0:19:13 are safety risks because you are in the physical world but it’s easier to put software precautions
0:19:18 in place and even hardware precautions in place to prevent that as well so that makes it a little bit
0:19:24 easier i mean 99 percent probably isn’t good enough for cars right they probably need more nines than that
0:19:29 whereas it may well be good enough for a house cleaning robot yeah in certain circumstances and
0:19:34 yeah like we’re also thinking about scenarios where maybe even less than that is fine and if we view
0:19:40 humans and robots working together it’s more about kind of helping the person complete the task faster
0:19:46 or complete the task like more effectively uh so i think there might be scenarios like that uh but still
0:19:51 we need the performance and reliability to be higher and the robots to be faster in order to accomplish
0:19:52 that
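One way to see why the jump from 80 to 99 percent is the "main missing ingredient": if a long task is a chain of steps that each must succeed, per-step reliability compounds, which is exactly the long-horizon failure mode discussed later in the conversation. A back-of-envelope sketch, with step counts invented and recovery from mistakes ignored:

```python
# Whole-task success for a chain of independent steps at a fixed
# per-step success rate: 80 percent per step collapses quickly,
# 99 percent per step barely degrades.
for per_step in (0.80, 0.99):
    for steps in (5, 20, 50):
        print(f"per-step {per_step:.0%}, {steps:2d} steps: "
              f"whole-task success ~ {per_step ** steps:.3%}")
```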
0:19:56 we’ll be back in just a minute
0:20:13 run a business and not thinking about podcasting think again more americans listen to podcasts than ad
0:20:18 supported streaming music from spotify and pandora and as the number one podcaster iheart’s twice as large
0:20:24 as the next two combined so whatever your customers listen to they’ll hear your message plus only iheart
0:20:29 can extend your message to audiences across broadcast radio think podcasting can help your business
0:20:37 think iheart streaming radio and podcasting let us show you at iheartadvertising.com that’s iheartadvertising.com
0:20:43 what do you imagine as the initial real world use cases
0:20:48 yeah so there have been a number of robotics companies
0:20:54 that have attempted to kind of start with an application and hone in on that
0:20:59 and so i think the lesson from watching those companies is that
0:21:05 you end up then spending a lot of time on the problems of that specific application
0:21:10 and less on developing the sort of generalist systems that we think in the long run will be more effective
0:21:15 and so we’re very focused on understanding like what are the core
0:21:19 bottlenecks and the core missing pieces for developing these generalist models
0:21:23 and we think that if we had picked an application now we would kind of lose sight of that bigger problem
0:21:26 because we need to solve things that are specific to that application
0:21:30 so we’re very focused on what we think are like the core
0:21:32 technological challenges
0:21:36 we have certain tasks that we’re working on some of them have been home cleaning tasks
0:21:40 we also have some more kind of industrial like tasks as well
0:21:44 just to instantiate and actually be iterating on robots
0:21:48 and applications could range from things in homes
0:21:53 to things in workplaces to industrial settings
0:21:56 there’s lots and lots of use cases for
0:21:59 intelligent robots and intelligent kind of physical machines
0:22:04 what are some of the industrial tasks you’ve been working on
0:22:07 one example that i mentioned before is inserting cables
0:22:11 there’s lots of use cases in like data centers for example
0:22:14 where that’s a challenging task
0:22:20 another example is like constructing cardboard boxes and filling them with items
0:22:21 we’ve also done some packaging tasks
0:22:25 highly relevant to lots of different kind of shipping operations
0:22:29 and then even folding clothes it seems like a very home task
0:22:35 but it turns out that there are companies that need to fold very like very large amounts of clothing
0:22:42 and so that’s also something that in the long term could be used in in larger scale settings
0:22:45 so um i’ve i’ve read that you
0:22:49 have open sourced your model weights
0:22:52 and given designs of robots to hardware companies
0:22:53 and i’m interested in that
0:22:55 in that set of decisions right
0:22:58 that set of sort of strategic decisions
0:22:59 tell me about that
0:23:01 sort of giving away ip basically right
0:23:03 yeah yeah definitely so
0:23:04 this is a really hard problem
0:23:07 especially this longer term problem of developing a generalist system
0:23:11 we think that the field is very young
0:23:15 and there’s like a couple of reasons
0:23:18 one is that we think that the field needs to mature
0:23:22 and we think that having more people being kind of competent with using robots
0:23:24 and using this kind of technology
0:23:27 will be beneficial in the long term for the company
0:23:30 and by open sourcing things we make it easier for people to do that
0:23:32 and then the second thing is like
0:23:34 the models that we develop right now
0:23:37 they’re very early and the models that we’ll be developing
0:23:40 one two three years from now
0:23:44 are going to be far far more capable than the ones that we have now
0:23:47 and so um it’s kind of like like equivalent to like open ai
0:23:51 open sourcing uh gpt2 gpt3
0:23:53 um they actually didn’t open source gpt3
0:23:58 but like i think that they would still be in an excellent spot today if they had
0:24:05 uh like what could go wrong that would either prevent you as a company from succeeding
0:24:07 or even hold back the field in general
0:24:11 i don’t think we entirely know the scale of data
0:24:16 that we need for getting really capable models
0:24:18 and there’s a little bit of a chicken and egg problem
0:24:20 where it’s a lot easier to collect data
0:24:22 once you have a really good model
0:24:24 uh it took like large amounts of data
0:24:27 right or if there were thousands of robots out of the world running your model
0:24:30 there would just be an incredible amount of data coming into you every day
0:24:30 right
0:24:34 yeah yeah exactly so that’s that’s one thing
0:24:35 i’m actually less
0:24:38 maybe less a little bit less concerned about that myself
0:24:39 and then i think the other thing is just that
0:24:41 there are technological challenges
0:24:43 to getting these things to work really well
0:24:44 i think that
0:24:48 i think we’ve had incredible progress uh over the last
0:24:51 uh year and two months over the last like 14 months i think
0:24:55 since we’ve started probably more progress than than i was expecting
0:24:58 uh honestly compared to when we started the company
0:25:04 i think it’s like wild that we were able to get a robot to like unload and fold laundry
0:25:06 like a 10 minute long task
0:25:11 and folding laundry is like a famously hard robot problem right
0:25:13 like it’s the one that people in robotics talk about
0:25:18 when they talk about how things people think are easy are actually hard for robots right
0:25:22 yeah absolutely absolutely i mean you have to deal with all sorts of variability and how
0:25:24 clothes can be crumpled on each other and
0:25:28 also it’s like there’s even like really small minor things you need to do in order to like
0:25:32 actually get it to be flat on the table and uh and folded nicely and even stacked and
0:25:38 as the task gets longer as well there are more opportunities to make mistakes more opportunities to get stuck
0:25:42 and so if you’re doing a task that takes 10 minutes in those 10 minutes there’s many many
0:25:45 times where the robot can make a mistake that it can’t recover from
0:25:50 or uh just get stuck or something like that and so being able to do such a long task
0:25:55 starts to kind of point at the resilience that these models can have by recovering from those mistakes
0:26:01 uh-huh so when we were first trying to fold laundry like one of the common failure modes
0:26:06 is that it would fold the laundry like very well by my standards at the time
0:26:11 i would be very very happy with the robot and then it would push the entire stack of laundry onto the ground
0:26:16 uh-huh sort of sort of like teaching a toddler to fold clothes
0:26:25 yeah yeah exactly was there a particular moment when you saw a robot using your model
0:26:31 fold clothes for 10 minutes and it worked yeah um first off we started with just folding a shirt
0:26:35 starting flat on the table we got that to work pretty quickly it turns out to be pretty easy
0:26:41 um and i wasn’t too surprised by that and then we moved from that to starting in like just a random ball
0:26:45 like some sort of crumpled position on the table and then you have to flatten and then fold it
0:26:50 and that makes the problem dramatically harder because of all the variability and having to
0:26:56 figure out how to flatten it we were kind of stuck on that problem for like at least a couple months
0:27:02 uh where everything we were trying the success rate of the robot was zero percent it wasn’t able to really
0:27:11 make progress on it and we started to see signs of life i think in like august or september of last year
0:27:19 we tried a new recipe where we continued to train the model on a curated part of the data that was
0:27:25 following a consistent strategy and that sort of high quality post training is what really seemed to
0:27:30 make the model work better
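The curated post-training recipe she describes has a simple skeleton: pretrain on everything, then continue training only on the demonstrations tagged as following one consistent strategy. A sketch with the tagging scheme, batch size, and the train_step stub all hypothetical:

```python
import random

# Episodes as they might come out of the teleop logs, some tagged as
# following the one consistent flattening-and-folding strategy.
all_episodes = [{"id": i, "consistent_strategy": i % 3 == 0} for i in range(300)]
curated = [ep for ep in all_episodes if ep["consistent_strategy"]]

def train_step(model, batch):
    pass  # stand-in for one gradient step of the real policy

random.seed(0)
model = None  # stand-in for a model already pretrained on all_episodes
for step in range(1000):
    train_step(model, random.sample(curated, k=8))  # fine-tune on curated data only
print(f"post-training pool: {len(curated)} of {len(all_episodes)} episodes")
```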
0:27:37 and then the moment that i was most excited about was the first time that i saw the model flatten and fold and stack five items in a row yeah i just remember going home that
0:27:42 night and being like so excited it seemed like we had just like figured out this this big missing
0:27:48 puzzle piece so i was asking you why why might it not work or what might slow the field down and then we
0:27:53 talked about the happy shirt story but if in five years things didn’t progress as quickly as you thought
0:27:59 what might have happened i mentioned that i think that incorporating practice like allowing the
0:28:07 robot to practice the task should be really helpful for allowing robots to get better we don’t know what
0:28:14 exactly that recipe will look like and so it’s it’s like a research problem uh and with any sort of
0:28:19 research problem you don’t know exactly how hard the solution is going to be and i think that there are
0:28:25 some other more nuanced unknowns as well that are somewhat similar to that and we have a large number of
0:28:30 very talented researchers on our team because we think that there are some of these unsolved
0:28:37 breakthroughs that are going to be needed to like really truly solve this problem so if if it does work
0:28:45 well uh and things progress in that universe what would you be worried about
0:28:53 good question i mean if things work well i shouldn’t be too worried in general uh i do think that it’s
0:28:59 very easy in general to underestimate the challenges around actually deploying and disseminating technology
0:29:05 that takes time and when the technology doesn’t exist yet that that means that like the world is not in
0:29:11 a place that is like ready for that technology i think that there’s a lot of unknowns there i mean one of the
0:29:18 striking things to me about say language models is the people who know the most about them seem to be
0:29:23 the most worried about them which is generally not the case i think historically with technology right
0:29:30 the possible exception of the atomic bomb uh and and so i’m curious i mean those kinds of worries like
0:29:35 do you share them are there worries you have about developing a foundation model for robots
0:29:44 about bad actors using it even i do think that like yeah there’s plenty of technology that has dual uses
0:29:55 uh and i think there are applications of technologies that are harmful i think that a lot of the concerns in
0:30:03 the language model community stem from imbuing these systems with greater autonomy
0:30:13 and i think that i so i work like hands-on with the robots quite a bit and i don’t see a world in
0:30:19 which they will be taking over in any way uh it’s very easy to just like well with our current iteration
0:30:19 of robots to just like if we threw some water on it the robot would be uh in trouble so uh that might
0:30:30 be a problem for you but i’m sure you could solve that one we’re working on it so we
0:30:34 actually do have a new iteration that that is actually a lot more waterproof but this is not
0:30:40 a concern that i share okay interesting basically just because you think we can whatever turn it off
0:30:45 if we need to yeah and yeah and i i think yeah there’s always going to be dual use concerns but i
0:30:51 think that uh the pros of the technology outweigh some of the downsides well give me the happy
0:30:56 story then like in what what number of years should we choose for a happy story 10 is 10 too soon
0:31:03 i don’t want to put a number to it i think that with research you don’t know exactly how long
0:31:11 things will take and i envision a world where when you’re developing hardware uh it’s not too
0:31:18 hard to actually teach it to do something useful rather than just having
0:31:25 machines that are not particularly intelligent like dishwashers and laundry machines and so forth
0:31:32 go bigger if you would like what would people be teaching robots to do in that
0:31:38 world i guess if we were to go bigger i think that there’s a lot of challenges around helping
0:31:44 helping people as they age allowing them to be more independent i think that that’s like a huge one
0:31:49 i think that i don’t know manufacturing there’s all sorts of places where like there’s abusive labor
0:31:56 practices and we can maybe like be able to eliminate those if it’s a robot instead of a human yeah many
0:32:00 many many examples and i think that there’s also even things that are even hard to imagine because the
0:32:00 technology doesn’t exist so a lot of the things that i’m thinking about are robots helping humans in
0:32:11 different circumstances to allow them to be more productive uh but once something exists like you
0:32:19 often like people are creative and come up with new ways of of how that’s used we’ll be back in a minute
0:32:33 run a business and not thinking about podcasting think again more americans listen to podcasts than
0:32:38 ad-supported streaming music from spotify and pandora and as the number one podcaster iheart’s twice as
0:32:43 large as the next two combined so whatever your customers listen to they’ll hear your message plus
0:32:48 only iheart can extend your message to audiences across broadcast radio think podcasting can help
0:32:55 your business think iheart streaming radio and podcasting let us show you at iheartadvertising.com
0:33:05 that’s iheartadvertising.com um great let’s finish with lightning round um what’s one thing that working
0:33:15 with robots has caused you to appreciate about the human body our skin is pretty amazing uh-huh well so
0:33:23 so we didn’t talk about uh i mean a sense of touch or of of heat or of cold right i mean presumably the
0:33:28 models you’re building the robots you’re using don’t have that but they could right they could have
0:33:35 a sense of touch is anyone working on that is that of interest to you lots of people are working on it i
0:33:40 think it’s pretty interesting i think that the hardware technology is not super mature compared
0:33:45 to where i’d like for it to be in terms of how robust it is and the cheapness and the resolution
0:33:52 that said like we actually put cameras on the wrists of our robot uh to help it get some sort of tactile sense and
0:33:57 for example if you can if you like visually look at your finger as you make contact with an object
0:34:03 uh you can see it deform around that object and you could actually just by looking at your finger
0:34:08 get some notion of tactile feedback similar to what our skin gets yeah and cameras are cheap really
0:34:13 robust um way more robust and cheap than than existing technology for tactile sensing
0:34:21 um i’ve heard you say that humanoid robots are overrated and i’m curious why you think that
0:34:29 simplicity is really helpful and important when trying to develop technology uh when you introduce
0:34:35 more complexity than is needed it slows you down a lot and i think that the complexity that humanoids
0:34:41 introduce yeah i think that if all of the robots we were working with were humanoids i think that we
0:34:46 wouldn’t have made anywhere near the progress that we made because we’d be dealing with additional
0:34:51 challenges uh i also think that optimizing for ease of data collection is really important in a world
0:34:59 where we need data and it’s a lot harder to collect and operate all of the different joints and motors of
0:35:07 a humanoid than it is to control a simpler robot um do you anthropomorphize robots i hate it when people
0:35:14 anthropomorphize robots uh i think that it is misleading because the failure modes that robots
0:35:20 have are very different from the failure modes that people have and it misleads people into thinking that
0:35:27 it’s going to behave in the way that people behave uh like like in what way oh like if you see a robot
0:35:32 doing something like doing a backflip like or or even folding laundry you kind of assume that anything
0:35:35 like like if you saw a person do that then they probably could do a lot of other things too
0:35:40 and if you anthropomorphize the robot then you assume that it like the capabilities that you see
0:35:47 are representative as if it were like a human uh-huh and that it could do a backflip anywhere or that it
0:35:52 could fold laundry anywhere uh with anything any item of clothing or surely you would think a robot that
0:36:00 could do a backflip could fold a shirt but no exactly exactly so sometimes it’s fun to like assign
0:36:05 emotions to some of the things or say the robot’s having a bad day uh because certainly it feels
0:36:05 like that sometimes but when it kind of moves beyond fun uh and jokes it might have consequences that i
0:36:19 don’t think make sense um i read that there was a researcher who said they would retire if a robot
0:36:27 tied a shoelace and then one of your robots tied a shoelace and i guess they didn’t retire but i’m curious
0:36:33 what would you need to see a robot do to retire
0:36:41 hmm i don’t know i guess one example that i’ve given before that i would love to see a robot do i
0:36:46 don’t think this is quite retirement level but being able to go into a kitchen that it’s never been in
0:36:54 before and make a bowl of cereal pretty basic especially compared to doing a backflip i cannot do a backflip
0:36:58 myself but i can make a bowl of cereal but it requires being able to find objects in the
0:37:04 environment being able to interact with delicate objects like a cereal box uh maybe even use tools
0:37:10 in order to open the cereal box pouring liquids yeah so that’s a task that i love and i could actually
0:37:17 even see us being able to show a demo of that without too much difficulty actually uh if we put our mind to
0:37:22 it and collected data for it so it actually is i think more within reach than than maybe i imagined
0:37:28 a few years ago just just as you’re thinking about it it’s it’s it’s getting closer you’re like oh wait
0:37:33 we could do that yeah i mean we’ve actually collected data of pouring cereal um like opening a cereal box
0:37:38 and pouring it into a bowl we haven’t yet done liquid handling uh and pouring but i think we’re actually
0:37:45 going to do it this week on the robot i asked the hardware team to to make a waterproof robot so
0:37:51 we’re not too far a lot of the pieces are coming together i also i love working with robots and so
0:37:58 and i’m also fairly young i think uh not too old uh and so i don’t imagine myself retiring anytime soon
0:38:09 chelsea finn is a stanford professor and the co-founder of physical intelligence
0:38:16 you can email us at problem at pushkin.fm and please do email us i read all the emails
0:38:23 today’s show was produced by gabriel hunter chang edited by alexandra garrettin and engineered by
0:38:27 sarah bruguer i’m jacob goldstein and we’ll be back next week with another episode of what’s your problem
0:38:40 this is an iheart podcast
0:00:18 run a business and not thinking about podcasting think again more americans listen to podcasts
0:00:22 then add supported streaming music from spotify and pandora and as the number one podcaster
0:00:26 iHeart’s twice as large as the next two combined learn how podcasting can help your business call
0:00:35 844-844-iHeart in a metaphorical sense ai is everywhere it can write essays it can do your
0:00:44 taxes it can design drugs it can make movies but in a literal sense ai is not everywhere you know
0:00:49 a large language model can tell you whatever 27 ways to fold your shirts and put them in the drawer
0:00:55 but there’s no robot that you can buy that can actually fold your shirts and put them in the
0:01:02 drawer at some point though maybe at some point in the not that distant future there will be a robot
0:01:08 that can use ai to learn how to fold your shirts and put them in the drawer or you know cook lasagna
0:01:15 pack boxes plug in cables in other words there will be a robot that can use ai to learn how to do
0:01:20 basically anything
0:01:27 i’m jacob goldstein and this is what’s your problem the show where i talk to people who are trying to make
0:01:33 technological progress my guest today is chelsea finn she’s a professor at stanford and the co-founder of a
0:01:42 company called physical intelligence aka pi chelsea’s problem is this can you build an ai model that will
0:01:48 bring ai to robots or as she puts it we’re trying to develop a model that can control
0:01:56 any robot to do any task anywhere physical intelligence was founded just last year but the company has already
0:02:05 raised over 400 million dollars investors include jeff bezos and open ai the company has raised so much money in part
0:02:11 because what they’re trying to do is so hard motor skills the ability to move in fine ways to fold
0:02:19 a shirt to plug in a cable they feel simple to us easy basic but chelsea told me basic motor skills are in fact
0:02:27 wildly complex all of the motor control that we do with our body with our hands with our legs our feet
0:02:33 a lot of it we don’t think about when we do it it actually is incredibly complicated what we do
0:02:40 this is actually like a really really hard problem to develop in ai systems and robots uh despite it
0:02:44 being so simple and the reasons for that are because actually it is inherently very complex
0:02:52 and second that we don’t have tons and tons of data of doing this in part because it’s so basic to humans
0:02:59 right let’s talk about the data side because that seems like really the story right the big challenge
0:03:06 and it’s particularly interesting in the context of large language models and computer vision which
0:03:13 really seem to have emerged in a weird way as a consequence of the internet right just because we
0:03:20 happen to have this crazy amount of data of words and pictures on the internet we were able to train
0:03:27 language models and computer vision models but we don’t have that for uh for robots right there is no
0:03:33 data set of of training data for robots which is like the big challenge for for you and for robotics in
0:03:39 general it seems yeah so we don’t have an open internet of how to control motors to to do like even really
0:03:45 basic things maybe the closest thing we have is we have videos of people doing things and perhaps that could
0:03:50 be useful but at the same time if i watch like videos of like roger federer playing tennis
0:03:55 you can’t just become an amazing tennis player as a result of that and likewise just with videos of
0:04:00 people doing things um it’s very hard to actually extract the motor control behind that and so that
0:04:08 lack of data that scarcity of data makes it a in some ways a very different problem uh than in language and
0:04:11 computer vision and i think that we should still learn a lot of things from language and computer vision
0:04:18 and collect large data sets like that it opens up new new challenges new possibilities on that front and i think
0:04:23 that in the long run we should be able to get large amounts of data uh just like how in autonomous driving
0:04:29 we have lots of data of cars driving around very effectively robots too could be in the world collecting
0:04:35 data learning about how to pick up mustard and put it on a hot dog bun or learning how to open a cabinet
0:04:40 to put some objects away uh we can get that sort of data but it’s not given to us for free
0:04:52 um you still have this core problem which is there is no giant trove of physical reality data that you
0:04:57 can train your model on right that’s the great big challenge it seems what do you do about that how do
0:05:04 you start to approach that yeah so we’re starting off by collecting data through teleoperation where
0:05:11 you are people are controlling the robot to do tasks and then you don’t just get video data you get the
0:05:16 videos alongside what are the actions or the motor commands needed to actually accomplish those tasks
0:05:23 uh we’ve collected data in our own office we’ve also collected data in homes across san francisco
0:05:30 and we also have a very modest warehouse uh it’ll in some ways actually like our current operation is
0:05:36 is rather small given that we’re a little over a year old at this point like what what’s actually
0:05:39 happening like if i went into your warehouse and somebody was doing teleoperation what would i see
0:05:47 what would it look like yeah so we it’s a little bit like controlling a puppet so the the person who’s
0:05:53 operating the robot they are holding um in some ways a set of robot arms but they’re very
0:05:58 lightweight robot arms and we use those to measure the positions of joints it’s almost like an
0:06:03 elaborate control for a video game or something it’s like that it’s it’s not actually a robot arm
0:06:07 right it’s a thing you yeah control to sort of play the robot to be to make the robot move yeah
0:06:14 exactly exactly and then uh we record that and then directly translate those controls over to the
0:06:19 robot we have some robots that are just robot arms where you’re only just controlling the robot arm
0:06:23 it’s mounted to a table or something like that but we also have what we call mobile
0:06:28 manipulators that have wheels and robot arms and you can control both how the robot drives around as
0:06:36 well as how the arms move and we’re doing tasks like wiping down counters folding laundry putting
0:06:43 dishes into dishwashers plugging cables into like data center racks assembling cardboard boxes lots and
0:06:49 lots of different tasks that might be useful for robots to do and recording all the data so we have
0:06:55 cameras on the robots there are sensors on the joints on the motors of the robots as well um and we record
0:07:01 that in like a synchronized way across time so when you do it are it’s like kind of like a real world
0:07:06 video game like you’re moving your arms in these things and and in basically real time the robot arm
0:07:12 is moving and picking up the thing you want it to pick up and like what’s it like is is there like a curve
0:07:18 like at the beginning it’s really bad sort of tell me talk me through an instance it actually depends on
0:07:22 the person so some people can pick it up really really quickly some people are a bit slower to pick
0:07:29 it up i pride myself in being a pretty good operator okay and so i have done tasks as complex as peeling a
0:07:34 hard-boiled egg with the robot no uh which is how are you how are you at peeling a hard-boiled egg
0:07:41 a hard-boiled egg with your hands uh it’s pretty hard with my own hands too yeah and with the robot
0:07:44 it’s even harder tell me about the robot peeling a hard-boiled egg because that sounds like a hard one
0:07:50 yeah so the the robots basically all the robots that we’re using are like kind of pincher grippers
0:07:53 they’re called parallel jaw grippers yeah where you there’s just one degree of freedom like open
0:07:59 close two pincers it’s basically two pincers like two pincers two arms yeah exactly and and i’ve i’ve used
0:08:06 that exact setup um there’s six different joints on the arm so it can move as kind of full basically
0:08:12 full range of motion in 3d space and 3d rotation and you can use that to feel a hard-boiled egg you
0:08:16 don’t have any tactile feedback so you can’t actually feel the egg and that’s actually one of the things
0:08:23 that makes it more difficult but you could actually you can use visual feedback to compensate for that and
0:08:28 so just by looking at the egg myself i’m able to figure out if you’re like in contact with something
0:08:33 and you just use one prong of the claw like what i could say you squeeze it a little to crack it and
0:08:39 then use like one prong of the claw to get the shell off yeah exactly so you can you you want to
0:08:45 crack it initially and then hold it with one gripper and then use basically one of the two fingers in the
0:08:52 gripper to get pieces of shell off when we did this we hard-boiled only two eggs and uh the first egg
0:08:57 this actually is stanford the first egg a graduate student ended up breaking and so then i did the
0:09:02 second egg and i was able to successfully not break it and and fully peel it it took some patience
0:09:07 certainly and i wasn’t able to do it as quickly as with my own hands but it i guess goes to show the
0:09:15 extent to which we’re able to control robots to do pretty complicated things yeah and so obviously i mean
0:09:21 that is a a stunt or a game or something fun to do with the robot but presumably in that instance as in
0:09:28 other instances of uh folding clothes and vacuuming it like there is learning right the idea is that
0:09:33 you do it some number of times and then the robot can do it and then presumably there’s also generalization
0:09:40 but just to start with learning like you know reductively how many times you got to do it for the robot to learn
0:09:48 yeah so it really depends on the extent to which you want the robot to handle different conditions so
0:09:55 uh in some of our research we’ve been able to show the robot how to do something like 30 times or 50 times
0:10:00 and just with that maybe sounds like a bit but you can do that in like typically less than an hour if
0:10:06 it’s a simple task and from that the robot can under the circumstances if you only kind of demonstrate it
0:10:12 in a narrow set of circumstances like a single environment a single particular object the robot
0:10:18 can learn just from like less than an hour of data what is an example of a thing that the robot learned
0:10:24 in less than an hour of data oh um yeah we put a shoe on a foot we we tear it off a piece of tape
0:10:32 and put it on a box uh we’ve also hung up a shirt on a hanger so that’s not that much i mean especially
0:10:37 especially because you say the robot but what you really mean is the model so every robot right
0:10:42 presumably or every robot that’s built more or less like that one right like that’s one of the key things
0:10:47 it’s like you’re not teaching one robot you’re teaching every robot ever because it’s it’s software
0:10:53 fundamentally it’s an am model it’s not hardware yeah yes with the caveat that if you want to be this data
0:10:59 efficient um it works best if it’s like in the same like the same color of the table the same kind of
0:11:03 rough initial conditions of where the objects are starting right and the same shirt for example so
0:11:08 this is just with like a single shirt uh and not like any shirt so so there’s there’s like concentric
0:11:13 circles of generalizability right like exact same shirt exact same spot exact same table versus like
0:11:23 fold a shirt versus fold clothes right and versus and so is that just infinitely harder like how does that work
0:11:28 that’s your big that’s your big challenge at some level right yeah so generalization is one of the big
0:11:34 one of the big challenges not the only one but it’s one of the big challenges and in some ways i mean the
0:11:38 the first unlock there is just to make sure that you’re collecting data not just for one shirt but
0:11:43 collecting it for lots of shirts or collecting for lots of clothing items and ideally also collecting
0:11:49 data with lots of tables with different textures and and also like not just visual like appearances but
0:11:54 also like if you’re folding on a surface that has very low friction like it’s very smooth versus a
0:11:59 surface that like maybe on top of carpet or something that’s going to behave differently uh when
0:12:06 you’re trying to move the shirt across the table so having variability in the scenarios in which the robot is
0:12:13 experiencing in the data set is important and the we’ve seen evidence that if you set things up
0:12:18 correctly and collect data under lots of scenarios you can’t actually generalize to completely new scenarios
0:12:25 and in like the pi 05 release for example we found that if we collected data in roughly like 100
0:12:33 different rooms then the robot is able to do some tasks in rooms that it’s never been in before
0:12:40 so so you mentioned pi 05 so pi 0.5 that’s your that’s your latest model that you’ve released right
0:12:48 um tell me about that like what what does that model allow robots to do like what robots in what settings
0:12:55 and what tasks yeah yeah definitely so we we’re focusing on generalization so the previous um model
0:13:00 we were focusing on capability and we did a really complicated task of laundry folding from there we
0:13:05 wanted to answer like okay that model worked in one environment it’s fairly brittle if you put it in a
0:13:09 new environment it wouldn’t work and so we wanted to see if we put robots in new environments with new
0:13:16 objects new lighting conditions new furniture can the robot be successful and to do that we collected
0:13:24 data on these mobile manipulators which is feels like a terrible name but uh robots with two arms and
0:13:29 wheels that can drive around kind of like a humanoid but we’re using wheels instead of legs a bit more
0:13:38 practical in that regard and we train the robot to do things like tidying a bed or wiping spills off of a
0:13:44 surface or putting dishes into a sink or putting away items into drawers taking items of clothing
0:13:49 dirty clothing off the floor and putting them into a laundry basket things like that and then we tested
0:13:55 whether or not after collecting data like that in lots of environments aggregated with other data
0:14:01 including data on the internet can the robot then do those things in a home that has never been in
0:14:10 before and in some ways that sounds kind of basic like people have no problem with if you can do it
0:14:14 something in in like one home probably could do the same thing in another home it’s not really doesn’t
0:14:19 seem like a complicated thing for humans but for robots that are trained on data if they’re only trained
0:14:24 on data in one place their whole universe is that one place they haven’t ever seen in the other place
0:14:29 this is actually kind of a big challenge for existing methods and yeah it was a step forward
0:14:35 we were able to see that it definitely isn’t perfect by any means and that kind of comes to another
0:14:40 challenge which is reliability but we’re able to see the robot do things in homes it’s never been in
0:14:45 before where we set it up ask it to do things and it does some things that are useful so like in the
0:14:50 classical setting where a robot is trained in one room like it doesn’t even know that room is a room
0:14:55 that’s just like the whole world to the robot is that world right and if you put it in another room
0:15:01 it's in a completely unfamiliar world exactly and so for example what we were talking about
0:15:07 hanging up a shirt its whole world was that one smooth black tabletop that one
0:15:12 blue shirt that one coat hanger and it doesn't know about this entire universe of other shirts
0:15:16 it doesn't know that there is a category called shirt it only knows yeah it doesn't even know what
0:15:21 shirts are yeah it doesn't even know what shirts are and for pi 0.5 what did you ask the robot to do
0:15:28 and how well did it work yeah so we trained a model we took a pre-trained language model
0:15:35 with a vision component and we fine-tuned it on a lot of data including data from different
0:15:40 homes across san francisco but a lot of other data too so only two percent of the data was
0:15:46 from these mobile robots with arms we store how the motors were all moving in all of our
0:15:52 previous data yeah and then train the model to mimic that data that we've stored it's
0:15:55 like predicting the next word but instead of predicting the next word it's predicting the
0:16:01 next movement or something yes exactly we've trained it to predict next actions or next
0:16:07 motor commands instead of next words
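To make the next-word versus next-action analogy concrete, here is a minimal behavior-cloning sketch in PyTorch: the policy maps vision and language features to a motor command and is trained to mimic recorded actions. The dimensions, the tiny network, and the squared-error loss over continuous actions are illustrative assumptions, not pi 0.5's actual architecture or training recipe.

```python
# Minimal behavior-cloning sketch of "next-action prediction": the model
# gets vision + language features and learns to mimic stored motor
# commands, analogous to next-word prediction in a language model.
# All names and dimensions are illustrative stand-ins.
import torch
import torch.nn as nn

class NextActionPolicy(nn.Module):
    def __init__(self, obs_dim=512, text_dim=512, action_dim=14):
        super().__init__()
        # action_dim=14 is a made-up stand-in, e.g. two 7-DoF arms
        self.net = nn.Sequential(
            nn.Linear(obs_dim + text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, obs_feat, text_feat):
        return self.net(torch.cat([obs_feat, text_feat], dim=-1))

policy = NextActionPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

# One training step on a batch of stored robot data: random tensors
# stand in for image features, instruction embeddings, and the
# recorded motor commands from teleoperation.
obs_feat = torch.randn(32, 512)
text_feat = torch.randn(32, 512)
recorded_action = torch.randn(32, 14)

loss = nn.functional.mse_loss(policy(obs_feat, text_feat), recorded_action)
opt.zero_grad()
loss.backward()
opt.step()
```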
0:16:13 we also do an additional training process to have it focus on and be good at the mobile robot data in homes then we set up the robot in a new home and
0:16:19 we give it language commands we can give it low level language commands or we can
0:16:25 also give it higher level commands the highest level of command might be clean the bedroom and one
0:16:28 of the things that we've also been thinking about more recently is can you give it a more detailed
0:16:32 description of how you want it to clean the bedroom but we're not quite there yet so we can say clean
0:16:37 the bedroom we can also tell it put the dirty clothes in the laundry basket so that would be
0:16:44 kind of a subtask or we can tell it commands like pick up the shirt put the shirt in the laundry
0:16:52 basket then after we tell it that command it will go off and follow that command and
0:17:00 in most cases realize that command successfully in the real world
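To illustrate the levels of commands just described, here is a toy sketch of the hierarchy, from "clean the bedroom" down to single pick-and-place commands. The hard-coded decompositions are hypothetical; in the real system a learned model proposes subtasks from what the robot actually sees.

```python
# A toy sketch of the three command levels described above. The task
# decompositions here are hard-coded hypotheticals, not the real planner.
HIGH_LEVEL_COMMAND = "clean the bedroom"

def plan_subtasks(command: str) -> list[str]:
    # Stand-in for a learned high-level planner.
    return ["put the dirty clothes in the laundry basket", "tidy the bed"]

def plan_steps(subtask: str) -> list[str]:
    # Stand-in for decomposing a subtask into low-level commands.
    if "laundry" in subtask:
        return ["pick up the shirt", "put the shirt in the laundry basket"]
    return [subtask]

def execute(step: str) -> None:
    # In the real system this would run the low-level policy until the
    # step completes; here we just log the command.
    print(f"executing: {step}")

for subtask in plan_subtasks(HIGH_LEVEL_COMMAND):
    for step in plan_steps(subtask):
        execute(step)
```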
0:17:07 how did it do so it depends but the average success rate was around 80 percent so definitely room for improvement and in many scenarios it was
0:17:13 able to be quite successful we also saw some failure modes where for example if you're trying
0:17:18 to put dishes into a sink sometimes one of the dishes was a cutting board and picking up a cutting
0:17:22 board is actually pretty tricky for the robot because you either need to slide it to the edge of the
0:17:29 counter and then grasp it or somehow get the kind of get the finger underneath the cutting board and so
0:17:34 sometimes it was able to do that successfully sometimes it struggled and got stuck the exciting
0:17:38 thing though was that we were able to drop it in places it had never been
0:17:44 before and it was doing things that were quite reasonable so what are you doing now what's the
0:17:50 next thing you’re trying to get to yeah absolutely so the next thing we’re focusing on is reliability
0:17:58 and speed so i mentioned around 80 percent for these tasks how do we get that to 99 percent and i think that
0:18:05 if we can get the reliability up that’s kind of in my mind the main missing ingredient before we can
0:18:13 like really have these be useful in real world scenarios so getting to 99 percent is interesting i mean
0:18:21 i think of self-driving cars right where it seemed some time ago i don’t know 10 years ago 15 years ago
0:18:26 like they were almost there and i know they’re more almost there now i know in san francisco there
0:18:32 really are self-driving cars but they’re still very much at the margin of cars in the world right and
0:18:37 it does seem like almost there means different things in different settings but
0:18:45 i don’t know is it super hard to get from 80 to 99 does the self-driving car example
0:18:54 teach us anything for your work the self-driving car analogy is pretty good i do think that
0:19:00 fortunately there are scenarios where we may not need it to be quite as reliable
0:19:08 as cars with cars there's a much higher safety risk it's much easier to hurt people and with robots there
0:19:13 are safety risks because you are in the physical world but it's easier to put software precautions
0:19:18 in place and even hardware precautions in place to prevent that so that makes it a little bit
0:19:24 easier i mean 99 percent probably isn't good enough for cars right they probably need more nines than that
0:19:29 whereas it may well be good enough for a house cleaning robot yeah in certain circumstances and
0:19:34 yeah like we’re also thinking about scenarios where maybe even less than that is fine and if we view
0:19:40 humans and robots working together it’s more about kind of helping the person complete the task faster
0:19:46 or complete the task like more effectively uh so i think there might be scenarios like that uh but still
0:19:51 we need the performance and reliability to be higher for the robots to be faster in order to accomplish
0:19:52 that
0:19:56 we’ll be back in just a minute
0:20:43 what do you imagine as the initial real world use cases
0:20:54 so there are other robotics companies that have attempted to kind of start with an application and hone in on that
0:20:59 and so i think the lesson from watching those companies is that
0:21:05 you end up spending a lot of time on the problems of that specific application
0:21:10 and less on developing the sort of generalist systems that we think in the long run will be more effective
0:21:15 and so we’re very focused on understanding like what are the core
0:21:19 bottlenecks and the core missing pieces for developing these generalist models
0:21:23 and we think that if we had picked an application now we would kind of lose sight of that bigger problem
0:21:26 because we need to solve things that are specific to that application
0:21:30 so we’re very focused on what we think are like the core
0:21:32 technological challenges
0:21:36 we have certain tasks that we’re working on some of them have been home cleaning tasks
0:21:40 we also have some more kind of industrial like tasks as well
0:21:44 just to instantiate and actually be iterating on robots
0:21:48 and applications could range from things in homes
0:21:53 to things in workplaces to industrial settings
0:21:56 there’s lots and lots of use cases for
0:21:59 intelligent robots and intelligent kind of physical machines
0:22:04 what are some of the industrial tasks you’ve been working on
0:22:07 one example that i mentioned before is inserting cables
0:22:11 there’s lots of use cases in like data centers for example
0:22:14 where that’s a challenging task
0:22:20 another example is constructing cardboard boxes and filling them with items
0:22:21 we’ve also done some packaging tasks
0:22:25 highly relevant to lots of different kind of shipping operations
0:22:29 and then even folding clothes it seems like a very home task
0:22:35 but it turns out that there are companies that need to fold very large amounts of clothing
0:22:42 and so that’s also something that in the long term could be used in in larger scale settings
0:22:45 so i've read that you
0:22:49 have open sourced your model weights
0:22:52 and given designs of robots to hardware companies
0:22:53 and i'm interested
0:22:55 in that set of decisions right
0:22:58 that set of strategic decisions
0:22:59 tell me about that
0:23:01 sort of giving away ip basically right
0:23:03 yeah yeah definitely so
0:23:04 this is a really hard problem
0:23:07 especially this longer term problem of developing a generalist system
0:23:11 we think that the field is very young
0:23:15 and there’s like a couple of reasons
0:23:18 one is that we think that the field needs to mature
0:23:22 and we think that having more people being kind of competent with using robots
0:23:24 and using this kind of technology
0:23:27 will be beneficial in the long term for the company
0:23:30 and by open sourcing things we make it easier for people to do that
0:23:32 and then the second thing is like
0:23:34 the models that we develop right now
0:23:37 they’re very early and the models that we’ll be developing
0:23:40 one two three years from now
0:23:44 are going to be far far more capable than the ones that we have now
0:23:47 and so it's kind of equivalent to open ai
0:23:51 open sourcing gpt-2 or gpt-3
0:23:53 they actually didn't open source gpt-3
0:23:58 but like i think that they would still be in an excellent spot today if they had
0:24:05 uh like what could go wrong that would either prevent you as a company from succeeding
0:24:07 or even hold back the field in general
0:24:11 i don’t think we entirely know the scale of data
0:24:16 that we need for getting really capable models
0:24:18 and there’s a little bit of a chicken and egg problem
0:24:20 where it’s a lot easier to collect data
0:24:22 once you have a really good model
0:24:24 but it takes large amounts of data
0:24:27 right or if there were thousands of robots out in the world running your model
0:24:30 there would just be an incredible amount of data coming into you every day
0:24:30 right
0:24:34 yeah yeah exactly so that’s that’s one thing
0:24:35 i'm actually
0:24:38 maybe a little bit less concerned about that myself
0:24:39 and then i think the other thing is just that
0:24:41 there are technological challenges
0:24:43 to getting these things to work really well
0:24:44 i think that
0:24:48 i think we’ve had incredible progress uh over the last
0:24:51 uh year and two months over the last like 14 months i think
0:24:55 since we’ve started probably more progress than than i was expecting
0:24:58 uh honestly compared to when we started the company
0:25:04 i think it’s like wild that we were able to get a robot to like unload and fold laundry
0:25:06 like a 10 minute long task
0:25:11 and folding laundry is like a famously hard robot problem right
0:25:13 like it’s the one that people in robotics talk about
0:25:18 when they talk about how things people think are easy are actually hard for robots right
0:25:22 yeah absolutely absolutely i mean you have to deal with all sorts of variability and how
0:25:24 clothes can be crumpled on each other and
0:25:28 also it’s like there’s even like really small minor things you need to do in order to like
0:25:32 actually get it to be flat on the table and uh and folded nicely and even stacked and
0:25:38 as the task gets longer as well there are more opportunities to make mistakes more opportunities to get stuck
0:25:42 and so if you’re doing a task that takes 10 minutes in those 10 minutes there’s many many
0:25:45 times where the robot can make a mistake that it can’t recover from
0:25:50 or just get stuck or something like that and so being able to do such a long task
0:25:55 starts to point at the resilience that these models can have by recovering from those mistakes
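A quick back-of-the-envelope calculation makes the point concrete: if each step of a task succeeds independently with probability p and the robot cannot recover from mistakes, the chance of finishing an n-step task is p raised to the n. The step counts below are invented purely for illustration.

```python
# Why long tasks are unforgiving without recovery: task-level success
# decays exponentially with task length when mistakes are unrecoverable.
for p in (0.80, 0.99):
    for n_steps in (10, 100):
        print(f"p={p:.2f} steps={n_steps} task success ~ {p ** n_steps:.3g}")
# At p=0.80 a 100-step task essentially never succeeds (~2e-10), while
# p=0.99 still gives ~0.37, which is why recovering from mistakes
# matters as much as raw per-step accuracy.
```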
0:26:01 uh-huh so when we were first trying to fold laundry like one of the common failure modes
0:26:06 is that it would fold the laundry like very well by my standards at the time
0:26:11 i would be very very happy with the robot and then it would push the entire stack of laundry onto the ground
0:26:16 uh-huh sort of like teaching a toddler to fold clothes
0:26:25 yeah yeah exactly was there a particular moment when you saw a robot using your model
0:26:31 fold clothes for 10 minutes and it worked yeah um first off we started with just folding a shirt
0:26:35 starting flat on the table we got that to work pretty quickly it turns out to be pretty easy
0:26:41 um and i wasn’t too surprised by that and then we moved from that to starting in like just a random ball
0:26:45 like some sort of crumpled position on the table and then you have to flatten and then fold it
0:26:50 and that makes the problem dramatically harder because of all the variability and having to
0:26:56 figure out how to flatten it we were kind of stuck on that problem for like at least a couple months
0:27:02 uh where everything we were trying the success rate of the robot was zero percent it wasn’t able to really
0:27:11 make progress on it and we started to see signs of life i think in like august or september of last year
0:27:19 we tried a new recipe where we continued to train the model on a curated part of the data that was
0:27:25 following a consistent strategy and that sort of high quality post training is what really seemed to
0:27:30 make the model work better
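The curated post-training recipe lends itself to a tiny illustration: keep the broad pre-training data, then continue training only on demonstrations that follow one consistent strategy. The episode records and the strategy label below are hypothetical; in reality this was careful curation of real demonstration data, not a one-line filter.

```python
# A minimal sketch of curated post-training: after broad pre-training,
# continue training only on the subset of demonstrations that follow
# one consistent strategy. Records and labels are hypothetical.
episodes = [
    {"task": "fold shirt", "strategy": "flatten_then_fold", "frames": 900},
    {"task": "fold shirt", "strategy": "fold_in_air",       "frames": 700},
    {"task": "fold shirt", "strategy": "flatten_then_fold", "frames": 950},
]

# Keep only demonstrations that use the one consistent strategy, so the
# model sees a single coherent way of doing the task.
curated = [ep for ep in episodes if ep["strategy"] == "flatten_then_fold"]
print(f"post-training on {len(curated)} of {len(episodes)} episodes")
```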
0:27:37 and then the moment i was most excited about was the first time that i saw the model flatten and fold and stack five items in a row
0:27:42 yeah i just remember going home that night and being so excited it seemed like we had just figured out this big missing
0:27:48 puzzle piece so i was asking you why might it not work or what might slow the field down and then we
0:27:53 talked about the happy shirt story but if in five years things didn’t progress as quickly as you thought
0:27:59 what might have happened i mentioned that i think that incorporating practice like allowing the
0:28:07 robot to practice the task should be really helpful for allowing robots to get better we don’t know what
0:28:14 exactly that recipe will look like and so it's like a research problem and with any sort of
0:28:19 research problem you don’t know exactly how hard the solution is going to be and i think that there are
0:28:25 some other more nuanced unknowns as well that are somewhat similar to that and we have a large number of
0:28:30 very talented researchers on our team because we think that there are some of these unsolved
0:28:37 breakthroughs that are going to be needed to really truly solve this problem so if it does work
0:28:45 well and things progress in that universe what would you be worried about
0:28:53 good question i mean if things work well i shouldn't be too worried in general i do think that it's
0:28:59 very easy in general to underestimate the challenges around actually deploying and disseminating technology
0:29:05 that takes time and when the technology doesn’t exist yet that that means that like the world is not in
0:29:11 a place that is like ready for that technology i think that there’s a lot of unknowns there i mean one of the
0:29:18 striking things to me about say language models is the people who know the most about them seem to be
0:29:23 the most worried about them which is generally not the case i think historically with technology right
0:29:30 the possible exception of the atomic bomb uh and and so i’m curious i mean those kinds of worries like
0:29:35 do you share them are there worries you have about developing a foundation model for robots
0:29:44 about bad actors using it even i do think that like yeah there’s plenty of technology that has dual uses
0:29:55 uh and i think there are applications of technologies that are harmful i think that a lot of the concerns in
0:30:03 the language model community stem from imbuing these systems with greater autonomy
0:30:13 and i think so i work hands-on with the robots quite a bit and i don't see a world in
0:30:19 which they will be taking over in any way with our current iteration
0:30:26 of robots if we threw some water on it the robot would be in trouble so that might
0:30:30 be a problem for you but i'm sure you could solve that one we're working on it we
0:30:34 actually do have a new iteration that is a lot more waterproof but this is not
0:30:40 a concern that i share okay interesting basically just because you think we can turn it off
0:30:45 if we need to yeah and i think there's always going to be dual use concerns but i
0:30:51 think that the pros of the technology outweigh some of the downsides well give me the happy
0:30:56 story then in what number of years should we choose for a happy story is 10 too soon
0:31:03 i don't want to put a number to it with research you don't know exactly how long
0:31:11 things will take but i envision a world where when you're developing hardware it's not too
0:31:18 hard to actually teach it to do something and teach it to do something useful rather than just having
0:31:25 machines that are not particularly intelligent like dishwashers and laundry machines and so forth
0:31:32 go bigger if you would what would people be teaching robots to do in that
0:31:38 world i guess if we were to go bigger i think that there's a lot of challenges around
0:31:44 helping people as they age allowing them to be more independent i think that that's a huge one
0:31:49 i think that i don't know manufacturing there's all sorts of places where there are abusive labor
0:31:56 practices and we could maybe eliminate those if it's a robot instead of a human yeah many
0:32:00 many many examples and i think that there’s also even things that are even hard to imagine because the
0:32:06 technology doesn’t exist so a lot of the things that i’m thinking about are robots helping humans and in
0:32:11 different circumstances to allow them to be more productive but once something exists
0:32:19 people are creative and come up with new ways of how that's used we'll be back in a minute
0:33:05 great let's finish with lightning round what's one thing that working
0:33:15 with robots has caused you to appreciate about the human body our skin is pretty amazing uh-huh well
0:33:23 we didn't talk about a sense of touch or of heat or of cold right i mean presumably the
0:33:28 models you’re building the robots you’re using don’t have that but they could right they could have
0:33:35 a sense of touch is anyone working on that is that of interest to you lots of people are working on it i
0:33:40 think it’s pretty interesting i think that the hardware technology is not super mature compared
0:33:45 to where i’d like for it to be in terms of how robust it is and the cheapness and the resolution
0:33:52 that said we actually put cameras on the wrists of our robot to help it get some sort of tactile sensing
0:33:57 for example if you visually look at your finger as you make contact with an object
0:34:03 you can see it deform around that object and you could actually just by looking at your finger
0:34:08 get some notion of tactile feedback similar to what our skin gets yeah and cameras are cheap and really
0:34:13 robust way more robust and cheap than existing technology for tactile sensing
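A hedged sketch of how a wrist camera can approximate touch: compare a fingertip patch of the current frame against a no-contact reference frame and treat large visual deformation as contact. The region coordinates and threshold are invented for illustration; this is one plausible reading of the idea, not Physical Intelligence's actual pipeline.

```python
# Sketch: infer contact from how a compliant fingertip visibly deforms,
# by comparing a fingertip region of the current wrist-camera frame
# against a no-contact reference. ROI and threshold are made up.
import numpy as np

FINGER_ROI = (slice(200, 260), slice(140, 200))  # rows, cols of fingertip
CONTACT_THRESHOLD = 12.0  # mean abs pixel difference, tuned empirically

def looks_like_contact(reference: np.ndarray, frame: np.ndarray) -> bool:
    """Return True if the fingertip patch has deformed noticeably."""
    ref_patch = reference[FINGER_ROI].astype(np.float32)
    cur_patch = frame[FINGER_ROI].astype(np.float32)
    return float(np.abs(cur_patch - ref_patch).mean()) > CONTACT_THRESHOLD

# Usage with stand-in grayscale frames:
reference = np.zeros((480, 640), dtype=np.uint8)
frame = reference.copy()
frame[FINGER_ROI] = 40  # pretend the fingertip deformed against an object
print(looks_like_contact(reference, frame))  # True
```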
0:34:21 um i’ve heard you say that humanoid robots are overrated and i’m curious why you think that
0:34:29 simplicity is really helpful and important when trying to develop technology when you introduce
0:34:35 more complexity than is needed it slows you down a lot and i think about the complexity that humanoids
0:34:41 introduce if all of the robots we were working with were humanoids i think that we
0:34:46 wouldn’t have made anywhere near the progress that we made because we’d be dealing with additional
0:34:51 challenges uh i also think that optimizing for ease of data collection is really important in a world
0:34:59 where we need data and it's a lot harder to collect data with and operate all of the different joints and motors of
0:35:07 a humanoid than it is to control a simpler robot um do you anthropomorphize robots i hate it when people
0:35:14 anthropomorphize robots uh i think that it is misleading because the failure modes that robots
0:35:20 have are very different from the failure modes that people have and it misleads people into thinking that
0:35:27 it's going to behave in the way that people behave like in what way oh if you see a robot
0:35:32 doing something like a backflip or even folding laundry you kind of assume that
0:35:35 if you saw a person do that then they probably could do a lot of other things too
0:35:40 and if you anthropomorphize the robot then you assume that the capabilities that you see
0:35:47 are representative as if it were a human uh-huh and that it could do a backflip anywhere or that it
0:35:52 could fold laundry anywhere uh with anything any item of clothing or surely you would think a robot that
0:36:00 could do a backflip could fold a shirt but no exactly exactly so sometimes it’s fun to like assign
0:36:05 emotions to some of the things or say the robot's having a bad day because certainly it feels
0:36:11 like that sometimes but when it moves beyond fun and jokes it might have consequences that i
0:36:19 don’t think make sense um i read that there was a researcher who said they would retire if a robot
0:36:27 tied a shoelace and then one of your robots tied a shoelace and i guess they didn’t retire but i’m curious
0:36:33 what would you need to see a robot do to retire
0:36:41 hmm i don’t know i guess one example that i’ve given before that i would love to see a robot do i
0:36:46 don’t think this is quite retirement level but being able to go into a kitchen that it’s never been in
0:36:54 before and make a bowl of cereal pretty basic especially compared to doing a backflip i cannot do a backflip
0:36:58 myself but i can make a bowl of cereal but it requires being able to find objects in the
0:37:04 environment being able to interact with delicate objects like a cereal box uh maybe even use tools
0:37:10 in order to open the cereal box pouring liquids yeah so that’s a task that i love and i could actually
0:37:17 even see us being able to show a demo of that without too much difficulty actually uh if we put our mind to
0:37:22 it and collected data for it so it actually is i think more within reach than maybe i imagined
0:37:28 a few years ago just as you're thinking about it it's getting closer you're like oh wait
0:37:33 we could do that yeah i mean we’ve actually collected data of pouring cereal um like opening a cereal box
0:37:38 and pouring it into a bowl we haven’t yet done liquid handling uh and pouring but i think we’re actually
0:37:45 going to do it this week on the robot i asked the hardware team to make a waterproof robot so
0:37:51 we're not too far a lot of the pieces are coming together i also love working with robots and
0:37:58 i'm also fairly young i think not too old and so i don't imagine myself retiring anytime soon
0:38:09 chelsea finn is a stanford professor and the co-founder of physical intelligence
0:38:16 you can email us at problem at pushkin.fm and please do email us i read all the emails
0:38:23 today’s show was produced by gabriel hunter chang edited by alexandra garrettin and engineered by
0:38:27 sarah bruguer i’m jacob goldstein and we’ll be back next week with another episode of what’s your problem
0:38:40 this is an iheart podcast
AI is better than humans at a lot of things, but physical tasks – even seemingly simple ones like folding a shirt – routinely stump AI-powered robots. Chelsea Finn is a professor at Stanford and the co-founder of Physical Intelligence. Chelsea’s problem is this: Can you build an AI model that can teach any robot to do any task, anywhere?
Get early, ad-free access to episodes of What’s Your Problem? by subscribing to Pushkin+ on Apple Podcasts or Pushkin.fm. Pushkin+ subscribers can access ad-free episodes, full audiobooks, exclusive binges, and bonus content for all Pushkin shows.
Subscribe on Apple: apple.co/pushkin
Subscribe on Pushkin: pushkin.com/plus
See omnystudio.com/listener for privacy information.