#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs

AI transcript

0:00:00 The following is a conversation with Edward Gibson, or Ted, as everybody calls him. He is a
0:00:06 psycholinguistics professor at MIT. He heads the MIT Language Lab that investigates why human
0:00:12 languages look the way they do, the relationship between culture and language, and how people represent,
0:00:18 process, and learn language. Also, he should have a book titled “Syntax: A Cognitive Approach”
0:00:26 published by MIT Press coming out this fall. So, look out for that.
0:00:30 And now, a quick few-second mention of each sponsor. Check them out in the description.
0:00:36 It’s the best way to support this podcast. We’ve got Yahoo Finance for basically everything you’ve
0:00:41 ever needed if you’re an investor, Listening for listening to research papers, Policygenius
0:00:48 for insurance, Shopify for selling stuff online, and Eight Sleep for naps. Choose wisely, my friends.
0:00:55 Also, if you want to work with our amazing team, or just get in touch with me,
0:00:59 go to lexfridman.com/contact. And now, onto the full ad reads. As always, no ads in the middle.
0:01:05 I try to make this interesting, but if you must skip them, friends, please still check out the sponsors.
0:01:10 I enjoy their stuff. Maybe you will too. This episode is brought to you by Yahoo Finance,
0:01:17 a new sponsor. And they got a new website that you should check out. It’s a website that provides
0:01:22 financial management, reports, information, and news for investors. Yahoo itself has been around
0:01:27 forever. Yahoo Finance has been around forever. I don’t know how long, but it must be over 20 years.
0:01:33 It survived so much. It evolved rapidly and quickly, adjusting, evolving, improving,
0:01:40 all of that. The thing I use it for now is there’s a portfolio that you can add your account to.
0:01:47 Ever since I had zero money, I used, boy, I think it’s called TD Ameritrade. I still use
0:01:55 that same thing. Just getting a basic mutual fund. And I think TD Ameritrade got bought
0:02:01 by Charles Schwab or acquired or merged. I don’t know. I don’t know how these things work.
0:02:05 All I know is that Yahoo Finance can integrate that and just show me everything I need to know
0:02:11 about my “portfolio.” I don’t have anything interesting going on, but it is still good
0:02:17 to kind of monitor it, to stay in touch. Now, a lot of people I know have a lot more
0:02:24 interesting stuff going on investment-wise. So, all of that could be easily integrated
0:02:30 into Yahoo Finance. And you can look at all that stuff, the charts, blah, blah, blah. It looks
0:02:34 beautiful and sexy and just helps you be informed. Now, that’s about your own portfolio, but then
0:02:40 also for the entirety of the finance information for the entirety of the world. That’s all there.
0:02:46 The big news, the analysis of everything that’s going on, everything like that.
0:02:50 And I should also mention that I would like to do more and more financial episodes. I’ve done
0:02:55 a couple of conversations with Ray Dalio. A lot of that is about finance, but some of that is about
0:03:00 sort of geopolitics and the bigger context of finance. I just recently did a conversation with
0:03:06 Bill Ackman very much about finance. And I did a series of conversations on cryptocurrency.
0:03:14 Lots and lots of brilliant people. Michael Saylor, and so on. Charles Hoskinson and Vitalik,
0:03:20 and just lots of brilliant people in that space thinking about the future of money,
0:03:23 future of finance. Anyway, you can keep track of all of that with Yahoo Finance
0:03:28 for comprehensive financial news and analysis. Go to yahoofinance.com. That’s yahoofinance.com.
0:03:34 This episode is also brought to you by Listening, an app that allows you to listen to academic papers.
0:03:42 It’s the thing I’ve always wished existed. And I always kind of suspect that it’s very
0:03:47 difficult to pull off. But these guys pulled it off. Basically, it’s any kind of formatted text
0:03:54 brought to life through audio. Now for me, the thing I care about most, and I think that’s at
0:04:01 the foundation of Listening, is academic papers. So I love to read academic papers. And there’s
0:04:07 several levels of rigor in the actual reading process. But listening to them, especially after
0:04:14 I skimmed it, or after I did a deep dive, listening to them, it’s just such a beautiful
0:04:20 experience. It solidifies the understanding. It brings to life all kinds of thoughts. And I’m
0:04:26 doing this while I’m cooking, while I’m running, I’m going to grab a coffee, all that kind of stuff.
0:04:33 It does require an elevated level of focus, especially the kind of papers I listen to,
0:04:39 which are computer science papers. But you can load in all kinds of stuff. You can do
0:04:43 philosophy papers, you could do psychology papers, like this very topic of linguistics.
0:04:49 I’ve listened to a few papers on linguistics. I went back to Chomsky and listened to papers.
0:04:53 It’s great. Papers, books, PDFs, webpages, articles, all that kind of stuff, even email
0:04:57 newsletters. And the voices they got are pretty sexy. It’s great. It’s pleasant to listen to.
0:05:03 I think what’s ultimately most important is that it shouldn’t feel like a chore
0:05:08 to listen to it. Like I really enjoy it. Normally, you’d get a two week free trial,
0:05:13 but listeners of this podcast get one month free. So go to listening.com/lex. That’s listening.com/lex.
0:05:20 This episode is brought to you by Policygenius, a marketplace for insurance,
0:05:26 life, auto, home, disability, all kinds of insurance. There’s really nice tools for comparison.
0:05:32 I’m a big fan of nice tools for comparison. Like I have to travel to harsh conditions
0:05:39 soon and I had to figure out how I need to update my equipment to make sure it’s weatherproof,
0:05:46 waterproof even. It’s just resilient to harsh conditions. And it would be nice to have
0:05:53 sort of comparisons. I have to resort to like Reddit posts or forum posts, kind of debating
0:06:00 different audio recorders and cabling and microphones and waterproof containers, all
0:06:06 that kind of stuff. I would love to be able to do like a rigorous comparison of them. Of course,
0:06:11 going to Amazon, you get the reviews and those are actually really, really solid. So I think
0:06:17 Amazon has been the giant gift to society in that way, that you kind of can lay out all the
0:06:22 different options and get a lot of structured analysis of how good this thing is. So Amazon
0:06:31 has been great at that. Now, what Policygenius did is the Amazon thing, but for insurance. So the
0:06:38 tools for comparison are really my favorite thing. It’s just really easy to understand the full
0:06:43 marketplace of insurance. With Policygenius, you can find life insurance policies that start at just
0:06:49 $292 per year for $1 million of coverage. Head to policygenius.com/lex or click the link
0:06:57 in the description to get your free life insurance quotes and see how much you can save.
0:07:01 That’s policygenius.com/lex. This episode is also brought to you by Shopify,
0:07:08 a platform designed for anyone to sell anywhere with a great looking online store. I’m not name
0:07:15 dropping here, but I recently went on a hike with the CEO of Shopify, Tobi. He’s brilliant.
0:07:22 I’ve been a fan of his for a long time, long before Shopify was a sponsor. I don’t even know
0:07:28 if he knows that Shopify sponsors this podcast. Now, just to clarify, it really doesn’t matter.
0:07:35 Nobody in this world can put pressure on me to have a sponsor, not to have a sponsor,
0:07:40 or pressure me about what I can and can’t say. I, when I wake up in the morning,
0:07:46 feel completely free to say what I want to say and to think what I want to think.
0:07:52 I’ve been very fortunate in that way in many dimensions of my life, and I also have always
0:07:58 lived a frugal life and a life of discipline, which is where the freedom of speech and the
0:08:05 freedom of thought truly comes from. I don’t need anybody. I don’t need a boss. I don’t need
0:08:10 money. I’m free to exist in this world in the way I see is right. Now, on top of that, of course,
0:08:16 I’m surrounded by incredible people, many of whom I disagree with and have arguments with. So
0:08:21 I’m influenced by those conversations and those arguments, and I’m always learning,
0:08:25 always challenging myself, always humbling myself. I have kind of intellectual humility.
0:08:31 I kind of suspect I’m kind of an idiot. I start my approach to the world of ideas from that place.
0:08:40 Assuming I’m an idiot and everybody has a lesson to teach me. Anyway, not sure why I
0:08:45 got off on that tangent, but the hike was beautiful. Nature, friends, is beautiful. Anyway,
0:08:52 I have a Shopify store, lexfridman.com/store. It’s very minimal, which is how I like, I think,
0:08:59 most things. If you want to set up a store, it’s super easy. It takes a few minutes,
0:09:05 even I figured out how to do it. Sign up for a $1 per month trial period at Shopify.com/lex.
0:09:11 That’s all lowercase. Go to Shopify.com/lex to take your business to the next level today.
0:09:16 This episode is also brought to you by Eight Sleep and its Pod 3 Cover,
0:09:21 the source of my escape, the door when opened allows me to travel away from the troubles of the
0:09:29 world into this ethereal universe of calmness, a cold bed surface with a warm blanket, a perfect
0:09:38 20 minute nap. It doesn’t matter how dark the place my mind is in, a nap will pull me out
0:09:47 and I see the beauty of the world again. Technologically speaking, Eight Sleep is just really
0:09:53 cool. You can control temperature with an app. It’s become such an integral part of my life that
0:10:00 I have begun to take it for granted. Typical human. The app controls the temperature. I set it.
0:10:08 Currently, I’m setting it to a -5. It’s just a super nice cool surface. It’s something I really
0:10:14 look forward to, especially when I’m traveling. I don’t have one of those. It really makes me
0:10:20 feel like home. Check it out and get special savings when you go to eightsleep.com/lex.
0:10:26 This is the Lex Fridman Podcast. To support it, please check out our sponsors
0:10:31 in the description. And now, dear friends, here’s Edward Gibson.
0:10:36 [Music]
0:10:46 When did you first become fascinated with human language?
0:10:56 As a kid in school, when we had to structure sentences in English grammar, I found that
0:11:04 process interesting. I found it confusing as to what it was I was told to do. I didn’t
0:11:10 understand what the theory was behind it, but I found it very interesting.
0:11:14 So when you look at grammar, you’re almost thinking about like a puzzle,
0:11:17 like almost like a mathematical puzzle? Yeah, I think that’s right. I didn’t know I was going to
0:11:21 work on this at all at that point. I was really just, I was kind of a math geek person, computer
0:11:26 scientist. I really liked computer science. And then I found language as a neat puzzle to work on
0:11:33 from an engineering perspective. Actually, sort of accidentally, I decided after I finished
0:11:42 my undergraduate degree, which was computer science and math in Canada at Queen’s University,
0:11:46 I decided to go to grad school. That’s what I always thought I would do. And I went to Cambridge,
0:11:53 where they had a master’s program in computational linguistics.
0:11:57 And I hadn’t taken a single language class before. All I’d taken was CS, computer science,
0:12:03 math classes, pretty much, as an undergrad. And I just thought this was an interesting thing
0:12:08 to do for a year, because it was a single year program. And then I ended up spending my whole
0:12:14 life doing it. So fundamentally, your journey through life was one of a mathematician and
0:12:19 computer scientist. And then you kind of discovered the puzzle, the problem of language and approached
0:12:25 it from that angle to try to understand it from that angle, almost like a mathematician or maybe
0:12:32 even an engineer. As an engineer, I’d say, I mean, to be frank, I had taken an AI class,
0:12:38 I guess it was ’83 or ’85, somewhere around ’84 in there, a long time ago. And there was a natural language
0:12:43 section in there. And it didn’t impress me. I thought there must be more interesting things
0:12:48 we can do. It didn’t seem very, it seemed just a bunch of hacks to me. It didn’t seem like a real
0:12:56 theory of things in any way. And so I just thought this was, this seemed like an interesting area
0:13:01 where there wasn’t enough good work. Did you ever come across like the philosophy angle of logic?
0:13:07 So if you think about the 80s with AI, the expert systems, where you try to kind of
0:13:11 maybe sidestep the poetry of language and some of the syntax and the grammar and all that kind
0:13:18 of stuff and go to the underlying meaning that language is trying to communicate and try to
0:13:23 somehow compress that in a computer-representable way. Did you ever come across that in your studies?
0:13:29 I mean, I probably did, but I wasn’t as interested in it. I was trying to do the
0:13:34 easier problems first, the ones I thought maybe were handleable, which is, it seems like the
0:13:40 syntax is easier, which is just the forms as opposed to the meaning. When you start
0:13:44 talking about the meaning, that’s a very hard problem. And it still is a really, really hard
0:13:48 problem. But the forms are easier. And so I thought at least figuring out the forms of human language,
0:13:55 which sounds really hard, but is actually maybe more tractable.
0:13:59 So it’s interesting. You think there is a big divide, there’s a gap, there’s a distance between
0:14:05 form and meaning. Because that’s a question you have discussed a lot with LLMs, because they’re
0:14:12 damn good at form. Yeah, I think it’s what they’re good at, is form. Exactly. And that’s why they’re
0:14:16 good, because they can do form; meaning’s hard. Do you think there’s, oh, wow. I mean, it’s an
0:14:21 open question, right? How close form and meaning are. We’ll discuss it. But to me, studying form,
0:14:28 maybe it’s a romantic notion, gives you form is like the shadow of the bigger meaning thing
0:14:36 underlying language. Language is how we communicate ideas. We communicate with each other using
0:14:44 language. So in understanding the structure of that communication, I think you start to understand
0:14:50 the structure of thought and the structure of meaning behind those thoughts and communication.
0:14:55 To me. But to you, big gap. Yeah. What do you find most beautiful about human language?
0:15:02 Maybe the form of human language, the expression of human language.
0:15:07 What I find beautiful about human language is some of the generalizations that happen
0:15:14 across the human language, just within and across a language. So let me give you an example of
0:15:18 something which I find kind of remarkable, that is if like a language, if it has a word order
0:15:26 such that the verbs tend to come before their objects. And so that’s like English does that.
0:15:31 So we have, first, the subject comes first in a simple sentence. So I say, you know, the
0:15:37 dog chased the cat or Mary kicked the ball. So the subject’s first, and then after the subject,
0:15:43 there’s the verb. And then we have objects. All these things come after in English. So it’s
0:15:48 generally a verb. And most of the stuff that we want to say comes after the subject. It’s the
0:15:53 objects. There’s a lot of things we want to say they come after. And there’s a lot of languages
0:15:57 like that. About 40% of the languages of the world look like that. They’re subject verb object
0:16:03 languages. And then these languages tend to have prepositions, these little markers on the nouns
0:16:12 that connect nouns to other nouns or nouns to verbs. So when I say a preposition like in or on
0:16:19 or of or about, I say I talk about something. The something is the object of that preposition.
0:16:25 We have these little markers that come, also just like verbs, before their nouns. Okay. And then
0:16:32 so now we look at other languages like Japanese or Hindi. These are
0:16:37 so-called verb-final languages. Those are about, maybe a little more than 40%. Maybe 45% of the
0:16:44 world’s languages, or more, I mean 50% of the world’s languages are verb final. Those tend to have
0:16:49 postpositions. Those markers, they have the same kinds of markers
0:16:55 as we do in English, but they put them after. So sorry, they put them first, the markers come
0:17:01 first. So you say instead of, you know, talk about a book, you say a book about the opposite
0:17:09 order there in Japanese or in Hindi, you do the opposite and the talk comes at the end.
0:17:15 So the verb will come at the end as well. So instead of Mary kicked the ball, it’s Mary ball
0:17:21 kicked. And then if it’s Mary kicked the ball to John, it’s John to, the “to,” the marker there,
0:17:29 the preposition, it’s a postposition in these languages. And so the interesting thing, the fascinating
0:17:33 thing to me is that within a language, this order aligns, it’s harmonic. And so if it’s one or the
0:17:43 other, it’s either verb initial or verb final, but then you’ll have prepositions
0:17:48 or postpositions. And that’s across the languages that we can look at,
0:17:54 we’ve got around 1000 languages; there’s around 7000 languages on the earth right
0:17:58 now. But we have information about, say, word order on around 1000 of those, a pretty decent amount of
0:18:06 information. And for those 1000, which we know about, about 95% fit that pattern. So they will
0:18:13 have either verb, and it’s about half and half, half are verb initial, like English, and
0:18:17 half are verb final, like Japanese. So just to clarify, verb initial is subject verb object.
0:18:24 That’s correct. Verb final is still subject, object, verb. That’s correct. Yeah, the subject
0:18:30 is generally first. That’s so fascinating. I ate an apple or I apple ate. Yes. Okay. And it’s
0:18:38 fascinating that there’s a pretty even division in the world amongst those, 40, 45%. Yeah, it’s
0:18:43 pretty, it’s pretty even. And those two are the most common by far. Those two orders, the subject
0:18:48 tends to be first. There’s so many interesting things. But the thing I find
0:18:52 so fascinating is there are these generalizations within and across a language. And not only that,
0:18:57 there’s actually a simple explanation, I think, for a lot of that. And that is,
0:19:03 you’re trying to like, minimize dependencies between words. That’s basically the story,
0:19:10 I think behind a lot of why word order looks the way it is, is you’re always connecting.
0:19:16 What is it? What is the thing I’m telling you? I’m talking to you in sentences, you’re talking
0:19:19 to me in sentences. These are sequences of words, which are connected. And the connections are
0:19:25 dependencies between the words. And it turns out that what we’re trying to do in a language is
0:19:31 actually minimize those dependency links. It’s easier for me to say things if the words that are
0:19:37 connecting for their meaning are close together. It’s easier for you in understanding if that’s
0:19:42 also true. If they’re far away, it’s hard to produce that and it’s hard for you to understand.
0:19:48 And the languages of the world within a language and across languages fit that generalization,
0:19:53 which is, so it turns out that having verbs initial and then having prepositions ends up
0:20:01 making dependencies shorter. And having verbs final and having post positions ends up making
0:20:07 dependencies shorter than if you cross them. If you cross them, it ends up, you just end up,
0:20:10 it’s possible, you can do it. You mean within a language? Within a language, you can do it.
0:20:15 It just ends up with longer dependencies than if you didn’t. And so languages tend to go that way.
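A toy illustration of the dependency-length idea described above. This is a hedged sketch, not anything taken from the conversation itself: it assumes a three-word phrase ("talk about books"), treats the verb as the head of the adposition and the adposition as the head of its noun, and measures each dependency as the distance in word positions. The function name and the four example orders are assumptions made for illustration.

```python
def total_dependency_length(heads):
    """Sum of distances between each word and its head.

    heads[i] is the position of the head of word i, or None for the root.
    """
    return sum(abs(i - h) for i, h in enumerate(heads) if h is not None)


# Each entry: (word order, head position for every word).
# Assumed analysis: verb <- adposition <- noun.
ORDERS = {
    "verb-initial + preposition (harmonic)": (["talk", "about", "books"], [None, 0, 1]),
    "verb-initial + postposition (crossed)": (["talk", "books", "about"], [None, 2, 0]),
    "verb-final + postposition (harmonic)": (["books", "about", "talk"], [1, 2, None]),
    "verb-final + preposition (crossed)": (["about", "books", "talk"], [2, 0, None]),
}

lengths = {name: total_dependency_length(heads) for name, (_, heads) in ORDERS.items()}

for name, (words, _) in ORDERS.items():
    print(f"{' '.join(words):<18} {name:<40} total length = {lengths[name]}")
```

In this toy case the two harmonic orders come out with a total length of 2 and the two crossed orders with 3, which is the direction of the effect described in the conversation. Real corpus studies use full sentences and parsed treebanks rather than a single hand-built phrase.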
0:20:20 They tend to, they call it harmonic. So it was observed a long time ago without the explanation
0:20:27 by a guy called Joseph Greenberg, who’s a famous typologist from Stanford. He observed a lot of
0:20:34 generalizations about how word order works. And these are some of the harmonic generalizations
0:20:38 that he observed. Harmonic generalizations about word order. There’s so many things I want to ask
0:20:44 you. Let me just, some basics: you mentioned dependencies a few times. What do you mean by
0:20:50 dependencies? Well, what I mean is, in language, there’s kind of three structures to, three components
0:20:58 to the structure of language. One is the sounds. So cat is k-a-t in English. I’m not talking
0:21:04 about that part. I’m talking, then there’s two meaning parts. And those are the words. And
0:21:09 you were talking about meaning earlier. So words have a form and they have a meaning associated
0:21:13 with them. And so cat is a full form in English, and it has a meaning associated with whatever a cat
0:21:17 is. And then the combinations of words, that’s what I’ll call grammar or syntax. And that’s like
0:21:25 when I have a combination like the cat or two cats, okay? So where I take two different words
0:21:32 there and put them together, and I get a compositional meaning from putting those two different words
0:21:36 together. And so that’s the syntax. And in any sentence or utterance, whatever I’m talking to
0:21:43 you, you’re talking to me, we have a bunch of words and we’re putting them together in a sequence.
0:21:46 It turns out they are connected so that every word is connected to just one other word
0:21:54 in that sentence. And so you end up with what’s called technically a tree. It’s a tree structure.
0:22:00 So there’s a root of that utterance of that sentence. And then there’s a bunch of
0:22:06 dependents, like branches from that root that go down to the words. The words are the leaves in
0:22:12 this metaphor for a tree. So a tree is also sort of a mathematical construct. Yeah, it’s a graph
0:22:17 theoretical thing. Exactly. Yeah. So it’s fascinating that you can break down a sentence into a tree
0:22:24 where every word is hanging on to another, it’s depending on it. That’s right. And everyone
0:22:28 agrees on that. So all linguists will agree with that. That is not controversial. That is not
0:22:32 controversial? There’s nobody sitting here listening, mad at you? I don’t think so.
0:22:36 Okay. There’s no linguists sitting there mad at this. No, I think in every language,
0:22:39 I think everyone agrees that all sentences are trees at some level. Can I pause on that? Sure.
0:22:46 Because it’s to me, just as a layman, it’s surprising that you can break down sentences
0:22:54 in mostly all languages into a tree. I think so. I’ve never heard of anyone disagreeing with that.
0:23:01 That’s weird. The details of the trees are what people disagree about.
0:23:05 Well, okay. So what’s the root of a tree? How do you construct? How hard is it? What is the
0:23:10 process of constructing a tree from a sentence? Well, this is where, depending on what your,
0:23:16 there’s different theoretical notions. I’m going to say the simplest thing, dependency grammar.
0:23:21 It’s like, a bunch of people invented this. Tesnière was the first, a French guy back in,
0:23:25 I mean, the paper was published in 1959, but he was working on it in the ’30s and stuff.
0:23:29 So, and it goes back to, you know, the philologist Pāṇini was doing this in ancient India. Okay.
0:23:37 And so, you know, doing something like this, the simplest thing we can think of is that there’s
0:23:42 just connections between the words to make the utterance. And so let’s just say I have like two
0:23:47 dogs entered a room. Okay. Here’s a sentence. And so we’re connecting two and dogs together.
0:23:55 That’s like, there’s some dependency between those words to make some bigger meaning.
0:23:58 And then we’re connecting dogs now to entered, right? And we connect a room somehow to entered.
0:24:06 And so I’m going to connect “a” to room and then room back to entered. That’s the tree. The
0:24:11 root is entered. The thing is like an entering event. That’s what we’re saying here.
0:24:15 And the subject, which is whatever that dog is, is two dogs, it was. And the connection goes back
0:24:21 to dogs, and then that goes back to two. I’m just, that’s my tree.
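The tree just described for "two dogs entered a room" can be written down as a small data structure. This is a minimal sketch under stated assumptions: the head-index encoding, the helper names (`find_root`, `children_of`, `is_tree`, `render`), and the indented printout are illustrative choices, not a notation used in the conversation.

```python
words = ["two", "dogs", "entered", "a", "room"]
# heads[i] is the position of the word that word i depends on; None marks the root.
# two -> dogs, dogs -> entered, entered = root, a -> room, room -> entered
heads = [1, 2, None, 4, 2]


def find_root(heads):
    """Return the position of the single root, or raise if there is not exactly one."""
    roots = [i for i, h in enumerate(heads) if h is None]
    if len(roots) != 1:
        raise ValueError("a dependency tree needs exactly one root")
    return roots[0]


def children_of(heads):
    """Map each word position to the positions of the words that depend on it."""
    children = {i: [] for i in range(len(heads))}
    for i, h in enumerate(heads):
        if h is not None:
            children[h].append(i)
    return children


def is_tree(heads):
    """True if there is one root and every word reaches it without looping."""
    try:
        find_root(heads)
    except ValueError:
        return False
    for start in range(len(heads)):
        seen, node = set(), start
        while heads[node] is not None:
            if node in seen:  # came back to a word already visited: a cycle
                return False
            seen.add(node)
            node = heads[node]
    return True


def render(words, heads):
    """Indented text view of the tree, root first."""
    children = children_of(heads)
    lines = []

    def visit(node, depth):
        lines.append("  " * depth + words[node])
        for child in children[node]:
            visit(child, depth + 1)

    visit(find_root(heads), 0)
    return "\n".join(lines)


print(render(words, heads))
```

Because every word has exactly one head and only the verb has none, the structure is a tree rooted at "entered", with "dogs" and "room" hanging off it and "two" and "a" hanging off those.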
0:24:26 It starts at entered, goes to dogs down to two. And then the other side, after the verb,
0:24:32 the object, it goes to room. And then that goes back to the determiner or article,
0:24:37 whatever you want to call that word. So there’s a bunch of categories of words here we’re
0:24:40 noticing. So there are verbs. Those are these things that typically mark,
0:24:45 they refer to events and states in the world. And there are nouns, which typically refer to
0:24:50 people, places and things is what people say, but they can refer to other more,
0:24:54 they can refer to events themselves as well. They’re marked by, you know, how they, how they,
0:25:00 you, the category, the part of speech of a word is how it gets used in language.
0:25:04 It’s like, that’s how you decide what the, what the category of a word is, not, not by the meaning,
0:25:09 but how it’s, how it gets used. How it’s used. What’s usually the root? Is it going to be the
0:25:15 verb that defines the event? Usually. Yes. Yes. Okay. Yeah. I mean, if I don’t say a verb,
0:25:20 then there won’t be a verb, and so it’ll be something else. What if you’re messing,
0:25:23 are we talking about language that’s like correct language? What if you’re doing
0:25:26 poetry and messing with stuff? Is it then, then rules go out the window, right? Then it’s, no,
0:25:31 you’re still, no, no, no, no, no. You’re constrained by whatever language you’re dealing
0:25:35 with. Probably you have other constraints in poetry, such that you’re like usually in poetry,
0:25:40 there’s multiple constraints that you want to, like you want to usually convey multiple
0:25:44 meanings is the idea. And maybe you have like a rhythm or a rhyming structure as well. And
0:25:48 depending on, so, but you usually are constrained by your, the rules of your language for the most
0:25:54 part. And so you don’t violate those too much. You can violate them somewhat, but not too much.
0:26:00 So it has to be recognizable as your language. Like in English, I can’t say dogs two entered
0:26:06 room. I mean, I meant that, you know, two dogs entered a room, and I can’t
0:26:11 mess with the order of the, the articles, the articles and the nouns. You just can’t do that.
0:26:17 In some languages, you can, you can mess around with the order of words much more. I mean, you
0:26:22 speak Russian. Russian has a much freer word order than English. And so in fact, you can move
0:26:27 around words in, you know, I told you that English has the subject verb object word order. So does
0:26:32 Russian, but Russian is much freer than English. And so you can actually mess around with the
0:26:37 word order. So probably Russian poetry is going to be quite different from English poetry because
0:26:42 the word order is much less constrained. Yeah, there’s a much more extensive culture of poetry
0:26:48 throughout the history of the last 100 years in Russia. And I always wondered why that is,
0:26:54 but it seems that there’s more flexibility in the way the language is used. There’s more,
0:26:59 you morph the language easier by altering the words, altering the order of the words,
0:27:04 messing with it. Well, you can just mess with different things in each language. And so in
0:27:08 Russian, you have case markers, right? On the end, which are these endings on the nouns,
0:27:13 which tell you how each noun connects to the verb, right? We don’t have that in English.
0:27:17 And so when I say, Mary kissed John, I don’t know who the agent or the patient is,
0:27:24 except by the order of the words, right? In Russian, you actually have a marker on the
0:27:28 end. If you’re using a Russian name, on each of those names, you’ll also say, is it, you know,
0:27:32 agent, it’ll be the, you know, nominative, which is marking the subject, or an accusative will mark
0:27:37 the object. And you could put them in the reverse order, you could put accusative first, you could
0:27:43 put the patient first, and then the verb, and then the subject,
0:27:49 and that would be a perfectly good Russian sentence. And it would still mean, I could
0:27:53 say John kissed Mary meaning Mary kissed John, as long as I use the case markers in the right way.
0:27:59 You can’t do that in English. And so I love the terminology of agent and patient and
0:28:04 and the other ones you use. Those are sort of linguistic terms, correct? Those are, those are
0:28:09 for like kind of meaning. Those are meaning, and subject and object are generally used for
0:28:14 position. So subject is just like the thing that comes before the verb and the object is
0:28:18 the one that comes after the verb. The agent is kind of like the thing doing it. That’s kind of what
0:28:22 that means, right? The subject is often the person doing the action, right? The thing. So yeah.
0:28:28 Okay. This is fascinating. So how hard is it to form a tree in general? Is there,
0:28:31 is there a procedure to it? Like if you look at different languages, is it supposed to be a very
0:28:37 natural, like is it automatable or is there some human genius involved in it? I think it’s
0:28:41 pretty automatable at this point. People can figure out what the words are, they figure out the morphemes,
0:28:45 which are the, technically morphemes are the minimal meaning units within a language. Okay.
0:28:50 And so when you say eats or drinks, it actually has two morphemes in English. There’s the,
0:28:55 there’s the root, which is the verb. And then there’s some ending on it, which tells you,
0:28:59 you know, that’s this third person, third person singular. Say what morphemes are? Morphemes are
0:29:04 just the minimal meaning units within a language. And then a word is just kind of the things we
0:29:08 put spaces between in English, and then they have a little bit more, they have the morphology as well.
0:29:12 They have the endings, this inflectional morphology on the endings on the roots.
0:29:16 They modify something about the word that adds additional meaning.
0:29:19 They tell you, yeah, yeah. And so we have a little bit of that in English, very little,
0:29:22 you have much more in Russian, for instance. And, and, but we have a little bit in English.
0:29:26 And so we have a little on the, on the nouns, you can say it’s either singular or plural.
0:29:30 And, and you can say, same thing for, for, for verbs, like simple past tense, for example,
0:29:36 it’s like, you know, notice in English, we say drinks, you know, he drinks, but everyone else
0:29:40 says, I drink, you drink, we drink, it’s unmarked in a way. And then, but in the past tense, it’s
0:29:45 just drank for everyone. There’s no morphology at all for past tense. There is morphology,
0:29:50 it’s marking past tense, but it’s an irregular now. So, you know,
0:29:54 drink to drank, it’s not even a regular word. So in most verbs, many verbs,
0:29:59 there’s an -ed we kind of add, so walk to walked; we add that to say it’s the past tense.
0:30:03 I just happened to choose an irregular because it’s a high-frequency word, and the high-frequency
0:30:07 words tend to have irregulars in English.
0:30:09 What’s an irregular? Irregular is just, there isn’t a rule. So drink to drank is
0:30:14 an irregular. Drink, drank, okay, as opposed to walk, walked; talk, talked.
0:30:19 And there’s a lot of irregulars in English. There’s a lot of irregulars in
0:30:23 English. The frequent ones, the common words, tend to be irregular. There’s many, many
0:30:28 more low frequency words and those tend to be, those are regular ones.
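[Editor’s sketch] The regular versus irregular contrast described above can be illustrated with a few lines of Python. This is only a toy under stated assumptions: the function name `past_tense` and the small `IRREGULAR_PAST` table are hypothetical, and the spelling rule is deliberately simplified (it ignores consonant doubling, as in stop/stopped, and y-to-ied changes).

```python
# Toy model: irregular past-tense forms are memorized one by one,
# while regular verbs follow a single productive rule (add "-ed").
# The table below is a tiny hypothetical sample, not a real lexicon.
IRREGULAR_PAST = {
    "drink": "drank",
    "sleep": "slept",
    "go": "went",
}


def past_tense(verb: str) -> str:
    """Return the simple past of an English verb (simplified spelling)."""
    if verb in IRREGULAR_PAST:
        # No rule applies: the form has to be looked up.
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        # "love" -> "loved": only "-d" is added after a final "e".
        return verb + "d"
    # Regular rule: root morpheme + "-ed" morpheme.
    return verb + "ed"


if __name__ == "__main__":
    for v in ["walk", "talk", "drink"]:
        print(v, "->", past_tense(v))
```

The design mirrors the point made in the conversation: the high-frequency verbs sit in the lookup table, and everything else falls through to the rule.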
0:30:32 The evolution of the irregular is fascinating because it’s essentially slang that’s sticky
0:30:36 because you’re breaking the rules and then everybody uses it and doesn’t follow the rules.
0:30:41 And they say screw it to the rules. It’s fascinating. So you said morphemes;
0:30:46 lots of questions. So morphology is what, the study of morphemes?
0:30:50 Morphology is the connections between the morphemes and the roots. So in
0:30:54 English, we mostly have suffixes. We have endings on the words, not very much, but a little bit,
0:30:59 as opposed to prefixes. Some words, depending on your language, can have,
0:31:04 you know, mostly prefixes, mostly suffixes, or both. And then
0:31:10 several languages have things called infixes, where you have some kind of a general
0:31:16 form for the root and you put stuff in the middle. You change the vowels.
0:31:22 That’s fascinating. That’s fascinating. So in general, there’s what, two morphemes per word,
0:31:29 usually one or two or three? Well, in English, it’s one or two. In English,
0:31:33 it tends to be one or two. There can be more. You know, in other languages, a
0:31:37 language like Finnish, which has a very elaborate morphology, there may be
0:31:43 10 morphemes on the end of a root. Okay. And so there may be millions of forms of a given word.
0:31:49 Okay. Okay. I will ask the same question over and over. But
0:31:53 just sometimes to understand things like morphemes, it’s nice to just ask
0:32:02 the question, how do these kinds of things evolve? So you have a great book studying sort of
0:32:09 the cognitive processing, how language is used for communication,
0:32:16 so the mathematical notion of how effective a language is for communication, what role that
0:32:20 plays in the evolution of language. But just high level, how does a language evolve
0:32:26 where English has one or two morphemes per word and then Finnish has
0:32:32 infinity per word? So how does that happen? Is it just people?
0:32:38 That’s a really good question. Yeah. That’s a very good question is like,
0:32:41 why do languages have more morphology versus less morphology? And I don’t think we know the
0:32:47 answer to this. I think there’s just, like, a lot of good solutions to the problem of
0:32:52 communication. And so, like, I believe, as you hinted, that language is an invented system by humans
0:33:00 for communicating their ideas. And I think we, it comes down to, we label the things we want to
0:33:05 talk about. Those are the morphemes and words. Those are the things we want to talk about in the
0:33:09 world and we invent those things. And then we put them together in ways that are easy for us
0:33:15 to convey, to process. But that’s like a naive view. And I don’t, I mean, I, I think it’s probably
0:33:21 right, right? It’s naive and probably right. I don’t know if it’s naive. I think it’s simple.
0:33:26 Simple. Yeah. I think naive is an indication that it’s incorrect somehow,
0:33:31 it’s trivial, too simple. I think it could very well be correct. But it’s interesting how
0:33:37 sticky it feels like two people got together. It’s just, it just feels like once you figure out
0:33:44 certain aspects of a language that just becomes sticky and the tribe forms around that language,
0:33:49 maybe the language, maybe the tribe forms first and then the language evolves. And then you just
0:33:53 kind of agree and you stick to whatever that is. I mean, these are very interesting questions. We
0:33:57 don’t really know very much about how even words get invented. You know, we don’t
0:34:04 really, I mean, assuming they get invented, we don’t really know how that process
0:34:09 works and how these things evolve. What we have is kind of a current picture, a current picture of
0:34:17 a few thousand languages, a few thousand instances. We don’t have any pictures of really how these
0:34:23 things are evolving really. And then the evolution is massively, you know, confused by contact,
0:34:31 right? So as soon as one language group, one group runs into another, we are smart, humans are
0:34:38 smart, and they take on whatever is useful in the other group. And so any kind of contrast,
0:34:44 which you’re talking about, which I find useful, I’m going to, I’m going to start using as well.
0:34:48 So I worked a little bit in specific areas of words, in number words and in color
0:34:56 words. So in English, we have around 11 words that everyone knows for colors.
0:35:02 And many more, if you happen to be interested in color for some reason or other, if you’re a
0:35:09 fashion designer or an artist or something, you may have many, many more words. But we can see
0:35:14 millions. Like if you have normal color vision, normal trichromatic color vision, you can see
0:35:19 millions of distinctions in color. So we don’t have millions of words. You know, the most efficient,
0:35:24 no, the most detailed color vocabulary would have over a million terms to distinguish all
0:35:30 the different colors that we can see. But of course, we don’t have that. So it’s somehow,
0:35:34 it’s been, it’s kind of useful for English to have evolved in some way to, there’s 11 terms
0:35:41 that people find useful to talk about, you know, black, white, red, blue, green, yellow, purple,
0:35:48 gray, pink, and I probably missed something there. Anyway, there’s 11 that everyone knows.
0:35:53 But you go to different cultures, especially the non-industrialized cultures, and there’ll be
0:36:00 many fewer. So some cultures will have only two, believe it or not. The Dani in Papua New Guinea
0:36:07 have only two labels that the group uses for color. And those are roughly black and white.
0:36:12 They are very, very dark and very, very light, which are roughly black and white. And you might
0:36:16 think, oh, they’re dividing the whole color space into, you know, light and dark or something.
0:36:21 That’s not really true. They mostly just only label the light, the black and the white things.
0:36:25 They just don’t talk about the colors for the other ones. And so, and then there’s other groups,
0:36:29 I’ve worked with a group called the Tsimane’ down in Bolivia, in South America, and they have
0:36:35 three words that everyone knows, but there’s a few others that
0:36:42 many people know. And so they have, kind of depending on how you count, between
0:36:47 three and seven words that the group knows. Okay. And again, they’re black and white,
0:36:53 everyone knows those. And red, you know, that tends to be the third word that
0:36:59 cultures bring in. If there’s a third word, it’s always red. And then after that,
0:37:03 it’s kind of all bets are off about what they bring in. And so after that, they bring in a sort
0:37:08 of a big blue-green group; they have one for that. And then,
0:37:14 you know, different people have different words that they’ll use for other parts of the space.
0:37:18 And so anyway, it’s probably related to what they want to talk about,
0:37:24 not what they see, because they see the same colors as we see. So it’s not like they
0:37:28 have a weak, a low color palette in the things they’re looking at. They’re looking
0:37:34 at a lot of beautiful scenery. Okay. A lot of different colored flowers and berries and things.
0:37:42 And, you know, and so there’s lots of things of very bright colors, but they just don’t label
0:37:46 the color in those cases. And the reason probably we don’t know this, but we think probably what’s
0:37:52 going on here is that what you do, why you label something is you need to talk to someone else
0:37:57 about it. And why do I need to talk about a color? Well, if I have two things which are identical
0:38:02 and I want you to give me the one that’s different, and the only way it varies is color,
0:38:08 then I invent a word which tells you, you know, this is the one I want. So I want the red sweater
0:38:13 off the rack, not the, not the green sweater, right? There’s two. And so those, those things will
0:38:17 be identical, because these are things we made and they’re dyed and there’s nothing different
0:38:21 about them. And so in, in industrialized society, we have, you know, everything, everything we’ve
0:38:27 got is pretty much arbitrarily colored. But if you go to a non-industrialized group, that’s not
0:38:32 true. And so they don’t, suddenly they’re not interested in color. If you bring bright colored
0:38:37 things to them, they like them just like we like them. Bright colors are great. They’re beautiful.
0:38:42 They are, but they just don’t need to, no need to talk about them. They don’t have.
0:38:46 So probably color words is a good example of how language evolves from sort of function
0:38:52 when you need to communicate the use of something. I think so. Then you kind of invent different
0:38:57 variations. And, and basically, you can imagine that the evolution of a language has to do with
0:39:03 what the early tribes are doing, like what they wanted, what kind of problems are
0:39:07 facing them. And they’re quickly figuring out how to efficiently communicate the solution to those
0:39:12 problems, whether it’s aesthetic or functional, all that kind of stuff, running away from a
0:39:17 mammoth or whatever. But you know, it’s, so I think what you’re pointing to is that we don’t have
0:39:22 data on the evolution of language, because many languages have formed a long time ago,
0:39:27 so you don’t get the chatter. We have a little bit of, like, Old English to Modern English,
0:39:33 because there was a writing system, and we can see how Old English looked. So the word order
0:39:39 changed, for instance, in Old English to Middle English to Modern English. And so, you know,
0:39:42 we can see things like that, but most languages don’t even have a writing system. So of the
0:39:47 7000, only, you know, a small subset of those have a writing system. And even if they have a
0:39:52 writing system, they, it’s not a very modern writing system. And so they don’t have it. So we
0:39:56 just basically have, for Mandarin, for Chinese, a lot of evidence
0:40:02 for a long time, and for English, and not for much else. German a little bit,
0:40:06 but not a whole lot of, like, long-term language evolution. We don’t have a lot.
0:40:11 Well, you get snapshots is what we’ve got of current languages.
0:40:14 Yeah, you get an inkling of that from the rapid communication on certain platforms,
0:40:19 like on Reddit, there’s different communities, and they’ll come up with different slang,
0:40:23 usually, from my perspective, driven by a little bit of humor, or maybe mockery or whatever it is,
0:40:29 you know, just talking shit in different kinds of ways. And you could see the evolution
0:40:35 of language there. Because I think a lot of things on the internet, you don’t want to be the
0:40:43 boring mainstream. So you like want to deviate from the proper way of talking.
0:40:50 And so you get a lot of deviation, like rapid deviation, then when communities collide,
0:40:55 you get like, just like you said, humans adapt to it. And you can see it through the
0:41:00 lens of humor. I mean, it’s very difficult to study, but you can imagine like 100 years from
0:41:04 now, well, if there’s a new language born, for example, we’ll get really high resolution data.
0:41:09 I mean, English is changing. English changes all the time. All languages change all the time.
0:41:14 So, you know, there’s a famous result about the Queen’s English. So if you look at the Queen’s
0:41:21 vowels, the Queen’s English is supposed to be, you know, originally the proper way to talk
0:41:26 was sort of defined by however the Queen talked, or the King, whoever was in charge. And so if
0:41:32 you look at how her vowels changed from when she first became Queen in 1952 or ’53, when she was
0:41:39 crowned, I mean, that’s Queen Elizabeth, who died recently, of course, until,
0:41:44 you know, 50 years later, her vowels changed, her vowels shifted a lot. And so that, you know,
0:41:49 even in the sounds of British English, the way she was talking was changing,
0:41:54 the vowels were changing slightly. So just in the sounds there’s change. I don’t know what’s,
0:41:59 you know, we’re, I’m interested. We’re all interested in what’s driving any of these
0:42:03 changes. The word order of English changed a lot over a thousand years, right? So it used to look
0:42:08 like German. You know, it used to be a verb-final language with case marking, and it shifted
0:42:14 to a verb-medial language. A lot of contact, so a lot of contact with French. And it became a
0:42:19 verb-medial language with no case marking. And so it became this, you know, verb-medial
0:42:25 thing. So and so that’s evolving. It totally evolved. And so it may very well, I mean, you know,
0:42:30 it doesn’t evolve maybe very much in 20 years is maybe what you’re talking about. But over 50
0:42:35 and 100 years, things change a lot, I think. We’ll now have good data on it, which is great.
0:42:39 That’s for sure. Can you talk to what is syntax and what is grammar? So you wrote a book on syntax.
0:42:45 I did. You were asking me before about what, you know, how do I figure out what a dependency
0:42:49 structure is? I’d say the dependency structures aren’t that hard. Generally, I think there’s a lot
0:42:54 of agreement on what they are for almost any sentence in most languages. I think people will
0:42:59 agree on a lot of that. There are other parameters in the mix such that some people think there’s a
0:43:06 more complicated grammar than just a dependency structure. And so, you know, like Noam Chomsky,
0:43:11 he’s the most famous linguist ever. And he is famous for proposing a slightly more complicated
0:43:19 syntax. And so he invented phrase structure grammar. So he’s well known for many, many
0:43:26 things. But in the 50s, in the early 60s, like the late 50s, he was basically figuring out what’s
0:43:31 called formal language theory. And he figured out sort of a framework for figuring out how
0:43:38 complicated a certain type of language, so-called phrase structure
0:43:43 grammars of language, might be. And so his idea was that maybe we can think about the complexity
0:43:52 of a language by how complicated the rules are. And the rules will look like this. They will have
0:43:59 a left-hand side and they’ll have a right-hand side. Something on the left-hand side will expand
0:44:04 to the thing on the right-hand side. So we’ll say we’ll start with an S, which is like the root,
0:44:08 which is a sentence. And then we’re going to expand to things like a noun phrase and a verb
0:44:14 phrase is what he would say, for instance. An S goes to an NP and a VP: that is a kind of phrase
0:44:19 structure rule. And then we figure out what an NP is. An NP is a determiner and a noun, for instance.
0:44:25 And a verb phrase is something else, is a verb and another noun phrase, another NP, for instance.
0:44:30 Those are the rules of a very simple phrase structure grammar. And so he proposed phrase structure
0:44:37 grammar as a way to sort of cover human languages. And then he actually figured out that, well,
0:44:42 depending on the formalization of those grammars, you might get more complicated or less complicated
0:44:47 languages. So he said, well, there are things called context-free languages, and he
0:44:54 thought human languages tend to be what he calls context-free languages. But there are simpler
0:45:00 languages, which are so-called regular languages, and they have a more constrained form to the rules
0:45:05 of the phrase structure, these particular rules. So he basically discovered and kind of invented
0:45:12 ways to describe language, and those are phrase structure grammars of human language. And he was
0:45:19 mostly interested in English initially in his work in the ’50s.
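[Editor’s sketch] The rewrite rules described above (S goes to NP VP, NP goes to a determiner and a noun, VP goes to a verb and an NP) can be written down directly as data. The following is a minimal illustration, not Chomsky’s formalism in full: the names `RULES` and `expand` and the three-word vocabulary are assumptions made for the example.

```python
from itertools import product

# Each left-hand-side category maps to its possible right-hand sides.
# The categories follow the rules quoted in the conversation; the tiny
# vocabulary (the, dogs, room, entered) is a hypothetical choice.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["dogs"], ["room"]],
    "V": [["entered"]],
}


def expand(symbol):
    """Return every word sequence that `symbol` can expand to."""
    if symbol not in RULES:
        # Anything without a rule is treated as a word (a terminal).
        return [[symbol]]
    results = []
    for right_hand_side in RULES[symbol]:
        # Expand each category on the right, then combine the pieces in order.
        parts = [expand(s) for s in right_hand_side]
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]

if __name__ == "__main__":
    for s in sentences:
        print(s)
```

Because the rules only check categories, the grammar also generates strings such as "the room entered the dogs", which are well formed syntactically even though they are odd in meaning.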
0:45:22 So quick questions around all this. So formal language theory is the big field of just studying
0:45:27 language formally. Yes. And it doesn’t have to be human language there. We can have computer
0:45:31 languages, any kind of system which is generating some set of expressions in a language. And those
0:45:41 could be like the statements in a computer language, for example. So it could be that
0:45:48 or it could be human language. So technically, you can study programming languages?
0:45:52 Yes. And they have been heavily studied using this formalism. There’s a big field of programming
0:45:57 languages within formal language theory. Okay. And then phrase structure grammar is this idea
0:46:04 that you can break down language into this S, NP, VP type of thing? It’s a particular
0:46:09 formalism for describing language. And Chomsky was the first one. He’s the one who figured
0:46:15 that stuff out back in the ’50s. And that’s equivalent, actually. The context-free grammar
0:46:22 is actually kind of equivalent in the sense that it generates the same sentences as a
0:46:26 dependency grammar would. The dependency grammar is a little simpler in some way. You just have a
0:46:32 root and it goes, like, we don’t have any of these, the rules are implicit, I guess,
0:46:36 and we just have connections between words. The phrase structure grammar is kind of a
0:46:40 different way to think about the dependency grammar. It’s slightly more complicated, but
0:46:45 it’s kind of the same in some ways. So to clarify, dependency grammar is the framework under which
0:46:52 you see language and you make a case that this is a good way to describe language. And
0:46:58 Noam Chomsky is watching this. He’s very upset right now, so I’m just kidding. But what’s the
0:47:05 difference, where’s the place of disagreement between phrase structure grammar and dependency
0:47:12 grammar? They’re very close. So phrase structure grammar and dependency grammar aren’t that far
0:47:17 apart. I like dependency grammar because it’s more perspicuous, it’s more transparent about
0:47:23 representing the connections between the words. It’s just a little harder to see in phrase structure
0:47:27 grammar. The place where Chomsky sort of devolved or went off from this is he also thought there was
0:47:35 something called movement. And that’s where we disagree. That’s the place where I would say
0:47:41 we disagree. And I mean, maybe we’ll get into that later. But the idea is, if you want to,
0:47:46 do you want me to explain that? No, I would love you to explain movement. You’re saying so many
0:47:51 interesting things. Okay, so here’s what movement is. Chomsky basically sees English and he says,
0:47:56 okay, you know, we had that sentence earlier, it was like, two dogs entered the
0:48:01 room. Let’s change it a little bit, say, two dogs will enter the room. And he notices that, hey, in
0:48:06 English, if I want to make a question, a yes/no question, from that same sentence, I say,
0:48:12 instead of two dogs will enter the room, I say, will two dogs enter the room? Okay, there’s a
0:48:16 different way to say the same idea. And it’s like, well, the auxiliary verb, that will thing,
0:48:21 it’s at the front as opposed to in the middle. Okay. And so, and he looked, you know, if you
0:48:26 look at English, you see that that’s true for all those modal verbs. And for other kinds of
0:48:32 auxiliary verbs in English, you always do that. You always put an auxiliary verb at the front.
0:48:36 And when he saw that, so, you know, if I say, I can win this bet; can I win this bet,
0:48:42 right? So I move can to the front. So actually, that’s a theory, I just gave you a theory there.
0:48:47 He talks about it as movement: that word in the declarative is the root, is the sort of default
0:48:53 way to think about the sentence. And you move the auxiliary verb to the front. That’s a movement
0:48:58 theory. Okay. And he just thought that was just so obvious that it must be true that there’s
0:49:04 nothing more to say about that, that this is how auxiliary verbs work in English. There’s a movement
0:49:10 rule such that, to get from the declarative to the interrogative, you’re moving
0:49:15 the auxiliary to the front. And it’s a little more complicated as soon as you go to
0:49:19 simple present and simple past, because, you know, if I say, you know, John slept, you have to say,
0:49:24 did John sleep, not slept John, right? And so you have to somehow get an auxiliary verb. And I
0:49:29 guess underlyingly, it’s like slept is it’s a little more complicated than that. But that’s his
0:49:35 idea. There’s a movement. Okay. And so a different way to think about that, that isn’t, I mean,
0:49:39 he ended up showing later. So he proposed this theory of grammar, which has movement. And there’s
0:49:45 other places where he thought there’s movement, not just auxiliary verbs, but things like the passive
0:49:50 in English and things like questions, WH questions, a bunch of places where he thought there’s also
0:49:56 movement going on. And each one of those, these things, there’s words, well, phrases and words
0:50:01 are moving around from one structure to another, which he called deep structure to surface structure.
0:50:05 I mean, there’s like two different structures in his theory. Okay. There’s a different way to
0:50:10 think about this, which is there’s no movement at all. There’s a lexical copying rule such that
0:50:18 the word will or the word can, these auxiliary verbs, they just have two forms. And one of them
0:50:23 is the declarative and one of them is the interrogative. And you basically have the declarative
0:50:27 one and, oh, I form the interrogative or I can form one from the other. It doesn’t matter which
0:50:32 direction you go. And I just have a new entry, which has the same meaning, which has a slightly
0:50:38 different argument structure. Argument structure is a fancy word for the ordering of the words.
0:50:43 And so if I say, you know, two dogs can or will enter the room, there’s two forms
0:50:51 of will. One is will declarative. And then, okay, I’ve got my subject to the left. It comes before
0:50:58 me. And the verb comes after me in that one. And then the will interrogative is like, oh,
0:51:03 I go first interrogative will is first. And then I have the subject immediately after and then the
0:51:09 verb after that. And so you just, you can just generate from one of those words, another word
0:51:14 with a slightly different argument structure with different ordering. And these are just lexical
0:51:18 copies. They’re not necessarily moving from one to another. There’s no movement. There’s a romantic
0:51:23 notion that you have like one main way to use a word. And then you could move it around, which is
0:51:30 essentially what movement is implying. But the lexical copying is similar. So then we do
0:51:36 lexical copying for that same idea that maybe the declarative is the source and then we can copy it.
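[Editor’s sketch] The lexical copying idea, where an auxiliary simply has two entries with the same meaning and different word orders, can be pictured as a small lexicon. Everything named here (`AuxEntry`, `LEXICON`, `lookup`, `realize`, the slot labels) is a hypothetical illustration and not a representation from any particular linguistic theory.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AuxEntry:
    word: str               # the auxiliary itself, e.g. "will"
    mood: str               # "declarative" or "interrogative"
    order: Tuple[str, ...]  # where subject, auxiliary and verb phrase go


# Two separate entries per auxiliary; nothing moves from one to the other.
LEXICON = [
    AuxEntry("will", "declarative", ("subject", "aux", "verb")),
    AuxEntry("will", "interrogative", ("aux", "subject", "verb")),
    AuxEntry("can", "declarative", ("subject", "aux", "verb")),
    AuxEntry("can", "interrogative", ("aux", "subject", "verb")),
]


def lookup(word: str, mood: str) -> Optional[AuxEntry]:
    """Find the entry for this word and mood, or None if the lexicon lacks it."""
    for entry in LEXICON:
        if entry.word == word and entry.mood == mood:
            return entry
    return None


def realize(entry: AuxEntry, subject: str, verb_phrase: str) -> str:
    """Lay out the pieces in the order the lexical entry specifies."""
    slots = {"subject": subject, "aux": entry.word, "verb": verb_phrase}
    return " ".join(slots[slot] for slot in entry.order)


if __name__ == "__main__":
    print(realize(lookup("will", "declarative"), "two dogs", "enter the room"))
    print(realize(lookup("will", "interrogative"), "two dogs", "enter the room"))
```

Since each entry is listed independently, a word is free to have only one of the two forms; `lookup` then returns `None` for the missing one.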
0:51:42 And so an advantage for, well, there’s multiple advantages of the lexical copying story. It’s
0:51:48 not my story. This is like Ivan Sag, a linguist; a bunch of linguists have been proposing these
0:51:54 stories as well, you know, in tandem with the movement story. Okay, you know,
0:51:58 Ivan Sag died a while ago, but he was one of the proponents of the non-movement, of the
0:52:03 lexical copying story. And so a great advantage is, well, Chomsky, really famously in
0:52:11 1971, showed that the movement story leads to learnability problems. It leads to
0:52:18 problems for how language is learned. It’s really, really hard to figure out what the underlying
0:52:24 structure of a language is if you have both phrase structure and movement. It’s like really
0:52:28 hard to figure out what came from what. There’s like a lot of possibilities there. If you don’t
0:52:33 have that, the learning problem gets a lot easier. Just say there’s lexical copies.
0:52:38 And when we say the learning problem, do you mean like humans learning a new language?
0:52:42 Yeah, just learning English. So a baby is lying around in the crib, listening to me
0:52:47 talk. And, you know, how are they learning English? Or, you know, maybe it’s a two-year-old who’s
0:52:52 learning, you know, interrogatives and stuff. You know, how are they
0:52:55 doing that? Are they figuring it out? So Chomsky said,
0:53:01 it’s impossible to figure it out, actually. He said it’s actually impossible, not, not hard,
0:53:05 but impossible. And therefore, that’s where universal grammar comes from,
0:53:10 is that it has to be built in. And so there’s something built in; movement
0:53:16 is built in, in his story. It’s absolutely part of your language module. And then you are,
0:53:23 you’re just setting parameters. English is just sort of a variant
0:53:27 of the universal grammar. And you’re figuring out, oh, which orders does English do these
0:53:32 things in? The non-movement story doesn’t have this. It’s like much more bottom up.
0:53:38 You’re learning rules. You’re learning rules one by one. And, oh, this word is connected
0:53:44 to that word. A great advantage, another advantage: it’s learnable. Another advantage of it is that
0:53:49 it predicts that not all auxiliaries might move, like, it might depend on the word,
0:53:55 and that turns out to be true. So there’s words that don’t really work
0:54:02 as auxiliaries, you know, they work in the declarative and not in the interrogative. So I can say,
0:54:06 I’ll give you the opposite first. I can say, aren’t I invited to the party? Okay. And that’s
0:54:13 an interrogative form, but it’s not from I aren’t invited to the party. There is no I aren’t,
0:54:19 right? So that’s interrogative only. And then we also have forms like ought.
0:54:26 I ought to do this. And, and I guess some British, old British people can say,
0:54:30 exactly. It doesn’t sound right, does it? For me, it sounds ridiculous. I don’t even think
0:54:36 ought is great, but I mean, I totally recognize I ought to do. It is not too bad, actually. I can
0:54:40 say ought to do this. That sounds pretty good. Yeah. If I’m trying to sound sophisticated, maybe.
0:54:44 I don’t know. It just sounds completely out to me. Yeah. Anyway, it’s, it’s, so there are variants
0:54:49 here. And a lot of these words just work in one versus the other. And, and that’s like fine under
0:54:55 the lexical copying story. It’s like, well, you just learn the usage, whatever the usage is,
0:55:00 is what you do with this word. But it’s a little bit harder
0:55:05 in the movement story. That’s an advantage, I think, of lexical copying.
0:55:09 And in all these different places, there’s all these usage variants, which make the movement
0:55:15 story a little bit harder to work. So one of the main divisions here is the movement story versus
0:55:22 the lexical copy story, which has to do with the auxiliary words and so on. But if you rewind to
0:55:28 the phrase structure grammar versus dependency grammar, those are equivalent in some sense
0:55:34 in that for any dependency grammar, I can generate a phrase structure grammar which
0:55:39 generates exactly the same sentences. I just like the dependency grammar formalism because
0:55:46 it makes something really salient, which is the lengths of dependencies between
0:55:52 words, which isn’t so obvious in the phrase structure. It’s just kind
0:55:56 of hard to see. It’s in there. It’s just very opaque. Technically, I think phrase
0:56:02 structure grammar is mappable to dependency grammar. And vice versa. And vice versa. But there’s like
0:56:07 these like little labels, S, NP, VP. Yeah. For a particular dependency grammar, you can make a
0:56:13 phrase structure grammar, which generates exactly those same sentences and vice versa.
0:56:17 But there are many phrase structure grammars for which you can’t really make a dependency grammar.
0:56:21 I mean, you can do a lot more in a phrase structure grammar, but you get many more of these
0:56:26 extra nodes, basically, you can have more structure in there. And some people like that. And maybe
0:56:32 there’s value to that. I, I don’t like it. Well, for you, so we should clarify. So dependency grammar
0:56:39 is just, well, one word depends on only one other word and you form these trees.
0:56:43 And it really puts priority on those dependencies, just as a tree where you
0:56:50 can then measure the distance of the dependency from one word to the other, which can then map to
0:56:56 the cognitive processing of these sentences, how easy it is to understand,
0:57:02 all that kind of stuff. So it just puts the focus on just like the mathematical
0:57:08 distance of dependence between words. So like, it’s just a good different focus.
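[Editor’s sketch] The idea of measuring the distance of each dependency can be made concrete with a head-index representation, in which every word points at the one word it depends on. The annotation below for "two dogs entered the room" is an assumption made for illustration (determiners attach to their nouns, nouns attach to the verb), and the names `WORDS`, `HEADS` and `dependency_lengths` are hypothetical.

```python
# Word positions are 1-based; a head of 0 marks the root of the tree.
WORDS = ["two", "dogs", "entered", "the", "room"]
# Assumed analysis: two -> dogs, dogs -> entered, entered is the root,
# the -> room, room -> entered.
HEADS = [2, 3, 0, 5, 3]


def dependency_lengths(heads):
    """Distance, in words, between each non-root word and its head."""
    return [
        abs(position - head)
        for position, head in enumerate(heads, start=1)
        if head != 0  # the root depends on nothing, so it has no length
    ]


lengths = dependency_lengths(HEADS)
total_length = sum(lengths)

if __name__ == "__main__":
    for word, head in zip(WORDS, HEADS):
        target = "ROOT" if head == 0 else WORDS[head - 1]
        print(f"{word} -> {target}")
    print("lengths:", lengths, "total:", total_length)
```

Summing the lengths gives one simple number per sentence, which is the kind of quantity that can then be compared against how hard a sentence is to process.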
0:57:13 Absolutely. Just continue on a thread of Chomsky because it’s really interesting because it,
0:57:18 as you’re discussing disagreement to the degree there’s disagreement, you’re also telling the
0:57:23 history of the study of language, which is really awesome. So you mentioned context free versus regular.
0:57:29 Does that distinction come into play for dependency grammars?
0:57:34 No, not at all. I mean, regular languages are too simple for human languages. They are
0:57:41 a part of the hierarchy, but human languages, in the phrase structure world, are definitely
0:57:48 at least context free, maybe a little bit more, a little bit harder than that. So there’s
0:57:55 something called context sensitive as well, where you can have, like this is just the formal language
0:58:00 description. In a context free grammar, you have one, this is like a bunch of like formal
0:58:07 language theory we’re doing here. I love it. Okay. So you have a left hand side category,
0:58:12 and you’re expanding to anything on the right. That’s context free. So the idea is that
0:58:17 that category on the left expands independent of context to those things, whatever they are on
0:58:21 the right, doesn’t matter what. And a context sensitive says, okay, I actually have more than
0:58:28 one thing on the left. I can tell you only in this context, you know, maybe you have like a
0:58:33 left and a right context or just a left context or a right context. Having two or more things on the
0:58:37 left tells you how to expand that those things in that way. Okay. So it’s context sensitive.
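[Editor’s sketch] The distinction drawn above comes down to what sits on the left-hand side of a rule: a single category (context free) or a category together with some surrounding context (context sensitive). The check below looks only at that shape and ignores the further conditions of the full formal definitions; the function names and the example rules are hypothetical.

```python
def rule_type(left_hand_side, right_hand_side):
    """Classify one rewrite rule by the shape of its left-hand side."""
    if len(left_hand_side) == 1:
        # One category on the left: it expands the same way in any context.
        return "context-free"
    # More than one symbol on the left: the expansion is tied to a context.
    return "context-sensitive"


def is_context_free(rules):
    """True if every rule in the grammar has a single left-hand category."""
    return all(rule_type(lhs, rhs) == "context-free" for lhs, rhs in rules)


# S -> NP VP, NP -> Det N: each left side is a single category.
CONTEXT_FREE_RULES = [
    (("S",), ("NP", "VP")),
    (("NP",), ("Det", "N")),
]

# Det N -> Det dogs: N becomes "dogs" only when a Det stands to its left.
CONTEXT_SENSITIVE_RULES = CONTEXT_FREE_RULES + [
    (("Det", "N"), ("Det", "dogs")),
]

if __name__ == "__main__":
    print(is_context_free(CONTEXT_FREE_RULES))
    print(is_context_free(CONTEXT_SENSITIVE_RULES))
```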
0:58:42 A regular language is just more constrained. And so it doesn’t allow just anything on the right.
0:58:49 Basically, it’s one very complicated rule, is kind of what a regular
0:58:56 language is. And so it doesn’t have any, let’s just say, long distance dependencies. It doesn’t
0:59:01 allow recursion, for instance. There’s no recursion. Yeah, recursion is where you, human
0:59:06 languages have recursion, they have embedding, and, well, it doesn’t allow center-embedded
0:59:11 recursion, which human languages have. Which is what? Center-embedded recursion. Within a sentence?
0:59:16 Yeah, within a sentence. So here we’re going to get to that. But, you know,
0:59:19 the formal language stuff is a little aside, Chomsky wasn’t proposing it for human languages
0:59:24 even, he was just pointing out that human languages are context free. And then he was most in, for,
0:59:29 for human, because that was kind of stuff we did for formal languages. And what he was most interested
0:59:33 in was human language. And that’s like the, the movement is where we, we, we, where, where he
0:59:39 sort of set off in, on the, I would say a very interesting, but wrong foot, it was kind of
0:59:44 interesting. It’s a very, I agree, it’s kind of a very interesting history. So there’s a set,
0:59:48 he proposed these multiple theories in ’57 and then ’65. They all have this framework,
0:59:54 though, which was phrase structure plus movement, different versions of the, of the phrase structure
0:59:58 and the movement in the 57, these are the most famous original bits of Chomsky’s work. And then
1:00:02 71 is when he figured out that those lead to learning problems, that, that there’s cases where
1:00:07 a kid could never figure out which rule, which set of rules was intended. And, and so, and then
1:00:15 he said, well, that means it’s innate. It’s kind of interesting. He just really thought the movement
1:00:19 was just so obviously true that he couldn’t, he didn’t even entertain giving it up. It’s just
1:00:25 obvious that that’s obviously right. And it was later where people figured out that there’s all
1:00:30 these like subtle ways in which things would, which look like generalizations aren’t generalizations,
1:00:36 and they, you know, across the category, they’re, they’re word specific and they have, and they,
1:00:40 they kind of work, but they don’t work across various other words in the category. And so it’s
1:00:44 easier to just think of these things as lexical copies. And I think he was very obsessed. I don’t
1:00:50 know, I’m guessing that he just, he really wanted this story to be simple in some sense and language
1:00:56 is a little more complicated. In some sense, you know, he didn’t like words. He never talks about
1:01:01 words. He likes to talk about combinations of words and words are, you know, look up a dictionary,
1:01:05 there’s 50 senses for a common word, right? The word take will have 30 or 40 senses in it.
1:01:11 So there’ll be many different senses for common words. And he just doesn’t think about that. It’s,
1:01:17 or doesn’t think that’s language. I think he doesn’t think that’s language. He thinks that
1:01:22 words are distinct from combinations of words. I think they’re the same. If you look at my brain
1:01:29 in the scanner, while I’m listening to a language I understand, and you compare, I can localize my
1:01:36 language network in a few minutes in like 15 minutes. And what you do is I listen to a language I
1:01:40 know, I listen to, you know, maybe some language I don’t know, or I listen to muffled speech, or I
1:01:46 read sentences, and I read non-words, like I do anything like this, anything that’s sort of really
1:01:50 like English and anything that’s not very like English. So I’ve got something like it and not,
1:01:54 and I got a control. And the voxels, which are just, you know, the 3D pixels in my brain, that are
1:02:02 responding most are the language area. And that’s this left lateralized area in my head. And,
1:02:10 and wherever I look in that network, if you look for the combinations versus the words,
1:02:15 it’s, it’s, it’s everywhere. It’s the same. That’s fascinating. And so it’s like hard to find,
1:02:20 there are no areas that we know. I mean, that’s, it’s a little overstated right now. At this,
1:02:26 at this point, the technology isn’t great. It’s not bad, but we have the best, the best way to
1:02:31 figure out what’s going on in my brain when I’m listening or reading language is to use
1:02:35 fMRI, functional magnetic resonance imaging. And that’s a very good localization method. So I can
1:02:42 figure out where exactly these signals are coming from pretty, you know, down to, you know, millimeters,
1:02:47 you know, cubic millimeters or smaller, okay, very small, we can figure those out very well.
1:02:50 The problem is the when, okay, it’s, it’s measuring oxygen, okay, and oxygen takes
1:02:57 a little while to get to those cells. So it takes on the order of seconds. So I talk fast. I probably
1:03:03 listen fast and like, and probably understand things really fast. So a lot of stuff happens
1:03:06 in two seconds. And so to say that we know what’s going on, that the words right now in that network,
1:03:14 our best guess is that whole network is doing something similar, but maybe different parts
1:03:19 of that network are doing different things. And that’s probably the case. We just don’t have very
1:03:24 good methods to figure that out right at this moment. And so since we’re kind of talking about the
1:03:30 history of the study of language, what other interesting disagreements, and you’re both at
1:03:36 MIT, or were for a long time, what kind of interesting disagreements, or tension of
1:03:40 ideas, are there between you and Noam Chomsky? And we should say that Noam was in the linguistics
1:03:46 department. And you’re, I guess, for a time were affiliated there, but primarily brain and cognitive
1:03:54 science department, which is another way of studying language. And you’ve been talking about
1:03:58 fMRI. So like, what, is there something else interesting to bring to the surface about the
1:04:04 disagreement between the two of you, or other people in the industry? Yeah, I mean, I’ve been at
1:04:10 MIT for 31 years since 1993, and Chomsky’s been there much longer. So I met him, I knew him,
1:04:18 I met him when I first got there, I guess, and we would interact every now and then. I’d say that,
1:04:23 so I tell you, our biggest difference is our methods. And so that’s the biggest difference
1:04:31 between me and Noam, is that I gather data from people. I do experiments with people and I gather
1:04:40 corpus data, whatever, whatever corpus data is available, and we do quantitative methods to
1:04:44 evaluate any kind of hypothesis we have. He just doesn’t do that. And so, you know, he has never
1:04:52 once been associated with any experiment or corpus work ever. And so it’s all thought experiments.
1:04:59 It’s his own intuitions. So I just don’t think that’s the way to do things. That’s a, that’s a,
1:05:06 you know, across the street, they’re across the street from us, kind of difference between
1:05:10 brain and cog sci and linguistics. I mean, not all linguists, some of the linguists,
1:05:14 depending on what you do, more speech oriented, they do more quantitative stuff. But in the,
1:05:19 in the meaning of words and, well, combinations of words, syntax, semantics,
1:05:24 they tend not to do experiments and corpus analyses. So on the linguistics side, probably,
1:05:31 but the method is a symptom of a bigger approach, which is sort of a psychology philosophy side on
1:05:38 Noam. And for you, it’s more sort of data driven, sort of almost like mathematical approach.
1:05:43 Yeah, I mean, I’m a psychologist. So I would say we’re in psychology. You know, I mean,
1:05:48 brain and cognitive sciences is MIT’s old psychology department. It was a psychology
1:05:52 department up until 1985. And that became the brain and cognitive science department.
1:05:56 And so I, I mean, my training isn’t, I call, I mean, my training is math and computer science,
1:06:00 but I’m a psychologist. I mean, I mean, I don’t know what I am.
1:06:04 So data driven psychologist. Yeah, you are. I am what I am. But I’m happy to be called a linguist.
1:06:09 I’m happy to be called a computer scientist. I’m happy to be called a psychologist, any of those
1:06:13 things. But in the actual, like how that manifests itself outside of the methodology is like these
1:06:18 differences, these subtle differences about the movement story versus the lexical copy story.
1:06:23 Yeah, those are theories, right? So the theories, like the theories are, but I think that the reason
1:06:28 we differ in part is because of how we evaluate the theories. And so I evaluate theories quantitatively
1:06:34 and Noam doesn’t. Got it. Okay, well, let’s, let’s explore the theories that you explore in your
1:06:42 book. Let’s return to this dependency grammar framework of looking at language. What’s a good
1:06:50 justification why the dependency grammar framework is a good way to explain language? What’s your
1:06:55 intuition? So the reason I like dependency grammar, as I’ve said before, is that it’s very
1:07:01 transparent about its representation of distance between words. So it’s like, it all it is, is
1:07:08 you’ve got a bunch of words you’re connecting together to make a sentence. And a really neat
1:07:14 insight, which turns out to be true, is that the further apart the pair of words are that you’re
1:07:20 connecting, the harder it is to do the production, the harder it is to do the comprehension. It’s
1:07:24 harder to produce, harder to understand when the words are far apart. When they’re close together,
1:07:28 it’s easy to produce and it’s easy to comprehend. Let me give you an example. Okay, so we have,
1:07:35 in any language, we have mostly local connections between words, but they’re abstract, the
1:07:42 connections are abstract, they’re between categories of words. And so you can always
1:07:46 make things further apart. If you put your, if you add modification, for example, after a noun,
1:07:53 so a noun in English comes before verb, the subject noun comes before verb. And then there’s an
1:08:00 object after, for example, so I can say what I said before, you know, the dog entered the room or
1:08:04 something like that. So I can modify dog. If I say something more about dog after it, then what I’m
1:08:09 doing is, indirectly, I’m lengthening the dependence between dog and entered by adding more stuff to
1:08:16 it. So I just make it explicit here if I say the boy who the cat scratched cried. We’re going to
1:08:27 have a mean cat here. And so what I’ve got here is I get the boy cried, it would be a very short,
1:08:33 simple sentence. And I just told you something about the boy. And I told you it was the boy
1:08:38 who the cat scratched. Okay. So the cry is connected to the boy. The cry at the end is
1:08:44 connected to the boy in the beginning. Right. And so I can do that. And I can say that that’s a
1:08:47 perfectly fine English sentence. And I can say the cat, which the dog chased ran away or something.
1:08:56 Okay, I can do that. But it’s really hard. And so I, but it’s really hard now. I’ve got, you know,
1:09:01 whatever I have here, I have the boy who the cat. Now let’s say I try to modify cat. Okay. The boy
1:09:07 who the cat, which the dog chased scratched ran away. Oh my God, that’s hard, right? I can,
1:09:14 I’m sort of just working that through in my head how to produce and how to, and it’s really very,
1:09:18 just horrendous to understand. It’s not so bad. At least I’ve got intonation there to sort of mark
1:09:23 the boundaries and stuff. But it’s, that’s really complicated. That’s sort of English in a way. I
1:09:29 mean, that follows the rules of English. But so what’s interesting about that is, is that what
1:09:34 I’m doing is nesting dependencies there. I’m putting one, I’ve got a subject connected to a verb
1:09:39 there. And then I’m modifying that with a clause, another clause, which happens to have a subject
1:09:45 and a verb relation. I’m trying to do that again on the second one. And what that does is it lengthens
1:09:50 out the dependence, multiple dependents actually get lengthened out there. The dependencies get
1:09:54 longer, on the outside ones get long, and even the ones in between get kind of long. And you just,
1:09:59 so what’s fascinating is that that’s bad. That’s really horrendous in English. But that’s horrendous
1:10:06 in any language. And so in no matter what language you look at, if you do, just figure out some
1:10:12 structure where I’m going to have some modification following some head, which is connected to
1:10:16 some later head, and I do it again, it won’t be good. It guaranteed. Like 100%, that will be
1:10:22 uninterpretable in that language, in the same way that was uninterpretable in English.
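The claim that nesting lengthens dependencies can be checked by hand on the example sentences. The sketch below assumes distance is counted in words (as suggested in the conversation) and uses hand-drawn head-dependent arcs that are an illustrative assumption, not an official parse. The third sentence uses "cried" as the final verb to keep it parallel with the second.

```python
# Each arc is a (head_index, dependent_index) pair over the word list.
# The arcs are hand-annotated assumptions for illustration.

def dependency_lengths(arcs):
    """Return the word distance spanned by each head-dependent arc."""
    return [abs(head - dep) for head, dep in arcs]

EXAMPLES = {
    # the(0) boy(1) cried(2)
    "the boy cried": [(1, 0), (2, 1)],
    # the(0) boy(1) who(2) the(3) cat(4) scratched(5) cried(6)
    "the boy who the cat scratched cried": [
        (1, 0), (6, 1), (1, 5), (5, 2), (4, 3), (5, 4),
    ],
    # the(0) boy(1) who(2) the(3) cat(4) which(5) the(6) dog(7)
    # chased(8) scratched(9) cried(10)
    "the boy who the cat which the dog chased scratched cried": [
        (1, 0), (10, 1), (1, 9), (9, 2), (4, 3), (9, 4),
        (4, 8), (8, 5), (7, 6), (8, 7),
    ],
}

for sentence, arcs in EXAMPLES.items():
    lengths = dependency_lengths(arcs)
    print(f"{sentence!r}: total={sum(lengths)}, longest={max(lengths)}")
```

Under these assumed arcs the total dependency length goes from 2 to 15 to 40 as each clause is nested inside the previous one, and the longest single dependency (boy to cried) goes from 1 to 5 to 9.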
1:10:26 Just clarify, the distance of the dependencies is whenever the boy cried, there’s a dependence
1:10:35 between two words, and then you’re counting the number of, what, morphemes between them?
1:10:41 That’s a good question. I just say words. Words or morphemes between, we don’t know that.
1:10:45 Actually, that’s a very good question. What is the distance metric? But let’s just say it’s words.
1:10:48 Sure. Okay. And you’re saying the longer the distance to that dependence, the more, no matter
1:10:54 the language, except legalese. Even legalese. We’ll talk about it. Okay. But that the people
1:11:04 will be very upset that speak that language, not upset, but they’ll either not understand it,
1:11:09 or they’ll be like, their brain will be working in overtime. Yeah. They will have a hard time
1:11:14 either producing or comprehending it. They might tell you that’s not their language.
1:11:18 It’s sort of the language. They’ll agree with each of those pieces as part of the language,
1:11:23 but somehow that combination will be very, very difficult to produce and understand.
1:11:27 Is that a chicken or the egg issue here? Well, I’m giving you an explanation.
1:11:32 I’m giving you two kinds of explanations. I’m telling you that center embedding,
1:11:39 that’s nesting, those are synonyms for the same concept here. And the explanation for what,
1:11:45 those are always hard. Center embedding and nesting are always hard. And I give you an
1:11:48 explanation for why they might be hard, which is long distance connections. There’s a,
1:11:52 when you do center embedding, when you do nesting, you always have long distance connections
1:11:55 between the dependents. You just, and so that’s not necessarily the right explanation. It just
1:11:59 happens. I can go through reasons why that’s probably a good explanation. And it’s not really
1:12:03 just about one of them. So probably it’s a pair of them or something of these dependents that you
1:12:09 get long, that drives you to be really confused in that case. And so what the behavioral
1:12:15 consequence there, I mean, we, this is kind of methods, like how do we get at this? You could
1:12:21 try to do experiments to get people to produce these things. They’re going to have a hard time
1:12:25 producing them. You can try to do experiments to get them to understand them and see how well
1:12:29 they understand them, can they understand them. Another method you can do is give people partial
1:12:35 materials and ask them to complete them, you know, those, those center embedded materials,
1:12:40 and they, they’ll fail. So I’ve done that. I’ve done all these kinds of things.
1:12:43 So, wait a minute. So center embedding, meaning like you take a normal sentence,
1:12:49 like boy cried and inject a bunch of crap in the middle. Yes. That separates the boy and the
1:12:54 cried. Yes. Okay. That’s center embedding. And nesting is on top of that. No, no, nesting is the
1:12:59 same thing. Center embedding, those are totally equivalent terms. I’m sorry, I sometimes use
1:13:02 one and sometimes the other. Oh, got it, got it. They don’t mean anything different. Got it. And then
1:13:06 what you’re saying is there’s a bunch of different kinds of experiments you can do. I mean, like,
1:13:11 the understanding one is like, have more embedding, more center embedding, is it easier or harder to
1:13:16 understand, but then you have to measure the level of understanding, I guess. Yeah. Yeah, you could.
1:13:20 I mean, there’s multiple ways to do that. I mean, there’s, there’s the simplest ways just to ask
1:13:23 people how good does it sound, how natural does it sound. That’s a very blunt, but very good measure.
1:13:29 It’s very, very reliable. People will do the same thing. And so it’s like, I don’t know what it means
1:13:33 exactly, but it’s doing something such that we’re measuring something about the confusion,
1:13:37 the difficulty associated with those. And those, like, those are giving you a signal,
1:13:40 that’s why you can say them. Yeah. Okay. What about the completion of the center embedding?
1:13:45 So if you give them a partial sentence, say I say the book which the author who, and I ask you to
1:13:54 now finish that off for me. I mean, either say it, but you can just say it’s written in front
1:13:58 of you and you can just type and have as much time as you want. They will, even though that one’s
1:14:02 not too hard, right? So if I say it’s like the book, it’s like, oh, the book which the author who I
1:14:08 met wrote was good. You know, that’s a very simple completion for that. You know, if I give that
1:14:14 completion online somewhere to a, you know, a crowdsourcing platform and ask people to complete
1:14:20 that, they will miss off of a verb very regularly, like half of the time, maybe two thirds of the
1:14:26 time, they’ll say, they’ll just leave off one of those verb phrases. Even with that simple, so I’ll
1:14:30 say the book which the author who, and they’ll say was, they won’t have, you need three verbs,
1:14:39 right? The three verbs are: who I met, wrote, was good. And they’ll give me two. They’ll say,
1:14:44 who was famous was good or something like that. They’ll just give me two. And that’ll happen about
1:14:50 60% of the time. So 40%, maybe 30, they’ll do it correctly, correctly, meaning they’ll do a
1:14:56 three verb phrase. I don’t know what’s correct or not, you know, this is hard. It’s a hard task.
1:15:00 Yeah, I can actually, I’m struggling with it in my head. Well, it’s easier when you,
1:15:04 when you look at it. If you look at it, it’s a little easier; listening is pretty tough. Because
1:15:08 you have to, because there’s no trace of it, you have to remember the words that I’m saying,
1:15:13 which is very hard, auditorily, we wouldn’t do it this way. We do it written, you can look at it
1:15:17 and figure it out. It’s easier in many dimensions in some ways, depending on the person. It’s easier
1:15:21 to gather written data for, I mean, I work in psycholinguistics, right? Psychology
1:15:28 of language and stuff. And so a lot of our work is based on written stuff, because it’s so easy to
1:15:33 gather data from people doing written kinds of tasks. Spoken tasks are just more complicated to
1:15:39 administer and analyze because people do weird things when they speak. And it’s harder to analyze
1:15:45 what they do. But they generally point to the same kinds of things.
1:15:50 It’s okay. So the universal theory of language by Ted Gibson is that you can form dependency,
1:15:57 you can form trees from any sentences, and that’s right, you can measure the distance in some way
1:16:03 of those dependencies. And then you can say that most languages have very short dependencies.
1:16:10 All languages, all languages, all languages have short dependencies. You can actually measure that.
1:16:14 So an ex-student of mine, this guy who is at University of California Irvine, Richard Futrell,
1:16:20 did a thing a bunch of years ago now, where he looked at all the languages we could look at,
1:16:25 which was about 40 initially. And now I think there’s about 60 for which there are dependency
1:16:31 structures. Meaning there’s got to be a big text, a bunch of texts, which have been
1:16:36 parsed for their dependency structures. And there’s about 60 of those which have been parsed that
1:16:40 way. And for all of those, you can, what he did was take any sentence in one of those languages,
1:16:48 and you can do the dependency structure, and then start at the root. We were talking about
1:16:52 dependency structures. That’s pretty easy now. And he’s trying to figure out what a control
1:16:57 way you might say the same sentence in that language is. And so he’s just like, all right, there’s
1:17:02 a root, and, say the sentence is, let’s go back to, you know, two dogs entered the room.
1:17:07 So entered is the root. And entered has two dependents: it’s got dogs, and it has room.
1:17:14 Okay. And what he does is like, let’s scramble that order, that’s three things, the root, which is the
1:17:19 head, and the two dependents, into some random order, just random, and then just do that for
1:17:24 all the dependents down the tree. So now look, do it for the, and whatever was two in dogs and for,
1:17:29 in room. And that’s, you know, that’s not, it’s a very short sentence. When sentences get longer,
1:17:33 and you have more dependents, there’s more scrambling that’s possible. And what he found,
1:17:38 what, so that, so, so that that’s one, you can figure out one scrambling for that sentence,
1:17:42 he did like a hundred times for every sentence in every corp, in every one of these texts,
1:17:47 every corpus. And, and then he just compared the dependency lengths in those random scramblings
1:17:53 to what actually happened with what the English or the French or the German was in the original
1:17:58 language or Chinese or what all these like 80, like, you know, 60 languages. Okay. And, and the
1:18:02 dependency lengths are always shorter in the real language compared to these, this kind of a control.
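A rough sketch of this kind of comparison, for the single sentence "two dogs entered the room", is below. It uses the simplest possible control, shuffling word positions while keeping the head-dependent relations fixed, which is an assumption about the details and not necessarily the exact procedure used in the study. The `HEADS` encoding and function names are hypothetical.

```python
import random

WORDS = ["two", "dogs", "entered", "the", "room"]
# HEADS[i] is the index of the word that word i depends on; None marks the root.
# two -> dogs, dogs -> entered, entered is the root, the -> room, room -> entered
HEADS = [1, 2, None, 4, 2]

def total_dependency_length(heads, position):
    """Sum of word distances between each dependent and its head,
    where position[i] is the slot word i occupies in the sentence."""
    return sum(
        abs(position[dep] - position[head])
        for dep, head in enumerate(heads)
        if head is not None
    )

def random_baseline(heads, n_samples=100, seed=0):
    """Average total dependency length over random word orders."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_samples):
        position = list(range(len(heads)))
        rng.shuffle(position)          # a random reordering of the same words
        totals.append(total_dependency_length(heads, position))
    return sum(totals) / len(totals)

observed = total_dependency_length(HEADS, list(range(len(WORDS))))
baseline = random_baseline(HEADS)
print(f"observed order: {observed}, random-order average: {baseline:.2f}")
```

The study described above repeats this idea across every sentence in each parsed corpus; here the point is only that the attested English order gives a shorter total than the average scrambled order of the same tree.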
1:18:07 And there’s another, it’s a little more rigid, his control. So the way I described it, you could
1:18:16 have crossed dependencies, like that by scrambling that way, you could scramble in any way at all.
1:18:21 Languages don’t do that. They tend not to cross dependencies very much. Like, so the dependency
1:18:27 structure, they just, they tend to keep things non-crossed. And there’s a, you know, like,
1:18:31 there’s a technical term, they call that projective, but it’s just non-crossed is all that is
1:18:35 projective. And so if you just constrain the, the scrambling so that it only gives you projectives,
1:18:41 sort of non-crossed, the same thing holds. So still, dependencies in human languages are
1:18:46 much shorter than in this kind of a control. So there’s like, what it means is that, that we’re,
1:18:52 in every language, we’re trying to put things close relative to this kind of a control.
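The stricter, non-crossing (projective) control mentioned above can be sketched by shuffling each head together with its dependents while keeping every dependent's subtree in one contiguous block. This is a minimal illustration under that assumption; the function names and the `HEADS` encoding of "two dogs entered the room" are hypothetical.

```python
import random

# HEADS[i] is the index of the head of word i; None marks the root.
# Words: two(0) dogs(1) entered(2) the(3) room(4)
HEADS = [1, 2, None, 4, 2]

def has_crossing(arcs):
    """True if any two arcs, given as pairs of sentence positions, cross."""
    spans = [tuple(sorted(arc)) for arc in arcs]
    return any(a < c < b < d for a, b in spans for c, d in spans)

def projective_scramble(heads, rng):
    """Return position[i] for each word under a random non-crossing order."""
    children = {i: [] for i in range(len(heads))}
    root = None
    for dep, head in enumerate(heads):
        if head is None:
            root = dep
        else:
            children[head].append(dep)

    def linearize(node):
        # The head and each dependent's whole subtree are the units shuffled,
        # so every subtree stays contiguous and no dependencies cross.
        units = [[node]] + [linearize(child) for child in children[node]]
        rng.shuffle(units)
        return [word for unit in units for word in unit]

    position = [0] * len(heads)
    for slot, word in enumerate(linearize(root)):
        position[word] = slot
    return position

def arcs_from(heads, position):
    """Convert a head list plus positions into position-pair arcs."""
    return [(position[d], position[h]) for d, h in enumerate(heads) if h is not None]

position = projective_scramble(HEADS, random.Random(0))
print(position, has_crossing(arcs_from(HEADS, position)))
```

Dependency lengths can then be compared against these constrained orders in the same way as against fully random ones.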
1:18:58 Like there, it doesn’t matter about the word order, some of these are verb final, some of
1:19:01 them are verb medial, like English. And some are even verb initial. There are a few languages in
1:19:05 the world which have VSO word order, verb, subject, object languages,
1:19:10 haven’t talked about those. It’s like 10% of the,
1:19:13 And even, even in those languages, it’s still short dependencies.
1:19:17 Short dependencies is the rule.
1:19:19 Okay. So how, what, what, what are some possible explanations for that?
1:19:22 For why, why languages have evolved that way? So that, that’s one of the,
1:19:29 as opposed to disagreements you might have with Chomsky. So you consider the evolution
1:19:33 of language in, in terms of information theory. And for you, the purpose of language is ease of
1:19:43 communication, right, in processing. That’s right. That’s right. So I mean, the, the story here is just
1:19:47 about communication. It is just about production, really. It’s about ease of production is the story.
1:19:53 When you say production, can you, oh, I just mean ease of language production. It’s easier for me
1:19:57 to say things when the, when I’m doing, whenever I’m talking to you is somehow I’m
1:20:02 formulating some idea in my head and I’m putting these words together. And it’s easier for me to do
1:20:07 that, uh, to put, to say something where the words are close, closely connected in a dependency,
1:20:13 as opposed to separated, like by putting something in between and over and over again. I, it’s just
1:20:18 hard for me to keep that in my head. It like, that’s, that’s the whole story. Like the story,
1:20:22 it’s basically, it’s like the dependency grammar sort of gives that to you. Like just like long,
1:20:27 long is bad, short is good. It’s like easier to keep in mind because you have to keep it in mind for
1:20:32 probably for production, probably, you know, probably matters in comprehension as well. Like
1:20:36 also matters in comprehension. It’s on both sides of it. The production and the, but I would guess
1:20:40 it’s probably evolved for production. Like it’s about producing. It’s what’s easier for me to say
1:20:45 that ends up being easier for you also. I, that’s very hard to disentangle this idea of who’s it for.
1:20:51 Is it for me, the speaker, or is it for you, the listener? I mean, part of my language is for
1:20:55 you. Like the way I talk to you is going to be different from how I talk to different people.
1:21:00 So I’m, I’m definitely angling what I’m saying to who I’m saying, right? It’s not like I’m just
1:21:05 talking the same way to every single person. And so I am sensitive to my audience, but how does
1:21:12 that, does that, you know, work itself out in the, in the dependency length differences? I don’t
1:21:17 know. Maybe that’s about just the words, that part, you know, which words I select. My initial
1:21:21 intuition is that you optimize language for the audience. Yeah. But it’s just kind of like messing
1:21:28 with my head a little bit to say that some of the optimization, maybe the primary objective of
1:21:34 the optimization, might be the ease of production. We have different senses, I guess. I’m, I’m like
1:21:39 very selfish and you’re like, I think it’s like, it’s all about me. I’m like, I’m just doing the
1:21:45 easiest for me at all times. I don’t want to, I’m like, I’ll, I mean, but I have to, of course,
1:21:49 choose the words that I think you’re going to know. I’m not going to choose words you don’t
1:21:54 know. In fact, I’m going to fix that when I, you know, so there it’s about, but, but maybe for,
1:21:58 for the syntax, for the combinations, it’s just about me. I feel like it’s, I don’t know though,
1:22:03 it’s very hard. Wait, wait, wait, wait, wait, wait, but the purpose of communication is to
1:22:06 be understood, is to convince others and so on. So like the selfish thing is to be understood.
1:22:11 Okay. It’s about the listener. It’s a little circular there too then. Okay. Right. I mean,
1:22:14 like the ease of production helps me be understood then. I don’t think it’s circular.
1:22:21 So I think the primary, I think the primary objective is to be understood, is about the
1:22:25 listener. Because otherwise, the, if you’re optimizing to, for the ease of production, then
1:22:30 you’re, you’re not going to have any of the interesting complexity of language. Like you’re
1:22:34 trying to like explain. Well, let’s control for what it is I want to say. Like I, I’m saying let’s
1:22:38 control for the thing, the, the message control for the message. I want to tell you, the message
1:22:42 needs to be understood. That’s the goal. Oh, but that’s the meaning. So I’m still talking about
1:22:46 the form, just the form of the meaning. How do I frame the form of the meaning is all I’m talking
1:22:52 about. You’re talking about a harder thing. I think it’s like, how am I like trying to change
1:22:56 the meaning. Let’s, let’s keep the meaning constant. Like which, if you keep the meaning constant,
1:23:02 how can I phrase whatever it is I need to say, like I got to pick the right words and I’m going
1:23:07 to pick the order so that it’s, so it’s easy for me. You know, that’s, that’s, that’s what I think
1:23:11 is probably like. I think I’m still tying meaning and form together in my head. But you’re saying,
1:23:18 if you keep the meaning of what you’re saying constant, what the optimization, yeah, it could be
1:23:23 the primary objective of that optimization is the, for production. That’s interesting. I’m,
1:23:30 I’m struggling to keep the meaning constant. It’s just so, I mean, I’m, I’m such a, I’m a human,
1:23:36 right? So for me, the form without having introspected on this, the form and the meaning
1:23:43 are tied together, like deeply because I’m a human. Like for me, when I’m speaking,
1:23:50 because I haven’t thought about language, like in a rigorous way about the form of language.
1:23:55 But look, for any event, there’s, there’s an, an unbounded, I don’t, I don’t want to say infinite,
1:24:03 but sort of, number of ways that I might communicate that same event. This two dogs entered a room,
1:24:08 I can say, in many, many different ways. I can say, Hey, there’s two dogs. They entered the room.
1:24:14 Hey, the room was entered by something. The thing that was entered was two dogs. I mean,
1:24:18 there’s, I mean, it’s kind of awkward and weird and stuff. But those are all similar messages
1:24:22 with different forms, different ways that I might frame it. And of course,
1:24:27 I use the same words there all the time. I could have referred to the dogs as, you know,
1:24:32 a Dalmatian and a Poodle or something. You know, I could have been more specific or less specific
1:24:36 about what they are. And I could have said, been more abstract about, about, about the number.
1:24:40 There’s like, so I, like, I’m trying to keep the meaning, which is this event constant. And then
1:24:46 how am I going to describe that to get that to you? It kind of depends on what you need to know,
1:24:50 right? And what I think you need to know. But I’m like, let’s control for all that stuff
1:24:54 and not, and, and I’m just like choosing about, I’m doing something simpler than you’re doing,
1:24:59 which is just forms. Yes, just words. To you, specifying the species, the breed of dog and
1:25:06 whether they’re cute or not is changing the meaning. That might be. Yeah. Yeah. That would be
1:25:11 changing. Well, that would be changing the meaning for sure. Right. So you’re just, yeah. Yeah. Yeah.
1:25:16 That’s changing the meaning. But say, even if we keep that constant, we can still talk about what’s
1:25:20 easier or hard for me, right? The listener and the, and the, which phrase structures I use,
1:25:26 which combinations, which this is so fascinating and just like a really powerful window into human
1:25:34 language. But I wonder still throughout this, how vast the gap between meaning and form. I just,
1:25:42 I just have this, like, maybe romanticized notion that they’re close together, that they evolve
1:25:48 close to like hand in hand, that you can’t just simply optimize for one without the other being
1:25:55 in the room with us. Like it’s, well, it’s kind of like an iceberg. Form is the tip of the iceberg
1:26:01 and the rest, the, the meaning is the iceberg, but you can’t like separate. But I think that’s why
1:26:06 these large language models are so successful is because they’re good at form and form isn’t that
1:26:11 hard in some sense. And meaning is tough still. And that’s why they’re not, they’re, you know,
1:26:16 they don’t understand what they’re doing. We’re going to talk about that later maybe, but
1:26:20 like we can distinguish in our forget about large language models, like humans, maybe you’ll
1:26:26 talk about that later too, is like the difference between language, which is a communication system
1:26:31 and thinking, which is meaning. So language is a communication system for the meaning. It’s not
1:26:37 the meaning. And so that’s why, I mean, that, and there’s a lot of interesting evidence we can talk
1:26:42 about relevant to that. Well, I mean, that’s a really interesting question. What is the
1:26:46 difference between language, written, communicated, versus thought?
1:26:54 What to you is the difference between them? Well, you or anyone has to think of a task, which they
1:27:02 think is, is a good thinking task. And there’s lots and lots of tasks, which should be good thinking
1:27:07 tasks. And whatever those tasks are, let’s say it’s, you know, playing chess, or that’s a good
1:27:12 thinking task, or playing some game, or doing some complex puzzles, maybe, maybe remembering
1:27:18 some digits that’s thinking, remembering some, a lot of different tasks we might think, maybe
1:27:22 just listening to music is thinking, or there’s a lot of different tasks we might think of as
1:27:26 thinking. There’s a woman in my department, Ev Fedorenko, and she’s done a lot of work on this
1:27:31 question about what’s the connection between language and thought. And so she uses, I was
1:27:36 referring earlier to MRI, fMRI, that’s her primary method. And so she has been really
1:27:42 fascinated by this question about whether, what language is, okay? And so as I mentioned earlier,
1:27:48 you can localize my language area, your language area in a few minutes, okay? In like 15 minutes,
1:27:53 I can listen to language, listen to non-language, or backward speech, or something. And we’ll find
1:27:59 areas, a left lateralized network in my head, which is very sensitive to
1:28:05 language, as opposed to whatever that control was, okay? Can you specify what you mean by
1:28:09 language, like communicated language? Just sentences. You know, I’m listening to English
1:28:13 of any kind, a story, or I can read sentences, anything at all that I understand. If I understand it,
1:28:18 then it’ll activate my language network. So right now, my language network is going like crazy
1:28:23 when I’m talking, and when I’m listening to you, because we’re both, we’re communicating.
1:28:28 And that’s pretty stable. Yeah, it’s incredibly stable. So I happen to be married to this
1:28:34 woman, Ev Fedorenko. So I’ve been scanned by her over and over and over since 2007 or six or something.
1:28:39 And so my language network is exactly the same, you know, like a month ago, as it was back in 2007.
1:28:45 It’s amazingly stable. It’s astounding. And with it, it’s, it’s a really fundamentally cool thing.
1:28:51 And so my language network is, it’s like my face, okay? It’s not changing much over time inside my
1:28:56 head. Can I ask a quick question? Sorry, it’s a small tangent. At which point in the,
1:29:01 as you grow up from baby to adult, does it stabilize? We don’t know. Like that’s,
1:29:07 that’s a very hard question. They’re working on that right now, because of the problem of scanning
1:29:11 little kids, like trying to do the localization on little
1:29:17 children in this scanner, where you’re lying in the fMRI scanner, that’s the best way to figure out where
1:29:21 something’s going on inside our brains. And the scanner is loud and you’re in this tiny little
1:29:26 area, you’re claustrophobic. And it doesn’t bother me at all. I can go to sleep in there.
1:29:31 But some people are bothered by it. And little kids don’t really like it. And they don’t like to
1:29:34 lie still. And you have to be really still because if you move around, that messes up the
1:29:39 coordinates of where everything is. And so, you know, try to get, you know, your question is,
1:29:43 how and when is language developing, you know, how, when, how does this left lateralized system
1:29:48 come to play? Where does it, you know, and it’s really hard to get a two year old to do this task.
1:29:52 But you can maybe, they’re starting to get three and four and five year olds to do this task for
1:29:56 short periods. And it looks like it’s there pretty early. So clearly, when you lead up to your, like,
1:30:01 a baby’s first words, before that, there’s a lot of fascinating turmoil going on about like figuring
1:30:08 out like, what are, what are these people saying? Yeah. And you’re trying to like make sense. How
1:30:13 does that connect to the world? And all that kind of stuff. Yeah, that might be just fascinating
1:30:18 development that’s happening there. That’s hard to introspect. But anyway, you,
1:30:22 we’re back to the scanner. And I can find my network in 15 minutes. And now we can ask a,
1:30:28 we can ask, find my network, find yours, find, you know, 20 other people do this task.
1:30:31 And we can do some other tasks. Anything else you think is thinking of some other thing. I can
1:30:36 do a spatial memory task. I can do a music perception task. I can do a programming task,
1:30:43 if I program, okay, where I can like understand computer programs. And
1:30:49 none of those tasks will tap the language network at all, like at all. There’s no overlap. They’re,
1:30:54 they’re highly activated in other parts of the brain. There’s a, there’s a bilateral network,
1:30:59 which I think she tends to call the multiple demands network, which does anything kind of hard. And,
1:31:04 and so anything that’s kind of difficult in some ways will activate that multiple demands network.
1:31:10 I mean, music will be in some music area, you know, there’s music specific kinds of areas. And so,
1:31:14 but they’re, but, but none of them are activating the language area at all, unless there’s words.
1:31:20 Like, so if you have music and there’s a song and you can hear the words, then then you get the
1:31:25 language area. We’re talking about speaking and listening, but are, or are we also talking about
1:31:29 reading? This is all comprehension of any kind. And so, that is fast. So what this, this, this
1:31:35 network doesn’t make any difference if it’s written or spoken. So the, the, the thing that she calls,
1:31:41 Fedorenko calls the language network is this high level language. So it’s not about the spoken,
1:31:45 the spoken language, and it’s not about the written language. It’s about either one of them.
1:31:49 And so when you do speech, you’re listening to speech,
1:31:53 and you’d subtract away some language you don’t understand, or you
1:31:57 subtract away backward speech, which sounds like speech, but it isn’t. And, and then,
1:32:02 so you take away the sound part altogether. And so, and then if you do written, you get exactly
1:32:08 the same network. So for just reading the language versus reading sort of nonsense words or something
1:32:13 like that, you’ll find exactly the same network. And so it’s just about high level,
1:32:17 the comprehension of language. Yeah. In this case, and the same thing happens,
1:32:21 production’s a little harder to run in the scanner, but the same thing happens in production. You get
1:32:24 the same network. So production’s a little harder. You have to figure out how do you run a task,
1:32:28 you know, in the network such that you’re doing some kind of production. And I can’t remember
1:32:31 what, they’ve done a bunch of different kinds of tasks there where you get people to, you know,
1:32:36 produce things. Yeah. Figure out how to produce. And the same network
1:32:39 goes on there. Exactly the same place. And so if, wait, wait, so if you read random words.
1:32:44 Yeah. If you read things like, um, like gibberish. Yeah. Yeah. Lewis Carroll’s,
1:32:49 it was brilliant. Jabberwocky, right? They call that Jabberwocky speech.
1:32:52 The network doesn’t get activated. Not as much. There are words in there.
1:32:56 Yeah. Because it’s like, there’s, there’s function words and stuff. So it’s lower
1:32:59 activation. Yeah. Yeah. So there’s like, basically the more language like it is, the higher it goes
1:33:05 in the language network. And that network is there from when you speak from as soon as you
1:33:10 learn language. And, and it’s, it’s there. Like you speak multiple languages, the same network
1:33:16 is going for your multiple languages. So you speak English, you speak Russian,
1:33:19 both of them are hitting that same network. If you, if you’re fluent in those languages.
1:33:24 So programming. Not at all. Isn’t that amazing? Even if you’re a really good programmer,
1:33:29 that is not a human language. It’s just not conveying the same information. And so it is
1:33:34 not in the language network. And so that is mind-blowing, I think. That’s pretty cool.
1:33:38 That’s weird. It is amazing. And so that’s like one set of data. This is hers, which shows that
1:33:43 what you might think is thinking is not language. Language is just
1:33:48 this conventionalized system that we’ve worked out in human languages. Oh, another fascinating
1:33:54 little tidbit is that even if they’re these constructed languages like Klingon or I don’t
1:34:01 know the languages from Game of Thrones, I’m sorry. I don’t remember those languages. Maybe
1:34:04 a lot of people offended right now. There’s people that speak those languages. They really
1:34:08 speak those languages because the people that wrote the languages for the shows,
1:34:13 they did an amazing job of constructing something like a human language. And those,
1:34:19 that, that lights up the language area. That’s like, because they can speak, you know,
1:34:24 pretty much arbitrary thoughts in a human language. It’s not a, it’s a constructed human
1:34:28 language. Probably it’s related to human languages because the people that were constructing them
1:34:32 were making them like human languages in various ways, but it also activates the same
1:34:36 network, which is pretty, pretty cool. Anyway, sorry to go into a place where it may be
1:34:41 a little bit philosophical, but is it possible that this area of the brain is doing some kind
1:34:47 of translation into a deeper set of almost like concepts? It has to be doing. So it’s
1:34:55 doing in communication, right? It is translating from thought, whatever that is, is more abstract,
1:35:00 and it’s doing that. That’s what it’s doing. Like it is, that is kind of what it is doing.
1:35:04 It’s kind of a meaning network, I guess. Yeah, like a translation network.
1:35:08 Yeah. But I wonder what is at the core at the bottom of it? Like what are thoughts? Are they,
1:35:13 thoughts, to me, like thoughts and words, are they neighbors or are, is it one turtle sitting
1:35:20 on top of the other? Meaning like, is there a deep set of concepts that we… Well, there’s
1:35:26 connections right between what these things mean and then there’s probably other parts of the brain
1:35:31 that represent what these things mean. And so when I’m talking about whatever it is I want to talk about,
1:35:36 it’ll be represented somewhere else. That knowledge of whatever that is will be represented
1:35:41 somewhere else. Well, I wonder if there’s like some stable, nicely compressed encoding of meanings
1:35:48 that’s separate from language. I guess the implication here is that we don’t think in
1:35:57 language. That’s correct. Isn’t that cool? And that’s so interesting. So people, I mean,
1:36:03 this is like hard to do experiments on, but there is this idea of an inner voice and a lot of people
1:36:09 have an inner voice. And so if you do a poll on the internet and ask if you hear yourself talking
1:36:14 when you’re just thinking or whatever, about 70 or 80% of people will say yes. Most people
1:36:19 have an inner voice. I don’t. And so I always find this strange. So when people talk about an
1:36:25 inner voice, I always thought this was a metaphor. And they hear, I know most of you, whoever’s
1:36:30 listening to this thinks I’m crazy now because I don’t have an inner voice and I just don’t know
1:36:35 what you’re listening to. It sounds so kind of annoying to me, but to have this voice going on
1:36:40 while you’re thinking, but I guess most people have that. And I don’t have that. And I don’t,
1:36:46 we don’t really know what that connects to. I wonder if the inner voice activates that same
1:36:50 network. I don’t know. I don’t know. I don’t know. I mean, this could be speechy, right? So that’s
1:36:55 like, do you hear, do you have an inner voice? I don’t think so. A lot of people have this sense
1:37:00 that they hear themselves and then say they read someone’s email. I’ve heard people tell me that
1:37:06 they hear that other person’s voice when they read other people’s emails. And I’m like, wow,
1:37:11 that sounds so disruptive. I do think I like vocalize what I’m reading, but I don’t think I hear a
1:37:17 voice. Well, that’s, you probably don’t have an inner voice. Yeah, I don’t think I have an inner voice.
1:37:20 People have an inner voice. People have this strong percept of hearing sound in their heads
1:37:26 when they’re just thinking. I refuse to believe that’s the majority of people. Majority, absolutely.
1:37:31 What? It’s like two thirds or three quarters. It’s a lot. Whenever I ask a class, and when I ask the
1:37:36 internet, they always say that. So you’re in a minority. It could be a self report flaw.
1:37:41 It could be. You know, when I’m reading inside my head, I’m kind of like saying the words,
1:37:50 which is probably the wrong way to read, but I don’t hear a voice. There’s no
1:37:55 percept of a voice. I refuse to believe the majority of people have. Anyway, it’s a fascinating,
1:38:01 the human brain is fascinating, but it still blew my mind that the, that language does appear,
1:38:06 comprehension does appear to be separate from thinking. So that’s one set. One set of data
1:38:14 from Fedorenko’s group is that no matter what task you do, if it doesn’t have words and combinations
1:38:21 of words in it, then it won’t light up the language network. And, you know, you could, it’ll be active
1:38:26 somewhere else, but not there. So that’s one. And then this other piece of evidence relevant
1:38:32 to that question is, it turns out there are these, this group of people who’ve had a massive stroke
1:38:38 on the left side and wiped out their language network. And as long as they didn’t wipe out
1:38:43 everything on the right as well, in that case, they wouldn’t be, you know, cognitively functional.
1:38:47 But if they just wiped out language, which is pretty tough to do because it’s,
1:38:50 it’s very expansive on the left. But if they have, then there are these, there’s patients
1:38:55 like this, so-called global aphasics, who can do any task just fine, but not language.
1:39:02 They can’t, you can’t talk to them. I mean, they don’t understand you. They can’t speak,
1:39:07 can’t write, they can’t read, but they can do, they can play chess, they can drive their cars,
1:39:12 they can do all kinds of other stuff, you know, do math, they can do all, like, so math is not
1:39:16 in the language area, for instance, you do arithmetic and stuff. That’s not language area.
1:39:20 It’s got symbols. So people sort of confuse some kind of symbolic processing with language. And
1:39:23 symbolic processing is not the same. So there are symbols and they have meaning, but it’s not
1:39:28 language. It’s not a, you know, conventionalized language system. And so language, so math isn’t
1:39:33 there. And so they can do math. They do just as well as their age-matched controls on
1:39:38 all these tasks. This is Rosemary Varley over at University College London, who has a bunch of
1:39:42 patients who she’s shown this that they’re just, so that sort of combination suggests that language
1:39:50 isn’t necessary for thinking. It doesn’t mean that you can’t think in language. You could think
1:39:55 in language because language allows a lot of expression, but it’s just, you don’t need it
1:39:59 for thinking. It suggests that language is separate, is a separate system.
1:40:03 This is kind of blowing my mind right now. It’s cool, isn’t it? I’m trying to load that in
1:40:07 because it has implications for large language models. It sure does. And they’ve been working
1:40:13 on that. Well, let’s take a stroll there. You wrote that the best current theories of human
1:40:18 language are arguably large language models. So this has to do with form. It’s kind of a big
1:40:24 theory. And, but the reason it’s arguably the best is that it does the best at predicting
1:40:30 what’s English, for instance. It’s, it’s like incredibly good, you know, it’s better than any
1:40:35 other theory. It’s so, you know, but, you know, we don’t, you know, there’s, it’s not sort of,
1:40:39 there’s not enough detail. It’s opaque. Like there’s not, you don’t know what’s going on.
1:40:43 You don’t know what’s going on. It’s another black box. But I think it’s, you know, it is a theory.
1:40:47 What’s your definition of a theory? Because it’s a gigantic, it’s a gigantic black box with,
1:40:52 you know, a very large number of parameters controlling it. To me, theory usually requires
1:40:57 a simplicity, right? Well, I don’t know. Maybe I’m just being loose there. I think it’s a,
1:41:03 it’s not, it’s not a great theory, but it’s a theory. It’s a good theory in one sense in that
1:41:07 it covers all the data. Like anything you want to say in English, it does. And so that’s why it’s,
1:41:11 that’s how it’s arguably the best is that no other theory is as good as a large language model in
1:41:16 predicting exactly what’s good and what’s bad in English. Now you’re saying, is it a good theory?
1:41:22 Well, probably not, you know, because I want a smaller theory than that. It’s too big. I agree.
1:41:27 You could probably construct a mechanism by which it can generate a simple explanation
1:41:33 of a particular language, like a set of rules, something like it could generate a dependency
1:41:41 grammar for a language, right? Yeah. You could probably, you could probably just ask it about
1:41:50 it. Well, you know, that’s, I mean, that presumes, and there’s some evidence for this that some
1:41:58 large language models are implementing something like dependency grammar inside them. And so
1:42:02 there’s work from a guy called Chris Manning and colleagues over at Stanford in natural language.
1:42:09 And they looked at, I don’t know how many large language model types, but certainly BERT and
1:42:14 some others, where you do some kind of fancy math to figure out exactly what kind of abstractions
1:42:22 of representations are going on. And they were saying it does look like dependency structure is
1:42:26 what they’re constructing. It doesn’t, like so it’s actually a very, very good map. So kind of a,
1:42:31 they are constructing something like that. Does it mean that, you know, that they’re using that
1:42:37 for meaning? I mean, probably, but we don’t know. You write that the kinds of theories of language
1:42:42 that LLMs are closest to are called construction based theories. Can you explain what construction
1:42:48 based theories are? It’s just a general theory of language such that there’s a form and a meaning
1:42:54 pair for, for lots of pieces of the language. And so it’s, it’s, it’s primarily usage based is a
1:43:01 construction grammar. It’s just, it’s trying to deal with the things that people actually say,
1:43:06 actually say and actually write. And so that’s, it’s a usage based idea. And what’s a construction? A
1:43:12 construction is either a simple word, so like a morpheme plus its meaning or a combination of
1:43:17 words, it’s basically combinations of words, like the rules. So, but it’s, it’s, it’s
1:43:24 unspecified as to what the form of the grammar is underlyingly. And so I would, I would
1:43:32 argue that the dependency grammar is maybe the right form to use for the types of construction
1:43:38 grammar. Construction grammar typically isn’t kind of formalized quite. And so maybe the formalization,
1:43:44 a formalization of that, it might be in dependency grammar. I mean, I, I would think so. But I mean,
1:43:52 it’s up to people, other researchers in that area, if they agree or not. So.
1:43:55 Do you think that large language models understand language? Are they mimicking language? I guess
1:44:03 the deeper question there is, are they just understanding the surface form? Or do they
1:44:07 understand something deeper about the meaning that then generates the form?
1:44:12 I mean, I would argue they’re doing the form. They’re doing the form, they’re doing it really,
1:44:16 really well. And are they doing the meaning? No, probably not. I mean, there’s lots of these
1:44:21 examples from various groups showing that they can be tricked in all kinds of ways. They really
1:44:26 don’t understand the, the meaning of what’s going on. And so there’s a lot of examples that he and
1:44:31 other groups have given, which just, which show they don’t really understand what’s going on.
1:44:36 So, you know, the Monty Hall problem is this silly problem, right? Where, you know, if you
1:44:42 have three doors, it’s Let’s Make a Deal, is this old game show. And there’s three doors, and there’s
1:44:48 a prize behind one, and there’s some junk prizes behind the other two, and you’re trying to select
1:44:54 one. And if you, you know, he knows, Monty, he knows where the target item is, the good thing.
1:45:00 He knows everything that’s back there. And you’re supposed to, he gives you a choice. You choose
1:45:05 one of the three. And then he opens one of the doors, and it’s some junk prize. And then the
1:45:09 question is, should you trade to get the other one? And, and the answer is yes, you should trade,
1:45:12 because he knew which ones he could turn around. And so now the odds are two thirds, okay.
1:45:17 And then you just change that a little bit for the large language model. The large language model has
1:45:22 seen that explanation so many times that it just... If you change the story a
1:45:27 little bit, but make it sound like it’s the Monty Hall problem, but it’s not, you just say,
1:45:31 oh, there’s three doors and behind one of them is a good prize. There’s two bad doors. I happen to
1:45:37 know it’s behind door number one. The good prize, the car is behind door number one. So I’m going
1:45:41 to choose door number one. Monty Hall opens door number three and shows me nothing there. Should
1:45:45 I trade for door number two? Even though I know the good prize is in door number one, and then the
1:45:50 large language model will say, yes, you should trade because it’s a, it’s, it just goes through the,
1:45:54 the, the, the forms that it’s seen before so many times on these cases where it, yes, you
1:46:00 should trade because, you know, your odds have shifted from one in three now to two out of three
1:46:04 to being that thing. It doesn’t have any way to remember that actually you have 100% probability
1:46:10 behind that door number one. You know that. That’s not part of the, of the, the scheme that it’s seen
1:46:15 hundreds and hundreds of times before. And so you can’t, you can’t, even if you try to explain to
1:46:20 it that it’s wrong, it can’t do that. It’ll just keep giving you back the, the problem.
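The arithmetic behind the two versions of the puzzle can be checked by exact enumeration. The following is a minimal illustrative sketch, not something presented in the conversation; the function names `standard_game` and `known_car_game` are hypothetical, and the host is assumed to choose uniformly when he has two junk doors available.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)


def standard_game():
    """Exact win probabilities (stay, switch) when the player picks blindly."""
    stay = switch = Fraction(0)
    for car, pick in product(DOORS, DOORS):
        # The host may only open a door that is neither the pick nor the car.
        openable = [d for d in DOORS if d not in (pick, car)]
        for opened in openable:
            # Car and pick are uniform and independent (1/9 per pair);
            # assumption: the host chooses uniformly among openable doors.
            weight = Fraction(1, 9) / len(openable)
            other = next(d for d in DOORS if d not in (pick, opened))
            stay += weight * (pick == car)
            switch += weight * (other == car)
    return stay, switch


def known_car_game(car=0):
    """Modified story: the player knows where the car is and picks that door."""
    pick = car
    openable = [d for d in DOORS if d not in (pick, car)]
    stay = switch = Fraction(0)
    for opened in openable:
        weight = Fraction(1, len(openable))
        other = next(d for d in DOORS if d not in (pick, opened))
        stay += weight * (pick == car)
        switch += weight * (other == car)
    return stay, switch


if __name__ == "__main__":
    print("standard game (stay, switch):", standard_game())
    print("car location known (stay, switch):", known_car_game())
```

In the standard game switching wins two thirds of the time; once the player already knows the car is behind the chosen door, staying wins with certainty, which is the distinction the modified prompt is testing.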
1:46:25 But it’s also possible the large language model would be aware of the fact that there’s sometimes
1:46:29 overrepresentation of a particular kind of formulation. And it’s easy to get tricked by
1:46:37 that. And so you could see if they get larger and larger models be a little bit more skeptical.
1:46:44 So you see an overrepresentation. So like you, it just feels like form can,
1:46:50 training on form can go really far in terms of being able to generate things that look like
1:47:00 the thing understands deeply the underlying world, world model of the kind of mathematical world,
1:47:10 physical world, psychological world that would generate these kinds of sentences.
1:47:16 It just feels like you’re creeping close to the meaning part, easily fooled, all this kind of
1:47:22 stuff, but that’s humans too. So it just seems really impressive how often it seems like it
1:47:31 understands concepts. I mean, you don’t have to convince me of that. I’m, I am very,
1:47:37 very impressed, but does it, does do, I mean, you’re, you’re giving a possible world where maybe
1:47:43 someone’s going to train some other versions such that it’ll be somehow abstracting away from types
1:47:48 of forms. I mean, I don’t think that’s happened. And so, well, no, no, no, no, I’m not saying that.
1:47:54 I think when you just look at anecdotal examples and just showing a large number of them where it
1:47:59 doesn’t seem to understand and it’s easily fooled, that does not seem like a scientific,
1:48:04 data-driven analysis of like how many places it’s damn impressive in terms of meaning
1:48:13 and understanding and how many places it’s easily fooled. And like that’s not the inference. So I
1:48:18 don’t want to make that, the inference I don’t, I wouldn’t want to make was that inference. The
1:48:21 inference I’m trying to push is just that is it, is it like humans here? It’s probably not like
1:48:26 humans here. It’s different. So humans don’t make that error. If you explain that to them,
1:48:31 they’re not going to make that error. You know, they don’t make that error. And so that’s something,
1:48:34 it’s doing something different from humans that they’re doing in that case.
1:48:38 What’s the mechanism by which humans figure out that it’s an error?
1:48:42 I’m just saying the error there is like, if I explain to you, there’s 100% chance
1:48:45 that the car is behind this case, this door, well, do you want to trade? You’ll say no.
1:48:51 But this thing will say yes, because it’s so, that trick, it’s so wound up on the form
1:48:57 that it’s, that’s an error that a human doesn’t make, which is kind of interesting.
1:49:03 Less likely to make, I should say.
1:49:04 Yeah, less likely.
1:49:05 Because like humans are very…
1:49:07 Oh yeah. I mean, you’re asking, you know, you’re asking humans to, you’re asking a system to
1:49:13 understand 100%, like you’re asking some mathematical concepts. And so like…
1:49:18 Look, the places where large language models are, the form is amazing. So let’s go back to nested
1:49:26 structures, center embedded structures. Okay. If you ask a human to complete those, they can’t do it.
1:49:30 Neither can a large language model. They’re just like humans in that. If you ask, if I ask a large
1:49:35 language model… That’s fascinating, by the way.
1:49:37 That center embedding, the center embedding struggles with…
1:49:41 Just like humans, exactly like humans. Exactly the same way as humans. And that’s not trained.
1:49:46 So they do exactly… So that is the similarity. So, but then it’s, that’s not meaning, right?
1:49:53 This is form. But when we get into meaning, this is where they get kind of messed up.
1:49:57 When you start to saying, oh, what’s behind this door? Oh, it’s, you know, this is the thing I want.
1:50:02 Humans don’t mess that up as much. You know, here, the form is, it’s just like the form of the match
1:50:08 is amazing. It’s similar without being trained to do that. I mean, it’s trained in the sense
1:50:13 that it’s getting lots of data, which is just like human data, but it’s not being trained on,
1:50:17 you know, bad sentences and being told what’s bad. It just can’t do those. It’ll actually
1:50:23 say things like, those are too hard for me to complete or something, which is kind of interesting.
1:50:28 Actually kind of, how does it know that? I don’t know. But it really often doesn’t just
1:50:33 complete, very often says stuff that’s true and sometimes says stuff that’s not true.
1:50:42 And almost always the form is great. But it’s still very surprising that with really great
1:50:50 form, it’s able to generate a lot of things that are true based on what it’s trained on and so on.
1:50:56 So it’s not just form that is generating. It’s mimicking true statements from the internet.
1:51:06 I guess the underlying idea there is that on the internet, truth is overrepresented versus
1:51:12 falsehood. I think that’s probably right. Yeah. So, but the fundamental thing it’s trained on,
1:51:16 you’re saying is just form. I think so. Yeah, I think so. Well, that’s a sad, if that’s, to me,
1:51:24 that’s still a little bit of open question. I probably lean agreeing with you, especially
1:51:31 now that you’ve just blown my mind that there’s a separate module in the brain for language versus thinking.
1:51:36 Maybe there’s a fundamental part missing from the large language model approach
1:51:42 that lacks the thinking, the reasoning capability. Yeah, that’s what this group argues. So the same
1:51:51 group, Fedorenko’s group, has a recent paper arguing exactly that. There’s a guy called Kyle
1:51:57 Mahowald who’s here in Austin, Texas, actually. He’s an old student of mine, but he’s a faculty
1:52:03 in linguistics at Texas. And he was the first author on that. That’s fascinating. Still,
1:52:08 to me, an open question. Yeah. What do you think are the interesting limits of LLMs?
1:52:12 You know, I don’t see any limits to their form. Their form is perfect. Impressive.
1:52:19 Yeah, it’s pretty much, I mean, it’s close to… Well, you said ability to complete center
1:52:23 embeddings. Yeah, it’s just the same as humans. It seems the same as humans. But that’s not
1:52:27 perfect, right? That’s good. No, but I want to be like humans. I’m trying to, I want a model of
1:52:32 humans. Oh, wait, wait, wait, wait. Oh, so perfect is as close to humans as possible. I got it. Yeah.
1:52:39 But you should be able to, if you’re not human, you’re like you’re superhuman, you should be able
1:52:43 to complete center embedded sentences, right? I mean, that’s the mechanism is, if it’s modeling
1:52:50 some, I think it’s kind of really interesting that it’s more like, like I think it’s potentially
1:52:57 underlyingly modeling something like what the way the form is processed. The form of human language.
1:53:02 The way that you… And how humans process the language. Yes. Yes. I think that’s plausible.
1:53:07 And how they generate language. Process language and generate language, that’s fascinating.
1:53:12 So in that sense, they’re perfect. If we can just linger on the center embedding
1:53:17 thing. That’s hard for LLMs to produce. And that seems really impressive because that’s hard for
1:53:22 humans to produce. And how does that connect to the thing we’ve been talking about before,
1:53:28 which is the dependency grammar framework in which you view language and the finding that
1:53:34 short dependencies seem to be a universal part of language. So why is it hard to complete center
1:53:41 embeddings? So what I like about dependency grammar is it makes the cognitive cost associated
1:53:50 with longer distance connections very transparent. Basically, there’s some… It turns out there is
1:53:56 a cost associated with producing and comprehending connections between words, which are just not
1:54:02 beside each other. The further apart they are, the worse it is that according to… Well,
1:54:07 we can measure that. And there is a cost associated with that. Can you just linger on what do you
1:54:12 mean by cognitive cost? Sure. And how do you measure it? Oh, well, you can measure it in a lot
1:54:16 of ways. The simplest is just asking people to say whether… How good a sentence sounds.
1:54:22 We just ask… That’s one way to measure. And you try to triangulate then across sentences and
1:54:28 across structures to try to figure out what the source of that is. You can look at reading times
1:54:34 in controlled materials and certain kinds of materials. And then we can measure the
1:54:39 dependency distances there. There’s a recent study which looked at… We’re talking about
1:54:46 the brain here. We could look at the language network. We could look at the language network
1:54:51 and we could look at the activation in the language network and how big the activation
1:54:56 is depending on the length of the dependencies. And it turns out in just random sentences that
1:55:00 you’re listening to. If you’re listening to… So it turns out there are people listening to stories
1:55:03 here. And the longer the dependency is, the stronger the activation in the language network.
1:55:13 And so there’s some measure… There’s a different… There’s a bunch of different measures we could
1:55:16 do. That’s a kind of a neat measure actually of actual… Activations. Activation in the brain.
1:55:21 So you can somehow in different ways convert it to a number. I wonder if there’s a beautiful
1:55:26 equation connecting cognitive costs and length of dependency. E equals MC squared kind of thing.
1:55:30 Yeah. It’s complicated, but probably it’s doable. I would guess it’s doable. I tried to do that a
1:55:36 while ago and I was reasonably successful, but for some reason I stopped working on that. I
1:55:42 agree with you that it would be nice to figure out… So there’s some way to figure out the cost.
1:55:47 I mean, it’s complicated. Another issue you raised before was how do you measure distance?
1:55:52 Is it words? It probably isn’t. Part of the problem is that some words matter
1:55:57 more than others. And probably… Meaning nouns might matter depending… And then it maybe
1:56:03 depends on which kind of noun. Is it a noun we’ve already introduced or a noun that’s already been
1:56:07 mentioned? Is it a pronoun versus a name? All these things probably matter. So probably the
1:56:12 simplest thing to do is just like, “Oh, let’s forget about all that and just think about words
1:56:17 or morphemes.” For sure, but there might be some insight in the kind of function that fits
1:56:26 the data, meaning quadratic. I think it’s an exponential. We think it’s probably an exponential
1:56:33 such that the longer the distance, the less it matters. And so then it’s the sum of those.
1:56:39 That was our best guess a while ago. So you’ve got a bunch of dependencies. If you’ve got a
1:56:43 bunch of them that are being connected at some point, at the ends of those, the cost is some
1:56:50 exponential function of those, is my guess. But because the reason it’s probably an exponential
1:56:56 is it’s not just the distance between two words. Because I can make a very, very long subject,
1:57:00 verb dependency by adding lots and lots of noun phrases and prepositional phrases. And it doesn’t
1:57:05 matter too much. It’s when you do nest it, when I have multiple of these, then things
1:57:12 get really bad, go south. That’s probably somehow connected to working memory or something like
1:57:16 this. Yeah, that’s probably the function of the memory here is the access, is trying to find those
1:57:22 earlier things. It’s kind of hard to figure out what was referred to earlier, those are those
1:57:27 connections. That’s the sort of notion of working, as opposed to a storagey thing, but trying to
1:57:32 connect, retrieve those earlier words depending on what was in between. And then we’re talking about
1:57:39 interference of similar things in between. The right theory probably has that kind of notion,
1:57:44 an interference of similar things. And so I’m dealing with an abstraction over the right theory,
1:57:48 which is just, let’s count words, it’s not right, but it’s close. And then maybe you’re right though,
1:57:53 there’s some sort of an exponential or something to figure out the total so we can figure out a
1:57:59 function for any given sentence in any given language. But it’s funny, people haven’t done
1:58:04 that too much, which I do think is, I’m interested that you find that interesting. I really find
1:58:10 that interesting. And a lot of people haven’t found it interesting. And I don’t know why I haven’t
1:58:14 got people to want to work on that. I really like that too. That’s a beautiful. And the underlying
1:58:19 idea is beautiful that there’s a cognitive cost that correlates with the length of dependency.
1:58:23 It feels like language is so fundamental to the human experience. And this is a nice clean
1:58:31 theory of language where it’s like, wow, okay. So we like our words close together,
1:58:38 dependent words close together. That’s why I like it too. It’s so simple.
1:58:42 Yeah, the simplicity of the theory. And yet it explains some very complicated phenomena.
1:58:47 If I write these very complicated sentences, it’s kind of hard to know why they’re so hard.
1:58:51 And you can like, oh, nail it down. I can give you a math formula for why each one
1:58:56 of them is bad and where. And that’s kind of cool. I think that’s very neat.
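The “let’s count words” abstraction Ted describes can be sketched in a few lines. This is only an illustrative sketch, not the formula from his work: the two example sentences and their head annotations are hand-made assumptions, and the `cost` parameter is a hypothetical hook where a nonlinear function (such as the exponential he guesses at, whose exact form is not given in the conversation) could be plugged in.

```python
from typing import Callable, List, Optional


def dependency_lengths(heads: List[Optional[int]]) -> List[int]:
    """Distance in words between each word and its head (the root has no head)."""
    return [abs(i - h) for i, h in enumerate(heads) if h is not None]


def total_cost(heads: List[Optional[int]],
               cost: Callable[[int], float] = float) -> float:
    """Sum a per-dependency cost; the default simply counts intervening word positions."""
    return sum(cost(d) for d in dependency_lengths(heads))


# Hand-annotated heads (an assumption made for illustration only).
# "The boy who the girl likes cried"  (relative clause sits between subject and verb)
nested_heads = [1, 6, 5, 4, 5, 1, None]
# "The girl likes the boy who cried"  (same words, no clause between subject and verb)
flat_heads = [1, 2, None, 4, 2, 6, 4]

print(total_cost(nested_heads))  # 15.0
print(total_cost(flat_heads))    # 8.0
```

Under these assumed annotations, the nested version roughly doubles the summed dependency length, which is the direction of the effect described in the conversation.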
1:59:00 Have you gone through the process? Is there like a, if you take a piece of text and then simplify
1:59:05 sort of like there’s an average length of dependency and then you like,
1:59:10 you know, reduce it and see comprehension on the entire, not just a single sentence, but
1:59:16 like, you know, you go from James Joyce to Hemingway or something.
1:59:19 No, no, simple answer is no. There’s probably things you can do in that kind of direction.
1:59:26 That’s fun. We might, you know, we’re going to talk about legalese at some point.
1:59:30 And so maybe we’ll talk about that kind of thinking applied to legalese.
1:59:35 Well, let’s talk about legalese because you mentioned that as an exception,
1:59:37 which is taking a tangent upon tangent. That’s an interesting one. You give it as an exception.
1:59:42 It’s an exception.
1:59:43 That you say that most natural languages, as we’ve been talking about,
1:59:48 have local dependencies with one exception, legalese.
1:59:52 That’s right.
1:59:53 So what is legalese, first of all?
1:59:55 Oh, well, legalese is what you think it is. It’s just any legal language.
1:59:59 I mean, like I actually know very little about the kind of language that lawyers use.
2:00:04 So I’m just talking about language in laws and language in contracts.
2:00:08 So the stuff that you have to run into, we have to run into every other day or every day
2:00:13 and you skip over because it reads poorly and or, you know, partly it’s just long, right?
2:00:20 There’s a lot of text there that we don’t really want to know about.
2:00:23 And so, but the thing I’m interested in, so I’ve been working with this guy called
2:00:29 Eric Martinez, who is a, he was a lawyer who was taking my class.
2:00:33 I was teaching a psycholinguistics lab class and I have been teaching it for a long time
2:00:37 at MIT and he’s a, he was a law student at Harvard and he took the class because he had
2:00:41 done some linguistics as an undergrad and he was interested in the problem of why legalese
2:00:46 sounds hard to understand, you know, why and so why is it hard to understand
2:00:51 and why do they write that way if it is hard to understand.
2:00:55 It seems apparent that it’s hard to understand.
2:00:57 The question is why is it?
2:00:58 And so we didn’t know and we did an evaluation of a bunch of contracts.
2:01:04 Actually, we just took a bunch of sort of random contracts because I don’t know,
2:01:08 you know, there’s contracts and laws might not be exactly the same, but
2:01:11 contracts are kind of the things that most people have to deal with most of the time.
2:01:15 And so that’s kind of the most common thing that humans have,
2:01:18 like humans, that adults in our industrialized society have to deal with a lot.
2:01:24 And so that’s what we pulled and we didn’t know what was hard about them,
2:01:28 but it turns out that the way they’re written is very center embedded,
2:01:33 has nested structures in them.
2:01:34 So it has low frequency words as well.
2:01:36 That’s not surprising.
2:01:37 Lots of texts have low, it does have, surprisingly,
2:01:40 slightly lower frequency words than other kinds of control texts,
2:01:44 even sort of academic texts.
2:01:46 Legalese is even worse.
2:01:47 It is the worst that we were able to find.
2:01:49 You just reveal the game that lawyers are playing.
2:01:52 They’re optimizing it different.
2:01:54 Well, you know, it’s interesting.
2:01:55 That’s like, now you’re getting at why.
2:01:57 And so, and I don’t think, it sounds like you think they’re doing it intentionally.
2:02:00 I don’t think they’re doing it intentionally.
2:02:01 But let’s, let’s, let’s get to it.
2:02:03 It’s an emergent phenomenon.
2:02:04 Yeah, yeah, yeah.
2:02:05 We’ll get to that.
2:02:06 We’ll get to that.
2:02:06 And so, but we wanted to see what first, as opposed to why.
2:02:10 So like, it turns out that we’re not the first to observe that legalese is weird.
2:02:14 Like, going back, Nixon had a plain language act in 1970, and Obama had one.
2:02:21 And boy, a lot of these, you know, a lot of presidents have said,
2:02:25 oh, we’ve got to simplify legal language, must simplify.
2:02:28 But if you don’t know how it’s complicated, it’s not easy to simplify it.
2:02:32 You need to know what it is you’re supposed to do before you can fix it.
2:02:35 Right.
2:02:35 And so you need to like, you need a psycholinguist to analyze the text and see what’s
2:02:39 wrong with it before you can like fix it.
2:02:42 You don’t know how to fix it.
2:02:42 How am I supposed to fix something?
2:02:43 I don’t know what’s wrong with it.
2:02:45 And so what we did was just, that’s what we did.
2:02:47 We figured out, well, that’s okay.
2:02:48 We just had a bunch of contracts, and we had people encode them for a bunch of features.
2:02:54 And so one of the features was the center embedding.
2:02:57 And so that is like basically how often a, a clause would, would, would intervene between
2:03:05 a subject and a verb, for example, that’s one kind of a center embedding of a clause.
2:03:08 Okay.
2:03:09 And turns out they’re massively center embedded.
2:03:12 Like, so I think in random contracts and in random laws, I think you get about 70% or
2:03:18 80, something like 70% of sentences have a center embedded clause, which is insanely high.
2:03:23 If you go to any other text, it’s down to 20% or something.
2:03:26 It’s, it’s, it’s so much higher than any control you can think of, including, you think, oh,
2:03:31 people think, oh, technical, um, academic texts.
2:03:34 No, people don’t write center embedded sentences in, in technical academic texts.
2:03:38 I mean, they do a little bit, but much, it’s, it’s in the 20%, 30% realm,
2:03:41 as opposed to 70.
2:03:42 And so, and so there’s that, and, and there’s low frequency words.
2:03:45 And then people, oh, maybe it’s passive.
2:03:47 People don’t like the passive, passive, for some reason, the passive voice in English
2:03:51 has a bad rap.
2:03:52 And I’m not really sure where that comes from.
2:03:54 And, and there is a lot of passive in the, there’s much more passive voice in the, in the,
2:04:01 in legalese than there is in other texts.
2:04:03 And the passive voice accounts for some of the low frequency words.
2:04:05 No, no, no, no, those are separate.
2:04:07 Those are separate.
2:04:08 Oh, so passive voice sucks.
2:04:09 That’s really easy.
2:04:09 Low frequency word sucks.
2:04:10 Well, sucks are different.
2:04:11 So these are different.
2:04:12 That’s a judgment on passive.
2:04:13 Yeah, yeah, yeah, pass the, drop the judgment.
2:04:15 It’s just like, these are frequent.
2:04:16 These are things which happen in legalese texts.
2:04:18 Then we can ask the dependent measure is like,
2:04:21 how well you understand those things with those features.
2:04:24 Okay.
2:04:24 And so then, and it turns out the passive makes no difference.
2:04:27 So it has a zero effect on your comprehension ability, on your recall ability.
2:04:31 No, nothing at all.
2:04:32 That means no effect.
2:04:33 Your, the words matter a little bit.
2:04:35 They do low frequency words are going to hurt you in recall and understanding.
2:04:39 But what really, what really hurts is the center embedding.
2:04:42 That kills you.
2:04:43 That is like, that slows people down.
2:04:45 That makes them, that makes them very, very poor at understanding.
2:04:48 That makes them, they, they, they can’t recall what was said as well, nearly as well.
2:04:52 And we, we did this not only on lay people.
2:04:54 We did it on a lot of lay people.
2:04:56 We ran it on a hundred lawyers.
2:04:57 We recruited lawyers from a, from a wide range of, of sort of different levels
2:05:04 of law firms and stuff.
2:05:05 And they have the same pattern.
2:05:07 So they also, like, when, when, when we did this, I did not know what would happen.
2:05:12 I thought maybe they could process, they’re used to legalese,
2:05:14 so they can process it just as well as if it was normal.
2:05:17 No, no, they, they, they’re much better than lay people.
2:05:21 So they’re much, like, they can much better recall, much better understanding,
2:05:24 but they have the same main effects as, as, as lay people, as lay people, exactly the same.
2:05:28 So they also much prefer the non-center-embedded versions.
2:05:31 So we, we, we constructed non-center-embedded versions of each of these.
2:05:34 We constructed versions which have higher frequency words in those places.
2:05:39 And we, we did, we un-un-un-passivized, we turned them into active versions.
2:05:43 The passive active made no difference.
2:05:46 The words made a little difference.
2:05:48 And the un-center-embedding makes, makes big differences in all the populations.
2:05:52 Un-center-embedding.
2:05:53 How hard is that process, by the way?
2:05:54 Not very hard.
2:05:55 Side question, but how hard is it to detect center embedding?
2:05:58 Oh, easy, easy to detect.
2:06:00 You’re just looking at long dependencies, or is there a real?
2:06:02 You can just, you can, so there’s automatic parsers for English, which are pretty good.
2:06:06 And they can detect center embedding.
2:06:07 Oh yeah.
2:06:08 Very.
2:06:08 Or, I guess, nested.
2:06:09 Perfectly.
2:06:10 Yeah, you, you’ve learned, yeah, pretty much.
2:06:12 So you, you’re not just looking for long dependencies.
2:06:14 You’re just literally looking for center embedding.
2:06:15 Yeah, yeah, we are in this case, in these cases, but long dependencies are,
2:06:18 they’re highly correlated.
2:06:19 So like a center embedding is a, is a big bomb you throw inside, inside of a sentence
2:06:24 that just blows up the, that, that makes.
2:06:26 Yeah, yeah.
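The kind of check being described, whether a clause intervenes between a subject and its verb, can be sketched over a dependency parse. This is a minimal sketch under stated assumptions: the parses below are hand-annotated rather than produced by a real parser, the labels follow Universal Dependencies naming, and the set `CLAUSE_LABELS` is an illustrative choice, not the coding scheme used in the study.

```python
from typing import List, Optional, Tuple

# (word, index of head word or None for the root, dependency label)
Token = Tuple[str, Optional[int], str]

# Labels treated as "heads a clause"; an assumed, deliberately small set.
CLAUSE_LABELS = {"acl", "acl:relcl", "advcl", "ccomp"}


def has_center_embedding(tokens: List[Token]) -> bool:
    """True if some clause head sits strictly between a subject and its verb."""
    for i, (_, head, label) in enumerate(tokens):
        if label != "nsubj" or head is None:
            continue
        lo, hi = sorted((i, head))
        if any(tokens[j][2] in CLAUSE_LABELS for j in range(lo + 1, hi)):
            return True
    return False


# Hand-annotated parses (assumptions made for illustration only).
nested = [("The", 1, "det"), ("boy", 6, "nsubj"), ("who", 5, "obj"),
          ("the", 4, "det"), ("girl", 5, "nsubj"), ("likes", 1, "acl:relcl"),
          ("cried", None, "root")]
flat = [("The", 1, "det"), ("girl", 2, "nsubj"), ("likes", None, "root"),
        ("the", 4, "det"), ("boy", 2, "obj"), ("who", 6, "nsubj"),
        ("cried", 4, "acl:relcl")]

print(has_center_embedding(nested))  # True
print(has_center_embedding(flat))    # False
```

A parenthetical definition dropped between subject and verb, as in contracts, may be labeled differently by a given parser, so the label set would need adjusting for real legal text.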
2:06:27 Can I read a sentence for you from these things?
2:06:29 Sure.
2:06:30 I see, I can find, I mean, this is just like one of the things that,
2:06:32 this is just.
2:06:32 My eyes might glaze over mid-sentence.
2:06:35 No, I understand that.
2:06:37 I mean, legalese is hard.
2:06:40 Here it goes: “In the event that any payment or benefit by the company,
2:06:43 all such payments and benefits, including the payments and benefits under Section 3(a)
2:06:47 hereof, being hereinafter referred to as the total payments,
2:06:50 would be subject to the excise tax, then the cash severance payments shall be reduced.”
2:06:55 So that’s something we pulled from a regular text, from a, from a contract.
2:06:58 Wow.
2:06:59 And, and, and the center embedded bit there is just, for some reason, there’s a definition.
2:07:03 They throw the definition of what payments and benefits are in between the subject and the verb.
2:07:09 Let’s, how about don’t do that?
2:07:11 Yeah.
2:07:11 How about put the definition somewhere else, as opposed to in the middle of the sentence.
2:07:15 And so that’s, that’s very, very common, by the way.
2:07:18 That’s, that’s what happens.
2:07:19 So you just throw your definitions, you use a word, a couple words, and then you define it,
2:07:24 and then you continue the sentence.
2:07:25 Like just don’t write like that.
2:07:27 And, and you ask, so when we asked lawyers, we thought, oh, maybe lawyers like this.
2:07:31 Lawyers don’t like this.
2:07:31 They don’t like this.
2:07:33 They don’t want to, they don’t want to write like this.
2:07:35 They, they, we asked them to rate materials with the same meaning,
2:07:39 un-center-embedded and center-embedded, and they much preferred the un-center-embedded versions.
2:07:45 On the comprehension, on the reading side.
2:07:46 Yeah.
2:07:47 Well, and we asked them, we asked them, would you hire someone who writes like this or this?
2:07:50 We asked them all kinds of questions.
2:07:52 And they always preferred the less complicated version, all of them.
2:07:56 So I don’t even think they want it this way.
2:07:58 Yeah, but how did it happen?
2:07:59 How did it happen?
2:07:59 That’s a very good question.
2:08:01 And, and the answer is, I still don’t know.
2:08:03 But I have some theories.
2:08:06 Well, our, our best theory at the moment is that there’s, there’s actually some kind of a
2:08:11 performative meaning in the center embedding, in the style which tells you it’s legalese.
2:08:16 We think that that’s the kind of a style which tells you it’s legalese.
2:08:20 Like that’s a, it’s a reasonable guess.
2:08:22 And maybe it’s just, so for instance, if you’re like, it’s like,
2:08:26 a magic spell.
2:08:27 So we kind of call this the magic spell hypothesis.
2:08:29 So when you give them, when you tell someone to put a magic spell on someone, what do you do?
2:08:33 They, you know, people know what a magic spell is and they, they do a lot of rhyming.
2:08:38 You know, that’s, that’s kind of what people will tend to do.
2:08:40 They’ll do rhyming and they’ll do sort of like some kind of poetry kind of thing.
2:08:43 Abracadabra type of thing.
2:08:44 Yeah.
2:08:45 And maybe that’s, there’s a syntactic sort of a reflex here of a, of a magic spell,
2:08:51 which is center embedding.
2:08:52 And so that’s like, oh, it’s trying to like tell you this is like, this is something which is true,
2:08:57 which is what the goal of law, law is, right?
2:08:59 Is telling you something that we want you to believe as certainly true, right?
2:09:04 That’s, that’s what legal contracts are trying to enforce on you, right?
2:09:07 And so maybe that’s like a form which has, this is like an abstract, very abstract form,
2:09:13 center embedding, which has a, has a, has a meaning associated with it.
2:09:16 Well, don’t you think there’s an incentive
2:09:20 for lawyers to generate things that are hard to understand?
2:09:24 That was our, one of our working hypotheses.
2:09:26 We just couldn’t find any evidence of that.
2:09:28 No, lawyers also don’t understand it.
2:09:30 But you’re creating space.
2:09:32 Why you yourself, but I mean, you ask in a communist Soviet union, the individual members,
2:09:39 their self-report is not going to correctly reflect what is broken about the gigantic bureaucracy
2:09:47 that leads to Chernobyl or something like this.
2:09:49 I think the incentives under which you operate are not always transparent
2:09:55 to the members within that system.
2:09:59 So like, it just feels like a strange coincidence that like, there is benefit
2:10:05 if you just zoom out, look at the system, as opposed to asking individual lawyers
2:10:09 that making something hard to understand is going to make a lot of people money.
2:10:14 Yeah.
2:10:15 Like there’s going to, you’re going to need a lawyer
2:10:17 to figure that out, I guess, from the perspective of the individual.
2:10:21 But then that could be the performative aspect.
2:10:23 It could be as opposed to the incentive driven to be complicated.
2:10:26 It could be performative to where we lawyers speak in this sophisticated way
2:10:31 and you regular humans don’t understand it, so you need to hire a lawyer.
2:10:35 Yeah, I don’t know which one it is, but it’s suspicious.
2:10:37 Suspicious that it’s hard to understand and everybody’s eyes glaze over and they don’t read.
2:10:43 I’m suspicious as well.
2:10:45 I’m still suspicious and I hear what you’re saying.
2:10:47 It could be kind of a no individual and even average of individuals.
2:10:50 It could just be a few bad apples in a way which are driving the effect in some way.
2:10:55 Influential bad apples at the sort of, that everybody looks up to,
2:11:00 whatever, they’re like central figures and how, you know.
2:11:04 But it turns out, but it is kind of interesting that among our hundred lawyers,
2:11:08 they did not share that.
2:11:09 They didn’t want this, that’s fascinating.
2:11:11 They really didn’t like it.
2:11:12 And they weren’t better than regular people at comprehending it.
2:11:16 Or they were on average better, but they had the same difference.
2:11:20 The exact same difference.
2:11:21 But they wanted it fixed.
2:11:23 And so that gave us hope that because it actually isn’t very hard to construct a material,
2:11:32 which is un-center-embedded and has the same meaning, it’s not very hard to do.
2:11:36 Just basically in that situation, just putting definitions outside of the subject
2:11:39 verb relation in that particular example, and that’s kind of, that’s pretty general.
2:11:43 What they’re doing is just throwing stuff in there, which you didn’t have to put in there.
2:11:46 There’s extra words involved.
2:11:48 Typically, you may need a few extra words sort of to refer to the things that you’re
2:11:53 defining outside in some way, because if you only use it in that one sentence,
2:11:57 then there’s no reason to introduce extra terms.
2:12:01 So we might have a few more words, but it’ll be easier to understand.
2:12:05 So, I mean, I have hope that now that maybe we can make legalese less convoluted in this way.
2:12:13 So maybe the next president of the United States can, instead of saying generic things,
2:12:18 say, “I ban center embeddings and make Ted the language czar of the U.S.”
2:12:26 Like Eric Martinez is the guy you should really put in there.
2:12:30 Eric Martinez, yeah, yeah, yeah.
2:12:32 But center embeddings are the bad thing to have.
2:12:36 That’s right.
2:12:36 So you can get rid of that.
2:12:38 That’ll do a lot of it.
2:12:39 That’ll fix a lot.
2:12:40 That’s fascinating.
2:12:41 That is so fascinating.
2:12:42 And it’s just really fascinating on many fronts that humans are just not able to
2:12:47 deal with this kind of thing.
2:12:48 And that language, because of that, evolved in the way it did, it’s fascinating.
2:12:51 So one of the mathematical formulations you have when talking about language as
2:12:57 communication is this idea of noisy channels.
2:13:00 What’s a noisy channel?
2:13:03 So that’s about communication.
2:13:06 And so this is going back to Shannon.
2:13:08 So Shannon, Claude Shannon was a student at MIT in the ’40s.
2:13:13 And so he wrote this very influential piece of work about communication theory or information
2:13:19 theory.
2:13:20 And he was interested in human language, actually.
2:13:23 He was interested in this problem of communication, of getting a message from
2:13:29 my head to your head.
2:13:31 And so he was concerned or interested in what was a robust way to do that.
2:13:38 And so assuming we both speak the same language, we both already speak English,
2:13:43 whatever the language is, we speak that.
2:13:45 What is a way that I can say the language so that it’s most likely to get the signal
2:13:52 that I want to you.
2:13:54 And so and then the problem there in the communication is the noisy channel.
2:13:58 Is that there’s a lot of noise in the system.
2:14:02 I don’t speak perfectly.
2:14:04 I make errors.
2:14:05 That’s noise.
2:14:06 There’s background noise.
2:14:08 You know that.
2:14:09 Like a literal background noise.
2:14:11 There is like white noise in the background or some other kind of noise.
2:14:14 There’s some speaking going on that you’re at a party.
2:14:18 That’s background noise.
2:14:19 You’re trying to hear someone.
2:14:20 It’s hard to understand them because there’s all this other stuff going on in the background.
2:14:23 And then there’s noise on the receiver side so that you have some problem maybe understanding
2:14:31 me for stuff that’s just internal to you in some way.
2:14:34 So you’ve got some other problems, whatever, with understanding for whatever reasons.
2:14:38 Maybe you’ve had too much to drink.
2:14:41 You know, who knows why you’re not able to pay attention to the signal.
2:14:44 So that’s the noisy channel.
2:14:45 And so that language, if it’s a communication system, we are trying to optimize in some sense
2:14:52 the passing of the message from one side to the other.
2:14:55 And so I mean, one idea is that maybe, you know, aspects of like word order,
2:15:02 for example, might have optimized in some way to make language a little more easy
2:15:07 to be passed from speaker to listener.
2:15:09 And so Shannon’s the guy that did the stuff way back in the forties.
2:15:12 You know, it’s very interesting, you know, historically, he was interested in working
2:15:15 in linguistics.
2:15:17 He was at MIT and he did, this was his master’s thesis of all things.
2:15:20 You know, it’s crazy how much he did for his master’s thesis in 1948, I think,
2:15:25 or ’49 or something.
2:15:26 And he wanted to keep working in language and it just wasn’t a popular communication
2:15:32 as a reason, a source for what language was, wasn’t popular at the time.
2:15:36 So Chomsky was becoming, it was moving in there.
2:15:39 He was, and he just wasn’t able to get a handle there, I think.
2:15:41 And so he moved to Bell Labs and worked on communication from a mathematical point of
2:15:48 view and was, you know, did all kinds of amazing work.
2:15:51 And so he’s just more on the signal side versus like the language side.
2:15:54 Yeah, it would have been interesting to see if he had pursued the language side.
2:15:58 That’s really interesting.
2:16:00 He was interested in that.
2:16:01 His examples in the forties are kind of like, they’re very language-like things.
2:16:08 We can kind of show that there’s a noisy channel process going on in when you’re
2:16:12 listening to me, you know, you can often sort of guess what I meant by what I, you know,
2:16:17 what you think I meant given what I said.
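The idea of guessing what was meant from what was said can be sketched as Bayesian inference over candidate sentences. This is a toy sketch only: the candidate sentences, the prior probabilities, the `noise_rate` value, and the choice of word-level edit distance as the noise model are all assumptions made for illustration, not details from the conversation.

```python
from typing import Dict, List


def edit_distance(a: List[str], b: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def posterior(perceived: str, prior: Dict[str, float],
              noise_rate: float = 0.1) -> Dict[str, float]:
    """P(intended | perceived), with likelihood shrinking by noise_rate per edit."""
    heard = perceived.split()
    scores = {s: p * noise_rate ** edit_distance(s.split(), heard)
              for s, p in prior.items()}
    total = sum(scores.values())
    return {s: v / total for s, v in scores.items()}


# Hypothetical priors: the plausible sentence is far more likely to be intended.
prior = {
    "the mother gave the candle to the daughter": 0.99,
    "the mother gave the candle the daughter": 0.01,
}
post = posterior("the mother gave the candle the daughter", prior)
best = max(post, key=post.get)
print(best)  # the mother gave the candle to the daughter
```

With these assumed numbers, one dropped word is a cheaper explanation than an implausible message, so the listener’s best guess differs from the literal string that was heard.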
2:16:19 And I mean, with respect to sort of why language looks the way it does, we might,
2:16:24 there might be sort of, as I alluded to, there might be ways in which word order
2:16:29 is somewhat optimized for, because of the noisy channel in some way.
2:16:33 I mean, that’s really cool to sort of model if you don’t hear certain parts of a sentence
2:16:38 or have some probability of missing that part.
2:16:40 Like how do you construct a language that’s resilient to that?
2:16:43 That’s somewhat robust to that.
2:16:44 Yeah, that’s the idea.
2:16:45 And then you’re kind of saying like the word order and the syntax of the language,
2:16:49 the dependency length are all helpful.
2:16:53 Yeah.
2:16:54 Well, dependency length is really about memory.
2:16:57 I think that’s like about sort of what’s easier or harder to produce in some way.
2:17:00 And these other ideas are about sort of robustness to communication.
2:17:04 So the problem of potential loss of signal due to noise.
2:17:08 And so that there might be aspects of word order, which is somewhat optimized for that.
2:17:13 And, you know, we have this one guess in that direction.
2:17:16 These are kind of just-so stories.
2:17:18 I have to be, you know, pretty frank, they’re not like, I can’t show this is true.
2:17:21 All we can do is like, look at the current languages of the world.
2:17:24 This is like, we can’t sort of see how languages change or anything
2:17:26 because we’ve got these snapshots of a few, you know, 100 or a few thousand languages.
2:17:31 We don’t really, we can’t do the right kinds of modifications to test these things experimentally.
2:17:37 And so, you know, so just take this with a grain of salt, okay, from here, this stuff.
2:17:41 The dependency stuff, I can, I’m much more solid on.
2:17:44 I’m like, here’s what the lengths are, and here’s what’s hard, here’s what’s easy.
2:17:47 And this is a reasonable structure.
2:17:49 I think I’m pretty reasonable.
2:17:50 Here’s like, why, you know, why does a word order look the way it does?
2:17:54 Is we’re now into shaky territory, but it’s kind of cool.
2:17:57 But we’re talking about, just to be clear, we’re talking about maybe just actually the sounds of
2:18:01 communication, like you and I are sitting in the bar, it’s very loud.
2:18:05 And you model with a noisy channel, the loudness, the noise.
2:18:11 And we have the signal that’s coming across the, and you’re saying word order might have
2:18:15 something to do with optimizing that presence of noise.
2:18:19 It’s really interesting.
2:18:21 I mean, to me, it’s interesting how much you can load into the noisy channel,
2:18:24 like how much can you bake in?
2:18:26 You said like, you know, cognitive load on the receiver end.
2:18:29 We think that those are, there’s three, at least three different kinds of things going on there.
2:18:33 And we probably don’t want to treat them all as the same.
2:18:36 And so I think that you, you know, the right model, a better model of a noisy channel would
2:18:40 treat, would have three different sources of noise, which, because, which are background
2:18:44 noise, you know, speaker, speaker, um, inherent noise and listener inherent noise.
2:18:49 And those are not this, those are all different things.
2:18:51 Sure. But then underneath it, there’s a million other subsets.
2:18:54 Oh yeah. That’s true.
2:18:56 On the receiver, I mean, I just mentioned cognitive load on both sides.
2:19:00 Then there’s like, uh, speaking, uh, speech impediments or just everything.
2:19:05 World view, I mean, on the meaning, we start to creep into the meaning realm of like,
2:19:10 we have different world views.
2:19:11 Well, how about just form still though?
2:19:12 Like just, just what language do you know?
2:19:14 Like, so how well you know the language.
2:19:16 And so if it’s second language for you versus first language,
2:19:20 and in how, maybe what other languages you know, these are still just form stuff.
2:19:24 And that’s like potentially very informative.
2:19:26 And, and you know, how old you are, these things probably matter, right?
2:19:29 So like a child learning a language is, is a, you know, as a noisy representation of
2:19:35 English grammar, uh, you know, depending on how old they are.
2:19:38 So maybe when they’re six, they’re perfectly formed, but.
2:19:42 You mentioned one of the things is like a way to measure the, the, a language is learning problems.
2:19:48 So like, what’s the correlation between everything we’ve been talking about and
2:19:52 how easy it is to learn a language?
2:19:54 So is, is, uh, like, uh, short dependencies correlated to ability to learn a language?
2:20:02 Is there some kind of, or like the dependency grammars, there’s some kind of connection there?
2:20:08 How easy it is to learn?
2:20:10 Yeah. Well, of all the world’s languages, none, right now,
2:20:14 that we know of is any better than any other with respect to sort of optimizing dependency lengths,
2:20:18 for example. They all kind of do it, do it well.
2:20:21 They all keep it low.
2:20:22 It’s, so the, I think of every human language is some kind of an opposite,
2:20:26 sort of an optimization problem, a complex optimization problem to this communication
2:20:31 problem. And so they’ve like, they’ve solved it, you know, they’re just sort of noisy solutions
2:20:36 to this problem of communication.
2:20:37 And there’s just so many ways you can do this.
2:20:40 So they’re not optimized for learning.
2:20:41 They’re probably optimized for communication.
2:20:43 And, and learning.
2:20:44 So yes, one of the factors, which is, yeah, so learning is messing this up a bit.
2:20:49 And so, so for example, if it were just about minimizing dependency lengths,
2:20:54 and that was all that matters, you know, then we, you know, so then,
2:20:57 then we might find grammars, which didn’t have regularity in their rules, like,
2:21:02 but languages always have regularity in their rules.
2:21:05 So what I mean by that is that if I wanted to say something to you in the
2:21:09 optimal way to say it, and what really mattered to me, all that mattered, was keeping
2:21:13 the dependencies as close together as possible, then I would have a very lax set of
2:21:18 phrase structure or dependency rules. I wouldn’t have very many of those.
2:21:21 I would have very little of that.
2:21:23 And I would just put the words that refer to the things that
2:21:27 are connected right beside each other.
2:21:28 But we don’t do that.
2:21:29 Like there are, like there are word order rules, right?
2:21:32 So they’re very, and depending on the language, they’re more or less strict, right?
2:21:35 So you speak Russian, they’re less strict than English.
2:21:38 English has very rigid word order rules.
2:21:40 We order things in a very particular way.
2:21:43 And so why do we do that?
2:21:45 Like that’s probably not about communication.
2:21:48 That’s probably about learning.
2:21:49 I mean, then we’re talking about learning.
2:21:50 It’s probably easier to learn regular, regular things, things which are very predictable and
2:21:55 easy to, so that’s, that’s probably about learning is my, is our guess.
2:21:59 Cause that can’t be about communication.
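As an aside on the dependency-length idea discussed above, here is a minimal sketch of how total dependency length could be computed for two word orders. It assumes a sentence is described by hypothetical (head, dependent) word-position pairs; the example sentence, its arcs, and the function name are illustrative assumptions, not data or code from the conversation.

```python
def total_dependency_length(arcs):
    """Sum of linear distances between each head and its dependent.

    arcs: iterable of (head_index, dependent_index) word positions.
    """
    return sum(abs(head - dep) for head, dep in arcs)


# Hypothetical arcs, chosen only for illustration.
# Order A: John(0) threw(1) out(2) the(3) trash(4)
order_a = [(1, 0), (1, 2), (1, 4), (4, 3)]
# Order B: John(0) threw(1) the(2) trash(3) out(4)
order_b = [(1, 0), (1, 4), (1, 3), (3, 2)]

print(total_dependency_length(order_a))  # 6
print(total_dependency_length(order_b))  # 7
```

Under this toy measure, the order that keeps connected words adjacent scores lower. A grammar that cared only about this number could pick whichever order scored lowest for each sentence, with no fixed word order rules at all, which is the scenario described above.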
2:22:00 Can it be just noise?
2:22:01 Can it be just the messiness of the development of a language?
2:22:06 Well, if it were just about communication, then we should have languages which have very,
2:22:09 very free word order.
2:22:10 And we don’t have that.
2:22:11 We have freer, but not free.
2:22:14 Like there’s always.
2:22:14 Well, no, but what I mean by noise is like cultural, like sticky cultural things,
2:22:20 like the way, the way you communicate, just there, there’s a stickiness to it.
2:22:24 That it’s, it’s an imperfect, it’s a noisy, it’s stochastic.
2:22:29 Yeah.
2:22:30 The, the, the function over which you’re optimizing is very noisy.
2:22:33 Yeah.
2:22:33 So, uh, because I don’t, it feels weird to say that learning is part of the objective
2:22:39 function because some languages are way harder to learn than others, right?
2:22:43 Or is that, that’s not true.
2:22:45 That’s interesting.
2:22:46 I mean, that’s the public perception, right?
2:22:48 Yes.
2:22:48 That’s true for a second language.
2:22:51 For a second language.
2:22:52 But that depends on what you started with, right?
2:22:54 So, so it’s, it really depends on how close that second language is to the first language
2:22:58 you’ve got.
2:22:59 And so yes, it’s very, very hard to learn Arabic if you’ve started with English, or it’s
2:23:04 hard to, you know, hard to learn Japanese. Chinese, I think,
2:23:08 is the worst. There’s like, the Defense Language Institute in the United States has
2:23:12 like a list of how hard it is to learn what language from English.
2:23:17 I think Chinese is the worst.
2:23:18 But that’s just as a second language, I see.
2:23:20 You’re saying babies don’t care.
2:23:21 No, no, there’s no evidence that there’s anything harder or easier for any baby,
2:23:25 for any language learned. Like, by three or four, they speak that language.
2:23:29 And so there’s no evidence of anything harder or easier about any human language.
2:23:33 They’re all kind of equal.
2:23:34 To what degree is language, this is returning to Chomsky a little bit, innate?
2:23:40 You said that for Chomsky, he used the idea that some aspects of language
2:23:46 are innate to explain away certain things that are observed.
2:23:49 But how much are we born with language at the core of our mind, brain?
2:23:55 I mean, I, you know, the answer is I don’t know, of course, but the, I mean, I, I like to,
2:24:02 I’m an engineer at heart, I guess.
2:24:04 And I sort of think it’s fine to postulate that a lot of it’s learned.
2:24:08 And so I, I’m guessing that a lot of it’s learned.
2:24:11 So I think the reason Chomsky went with the innateness
2:24:13 is because he, he hypothesized movement in his grammar.
2:24:20 He was interested in grammar and movement’s hard to learn.
2:24:22 I think he’s right.
2:24:23 Movement is a hard, it’s a hard thing to learn to learn these two things together
2:24:26 and how they interact.
2:24:27 And there’s like a lot of ways in which you might generate exactly the same sentences.
2:24:31 And it’s like really hard.
2:24:32 And so he’s like, Oh, I guess it’s learned.
2:24:34 So I guess it’s not learned, it’s innate.
2:24:36 And if you just throw out the movement and just think about that in a different way,
2:24:40 you know, then you get some messiness, but the messiness is human language,
2:24:47 which actually fits better.
2:24:48 That messiness isn’t a problem.
2:24:51 It’s actually a valuable asset of the theory.
2:24:57 And so, so I think I don’t really see a reason to postulate much innate structure.
2:25:03 And that’s kind of, I think, why these large language models are learning so well.
2:25:06 It’s because I think you can learn the form, the forms of human language from the input.
2:25:12 I think that’s like, it’s likely to be true.
2:25:14 So that part of the brain that lights up when you’re doing all the comprehension,
2:25:17 that could be learned.
2:25:17 That could be just, it doesn’t need to be innate.
2:25:21 So like lots of stuff is modular in the brain that’s learned.
2:25:26 It doesn’t have to, you know, so there’s something called the visual word form area
2:25:30 in the back.
2:25:31 And so it’s in the back of your head near the, you know, the visual cortex.
2:25:35 Okay.
2:25:36 And that is a very specialized language, sorry, very specialized brain area,
2:25:41 which does visual word processing if you read, if you’re a reader.
2:25:46 Okay.
2:25:46 If you don’t read, you don’t have it.
2:25:47 Okay.
2:25:48 Guess what?
2:25:48 You spend some time learning to read and you develop that, that brain area,
2:25:52 which does exactly that.
2:25:53 And so these, the modularization is not evidence for innateness.
2:25:57 So the modularization of a language area doesn’t mean we’re born with it.
2:26:01 We could have easily learned that.
2:26:02 I, I, we might have been born with it.
2:26:04 I, I, we just, we just don’t know at this point.
2:26:06 We might very well have been born with this left lateralized area.
2:26:10 I mean that there’s like a lot of other interesting components here,
2:26:13 features of this kind of argument.
2:26:16 So some people get a stroke or something goes really wrong on the left side,
2:26:21 where the language area would be, and that isn’t there.
2:26:25 It’s not available.
2:26:26 And it develops just fine on the right.
2:26:27 And so it’s not like,
2:26:28 So it’s not about the left.
2:26:29 It goes to the left.
2:26:32 Like this is a very interesting question.
2:26:33 It’s like, why is the, why are any of the brain areas the way that they are?
2:26:38 And how, how, how did they come to be that way?
2:26:40 And, you know, there’s these natural experiments, which happen where people
2:26:44 get these, you know, strange events in their brains at very young ages,
2:26:48 which wipe out sections of their brain and, and they behave totally normally.
2:26:53 And no one knows anything was wrong.
2:26:54 And we find out later, because they happened to be accidentally scanned for some reason.
2:26:58 And it’s like, what, what happened to your left hemisphere?
2:27:00 It’s missing.
2:27:01 There’s not many people who’ve missed their whole left hemisphere,
2:27:03 but they’ll be missing some other section of their left or their right.
2:27:06 And they behave absolutely normally, we’d never know.
2:27:08 So that’s like a very interesting, you know, current research.
2:27:12 You know, this is another project that this person, Ev Fedorenko, is working on.
2:27:16 She’s got all these people contacting her because she’s scanned some people who have
2:27:21 been missing sections.
2:27:23 One person was missing a section of her brain and was scanned in her lab.
2:27:27 And, and she, and she happened to be a writer for the New York Times.
2:27:30 And there was an article in New York Times about, about the, just about the scanning
2:27:35 procedure and about what might be learned by sort of the general process of MRI
2:27:41 and language and that’s her language.
2:27:44 And, and because she’s writing for the New York Times,
2:27:46 then all these people started writing to her who also have similar,
2:27:50 similar kinds of deficits because they’ve been, you know, accidentally,
2:27:53 you know, scanned for some reason and found out they’re missing some section.
2:27:59 And they, they volunteer to be scanned.
2:28:02 These are natural experiments.
2:28:03 Natural experiments.
2:28:04 They’re kind of messy, but natural experiments, kind of cool.
2:28:06 She calls them interesting brains.
2:28:09 The first few hours, days, months of human life are fascinating.
2:28:13 It’s like, well, inside the womb actually, like that development,
2:28:16 that machinery, whatever that is, seems to create powerful humans that are able to
2:28:24 speak, comprehend, think all that kind of stuff, no matter what happened,
2:28:27 not no matter what, but robust to the different ways that the brain might be damaged and so on.
2:28:35 That’s, that’s really, that’s really interesting.
2:28:38 But what would Chomsky say about the fact, the thing you’re saying now that language
2:28:43 is, is, seems to be happening separate from thought, because as far as I understand,
2:28:49 maybe you can correct me, he thought that language underpins.
2:28:52 Yeah, he thinks so.
2:28:53 I don’t know what he’d say.
2:28:54 He would be surprised because for him, the idea is that language
2:28:58 is the sort of the foundation of thought.
2:29:00 That’s right.
2:29:01 Absolutely.
2:29:02 And it’s pretty mind blowing to think that it could be completely separate from thought.
2:29:08 That’s right.
2:29:08 But so, you know, he’s basically a philosopher, philosopher of language in a way,
2:29:13 thinking about these things.
2:29:14 It’s a fine thought.
2:29:15 You can’t test it with his methods.
2:29:19 You can’t do a thought experiment to figure that out.
2:29:21 You need a scanner.
2:29:23 You need brain-damaged people.
2:29:24 You need something.
2:29:25 You need ways to measure that.
2:29:27 And that’s what, you know, fMRI offers as a, and, and, you know, patients are a little messier.
2:29:33 fMRI is pretty unambiguous, I’d say.
2:29:36 It’s like very unambiguous.
2:29:37 There’s no way to say that the language network is doing any of these tasks.
2:29:43 There’s, like, you should look at those data.
2:29:45 It’s like there’s no chance that you can say that those networks are overlapping.
2:29:49 They’re not overlapping.
2:29:50 They’re just like completely different.
2:29:51 And so, you know, so, you know, you can always make, you know, it’s only two people.
2:29:56 It’s four people or something for the patients.
2:29:58 And there’s something special about them we don’t know.
2:30:00 But these are just random people and with lots of them, and you find always the same effects.
2:30:07 And it’s very robust, I’d say.
2:30:08 That’s a fascinating effect.
2:30:10 What’s the, you mentioned Bolivia.
2:30:12 What’s the connection between culture and language?
2:30:16 You’ve, you’ve also mentioned that, you know, much of our study of language comes from
2:30:25 WEIRD, weird people: Western, Educated, Industrialized, Rich, and Democratic.
2:30:33 So when you study, like, remote cultures such as around the Amazon jungle,
2:30:38 what can you learn about language?
2:30:40 So that term WEIRD is from Joe Henrich.
2:30:45 He’s at Harvard.
2:30:46 He’s a Harvard evolutionary biologist.
2:30:49 And so he works on lots of different topics.
2:30:53 And he basically was pushing that observation that we should be careful about the inferences
2:30:59 we want to make when we’re talking in psychology or social, yeah, mostly in psychology, I guess,
2:31:05 about humans if we’re talking about, you know, undergrads at MIT and Harvard.
2:31:11 Those aren’t the same, right?
2:31:13 These aren’t the same things.
2:31:14 And so if you want to make inferences about language, for instance, you,
2:31:17 there’s a lot of very, a lot of other kinds of languages in the world than English and French
2:31:23 and Chinese, you know, and so maybe for language, we care about how culture, because cultures can be
2:31:31 very, I mean, of course, English and Chinese cultures are very different, but, you know,
2:31:35 hunter-gatherers are much more different in some ways.
2:31:39 And so, you know, if culture has an effect on what language is, then we kind of want to look
2:31:45 there as well as looking, it’s not like the industrialized cultures aren’t interesting,
2:31:48 of course they are, but we want to look at non-industrialized cultures as well.
2:31:52 And so I worked with two. I worked with the Tsimane’, which are in Bolivia and in the Amazon,
2:31:59 both in the Amazon, in these cases.
2:32:01 And they are so-called farmer-foragers, which is not hunter-gatherers.
2:32:05 It’s sort of one up from hunter-gatherers in that they do a little bit of farming as well,
2:32:10 a lot of hunting as well, but a little bit of farming.
2:32:13 And the kind of farming they do is the kind of farming that I might do
2:32:16 if I ever were to grow like tomatoes or something in my backyard.
2:32:20 So it’s not like big field farming, it’s just farming for a family,
2:32:24 a few things, you do that.
2:32:25 And so that’s what, that’s the kind of farming they do.
2:32:27 And the other group I’ve worked with are the Pirahã, which are also in the Amazon,
2:32:34 and happen to be in Brazil.
2:32:35 And that’s with a guy called Dan Everett, who is a linguist anthropologist who actually lived
2:32:43 and worked in the, I mean, he was a missionary actually, initially, back in the 70s,
2:32:49 working with, trying to translate languages so they could teach them the Bible,
2:32:53 teach them Christianity.
2:32:54 What can you say about that?
2:32:56 Yeah, so the two groups I’ve worked with, the Tsimane’ and the Pirahã, are both
2:33:00 isolate languages, meaning there’s no known connected languages at all.
2:33:06 They’re just like on their own.
2:33:06 Oh, cool.
2:33:07 Yeah, there’s a lot of those.
2:33:08 And most of the isolates occur in the Amazon or in Papua New Guinea,
2:33:15 in these places where the world has sort of stayed still for long enough.
2:33:21 And they’re, like, so there aren’t earthquakes.
2:33:25 There aren’t, well, certainly no earthquakes in the Amazon jungle.
2:33:30 And the climate isn’t bad, so you don’t have droughts.
2:33:35 And so, you know, in Africa, you’ve got a lot of moving of people because there’s
2:33:39 drought problems.
2:33:40 And so they get a lot of language contact when you have, when people have to,
2:33:43 if you’ve got to move because you’ve got no water, then you’ve got to get going.
2:33:48 And then you run into contact with other tribes, other groups.
2:33:53 In the Amazon, that’s not the case.
2:33:54 And so people can stay there for hundreds and hundreds and probably thousands
2:33:58 of years, I guess.
2:33:58 And so these groups, the Tsimane’ and the Pirahã, are both isolates in that.
2:34:03 And they just, I guess they’ve just lived there for ages and ages with minimal
2:34:07 contact with other outside groups.
2:34:11 And so, I mean, I’m interested in them because they are, I mean, I, you know,
2:34:17 in these cases, I’m interested in their words.
2:34:18 So I would love to study their syntax, their orders of words, but I’m mostly just
2:34:22 interested in how languages, you know, are connected to their cultures in this way.
2:34:29 And so with the Pirahã, the most interesting, I was working on number
2:34:33 there, number information.
2:34:34 And so the basic idea is I think language is invented.
2:34:37 That’s what I get from the words here, is that I think language is invented.
2:34:40 We talked about color earlier.
2:34:41 It’s the same idea.
2:34:42 So that what you need to talk about with someone else is what you’re going to
2:34:47 invent words for.
2:34:48 Okay.
2:34:49 And so we invent labels for colors that I need, not that I can see, but for things
2:34:55 I need to tell you about so that I can get objects from you or get you to give
2:34:59 me the right objects.
2:34:59 And I just don’t need a word for teal or a word for aquamarine in the Amazon jungle,
2:35:06 for the most part, because I don’t have two things which differ on those colors.
2:35:10 I just don’t have that.
2:35:11 And so numbers are really another fascinating source of information here where
2:35:16 you might, you know, naively, I certainly thought that all humans would have words
2:35:23 for exact counting and the Pirahã don’t.
2:35:27 Okay.
2:35:27 So they don’t have any words for even one.
2:35:30 There’s not a word for one in their language.
2:35:33 And so there’s certainly not a word for two, three or four.
2:35:35 So that kind of blows people’s minds off.
2:35:39 Yeah, that’s blowing my mind.
2:35:40 That’s pretty weird.
2:35:41 How are you going to ask, I want two of those?
2:35:43 You just don’t.
2:35:44 And so that’s just not a thing you can possibly ask in Pirahã.
2:35:48 It’s not possible.
2:35:49 That is, there’s no words for that.
2:35:50 So here’s how we found this out.
2:35:52 Okay.
2:35:52 So it was thought to be a one, two, many language.
2:35:56 There are three words for quantifiers for sets.
2:35:59 But people had thought that those meant one, two and many.
2:36:03 But what they really mean is few, some and many.
2:36:06 Many is correct.
2:36:07 It’s few, some and many.
2:36:08 And so the way we figured this out, and this is kind of cool,
2:36:13 is that we gave people, we had a set of objects.
2:36:18 Okay.
2:36:18 And these happened to be spools of thread.
2:36:19 It doesn’t really matter what they are.
2:36:20 Identical objects.
2:36:22 And when I sort of start off here, I just give, you know,
2:36:25 give you one of those and say, what’s that?
2:36:26 Okay.
2:36:26 Say you’re a Pirahã speaker and you tell me what it is.
2:36:29 And then I give you two and say, what’s that?
2:36:31 And nothing’s changing in this set except for the number.
2:36:34 Okay.
2:36:34 And then I just ask you to label these things.
2:36:36 We just do this for a bunch of different people.
2:36:38 And frankly, I did this task.
2:36:40 This is fascinating.
2:36:41 And it’s a little bit weird.
2:36:43 So they say the word that we thought was one, it’s few,
2:36:46 but for the first one.
2:36:47 And then maybe they say few or maybe they say some for the second.
2:36:50 And then for the third or the fourth,
2:36:52 they start using the word many for the set.
2:36:55 And then five, six, seven, eight.
2:36:57 I go all the way to 10.
2:36:58 And it’s always the same word.
2:37:00 And they look at me like I’m stupid because they told me
2:37:03 what the word was for six, seven, eight.
2:37:05 And I’m going to continue asking them at nine and 10.
2:37:08 I’m sorry.
2:37:09 I just, I just, they understand that I want to know their language.
2:37:12 That’s the point of the task is like I’m trying to learn their language.
2:37:14 And so that’s okay.
2:37:15 But it does seem like I’m a little slow because I,
2:37:18 they already told me what the word for many was at five, six, seven.
2:37:22 And I keep asking.
2:37:23 So it’s a little funny to do this task over and over.
2:37:25 We did this with a guy called Dan, who was the translator.
2:37:29 He’s the only one who really speaks Pirahã fluently.
2:37:33 He’s a good bilingual for a bunch of languages, but also English and Pirahã.
2:37:39 And then a guy called Mike Frank was also a student with me down there.
2:37:42 He and I did these things.
2:37:43 And so you do that.
2:37:46 Okay.
2:37:46 And everyone does the same thing.
2:37:48 They all, all, all, you know, we asked like 10 people and they all do
2:37:51 exactly the same labeling for one up.
2:37:53 And then we just do the same thing down, in like random order.
2:37:56 Actually, we do some of them up, some of them down first.
2:37:58 Okay.
2:37:58 And so we do, instead of one to 10, we do 10 down to one.
2:38:02 And so, so I give them 10, nine and eight.
2:38:04 They start saying the word for some.
2:38:06 And then down at, when you get to four, everyone is saying the word for few,
2:38:10 which we thought was one.
2:38:12 So it’s like, it’s the context that determined
2:38:15 what that quantifier they used was.
2:38:17 So it’s not a count word.
2:38:18 They’re not, they’re not count words.
2:38:20 They’re, they’re just approximate words.
2:38:21 And they’re going to be noisy when you interview a bunch of people,
2:38:23 the, what the definition of few, and there’s going to be a threshold in the context.
2:38:27 Yeah.
2:38:27 Yeah.
2:38:27 I don’t know what that means.
2:38:28 That’s going to depend on the context.
2:38:30 I think it’s true in English too, right?
2:38:31 If you ask an English person, what a few is.
2:38:33 I mean, that’s dependent completely on the context.
2:38:36 And it might actually be at first hard to discover.
2:38:38 Yeah.
2:38:39 Because for a lot of people, the jump from one to two will be few.
2:38:42 Right.
2:38:43 So it’s a jump.
2:38:44 Yeah.
2:38:44 It might be, it might still be there.
2:38:46 Yeah.
2:38:46 Right.
2:38:46 It’s, I mean, that’s fascinating.
2:38:48 That’s fascinating that numbers don’t present themselves.
2:38:50 Yeah.
2:38:51 So the words aren’t there.
2:38:52 And then, and so then we do these other things.
2:38:53 Well, if, if they don’t have the words, can they do exact matching kinds of tasks?
2:38:59 Can they even do those tasks?
2:39:01 And, and, and the answer is sort of yes and no.
2:39:04 And so yes, they can do them.
2:39:06 So here’s the tasks that we did.
2:39:07 We put out those spools of thread again.
2:39:10 Okay.
2:39:10 So maybe I put like three out here.
2:39:12 And then we gave them some objects.
2:39:14 And those happen to be uninflated red balloons.
2:39:17 It doesn’t really matter what they are.
2:39:18 It’s just a bunch of exactly the same thing.
2:39:20 And it was easy to put down right next to these spools of thread.
2:39:26 Okay.
2:39:26 And so then I put out three of these.
2:39:28 And your task was to just put one against each of my three things.
2:39:31 And they can do that perfectly.
2:39:33 So I mean, I would actually do that.
2:39:35 It was a very easy task to explain to them because I have,
2:39:37 I did this with this guy, Mike Frank.
2:39:39 And he would be my, I’d be the experimenter telling him to do this
2:39:43 and showing him to do this.
2:39:44 And then we just, like: just do what he did.
2:39:45 You’ll copy him.
2:39:46 I didn’t have to speak Pirahã, except for, you know, “copy him.”
2:39:50 Like, “do what he did” is all we had to be able to say.
2:39:53 And then they would do that just perfectly.
2:39:55 And so we’d move it up.
2:39:56 We’d do some sort of random number of items up to 10.
2:40:00 And they basically do perfectly on that.
2:40:02 They never get that wrong.
2:40:03 I mean, that’s not a counting task, right?
2:40:05 That is just a match.
2:40:06 You just put one against that.
2:40:07 It doesn’t matter how many,
2:40:07 I don’t need to know how many there are there to do that correctly.
2:40:10 And, and they would make mistakes, but very, very few and no more than MIT undergrads.
2:40:16 Just going to say, like there’s no, these are low stakes.
2:40:20 So, you know, you make mistakes.
2:40:21 So counting is not required to complete the matching task.
2:40:22 That’s right.
2:40:23 Not at all.
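To illustrate the point that the visible matching task needs no counting, here is a minimal sketch of one-to-one matching that only ever does "take the next one". The function name, the item labels, and the set sizes are hypothetical and chosen for illustration; this is not the procedure code from the study.

```python
def match_one_to_one(targets, supply):
    """Lay one supply item next to each target without counting anything.

    The loop never asks how many targets there are; it only pairs
    the current target with the next available supply item.
    """
    supply_iter = iter(supply)
    placed = []
    for target in targets:
        placed.append((target, next(supply_iter)))
    return placed


spools = ["spool"] * 7        # hypothetical visible set
balloons = ["balloon"] * 10   # hypothetical pile to draw from
pairs = match_one_to_one(spools, balloons)
print(pairs)
```

The sketch works only because each target stays in view while it is being paired. Once the targets are hidden, as in the opaque-sheet version described next, there is nothing left to iterate over, so some stored encoding of the set is needed.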
2:40:24 Okay.
2:40:24 And so, and so that’s our control.
2:40:26 And this guy had gone down there before and said that they couldn’t do this task,
2:40:30 but I just don’t know what he did wrong there because they can do this task perfectly well.
2:40:34 And, you know, I can, can train my dog to do this task.
2:40:36 So of course they can do this task.
2:40:38 And so, you know, it’s not a hard task.
2:40:40 But the other task that was sort of more interesting is like,
2:40:43 so then we do a bunch of tasks where you need some way to encode the set.
2:40:50 So like one of them is just, I just put an opaque sheet in front of the things.
2:40:58 I put down a bunch, a set of these things, and I put an opaque sheet down.
2:41:01 And so you can’t see them anymore.
2:41:03 And I tell you, do the same thing you were doing before, right?
2:41:05 You know, and it’s easy if it’s two or three, it’s very easy.
2:41:08 But if I don’t have the words for eight, it’s a little harder.
2:41:11 Like maybe, you know, with practice, well, no.
2:41:14 Because you have to count.
2:41:17 For us, it’s easy because we just, we just count them.
2:41:19 It’s just so easy to count them.
2:41:21 But they don’t, they can’t count them because they don’t count.
2:41:24 They don’t have words for this thing.
2:41:25 And so they would do approximate.
2:41:26 It’s totally fascinating.
2:41:27 So they would get them approximately right, you know, after four or five.
2:41:32 You know, because you can basically always get four right, three or four.
2:41:36 That looks, that’s something we can visually see.
2:41:38 But after that, you kind of have, it’s an approximate number.
2:41:42 And so then, and there’s a bunch of tasks we did and they all failed as, I mean, failed.
2:41:46 They did approximate after five on all those tasks.
2:41:50 And it kind of shows that the words, you kind of need the words, you know,
2:41:55 to be able to do these kinds of tasks.
2:41:57 Because there’s a little bit of a chicken and egg thing there.
2:41:59 Because if you don’t have the words, then maybe they’ll limit you in the kind of,
2:42:05 like a little baby Einstein there, won’t be able to come up with a counting task.
2:42:11 You know what I mean?
2:42:11 Like the ability to count enables you to come up with interesting things probably.
2:42:16 So yes, you develop counting because you need it.
2:42:20 But then once you have counting, you can probably come up with a bunch of different inventions.
2:42:25 Like how to, I don’t know, what kind of thing they do matching really well for building purposes,
2:42:33 building some kind of hut or something like this.
2:42:35 So it’s interesting that language is a limiter on what you’re able to do.
2:42:41 Yeah, here language is just the words.
2:42:43 Here it’s the words.
2:42:44 Like the words for exact count is the limiting factor here.
2:42:49 They just don’t have them.
2:42:50 Yeah, that’s what I mean.
2:42:52 That limit is also a limit on the society of what they’re able to build.
2:42:58 That’s going to be true.
2:42:59 Yeah.
2:43:00 So it’s probable.
2:43:01 I mean, we don’t know, this is one of those problems with the snapshot of just current languages,
2:43:06 is that we don’t know what causes a culture to discover/invent a counting system.
2:43:11 But the hypothesis is the guess out there is something to do with farming.
2:43:15 So if you have a bunch of goats and you want to keep track of them,
2:43:20 and you have, say, 17 goats and you go to bed at night and you get up in the morning,
2:43:24 boy, it’s easier to have a count system to do that.
2:43:27 You know, that’s an abstraction over a set.
2:43:30 So that I don’t have, like people often ask me when I talk to them about this kind of work,
2:43:34 they say, “Well, don’t these people have kids?
2:43:36 Don’t they have a lot of children?”
2:43:37 I’m like, “Yeah, they have a lot of children.”
2:43:39 And they do.
2:43:39 They often have families of three or four or five kids.
2:43:42 And they go, “Well, don’t they need the numbers to keep track of their kids?”
2:43:45 And I always ask the person who says this, like, “Do you have children?”
2:43:48 And the answer is always, “No.”
2:43:50 Because that’s not how you keep track of your kids.
2:43:52 You care about their identities.
2:43:54 It’s very important to me when I go, “I think I have five children.”
2:43:57 It doesn’t matter which, it matters which five.
2:44:02 It’s like, if you replaced one with someone else, I would care.
2:44:06 Goat maybe not, right?
2:44:08 That’s the kind of point.
2:44:08 It’s an abstraction.
2:44:10 Something that looks very similar to the one wouldn’t matter to me, probably.
2:44:13 But if you care about goats, you’re going to know them actually individually also.
2:44:17 Yeah, you will.
2:44:18 I mean, cows and goats, if there’s a source of food and milk and all that kind of stuff,
2:44:21 you’re going to actually really care.
2:44:23 But I’m saying it is an abstraction such that you don’t have to care
2:44:25 about their identities to do this thing fast.
2:44:28 That’s the hypothesis, not mine.
2:44:29 It’s from anthropologists, who are guessing about where words for counting came from:
2:44:34 from farming, maybe.
2:44:36 Yeah. Do you have a sense why universal languages like Esperanto have not taken off?
2:44:42 Like why do we have all these different languages?
2:44:47 Well, my guess is that the function of a language is to do something in a community.
2:44:53 I mean, unless there’s some function to that language in the community,
2:44:57 it’s not going to survive.
2:44:58 It’s not going to be useful.
2:44:59 So here’s a great example.
2:45:00 Language death is super common.
2:45:05 Languages are dying all around the world.
2:45:07 And here’s why they’re dying.
2:45:09 And it’s like, yeah, I see this in, you know, it’s not happening right now
2:45:12 in either the Tsimane’ or the Pirahã, but it probably will.
2:45:16 And so there’s a neighboring group called Mosetén, which is, I said that it’s an isolate.
2:45:22 Actually, there’s a dual.
2:45:23 There’s two of them.
2:45:24 Okay. So it’s actually, there’s two languages, which are really close,
2:45:27 which are Mosetén and Tsimane’, which are unrelated to anything else.
2:45:32 And Mosetén is unlike Tsimane’ in that it has a lot of contact with Spanish and it’s dying.
2:45:38 So that language is dying.
2:45:39 The reason it’s dying is there’s not a lot of value for the local people in their native language.
2:45:46 So there’s much more value in knowing Spanish like because they want to feed their families.
2:45:51 And how do you feed your family?
2:45:52 You learn Spanish so you can make money so you can get a job and do these things.
2:45:56 And then you can, and then you make money.
2:45:57 And so they want Spanish things, they want, and so Mosetén is in danger and is dying.
2:46:03 And that’s normal.
2:46:04 And so basically the problem is that people, the reason we learn language is to communicate,
2:46:10 and we need to, we use it to make money and to do whatever it is to feed our families.
2:46:18 And if that’s not happening, then it won’t take off.
2:46:22 It’s not like a game or something.
2:46:24 This is like something we use.
2:46:25 Like, why is English so popular?
2:46:27 It’s not because it’s an easy language to learn.
2:46:29 Maybe it is.
2:46:31 I don’t really know.
2:46:32 But that’s not why it’s popular.
2:46:34 But because the United States is a gigantic economy and therefore…
2:46:37 It’s big economies that do this.
2:46:39 It’s all it is.
2:46:39 It’s all about money and that’s what…
2:46:42 And so there’s a motivation to learn Mandarin.
2:46:45 There’s a motivation to learn Spanish.
2:46:46 There’s a motivation to learn English.
2:46:48 These languages are very valuable to know because there’s so, so many speakers all over the world.
2:46:52 That’s fascinating.
2:46:52 There’s less of a value economically.
2:46:55 It’s like kind of what drives this.
2:46:56 It’s not just for fun.
2:46:59 I mean, there are these groups that do want to learn language just for language’s sake.
2:47:04 And then there’s something to that.
2:47:06 But those are rare.
2:47:07 Those are rarities in general.
2:47:08 Those are a few small groups that do that.
2:47:11 Most people don’t do that.
2:47:12 Well, if that was the primary driver, then everybody would be speaking English or speaking one language.
2:47:17 There’s also a tension.
2:47:18 That’s happening.
2:47:19 And that, well…
2:47:19 We’re moving towards fewer and fewer languages.
2:47:22 We are.
2:47:23 I wonder if…
2:47:24 You’re right.
2:47:24 Maybe this is slow, but maybe that’s where we’re moving.
2:47:28 But there is a tension.
2:47:30 You’re saying a language at the fringes.
2:47:33 But if you look at geopolitics and superpowers, it does seem that there’s another thing in
2:47:39 tension, which is that a language is a national identity sometimes.
2:47:43 For certain nations.
2:47:45 I mean, that’s the war in Ukraine.
2:47:47 Language, Ukrainian language is a symbol of that war in many ways.
2:47:52 Like a country fighting for its own identity.
2:47:54 So it’s not merely the convenience.
2:47:56 I mean, those two things that are in tension are the convenience of trade and the economics
2:48:01 and being able to communicate with neighboring countries and trade more efficiently with
2:48:07 neighboring countries, all that kind of stuff, but also identity of the group.
2:48:11 That’s right.
2:48:11 I completely agree.
2:48:12 This language is the way…
2:48:13 For every community, like dialects that emerge are a kind of identity for people.
2:48:21 Sometimes a way for people to say F-U to the more powerful people.
2:48:26 That’s interesting.
2:48:28 So in that way, language can be used as that tool.
2:48:30 I completely agree.
2:48:32 And there’s a lot of work to try to create that identity.
2:48:36 So people want to do that. Speaking as a cognitive scientist and language expert,
2:48:42 I hope that continues because I don’t want languages to die.
2:48:46 I want languages to survive because they’re so interesting for so many reasons.
2:48:53 But I mean, I find them fascinating just for the language part.
2:48:56 But I think there’s a lot of connections to culture as well, which is also very important.
2:49:01 Do you have hope for machine translation that can break down the barriers of language?
2:49:07 So while all these different diverse languages exist, I guess there’s many ways of asking
2:49:12 this question, but basically how hard is it to translate in an automated way from one language
2:49:19 to another?
2:49:20 There’s going to be cases where it’s going to be really hard.
2:49:22 So there are concepts that are in one language and not in another.
2:49:27 Like the most extreme kinds of cases are these cases of number information.
2:49:31 So good luck translating a lot of English into Piraha.
2:49:35 It’s just impossible.
2:49:36 There’s no way to do it because there are no words for these concepts that we’re talking about.
2:49:41 There’s probably the flip side, right?
2:49:43 There’s probably stuff in Piraha, which is going to be hard to translate into English
2:49:48 on the other side.
2:49:49 And so I just don’t know what those concepts are.
2:49:51 I mean, their world space is different from my world space.
2:49:56 And so I don’t know what, so that the things they talk about, things are,
2:49:59 it’s going to have to do with their life as opposed to my industrial life,
2:50:04 which is going to be different.
2:50:05 And so there’s going to be problems like that always.
2:50:09 There’s like, maybe it’s not so bad in the case of some of these spaces,
2:50:12 and maybe it’s going to be harder in others.
2:50:14 And so it’s pretty bad in number.
2:50:16 It’s like extreme, I’d say, in the number space, exact number space.
2:50:20 But in the color dimension, right?
2:50:22 So that’s not so bad.
2:50:22 I mean, but it’s a problem that you don’t have ways to talk about the concepts.
2:50:29 And there might be entire concepts that are missing.
2:50:31 So to you, it’s more about the space of concept versus the space of form.
2:50:35 Like form, you can probably map.
2:50:38 Yes.
2:50:38 Yeah. But so you were talking earlier about translation
2:50:41 and about how translations, there’s good and bad translations.
2:50:46 I mean, now you’re talking about translations of form, right?
2:50:48 So what makes writing good, right?
2:50:51 There’s a music to the form.
2:50:53 Right. It’s not just the content.
2:50:55 It’s how it’s written.
2:50:57 And translating that, that sounds difficult.
2:51:00 We should say that there is like, I hesitate to say meaning,
2:51:06 but there’s a music and a rhythm to the form.
2:51:10 When you look at the broad picture, like the difference between Dostoyevsky and Tolstoy,
2:51:14 or Hemingway, Bukowski, James Joyce, like I mentioned, there’s a beat to it.
2:51:21 There’s an edge to it that is in the form.
2:51:24 We can probably get measures of those.
2:51:27 Yeah.
2:51:27 I don’t know.
2:51:29 I’m optimistic that we could get measures of those things.
2:51:32 And so maybe that’s…
2:51:33 Translatable.
2:51:34 I don’t know. I don’t know, though.
2:51:35 I have not worked on that.
2:51:37 I would love to see…
2:51:38 That sounds totally fascinating.
2:51:39 Translation to Hemingway is probably the lowest…
2:51:44 I would love to see different authors,
2:51:46 but the average per sentence dependency length for Hemingway is probably the shortest.
2:51:53 That’s your sense, huh?
2:51:55 It’s simple sentences.
2:51:56 Simple sentences.
2:51:57 Short, yeah, yeah, yeah, yeah.
2:51:59 I mean, that’s when, if you have really long sentences,
2:52:01 even if they don’t have center, like…
2:52:03 They can have longer connections.
2:52:04 They can have longer connections.
2:52:06 They don’t have to, right?
2:52:06 You can have a long, long sentence with a bunch of local words, yeah.
2:52:10 But it is much more likely to have the possibility
2:52:13 of long dependencies with long sentences, yeah.
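As a rough illustration of the "average per-sentence dependency length" measure mentioned here, the sketch below computes it from a hand-annotated parse. It assumes each word's head is given as a 1-based word position, with 0 marking the root; the function names and the two example parses are hypothetical and are not taken from the conversation or from any particular parser.

```python
from statistics import mean


def dependency_lengths(heads):
    """Return the linear distance between each word and its head.

    heads[i] is the 1-based position of the head of word i+1;
    0 marks the root, which has no dependency and is skipped.
    """
    return [abs(h - i) for i, h in enumerate(heads, start=1) if h != 0]


def mean_dependency_length(heads):
    """Average dependency length for one sentence (0.0 if it has no dependencies)."""
    lengths = dependency_lengths(heads)
    return mean(lengths) if lengths else 0.0


# Hypothetical hand parse of "The old man fished alone":
# The->man, old->man, man->fished, fished=root, alone->fished
short_sentence = [3, 3, 4, 0, 4]

# Hypothetical hand parse of "The man who the boy saw fished":
# The->man, man->fished, who->saw, the->boy, boy->saw, saw->man, fished=root
embedded_sentence = [2, 7, 6, 5, 6, 2, 0]

print(mean_dependency_length(short_sentence))
print(mean_dependency_length(embedded_sentence))
```

Under these assumed parses, the sentence with the embedded relative clause gets the larger average, which matches the point that longer sentences make long dependencies possible without requiring them.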
2:52:15 I met a guy named Aza Raskin who does a lot of cool stuff.
2:52:21 Really brilliant.
2:52:22 Works with Tristan Harris and a bunch of stuff.
2:52:23 But he was talking to me about communicating with animals.
2:52:29 He co-founded Earth Species Project,
2:52:32 where you’re trying to find the common language between whales, crows, and humans.
2:52:37 And he was saying that there’s a lot of promising work,
2:52:42 that even though the signals are very different,
2:52:44 like the actual, if you have embeddings of the languages,
2:52:50 they’re actually trying to communicate similar type things.
2:52:54 Is there something you can comment on that?
2:52:58 Is there a promise to that?
2:53:00 In everything you’ve seen in different cultures,
2:53:02 especially like remote cultures, that this is a possibility?
2:53:05 Or no?
2:53:05 Like we can talk to whales?
2:53:07 I would say yes.
2:53:09 I think it’s not crazy at all.
2:53:11 I think it’s quite reasonable.
2:53:13 But there’s this sort of weird view, well, odd view,
2:53:16 I think, to think that human language is somehow special.
2:53:21 I mean, it is, maybe it is.
2:53:24 We can certainly do more than any of the other species.
2:53:28 You know, and maybe our language system is part of that.
2:53:34 It’s possible.
2:53:35 But people have often talked about how human, like Chomsky, in fact,
2:53:40 has talked about how human language has this compositionality thing
2:53:47 that he thinks is sort of key in language.
2:53:49 And the problem with that argument is he doesn’t speak whale.
2:53:53 And he doesn’t speak crow, and he doesn’t speak monkey.
2:53:57 You know, he’s like, they say things like,
2:53:59 well, they’re making a bunch of grunts and squeaks.
2:54:01 And the reasoning is like, that’s bad reasoning.
2:54:05 Like, you know, I’m pretty sure if you asked a whale what we’re saying,
2:54:08 they’d say, well, they’re making a bunch of weird noises.
2:54:10 Exactly.
2:54:11 And so it’s like, this is a very odd reasoning to be making,
2:54:15 that human language is special because we’re the only one
2:54:17 to have human language.
2:54:18 I’m like, well, we don’t know what those other, we just don’t,
2:54:23 we can’t talk to them yet.
2:54:24 And so there is probably a signal in there.
2:54:26 And it might very well be something complicated like human language.
2:54:31 I mean, sure, with a small brain, in lower species,
2:54:35 there’s probably not a very good communication system.
2:54:37 But in these higher species where you have, you know,
2:54:40 what seems to be, you know, abilities to communicate something,
2:54:45 there might very well be a lot more signal there than we might have otherwise thought.
2:54:50 But also, if we have a lot of intellectual humility here,
2:54:53 there’s somebody formerly from MIT, Neri Oxman,
2:54:56 who I admire very much, has talked a lot about,
2:54:59 has worked on communicating with plants.
2:55:03 So like, yes, the signal there is even less than,
2:55:07 but like, it’s not out of the realm of possibility
2:55:10 that all nature has a way of communicating.
2:55:14 And it’s a very different language,
2:55:16 but they do develop a kind of language through the chemistry,
2:55:21 through some way of communicating with each other.
2:55:23 And if you have enough humility about that possibility,
2:55:26 I think you can, I think it would be a very interesting,
2:55:29 in a few decades, maybe centuries, hopefully not,
2:55:32 a humbling possibility of being able to communicate,
2:55:37 not just between humans, effectively,
2:55:39 but between all of living things on Earth.
2:55:42 Well, I mean, I think some of them are not going to have much interesting to say.
2:55:47 But you could still.
2:55:48 We don’t know.
2:55:49 We certainly don’t know, I think.
2:55:50 I think if we were humble,
2:55:52 there could be some interesting trees out there.
2:55:55 Well, they’re probably talking to other trees, right?
2:55:58 They’re not talking to us.
2:55:59 And so to the extent they’re talking,
2:56:01 they’re saying something interesting to some other,
2:56:04 you know, conspecific as opposed to us, right?
2:56:07 And so there probably is, there may be some signal there.
2:56:10 So there are people out there,
2:56:12 actually it’s pretty common to say that human language is special
2:56:17 and different from any other animal communication system.
2:56:20 And I just don’t think the evidence is there for that claim.
2:56:24 I think it’s not obvious.
2:56:25 We just don’t know what,
2:56:30 because we don’t speak these other communication systems
2:56:32 until we get better.
2:56:34 You know, I do think there are people working on that,
2:56:37 as you pointed out, though,
2:56:38 people working on whale speak, for instance.
2:56:40 Like, that’s really fascinating.
2:56:42 Let me ask you a wild out there sci-fi question.
2:56:45 If we make contact with an intelligent alien civilization,
2:56:49 and you get to meet them, how hard do you think you,
2:56:53 like how surprised would you be about their way of communicating?
2:56:56 Do you think it would be recognizable?
2:56:59 Maybe there’s some parallels here when you go to the remote tribes.
2:57:03 I mean, I would want Dan Everett with me.
2:57:05 He is like amazing at learning foreign languages.
2:57:08 And so he like, this is an amazing feat, right?
2:57:10 To be able to go.
2:57:11 This is a language which had no translators before him.
2:57:15 I mean, there were, he was a missionary.
2:57:17 Well, there was a guy that had been there before,
2:57:18 but he wasn’t very good.
2:57:20 And so he learned the language far better
2:57:23 than anyone else had learned before him.
2:57:25 He’s like good at, he’s just a, he’s a very social person.
2:57:28 I think that’s a big part of it, is being able to interact.
2:57:30 So I don’t know, it kind of depends on these,
2:57:32 these, the species from outer space,
2:57:35 how much they want to talk to us.
2:57:37 Is there something you can say about the process he follows?
2:57:40 Like what, how do you show up to a tribe and socialize?
2:57:43 I mean, I guess colors and counting
2:57:45 is one of the most basic things to figure out.
2:57:47 Yeah, you start that.
2:57:48 You actually start with like objects and just say,
2:57:51 you know, just throw a stick down and say stick.
2:57:53 And then you say, what do you call this?
2:57:54 And then they’ll say the word, whatever.
2:57:56 And he says a standard thing to do is to throw two sticks and say two sticks.
2:58:00 And then, you know, he learned pretty quick
2:58:02 that there weren’t any count words in this language
2:58:04 because they didn’t know this wasn’t interesting to them.
2:58:07 It was kind of weird.
2:58:07 They’d say “some” or something, the same word over and over again.
2:58:10 And so, but that is a standard thing.
2:58:11 You just like try to,
2:58:12 but you have to be pretty out there socially,
2:58:15 like willing to talk to random people.
2:58:18 Which these are, you know, really very different people from you.
2:58:21 And he was, and he’s very social.
2:58:23 And so I think that’s a big part of this is like, that’s how,
2:58:25 you know, a lot of people know a lot of languages,
2:58:28 is that they’re willing to talk to other people.
2:58:30 That’s a tough one.
2:58:31 We just show up knowing nothing.
2:58:32 Yeah. Oh, God.
2:58:33 That’s beautiful.
2:58:34 It’s beautiful that humans are able to connect in that way.
2:58:36 Yeah. Yeah.
2:58:37 You’ve had an incredible career exploring this fascinating topic.
2:58:41 What advice would you give to young people
2:58:43 about how to have a career like that,
2:58:47 or a life that they can be proud of?
2:58:50 When you see something interesting, just go and do it.
2:58:53 Like I do, I do that.
2:58:54 Like that’s something I do,
2:58:55 which is kind of unusual for most people.
2:58:57 So like when I saw the Piraha,
2:58:58 like if Piraha was available to go and visit,
2:59:00 I was like, yes, yes, I’ll go.
2:59:02 And then when we couldn’t go back,
2:59:04 we had some trouble with the Brazilian government.
2:59:08 There’s some corrupt people there.
2:59:09 It was very difficult to get, go back in there.
2:59:11 And so I was like, all right, I got to find another group.
2:59:13 And so we searched around and we were able to find the,
2:59:16 because I wanted to keep working on this kind of problem.
2:59:18 And so we found the Chimane and just went there.
2:59:20 I didn’t really have, we didn’t have contact.
2:59:22 We had a little bit of contact and brought someone.
2:59:24 And that was, you know, we just kind of just try things.
2:59:28 I say it’s like, a lot of that is just like ambition,
2:59:31 just try to do something that other people haven’t done.
2:59:33 Just give it a shot is what I, I mean, I do that all the time.
2:59:37 I don’t know.
2:59:37 I love it.
2:59:38 And I love the fact that your pursuit of fun
2:59:41 has landed you here talking to me.
2:59:43 This was an incredible conversation.
2:59:45 You’re just a fascinating human being.
2:59:48 Thank you for taking a journey
2:59:49 through human language with me today.
2:59:52 This is awesome.
2:59:52 Thank you very much.
2:59:53 Lex, it’s been a pleasure.
2:59:54 Thanks for listening to this conversation
2:59:57 with Edward Gibson. To support this podcast,
3:00:00 please check out our sponsors in the description.
3:00:02 And now let me leave you with some words from Wittgenstein.
3:00:06 The limits of my language mean the limits of my world.
3:00:11 Thank you for listening and hope to see you next time.
3:00:14 [MUSIC]
3:00:24 [MUSIC]

Edward Gibson is a psycholinguistics professor at MIT and heads the MIT Language Lab. Please support this podcast by checking out our sponsors:
– Yahoo Finance: https://yahoofinance.com
– Listening: https://listening.com/lex and use code LEX to get one month free
– Policygenius: https://policygenius.com/lex
– Shopify: https://shopify.com/lex to get $1 per month trial
– Eight Sleep: https://eightsleep.com/lex to get special savings

Transcript: https://lexfridman.com/edward-gibson-transcript

EPISODE LINKS:
Edward’s X: https://x.com/LanguageMIT
TedLab: https://tedlab.mit.edu/
Edward’s Google Scholar: https://scholar.google.com/citations?user=4FsWE64AAAAJ
TedLab’s YouTube: https://youtube.com/@Tedlab-MIT

PODCAST INFO:
Podcast website: https://lexfridman.com/podcast
Apple Podcasts: https://apple.co/2lwqZIr
Spotify: https://spoti.fi/2nEwCF8
RSS: https://lexfridman.com/feed/podcast/
YouTube Full Episodes: https://youtube.com/lexfridman
YouTube Clips: https://youtube.com/lexclips

SUPPORT & CONNECT:
– Check out the sponsors above, it’s the best way to support this podcast
– Support on Patreon: https://www.patreon.com/lexfridman
– Twitter: https://twitter.com/lexfridman
– Instagram: https://www.instagram.com/lexfridman
– LinkedIn: https://www.linkedin.com/in/lexfridman
– Facebook: https://www.facebook.com/lexfridman
– Medium: https://medium.com/@lexfridman

OUTLINE:
Here’s the timestamps for the episode. On some podcast players you should be able to click the timestamp to jump to that time.
(00:00) – Introduction
(10:53) – Human language
(14:59) – Generalizations in language
(20:46) – Dependency grammar
(30:45) – Morphology
(39:20) – Evolution of languages
(42:40) – Noam Chomsky
(1:26:46) – Thinking and language
(1:40:16) – LLMs
(1:53:14) – Center embedding
(2:19:42) – Learning a new language
(2:23:34) – Nature vs nurture
(2:30:10) – Culture and language
(2:44:38) – Universal language
(2:49:01) – Language translation
(2:52:16) – Animal communication
