Policymaking Is Not a Science — Yet (Update)

Hey there, it’s Stephen Dubner.
We just published a two-part series on what some people call sludge, meaning all the frictions
that make it hard to fill out tax forms or find a health care provider or even cancel
a subscription.
One part of our series involved government sludge and how it interferes with getting
policy done.
The series reminded me of another episode we once made that I thought was worth hearing
again, so we’re playing it for you here as a bonus episode.
It is called “Policymaking Is Not a Science — Yet.”
We have updated facts and figures as necessary.
As always, thanks for listening.
Usually when children are born deaf, they call it nerve deafness, but it’s really not the actual
nerve.
It’s little tiny hair cells in the cochlea.
Dana Suskind is a physician-scientist at the University of Chicago and, more dramatically, she is a pediatric surgeon who specializes in cochlear implants.
My job is to implant this incredible piece of technology which bypasses these defective
hair cells and takes the sound from the environment, the acoustic sound, and transforms it into electrical
energy, which then stimulates the nerve.
And somebody who is severe to profoundly deaf can, after implantation, have normal levels of hearing.
And it is pretty phenomenal.
It is pretty phenomenal.
If you ever need a good cry, a happy cry, just type in cochlear implant activation on YouTube.
You’ll see little kids hearing sound for the first time and their parents flipping out with joy.
Good job!
Good job!
She’s smiley.
Oh, that’s great!
She’s so smiley.
Yeah, that’s your ears.
Yeah.
The cochlear implant is a remarkable piece of technology, but really it’s just one of
many remarkable advances in medicine and elsewhere, created by devoted researchers and technologists
and sundry smart people.
You know what’s even more remarkable?
How often we fail to take advantage of these advances.
One of the most compelling examples is the issue of hypertension.
About a third of all Americans have high blood pressure.
First of all, the awareness rate is only about 80%. And of everyone with high blood pressure, only 50% actually have it controlled.
We have great drugs, right?
But you can see the cascade of issues when you have to disseminate, you have to adhere, etc.,
and the public health ramifications of that.
Those blood pressure numbers are even worse today than they were when we first published
this episode in 2020.
Clearly, we still have not figured out how to get the science to the people who need it.
Prescription adherence is a very difficult nut to crack.
That’s John List.
He’s an economist at the University of Chicago.
They actually have to go and get the medicines, which a lot of people have a very hard time doing.
Even though it’s sitting next to your bed every night, people don’t take it.
And they don’t take it because they forget.
They don’t take it because the side effect is a lot worse than the benefit they think they’re getting.
All of these types of problems are ones that we humans, including myself, do a really bad job of trying to solve.
All of us, our lives get busy.
We forget.
You wouldn’t think you’d have an adherence issue with something like the cochlear implant.
It has such an obvious upside.
And yet…
When I put the internal device in, it stays there.
But it actually requires an external portion as well, sort of like a hearing aid.
And that is the part where you see issues related to adherence.
Just because I put in the internal part doesn’t mean that an individual or a child will be wearing the external part.
In one study, only half of the participants wore their device full-time.
I mean, we have figured out how, through randomized controlled trials, to understand causation, real impact, on the small scale.
But the next step is understanding the science of how to use this science.
Because, you know, how you do it on the small scale in perfect conditions is very different than the messy real world.
And that is a very real issue.
Today on Freakonomics Radio, what to do about that very real issue.
Because you see the same thing not just in medicine, but in education and economic policy and elsewhere.
Solutions that look foolproof in the research stage are failing to scale up.
People said, let’s just put it out there.
And then we quickly realized that it’s far more complicated.
There might be something that you think would be great, but it’s never going to be able to be implemented in the real world.
We need to know, what is the magic sauce?
We’ll go in search of that magic sauce right after this.
This is Freakonomics Radio, the podcast that explores the hidden side of everything, with your host, Stephen Dubner.
John List is a pioneer in the relatively recent movement to give economic research more credibility in the real world.
If you turn back the clock to the 1990s, there was a credibility revolution in economics,
focusing on what data and modeling assumptions are necessary to go from correlation to causality.
List responded by running dozens and dozens of field experiments.
Now, my contribution in the credibility revolution was instead of working with secondary data,
I actually went to the world and used the world as my lab and generated new data to test theories and estimate program effects.
Okay, so you and others moved experiments out of the lab and into the real world.
But have you been able to successfully translate those experimental findings into, let’s say, good policy?
I think moving our work into policymaking circles and having a very strong impact has just not been there.
And I think one of the most important questions is,
how are we going to make that natural progression of field experiments within the social sciences
to more keenly talk to policymakers, the broader public, and actually the scientific community as a whole?
The way List sees it, academics like him work hard to come up with evidence for some intervention
that’s supposed to help alleviate poverty or improve education,
to help people quit smoking or take their blood pressure medicine.
The academic then writes up their paper for an incredibly impressive-looking academic journal,
impressive at least to fellow academics.
To the rest of us, it’s jargony and indecipherable.
But then, with paper in hand, the academic goes out proselytizing to policymakers.
He might say: you politicians always talk about making evidence-based policy. Well, here’s some new evidence for an effective and cost-effective way of addressing that problem you say you care so much about.
And then the policymaker may say: well, the last time we listened to an academic like you, we did just what they told us, but it didn’t work. And it cost three times what they said it would. And we got hammered in the press.
And here’s the thing. The politician and the academic may both be right.
John List has seen this from both sides now.
In a past life, I worked in the White House, advising the president on environmental and resource issues within economics.
This was in the early 2000s, under George W. Bush.
A harsh lesson that I learned was that you have to evaluate the effects of public policy as opposed to its intentions. Because the intentions are obviously good. For instance, improving literacy for grade schoolers or helping low-income high schoolers get to college. When you step back and look at the number of policies that we put in place that don’t work, it’s just a travesty.
List has firsthand experience with the failure to scale.
So down in Chicago Heights, I ran a series of interventions, and one of the more powerful interventions was called the Parent Academy. That was a program that brought in parents every few weeks, and we taught them the best mechanisms and approaches they can use with their 3-, 4-, and 5-year-old children to push both their cognitive skills and their executive-function skills. Things like self-control. What we found was that within three to six months, we can move a child in very short order to have very strong cognitive test scores and very strong executive-function skills.
So, of course, we’re very optimistic after getting this type of result, and we want the whole world to now do parent academies. The U.K. approached us and said, we want to roll it out across London and the boroughs around London. What we found is that it failed miserably. It wasn’t that the program was bad. It failed miserably because no parents actually signed up. So if you want your program to work at higher levels, you have to figure out how to get the right people, and all the people, of course, into the program.
If you had asked me to guess all the ways that a program like that could fail, it would have taken me a while to guess that you simply didn’t get parental uptake.
The main problem is we just don’t understand the science of scaling.
If you were to attach a noun to what this is, the scalability blank: is it a problem? Is it a dilemma? Is it a crisis?
I do think it’s a crisis, in that if we don’t take care of it as scientists, I think everything we do can be undermined in the eyes of the policymaker and the broader public. We don’t understand how to use our own science to make better policies.
So John List and Dana Suskind and some other researchers are on a quest to address this scalability crisis. They’ve been writing a series of papers, for instance, “The Science of Using Science: Towards an Understanding of the Threats to Scaling Experiments.” A lot of their focus is on early education, since that is a particular passion of Suskind’s.
I guess you could say I’m a surgeon by day and a social scientist by night. My clinical work is about taking care of one child at a time. My research really comes out of the fact that not all children do as well as others after surgery, and trying to figure out the best ways to allow all my patients, and really all children born into low-income backgrounds, to reach their educational potentials.
It is kind of like a superhero in reverse. During the day, you’re doing the big dramatic stuff, and at night, you’re going home to analyze the data and figure out what’s happening.
I think that really the hard part is the night part. I love doing surgery. I adore my patients, but it’s actually not as hard as many of the complex issues in this world.
And was that a recognition that some kids after the surgery sort of zoomed up the education ladder and others didn’t?
Yeah. It’s not simply about hearing loss. It’s because language is the food for the developing brain. Before surgery, they all looked like they’d have the same potential to, as you say, zoom up the educational ladder. After surgery, there were very different outcomes. And too often that difference fell along socioeconomic lines. That made me start searching outside the operating room for understanding why, and what I could do about it. And it has taken me on a journey.
So Dana and I met back in 2012. We were introduced by a mutual friend, and we did the usual ignore-each-other for a few years because we’re too busy. And when push came to shove, Dana and I started to work on early childhood research. And after that, research turned to love.
I always joke that I was wooed with spreadsheets and hypotheses.
Is that true?
Yes. Yes. In fact, the reason I decided to marry him was because I wanted this area of scaling to be a robust area of research for him, because it really is a major issue.
Suskind started what was then called the Thirty Million Words Initiative, 30 million being an estimate of how many fewer words a child from a low-income home will have heard than an affluent child by the time they turn four. But these days, the project is called the TMW Center for Early Learning and Public Health.
We’ve actually moved away from the term “30 million words” because it’s such a hot-button issue.
Hot-button because it’s so hard to believe that the number is legit?
Well, no. I mean, some people say, look, it’s a deficit mentality. You’re talking about what’s not there. And then there’s the replication: somebody did another study that said, oh, it’s only 4 million. And it really isn’t actually even the point, because it’s not even about words. It’s about the interaction. So I just made the decision: I’d rather be focusing on developing the research than fighting a naming battle.
So you didn’t make TMW stand for something else.
Well, that’s what everybody gives me trouble for. It stands for 30 million words, but only I know that.
Okay, now you all know it too.
Anyway, they started the center with this idea.
With this idea that, you know, we need to take a public health, or population-level, approach during the early years to optimize early foundational brain development, because the research is pretty clear that parent talk and interaction in the first three years of life are the catalyst for brain development. And so that’s basically our work.
Okay, so far so good. The research is clear that heavy exposure to language is good for the developing brain. But how do you turn that research finding into action? And how do you scale it up?
Initially, we started with an intensive home-visiting program, but understanding that to reach population-level impact, you need to develop programs both with an eye for scaling as well as an eye for understanding where parents go regularly. Because, unlike the education system, the first three years of life really don’t have any infrastructure in which to disseminate programs. So we actually expanded our model. We have this multifaceted program that reaches parents where they are, from maternity wards into pediatrics offices, into the homes, as well as group sessions.
The programs that are most vulnerable to the issues of scale are the complex sort of service-delivery interventions. You know, anything that takes human service delivery.
Scaling isn’t an end. It’s really just a continuation. You know, it’s a hard one.
That is Patti Chamberlain, senior research scientist at the Oregon Social Learning Center.
And I do research and implementation of evidence-based practices in child welfare, juvenile justice, mental health, and education systems.
Chamberlain also looks at scaling as a process.
So it’s almost like there are stages that you have to go through.
And if the first stage is research that involves an RCT, a randomized controlled trial, there’s already an important choice to make.
You’re far better off to situate your RCT in a real-world setting than a university clinic, so that you’re learning from the beginning what’s feasible and what’s not feasible. There might be something that you think would be great, but it’s never going to be able to be implemented in the real world. I’ve been at this now for, oh, probably 25 years, and I learned sort of through failing.
One program Chamberlain founded is called Treatment Foster Care Oregon.
Kids tend to commit crimes together. It’s a team sport. But then, oddly, the way that we’re set up to deal with kids who, you know, reach the level where they’re really being unsafe to themselves and to the community is we put them in group homes together. We’re putting kids in a situation where they’re more likely to commit crimes.
So we decided: what if we placed a child singly in a family that was completely devoted to using evidence-based parenting skills to help that child do well with peers, in school, and in the family setting? What if we gave the parents, the biological parents of that kid, the same kind of skills that the treatment foster care family had? What if we gave the kid individual therapy? The biological family was getting family therapy. We were giving the kids support at school. So we were basically wrapping all these services around an individual child in a family home.
What we found was, yeah, the kids do a lot better. They have a lot fewer arrests. They spend fewer days in institutions. They use fewer drugs. And guess what? It costs a lot less as well. Because you do not have a facility. You do not have 24-7 staff that you’re paying in shifts. You do not have, you know, all of the stuff that it takes to run an institution. You have a family.
The success of Chamberlain’s program caught the eye of researchers who were working on a program for a federal agency called the Office of Juvenile Justice and Delinquency Prevention.
And so we got this call saying, you know, we want you to implement your program in 15 sites.
If the program was successful at one site, how hard could it be to make it work at 15?
I went in thinking that it wouldn’t be that hard, because we had good outcomes. We showed that we could save money. And yet, we were absolutely not ready. It wasn’t because we didn’t have enough data. We had, at that point, plenty of data. But we didn’t have the know-how of how to put this thing down in the real world. And it blew up.
One reason? Systemic complication.
The three systems, child welfare, juvenile justice, and mental health, all put some money in the pot to fund this implementation. I was completely delighted. I thought, oh, this is going to be great, because we have all the relevant systems buying into this. Well, what happened was, when we tried to implement, we ran into tremendous barriers, because if we satisfied the policies and procedures of one system, we were at odds with the policies and procedures in the other system.
Patti Chamberlain had run up against something that Dana Suskind had come to see as an inherent disconnect when you try to scale up a research finding.
There’s obviously the implementation, everybody focusing on adherence, but there’s also sort of the infrastructure delivery mechanism, which I think is an issue, whether it’s government or health care: they’re just not set up for interventions, which are sort of like innovations. So you’ve got these researchers, who think of themselves as scientific entrepreneurs developing the next best thing, thinking you build it and they will come. And then you’ve got organizations that are sort of built for efficiency rather than effectiveness, and that can’t take it up.
If only there were another science, a science to help these scientific entrepreneurs and institutions come together to implement this new research. Maybe something that could be called implementation science.
Implementation science.
Implementation science.
Implementation science.
Okay, let’s define implementation science.
It’s the study of how programs get implemented into practice, and how the quality of that implementation may affect how well that program works or doesn’t work.
That is Lauren Supplee. When we spoke with her, Supplee was the deputy chief operating officer of a nonprofit called Child Trends, which promotes evidence-based policy to improve children’s lives.
This whole science is maybe 15 years old. It’s really coming out of this movement of evidence-based policy and programs, where people said, well, we have this program, it appears to change important outcomes, let’s just put it out there. And then we quickly realized that there are a lot of issues, and that actually “put it out there” is far more complicated. A lot of the evidence-based programs we have were designed by academic researchers who were testing them in the maybe more ideal circumstances that they had available to them. That might have included graduate students. It might have been a school district that was very amenable to research. And then you take the results of that, and trying to put that into another location is where the challenge happened.
So, coming up after the break: can implementation science really help?
You know, I want policy science not to be an oxymoron.
You’re listening to Freakonomics Radio. I’m Stephen Dubner. We will be right back.
What randomized controlled trials tell us about an intervention is what that actual intervention does in a particular population, in a particular context. It doesn’t mean that it’s generalizable.
That, again, is Dana Suskind from the University of Chicago.
But you have to continue the science so you can understand how it’s going to work in a different place, in a different context, in a different population, and have the same effect. And that’s part of the scaling science.
The scaling science. That is what Suskind and her economist collaborator John List, who’s also her husband, and other researchers have been working on. They’ve been systematically examining why interventions that work well in experimental or research settings often fail to scale up. You can see why this is an important puzzle to solve. Scaling up a new intervention, like a medical procedure or a teaching method, has the potential to help thousands, millions, maybe billions of people. But what if it simply fails at scale? What if it ends up costing way more than anticipated, or creates serious unintended consequences? That’ll make it that much harder for the next set of researchers to persuade the next set of policymakers to listen to them. So List and Suskind have been looking at scaling failures from the past and trying to categorize what went wrong.
You can kind of put what we’ve learned into three general buckets that seem to encompass the failures. Bucket number one is that the evidence was just not there to justify scaling the program in the first place. The Department of Education did this broad survey on prevention programs attempting to attenuate youth substance abuse and crime, and aspects like that. And what they found is that only 8% of those programs were actually backed by research evidence. Many programs that we put in place really don’t have the research findings to support them. And this is what a scientist would call a false positive.
So are we talking about bad research? Are we talking about cherry-picking? Are we talking about publication bias?
Here we’re talking about none of those. We’re talking about a small-scale research finding that was reported truthfully, but because of the mechanics of statistical inference, some share of the time it just won’t be right.
What you were getting into is what I would call the second bucket of why things fail, and that’s what I call the wrong people were studied. You know, these are studies that have a particular sample of people that shows really large program effect sizes, but when the program goes to general populations, that effect disappears. So essentially, we were looking at the wrong people and scaling to the wrong people.
And when you say the wrong people, the people that are being studied tend to be what?
They are the fraction, or the group, of people who receive the largest program benefits.
So I think of some of the experiments that are done on college campuses, right, where there’s a professor who’s looking to find out something about, let’s say, altruism, and the experimental setting is a classroom where 20 college students will come in. And they’re a pretty homogeneous population, they’re pretty motivated, maybe they’re very disciplined, and that may not represent what the world actually is. Is that what you’re talking about?
That’s one piece of it. Another piece is: who will sign their kids up for Head Start, or for a program in a neighborhood that advances the reading skills of the child? Who’s going to be first in line? The people who really care about education, and the people who think their child will receive the most benefits from the program. Now, another way to get it is sort of along the lines that you talked about. It could be that the researcher knows something about the population that other people don’t know. Like, I want to give my program its best shot of working.
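To put some hypothetical numbers on that selection problem, here is a minimal simulation sketch. Every quantity in it is invented for illustration; nothing comes from List’s studies. If the families most eager to sign up are also the ones who benefit most, a pilot’s measured effect will overstate what the general population would get:

```python
# Hypothetical sketch of "the wrong people were studied."
# Assumes each family's true program benefit varies, and eagerness to
# enroll rises with that benefit, so early sign-ups benefit the most.
import random

random.seed(0)
N = 100_000
benefits = [random.gauss(0.20, 0.10) for _ in range(N)]  # true effect per family

# Motivation to enroll = benefit plus noise; a stand-in for "first in line."
motivation = [b + random.gauss(0, 0.05) for b in benefits]
cutoff = sorted(motivation, reverse=True)[N // 20]  # top 5% join the pilot
pilot = [b for b, m in zip(benefits, motivation) if m > cutoff]

avg = lambda xs: sum(xs) / len(xs)
print(f"effect measured in pilot:     {avg(pilot):.3f}")
print(f"effect in general population: {avg(benefits):.3f}")
# The pilot effect (~0.38) is nearly double the population effect (~0.20):
# scaling to everyone dilutes the result even though the program is unchanged.
```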
Okay, and what’s in your third bucket of scaling failures?
The third bucket is something that we call the wrong situation was used. And what I mean by that is that certain aspects of the situation change when you go from the original research to the scaled research program. We don’t understand what properties of the situation, or features of the environment, will matter. There is a really large group of implementation scientists who have explored this question for years. Now, what they emphasize and focus on is something called voltage drop. And voltage drop essentially means: I found a really good result in my original research study, but then when they do it at scale, the effect ends up being, for example, a tenth of the original result, or a quarter of the original result.
An example of this is when you look at Head Start’s home visiting services. This is an early childhood intervention that found huge improvements in both child and parent outcomes in the original study. Except when they tried to scale that up and do home visits at a much larger scale, what they found is that, for example, home visits for at-risk families involved a lot more distractions in the house, and there was less time on child-focused activities. So this is sort of the wrong dosage, or the wrong program, given at scale.
There are many factors that contribute to this voltage drop, including the admirably high standards set by the original researchers.
When the researcher starts his or her experiment, the inclination is: I’m going to get the best tutors in the world, so I’m going to be able to show how effective my intervention is.
Dana Suskind again.
Say you only needed 10 math tutors, and you happened to get the Ph.D. students from the University of Chicago. Then what happens is you show this tremendous effect size, and in the scaling, all of a sudden, you need a hundred or a thousand, and you no longer have access to those individuals. You either go down the supply chain, with individuals who are not quite as well trained, or you end up having to pay a whole lot more money to maintain the trained-tutor program. And one way or the other, either the impacts of the intervention go down, or your costs go up significantly.
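Suskind’s tutor example is easy to simulate. In this hypothetical sketch, the applicant-pool size and the quality distribution are invented, not from the episode: a pilot that hires the 10 best tutors from a pool gets far higher average quality than a scaled program that must hire 1,000 from the same pool.

```python
# Hypothetical sketch of "going down the supply chain" at scale.
# Assumes tutor quality is roughly normal across a fixed applicant pool;
# the pilot cherry-picks the top 10, the scaled program needs 1,000.
import random

random.seed(42)
pool = sorted((random.gauss(0, 1) for _ in range(5000)), reverse=True)

pilot_hires = pool[:10]      # the pilot gets the very best applicants
scaled_hires = pool[:1000]   # at scale, you must dig much deeper

avg = lambda xs: sum(xs) / len(xs)
print(f"average quality, pilot (n=10):    {avg(pilot_hires):+.2f}")
print(f"average quality, scaled (n=1000): {avg(scaled_hires):+.2f}")
# If the program's impact moves with tutor quality, the measured effect
# shrinks at scale even though the curriculum never changed: one
# mechanical source of the voltage drop.
```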
Another problem in this third bucket (it’s a big bucket) is when the person who designed the intervention and masterminded the initial trial can no longer be so involved once the program scales up to multiple locations. Imagine if, instead of talking about an educational or medical program, we were talking about a successful restaurant and the original chef.
When you think about the chef: if a restaurant succeeds because of the magical work of the chef, and you think about scaling that, if you can’t scale the magic in the chef, that’s not scalable. Now, if the magic is because of the mix of ingredients, the secret sauce, like Domino’s or Papa John’s, for example, where the secret sauce is the actual ingredients, then that will be scalable.
Now, if you are the kind of pizza eater who doesn’t think Domino’s or Papa John’s is good pizza, well, welcome to the scaling dilemma. Going big means you have to be many things to many people. Going big means you will face a lot of trade-offs. Going big means you’ll have a lot of people asking you: do you want this done fast, or do you want it done right?
Once you peer inside these failure buckets that List and Suskind describe, it’s not so surprising that so many good ideas fail to scale up. So, what do they propose that could help?
Now, our proposal is that we should not scale a program until we’re 95% certain the result is true. So, essentially, what that means is we need the original research and then three or four well-powered, independent replications of the original findings.
And how often is that already happening in the real world of, let’s say, education-reform research?
I can’t name one.
Wow. How about in the realm of medical-compliance research?
My intuition is that they’re probably not far away from three or four well-powered independent replications.
In the hard sciences, in many cases, you not only have the original research, but you have a first replication also published in science. You know, the current credibility crisis in science, where major results are not replicating, is a serious one. The reason why is because we weren’t serious about replication in the first place. So this sort of puts the onus on policymakers and funding agencies, in a sense of saying: we need to change the equilibrium.
So, that suggests that policymakers or decision-makers are being, what, overeager, premature in accepting a finding that looks good to them, and want to rush it into play? Or is it that the researchers are overconfident themselves, or maybe pushing this research too hard? Where is this failure really happening?
Well, I think it’s sort of a mix. I think it’s fair to say that some policymakers are out looking for evidence to base their preferred program on.
What this will do is slow that down. If you have a pet project that you want to get through, fund the replications, and let’s make sure the science is correct.
We think we should actually be rewarding scholars for attempting to replicate. You know, right now in my community, if I try to replicate someone else, guess what I’ve just made? I’ve just made a mortal enemy for life. And if you find a publishable result, what result is that? You’re refuting the previous research. Now I’ve doubled down on my enemy. So that’s like a first step, in terms of rewarding scholars who are attempting to replicate. Now, to complement that, I think we should also reward scholars who have produced results that are independently replicated. You know, and I’m talking about tying tenure decisions, grant money, and the like to people who have given us credible research that replicates.
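List’s 95 percent threshold can be made concrete with a small back-of-the-envelope calculation. This sketch is illustrative only; the prior, the statistical power, and the significance level are assumptions, not numbers from List’s papers:

```python
# Illustrative post-study probability that a significant finding is true,
# in the spirit of "don't scale until you're 95% certain."
# The prior, power, and alpha below are assumptions for illustration.

def prob_true(prior=0.10, power=0.80, alpha=0.05, k=1):
    """P(effect is real | k independent studies all came up significant).

    prior: share of tested ideas that are actually true
    power: chance a real effect shows up significant in one study
    alpha: chance a null effect shows up significant in one study
    k:     number of significant studies (original plus replications)
    """
    true_hits = prior * power**k
    false_hits = (1 - prior) * alpha**k
    return true_hits / (true_hits + false_hits)

for k in range(1, 5):
    print(f"significant studies: {k}  ->  P(true) = {prob_true(k=k):.3f}")
# With these assumptions, a single significant study leaves you only about
# 64% sure the effect is real; the original plus three well-powered
# replications pushes you above 99.9%.
```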
After the break: how can researchers make sure that the science they are replicating works when it scales up?
Before the break, we were talking with the University of Chicago economist John List about the challenges of turning good research into good policy. One challenge is making sure that the research findings are in fact robust enough to scale up.
Say I’m doing an experiment in Chicago Heights on early childhood, and I find a great result. How confident should I be that when we take that result to all of Illinois, or all of the Midwest, or all of America, that result is still going to deliver the important benefit-cost profile that we found in Chicago Heights? We need to know: what is the magic sauce? Was it the 20 teachers you hired down in Chicago Heights, where if we go nationally, we need 20,000? So it should behoove me as an original researcher to say: look, if this scales up, we’re going to need many more teachers. I know teachers are an important input. Is the average teacher in the 20,000 the same as the average teacher in the 20?
This is the dreaded voltage drop that implementation scientists talk about.
And the implementation scientists have focused on fidelity as a core component behind the voltage drop.
Fidelity meaning that the scaled-up program reflects the integrity of the original program.
Measures of fidelity: that’s a really critical part of the implementation process.
That, again, is Patti Chamberlain, founder of Treatment Foster Care Oregon.
You’ve got to be able to measure: is this thing that’s down in the real world the same, you know, does it have the same components that produced the outcomes in the RCTs?
Remember, it was Chamberlain’s good outcomes with young people in foster care that made federal officials want to scale up her program in the first place.
We got this call saying, we want you to implement your program in 15 sites.
She found the scaling up initially very challenging.
It wasn’t the kumbaya moment that we thought it was going to be.
But in time, Treatment Foster Care Oregon became a very well-regarded program. It’s been around for roughly 30 years now, and the model has spread well beyond Oregon. One key to this success has been developing fidelity standards.
So the way that we do it is, we have people upload all of their sessions onto a HIPAA-secure website, and then we code those. And if they’re not meeting the fidelity standards, then we offer a fidelity recovery plan. You know, we haven’t had to drop a site, but we have had to have some of the people in the site retrained, or not continue.
Being able to measure fidelity well from afar provides another benefit to scaling up. It allows the people who developed the original program to ultimately step back, so they don’t become a bottleneck, which is a common scaling problem.
There can be sort of an orderly process whereby you step back in increments as people become more and more competent doing what they’re doing. And that’s what you want, because you don’t want to have this tied to the developer forever. Otherwise, you can’t get any kind of reasonable reach.
That said, you also need to have some humility. When you’re scaling up, you shouldn’t assume your original program was perfect, that it won’t need adjustment. And you need to be willing to make adjustments.
For example, we recognized that when we were in real-world communities, kids needed something that wasn’t therapy, per se. They needed skills, because the kids had often been excluded from normal socializing sort of things, like sports teams and clubs. And so we needed what we call a skills coach, to help those kids learn the moves that they needed to be able to participate in these pro-social activities that are normal kind of things.
So you have research, you have a theory, and then you have the implementation, and that feeds into more research, more theory, more implementation.
Look, everybody’s motivation at the end of the day is about trying to do good for the people they serve.
Dana Suskind again.
There are many children out there, and there are a lot of injustices, so we need to move. But, I don’t know, the science is slower than you’d like. People have wanted things before I thought they were ready, and finding a way to deal with that dance, of people wanting information but also wanting to continue to build the evidence: I think we can figure out how to do it.
I think that’s exactly right.
And John List again.
I think too many times, whether it’s in public policy, whether it’s a for-profit or a not-for-profit, we tend to only focus on one side of the market when we have problems. And you really need to take account of both sides, because your optimal solutions, the best solutions, are only going to come when you look at both sides of the market.
I’m probably getting this wrong, or at least being way too reductive, but to me it sounds like the chief barrier to scaling up programs to help people is people, that people are the problem.
Yeah, so I do think inherently it is about people. That said, this is not a fatal flaw that causes us to throw up our arms and say, well, this isn’t physics, this isn’t chemistry, we have to deal with people, so we can’t use science. I think that’s wrong, because there are some very, very neat advantages of scaling. Think about the cost side: economists always talk about, you know, when things get bigger and bigger, guess what happens? The per-unit cost goes down. It’s called increasing returns to scale.
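As a toy illustration of the cost logic List is pointing to (the dollar figures are invented, purely to show the mechanics): when a one-time fixed cost is spread over more participants, the per-unit cost keeps falling as the program grows.

```python
# Toy illustration of per-unit costs falling as a program scales.
# Assumes an invented one-time fixed cost (curriculum, training) of
# $500,000 and a $100 marginal cost per participant.
FIXED_COST = 500_000
COST_PER_PARTICIPANT = 100

for n in (1_000, 10_000, 100_000, 1_000_000):
    per_unit = (FIXED_COST + COST_PER_PARTICIPANT * n) / n
    print(f"{n:>9,} participants: ${per_unit:,.2f} per participant")
# Prints $600.00, $150.00, $105.00, $100.50: the fixed cost gets spread
# ever thinner, which is the scale advantage described above.
```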
The problem that we’re thinking about is: for those policymakers who really want to do the right thing and use science, let’s make sure that they have the right programs to implement.
So one of your papers includes this quote from Bill Clinton, or at least something that Clinton may have said, which is essentially that nearly every problem has been solved by someone somewhere, but we just can’t seem to replicate those solutions anywhere else. So what makes you think that you’ve got the keys to success here, where others may not have been able to do it?
You know, I view what we’ve done as putting forward a set of modest proposals, only a start at tackling what I think is the most vexing problem in evidence-based policymaking, which is scaling. I think we’re just taking some small steps, theoretically and empirically. But I do think that this first set of steps is important, because if you go in the right direction, what I’ve learned is that the literature will follow that direction. If you go in the wrong direction, sometimes the literature follows that wrong direction for several years, and we really don’t have the time. Right now, the opportunity cost of time is very high. You know, in the end, I want policy science not to be an oxymoron, and I think that’s what this research agenda is about.
The way that I would view it is that the world is imperfect because we haven’t used science in policymaking. And if we add science to it, we have a chance to make an imperfect world a little bit more perfect.
If you want to read the papers that John List and Dana Suskind and their collaborators have been working on, you will find links on Freakonomics.com, as well as links to Patti Chamberlain’s work with Treatment Foster Care Oregon, and much more, including, as always, a complete transcript of this episode. And we will be back soon with another new episode of Freakonomics Radio. Until then, take care of yourself, and, if you can, someone else, too.
Freakonomics Radio is produced by Stitcher and Renbud Radio. You can find our entire archive on any podcast app, also at Freakonomics.com, where we publish transcripts and show notes. This episode was produced by Matt Hickey, with an update by Augusta Chapman. The Freakonomics Radio Network staff also includes Alina Cullman, Dalvin Abuaji, Eleanor Osborne, Ellen Frankman, Elsa Hernandez, Gabriel Roth, Greg Rippon, Jasmine Klinger, Jeremy Johnston, John Schnarz, Morgan Levy, Neil Carruth, Sarah Lilly, Tao Jacobs, and Zach Lipinski. Our theme song is “Mr. Fortune,” by the Hitchhikers, and our composer is Luis Guerra. As always, thanks for listening.
So you want to talk scaling?
Wow, it’s a heavy paper, right? It’s great.
I thought it was about scaling fish initially, so that was all my background reading.
Yeah, so I don’t know anything about what we’re going to talk about today.
Neither do I, so we can just both wing it.
The Freakonomics Radio Network, the hidden side of everything.
Stitcher.

Why do so many promising solutions in education, medicine, and criminal justice fail to scale up into great policy? And can a new breed of “implementation scientists” crack the code?

 

  • SOURCES:
    • Patti Chamberlain, senior research scientist at the Oregon Social Learning Center.
    • John List, professor of economics at the University of Chicago.
    • Lauren Supplee, former deputy chief operating officer at Child Trends.
    • Dana L. Suskind, professor of surgery at the University of Chicago.

 

 
