Not to be deterred, I’ve decided to continue my Uncovering
Causality series because a) I think making statistics accessible is important and b) the nerd in me enjoys explaining statistics in a way that's intuitive. So, without further ado, here is a post on Regression
Discontinuity Designs (RDDs) – one of the cleverest and most interesting
statistical techniques you can come across.
The Selection Problem
Before understanding RDDs, we need to understand
the selection problem they seek to address. Suppose we wish to understand the
average ‘treatment effect’ of going to hospital – do hospital visits make us better
or worse? One way to do this would be to compare the health status of people
who have recently visited hospital with that of those who haven’t. We can rank health
status on a scale of 0-9, where 9 = perfect health and 0 = death’s door. Suppose we observe the following:
In words, the group of people that hasn’t been to
hospital is healthier than the group that has. From this, can we
conclude that hospitals are harmful to our health?
Thankfully, the answer is no. Almost by definition, people who
visit hospital are less healthy than those who don’t, so it is unfair to
compare these groups ex post. In an ideal world, we would want to compare the
health status of the treated group (4.4) to what it would have been had they not
been treated. Unfortunately, we cannot observe this ‘counterfactual’ – so
we have to resort to clever statistical techniques to tease out what it might
have been.
Overcoming the Selection Problem
The best way to do this is to ensure that treatment is randomised.
Randomised Controlled Trials offer one option: by randomly deciding who
gets treatment, we ensure that both treatment and control groups are as similar
as possible ex ante, allowing them to be reliably compared ex post.
However, for practical and ethical reasons, RCTs are not always viable (the
ethical issues with randomising hospital treatment hardly need stating).
Fortunately, there are other ways to generate randomness outside of an
experimental setting.
Regression Discontinuity Designs
In October, thousands of 10-year-old kids will line up outside my
old school to take the 11+: a gruelling 3-hour exam based on non-verbal and
verbal reasoning. If they score above a certain mark, they are admitted to
grammar school; if not, they go to comprehensive school[1]
(for simplicity, assume there are no private schools in this world). To
estimate the treatment effect of selective education, we might plot children’s
11+ mark against the number of UCAS points[2]
they achieve upon completing secondary school.
From this set-up, how can we calculate the treatment
effect of going to grammar school? It would be unfair to simply compare the
UCAS scores of kids who got into grammar school vs. those who didn’t. After
all, kids who get into grammar school at age 11 are likely more intelligent and
affluent to begin with – a classic selection problem.
However, what about the kids who just missed out vs.
the kids who just about made it? Presumably these kids are pretty similar.
Whether they got in or not may have depended on whether the right questions
came up, whether they slept well the night before or whether they guessed
correctly on a number of multiple-choice questions. These factors are largely
random, enabling us to identify a ‘treatment effect’ of selective education –
at least for the kids that scored near the cut-off.[3]
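This logic can be sketched with a small simulation. All the numbers below (scores, cut-off, effect sizes) are invented for illustration; the point is that a naive comparison of all grammar vs. all comprehensive kids is contaminated by ability, while comparing kids just either side of the cut-off recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Latent ability drives both the 11+ mark and later UCAS points
ability = rng.normal(0, 1, n)
score = 60 + 10 * ability + rng.normal(0, 5, n)   # 11+ mark (hypothetical scale)
cutoff = 65
grammar = score >= cutoff                          # admitted to grammar school

true_effect = 8                                    # assumed treatment effect
ucas = 100 + 20 * ability + true_effect * grammar + rng.normal(0, 10, n)

# Naive comparison: contaminated by selection on ability
naive = ucas[grammar].mean() - ucas[~grammar].mean()

# RDD: fit a line on each side of the cut-off within a narrow bandwidth,
# then compare the two predictions at the cut-off itself
bw = 5
left = (score >= cutoff - bw) & (score < cutoff)
right = (score >= cutoff) & (score <= cutoff + bw)
left_fit = np.polyfit(score[left], ucas[left], 1)
right_fit = np.polyfit(score[right], ucas[right], 1)
rdd = np.polyval(right_fit, cutoff) - np.polyval(left_fit, cutoff)

print(f"naive estimate: {naive:.1f}")  # far above the true effect of 8
print(f"RDD estimate:   {rdd:.1f}")    # close to 8
```

The gap between the two estimates is the selection bias: the naive comparison attributes the ability difference between the groups to grammar school itself.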
Early Childhood Development
Early childhood development matters – and assessing the
effect of postnatal interventions is vital in informing effective health
policy. Yet it is unfair to compare outcomes of babies that
received postnatal treatment vs. those that didn’t – this is the classic healthcare
selection problem outlined at the start.
However, what if there were an arbitrary cut-off that
determined whether or not a baby got treated? For example, in Norway and Chile,
newborns below a 1500g cut-off are given additional respiratory and surfactant
treatment, while those above it are not. Using the assumption that a 1490g baby
is virtually identical to a 1510g baby, Bharadwaj et al. exploit this cut-off to
estimate the ‘treatment’ effect of additional postnatal care. The results are
displayed below: in both Chile and Norway, babies just below the 1500g cut-off
perform significantly better in Maths exams that take place almost a
decade later.
Gaming the System
The biggest risk to RDDs is people gaming the system. The
more people can exert influence on which side of the cut-off they
fall, the less we can assume that treatment is random. While mothers cannot
meaningfully control their child’s birthweight to ensure it falls below a
certain cut-off, they might nonetheless be able to influence which birthweight
is recorded. Doctors in some hospitals may be easier to convince to record a
false measurement to ensure the baby gets
extra care. If this happens, kids born just below the cut-off are no longer born
in a random sub-section of hospitals – compromising our ability to infer a reliable
treatment effect.
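One simple diagnostic for this – a crude stand-in for the formal density tests used in the literature – is to check for ‘bunching’: if the cut-off is being gamed, suspiciously many recorded birthweights should pile up just below 1500g. A made-up simulation (all distributions and manipulation rates invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
cutoff = 1500  # grams

# Hypothetical true birthweight distribution
true_weight = rng.normal(2000, 400, n)
recorded = true_weight.copy()

# Suppose 30% of babies weighing just above the cut-off have a weight
# just below it recorded instead, so that they receive the extra care
window = (true_weight >= cutoff) & (true_weight < cutoff + 30)
gamed = window & (rng.random(n) < 0.3)
recorded[gamed] = cutoff - rng.uniform(1, 30, gamed.sum())

# Without manipulation, the two 30g bins either side of the cut-off
# should hold similar counts; a spike just below it is a red flag
below = ((recorded >= cutoff - 30) & (recorded < cutoff)).sum()
above = ((recorded >= cutoff) & (recorded < cutoff + 30)).sum()
print(f"just below: {below}, just above: {above}")
```

A smooth distribution of the running variable across the cut-off is exactly what the ‘no gaming’ assumption requires, so a visible jump in density is evidence against a valid RDD.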
Despite this risk, RDDs provide a clever way to untangle causality
outside of an expensive experimental context. Their surge in popularity
has no doubt contributed to the recent ‘credibility revolution’ in empirical
economics. Valid RDDs are not easy to come by, however – like instrumental variables, identifying a good ‘cut-off’ requires imagination and ingenuity.
This is partly why I like them: they force the researcher to shut the
Econometrics textbook and think creatively about the world around them.
References
- Bharadwaj et al. (2013), ‘Early Life Health Interventions and Academic Achievement’
For an example of a really creative RDD, check out this
paper by Melissa Dell about the long-run impact of forced mining systems in
Peru and Bolivia: https://scholar.harvard.edu/files/dell/files/ecta8121_0.pdf
[1] For my non-British readers: grammar schools are selective state schools (i.e. you have to pass a test to get in); comprehensive schools are non-selective state schools.
[2] Again, for my non-British readers: UCAS points are what our grades are converted to when we apply to university, sort of like a GPA.
[3] In the jargon, this is known as a Local Average Treatment Effect, since the effect is only estimated on a certain sub-section of the population, i.e. those who scored near the cut-off.