
Friday, March 8, 2019

Uncovering Causality II: Instrumental Variables

While randomised controlled trials may be the hottest tool in the econometrician's toolbox, the most contentious, capricious but potentially brilliant tool is the instrumental variable (IV). A bad instrument will never see the light of day, but a good instrument can elevate the statistician from humble programmer to creative genius, catapulting the researcher to academic stardom.

Instrumental Variables


As is often the case, instrumental variables are best explained by means of example. Suppose I hope to estimate the causal effect of class sizes on student test scores. If I collect data on both, I may get a scatterplot that looks something like this:


What is the problem with inferring a causal relationship from this result? As before, there may be 'confounding' variables that influence both class size (X) and test scores (Y), obscuring the true relationship between the two. For example, small class sizes may be correlated with school wealth (e.g. richer schools can afford lower student-classroom ratios), which also affects test scores through other channels (e.g. better textbooks and non-classroom facilities).

One way to avoid this problem is to find a variable, called an instrument (Z), which i) affects class size (X) and ii) is not correlated with anything else that affects test scores (Y). In econometrics jargon, i) is known as the relevance requirement and ii) is known as the exogeneity requirement. If both criteria are met, we can effectively 'isolate' the as-good-as-random variation in X that is driven by Z, and use it to identify the true causal effect of X on Y.
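To make the logic concrete, here is a minimal simulation sketch. Everything in it is hypothetical: the effect sizes, the noise levels and the data-generating process are assumptions chosen for illustration, not estimates from any study. An unobserved confounder (standing in for school wealth) both shrinks class sizes and raises test scores, so the naive regression slope of Y on X is biased. An instrument Z that shifts class size but is independent of wealth by construction recovers the true effect through the simple IV ratio Cov(Z, Y) / Cov(Z, X), which is what two-stage least squares reduces to with one instrument and one regressor.

```python
import random

TRUE_EFFECT = -0.5  # assumed causal effect of one extra student on test scores


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n=50_000, seed=0):
    """Draw (z, x, y) from a hypothetical data-generating process."""
    rng = random.Random(seed)
    zs, xs, ys = [], [], []
    for _ in range(n):
        wealth = rng.gauss(0, 1)  # unobserved confounder (school wealth)
        z = rng.gauss(0, 1)       # instrument: independent of wealth by construction
        # Class size responds to the instrument (relevance) and to wealth (confounding)
        x = 25 + 2.0 * z - 3.0 * wealth + rng.gauss(0, 1)
        # Test scores depend on class size and, through other channels, on wealth
        y = 70 + TRUE_EFFECT * x + 4.0 * wealth + rng.gauss(0, 1)
        zs.append(z)
        xs.append(x)
        ys.append(y)
    return zs, xs, ys


z, x, y = simulate()
ols_slope = cov(x, y) / cov(x, x)  # naive regression slope of Y on X
iv_slope = cov(z, y) / cov(z, x)   # simple IV estimator (equals 2SLS with one instrument)
print(f"OLS: {ols_slope:.2f}  IV: {iv_slope:.2f}  true effect: {TRUE_EFFECT}")
```

Under these assumed parameters the OLS slope approaches about -1.36 in large samples, because it also absorbs the wealth channel, while the IV ratio approaches -0.5: Z moves class size without touching wealth.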




For example, in the context of class sizes and test scores, a candidate instrument could be the occurrence of earthquakes. In places like California, schools sometimes shut down due to infrastructure damage caused by periodic earthquakes. While the schools are being repaired, affected students are temporarily transferred to undamaged nearby schools, causing a sudden increase in class sizes.

We could also argue that earthquakes do not affect test scores (Y) in any way other than through their effect on class sizes (X). They do not change the school syllabus, they do not change the timing of the exams and they do not change the underlying ability of students. If this argument is correct, we are left with a nice natural experiment: a random, isolated increase in class sizes (X) that allows us to identify its independent contribution to test scores (Y).
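Because the earthquake instrument is binary (a school either received transferred students or it did not), the IV estimate takes a particularly simple form, known as the Wald estimator: (mean Y among affected schools - mean Y among unaffected schools) / (mean X among affected schools - mean X among unaffected schools). The numerator is the "reduced form" effect of the earthquake on scores and the denominator is the "first stage" effect on class size. The sketch below assumes hypothetical school-level data; the four schools and their numbers are invented purely to show the arithmetic.

```python
def wald_estimate(z, x, y):
    """Wald IV estimate for a binary instrument z (1 = school affected by an earthquake)."""
    def group_mean(values, flag):
        selected = [v for v, zi in zip(values, z) if zi == flag]
        return sum(selected) / len(selected)

    # Reduced form: gap in mean test scores between affected and unaffected schools
    reduced_form = group_mean(y, 1) - group_mean(y, 0)
    # First stage: gap in mean class sizes between the same two groups
    first_stage = group_mean(x, 1) - group_mean(x, 0)
    return reduced_form / first_stage


# Hypothetical toy data for four schools (not real observations)
z = [1, 1, 0, 0]         # affected by an earthquake?
x = [30, 32, 24, 26]     # class size
y = [64, 62, 67, 65]     # mean test score
print(wald_estimate(z, x, y))  # -3.0 / 6.0 = -0.5
```

Here affected schools score 3 points lower on average and have 6 more students per class, so the implied effect is -0.5 points per extra student. That interpretation is only valid if the exogeneity argument above holds.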

The Exogeneity Requirement: An Achilles' Heel


Unfortunately, plausible instruments are difficult to find. The relevance requirement is fairly easy to meet: it is not difficult to find variables that are correlated with the treatment (e.g. class sizes). However, the exogeneity requirement represents a perennial stumbling block. In my previous example, the exogeneity requirement is almost certainly not met. We can theorise many potential effects of earthquakes on test scores besides their impact on class sizes: for instance, earthquakes can have localised income effects as well as psychological effects on affected students, both of which affect test results. Alas, academic stardom will have to wait.
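It is worth seeing how much damage a violated exogeneity requirement does. Suppose earthquakes also lower scores directly by some amount γ (through income or psychological channels), and let π be the first-stage jump in class size and β the true effect of class size. The reduced-form gap in scores is then βπ + γ rather than βπ, so the Wald estimate converges to β + γ/π instead of β. The simulation below is a hypothetical sketch of this; the 10% earthquake rate, the six-student first stage and the three-point direct effect are all invented for illustration.

```python
import random


def simulate_wald(direct_effect, n=100_000, true_effect=-0.5, seed=1):
    """Wald IV estimate when the earthquake (z) may also affect scores directly.

    All parameters are hypothetical. direct_effect stands in for channels that
    bypass class size, such as local income shocks or trauma.
    """
    rng = random.Random(seed)
    # Running totals per group z in {0, 1}: [sum of x, sum of y, count]
    totals = {0: [0.0, 0.0, 0], 1: [0.0, 0.0, 0]}
    for _ in range(n):
        z = 1 if rng.random() < 0.1 else 0   # roughly 10% of schools are hit
        x = 25 + 6 * z + rng.gauss(0, 2)     # first stage: about 6 extra students
        y = 70 + true_effect * x + direct_effect * z + rng.gauss(0, 5)
        totals[z][0] += x
        totals[z][1] += y
        totals[z][2] += 1
    mean_x = {g: t[0] / t[2] for g, t in totals.items()}
    mean_y = {g: t[1] / t[2] for g, t in totals.items()}
    return (mean_y[1] - mean_y[0]) / (mean_x[1] - mean_x[0])


valid = simulate_wald(direct_effect=0.0)     # exclusion restriction holds
invalid = simulate_wald(direct_effect=-3.0)  # earthquakes also depress scores directly
print(f"valid: {valid:.2f}  invalid: {invalid:.2f}")
```

Under these assumptions the invalid instrument roughly doubles the estimated harm of larger classes (about -1.0 against a true -0.5), and with a single instrument nothing in the data itself flags the problem.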

To make matters worse, the standards for an acceptable IV have generally become a lot more stringent. This is partly due to the emergence of new causal identification techniques that have raised the bar for what counts as a credible identification strategy. The graph below gives a nice illustration of these new econometric techniques, some of which I may discuss in future posts.


Source: The Economist

As a result, some of the most iconic IVs appear unable to bear the weight of contemporary scrutiny. For example, perhaps the most famous IV in all of Economics is the use of settler (colonial) mortality rates to isolate the causal effect of institutions on growth (Acemoglu, Johnson and Robinson, 2001). The authors theorised that in places where settler mortality rates were high (e.g. Africa, South America), settlers were more likely to set up 'extractive' institutions in which a small group of individuals exploited the rest of the population. Conversely, in places where settler mortality rates were low (e.g. North America), settlers were more likely to establish trade relationships and 'inclusive' institutions, which included many people in the process of governing.

The authors go on to argue that these initial conditions greatly influenced the quality of economic and political arrangements today, due to the 'sticky' nature of institutions. Hence, the relevance requirement is met: Z (settler mortality rates) affects X (modern institutions). Then comes the leap of faith: the authors claim that historic settler mortality rates (Z) have not affected modern-day growth (Y) except via their effect on the institutions they brought about (X).
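In equation form, the settler-mortality design is a standard two-equation IV setup. The notation below is generic and chosen for illustration (it is not copied from the paper): i indexes countries, Z_i is historic settler mortality, X_i a measure of modern institutions and Y_i modern income or growth.

```latex
\begin{aligned}
X_i &= \pi_0 + \pi_1 Z_i + v_i
  && \text{(first stage; relevance requires } \pi_1 \neq 0\text{)} \\
Y_i &= \beta_0 + \beta_1 X_i + \varepsilon_i
  && \text{(structural equation)} \\
\operatorname{Cov}(Z_i, \varepsilon_i) &= 0
  && \text{(exogeneity: } Z_i \text{ affects } Y_i \text{ only through } X_i\text{)} \\
\hat{\beta}_1^{\,IV} &= \frac{\widehat{\operatorname{Cov}}(Z_i, Y_i)}{\widehat{\operatorname{Cov}}(Z_i, X_i)}
\end{aligned}
```

The first line can be checked in the data; the third line is the leap of faith, since with a single instrument it cannot be tested and has to be argued.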

Needless to say, the validity of this instrument has received a hammering in recent years. For instance, settler mortality rates (Z) appear to be correlated with the historical disease environment, which is itself correlated with modern-day growth (Y). These criticisms appear unanswerable: I remember a senior faculty member at Oxford once telling me that a paper relying on this IV would probably not get published today. Still, the above example serves as a testament to human creativity and to the rapid advancement of scientific standards.

All Hope is Not Lost


With this being said, instrumental variables are not completely defunct. One of the best IVs I have seen concerns estimating the effect of incarceration length (X) on subsequent employment outcomes (Y) (Kling, 2006). The immediate difficulty in estimating this effect is that people who get longer prison sentences are likely to be different from those who get shorter sentences, in ways that matter for future labour earnings.

To get around this, we can use the fact that judges are randomly assigned to cases, and that some judges appear systematically harsher at sentencing than others. This appears to meet the conditions of a valid instrument: the particular judge (Z) a defendant gets certainly affects his/her likely incarceration length (X), but is unlikely to be correlated with any other factor which affects his/her future earnings (Y).
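One common way to turn judge assignment into a single instrument, offered here as an assumption rather than a description of Kling's exact specification, is a "leave-one-out" harshness measure: each defendant's instrument is their judge's average sentence across that judge's other cases, so the defendant's own outcome does not leak into it. The Python sketch below simulates randomly assigned judges with hypothetical harshness levels, plus an unobserved defendant "risk" that lengthens sentences and lowers later earnings. Every parameter is invented for illustration.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def judge_iv(n_judges=50, cases_per_judge=400, true_effect=-0.3, seed=2):
    """Return (OLS, IV) estimates from a hypothetical random-judge simulation.

    x = sentence length in months, y = later earnings index; all numbers invented.
    """
    rng = random.Random(seed)
    harshness = [rng.gauss(0, 6) for _ in range(n_judges)]  # extra months per judge
    cases = []  # (judge, sentence, earnings)
    for j in range(n_judges):
        for _ in range(cases_per_judge):
            # Drawn independently of the judge j: this is what random assignment buys us
            risk = rng.gauss(0, 1)  # unobserved: lengthens sentences, lowers earnings
            x = 12 + harshness[j] + 4 * risk + rng.gauss(0, 2)
            y = 50 + true_effect * x - 5 * risk + rng.gauss(0, 3)
            cases.append((j, x, y))

    # Leave-one-out harshness: the judge's mean sentence over their other cases
    sentence_totals = [0.0] * n_judges
    for j, x, _ in cases:
        sentence_totals[j] += x
    z = [(sentence_totals[j] - x) / (cases_per_judge - 1) for j, x, _ in cases]

    xs = [x for _, x, _ in cases]
    ys = [y for _, _, y in cases]
    return cov(xs, ys) / cov(xs, xs), cov(z, ys) / cov(z, xs)


ols, iv = judge_iv()
print(f"OLS: {ols:.2f}  IV: {iv:.2f}  true effect: -0.3")
```

With these assumed numbers, OLS comes out noticeably more negative than the truth, because high-risk defendants both serve longer and earn less, while the leave-one-out IV estimate sits close to -0.3.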

This, I think, represents a plausible identification strategy with many avenues for future research (see this blog post). Though standards have risen, instrumental variables are certainly not obsolete. But remember: IV regression is only as good as the instrument itself!

Word count: 1,000 words

References