What's Next? External Validity

What’s next? Jonathan Morduch says: Making RCTs more useful.

When you’re thirsty, that first gulp of water is really satisfying. But after months of just drinking water, you’ll likely start hoping for more from your beverages.

I think that’s where we are with RCTs of microfinance.

The first microfinance RCTs were refreshing. They quenched a thirst for any credible, rigorous evidence on microcredit impacts. No one was particularly hankering for data specifically on microfinance in Manila, Hyderabad, Morocco, or Bosnia. But that’s what we got. It didn’t particularly matter where the studies were from, or what the particular financial methodology was, or who exactly the customers were. Especially since the results were not only credible but surprising and provocative. Researchers were opportunistic in choosing sites and partners, and who can blame them? Because the evidence gap was so vast, the new findings reshaped conversations and expectations. The studies were influential and important for good reasons.

We’re at a different point now, and we (here I mean the research and evaluation community) need an agenda that better maps RCTs to contexts most relevant for policy and economics.

In short, what’s next is external validity.

Getting to external validity means embracing the ways that results are conditional on place and time; we need to spend much more time spelling out the particularities of the populations we study and the moments we catch. When being opportunistic, we need to be more careful in documenting who and what is being studied.

The incentives for scholars work against spelling out just how conditional and particular our results are. We get more credit for results that seem widely applicable, and there are abundant temptations to play down differences between our sites and others – or at least to not play them up.

The cause of external validity has been championed most recently by the London School of Economics philosopher of science Nancy Cartwright. In Evidence-Based Policy (Oxford 2012), she and Jeremy Hardie rail against randomista claims:

Evidence-based policy. You are told: use policies that work. And you are told: RCTs—randomized controlled trials—will show you what these are. That’s not so. RCTs are great, but they do not do that for you. They cannot alone support the expectation that a policy will work for you. What they tell you is true—that this policy produced that result there. But they do not tell you why that is relevant to what you need to bet on getting the result you want here. For that, you will need to show what else you have to have and how you set about finding it.

So if you’re a donor thinking about global policy (or thinking about interventions in Egypt, say, or Peru, or China), it’s only somewhat helpful to know how impacts played out in Bosnia or Morocco. What you really want is guidance on how to map results from Bosnia and Morocco to the place and time that you care about. How does that particular result at that particular moment in that particular program in Bosnia help predict outcomes in a different moment and different program in Egypt?

A bias for external validity can go too far. As Evidence-Based Policy progresses, Cartwright and Hardie argue that policymakers and funders should trade off internal validity and external validity. So if given the choice between doing (a) an academically sound study (an RCT, say) of a place that’s hard to generalize from, or (b) a less sound study of a place that’s easier to generalize from, Cartwright and Hardie say you should give extra weight to option (b).

In principle, I agree. But, in practice, I disagree sharply. We’ve learned that the biases in non-randomized studies (specifically those that don’t have credible identification strategies) can be huge. Biases due to self-selection into programs and the non-random location of programs can swamp real effects. And the biggest problem is we don’t know a priori how big the biases will be in a given setting. That means we can’t say things like: “The study wasn’t perfect, but even discounting the results by 25% we still get big positive results.” The trouble is that we don’t know how big the biases are, especially when it comes to household financial decisions. Should we instead discount the results by 50%? By 100%? What value is there in generalizing an incredible (in the literal sense) result? So when given the choice, my vote goes with at least learning something clear and credible. I may not be a randomista but I am at least an internalista.

What we need is to figure out a path that privileges internal validity while also elevating external validity. That means being very clear about three big questions:

1. How does the population studied there differ from the population I’m interested in here? Are they better educated? Poorer? Healthier? Etc. This needs to be done explicitly with an eye to why the results may not generalize.

2. How do supporting inputs differ? Are there critical government programs in place? Good roads and transport? Community institutions?

3. How do alternative activities differ? Does the studied intervention mostly substitute for existing opportunities? Does it complement them? In recent work in South India, we found that a very promising anti-poverty program ended up having no net impact because alternative options were so good (and the control group availed themselves of those options). The same program had bigger impacts in sites with very similar populations but where, it seems, such good alternatives were lacking.

It will not always be possible to quantify how important these factors are, since much of the time there won’t be meaningful variation across the control and treatment groups. Researchers may then be hesitant to say much of anything about them. But that’s a mistake, because in the end, Cartwright is correct: we’re usually interested in studies because we want to extrapolate from them. At the very least that means we need to bend over backward to provide the details that make thoughtful extrapolation possible.

Inevitably, that requires the hard work of getting to know sites well, which may be an added benefit in itself. But that’s what is needed to make external validity, and more effective decision-making, what’s next.