The Value of Replication

As a rule, replicating studies is boring and insufficiently rewarded, at least relative to striking out into new terrain. Not surprisingly, it doesn't happen much. Yet it's fundamentally important for building knowledge.

There are two types of replication, both of which lose out to other kinds of studies.

The first type is replication of a given study in a different region or population. Does a commitment savings device that worked in the Philippines have the same power in Malawi? Do financial literacy trainings work in Mumbai in the same way that they do in Lima? These kinds of studies are vitally important--and are one of the reasons that research organizations like Innovations for Poverty Action are so critical--but if it's hard enough to publish the paper on saving in the Philippines or literacy in Mumbai, it's doubly hard to publish the replication studies. Still, only with these kinds of replications can we address debates about the "external validity" of findings--i.e., can the ideas be exported? And how far?

The second type of replication study is even less rewarded--it entails going back to existing studies and existing data and examining how robust the findings are to alternative ways of slicing the data.

Years ago--okay, a decade ago--I got sucked into a replication of this second type. The original study, by Mark Pitt and Shahidur Khandker, was published in the Journal of Political Economy, and I hadn't intended to replicate it. The paper is on the impact of microcredit in Bangladesh, and I had simply wanted to get hold of their data in order to estimate a parameter that they had not. I wanted a number to plug into a cost-benefit analysis, and I thought this would be quick and easy.

Boy, was I wrong. As a first step, I tried simply to organize the data to reproduce the original tables of means and standard deviations. It was surprisingly tricky, and even years of work (it would turn out) couldn't nail it 100 percent. Worse, my simple econometric estimates and analyses of identifying assumptions failed to come close to matching the originals. I had confidence in my results, but knew it would take a lot more effort to demonstrate where Pitt and Khandker's analysis fell apart. And they didn't seem to be in much of a mood to help, not that I could blame them. So I moved on to other projects with regret.

A year ago, David Roodman entered the scene with new ideas, new methods, and new energy. He built a brand-new Stata command to replicate the Pitt-Khandker econometric approach to a T and went back to the data. Our study is now out as a working paper. It shows the puzzle described in my earlier paper but goes much further to explain the nature of "selection" problems. If you like econometric puzzles, it's well worth a read.

Even if you don't like econometric puzzles, it may be worth a read. The Newsweek blog just gave the paper some air time. It's a great piece, though it's important to correct one fundamental mischaracterization. The Newsweek piece asserts that we conclude that microcredit does nothing. That microcredit is all hype, a bubble ready to pop. Roodman and I don't conclude that microcredit does nothing: we conclude instead that you just can't tell from these data from Bangladesh. It's an important distinction--and an important demonstration of the value of replication.