Reliability of Self-Reported Data: Deliberate Misreporting

Program evaluations and policy proposals are only as good as the data upon which they are based. Although we all know this to be true, discussions about the reliability of data, especially self-reported data, have only recently emerged in the field of development economics. The other week, I highlighted two papers from the Journal of Development Economics’ Symposium on Measurement and Survey Design that discussed how recall bias might undermine the reliability of self-reported data. Even when recall bias is not at play, though, self-reported data can be distorted by respondents’ desire to portray their behavior in a more positive light.

Sarah Baird and Berk Özler explore this phenomenon as it relates to education in their study, “Examining the Reliability of Self-Reported Data on School Participation.” Many Conditional Cash Transfer (CCT) programs are evaluated based on self-reported data about school enrollment and attendance rates. However, the desire to give socially desirable answers or the belief that program funding is linked to evaluation results might lead survey participants to over-report their level of school participation. Baird and Özler test how reliable self-reported enrollment and attendance data are in CCT evaluations of this nature.

Using data from the Zomba Cash Transfer Program (ZCTP) in Malawi, the researchers compare self-reported enrollment and attendance data with two other sources: administrative program records and school ledgers. They find that study participants (school-age girls randomly selected from throughout the Zomba district) significantly overstate their school participation. Both the administrative records from the cash transfer program and the independent school ledgers support this finding.

While girls in both study arms overstated their enrollment and attendance, the researchers found that the overstatement was significantly larger in the control group than in the group receiving the conditional cash transfers. Girls in the control group were almost three times as likely to over-report school enrollment as those in the CCT arm. It’s tempting to think that program beneficiaries would be more likely to overstate their school participation, yet these findings indicate the opposite. The authors suggest that because the attendance of CCT recipients is closely monitored, program participants might be more likely to report their school participation accurately than the unobserved girls in the control group. Because the control group’s reported participation is inflated more than the treatment group’s, the measured gap between the two groups shrinks. If these results are externally valid, CCT evaluations that rely on self-reported enrollment and attendance rates will systematically underestimate program impacts.
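To see why heavier over-reporting in the control group shrinks the measured impact, a small back-of-the-envelope calculation may help. Every number below is hypothetical and chosen only for illustration; none comes from Baird and Özler’s data. The sketch assumes that some share of girls who are not enrolled nonetheless report being enrolled, and that this share is three times larger in the control group.

```python
def reported_rate(true_rate, overreport_prob):
    """Enrollment rate seen in survey data when a share of girls who are
    NOT enrolled claim to be enrolled anyway."""
    return true_rate + (1.0 - true_rate) * overreport_prob


# Hypothetical true enrollment rates (assumptions, not study results).
true_control = 0.70
true_treated = 0.80

# Hypothetical over-reporting: control girls three times as likely to over-report.
over_control = 0.30
over_treated = 0.10

true_effect = true_treated - true_control
measured_effect = (
    reported_rate(true_treated, over_treated)
    - reported_rate(true_control, over_control)
)

print(f"True program effect:     {true_effect:.2f}")
print(f"Effect in self-reports:  {measured_effect:.2f}")
```

Under these made-up numbers, the self-reported data show a smaller gap between the two groups than actually exists, which is the direction of bias the authors describe.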

Concerns about the reliability of self-reported data are not unique to educational evaluations. Dean Karlan and Jonathan Zinman explore the validity of self-reported loan expenditure data in their paper, “List Randomization for Sensitive Behavior: An Application for Measuring Use of Loan Proceeds.”

Most microfinance institutions evaluate their lending processes by asking loan recipients to self-report on how they spent their loan. Even if clients are assured their loan eligibility will not be affected by their responses, they may be inclined to lie if they don’t believe these assurances or if they wish to project a socially desirable image.

Drawing upon data from two studies (one from Arariwa in Peru, the other from First Macro Bank in the Philippines), the researchers measure the extent to which self-reported data about the use of loan proceeds can be considered reliable. The Arariwa study uses a technique known as “list randomization” to assess whether individuals underreport how often they use their loan proceeds for consumption rather than for investment purposes. By asking respondents to report only how many statements on a list are true, rather than which ones, list randomization allows evaluators to elicit answers to sensitive questions without asking them directly. The second study, from First Macro Bank, compares how answers about loan uses vary across three methods of data collection: direct questioning by the bank, direct questioning by an independent third-party surveyor, and list randomization presented by the surveyor.
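For readers unfamiliar with the mechanics, here is a minimal sketch of how a list randomization estimate is typically computed. It illustrates the general technique, not Karlan and Zinman’s actual survey design. It assumes that one randomly chosen group sees a short list of innocuous statements, that a second group sees the same list plus the sensitive statement, and that each respondent reports only a count. The function names, sample sizes, and probabilities are all hypothetical, and the data are simulated.

```python
import random
from statistics import mean


def estimate_prevalence(control_counts, treatment_counts):
    """Difference in mean counts between the long-list and short-list groups.

    Because the lists differ only by the sensitive statement, this difference
    estimates the share of respondents for whom that statement is true.
    """
    return mean(treatment_counts) - mean(control_counts)


def simulate_counts(n_respondents, n_innocuous, p_innocuous, p_sensitive, rng):
    """Simulate the number of true statements each respondent reports.

    Pass p_sensitive=None for the short list (innocuous statements only).
    """
    counts = []
    for _ in range(n_respondents):
        count = sum(rng.random() < p_innocuous for _ in range(n_innocuous))
        if p_sensitive is not None and rng.random() < p_sensitive:
            count += 1
        counts.append(count)
    return counts


rng = random.Random(0)  # fixed seed so the sketch is reproducible
TRUE_SHARE = 0.30       # hypothetical share who used loan money for consumption

control = simulate_counts(20_000, 4, 0.5, None, rng)
treatment = simulate_counts(20_000, 4, 0.5, TRUE_SHARE, rng)

estimate = estimate_prevalence(control, treatment)
print(f"Estimated share: {estimate:.3f} (simulated truth: {TRUE_SHARE})")
```

No individual answer reveals whether the sensitive statement is true for that respondent, yet the group-level difference recovers the overall share. The price is extra statistical noise, so this approach generally needs larger samples than direct questioning.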

The results of the first study show a sharp contrast between results elicited through direct questioning and those given in response to list randomization. Survey respondents were far more likely to admit using their loan for household items and medical or educational expenses on the anonymous list randomization survey than they were in response to direct questioning. Likewise, in the second study, respondents were more likely to admit using their loan to pay down debt or to offset household expenses in the list randomization survey than they were when responding to bank representatives or independent surveyors. These results suggest that when loan data is collected through direct questioning, microcredit clients will significantly over-report business investments and underreport consumption.

These two studies have important implications for the design of program evaluations and the extent to which we should trust self-reported data. Baird and Özler’s findings suggest that while participants in CCT evaluations tend to exaggerate their levels of school participation, these tendencies are more pronounced among non-program participants. CCT evaluations that rely on self-reported data will therefore systematically underestimate program impacts. Karlan and Zinman’s results suggest that microfinance evaluations that rely on self-reported data will also fail to capture the true ways in which borrowers spend their loan proceeds. When directly questioned by bank representatives or independent surveyors, borrowers will exaggerate investments and underreport consumption expenses.

Whether program participants deliberately misreport their activities because they want to avoid embarrassment, wish to give socially desirable answers, or fear the consequences of bad behavior, self-reported data often fails to capture people’s true behaviors. Whenever possible, self-reported data should be supplemented with additional data collected by independent third-party sources. If this isn’t possible, innovative methods such as list randomization surveys should be used to ensure that evaluation quality is not compromised by inaccurate data. Check in with us next week for a discussion about additional data collection methods that can improve the quality of self-reported data.