Reliability of Self-Reported Data: Diaries and Alternative Methodologies

In last week’s blog post, I suggested that self-reported data should be supplemented with objective information from independent third parties. Sometimes, however, independent data sources simply aren’t available, and researchers have no choice but to base their analysis on self-reported data. In these circumstances, some data collection methodologies may do a better job than others of ensuring that self-reported data are reliable. In this post, I discuss two studies that examine the potential of the diary methodology, and of alternative strategies, for capturing accurate self-reported data.

Klaus Deininger, Calogero Carletto, Sara Savastano and James Muwonge examine the effect of personal diaries on the quality of self-reported agricultural data in their study, “Can Diaries Help in Improving Agricultural Production Statistics? Evidence from Uganda.” In Uganda, a large share of crop output comes from continuously harvested crops such as cassava and banana. Because these crops are harvested over long periods, farmers asked to report harvest data may have trouble recalling events that happened several months earlier.

To capture more reliable agricultural data, the Uganda Bureau of Statistics (UBOS) launched the 2005/2006 Uganda National Household Survey (UNHS), a diary-based, multi-purpose household survey. The UNHS drew a sample of approximately 7,500 households in 750 enumeration areas (EAs) across the country. Agricultural production data were collected in two visits, each corresponding to one of the country’s two main harvesting seasons. Diaries were distributed during the first visit, and households were asked to record crop output until the second visit, approximately five to six months later. To ensure diary quality and consistency, a locally respected person, in most cases a schoolteacher, visited households every two weeks to supervise diary completion.

Despite limited oversight from program administrators, the diaries showed promising results. Households remained in the study for an average of five months and made an average of 115 diary entries. Moreover, the value of crop output recorded in the diaries differed significantly from the self-reported estimates based on recall: output values recorded in the diaries were almost 60% higher than the production values estimated from the UNHS’s recall-based questionnaire. These results suggest that self-reported data elicited through recall may systematically underestimate agricultural production. For studies of high-frequency agricultural events with a seasonal component, the UNHS findings indicate that recall-based surveys should be augmented with production diaries to improve data quality and produce reliable results.

Yet while personal diaries might be considered the gold standard for self-reported data collection, they are not always feasible. In “Methods of Household Consumption Measurement through Surveys: Experimental Results from Tanzania,” Kathleen Beegle, Joachim De Weerdt, Jed Friedman and John Gibson explore how alternative data collection strategies can improve the quality of self-reported consumption data when personal diaries are not an option.

Household consumption patterns are central to most measures of poverty in the developing world, but the survey methodologies used to capture this information differ significantly across countries, and over time within countries. To test the extent to which survey methodology affects the quality of self-reported consumption data, Beegle et al. administered eight alternative questionnaires to 4,000 randomly selected households in Tanzania. The questionnaires differed by method of data collection (diary versus recall), level of respondent (individual versus household), length of the reference period over which consumption is reported (anywhere from three days to one year) and level of detail in the commodity list (from fewer than 20 items to over 400 items). The researchers compared the results of these surveys with data from frequently supervised personal consumption diaries, which they assume come closest to measuring households’ true consumption.

The study yielded a number of interesting findings. Overall, the quality of diary data did not vary much with the intensity of supervision or the frequency of field staff visits. The one exception was illiterate households: when infrequently supervised, they dramatically underestimated their consumption in household diaries. Location mattered as well: the gap between consumption reported in personal and household diaries was small in rural areas but pronounced in urban areas. This is likely because consumption outside the home, or by other family members, occurs more frequently and at a higher intensity in urban areas than in rural areas.

In the recall surveys, shorter commodity lists saved time but substantially reduced the accuracy of the data. Among the recall periods tested, a 7-day period produced results closest to the benchmark from the personal diaries. The authors recommend that if only one recall module is to be chosen, it should be a 7-day module with a full list of consumption items. They caution, however, that researchers should carefully tailor their methods to the population they are working with, because different demographic groups respond differently to alternative data collection methodologies.

Personal diaries may be the gold standard for field-based data collection, but they are not always feasible. Beegle et al. estimate that frequently supervised personal diaries cost roughly three times as much as infrequently supervised household diaries, and that any diary methodology costs more than recall-based methods. When financial limitations or practical considerations rule out diary-based methods, researchers should tailor other data collection methodologies to the population they are studying. The Tanzanian consumption study shows that subpopulations such as literate and illiterate households, or urban and rural households, may respond differently to alternative methods of questioning. While recall-based self-reports may never be as reliable as diary-based data, researchers can use these strategies to make their data as reliable as possible at a fraction of the cost of a diary-based study.