Tuesday, October 20, 2015

Demographic and Health Surveys (DHS)

The DHS is a set of cross-country household surveys on health. See Corsi et al. (2012) for a succinct overview of the surveys.

The Minnesota Population Center leads the initiative to make the DHS data comparable across countries: see the Integrated DHS website for details.

HOW TO OBTAIN DATA: Log on to http://www.measuredhs.com and follow the procedure outlined here. (For some surveys, additional permission to use the data is required.)

LIST OF COUNTRIES: available here.
For the following countries, see also separate posts in this blog: Ethiopia, Lesotho, Paraguay, Peru, Sri Lanka, and Zambia.

SAMPLING METHOD: This is the guideline for all DHS surveys. For the actual sampling method used in each survey, see the DHS Final Report for relevant surveys, which can be downloaded at http://www.measuredhs.com/pubs. (Pick the country name and choose "DHS Final Reports" for publication type to search.)

QUESTIONNAIRE: The DHS surveys have evolved over time. There are five phases, each of which contains a different set of questions (some of them are consistent across phases). The Model Questionnaires from all five phases are available here. For the actual questionnaire used in each survey, see the DHS Final Report for relevant surveys, which can be downloaded at http://www.measuredhs.com/pubs. (Pick the country name and choose "DHS Final Reports" for publication type to search.)

The questionnaire consists of the core questionnaire (used in every survey) and optional modules (used in some of the surveys). Among these modules is maternal mortality.

CODEBOOK: For the complete list of variables in the datasets, download the DHS Recode Manuals (one for each phase) from here. Beware that it is not easy to map each question in the questionnaire to the variable number in the codebook because the two numbering systems are very different. In addition, the description of variables in the codebook is sometimes imprecise, so checking exactly what question was asked is essential. Not every variable is available in every survey. To find out which ones are, see the ".doc" file zipped in the individual recode file for each survey.

Other Documentation:
Fieldwork Manuals

Data Processing Manual (see pp. 7-14 for how dates of birth, marriage, etc. are imputed)
Date imputation proceeds in two stages. First, the upper and lower bounds of the date are defined. For the date of birth of children of interviewed women, for example, the age of the children, the age at death (if dead), the dates of vaccination, the duration of breastfeeding, etc. are used to narrow the bounds. The fact that two births cannot be less than 7 months apart (allowing for premature births) is also used to narrow the bounds. Second, the date is imputed by picking one value at random within the final bounds.
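The two stages can be sketched roughly as follows. This is only an illustration of the idea, not the procedure in the Data Processing Manual: dates are expressed as months counted from 1900 (an assumed convention for this sketch), each piece of evidence is reduced to a hypothetical (lower, upper) range, and the random pick is assumed to be a uniform draw. The function names are made up for the example.

```python
import random

MIN_BIRTH_INTERVAL = 7  # months between two births, as stated above


def month_code(year, month):
    """Months counted from 1900 (January 1900 = 1). Assumed convention."""
    return (year - 1900) * 12 + month


def narrow(bounds, constraint):
    """Stage 1 helper: intersect two (lower, upper) ranges of month codes."""
    lower = max(bounds[0], constraint[0])
    upper = min(bounds[1], constraint[1])
    if lower > upper:
        raise ValueError("constraints are inconsistent")
    return (lower, upper)


def impute_birth_date(initial, constraints, previous_birth=None, rng=random):
    """Narrow the bounds with every constraint, then draw one month at random."""
    bounds = initial
    for constraint in constraints:
        bounds = narrow(bounds, constraint)
    if previous_birth is not None:
        # Two births cannot be less than 7 months apart.
        earliest = previous_birth + MIN_BIRTH_INTERVAL
        bounds = narrow(bounds, (earliest, bounds[1]))
    # Stage 2: uniform draw within the final bounds (an assumption here).
    return rng.randint(bounds[0], bounds[1]), bounds


# Hypothetical child: interviewed in June 1992, reported age 2 years,
# vaccinated in March 1990, older sibling born in month 1070.
interview = month_code(1992, 6)
age_range = (interview - 35, interview - 24)   # aged 24 to 35 months
vaccination = (0, month_code(1990, 3))         # born no later than vaccination
date, bounds = impute_birth_date(age_range, [vaccination], previous_birth=1070)
```

Here the age alone leaves a 12-month window; the vaccination date and the older sibling's birth shrink it to 7 months before the draw is made.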

Description of dataset types (what is household recode, individual recode, etc.)

Description of file types (what is a hierarchical or flat file?)

Description of file names (e.g. what does COIR41FL.ZIP stand for?)
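As a rough illustration of the file-name question, the sketch below splits a name such as COIR41FL.ZIP into four two-character parts. The layout (country code, dataset type, version, file format) and the small lookup tables are my assumptions about the naming convention, so check them against the description of file names before relying on them. The function name is made up for this example.

```python
import re

# Assumed meanings of a few codes; extend after checking the DHS documentation.
DATASET_TYPES = {"HR": "household recode", "IR": "individual recode"}
FILE_FORMATS = {"FL": "flat file"}


def parse_dhs_filename(name):
    """Split a DHS file name into its assumed two-character components."""
    stem = name.upper().rsplit(".", 1)[0]  # drop the extension, e.g. ".ZIP"
    match = re.fullmatch(r"([A-Z]{2})([A-Z]{2})([0-9A-Z]{2})([A-Z]{2})", stem)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    country, dataset, version, fmt = match.groups()
    return {
        "country_code": country,
        "dataset_type": dataset,
        "dataset_label": DATASET_TYPES.get(dataset, "unknown"),
        "version": version,  # assumed: phase digit followed by release
        "file_format": fmt,
        "format_label": FILE_FORMATS.get(fmt, "unknown"),
    }


parts = parse_dhs_filename("COIR41FL.ZIP")
```

Codes missing from the lookup tables are labelled "unknown" rather than guessed.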

What variables are available:

Anthropometry data (height and weight) are now collected for all surveyed women of childbearing age. In the previous rounds of the surveys (until 1999), such data were collected only for women who gave birth within 3 or 5 years before the survey. (The exception is Côte d'Ivoire 1988/89, where all surveyed women were measured for their height and weight.) In the earliest rounds of surveys (until 1989), women's anthropometric data were not collected at all. Deaton (2007) uses the height data from DHS surveys extensively.

Hemoglobin level (for measuring anemia) is collected in the latest rounds of the survey (after 1999). Page 14 of the Model Questionnaire gives a brief description of this. For details, see the following two documents:

Information on children's food intake is available, though different versions collect the information in different ways. In DHS I and II surveys, information on what liquids and foods were given during the past 24 hours as a complementary diet is collected only for women still breastfeeding (available as variables V409-V414). DHS III surveys collect the same information for all living children born in the past 3 or 5 years (variables M37 or V409-V414 for the last-born child). In addition, DHS III surveys ask what liquids and foods were given during the past 7 days (variables M40). DHS IV surveys collect the same information only for the last-born child living with his/her mother, for the last 24 hours (variables M37 or V469) or for the last 7 days (variables M40 or V470).

Information on the treatment for fever and cough (symptoms of malaria and pneumonia, respectively, two of the major killers of children) is not very consistent across different rounds of surveys. First of all, the recall period changed starting with the DHS-II surveys: in DHS-I, mothers were asked whether their child had a fever or cough during the last four weeks; in DHS-II, this changed to the last two weeks. Then the question on what treatments were given was dropped in the DHS-III surveys. In DHS-IV, drugs taken for fever were asked about again, but not those for cough. Information on from whom advice or treatment was sought has been collected throughout, but it is not clear whether health professionals were actually present during the visit.

For variables V414 and V414A-D (what foods the baby ate during the last 24 hours) in the DHS II surveys, even though the Recode Manual suggests that V414 is used only if V414A-D is not collected, Burkina Faso's 1992 survey and Niger's 1992 survey use both. In both cases, it seems V414 refers to another food category in addition to V414A and V414D (in the questionnaire, three types of foods were asked about: "bouillie" (porridge), "Autre aliment specialement préparé pour l'enfant" (other food specially prepared for the child), and "Plat familial" (family dish)).

Data on schooling for all the members of surveyed households is available in the household schedule. See variables HV106-HV110, HV121-HV129.

Data on infrastructure at the household level is available: access to electricity (HV206), telephone (HV221), and tap water (HV231).

Self-reported ethnicity of women is available (variable V131) for some surveys including the Rwanda 1992 survey.

References for understanding the dataset further

To understand the medical background for each variable, read the Model Questionnaires, which also explain the purpose of each question. Do have a look at the earlier versions of the questionnaire, because the newest version often describes the purpose of only the newly added questions.

For further medical background on the child health history questions (immunization, treatment of diarrhea, etc.), Gareth Jones et al. (2003) "How Many Child Deaths Can We Prevent This Year?" The Lancet, 362: 65-71 is extremely useful.

I have created this document to help researchers to match subnational districts in Sub-Saharan African countries between different rounds of DHS surveys.

Lubotsky and Wittenberg (2006, Review of Economics & Statistics) propose how to make the best of asset ownership variables in the DHS data to estimate the impact of household wealth.

Papers using this dataset include:

Vogl (2016) to demonstrate how the relationship between fertility and income/education flips over the decades:

  • "[T]he associations of fertility with parental durable goods ownership and paternal education flipped from positive to negative in Africa and in rural Asia [between the 1986-94 and 2006-11 periods]; they were negative throughout in Latin America." (page 367)
  • "Among birth cohorts of the 1940s and 50s, most countries show positive associations between the number of ... siblings and educational attainment. Among cohorts of the 1980s, most countries show the opposite." (page 367)

Oster (2007) to measure sexual behaviour and knowledge of HIV in African countries.

Oster (2005) to measure sex ratios at birth in African countries.

Young (2005) for the data on fertility and education. In some footnotes of this paper, the author reports a couple of peculiarities found in the data (see footnotes 35 and 58).

Dow et al. (1999) use Malawi 1992, Tanzania 1994, Zambia 1992, and Zimbabwe 1994 to investigate the effect of tetanus vaccine on birth weight.

Pitt (1997) uses the datasets for 14 Sub-Saharan African countries to investigate the determinants of child mortality.

Demographers use the data to investigate the determinants of immunization uptake. See Desai and Alva (1998) and Gage et al. (1997).

Thomas et al. (1991) use the Brazil 1986 survey to investigate how parental education affects child height, using the information on how frequently mothers listen to the radio, watch television, and read a newspaper.

Other papers using the DHS surveys can be found here.


Andrew said...

Thanks for your explanations of DHS. It is a pretty incredible but at times confusing source of data. I am a beginning user, do you understand what the variables with prefix *hm* are in the household recode of DHS surveys? Do the underscore numbers each correspond to a member of a household? And how does one link the member to his or her age and other characteristics. Thanks again.

Indian_Economist said...

if you still have the document showing how to match districts in sub-saharan african countries in the dhs datasets, could you please upload the link again? the current url does not exist. thanks for all the info, it's very useful.

Masa said...

It's available at here. (Look at the last several pages.)


Soph said...

The link for the document is broken again...But anyway thanks for a great blog!