Tuesday, April 4, 2017


Tucker et al. (2005) compile global NDVI data from the AVHRR satellite sensors for 8 km square cells, twice a month, from July 1981 to December 2004, improving on the methodology of earlier attempts.

  • The data is supposed to be available here, but as of April 2017 the site is out of service.

The NDVI data for 2001-2006, based on MODIS satellite sensors (better data quality than AVHRR but only for more recent periods), is available here.

Historical land use datasets

One of the first attempts to compile historical land use datasets is Ramankutty and Foley (1999), who focus on the fraction of each 5 by 5 arc-minute cell across the world that is used for agricultural cultivation.

  • Downloadable here.
  • The sample period: 1700-1992
  • The 1992 data is based on their own 1992 Croplands Dataset, with a few revisions (see section 2 of the paper)
  • The 1992 data is then extrapolated backwards using historical cropland area statistics at the national level (or at the subnational level for 8 large countries) from FAOSTAT and other sources.
  • The extrapolation assumes that the spatial distribution of cropland within a country (or a sub-national region where historical data is available) has remained the same throughout the sample period.
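The backward extrapolation in the last two bullets can be sketched in a few lines. This is a minimal illustration, not Ramankutty and Foley's code: it assumes the constant-spatial-distribution rule amounts to multiplying every cell's 1992 cropland fraction within a region by the ratio of that region's historical cropland area to its 1992 cropland area. The function name and toy numbers are hypothetical.

```python
import numpy as np

def extrapolate_backward(frac_1992, region, cell_area, hist_total):
    """Back-cast cropland fractions to an earlier year, one region at a time.

    frac_1992  : cropland fraction of each grid cell in 1992
    region     : country (or sub-national region) id of each cell
    cell_area  : land area of each cell, e.g. in km^2
    hist_total : dict mapping region id -> cropland area in the target year
    """
    frac_t = np.zeros(len(frac_1992))  # regions without historical data stay 0
    for r, total_t in hist_total.items():
        in_r = region == r
        total_1992 = np.sum(frac_1992[in_r] * cell_area[in_r])
        if total_1992 > 0:
            # Keep the 1992 spatial pattern; rescale so the regional total matches.
            # Note: if total_t > total_1992, some fractions could exceed 1.
            frac_t[in_r] = frac_1992[in_r] * (total_t / total_1992)
    return frac_t

frac_1992 = np.array([0.8, 0.4, 0.1, 0.5])
region = np.array([1, 1, 1, 2])
cell_area = np.array([100.0, 100.0, 100.0, 50.0])
# Region 1 had 130 km^2 of cropland in 1992; suppose it had 65 km^2 in the target year.
# Region 1 fractions are therefore halved; region 2 is unchanged.
frac_1900 = extrapolate_backward(frac_1992, region, cell_area, {1: 65.0, 2: 25.0})
print(frac_1900)
```

Because one scale factor applies to every cell in a region, the ranking of cells by cropland share within a region never changes over time, which is exactly what the assumption implies.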

There are several subsequent attempts to improve historical land use data. Below are a few examples:

  • Pongratz et al. (2008) extend the analysis back to AD 800 by using historical population data.
  • Meiyappan and Jain (2012) start with the construction of the land cover map for the year 1765 and then estimate land use change in subsequent years, with satellite data used for validation over the past few decades.
  • HYDE 3.1 (see another post in this blog)

1992 Croplands Dataset

Compiled by Ramankutty and Foley (1998).

  • They start with the DISCover land cover dataset, which classifies each 1 x 1 km cell into one of several land cover types based on monthly NDVI data from March 1992 through February 1993.
  • These classifications are then regrouped into six categories: (0) other vegetation, (1) other vegetation with crops, (2) other vegetation/crop mosaic, (3) crop/other vegetation mosaic, (4) crops with other vegetation, and (5) crops.
  • The fraction of a cell used for agriculture is calibrated for each of these six categories to match the national-level total cropland area from FAOSTAT and other sources.
  • The 1 x 1 km cells are aggregated into the 5 x 5 arc-minute cells.
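The calibration and aggregation steps can be illustrated as follows. This is a hedged sketch under my own assumptions, not the procedure in Ramankutty and Foley (1998): it picks one cropland fraction per category (0 to 5) by least squares so that, across countries, the area in each category times its fraction best reproduces the national cropland totals. It then averages the 1 km fractions within 5 x 5 arc-minute cells (1/12 of a degree), assuming the 1 km cells have roughly equal area. The function names and synthetic data are hypothetical.

```python
import numpy as np

def calibrate_fractions(area_by_category, national_cropland):
    """Fit one cropland fraction per land-cover category by least squares.

    area_by_category  : (n_countries, 6) array, area of each country in categories 0..5
    national_cropland : (n_countries,) array, reported national cropland area
    """
    fractions, *_ = np.linalg.lstsq(area_by_category, national_cropland, rcond=None)
    return fractions  # unconstrained: values are not forced into [0, 1]

def aggregate_to_5min(lat, lon, frac):
    """Average 1 km cropland fractions within 5 x 5 arc-minute (1/12 degree) cells.

    Returns a dict mapping (row, col) on the 5-minute grid, counted from
    90N and 180W, to the mean fraction of the 1 km cells falling inside it.
    """
    rows = np.floor((90.0 - lat) * 12).astype(int)
    cols = np.floor((lon + 180.0) * 12).astype(int)
    sums, counts = {}, {}
    for r, c, f in zip(rows, cols, frac):
        key = (int(r), int(c))
        sums[key] = sums.get(key, 0.0) + f
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

# Synthetic check: 8 countries with known "true" fractions for categories 0..5
rng = np.random.default_rng(1)
areas = rng.uniform(100.0, 1000.0, size=(8, 6))
true_f = np.array([0.0, 0.1, 0.3, 0.5, 0.7, 0.9])
f_hat = calibrate_fractions(areas, areas @ true_f)

# Three 1 km pixels (categories 5, 3, 0); the first two share a 5-minute cell
pixel_frac = f_hat[np.array([5, 3, 0])]
cells = aggregate_to_5min(np.array([10.01, 10.02, 10.2]),
                          np.array([20.01, 20.03, 20.01]),
                          pixel_frac)
print(np.round(f_hat, 3), cells)
```

Unlike a real calibration, this least-squares fit does not force the fractions to lie between 0 and 1 or to increase from category 0 to category 5.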

Downloadable here.

Ramankutty and Foley (1999) update this dataset by using alternative data sources etc., as part of their "Historic Croplands Dataset, 1700-1992."

Monday, March 27, 2017

Infant and Child Mortality

Many researchers use infant and child mortality data compiled by the World Bank's World Development Indicators, by UNICEF's State of the World's Children, or, for child mortality rates only, by Ahmad, Lopez, and Inoue (2000). According to Ross (2006), the most transparent of these is UNICEF's (see page 866).

These international organizations now coordinate the production of infant and child mortality statistics under the name of the UN Inter-agency Group for Child Mortality Estimation (IGME). See
UNICEF, WHO, The World Bank and UN Population Division, "Levels and Trends of Child Mortality in 2006: Estimates developed by the Inter-agency Group for Child Mortality Estimation", New York, 2007.
Section 2 of this document is also useful for learning how to estimate infant and child mortality rates from each type of data (vital registration, household surveys, etc.). The resulting estimates are available online at childinfo.org for selected years and at CME Info Child Mortality Estimates for all years since 1950.

Abouharb and Kimball (2007) introduce a dataset of annual infant mortality rates for each country over 1816-2002, filling as many country-year cells as possible with infant mortality data from a variety of sources (I wonder whether this sacrifices comparability across countries and years). The dataset and the codebook are available at www.prio.no/jpr/datasets (look for the last link for 2007 (vol. 44), no. 6). They avoid using the UN Demographic Yearbooks as much as possible (the printed versions actually do provide annual data, but the online versions do not). They record which data source is used for each country-year observation. It turns out that 41 percent of observations after 1950, mainly for developing countries, come from the US Census Bureau's International Data Base. I am not sure why we should trust the US Census Bureau more than the United Nations.
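The fill-and-record approach can be sketched as a priority merge: go through the sources in order of preference, fill each empty country-year cell from the first source that has it, and store the source name alongside the value. The source names, their ordering (with the UN Demographic Yearbooks last, echoing the authors' stated preference), and the numbers are all hypothetical; this is not Abouharb and Kimball's actual ranking of sources.

```python
from collections import Counter

def fill_panel(sources, priority):
    """Fill country-year cells from sources in order of preference.

    sources  : dict source name -> dict (country, year) -> value
    priority : list of source names, most preferred first
    Returns dict (country, year) -> (value, source name).
    """
    panel = {}
    for name in priority:
        for key, value in sources[name].items():
            if key not in panel:  # an earlier (preferred) source wins
                panel[key] = (value, name)
    return panel

def source_shares(panel, since_year):
    """Share of observations coming from each source, for years >= since_year."""
    counts = Counter(src for (_, year), (_, src) in panel.items() if year >= since_year)
    total = sum(counts.values())
    return {src: n / total for src, n in counts.items()}

# Hypothetical sources and infant mortality rates (deaths per 1,000 live births)
sources = {
    "national_stats": {("A", 1960): 50.0},
    "census_idb": {("A", 1960): 52.0, ("B", 1940): 120.0,
                   ("B", 1960): 80.0, ("C", 1970): 55.0},
    "un_dyb": {("B", 1960): 81.0, ("C", 1960): 60.0},
}
panel = fill_panel(sources, ["national_stats", "census_idb", "un_dyb"])
print(source_shares(panel, since_year=1950))
```

Keeping the source label next to each value is what makes a tabulation like the 41 percent figure above possible, and it lets users drop or flag observations from sources they trust less.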

For poor countries, however, infant and child mortality data may be interpolated from very few data points. See Qian (2015: 303-304).
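To see how little information an annual series can carry in such cases, here is a toy example assuming simple linear interpolation between three hypothetical survey-based estimates of the infant mortality rate (deaths per 1,000 live births). The actual estimation methods of the sources above may differ.

```python
import numpy as np

# Hypothetical infant mortality estimates from three surveys
survey_years = np.array([1990, 1998, 2006])
imr_points = np.array([120.0, 100.0, 80.0])

# Annual series produced by linear interpolation between the survey points
years = np.arange(1990, 2007)
imr_annual = np.interp(years, survey_years, imr_points)

print(dict(zip(years.tolist(), imr_annual.tolist())))
# Every year-to-year change is identical: the series is a straight line between surveys
print(np.diff(imr_annual))
```

Seventeen annual values result, but they carry no information beyond the three underlying points, so any year-to-year variation in true mortality is invisible in such a series.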

GPS Surveys

If you plan GPS surveys (i.e., collecting the geographic coordinates of locations of interest), read the following:

Clara R. Burgert, Blake Zachary, and Josh Colston (2013) "Incorporating Geographic Information into Demographic and Health Surveys: A Field Guide to GPS Data Collection"
The GPS survey manual for Demographic and Health Surveys.

Yale Map Collection (200?) "GPS & GIS: Collecting Spatial Coordinates and Using them in ArcGIS"
This manual is more technical than the above.

UNICEF MICS surveys use Garmin eTrex 30. GPS data collection manuals are downloadable here.

Geo-referencing the location of a place from its name

If you need to geo-reference your data based on the name of a location, a handful of useful websites let you search for a location anywhere in the world (outside the US) by name and obtain its latitude and longitude:


National Geospatial-Intelligence Agency (NGA) GEOnet Names Server

  • According to Strandow et al. (2011), this website "often contains more alternate spellings than Geonames." If you cannot find a location on GeoNames, therefore, try this server to find an alternate spelling, which you can search on GeoNames.

Global Gazetteer Version 2.3
JRC Fuzzy Gazetteer

  • It retrieves place names even if the spelling does not match perfectly. This is useful because English spellings of foreign place names are rarely standardized.
  • Gleditsch and Weidmann (2012) recommend
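Fuzzy matching of the kind the JRC Fuzzy Gazetteer offers can be approximated locally with Python's standard difflib module, which scores the similarity of two strings. This is a minimal sketch, not how the JRC service works internally; the three-entry gazetteer is hypothetical and its coordinates are rough values for illustration only. The query "Poerworedjo" is Purworejo written in the old Indonesian spelling (oe for u, dj for j).

```python
import difflib

# Hypothetical toy gazetteer: name -> (latitude, longitude).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "Purworejo": (-7.71, 110.01),
    "Purwokerto": (-7.42, 109.23),
    "Magelang": (-7.47, 110.22),
}

def fuzzy_lookup(query, gazetteer=GAZETTEER, n=3, cutoff=0.6):
    """Return (name, (lat, lon)) pairs whose names resemble `query`, best match first."""
    by_lower = {name.lower(): name for name in gazetteer}
    matches = difflib.get_close_matches(query.lower(), list(by_lower), n=n, cutoff=cutoff)
    return [(by_lower[m], gazetteer[by_lower[m]]) for m in matches]

print(fuzzy_lookup("Poerworedjo"))
```

The cutoff of 0.6 is difflib's default; lowering it returns more candidates at the cost of more false matches, so any hit should still be checked by hand.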

Work and Iron Status Evaluation (WISE)

See Duncan Thomas's website for details.

Its consumption expenditure and price data is used by McKelvey (2011) to check the validity of unit values as a proxy for prices.
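A unit value is expenditure divided by the quantity purchased. The sketch below computes unit values and their percentage gap from a market price, assuming household-level records of expenditure and quantity for one item plus a market price observed in the same area. The numbers are hypothetical and this is not McKelvey's procedure. Unit values also reflect the quality of what households buy, which is one reason they can diverge from market prices.

```python
import numpy as np

# Hypothetical household records for one food item (e.g. rice)
expenditure = np.array([10000.0, 12000.0, 9000.0])   # local currency
quantity = np.array([2.0, 2.0, 1.5])                   # kg
market_price = np.array([5200.0, 5900.0, 6100.0])    # local currency per kg, from a price survey

unit_value = expenditure / quantity   # local currency per kg
# Percentage gap between each unit value and the observed market price
gap = unit_value / market_price - 1
print(unit_value, np.round(gap, 3))
```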

The panel data on farm labor demand and household composition is used by Lafave and Thomas (2016) to test the neoclassical agricultural household model.
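The idea behind such a test can be sketched with simulated data. Under the neoclassical agricultural household model with complete markets, production decisions are separable from consumption decisions, so household composition should not predict farm labor demand once farm characteristics are controlled for. The sketch assumes separation holds by construction, regresses labor demand on farm size and the number of adults, and checks that the adults coefficient is near zero. The variable names and data-generating process are hypothetical; Lafave and Thomas's specification exploits the panel dimension and is considerably richer.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated farm households (hypothetical data-generating process)
farm_size = rng.uniform(0.2, 2.0, n)      # hectares
hh_adults = rng.integers(1, 7, n)         # working-age adults, 1 to 6
# Separation holds by construction: labor demand depends on farm size only
labor_days = 50.0 * farm_size + rng.normal(0.0, 5.0, n)

# OLS of labor demand on a constant, farm size and household composition
X = np.column_stack([np.ones(n), farm_size, hh_adults])
beta, *_ = np.linalg.lstsq(X, labor_days, rcond=None)

# Conventional (homoskedastic) standard errors
resid = labor_days - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

t_adults = beta[2] / se[2]
print(f"adults coefficient {beta[2]:.3f} (t = {t_adults:.2f})")
```

If household composition did matter, for example because households cannot hire labor freely, the adults coefficient would be significantly different from zero and separation would be rejected.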

Sample periods
"After a listing survey in late 2001, a population-representative sample of households living in Purworejo kabupaten were interviewed every four months beginning in 2002 and continuing through 2005. A longer-term follow-up was conducted five years after the start of the survey in 2007." (Lafave and Thomas (2016), p. 1926)
See section 4 of Lafave and Thomas (2016) for more detail, including data quality issues.