Tuesday, June 28, 2016

Road network spatial data

For the entire world

Global Roads Open Access Data Set
Compiled by CIESIN at Columbia University. Downloadable here. Used by Dreher et al. (2015).

For Afghanistan, see this post.

For Latin America

CIAT Latin America and the Caribbean Roads Database
Used by Acemoglu and Dell (2009), although I cannot locate it on The International Center for Tropical Agriculture (CIAT)'s website.

For Africa

Africa Infrastructure Country Diagnostic
The initiative, led by the World Bank, has produced road map data for quite a few countries in Africa, downloadable here. Used by Blimpo, Harding, and Wantchekon (2013) and by Storeygard (2016).

UN OCHA Geodata
Niger, Burundi, Liberia, etc.

FEWS-NET Africa Data Dissemination Service
For Burkina Faso, Chad, Mali, Mauritania, and Niger, a webpage in FEWS-NET's Africa Data Dissemination Service provides road network data. Scroll down to the bottom.

For Burundi, DR Congo, Egypt, Eritrea, Kenya, Rwanda, Sudan, and Tanzania, the AfriCover project provides road network data created from satellite images. To download, register first. Then click the metadata and submit your reasons and expected outputs to request authorization to download the data.

Vmap0 Road Data
A geo-referenced dataset of the road network around the world, created by the US government. It can be obtained at FAO GeoNetwork (you need to obtain a log-in ID and a password from FAO). See this post on Vmap0 data.

By Googling "Vmap0 Road", you'll see a couple of webpages (like this one) claiming that the accuracy of this dataset is questionable. See this article for more.

City Population

GRUMP Settlement Points

  • Covers cities and towns with a population of more than 1,000, with population figures for 1990, 1995, and 2000 (interpolated from the census-year data by using the UN population growth rate).

City Population

DMSP-OLS Nighttime Lights Time Series Version 4

Downloadable here.

The spatial resolution is 30 arc-seconds (about 1 km at the equator).
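To see where the "about 1 km" figure comes from, here is a minimal sketch that converts an angular pixel size into kilometers. It assumes a spherical Earth with a mean radius of 6,371 km (my simplification, not something stated in the data documentation), and the function name `pixel_size_km` is made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with the mean radius


def pixel_size_km(arc_seconds, latitude_deg=0.0):
    """Return the (north-south, east-west) extent of a pixel in km.

    The north-south extent is the arc length of the angle on the sphere.
    The east-west extent shrinks with the cosine of the latitude.
    """
    angle_rad = math.radians(arc_seconds / 3600.0)
    north_south = EARTH_RADIUS_KM * angle_rad
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns_eq, ew_eq = pixel_size_km(30)
ns_60, ew_60 = pixel_size_km(30, latitude_deg=60)
print(f"equator:    {ns_eq:.3f} km x {ew_eq:.3f} km")
print(f"60 degrees: {ns_60:.3f} km x {ew_60:.3f} km")
```

Under these assumptions a 30 arc-second pixel is a bit under 1 km on each side at the equator, and its east-west width is roughly halved at 60 degrees latitude. So a pixel does not cover the same ground area everywhere, which matters if you sum pixel values over regions at very different latitudes.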

Data construction

To understand how this dataset is constructed from the original satellite images and the potential data issues, see Elvidge et al. (2001) and Elvidge et al. (2010). Noor et al. (2008) is also useful for understanding this data.

Min et al. (2013) validate this measure against a survey-based measure of electricity access in rural Senegal and Mali in 2011. Their conclusions (quoted from Min and Gaba 2014, p. 9512) are:

  • Electrified villages are consistently brighter than unelectrified villages across a variety of nighttime satellite images
  • Electrified villages appear brighter in satellite imagery because of the presence of streetlights, and brightness increases with the number of streetlights.
  • The correlation between light output recorded by the satellite with household electricity use and access is low.
Min and Gaba (2014) conduct the same validation exercise for villages in Vietnam in 2013. They reach the same conclusions except for the last point: in Vietnam, household-level access to electricity is also correlated with nighttime light satellite images.

See also Chen and Nordhaus (2011).

Storeygard (2016) finds a statistically significant positive correlation between changes in light intensity and GDP growth at the city level in China, with a coefficient comparable in size to that in the global country-level analysis (see Table 1 columns 4-5).

Use in economics research

The data is becoming popular among economists. Recent examples include Henderson et al. (2012), Michalopoulos and Papaioannou (2013, 2014), and Alesina et al. (2012). Hodler and Raschky (2014) exploit the annual panel nature of the data to find that the birthplace of a new national leader becomes brighter after the leader assumes power. Baskaran et al. (2015) relate nighttime light to electoral cycles in India. Storeygard (2016) uses light as a measure of city-level income across cities in Africa.

Bleakley and Lin (2012) use nighttime light as a measure of the spatial distribution of contemporary economic activity, to see whether portage sites still predict where economic activity is concentrated today, long after their original advantage became obsolete.

The raw data ranges from 0 to 63. There are several ways to aggregate it for use in regression analysis.
  • Henderson et al. (2012), Michalopoulos and Papaioannou (2013, 2014) and Hodler and Raschky (2014) use the nighttime light data as the measure of living standards. They use the logarithm of the average within each spatial unit of analysis.
    • Logarithmic transformation is used because the distribution of nighttime light intensity is right-skewed with around 10% of observations being zero.
    • Michalopoulos and Papaioannou (2013, 2014) and Hodler and Raschky (2014) add 0.01 to the average before taking the log, so as to retain the 10% of observations without light.
  • Alesina et al. (2016) and Baskaran et al. (2015) use the average or sum of light values from all pixels within each spatial unit of analysis divided by population.
  • Baskaran et al. (2015) also measure the proportion of villages with a positive value of nighttime light at the village centroid.
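The three aggregation approaches above can be sketched roughly as follows. This is only an illustration of the arithmetic, not the code used in any of the cited papers: the function names and the pixel values are made up, and in practice the pixel values for each spatial unit would come from overlaying the raster on a boundary shapefile.

```python
import math


def log_mean_light(pixels, offset=0.01):
    """Log of the average light value in a spatial unit.

    The small offset (0.01, as described above) keeps units whose
    average is zero, since log(0) is undefined.
    """
    mean_value = sum(pixels) / len(pixels)
    return math.log(mean_value + offset)


def light_per_capita(pixels, population):
    """Sum of light values over all pixels divided by population."""
    return sum(pixels) / population


def share_lit(centroid_values):
    """Share of villages with a positive light value at the centroid."""
    lit = sum(1 for value in centroid_values if value > 0)
    return lit / len(centroid_values)


# Hypothetical pixel values (0 to 63) for two spatial units.
lit_district = [0, 0, 5, 12, 63, 8]
dark_district = [0, 0, 0, 0]

print(log_mean_light(lit_district))
print(log_mean_light(dark_district))  # finite only because of the offset
print(light_per_capita(lit_district, population=1000))
print(share_lit([0, 3, 0, 10]))
```

Note that the choice of offset is arbitrary and drives the values for completely dark units, so it is worth checking whether results are robust to it.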

To use this dataset as panel data, one issue is the compatibility of different satellites in measuring light intensity. Henderson et al. (2012) simply take the average if two satellites provide the data for the same year and control for year fixed effects in regression analysis to account for any differences across years. Alternatively, the following book chapter attempts to calibrate values from different satellites to account for inter-satellite differences and inter-annual sensor decay:
Elvidge, Christopher D., Feng-Chi Hsu, Kimberly E. Baugh and Tilottama Ghosh (2014). "National Trends in Satellite Observed Lighting: 1992-2012." Global Urban Monitoring and Assessment Through Earth Observation. Ed. Qihao Weng. CRC Press.
The calibrated version aggregated to the 0.5x0.5 degree cell level is available as part of the PRIO-GRID data.
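The simple averaging approach of Henderson et al. (2012) described above can be sketched like this. The satellite labels and light values are hypothetical placeholders for one region, and `average_by_year` is a name I made up. The year fixed effects belong to the regression stage and are not shown here.

```python
from collections import defaultdict

# Hypothetical mean light values for one region, keyed by (satellite, year).
# Some years are observed by two satellites at once.
readings = {
    ("F10", 1994): 5.0,
    ("F12", 1994): 4.0,
    ("F12", 1995): 4.6,
    ("F12", 1997): 4.7,
    ("F14", 1997): 5.1,
}


def average_by_year(readings):
    """Collapse satellite-year values to one value per year.

    If two satellites cover the same year, take their simple average.
    """
    values_by_year = defaultdict(list)
    for (_satellite, year), value in readings.items():
        values_by_year[year].append(value)
    return {
        year: sum(values) / len(values)
        for year, values in sorted(values_by_year.items())
    }


annual = average_by_year(readings)
for year, value in annual.items():
    print(year, round(value, 3))
```

Averaging does not remove differences in sensor sensitivity between satellites. It only smooths them within overlap years, which is why the year fixed effects (or the calibrated series) are still needed.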

Cross-country data on the prevalence of 9 infectious diseases

The prevalence of nine diseases (leishmanias, schistosomes, trypanosomes, leprosy, malaria, typhus, filariae, dengue, and tuberculosis) is compiled for 230 geo-political regions (countries, territories, protectorates, culturally different regions within a nation) by Murray and Schaller (2010).

Used by Gorodnichenko and Roland (2016) as an instrument for collectivist culture.

GPS Surveys

If you plan GPS surveys (i.e., collecting the geographic coordinates of locations of interest), read the following:

The GPS survey manual for Demographic and Health Surveys.

Yale Map Collection (200?) "GPS & GIS: Collecting Spatial Coordinates and Using them in ArcGIS"
This manual is more technical than the above.

UNICEF MICS surveys use Garmin eTrex 30. GPS data collection manuals are downloadable here.

MICS (Multiple Indicator Cluster Surveys)

Here's the description of MICS by UNICEF, WHO, The World Bank and UN Population Division (2007) (page 15):
Originally developed by UNICEF to help assess progress towards the goals established by the 1990 World Summit for Children, MICS surveys now serve as a monitoring tool for the MDGs and other international commitments... Three rounds of MICS surveys have been conducted, in 1994-95, 2000-01, and 2005-06. Over 50 countries participated in each round. MICS surveys typically sample 4,000 to 5,000 households, although samples can range up to 15,000 households. The women’s questionnaire includes a module on child mortality which asks how many children a woman has ever borne, when she first gave birth and how many of her children have died or survived.
See UNICEF's website for more details, such as the questionnaires and how to download the survey data.

Monday, June 27, 2016

Global Crop Areas and Yields in 2000

Monfreda, Ramankutty, and Foley (2008) introduce a global dataset of the harvested area and yield of each of 175 crops, at a resolution of 5 min by 5 min latitude-longitude grid cells. See section 3 in the paper for how the data is constructed.
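To get a feel for what a 5 min by 5 min grid means in practice, here is a small sketch that computes the grid dimensions and maps a latitude-longitude pair to a cell. It assumes a global grid whose first row is the northernmost and whose first column starts at 180 degrees west. That layout is my assumption for illustration, so check the file's own georeferencing before relying on it. The function name `cell_index` is made up.

```python
CELLS_PER_DEGREE = 12  # 60 arc-minutes per degree / 5 arc-minutes per cell

N_ROWS = 180 * CELLS_PER_DEGREE  # latitude bands from 90N down to 90S
N_COLS = 360 * CELLS_PER_DEGREE  # longitude bands from 180W to 180E


def cell_index(lat, lon):
    """Return (row, col) of the grid cell containing the point.

    Assumes row 0 is the northernmost band and column 0 starts at 180W.
    Points exactly on the southern or eastern edge go to the last cell.
    """
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude or longitude out of range")
    row = min(int((90 - lat) * CELLS_PER_DEGREE), N_ROWS - 1)
    col = min(int((lon + 180) * CELLS_PER_DEGREE), N_COLS - 1)
    return row, col


print(N_ROWS, N_COLS)
print(cell_index(0.0, 0.0))
print(cell_index(90.0, -180.0))
```

With this layout the grid has 2,160 rows and 4,320 columns, about 9.3 million cells in total, so the per-crop rasters are large but manageable.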

The dataset is downloadable at EarthStat (click "Harvested Area and Yields of 175 crops").

This dataset is an updated version of the 18-crop dataset introduced by Leff, Ramankutty, and Foley (2004).

The dataset is used by Johnston et al. (2009) and Harari and La Ferrara (2013).