Monday, September 12, 2016

Mining Atlas

Data on all mining sites across the world. The following variables are recorded (copy-and-pasted from their website):
  • Ownership: Who owns the operation or project
  • Regions: Global (all continents and countries covered)
  • Categories: Operations and Projects
  • Types: Mines, Concentrators, Refineries, Smelters, Terminals
  • SubTypes: Open-pit, Underground, Tailings, Placer, In-Situ Leach, among others
  • Status: Conceptual / Early Exploration, Pre-Feasibility, Feasibility, Bankable, Construction, In Operation, Suspended, Closed, Restart
  • Mineral Commodity: All precious metals (e.g. gold, silver etc), base metals (e.g. copper, zinc etc), platinum group metals, ferrous metals, coal, potash, mineral sands, rare-earth elements, uranium, diamonds and many more
  • Production: Run-of-Mine and annual production
  • Geology: Reserves and Resources
  • Financial: Capex, Opex, Life-of-Mine
  • Other: start date, closure date

Friday, August 19, 2016

Historical population estimates

1. Atlas of World Population History by McEvedy and Jones (1978)
  • Only available at the country level.
  • Data quality for Central America seems unreliable
  • The data for 1500 is used by Spolaore and Wacziarg (2013) as a measure of economic performance in 1500 (see their footnote 3 for justification)
2. Krumhardt, K. (2010). ‘Methodology for worldwide population estimates: 1000 BC to 1850’, ARVE Technical Report no. 3.
  • Used by Fenske (2013) as an alternative data source of historical population to McEvedy and Jones (1978)
  • Available at the subnational level, although only for some countries in the Americas

Tuesday, August 16, 2016

Mobile Coverage Explorer

Collins Bartholomew distributes mobile phone coverage maps on behalf of the GSMA (an association of major GSM mobile service providers around the world). It is not free of charge.

Used by various papers that look at the impact of mobile phones, including Pierskalla and Hollenbach (2013), Gonzalez (2015) "Social Monitoring and Electoral Fraud: Evidence from a Spatial Regression Discontinuity Design in Afghanistan", and Manacorda and Tesei (2016) "Liberation Technology: Mobile Phones and Political Mobilization in Africa".

Monday, August 15, 2016

DMSP-OLS Nighttime Lights Time Series Version 4

Downloadable here. The spatial resolution is 30x30 arc-second (about 1x1 km) across the globe between 75 degrees north and 65 degrees south. Available annually since 1992 (and up to 2013, as of August 2016). The nighttime light intensity in each cell is represented by the "digital number", an integer from 0 to 63.
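
To make the grid geometry concrete, here is a minimal Python sketch that maps a latitude/longitude pair to a cell index. It assumes the grid's upper-left corner sits at 180°W, 75°N, with 1/120-degree (30 arc-second) cells and row 0 at the northern edge; the constants and function names are hypothetical and not part of any official tool.

```python
import math

# Assumed grid layout: 30 arc-second cells (1/120 degree), rows running from
# 75°N down to 65°S and columns running east from 180°W to 180°E.
CELLS_PER_DEGREE = 120
NORTH, SOUTH = 75.0, -65.0
WEST, EAST = -180.0, 180.0

N_ROWS = int((NORTH - SOUTH) * CELLS_PER_DEGREE)  # 140 degrees of latitude
N_COLS = int((EAST - WEST) * CELLS_PER_DEGREE)    # 360 degrees of longitude


def latlon_to_cell(lat: float, lon: float) -> tuple[int, int]:
    """Return the (row, col) of the cell containing a point; row 0 is the northern edge."""
    if not (SOUTH <= lat <= NORTH and WEST <= lon <= EAST):
        raise ValueError("point lies outside the 75N-65S coverage")
    # Points exactly on the southern or eastern boundary are put in the last cell.
    row = min(int(math.floor((NORTH - lat) * CELLS_PER_DEGREE)), N_ROWS - 1)
    col = min(int(math.floor((lon - WEST) * CELLS_PER_DEGREE)), N_COLS - 1)
    return row, col


def cell_width_km(lat: float, earth_circumference_km: float = 40075.0) -> float:
    """Approximate east-west width of one cell at a given latitude (spherical Earth)."""
    return earth_circumference_km / N_COLS * math.cos(math.radians(lat))


print(N_ROWS, N_COLS)                # 16800 43200
print(latlon_to_cell(0.0, 0.0))      # (9000, 21600)
print(round(cell_width_km(0.0), 3))  # 0.928
```

The cosine factor is why "about 1x1 km" holds only near the equator: cells shrink in the east-west direction as latitude increases.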

For a quick summary of the dataset, see Section I of Henderson et al. (2012). For a detailed discussion of the data, see Doll (2008).

The data is becoming popular among economists.

Henderson et al. (2012) and Pinkovskiy and Sala-i-Martin (2016) use nighttime light to improve on national accounts GDP data.

Michalopoulos and Papaioannou (2013, 2014) and Alesina et al. (2016) use nighttime light as a measure of living standards across African ethnic groups.

Hodler and Raschky (2014) exploit the annual panel nature of the data to find that the birthplace of a new national leader becomes brighter after he assumes power.

Baskaran et al. (2015) relate nighttime light to electoral cycles in India.

Storeygard (2016) uses light as a measure of city-level income across cities in Africa.

Bleakley and Lin (2012) use nighttime light as a measure of the spatial distribution of contemporary economic activity, to see whether portage sites still predict where economic activity is concentrated today, long after their original advantage became obsolete.

Data construction

To understand how this dataset is constructed from the original satellite images, and the potential data issues, see Elvidge et al. (2001) and Elvidge et al. (2010). Noor et al. (2008) is also useful for understanding the data. See also Alexei Abrahams's guest post for the Development Impact Blog.

Data issues

Digital number: it's "not exactly proportional to the physical amount of light received (called true radiance)," quoted from p. 999 of Henderson et al. (2012).

Top-coding: The maximum value of light intensity is 63, so the brightest pixels (typically in dense, rich city centres) are censored from above. This issue shouldn't matter much for poor and middle-income countries. Henderson et al. (2012) remove Singapore and Bahrain from their cross-country analysis because of this concern (see footnote 16).

Bottom-censoring: Henderson et al. (2012) note that there are "remarkably few pixels with digital numbers of 1 or 2" (p. 1000). Storeygard (2016) describes how the data processing algorithm causes bottom-censoring (see Appendix section A.8).
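
A quick way to gauge how much top-coding and bottom-censoring matter for a given country or region is to tabulate the share of lit pixels sitting at DN 63 and at DN 1 or 2. The Python sketch below does this for a flat list of pixel values; the function name and the choice of lit pixels (DN > 0) as the denominator are assumptions made for illustration.

```python
from collections import Counter


def dn_diagnostics(dns: list[int]) -> dict[str, float]:
    """Shares of lit pixels (DN > 0) that are top-coded (DN 63) or at the rare DN 1-2."""
    if any(not 0 <= d <= 63 for d in dns):
        raise ValueError("digital numbers must lie between 0 and 63")
    counts = Counter(dns)
    lit = sum(n for dn, n in counts.items() if dn > 0)
    if lit == 0:
        # No lit pixels at all: neither issue can arise.
        return {"top_coded": 0.0, "dn_1_or_2": 0.0}
    return {
        "top_coded": counts[63] / lit,
        "dn_1_or_2": (counts[1] + counts[2]) / lit,
    }


# Made-up pixel values, for illustration only
pixels = [0, 0, 0, 3, 5, 8, 12, 63, 63, 40]
print(dn_diagnostics(pixels))
```

A large top-coded share flags places like the city-states Henderson et al. (2012) drop; an unusually small DN 1-2 share relative to DN 3-5 is the bottom-censoring pattern they describe.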

Compatibility across years and satellites: Satellite sensors age over time and are replaced periodically. Thus, the same digital number does not necessarily mean the same level of light intensity across years and satellites. Henderson et al. (2012) deal with this concern by controlling for year fixed effects in a regression of log GDP on log light per area.
  • Alternatively, the following book chapter attempts to calibrate values from different satellites to account for inter-satellite differences and inter-annual sensor decay:
    • Elvidge, Christopher D., Feng-Chi Hsu, Kimberly E. Baugh and Tilottama Ghosh (2014). "National Trends in Satellite Observed Lighting: 1992-2012." Global Urban Monitoring and Assessment Through Earth Observation. Ed. Qihao Weng. CRC Press. (The working paper version is available here.)
    • The calibrated version aggregated to the 0.5x0.5 degree cell level is available as part of the PRIO-GRID data.
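
A minimal sketch of the year-fixed-effects idea, on simulated data: recorded light is given a hypothetical year-specific drift (standing in for sensor decay or a satellite switch), and one dummy per year absorbs it, so the slope on log light is estimated from variation within each year. The data, the drift, and the function name are made up, and only the year-dummy part of a specification like Henderson et al.'s (2012) is shown.

```python
import numpy as np


def fit_with_year_fe(log_gdp, log_light, years):
    """OLS of log GDP on log light per area plus one dummy per year (no common intercept).

    Returns the slope on log light and a dict of estimated year effects.
    """
    log_gdp = np.asarray(log_gdp, dtype=float)
    log_light = np.asarray(log_light, dtype=float)
    years = np.asarray(years)
    unique_years = np.unique(years)
    # One 0/1 column per year; these replace the usual constant term.
    dummies = (years[:, None] == unique_years[None, :]).astype(float)
    X = np.column_stack([log_light, dummies])
    coef, *_ = np.linalg.lstsq(X, log_gdp, rcond=None)
    return coef[0], dict(zip(unique_years.tolist(), coef[1:]))


# Simulated panel: 50 countries over 1992-1999, true elasticity 0.3
rng = np.random.default_rng(0)
n_countries, yrs = 50, np.arange(1992, 2000)
years = np.repeat(yrs, n_countries)
true_light = rng.normal(2.0, 1.0, size=years.size)
sensor_shift = 0.1 * (years - 1992)  # hypothetical drift in recorded light
log_light = true_light + sensor_shift
log_gdp = 8.0 + 0.3 * true_light + rng.normal(0.0, 0.05, size=years.size)

beta, year_effects = fit_with_year_fe(log_gdp, log_light, years)
print(round(beta, 2))
```

The slope should come out close to 0.3: because the drift is the same for every country in a given year, it shifts only the year effects, not the light coefficient.
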
Gas flares: The digital number picks up gas flares from oil production. Henderson et al. (2012) drop Equatorial Guinea from their cross-country analysis for this reason (footnote 16). In one of their robustness checks, Henderson et al. (2012) also drop pixels within gas flare polygons, as does Storeygard (2016).
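
Dropping pixels inside gas flare polygons amounts to a point-in-polygon test on each pixel centre. The sketch below uses the standard even-odd ray-casting rule, treating (longitude, latitude) as planar coordinates; the polygon and pixel values are hypothetical, and in practice a GIS library would usually handle this step.

```python
def point_in_polygon(x: float, y: float, polygon: list[tuple[float, float]]) -> bool:
    """Even-odd ray casting: cast a ray to the right and count edge crossings."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def drop_flare_pixels(pixels, flare_polygons):
    """Keep (lon, lat, dn) pixels whose centre lies outside every gas flare polygon."""
    return [p for p in pixels
            if not any(point_in_polygon(p[0], p[1], poly) for poly in flare_polygons)]


# Hypothetical flare polygon (lon, lat vertices) and pixel centres
flare = [(8.0, 3.0), (8.5, 3.0), (8.5, 3.5), (8.0, 3.5)]
pixels = [(8.2, 3.2, 63), (9.0, 3.2, 10), (7.9, 3.4, 5)]
print(drop_flare_pixels(pixels, [flare]))  # [(9.0, 3.2, 10), (7.9, 3.4, 5)]
```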

Blooming: Light tends to be magnified over certain terrain types such as water and snow cover.

Blurring: A single point source of light would be recorded in several neighbouring cells due to the way the satellite sensor captures the light emission. See Alexei Abrahams's guest post for the Development Impact Blog for more detail.

High-latitude locations: Because summer days are long at high latitudes, nighttime light cannot be observed there in summer (the raw satellite images are taken between 8:30 and 10:00 pm local time). For this reason, Henderson et al. (2012) exclude observations north of the Arctic Circle.

Validation as a measure of income/wealth

Logarithm of light intensity per area (and its long-run change over the 15-year period) is known to be linearly correlated with
Logarithm of light intensity per capita is known to be linearly correlated with
Pinkovskiy and Sala-i-Martin (2016) (p. 609) calibrate the exponent on the digital number to match the average income of the states in Mexico (obtained from Luxembourg Income Study). They note (fn. 20), "We allow the calibrated exponent to differ across years, but in no year is it smaller than 5/2, and in some years it is as large as 9. Therefore, it is likely that the specification that is prevalent in the literature (setting the exponent equal to unity) is incorrect."
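
Purely to illustrate what calibrating an exponent on the digital number can mean, the sketch below picks the exponent k for which the state-level measure log(mean(DN^k)) best fits log income, judged by R². This is a hypothetical procedure on simulated data, not necessarily the one used by Pinkovskiy and Sala-i-Martin (2016); the function name, the grid of candidates, and the simulated states are all assumptions.

```python
import numpy as np


def calibrate_exponent(state_pixels, log_income, grid):
    """Return the k in grid whose measure log(mean(DN**k)) best fits log income (max R^2)."""
    y = np.asarray(log_income, dtype=float)
    best_k, best_r2 = None, -np.inf
    for k in grid:
        x = np.array([np.log(np.mean(np.asarray(p, dtype=float) ** k)) for p in state_pixels])
        X = np.column_stack([np.ones_like(x), x])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        if r2 > best_r2:
            best_k, best_r2 = k, r2
    return best_k, best_r2


# Simulated states that differ in their share of bright pixels
rng = np.random.default_rng(1)
state_pixels = []
for share_bright in np.linspace(0.05, 0.6, 12):
    n_bright = int(200 * share_bright)
    dim = rng.integers(1, 11, size=200 - n_bright)  # DN 1-10
    bright = rng.integers(40, 64, size=n_bright)    # DN 40-63
    state_pixels.append(np.concatenate([dim, bright]))

# Simulated "true" incomes, built using an exponent of 3
log_income = [0.5 + np.log(np.mean(p.astype(float) ** 3)) for p in state_pixels]
grid = [1.0 + 0.5 * i for i in range(17)]  # 1.0, 1.5, ..., 9.0
k, r2 = calibrate_exponent(state_pixels, log_income, grid)
print(k, r2)
```

On this noise-free simulated data the search returns k = 3.0, the exponent used to build the incomes, because the fit there is exact. An exponent above one puts more weight on bright pixels, which is the direction of the quoted finding that the unit exponent common in the literature is likely incorrect.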

Validation as a measure of public goods provision

Michalopoulos and Papaioannou (2014) show that the logarithm of light intensity per area is correlated with access to electricity, the presence of a sewage system, access to piped water, and education (averaged across households in each enumeration area) from Afrobarometer Surveys in 17 African countries.

Min et al. (2013) validate this measure against a survey-based measure of electricity access in rural Senegal and Mali in 2011. Their conclusions (quoted from Min and Gaba 2014, p. 9512) are:
  • Electrified villages are consistently brighter than unelectrified villages across a variety of nighttime satellite images
  • Electrified villages appear brighter in satellite imagery because of the presence of streetlights, and brightness increases with the number of streetlights.
  • The correlation between light output recorded by the satellite with household electricity use and access is low.
Min and Gaba (2014) conduct the same validation exercise for villages in Vietnam in 2013. They reach the same conclusions except for the last point: in Vietnam, household-level access to electricity is also correlated with nighttime light satellite images.

See also Chen and Nordhaus (2011).

Aggregation methods

The raw data ranges from 0 to 63 at the level of 30x30 arc-second cells. There are several ways to aggregate it for use in regression analysis.
  • Henderson et al. (2012) (see footnote 7) obtain the weighted average across pixels within a country, where the weight is the land area of each 30x30 arc-second pixel, obtained from CIESIN/IFPRI/CIAT (2004).
  • Michalopoulos and Papaioannou (2013, 2014) and Hodler and Raschky (2014) use the logarithm of light intensity per area within each spatial unit of analysis.
    • Logarithmic transformation is used because the distribution of nighttime light intensity is right-skewed with around 10% of observations being zero.
    • 0.01 is added to the average before taking logs, so that the roughly 10% of observations without light are not dropped.
  • Alesina et al. (2016) and Baskaran et al. (2015) use the average or sum of light values from all pixels within each spatial unit of analysis, divided by population.
  • Baskaran et al. (2015) also measure the proportion of villages with a positive value of nighttime light at the village centroid.
  • Storeygard (2016) measures city-level light intensity as follows: he first converts the original data "into one binary grid encoding whether a pixel was lit in at least one satellite-year. These ever-lit areas were then converted to polygons; contiguous ever-lit pixels were aggregated, and their DNs were summed within each satellite-year." (p. 1268)
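
The aggregation rules above can be sketched in plain Python as follows. The function names, the use of summed DN per km² as "light per area", and 8-neighbour contiguity in the Storeygard-style clustering are assumptions for illustration, not the papers' own code.

```python
import math
from collections import deque


def area_weighted_mean(dns, areas_km2):
    """Average DN across pixels, weighting each pixel by its land area."""
    return sum(d * a for d, a in zip(dns, areas_km2)) / sum(areas_km2)


def log_light_per_area(dns, total_area_km2, eps=0.01):
    """log(eps + summed DN per km^2); eps keeps units with no light in the sample."""
    return math.log(eps + sum(dns) / total_area_km2)


def light_per_capita(dns, population):
    """Summed DN of a spatial unit divided by its population."""
    return sum(dns) / population


def share_lit_villages(centroid_dns):
    """Fraction of villages whose centroid pixel has a positive DN."""
    return sum(1 for d in centroid_dns if d > 0) / len(centroid_dns)


def ever_lit_clusters(grids):
    """Sum DN within contiguous ever-lit areas, one total per grid (satellite-year).

    grids: equally sized 2-D lists of DNs. Contiguity uses the 8 neighbouring
    cells, which is an assumption of this sketch.
    """
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    # Binary grid: lit in at least one satellite-year
    ever_lit = [[any(g[r][c] > 0 for g in grids) for c in range(n_cols)] for r in range(n_rows)]
    label = [[-1] * n_cols for _ in range(n_rows)]
    clusters = []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if not ever_lit[r0][c0] or label[r0][c0] >= 0:
                continue
            # Breadth-first search collects one contiguous ever-lit area
            cells, queue = [], deque([(r0, c0)])
            label[r0][c0] = len(clusters)
            while queue:
                r, c = queue.popleft()
                cells.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < n_rows and 0 <= cc < n_cols
                                and ever_lit[rr][cc] and label[rr][cc] < 0):
                            label[rr][cc] = len(clusters)
                            queue.append((rr, cc))
            clusters.append(cells)
    # totals[i][t]: summed DN of cluster i in satellite-year t
    return [[sum(g[r][c] for r, c in cells) for g in grids] for cells in clusters]


# Two made-up satellite-years on a tiny grid
year1 = [[0, 3, 0, 0],
         [0, 5, 0, 7],
         [0, 0, 0, 0]]
year2 = [[0, 4, 0, 0],
         [0, 6, 0, 0],
         [0, 0, 0, 9]]
print(ever_lit_clusters([year1, year2]))  # [[8, 10], [7, 9]]
```

Note how the second cluster is defined by the union of lit pixels across years, so its total in each year includes pixels that happened to be dark that year.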

Real GDP per capita

World Development Indicators (WDI) - in current/constant local currency unit and in current/constant US dollars since 1960


Penn World Table (PWT) - in purchasing power parity since 1950

See here for my rough summary of data construction.

See Nuxoll (1994) for the validity of using economic growth rates from Penn World Table.

See also Feenstra et al. (2004).

For version 5.6, there is an augmented version constructed by Fearon and Laitin (2003), which is used by Miguel et al. (2004) and hence included in their dataset.

Comparison of WDI vs PWT

Discussing PWT version 6, Johnson et al. (2013) argue that while PWT is good at cross-country comparison, economic growth is better measured by WDI. See also Ciccone and Jarocinski (2010).

See Pinkovskiy and Sala-i-Martin's working paper "Newer Need Not Be Better: Evaluating the Penn World Tables and the World Development Indicators Using Nighttime Lights" for an assessment of whether PWT versions 7 and 8 do any better.


Angus Maddison (2003) The World Economy: Historical Statistics (Paris: OECD)

Annual data, wherever available, from 1820 to 2001.

Data for 1500, 1600, and 1700 is also available and is used in Acemoglu, Johnson, and Robinson's (2005) "The Rise of Europe" paper.

Downloadable from the book's website (you need the username and password printed at the end of the book's table of contents).

Used by Acemoglu and Johnson (2006) in their analysis of the effect of life expectancy on economic growth between 1940 and 1980.

Used also by Persson and Tabellini (2006).


Barro-Ursua Macroeconomic Data

An attempt to correct Maddison's data. Used by Barro and Ursua "Rare Macroeconomic Disasters" and Barro "Convergence and Modernization Revisited".

Downloadable from Robert Barro's website.

Regional Development Data

GDP for 1569 subnational regions in 110 countries in 2005 is compiled by Gennaioli et al. (2013). The data is downloadable from Andrei Shleifer's website.

GDP for 1528 subnational regions in 83 countries at different points in time (wherever available) is compiled by Gennaioli et al. (2014). The data is downloadable from Andrei Shleifer's website.

An alternative measure of subnational-level living standards is the wealth index provided by DHS surveys.
  • See Rutstein and Johnson 2004 for the methodology to construct the index.
  • DHS surveys are usually representative at the first level of administrative boundaries (e.g., provinces).
  • Briggs (2015) uses the DHS wealth index to see whether foreign aid reaches the poorer regions of a country.
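
The principal-components idea behind an asset-based wealth index can be sketched as below: standardize household asset indicators and score each household on the first principal component. This is a simplified illustration, assuming 0/1 ownership dummies only; it is not the full DHS procedure described by Rutstein and Johnson (2004), and the function name is hypothetical.

```python
import numpy as np


def wealth_index(assets):
    """First-principal-component score of standardized household asset indicators.

    assets: (households x indicators) array of 0/1 ownership dummies.
    """
    X = np.asarray(assets, dtype=float)
    sd = X.std(axis=0)
    keep = sd > 0  # indicators with no variation carry no information
    Z = (X[:, keep] - X[:, keep].mean(axis=0)) / sd[keep]
    # The first right singular vector of Z gives the first principal component loadings.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[0]
    if loadings.sum() < 0:
        # The sign of a principal component is arbitrary; orient it so more assets = richer.
        loadings = -loadings
    return Z @ loadings


# Four made-up households owning progressively fewer of three assets
scores = wealth_index([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]])
print(scores)
```

Averaging such household scores by region gives a subnational living-standards measure, subject to the representativeness caveat above.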

Thursday, August 11, 2016

Brazilian census

Bustos et al. (2016) use 2000 and 2010 censuses to measure the share of employment in agriculture, manufacturing, and services at municipality level.