Monday, February 4, 2019

Penn World Table (PWT)

The data source for real GDP per capita across countries over time. Available at

A good overview of how different versions of Penn World Table correspond to each other is given by Section 2.1 of Pinkovskiy and Sala-i-Martin (2016).

Section 2.2 of the same paper is also useful for deciding whether to use PWT or World Development Indicators for real GDP per capita.

The major change from Version 7 to 8 was, if I understand correctly, a response to the critique by Johnson et al. (2009).

Below is my own rough summary of how real GDP per capita is computed up to Version 7. For more detailed accounts, see Johnson et al. (2009).

1. Collect data on the prices of hundreds of identically specified goods and services in each "benchmark" country (this is done by the United Nations International Comparison Program, or ICP).
The PWT version 6 uses the 1993 ICP data. As the 2005 ICP data is now released, GDP figures in international dollars are likely to change. See Arvind Subramanian's article on Dani Rodrik's blog.

2. Obtain PPPs for the benchmark countries by comparing the prices of each good and service.

3. Use capital city price surveys by United Nations International City Service Commission, Employment Conditions Abroad (a British firm), and the US State Department, to estimate PPPs for a wider range of countries.

4. By regressing PPPs obtained in step 2 on PPPs obtained in step 3 for the sample of benchmark countries, PPPs for non-benchmark countries are estimated based on their PPP estimates obtained in step 3.

5. Use PPPs to convert the countries' national currency expenditures (from national accounts) to a common currency unit.

Steps 1-5 were carried out for the base year (1985 for PWT version 5; 1996 for PWT version 6.1).

6. Real GDP per capita in PPP for other years is obtained by applying the growth rates from the constant-price national accounts series to the base-year real GDP per capita.

See pages 329, 341-4 of Robert Summers and Alan Heston (1991) "The Penn World Table (Mark 5): An Expanded Set of International Comparisons, 1950-1988" Quarterly Journal of Economics, 106, pp.327-368.
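
Steps 4 and 6 above can be sketched in Python as follows. This is only an illustration under my own assumptions: the regression is run in levels with a single regressor (the actual PWT specification may differ), and the function names and all numbers are made up.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one regressor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def extrapolate(base_value, base_year, growth):
    """Step 6: extend a base-year value using national accounts growth rates.

    growth maps year t to the growth rate between t-1 and t (0.02 means 2%).
    """
    series = {base_year: base_value}
    # Forward from the base year: value[t] = value[t-1] * (1 + g[t]).
    year = base_year + 1
    while year in growth:
        series[year] = series[year - 1] * (1.0 + growth[year])
        year += 1
    # Backward from the base year: value[t-1] = value[t] / (1 + g[t]).
    year = base_year
    while year in growth:
        series[year - 1] = series[year] / (1.0 + growth[year])
        year -= 1
    return series


# Step 4 (hypothetical numbers): benchmark countries have both kinds of PPP.
survey_ppp = [1.0, 2.0, 3.0]   # from capital city price surveys (step 3)
icp_ppp = [1.5, 2.5, 3.5]      # from ICP benchmark prices (step 2)
intercept, slope = fit_line(survey_ppp, icp_ppp)

# A non-benchmark country only has a survey-based PPP; predict its ICP-type PPP.
predicted_ppp = intercept + slope * 4.0

# Step 6 (hypothetical numbers): base year 1996, growth rates for 1996 and 1997.
gdp_series = extrapolate(100.0, 1996, {1996: 0.25, 1997: 0.10})
```

Because only the base-year level comes from the PPP comparison, any revision to the base-year prices shifts the whole series for a country, while its growth rates stay those of the national accounts.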

Sunday, February 3, 2019

DMSP-OLS Nighttime Lights Time Series Version 4

Downloadable here. The spatial resolution is 30x30 arc-second (about 1x1 km) across the globe between 75 degrees north and 65 degrees south. Available annually since 1992 (and up to 2013, as of August 2016). The nighttime light intensity in each cell is represented by the "digital number", an integer from 0 to 63.
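
To see why "about 1x1 km" is only approximate, here is a rough Python sketch of the size of a 30 arc-second cell at a given latitude. It assumes a spherical Earth with a radius of 6371 km, which is my simplification and not something taken from the dataset documentation; the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0       # assumption: mean radius of a spherical Earth
CELL_DEGREES = 30.0 / 3600.0   # 30 arc-seconds expressed in degrees


def cell_size_km(latitude_deg):
    """Return (width, height) in km of a 30x30 arc-second cell at this latitude."""
    step = math.radians(CELL_DEGREES)
    height = EARTH_RADIUS_KM * step                          # north-south extent
    width = height * math.cos(math.radians(latitude_deg))    # east-west extent
    return width, height


width_equator, height_equator = cell_size_km(0.0)
width_60, height_60 = cell_size_km(60.0)
```

Under this approximation the cell is roughly 0.93 km on each side at the equator, and its east-west width shrinks with the cosine of latitude, so cells cover less land area at higher latitudes.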

For a quick summary of the dataset, see Section I of Henderson et al. (2012). For detailed discussion on the data, see Doll (2008).

The data is becoming popular among economists.

Henderson et al. (2012) and Pinkovskiy and Sala-i-Martin (2016) use nighttime light to improve the data on national accounts GDP.

Michalopoulos and Papaioannou (2013, 2014), and Alesina et al. (2016) use nighttime light as a measure of living standards across African ethnic groups.

Hodler and Raschky (2014) exploit the annual panel nature of the data to find that the birth place of a new national leader becomes brighter after he assumes power.

Baskaran et al. (2015) relate nighttime light to electoral cycles in India.

Storeygard (2016) uses light as a measure of city-level income across cities in Africa.

Bleakley and Lin (2012) use nighttime light as a measure of the spatial distribution of contemporary economic activity, to see whether portage sites still predict where economic activity is concentrated today, long after their original advantage became obsolete.

Data construction

To understand how this dataset is constructed from the original satellite images and the potential data issues, see Elvidge et al. (2001) and Elvidge et al. (2010). Noor et al. (2008) is also useful to understand this data. See also Alexei Abrahams's guest post for Development Impact Blog.

Data issues

Digital number: it's "not exactly proportional to the physical amount of light received (called true radiance)," quoted from p. 999 of Henderson et al. (2012).

Top-coding: The maximum value of light intensity is 63. This issue shouldn't matter much for poor and middle-income countries. Henderson et al. (2012) remove Singapore and Bahrain from their cross-country analysis because of this concern (see footnote 16).

Bottom-censoring: Henderson et al. (2012) note that there are "remarkably few pixels with digital numbers of 1 or 2" (p. 1000). Storeygard (2016) describes how the data processing algorithm causes bottom-censoring (see Appendix section A.8).
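
A quick way to gauge how much top-coding and bottom-censoring matter for a given sample is to tabulate the digital numbers. The sketch below is my own illustration in Python; the function name and the pixel values are hypothetical.

```python
from collections import Counter


def dn_shares(dns, top_code=63):
    """Share of pixels that are unlit, have DN 1 or 2, or sit at the top code."""
    counts = Counter(dns)
    n = len(dns)
    return {
        "unlit": counts[0] / n,
        "dn_1_or_2": (counts[1] + counts[2]) / n,
        "top_coded": counts[top_code] / n,
    }


# Hypothetical digital numbers for ten pixels in one region.
pixels = [0, 0, 0, 3, 5, 63, 63, 12, 1, 40]
shares = dn_shares(pixels)
```

A large "top_coded" share suggests the region resembles the rich, dense places that Henderson et al. (2012) drop, while a near-zero "dn_1_or_2" share next to a large "unlit" share is what bottom-censoring would look like.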

Compatibility across years and satellites: Satellite sensors age over time and are replaced periodically. Thus, the same digital number does not necessarily mean the same level of light intensity across years and satellites. Henderson et al. (2012) deal with this concern by controlling for year fixed effects in a regression of log GDP on log light per area.
  • Alternatively, the following book chapter attempts to calibrate values from different satellites to account for inter-satellite differences and inter-annual sensor decay:
    • Elvidge, Christopher D., Feng-Chi Hsu, Kimberly E. Baugh and Tilottama Ghosh (2014). "National Trends in Satellite Observed Lighting: 1992-2012." Global Urban Monitoring and Assessment Through Earth Observation. Ed. Qihao Weng. CRC Press. (The working paper version is available here.)
    • The calibrated version aggregated to the 0.5x0.5 degree cell level is available as part of the PRIO-GRID data.
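
The year fixed effects approach can be illustrated with a small Python sketch. It relies on the standard result that a regression with year dummies gives the same slope as a regression on variables demeaned within each year. The data are made up, and the real specification in Henderson et al. (2012) includes more than what is shown here.

```python
from collections import defaultdict


def demean_by_year(years, values):
    """Subtract the within-year mean from each value."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for year, value in zip(years, values):
        totals[year] += value
        counts[year] += 1
    return [value - totals[year] / counts[year]
            for year, value in zip(years, values)]


def slope_with_year_fe(years, log_light, log_gdp):
    """Slope of log GDP on log light after absorbing year fixed effects."""
    x = demean_by_year(years, log_light)
    y = demean_by_year(years, log_gdp)
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Hypothetical data: three countries observed in two years.
years = [1992, 1992, 1992, 2000, 2000, 2000]
true_log_light = [1.0, 2.0, 3.0, 1.5, 2.5, 4.0]
sensor_shift = {1992: 0.0, 2000: 0.7}   # assumed year-specific sensor drift
observed_log_light = [t + sensor_shift[y] for t, y in zip(true_log_light, years)]
log_gdp = [5.0 + 0.3 * t for t in true_log_light]

elasticity = slope_with_year_fe(years, observed_log_light, log_gdp)
```

In this toy example the drift is common to all countries within a year, so demeaning removes it and the slope of 0.3 is recovered. Drift that differs across places within a year would not be absorbed this way.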

Gas flare: The digital number picks up gas flares caused by oil production. Henderson et al. (2012) drop Equatorial Guinea from their cross-country analysis for this reason (footnote 16). In one of their robustness checks, Henderson et al. (2012) also drop pixels within gas flare polygons, as does Storeygard (2016).

Blooming: Light tends to be magnified over certain terrain types such as water and snow cover.

Blurring: A single point source of light would be recorded in several neighbouring cells due to the way the satellite sensor captures the light emission. See Alexei Abrahams's guest post for Development Impact Blog for more detail.

  • To deblur the data with Abrahams's Matlab code, you need the pct_lights.tif files. Unfortunately, the file for 2011 is missing from the website. If you have downloaded this file and still have it somewhere on your computer, let the people at NOAA know.

High latitude locations: Due to long daytime length, nighttime light cannot be observed in summer for high latitude locations (the raw satellite images are taken between 8:30 and 10:00 pm local time). For this reason, Henderson et al. (2012) exclude observations north of the Arctic Circle.

Validation as a measure of income/wealth

Logarithm of light intensity per area (and its long-run change over the 15-year period) is known to be linearly correlated with
Logarithm of light intensity per capita is known to be linearly correlated with
Pinkovskiy and Sala-i-Martin (2016) (p. 609) calibrate the exponent on the digital number to match the average income of the states in Mexico (obtained from Luxembourg Income Study). They note (fn. 20), "We allow the calibrated exponent to differ across years, but in no year is it smaller than 5/2, and in some years it is as large as 9. Therefore, it is likely that the specification that is prevalent in the literature (setting the exponent equal to unity) is incorrect."
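
To see why the exponent matters, here is a small Python sketch that aggregates digital numbers raised to a power. It does not reproduce the calibration in Pinkovskiy and Sala-i-Martin (2016); the function name and pixel values are my own hypothetical example.

```python
def light_index(dns, exponent=1.0):
    """Sum of digital numbers, each raised to the given exponent."""
    return sum(dn ** exponent for dn in dns)


# Hypothetical regions: four dim pixels versus one bright pixel.
region_dim = [10, 10, 10, 10]
region_bright = [40]

# With exponent 1 (the usual specification) both regions look equally bright.
ratio_linear = light_index(region_bright, 1.0) / light_index(region_dim, 1.0)

# With exponent 5/2 the single bright pixel dominates.
ratio_convex = light_index(region_bright, 2.5) / light_index(region_dim, 2.5)
```

With an exponent above one, concentrated bright pixels count for much more than the same total digital number spread over many dim pixels, so the ranking of regions can change with the exponent.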

Validation as a measure of public goods provision

Michalopoulos and Papaioannou (2014) show that the logarithm of light intensity per area is correlated with access to electrification, presence of a sewage system, access to piped water, and education (averaged across households in each enumeration area) from Afrobarometer Surveys in 17 African countries.

Min et al. (2013) validate this measure against a survey-based measure of electricity access in rural Senegal and Mali in 2011. Their conclusions (quoted from Min and Gaba 2014, p. 9512) are:
  • Electrified villages are consistently brighter than unelectrified villages across a variety of nighttime satellite images
  • Electrified villages appear brighter in satellite imagery because of the presence of streetlights, and brightness increases with the number of streetlights.
  • The correlation between light output recorded by the satellite with household electricity use and access is low.
Min and Gaba (2014) conduct the same validation exercise for villages in Vietnam in 2013. They reach the same conclusions except for the last point: in Vietnam, household-level access to electricity is also correlated with nighttime light satellite images.

See also Chen and Nordhaus (2011).

Aggregation methods

The raw data ranges from 0 to 63 at the level of 30x30 arc-second cells. There are several ways to aggregate the raw data for use in regression analysis.
  • Henderson et al. (2012) (see footnote 7) obtain the weighted average across pixels within a country, where the weight is the land area of each 30x30 arc-second pixel, obtained from CIESIN/IFPRI/CIAT (2004).
  • Michalopoulos and Papaioannou (2013, 2014) and Hodler and Raschky (2014) use the logarithm of light intensity per area within each spatial unit of analysis.
    • Logarithmic transformation is used because the distribution of nighttime light intensity is right-skewed with around 10% of observations being zero.
    • 0.01 is added to the average before taking the log, so that the roughly 10% of observations without light are kept in the sample.
  • Alesina et al. (2016) and Baskaran et al. (2015) use the average or sum of light values from all pixels within each spatial unit of analysis divided by population.
  • Baskaran et al. (2015) also measure the proportion of villages with a positive value of nighttime light at the village centroid.
  • Storeygard (2016) measures city-level light intensity as follows: first convert the original data "into one binary grid encoding whether a pixel was lit in at least one satellite-year. These ever-lit areas were then converted to polygons; contiguous ever-lit pixels were aggregated, and their DNs were summed within each satellite-year." (p. 1268)
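
The aggregation methods above can be sketched in Python as follows. This is my own illustration, not the authors' code. The function names and toy grids are hypothetical, and treating pixels as contiguous only when they share an edge (not a corner) is an assumption on my part.

```python
import math
from collections import deque


def area_weighted_mean(dns, areas):
    """Average digital number across pixels, weighted by pixel land area."""
    return sum(d * a for d, a in zip(dns, areas)) / sum(areas)


def log_light_per_area(dns, areas, offset=0.01):
    """Log of average light, with a small offset so unlit units are kept."""
    return math.log(offset + area_weighted_mean(dns, areas))


def light_per_capita(dns, population):
    """Sum of digital numbers in a spatial unit divided by its population."""
    return sum(dns) / population


def ever_lit_clusters(grids):
    """Group contiguous ever-lit pixels and sum their DNs per satellite-year.

    grids is a list of equally sized 2D lists, one per satellite-year.
    """
    rows, cols = len(grids[0]), len(grids[0][0])
    ever_lit = [[any(g[r][c] > 0 for g in grids) for c in range(cols)]
                for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if not ever_lit[r][c] or seen[r][c]:
                continue
            # Breadth-first search over edge-sharing ever-lit neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                i, j = queue.popleft()
                pixels.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and ever_lit[ni][nj] and not seen[ni][nj]):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            sums = [sum(g[i][j] for i, j in pixels) for g in grids]
            clusters.append({"pixels": pixels, "sums": sums})
    return clusters


# Hypothetical 3x3 grids for two satellite-years.
grids = [
    [[5, 0, 0], [0, 0, 3], [0, 0, 0]],
    [[0, 4, 0], [0, 0, 2], [0, 0, 1]],
]
clusters = ever_lit_clusters(grids)
```

In the toy grids, two clusters are found because the lit pixels in the top row touch the lit pixels in the right column only at a corner. Note also that a pixel lit in just one satellite-year still belongs to its cluster in every year, which keeps the cluster boundaries fixed over time.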

Monday, December 17, 2018

Botswana 1946 census

Bechuanaland Population and Housing Census of 1946

Acemoglu and Robinson (2012), p. 412, cite it as the last census of Botswana asking questions about ethnicity. "In the Ngwato reserve, for example, only 20 percent of the population identified themselves as pure Ngwato; though there were other Tswana tribes present, there were also many non-Tswana groups whose first language was not Setswana."

Monday, November 12, 2018

Doing Business surveys

Annual cross-country data on regulations, collected by the World Bank since 2004. As Djankov (2016) explains, it originated in academic papers written by Andrei Shleifer and his coauthors.

The data is available for free at the World Bank's website.

Besley (2015) discusses pros and cons of this dataset, including his own finding that the correlation between the Doing Business data and firm survey data is not always as expected (Table 2).

Monday, November 5, 2018

Real GDP per capita

World Development Indicators (WDI) - in current/constant local currency unit and in current/constant US dollars since 1960


Penn World Table (PWT) - in purchasing power parity since 1950

See here for my rough summary of data construction.

See Nuxoll (1994) for the validity of using economic growth rates from Penn World Table.

See also Feenstra et al. (2004).

For version 5.6, there is an augmented version constructed by Fearon and Laitin (2003). It is used by Miguel et al. (2004) and is therefore contained in their dataset.

Comparison of WDI vs PWT

Discussing PWT version 6, Johnson et al. (2013) argue that while PWT is good at cross-country comparison, economic growth is better measured by WDI. See also Ciccone and Jarocinski (2010).

See Pinkovskiy and Sala-i-Martin's working paper "Newer Need Not Be Better: Evaluating the Penn World Tables and the World Development Indicators Using Nighttime Lights" for whether PWT versions 7 and 8 do any better.


Angus Maddison (2003) The World Economy: Historical Statistics (Paris: OECD)

Annual data entries, wherever possible, from 1820 until 2001.

Data for 1500, 1600, and 1700 is also available, and is used in Acemoglu, Johnson, and Robinson (2005)'s "The Rise of Europe" paper.

Downloadable from the book's website (you need the username and password written at the end of the Table of Contents in the book).

Used by Acemoglu and Johnson (2006) for their analysis on the effect of life expectancy on economic growth between 1940 and 1980.

Used also by Persson and Tabellini (2006).

For the latest updated data, see Maddison Project Database (Bolt, Jutta, and Jan Luiten van Zanden, “The Maddison Project: Collaborative Research on Historical National Accounts,” Economic History Review, 67 (2014), 627–651.)


Barro-Ursua Macroeconomic Data

An attempt to correct Maddison's data. Used by Barro and Ursua "Rare Macroeconomic Disasters" and Barro "Convergence and Modernization Revisited".

Downloadable from Robert Barro's website.

Jones-Klenow well-being measure across countries

Constructed by Jones and Klenow (2016). Quote from their abstract:
We propose a summary statistic for the economic well-being of people in a country. Our measure incorporates consumption, leisure, mortality, and inequality, first for a narrow set of countries using detailed micro data, and then more broadly using multi-country datasets.
Data can be downloaded from the AER website.

Sunday, November 4, 2018

Global Preference Survey

"an experimentally validated survey data set of time preference, risk preference, positive and negative reciprocity, altruism, and trust from 80,000 people in 76 countries" (Falk et al. (2018), abstract)

Introduced by Falk et al. (2018).