Monday, January 29, 2018

Cross-country "years of education" datasets

Penn World Table 9.0

See this document, which reviews the academic debate on the quality of the Barro-Lee dataset.

Cohen and Soto (2007) provide an alternative dataset to Barro and Lee's (see below).

Barro-Lee dataset

A well-known dataset on average years of schooling (i.e., the stock of human capital) by five-year age group for 146 countries from 1950 to 2010. See Barro and Lee (2013) for details. To download the data, visit the dataset's website; for data sources, see Appendix Notes.

For details on the data construction, read Robert J. Barro and Jong-Wha Lee, "International Data on Educational Attainment: Updates and Implications" (CID Working Paper No. 42, April 2000). This 2000 paper is an updated version of Barro and Lee (1993). Both papers compare various measures of human capital.

Average years of schooling are available for six population groups: males over 25, females over 25, all persons over 25, males over 15, females over 15, and all persons over 15.

Population over the age of 15 "corresponds better to the labor force for many developing countries." (Barro and Lee 2000, p.2)

Percentages of those who attained or completed each level of schooling are also available for the total, male, and female populations. Note that the variables LU, LP, LS, and LH sum to 100, and that Lx - LxC (where x is P, S, or H) is the percentage of those who dropped out before completing primary, secondary, or higher education, respectively. In other words, the percentage for "... school attained" includes the percentage for "... school complete".
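The relationships above can be checked mechanically when loading the data. Below is a minimal Python sketch, assuming each country-year-group record has been read into a dict keyed by the variable names mentioned above (LU, LP, LPC, LS, LSC, LH, LHC). The helper name `dropout_shares`, the 0.5-point rounding tolerance, and the sample values are all hypothetical, not taken from the dataset.

```python
from math import isclose

# Levels for which both "attained" (Lx) and "complete" (LxC) shares are reported.
LEVELS = ("P", "S", "H")  # primary, secondary, higher


def dropout_shares(row: dict[str, float], tol: float = 0.5) -> dict[str, float]:
    """Return Lx - LxC, the percent who dropped out before completing level x.

    `row` is a hypothetical record keyed by Barro-Lee variable names;
    `tol` is an assumed allowance for rounding in published percentages.
    """
    # LU + LP + LS + LH should account for the whole population group.
    total = row["LU"] + row["LP"] + row["LS"] + row["LH"]
    if not isclose(total, 100.0, abs_tol=tol):
        raise ValueError(f"LU+LP+LS+LH = {total:.2f}, expected 100")

    dropouts = {}
    for x in LEVELS:
        attained, completed = row[f"L{x}"], row[f"L{x}C"]
        # "Attained" includes "complete", so completion cannot exceed attainment.
        if completed > attained + tol:
            raise ValueError(f"L{x}C ({completed}) exceeds L{x} ({attained})")
        dropouts[x] = attained - completed
    return dropouts


# Made-up illustrative values, not real data.
example = {"LU": 20.0, "LP": 40.0, "LPC": 25.0, "LS": 30.0,
           "LSC": 18.0, "LH": 10.0, "LHC": 6.0}
print(dropout_shares(example))  # {'P': 15.0, 'S': 12.0, 'H': 4.0}
```

Raising an error rather than silently correcting keeps suspicious records visible, which matters given the data-quality caveats noted further down.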

Downloadable from this page of the Center for International Development (CID) at Harvard University.

The data file in panel format is best avoided because it excludes countries not covered by Penn World Table 5.0 (e.g., former socialist countries).
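If both the panel-format file and a file with full country coverage are at hand, the countries lost in the panel version can be listed directly. A minimal sketch, assuming both are CSV files with a country-name column; the column name `country`, the helper names, and the sample rows are hypothetical, so adjust them to the actual headers.

```python
import csv
import io
from typing import TextIO


def countries_in(f: TextIO, key: str = "country") -> set[str]:
    """Distinct, whitespace-stripped values of the (assumed) country column."""
    return {row[key].strip() for row in csv.DictReader(f)}


def dropped_by_panel(full: TextIO, panel: TextIO, key: str = "country") -> list[str]:
    """Countries present in the full file but missing from the panel-format file."""
    return sorted(countries_in(full, key) - countries_in(panel, key))


# Tiny in-memory stand-ins for the two files (placeholder names, not real data).
full_csv = io.StringIO("country,year\nCountry A,1990\nCountry B,1990\nCountry C,1990\n")
panel_csv = io.StringIO("country,year\nCountry B,1990\n")
print(dropped_by_panel(full_csv, panel_csv))  # ['Country A', 'Country C']
```

With real files, open each one with `open(path, newline="")`, as the `csv` module documentation recommends.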

Note that the variable SHCODE (the numerical country code used in Penn World Table 5.0) differs from the country code used in Penn World Table 5.6.
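Because the codes differ across PWT versions, joining on SHCODE against a later PWT release can silently pair the wrong countries. A safer pattern is to join through an explicit crosswalk that maps each SHCODE to a version-independent identifier. The sketch below assumes such a crosswalk has been built by hand (for example, from country names); the field names `iso` and `yrs`, the codes, and the function name are placeholders, not actual variables or values from either dataset.

```python
from typing import Any, Hashable, Iterable, Mapping

Row = Mapping[str, Any]


def merge_via_crosswalk(
    bl_rows: Iterable[Row],
    pwt_rows: Iterable[Row],
    crosswalk: Mapping[Hashable, str],
) -> tuple[list[dict[str, Any]], list[Hashable]]:
    """Join Barro-Lee rows to PWT rows on (stable country id, year).

    bl_rows:   dicts with 'SHCODE' (PWT 5.0 numbering) and 'year'
    pwt_rows:  dicts with 'iso' (hypothetical stable id) and 'year'
    crosswalk: hypothetical hand-built map SHCODE -> stable id
    Returns the merged rows and the SHCODEs that had no crosswalk entry.
    """
    # Index PWT observations by (id, year) for constant-time lookup.
    pwt_index = {(r["iso"], r["year"]): r for r in pwt_rows}
    merged: list[dict[str, Any]] = []
    unmapped: list[Hashable] = []
    for r in bl_rows:
        iso = crosswalk.get(r["SHCODE"])
        if iso is None:
            # Report the gap instead of matching SHCODE against later PWT codes.
            unmapped.append(r["SHCODE"])
            continue
        match = pwt_index.get((iso, r["year"]))
        if match is not None:  # rows with no PWT observation are skipped
            merged.append({**r, **match})
    return merged, unmapped


# Placeholder codes and values, not real data.
bl = [{"SHCODE": 1, "year": 1990, "yrs": 5.0},
      {"SHCODE": 2, "year": 1990, "yrs": 7.5}]
pwt = [{"iso": "AAA", "year": 1990, "pop": 10.0}]
crosswalk = {1: "AAA"}  # SHCODE 2 deliberately left unmapped
merged, unmapped = merge_via_crosswalk(bl, pwt, crosswalk)
print(len(merged), unmapped)  # 1 [2]
```

Returning the unmapped codes, rather than dropping them quietly, makes it easy to see which countries still need a crosswalk entry.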

A very minor point, but the data entries for USSR/Russia in 1990 seem unreliable: the population figures seem to refer to the USSR, while the educational attainment figures seem to refer to Russia.

Papers using this dataset include Acemoglu et al. (2005) and Glaeser et al. (2007).

For other datasets on average schooling years, see Kyriacou (1991), which is used by Benhabib and Spiegel (1994, JME), and Nehru et al. (1995), which is used by Pritchett (2000).

See Krueger and Lindahl (2001, JEL) for a critical review of average-years-of-schooling data.


Paul said...

The Soto and Cohen dataset (Daniel Cohen & Marcelo Soto, 2007. "Growth and human capital: good data, good results," Journal of Economic Growth, Springer, vol. 12(1), pages 51-76) is also very good, and improves on B&L in a number of ways.

It is available here...

Paul said...

You should also mention the work on schooling quality and the datasets used for it: Barro's 2001 dataset includes international test scores, as does Altinok (2007), which extends Hanushek's work to include the African surveys.

kdmtz said...

Thank you very much, Paul, for the very useful feedback!

Paul said...

No worries. You should start a section for comparable education quality stats - with the new round of SACMEQ due soon there will be plenty out there, and not a lot of people with both the skills and interest to do it. I'm happy to write it and e-mail it across.

lucas said...

There is also the dataset on schooling inputs by Barro and Lee ("International Measures of Schooling Years and Schooling Quality," AER, 1996). It includes pupil-teacher ratios, spending per student, and teacher salaries. It is available here:,,contentMDK:20699068~pagePK:64214825~piPK:64214943~theSitePK:469382,00.html

I asked Prof. Lee, though, and he said there has been no update to that dataset.