This Methodological Note was written by Emmanuel Letouzé, Gabriel Pestre, and Emilio Zagheni. Funding for this paper was generously provided by the World Bank Group. Data-Pop Alliance is currently developing this note as one of the three inputs for the World Bank’s 2016 World Development Report: “Digital Dividends” (forthcoming).
This paper addresses the modelling and correction of sample bias in Call Detail Records (CDRs). A proper understanding of sample bias is key to producing useful estimates from CDRs: such estimates depend on how the sample (cell-phone users) relates to the larger populations it is drawn from (in the cases in point, Senegal and Ivory Coast and their administrative subdivisions). This work could have major applications in crisis monitoring and response, as in the case of flood vulnerability predictions.
For this upcoming report, we use both statistical and machine learning approaches, relying on data from Orange’s D4D challenges, official censuses, and the Demographic and Health Surveys (DHS) Program.
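To illustrate what correcting sample bias can mean in practice, the sketch below applies post-stratification, a standard reweighting technique, to a toy example. This is not necessarily the method used in the note: the group labels, counts, and per-group averages are all hypothetical values invented for illustration, and the function name `poststratify` is our own.

```python
def poststratify(sample_counts, population_counts, sample_means):
    """Reweight per-group sample means so groups count in proportion
    to their known population sizes (e.g. from a census)."""
    total_sample = sum(sample_counts.values())
    total_pop = sum(population_counts.values())

    # Weight per sampled unit: population share divided by sample share.
    weights = {
        g: (population_counts[g] * total_sample) / (sample_counts[g] * total_pop)
        for g in sample_counts
    }

    # Naive estimate: every sampled unit counts equally.
    naive = sum(sample_counts[g] * sample_means[g] for g in sample_counts) / total_sample

    # Adjusted estimate: each group counts according to its population size.
    adjusted = sum(population_counts[g] * sample_means[g] for g in sample_counts) / total_pop

    return weights, naive, adjusted


# Hypothetical numbers only: cell-phone users over-represent urban areas.
sample_counts = {"urban": 800, "rural": 200}        # users observed in CDRs
population_counts = {"urban": 400, "rural": 600}    # census counts (thousands)
sample_means = {"urban": 10.0, "rural": 4.0}        # some per-user indicator

weights, naive, adjusted = poststratify(sample_counts, population_counts, sample_means)
print(weights, naive, adjusted)
```

In this toy case the naive average is pulled toward the over-represented urban group, while the adjusted average gives the rural group its census share. Post-stratification only corrects for the variables used to define the groups; bias within a group is left untouched.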