“Correcting for Sample Bias with Application to the Case of Senegal”

Country indicators: (1) Population 2021, in millions; (2) HDI Score 2019, max. 1; (3) SDG Score 2020-2021, max. 100; (4) Gender Inequality Index Score, max. 1; (5) Internet Inclusivity Index 2022, 100 countries.

Sources: 1. World Bank (2021), 2. UNDP (2019), 3. Sustainable Development Report (2021), 4. UNDP (2019), 5. Economist Impact (2022).

Overview

This Methodological Note was written by Emmanuel Letouzé, Gabriel Pestre, and Emilio Zagheni. Funding for this paper was generously provided by the World Bank Group. Data-Pop Alliance is currently developing this note as one of three inputs for the World Bank’s 2016 World Development Report, “Digital Dividends” (forthcoming).

This paper models and corrects for sample bias in Call Detail Records (CDRs). A proper understanding of sample bias is key to producing useful estimates from CDRs: such estimates depend on how the sample (cell-phone users) relates to the larger populations from which it is drawn (in the cases in point, Senegal and Ivory Coast and their administrative subdivisions). This work could have major applications in crisis monitoring and response, for example in predicting flood vulnerability.

For this upcoming report, we use both statistical and machine-learning approaches, relying on data from Orange’s D4D challenges, official censuses, and the Demographic and Health Surveys (DHS) Program.
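To make the idea of correcting for sample bias concrete, the sketch below shows post-stratification, one common statistical reweighting technique. It is an illustration under stated assumptions, not the method used in the note: the strata ("urban", "rural"), the population shares, the per-user values, and the function names are all hypothetical. Each stratum is weighted by the ratio of its share in the reference population (for instance, from a census) to its share among the observed cell-phone users.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_shares):
    """Return one weight per stratum: population share / sample share.

    sample_strata: list with the stratum label of each sampled user.
    population_shares: dict mapping stratum label to its share of the
        reference population (hypothetical census figures here).
    """
    counts = Counter(sample_strata)
    n = len(sample_strata)
    weights = {}
    for stratum, pop_share in population_shares.items():
        sample_share = counts.get(stratum, 0) / n
        if sample_share == 0:
            # A stratum with no sampled users cannot be reweighted.
            raise ValueError(f"no sampled users in stratum {stratum!r}")
        weights[stratum] = pop_share / sample_share
    return weights


def weighted_mean(values, strata, weights):
    """Mean of values, with each user weighted by their stratum weight."""
    numerator = sum(v * weights[s] for v, s in zip(values, strata))
    denominator = sum(weights[s] for s in strata)
    return numerator / denominator


# Hypothetical example: urban users are over-represented in the sample
# (80% of users) relative to the population (50%).
strata = ["urban"] * 8 + ["rural"] * 2
values = [10.0] * 8 + [20.0] * 2  # made-up per-user indicator
population_shares = {"urban": 0.5, "rural": 0.5}

weights = poststratification_weights(strata, population_shares)
naive_estimate = sum(values) / len(values)
corrected_estimate = weighted_mean(values, strata, weights)
```

In this made-up example the naive mean is 12, while the reweighted mean is 15 (up to floating-point rounding), which matches the value implied by the assumed 50/50 population split. Post-stratification only corrects for bias along the chosen strata; differences between phone users and non-users within a stratum remain.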

Projects

This project, developed with the support of the Spanish Agency for International Development Cooperation (AECID), strengthened the technical capacities of government officials in Latin America and the Caribbean (LAC) to take advantage of Big Data for sustainable development and official statistics. During the first phase of the project, through an exploratory study (see Publication below), we analyzed the current state of the infrastructure, institutional framework, regulatory framework, capacities, and use cases of Big Data for the generation of public policies in five LAC countries: Bolivia, Dominican Republic, El Salvador, Guatemala, and Peru.

The second phase focused on developing four capacity-building workshops between June 2022 and March 2023:

  • Introduction to Big Data for Sustainable Development
  • Big Data and Poverty Analysis for Sustainable Development
  • Big Data and Health Analysis for Sustainable Development
  • Big Data, Security and Violence for Sustainable Development

This training program gave participants a comprehensive grounding in the key concepts, necessary tools, and main challenges of Big Data for sustainable development, with special emphasis on the applicability of these data sources for statistical purposes.

This project aimed to support the Inter-American Development Bank (IDB) in preparing for the IDB Andean Summit, held on November 29, 2018, in Quito, Ecuador. The resulting study identified new Big Data tools, developed or used by academic institutions, international organizations, and the public or private sector, that would concretely benefit current and future IDB projects. Drawing on DPA’s experience, the consultancy aimed to strengthen the IDB’s knowledge of available technological tools that provide observable, material improvements at different stages of current or future projects, and its capacity to identify and use them. The study focused on IDB projects in five countries in the region: Bolivia, Colombia, Ecuador, Peru, and Venezuela.