Our Publications

“Data for Refugees: The D4R Challenge on Mobility of Syrian Refugees in Turkey”

July 2018

Albert Ali Salah, Alex “Sandy” Pentland, Bruno Lepri, Emmanuel Letouzé, Patrick Vinck, Yves-Alexandre de Montjoye, Xiaowen Dong, Ozge Dagdelen

The Data for Refugees (D4R) Challenge is a non-profit challenge initiated to improve the conditions of Syrian refugees in Turkey by providing the scientific community with a special database to enable research on urgent problems concerning refugees, including health, education, unemployment, safety, and social integration. The database is based on anonymised mobile Call Detail Records (CDRs) of phone calls and SMS messages from one million Turk Telekom customers. It captures broad activity and mobility patterns of refugees and citizens in Turkey over one year, from 1 January 2017 to 31 December 2017. The project was initiated by Turk Telekom, in partnership with the Turkish Academic and Research Council (TUBITAK) and Bogazici University, and in collaboration with several academic and non-governmental organizations, including UNHCR Turkey, UNICEF, and the International Organization for Migration.

“How to use Big Data? Leading experts’ roadmap to data-driven innovation projects”

November 2017

Emmanuel Letouzé, Director and Co-Founder, Data-Pop Alliance; David Sangokoya, Research Manager, Data-Pop Alliance; in cooperation with the Vodafone Institute for Society and Communications

This position paper highlights overall takeaways and recommendations in the areas of privacy protection, responsible data governance, transparency, and accountability for unleashing big data-driven innovation, including: (1) putting ‘privacy by design’ into action through privacy-preserving technical procedures and standards for data sharing and use; (2) focusing on responsibility in data use by establishing internal responsible data governance standards; and (3) keeping transparency, trust and user control at the centre by engaging all data stakeholders.

“Understanding Patterns of Human Mobility At Different Timescales”

September 2017

Lee Fiorio, Emilio Zagheni, Guy Abel, Johnathan Hill, Gabriel Pestre, Emmanuel Letouzé, Jixuan Cai

“Algorithmic accountability – Applying the concept to different country contexts”

July 2017

This paper has been adapted by the Web Foundation from a draft report commissioned from David Sangokoya of Data-Pop Alliance.

Drawing from interviews with global experts, topic workshops and content research, this scoping paper aims to provide the reader with an understanding of algorithmic decision-making processes and the challenges they pose to our existing understanding of accountability across different contexts. It offers a map of existing technical and governance mechanisms for both identifying and addressing algorithmic harms and bias, as well as a set of recommendations and entry points for the Web Foundation and other stakeholders to contribute to this emerging field most effectively.

“Understanding the Relationship Between Short and Long Term Mobility”

June 2017

Sveta Milusheva, Elisabeth zu Erbach-Schoenberg, Linus Bengtsson, Erik Wetter, Andy Tatem

Populations are highly mobile, both in terms of long-term movements of individuals relocating their place of residence and in terms of shorter-term mobility such as commuting, seasonal travel and recreational trips. Working with call detail record data from Namibia and Senegal, we study population migration and its link to short-term movement. We compare the short-term mobility estimates extracted from call detail records to census data in the two countries and find a strong annual relationship, as well as distinct daily patterns in the relationship between long- and short-term movement.

“Big Data – Predicting and preventing climate-related shocks”

March 2017

Big Data, as a socio-technological phenomenon, has the potential to generate new insights into the functioning and interaction of human and natural ecosystems. In particular, Big Data can improve our understanding of how societies deal with shocks related to climate change, and inform policies and actions to foster adaptive mechanisms. However, such positive effects will not occur automatically: investments to address the technological, human, and ethical barriers of Big Data will be necessary. This article analyzes these factors and makes a series of recommendations on the potential for leveraging Big Data for climate change resilience in Latin America and the Caribbean (LAC), the impediments to doing so, and the requirements for this to be effective.

“Fair, Transparent and Accountable Algorithmic Decision-making Processes”

February 2017

The combination of the increased availability of large amounts of fine-grained human behavioral data and advances in machine learning is driving a growing reliance on algorithms to address complex societal problems. Algorithmic decision-making processes might lead to more objective, and thus potentially fairer, decisions than those made by humans, who may be influenced by greed, prejudice, fatigue, or hunger. However, algorithmic decision-making has been criticized for its potential to enhance discrimination, information and power asymmetry, and opacity. In this paper, we provide an overview of available technical solutions to enhance fairness, accountability, and transparency in algorithmic decision-making.

"Socio-Physical Vulnerability to Flooding in Senegal"

February 2017

Bessie Schwarz, Elizabeth Tellman, Jonathan Sullivan, Catherine Kuhn, Richa Mahtta, Bhartendu Pandey, Laura Hammett, Gabriel Pestre

Each year, thousands of people and millions of dollars in assets are affected by flooding in Senegal, and over the next decade the frequency of such extreme events is expected to increase. However, no publicly available digital flood maps, except for a few aerial photos or post-disaster assessments from UNOSAT, could be found for the country. This report tested an experimental method for assessing the socio-physical vulnerability of Senegal using high-capacity remote sensing, machine learning, new social science, and community engagement.

“Mining Case Law to Improve Countries’ Accountability To Universal Periodic Review”

February 2017

Soline Aubry, Hansdeep Singh, Ivan Vlahinic, Abhimanyu Ramachandran, Sara Fischer, Robert O’Callaghan, Natalie Shoup, Jaspreet Singh, David Sangokoya, Gabriel Pestre, and Carson Martinez.

The United Nations (UN) Universal Periodic Review (UPR) is a process established by the Human Rights Council aiming to monitor and improve the human rights situation in each UN member state. In this study, we hypothesize that leveraging text mining and machine-learning algorithms is a viable strategy for monitoring gender discrimination in sentencing practices of Fiji’s judiciary system, which has been the object of recommendations from Norway and Belgium in the UPR cycles of 2010 and 2015, respectively.

“Leveraging Algorithms for Positive Disruption: On data, democracy, society and statistics”

December 2016

The main objective of this paper is to discuss whether and how the future of algorithms can be crafted such that their development and deployment—from their design to their use, including control, evaluation, auditing, governance—be based on and foster core democratic values such as accountability, transparency, participation, and collaboration. In doing so, we will focus on algorithms affecting public life and policies to maximize benefit for citizens, or ‘public good algorithms’, but the discussion aims to have broader applicability.

“Opportunities and Requirements for Leveraging Big Data for Official Statistics and the Sustainable Development Goals in Latin America”

May 2016

This document was produced as part of a project supported by the World Bank and implemented by Data-Pop Alliance in partnership with Colombia’s National Administrative Department of Statistics (DANE). Data-Pop Alliance is a coalition on Big Data and development created jointly by the Harvard Humanitarian Initiative, the MIT Media Lab, and the Overseas Development Institute (ODI) to promote a people-centered Big Data revolution.

“Big Data and Climate Change Resilience”

November 2015

Patrick Vinck with contributions from several members of CIESIN and Research Affiliates Simone Sala, Bessie Schwarz, and Elizabeth Tellman

This paper (in progress) outlines Data-Pop Alliance’s ongoing research on Big Data, climate change and environmental resilience. The paper examines in depth the conceptualization of climate change resilience, both specific and general; addresses Big Data contributions to understanding the components of climate risk; and identifies gaps and challenges in applying Big Data to climate resilience decision-making. Finally, the authors offer suggestions for individual and community engagement in building resilience.

Event Paper – Big Data and Privacy: Understanding the Possibilities and Pitfalls of the Data Revolution in Germany

November 2015

As the first event paper in the Digitising Europe series, this paper captures the key themes that emerged from our events in Berlin in November 2015. The Berlin workshop and public forum focused on the possibilities and pitfalls of using Big Data analytics for economic growth and the public good. Bringing together German academic institutions, think tanks, businesses and other thought leaders, the expert workshop focused on the ongoing political discourse in Berlin surrounding the elements framing the GDPR and EU legislation on data protection.

“Correcting for Sample Bias with Application to the Case of Senegal”

November 2015

This paper explains how to model and correct sample bias in Call Detail Records (CDRs). A proper understanding of sample bias is key to producing useful estimates from CDRs, because such calculations rely heavily on a good understanding of how the sample (cell-phone users) relates to the larger population from which it is drawn. This work could have major applications in crisis monitoring and response, as in the case of flood vulnerability predictions. Data-Pop Alliance uses both statistical and machine learning approaches, relying on data from Orange’s D4D challenges, official censuses and Demographic and Health Survey (DHS) program data.

“Beyond Data Literacy: Reinventing Community Engagement and Empowerment in the Age of Data”

September 2015

This paper attempts to delineate the broad contours of “data literacy” through an analysis of its history, definition, expectations, application, and promotion. The paper defines “data literacy” as “the desire and ability to constructively engage in society through and about data” and argues that “data literacy” should be promoted through, and in service of, social inclusion.

“Big Data for Climate Change and Disaster Resilience: Realising the Benefits for Developing Countries”

September 2015

This paper evaluates the opportunities, challenges and required steps for leveraging the new ecosystem of Big Data to monitor and detect hazards, mitigate their effects, and assist in relief efforts as poor communities become more vulnerable to natural hazards. There have been increasing calls to make disaster risk reduction a core development concern and to build resilience so that vulnerable communities and countries, as complex human ecosystems, not only ‘bounce back’ but also learn to adapt to maintain equilibrium in the face of natural hazards.

“Moves on the Street: Classifying Crime Hotspots Using Aggregated Anonymized Data on People Dynamics”

September 2015

This paper highlights the potential societal benefits of big data applications, with a focus on citizen safety and crime prevention. The authors detail a case study tackling the problem of crime hotspot classification, that is, classifying which areas in a city are more likely to witness crimes based on past data. In the proposed approach, demographic information is used along with human mobility characteristics derived from anonymized and aggregated mobile network data.

“Group Privacy in the Age of Big Data”

September 2015

Lanah Kammourieh, Thomas Baar, Jos Berens, Emmanuel Letouzé, Julia Manske, John Palmer and Patrick Vinck with contributions from Augustin Chaintreau, Yves-Alexandre de Montjoye, and Natalie Shoup

This paper attempts to define what a group is and what privacy is, in order to determine how a privacy right might attach to groups distinctly from the individual privacy rights of their members, and what the content of such a group privacy right might be. The challenge faced by group privacy is to enable the positive uses of Big Data while restricting its oppressive uses to the extent possible. This cannot be done by legislation or stakeholders alone; it also requires improving awareness and data literacy, and harnessing technology itself to improve data security and accountability for breaches.

“Big Data and Development: An Overview”

May 2015

Emmanuel Letouzé, in collaboration with SciDev.net and the World Bank Group

This paper describes the fundamental nature of Big Data as an ecosystem and how it engages with society. Although Big Data has promising applications to real-world problems, it also comes with warnings and risks, the most severe being risks to individual privacy, identity and security. In response to these challenges and risks, the paper explores the future of Big Data and how it will be shaped by academic research, legal and technical frameworks for ethical use of data, and larger societal demands for greater accountability and participation.

“The Law, Politics and Ethics of Cell Phone Data Analytics”

April 2015

Emmanuel Letouzé and Patrick Vinck, in collaboration with the World Bank Group and the D4D team

This paper examines Call Detail Records (CDRs) and their expanding role in providing insight into human behavior, movements, and social interactions. As a result of their growing application, certain ethical and legal questions need to be addressed. The paper summarizes current legal frameworks, explores structural socio-political parameters and incentives structuring the sharing of CDRs, proposes guiding ethical principles and discusses operational options and requirements.

“Quantifying the Data Deluge and the Data Drought”

April 2015

This paper investigates how the world’s Big Data capacity can be understood in terms of the world’s storage capacity and the telecommunication capacity to access this storage (‘the cloud’). This paper follows the methodology of what has become the standard reference in estimating the world’s technological information capacity: Hilbert and López (2011).

“Official Statistics, Big Data, and Human Development”

March 2015

Johannes Jütting and Emmanuel Letouzé, in collaboration with PARIS21

This paper aims to contribute to the ongoing and future debate about the relationships between Big Data, official statistics and development. This paper argues that Big Data needs to be seen as an entirely new ecosystem comprising new data, new tools and methods, and new actors motivated by their own incentives. The emergence of this new ecosystem provides both a historical opportunity and a political and democratic obligation for official statistical systems to recall, retain or regain their primary role as the legitimate custodian of knowledge and creator of a deliberative public space for and about societies.

Big Data & SDGs note for Global Sustainable Development Report

February 2015

This paper focuses on the intersection of Big Data and the Sustainable Development Goals (SDGs), and on the spectrum of ways and channels through which Big Data, as an entirely new ecosystem, could impact (contribute to or hamper) human progress as called for and measured by the SDGs. Applications of Big Data to the SDGs have the potential to advocate for causes, shape incentives and inform policies. This paper argues that Big Data contributions to the SDGs should expand beyond monitoring: Big Data must contribute directly to the SDGs, which will require a data-educated citizenry.

AFD Paper: CDRs & Poverty and Population Analysis – Côte d’Ivoire and Senegal

In progress

Emmanuel Letouzé and Gabriel Pestre, with support from Cyrille Bellier, Thomas Roca, Nicolas de Cordes, and Orange for facilitating access to the CDRs

This paper considers Big Data’s potential to partly fill some key data gaps and to complement or even replace official statistics. Data-Pop Alliance examines the specific case of Côte d’Ivoire, using Call Detail Records (CDRs) from Orange in conjunction with two other datasets: the WorldPop dataset, which provides population data derived from satellite imagery, and the recently released 2013 Demographic and Health Survey (DHS). The paper aims to predict multidimensional poverty at the sous-prefecture and sub-national levels, and to predict the population of the 11 sub-national regions of Côte d’Ivoire and its 255 sous-prefectures (sub-districts).

“Big Data and Mobility: Migration and Transportation”

In progress

This paper (in progress) discusses the linkages between Big Data and mobility, specifically migration and transportation. Its main objective is to give its readers (World Bank staff, policymakers, researchers, development project managers and other professionals) an overview of the main features and parameters of this nexus, as well as to provide examples and discuss key considerations (technical, ethical, institutional, etc.) for developing projects, programs and other activities in the field.