The United Nations (UN) Universal Periodic Review (UPR) is a process established by the Human Rights Council aiming to monitor and improve the human rights situation in each UN member state. In this study, we hypothesize that leveraging text mining and machine-learning algorithms is a viable strategy for monitoring gender discrimination in sentencing practices of Fiji’s judiciary system, which has been the object of recommendations from Norway and Belgium in the UPR cycles of 2010 and 2015, respectively.
When focusing on Violence Against Women and Girls (VAWG) in Fiji, two types of offenses are of specific interest: sexual assault (SA) and domestic violence (DV). Legal action in cases of sexual assault and domestic violence is governed by several different laws in Fiji, but studies have shown that discriminatory practices in how and when these laws are applied may in some instances undermine their effectiveness. Determining whether or not gender discrimination has a systematic impact on the outcome of these sentences requires extensive analysis of case law archives.
Our hope is that the outcomes of this study, designed as a collaborative effort between data scientists and lawyers with known expertise in the UPR process, will encourage the development of more systematic and quantitative methodologies to track the implementation of recommendations, resulting in increased accountability of countries towards the UPR process.
This document was produced as part of a project supported by the World Bank and implemented by Data-Pop Alliance in partnership with Colombia’s Departamento Administrativo Nacional de Estadística (DANE). Data-Pop Alliance is a coalition on Big Data and development jointly created by the Harvard Humanitarian Initiative, the MIT Media Lab, and the Overseas Development Institute (ODI) to promote a people-centered Big Data revolution.
This version benefited from comments by DANE staff, in particular Mara Bravo, Julieth Solano, and Arleth Sorith. Additional comments and observations will be incorporated before the document is finalized. This version also benefited from significant contributions by Andrés Clavijo, Lead Researcher and Colombia Coordinator at Data-Pop Alliance; Natalie Shoup, Program Director at Data-Pop Alliance; Carson Martinez, Research Assistant at Data-Pop Alliance; and Lauren Barrett, Media and Communications Strategist at Data-Pop Alliance.
Funding for this work was provided by the World Bank Group, whose support is gratefully acknowledged, and by the Rockefeller Foundation, which provides substantial support for Data-Pop Alliance’s activities.
This White Paper was written by Julia Manske (Co-lead author), David Sangokoya (Co-lead author), Gabriel Pestre and Emmanuel Letouzé, with contributions from Andrés Clavijo, Natalie Shoup, Carson Martinez, and Lauren Barrett. It was produced as part of a World Bank-supported project implemented by Data-Pop Alliance in partnership with Colombia’s Departamento Administrativo Nacional de Estadística (DANE). The paper summarizes the recent debate on Big Data and its link to the operations of national statistical offices (NSOs), particularly in the context of the SDGs, and scopes the opportunities and challenges that Big Data presents to NSOs in the Latin American region. It identifies solutions to help NSOs of the region play a role in the regional Big Data ecosystem. The paper analyses NSOs’ strengths and weaknesses in engaging with the Big Data ecosystem and discusses opportunities and the road ahead for Latin American NSOs’ further engagement with Big Data.
The main objective of this paper is to discuss whether and how the future of algorithms can be crafted such that their development and deployment (from design to use, including control, evaluation, auditing, and governance) are based on and foster core democratic values such as accountability, transparency, participation, and collaboration. In doing so, we will focus on algorithms affecting public life and policies to maximize benefit for citizens, or ‘public good algorithms’, but the discussion aims to have broader applicability.
One of the observations the paper makes is that in discussions about the implications of Big Data for societies, algorithms have received both too much attention and too little consideration. Too much attention because there has been highly targeted media and public backlash on the nuts and bolts of algorithms as ‘black boxes’ that needed to be opened, at the expense of richer debates about the purpose of analysis and, more importantly, the nature of the data being used; too little consideration because algorithms seem to be too swiftly labeled as bad without a thorough enough reflection on the many unique levers and entry points they may offer to serve humanistic principles as part of new data life-cycles and ecosystems.
This paper (in progress) outlines Data-Pop Alliance’s ongoing research on Big Data, climate change and environmental resilience. The paper dives deeply into the conceptualization of climate change resilience, both specific and general; addresses Big Data contributions to understanding the components of climate risk; and identifies gaps and challenges in applying Big Data to climate resilience decision-making. Finally, the authors offer suggestions for individual and community engagement in building resilience.
As the first event paper in the Digitising Europe series, this paper captures the key themes emerging from our events in Berlin in November 2015.
The Berlin workshop and public forum focused on the possibilities and pitfalls of using Big Data analytics for economic growth and public good. Bringing together German academic institutions, think tanks, businesses and other thought-leaders, the expert workshop focused on the ongoing political discourse in Berlin surrounding the elements framing the GDPR and EU legislation on data protection.
This paper sets out to explain how to model and correct sample bias in Call Detail Records (CDRs). A proper understanding of sample bias is key to producing useful estimates derived from CDRs: such calculations rely heavily on a good understanding of how the sample (cell-phone users) relates to the larger population it is drawn from. Correcting for this bias could have major applications in crisis monitoring and response, as in the case of flood vulnerability predictions. Data-Pop Alliance uses both statistical and machine learning approaches, relying on data from Orange’s D4D challenges, official censuses and Demographic and Health Survey (DHS) program data.
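One common way to correct this kind of sample bias is post-stratification: each stratum of the sample is reweighted so that it counts in proportion to its share of a reference population such as a census. The sketch below is a generic illustration under stated assumptions, not the method used in the paper; the function names, the stratum labels and all numbers are hypothetical.

```python
from typing import Dict


def naive_mean(sample_means: Dict[str, float], sample_counts: Dict[str, int]) -> float:
    """Average over the raw sample, ignoring how it relates to the population."""
    total = sum(sample_counts.values())
    return sum(sample_means[s] * sample_counts[s] / total for s in sample_counts)


def poststratified_mean(
    sample_means: Dict[str, float],
    sample_counts: Dict[str, int],
    population_counts: Dict[str, int],
) -> float:
    """Weight each stratum's sample mean by its share of the reference population."""
    strata = list(population_counts)
    # A stratum with no sampled users cannot be reweighted.
    missing = [s for s in strata if sample_counts.get(s, 0) == 0]
    if missing:
        raise ValueError(f"no sampled users in strata: {missing}")
    total_population = sum(population_counts.values())
    return sum(
        sample_means[s] * population_counts[s] / total_population for s in strata
    )


# Hypothetical example: phone users are mostly urban, the population is mostly rural.
sample_means = {"urban": 10.0, "rural": 4.0}         # e.g. mean calls per day
sample_counts = {"urban": 800, "rural": 200}         # phone users observed
population_counts = {"urban": 400_000, "rural": 600_000}  # census totals

naive = naive_mean(sample_means, sample_counts)
corrected = poststratified_mean(sample_means, sample_counts, population_counts)
print(f"naive: {naive:.2f}, post-stratified: {corrected:.2f}")
```

In this made-up example the naive estimate is pulled towards the over-represented urban users, while the post-stratified estimate reflects the population mix. Real CDR work would need finer strata and a model for groups with little or no phone coverage.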
This paper attempts to delineate the broad contours of “data literacy” through analysis of its history, definition, expectations, application, and promotion. The paper thus defines “data literacy” as “the desire and ability to constructively engage in society through and about data” and argues that promotion of “data literacy” should be via and for social inclusion.
This paper evaluates the opportunities, challenges and required steps for leveraging the new ecosystem of Big Data to monitor and detect hazards, mitigate their effects, and assist in relief efforts as poor communities become more vulnerable to natural hazards. There have been increasing calls to make disaster risk reduction a core development concern and to build resilience so that vulnerable communities and countries as complex human ecosystems not only ‘bounce back’ but also learn to adapt to maintain equilibrium in the face of natural hazards.
This paper highlights the potential societal benefits derived from big data applications with a focus on citizen safety and crime prevention. The authors detail a case study tackling the problem of crime hotspot classification, that is, classifying which areas in a city are more likely to witness crimes based on past data. In the proposed approach, demographic information is used along with human mobility characteristics derived from anonymized and aggregated mobile network data. The findings support the hypothesis that aggregated human behavioral data captured from the mobile network infrastructure, combined with basic demographic information, can be used to predict crime.
This paper attempts to define what a group is and what privacy is in order to determine how a privacy right might attach to groups distinctly from the individual privacy rights of their members, and what the content of such a group privacy right might be. The challenge faced by group privacy is to enable the positive uses of Big Data while restricting the oppressive uses to the extent possible. This cannot be done by legislation or stakeholders alone; it also requires improving awareness and data literacy, and harnessing technology itself to improve data security and accountability for breaches.
This paper describes the fundamental nature of Big Data as an ecosystem and how it engages with society. Although Big Data has promising applications to real-world problems, it comes with warnings and risks, the most severe being risks to individual privacy, identity and security. In response to these challenges and risks, the paper explores the future of Big Data and how it will be shaped by academic research, legal and technical frameworks for ethical use of data, and larger societal demands for greater accountability and participation.
This paper examines Call Detail Records (CDRs) and their expanding role in providing insight into human behavior, movements, and social interactions. As a result of their growing application, certain ethical and legal questions need to be addressed. The paper summarizes current legal frameworks, explores structural socio-political parameters and incentives structuring the sharing of CDRs, proposes guiding ethical principles and discusses operational options and
This paper investigates how the world’s Big Data capacity can be understood in terms of the world’s storage capacity and the telecommunication capacity to access this storage (‘the cloud’). This paper follows the methodology of what has become the standard reference in estimating the world’s technological information capacity: Hilbert and López (2011). This methodology shows that the world’s technological capacity to store information has increased with a compound annual growth rate of 31% during the three decades between 1986 and 2014 (from 2.6 exabytes to 4.6 zettabytes), while the world’s installed telecommunication capacity has grown with a compound annual growth rate of 35% during the same period (from 7.5 petabits to 25 exabits).
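The storage growth rate quoted above can be sanity-checked with the standard compound annual growth rate formula, (end / start)^(1 / years) − 1. The sketch below is a minimal illustration, not the estimation method of Hilbert and López (2011); the function name `cagr` is an assumption, and the span is taken simply as 2014 minus 1986. Because the endpoints quoted in the abstract are rounded, a recomputed rate can differ slightly from the published figure.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("start_value, end_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


EXABYTE = 10**18    # bytes
ZETTABYTE = 10**21  # bytes

# Endpoints as quoted in the abstract: 2.6 exabytes (1986) to 4.6 zettabytes (2014).
storage_rate = cagr(2.6 * EXABYTE, 4.6 * ZETTABYTE, 2014 - 1986)
print(f"storage capacity CAGR: {storage_rate:.1%}")
```

With these rounded endpoints the storage figure comes out close to the 31% stated in the abstract.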
This paper aims to contribute to the ongoing and future debate about the relationships between Big Data, official statistics and development. This paper argues that Big Data needs to be seen as an entirely new ecosystem comprising new data, new tools and methods, and new actors motivated by their own incentives. The emergence of this new ecosystem provides both a historical opportunity, and a political and democratic obligation for official statistical systems to recall, retain or regain their primary role as the legitimate custodian of knowledge and creator of a deliberative public space for and about societies.
This paper focuses on the intersection of Big Data and the Sustainable Development Goals (SDGs) and the spectrum of ways and channels through which Big Data, as an entirely new ecosystem, could affect (contribute to or hamper) human progress as called for and measured by the SDGs. Applications of Big Data to the SDGs have the potential to advocate for causes, shape incentives and inform policies. This paper argues that Big Data contributions to the SDGs should expand beyond monitoring: Big Data must contribute directly to the SDGs, which will require a data-educated citizenry.
This paper considers Big Data’s potential to partly fill some key data gaps and complement or even replace official statistics. Data-Pop Alliance offers the specific case of Côte d’Ivoire, using Call Detail Records (CDRs) from Orange in conjunction with two other datasets: the WorldPop dataset, which provides population data derived from satellite imagery, and the recently released 2013 Demographic and Health Survey (DHS). The paper intends to predict multidimensional poverty at the sous-prefecture and sub-national levels, and to predict the population of the 11 sub-national regions of Côte d’Ivoire and its 255 sous-prefectures (sub-districts).
This paper (in progress) discusses the linkages between Big Data and mobility—specifically migration and transportation. Its main objective is to give its readers—World Bank staff, policymakers, researchers, development project managers and other professionals—an overview of the main features and parameters of this nexus, as well as provide examples and discuss key considerations—technical, ethical, institutional, etc.—for developing projects, programs and other activities in the field.
Key Facts & Figures
What is big data, and could it transform development policy? Emmanuel Letouzé takes a close look at this emerging field.
Data “Inflation” Table
Each year since 2012, well over 1.2 zettabytes of data has been produced (a zettabyte is 10²¹ bytes), enough to fill 80 billion 16GB iPhones (which would circle the earth more than 100 times).
Big data Bites | Early years and foundational pieces
Over the past couple of years, thousands of media articles and editorials have covered big data’s impact on society, including an early mention of the upcoming “Industrial Revolution of data” by Joe Hellerstein, a computer scientist at the University of California, Berkeley.
Big data | The dark side: ethical problems of big data
Issues of individual privacy, ethics and human rights around the use of big data are getting increasing attention.
Big data | Following the data: institutions and programmes
Responding to the promise of big data, several large foundations have already shown interest in the field.
Big data | Groups, networks, and events
Anyone interested in big data and statistics can join groups including Stanford University’s Statistics for Social Good working group and Google’s Data Science for Social Good group.