Open algorithms: A new paradigm for using private data for social good

Thomas Roca, Emmanuel Letouzé

This article first appeared on Devex on July 18th, 2016.

Few of the data collected are actually used to improve people's lives or to inform the design of better public policies, write AFD’s Thomas Roca and Data-Pop Alliance’s Emmanuel Letouzé in this guest column for Data Driven. How can we turn this around, ensuring that information is used in a privacy-conscientious and inclusive way? Photo by: reynermedia / CC BY

The term “datafication” was coined to describe the consequences of the digital revolution: the growing generation, collection and storage of digital information concerning all aspects of the physical world (including earth activity, weather, climate, and biosphere), human lives and activities (including DNA, vital signs, consumption, and credit scores), and societal patterns (including communications, economic activity, and mobility).

The datafication of the world is fueled by the automatic generation of data through the billions of digital devices that surround us: cellphones and tablets, e-devices, security cameras, credit cards, badges and satellites. However, little of the generated information is actually used to improve people’s lives or in the design of better public policies.

Data, the commodity of the 21st century

The flow of information resulting from the data deluge is mainly stored within data centers, as a commodity, typically legally owned by the private companies collecting them – telecom operators, social media companies, or banks, among others.

These data are analyzed for internal and commercial purposes – think of how Amazon or Facebook operates, for example – and hold tremendous fiduciary value. Companies whose investments, innovations and systems contribute to generating and storing these data cannot simply surrender them.

But many private companies do not realize the public good value of these data – including how they could benefit from opening up “some” of their data if doing so helps grow economies or prevent epidemics. Even when they do, they face not only commercial but also ethical and legal incentives not to open their data further. Indeed, not all data should be open. Personal data collected through our use of social networks, our mobile phone activity, sensors and connected devices together paint a remarkably accurate picture of our way of life: our location, whether real-time or historical; the people we communicate with; the content of our private messages or emails; our heart rate, or even our most intimate feelings – we wouldn’t want such information to be publicly available.

Meanwhile, the case for opening and using data has become clearer in recent years. First, the “open data movement” has shown how opening up data can foster public innovation, civic engagement, accountability and transparency. Second, a handful of companies – chief among them telecom operators, including Orange, Telefónica and Telecom Italia – have experimented with “data challenges,” whereby some data are made available to researchers in a tightly controlled manner – making them difficult to scale.

The successes and results of these challenges revealed and stirred up growing demands for more “private” data to be made available. Some, such as Kenneth Cukier, The Economist’s senior editor for data and digital, even consider that not using these data is “the moral equivalent of burning books.” But the dilemma remained: between privacy and utility; between commercial, individual and societal considerations, and so on.

Which data should be accessed, for what and by whom?

Two recent developments further complicate the debate. One is the finding that “anonymizing” data is much harder than previously thought – the uniqueness of our behaviors and the interconnectedness of the datasets in which we appear make “reidentification” possible. This all but rules out the option of “simply” releasing personal data stripped of personally identifiable information as a long-term solution.

Another development was the “Facebook emotion study,” where the social media giant used data and manipulated the newsfeeds of hundreds of thousands of users as part of an experiment that was perfectly legal but deemed unethical – putting the notion of what “informed consent” meant and entailed back at the forefront of these debates.

The concern that algorithms operate as “black boxes” that could embed and help entrench biases and discriminations has also gained ground. And the pressures to use these data to improve people’s lives will most likely continue growing – including in support of the Sustainable Development Goals – alongside people’s demand to have greater control over this use – in ways that respect individual and group privacies, commercial interests, and of course prevailing legal standards.


The Open Algorithm project: Developing indicators, capacity and trust

To address the complex challenge of data access, Orange, MIT Media Lab, Data-Pop Alliance, Imperial College London and the World Economic Forum – supported by Agence Française de Développement and the World Bank – are developing a platform to unleash the power of “big data” held by private companies for public good in a privacy preserving, commercially sensible, stable, scalable and sustainable manner.

In its initial phase of deployment, the Open Algorithm project, or OPAL, will focus on a small set of countries in Latin America, Africa and Asia, with technical support from a wide range of partners including Paris21, Microsoft, and Deloitte Consulting LLP.

OPAL’s core will consist of an open technology platform and open algorithms running directly on the servers of partner companies, behind their firewalls, to extract key development indicators relevant to a wide range of potential users, including national statistical offices, ministries, civil society organizations, media organizations, etc. Examples of potential indicators and maps produced with greater frequency and levels of geographic granularity than are currently available include poverty, literacy, population density and social cohesion – all areas on which the literature has shown that “big data” analysis can shed light.

As a “platform” to unleash the power of these “big data” held by private companies for public good, the AFD-supported OPAL initiative has three key aims:
  1. To engage with data providers, users and analysts at all stages of its development, including during the development of algorithms.
  2. To contribute to building local capacities and connections, and to help shape the future technological, political, ethical and legal frameworks that will govern the local collection, control and use of “big data” to foster social progress.

    By “sending the code to the data” rather than the other way around, OPAL seeks to address these challenges, spur dialogue and develop data services on the basis of greater trust among all parties involved – including citizens, official statistical systems and private corporations.
  3. To build data literacy among users and partners; not just data literacy defined as “the ability to use data,” but conceptualized in a broader and deeper sense as literacy in the age of data and defined as “the ability to constructively engage in society through and about data.” In the 21st century, being “data literate” in that sense will be as much a fundamental human capability as a useful professional skill set; both an enabler and marker of human agency.

    Mass data literacy will be as essential to development and democracy as mass literacy was in the 20th century. Building this kind of data literacy across institutions and groups will require large-scale, sustained initiatives and investments that have not yet materialized.
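
The “send the code to the data” idea at OPAL’s core can be sketched in a few lines. This is purely illustrative – the article does not describe OPAL’s actual implementation – and the records, threshold and function names below are hypothetical. The point is the flow: a vetted, openly auditable algorithm runs behind the data holder’s firewall, and only aggregate indicators, never raw records, leave the premises.

```python
from collections import Counter

# Hypothetical raw records held by a telecom operator (never exported).
PRIVATE_CALL_RECORDS = [
    {"user": "a", "region": "north"},
    {"user": "b", "region": "north"},
    {"user": "c", "region": "south"},
    {"user": "d", "region": "south"},
    {"user": "e", "region": "south"},
]

# Suppress small groups to reduce re-identification risk (illustrative value).
MIN_GROUP_SIZE = 2


def population_density_indicator(records, min_group_size=MIN_GROUP_SIZE):
    """An 'open algorithm': auditable code that returns only aggregates."""
    counts = Counter(r["region"] for r in records)
    # Release only groups above the suppression threshold.
    return {region: n for region, n in counts.items() if n >= min_group_size}


def run_behind_firewall(algorithm):
    """The data holder executes the vetted code locally; raw data stay put.
    Only the algorithm's aggregate result is sent back to the requester."""
    return algorithm(PRIVATE_CALL_RECORDS)


indicator = run_behind_firewall(population_density_indicator)
print(indicator)  # aggregate counts per region, no individual records
```

Because the algorithm itself is open, data providers, regulators and civil society can inspect exactly what is computed and what leaves the firewall – which is what distinguishes this model from simply releasing “anonymized” datasets.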