Democratizing Big Data Starts with Dialogue: Q&A with Emmanuel Letouzé


July 2, 2015

The Data-Pop Alliance director and co-founder discusses how the understanding of the relationship between Big Data and human rights is evolving.

Leaders in the field of Data for Good stress that the Big Data revolution is moving so quickly that legal frameworks can barely keep pace with the technology. As a result, guidelines are being worked out through discussions among practitioners, which makes those discussions a key indicator of how the field will evolve.

Emmanuel Letouzé, director and co-founder of Data-Pop Alliance, helped kick off one of the most important discussions this year with his presentation on Big Data & Human Rights at a meeting of the American Association for the Advancement of Science in January. The meeting stressed the need for a nuanced understanding of the relationship between Big Data and human rights. As Letouzé has written, Big Data has the potential to detect threats such as human trafficking and the spread of disease, but without appropriate safeguards it can also threaten human rights such as privacy. We spoke with Letouzé about how the discourse on Big Data and human rights has moved forward since then, and about what guidelines, in an ever-evolving field, can ensure that data are put to the greatest good.

Q: Half a year after your presentation, how do you feel the discourse on Big Data and human rights has changed in the Data for Good community?

A: Conceptually, there’s a greater realization of the need for clear and encompassing ethical principles, and we’ve arrived at a more comprehensive approach. Two years ago, all the discussions were very vague: “the responsible use of data” doesn’t mean much without a deeper context for discussion.

I distinguish between the “responsible use” of data, including non-disclosure of data, greater encryption, respect of current legal frameworks, etc., on the one hand, and consideration of the deeper issues of political participation, agency, empowerment, literacy, etc., on the other hand. Using a dichotomy with a long history in ethics, the former corresponds to a “thin” conceptualization of the ethics of data and the latter to a “thick” conceptualization—the difference is fundamental.

Q: What does this mean for ways of addressing the potential tension between Big Data and human rights you touched on in your presentation in January?

A: At the end of the day, the question of participation and the ownership of data should be front and center. You start linking this with human rights—political participation should be a human right—and you get into data-literacy and data-access requirements. I think that’s going to be extremely important. That’s why we’re going to work on a data-literacy program.

For some people, data literacy is reducible to data-science coursework. But just as being literate is not only about being able to read a simple text but also about being able to participate in conversations about that text, the concept of data literacy should be “thickened” to include what is done with your data.

This education needs to be for data staff, even in pretty big government agencies, but it also needs to go beyond institutions into the broader population. We have to engage with the community—it’s a new ecosystem that is still developing.

Part of the necessary cultural change is to define what we mean by “data culture.” We need to make it more complex than it currently is. The question of agency and empowerment is historically critical to fostering human development.

Q: How do you establish a workable regulatory approach for a field that’s constantly evolving?

A: Technology has moved so quickly that neither society nor its legislation has been able to catch up. We’ve been trying to keep up with current technology using legal frameworks that were devised in the sixties. So the risks are very diffuse and very real.

So, for example, we know that greater value comes from combining data sets: it adds dimensions and granularity and gives researchers greater insight into the nuances of people’s behaviors. It’s very useful for machine learning. But combining data sets also increases the risk that people can be consistently re-identified in previously “anonymized” data from which names and other direct identifiers have been deleted.
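To make the re-identification risk concrete, here is a minimal sketch of what is often called a linkage attack, written in Python with entirely hypothetical toy records. Names have been removed from a health data set, but ZIP code, birth date, and sex remain; joining on those fields against a hypothetical public list that does contain names recovers identities. The field names, the records, and the `link` helper are illustrative assumptions, not drawn from any real data set or from Data-Pop Alliance's work.

```python
# Hypothetical "anonymized" data set: names deleted, but quasi-identifiers
# (ZIP code, birth date, sex) left in place.
anonymized_health = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1962-03-14", "sex": "M", "diagnosis": "asthma"},
    {"zip": "02138", "birth_date": "1980-11-02", "sex": "F", "diagnosis": "diabetes"},
]

# Hypothetical public data set (for example, a membership roll) that includes names.
public_roll = [
    {"name": "Alice Smith", "zip": "02138", "birth_date": "1945-07-31", "sex": "F"},
    {"name": "Bob Jones", "zip": "02139", "birth_date": "1962-03-14", "sex": "M"},
    {"name": "Carol White", "zip": "02140", "birth_date": "1980-11-02", "sex": "F"},
]

# Fields shared by both data sets; on their own they look harmless.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def key(record):
    """Build the join key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymized, public):
    """Join the two data sets on quasi-identifiers and return
    (name, diagnosis) pairs wherever the key matches exactly one person."""
    index = {}
    for person in public:
        index.setdefault(key(person), []).append(person["name"])
    matches = []
    for record in anonymized:
        names = index.get(key(record), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record["diagnosis"]))
    return matches


print(link(anonymized_health, public_roll))
```

Neither toy data set pairs a name with a diagnosis on its own; the combination does so for two of the three records. The third record escapes only because its ZIP code differs, which illustrates why simply deleting names is a weak safeguard once data sets are combined.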

We should have comprehensive but adaptable approaches to these questions. What I mean by that is that we won’t have ethical principles for every single situation—all of these different combinations. Instead, we should think about a framework for what should be legal and in what cases.