LWL #23 Harnessing Big Data for Mobility and Migration


In 2019, the International Organization for Migration (IOM) estimated that the number of migrants worldwide had reached 272 million, a figure that has tripled since 1970. Though most of these migrants fled their countries searching for better job opportunities, about 26 million people have been driven away from their homes by armed conflict, violence and climate change. In the hope of shedding light on the issue and calling for policy action, in 2000 the General Assembly of the United Nations proclaimed December 18 as International Migration Day. This global scenario brings with it many hardships and opportunities for migrants, as well as for the receiving countries.

For those seeking to understand more about the phenomenon and inform policymaking, the greatest challenge is quantifying it at a granular level and in a comprehensive manner. In that sense, the Pew Research Center has stated that the issue with migration data is first that by definition migrants are on the move, making it hard to know where they are heading or where they came from. Moreover, because they don’t always use legal means to enter a country, it is difficult to obtain registries of their activity. Finally, given the lack of a shared understanding of who a migrant is, data collection and comparability between datasets become highly complex matters.

Withstanding the hard work many national governments and international institutions have conducted to gather and assemble migration data using traditional methods (e.g. national population census, surveys or administrative records), a large gap in the quantity and quality of available data still remains. According to the IOM, the problem stems from lack of updated information and the significant financial cost attached to these methods. In recent years, Big Data has stepped in to try and fill the gap that traditional sources have left in mapping and analyzing the dynamics of national and international migration. Initiatives such as the Joint Data Center, created by the UNHCR and the World Bank, have dived into the use of different types of Big Data to monitor migration patterns and migrants’ locations around the world. In addition, different research papers (see Facebook Data Science team 2013, Clements et. al 2010, WIlliams and Ralph 2013, State, Weber and Zagheni 2020), have looked into how the use of Big Data sources IP geo locations, Google search queries, and Flickr geotags, etc. can contribute to gathering sociodemographic information, as well as the real-time location of migrants. 

The main argument for leveraging Big Data for migration is that these sources can inform the response of government and global institutions to increasing migration influxes, and therefore help in the development of adequate public policies that take into account migrants’ needs from a human rights standpoint. With that in mind, this edition of Links We Like addresses the topic of Big Data for migration and explores some of the key challenges, limitations and opportunities brought forward by this developing area of work.

The Migration Data Portal, managed by the International Organization for Migration, was developed in 2016 following the second Berlin Roundtable on Refugees and Migration. The article discusses different types of Big Data sources (i.e. Call Detail Records, Geo-located social media activity, repeated logins to the same website, and IP addresses from email activity), which can be used for migration-related studies. Furthermore, it builds on the potential that Big Data has to fill in the gaps of traditional migration data. For instance, the former can be collected at a lower cost and provide timely monitoring of public opinion. Some limitations, however, remain: ethical and privacy issues, biases, difficult access to data sources and methodological complications to unpack the large amount of “noisy” volumes of data.

In the University of Oxford Law Faculty Blog, researchers Stephan Scheel and Funda Ustek-Spilda discuss to what extent Big Data has changed the way in which migration is measured and its obstacles to solve the most representative issues in migration statistics. According to the authors, these limitations are threefold. 

  1. The politics of numbers: how numbers relating to migrants or refugees often change depending on who is reporting them and the interest behind it. 
  2. The politics of method: how different sources utilize different methods to analyze the information, and therefore, obtain different results. Additionally, there are inherent methodological problems when using some types of Big Data (e.g. social media and mobile phone records) since these tools are not used by all groups of migrants or refugees, thus the data is likely to present selection biases.
  3. The politics of (national) distinction: the different ways in which countries categorize migrants and asylum seekers.

In 2014, the United Nations Population Fund conducted a study to explore how online search data could be leveraged to better understand migration flows. This report uses Australia as a case study, and compares Google search query data from around the world with official statistics of migrants arriving in the country from 2008 to 2013. To conduct this analysis, the UNPF extracted monthly volumes of search queries related to employment (e.g. ‘jobs in Melbourne’ or ‘work visa’) from Google Correlate, and then compared the results with statistics available for the same time period. This study is an example of the way in which online search data can complement official data.

The Missing Migrants Project is a joint initiative between IOM’s Global Migration Data Analysis Center and the Media and Communications Division. It aims to track the deaths of migrants whose lives have been lost along migratory routes all around the world. The project uses official statistical data from governments, as well as sources from other agencies, NGOs, surveys and interviews with migrants and media reports. Based on the collected data, this initiative built a visualization map that identifies death incidents occurred in the European and Mediterranean area in 2015. Furthermore, given the many challenges regarding migrants casualties along migration routes (e.g. lack of available data and media coverage), the project further seeks to strengthen data availability and collection process in regions like the Middle East, North Africa, Sub-Saharan Africa, South Asia, Southeast Asia, and Latin America.

Screenshot from The Missing Migrants Map project

In this study, the Immigration Policy Lab at Stanford University and ETH Zurich used a machine learning model to simulate the allocation of 30,000 refugees aged 18-64 who had been placed by a major agency from 2011-2016. This simulation calculated, based on historical data, the likelihood of an individual refugee to find employment at each resettlement location in accordance to his or her demographic profile. The model also took into account real-world constraints such as the fixed number of job vacancies from each resettlement office. Lastly, they matched each incoming refugee case to the place that offered the highest probability of employment. In using the model, the researchers discovered that when compared to the actual historical outcomes, the average refugee was more than twice as likely to find a job if they had been placed in a hosting community by the algorithm. The authors highlight the potential value of using machine learning models to improve the experience of refugees in their host countries.


Project Report

Segundo Informe Nacional Voluntario de Guinea Ecuatorial 2024

El Segundo Informe Nacional Voluntario de Guinea Ecuatorial 2024 recoge el impacto

Journal Article

Unveiling Local Patterns of Child Pornography Consumption in France Using Tor

Child pornography—better known as child sexual abuse material (CSAM)—represents a severe form

Project Report

Stratégie Nationale des Données du Sénégal – Résumé

En 2023, DPA a fourni un soutien technique à l’élaboration de la