Overview of the Technical Workshop “Web Data Collection and Analysis: Web Scraping and API's Interaction”, held at IBGE Offices in Rio de Janeiro, November 2019

December 9, 2019

On November 25-27, 2019, Data-Pop Alliance, ECLAC and IBGE conducted the first technical workshop in Rio de Janeiro. It was also the first time that Data-Pop Alliance organized a three-day training tailored specifically to the needs of the staff at the Brazilian National Statistical Office (IBGE). The goal was to help them build and strengthen internal capacities to leverage web data collection and analysis in their projects.

The workshop emerged as part of a broader training program designed for development practitioners, policymakers, and researchers interested in enhancing their Big Data knowledge and skills. In this case, the training program was built around the Technical Tutorial on internet data collection that Data-Pop Alliance developed for previous Professional Workshops for Sustainable Development and Digital Economy (see past editions held in São Paulo, Mexico City, Santo Domingo and Bogotá).

During the three days, several interactive modules were introduced. In these, the facilitators used Python and open source libraries to collect and access web data, analyze it, and generate interactive visualizations. Over 20 participants attended, among them technical managers, analysts, and developers from IBGE, whose backgrounds include statistics, computer science, and mathematics.

A closer look at the workshop

During Days 1 and 3, participants worked through seven modules whose objective was to produce Purchasing Power Parity (PPP) exchange rates based on a basket of goods that contained only real estate rent prices from four Latin American countries: Brazil, Mexico, Chile and Peru. The data were collected from several websites and provided insights into housing price levels in the capitals and other important cities of these countries. The last module of this sequence introduced new statistical methods from the field of Artificial Intelligence to correct the selection bias intrinsic to Big Data sources and to estimate the uncertainty around the geospatial distribution of price per square meter in Rio de Janeiro.
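
To make the idea of a single-item PPP basket concrete, here is a minimal Python sketch. It is not the workshop's actual code: the function name ppp_rate, the use of the median as the price summary, the choice of Brazil as the base country, and every rent figure are assumptions made purely for illustration. The PPP rate is taken as the ratio of the typical rent per square meter in one country's currency to the typical rent per square meter in the base country's currency.

```python
from statistics import median


def ppp_rate(local_prices, base_prices):
    """Return units of local currency per unit of base currency that
    equalise the price of the (single-item) basket.

    Hypothetical helper: the median is an assumed summary statistic,
    chosen here because listing prices tend to be skewed.
    """
    if not local_prices or not base_prices:
        raise ValueError("both price lists must be non-empty")
    return median(local_prices) / median(base_prices)


# Made-up monthly rents per square meter, in each country's own currency.
# These numbers are illustrative only and are NOT workshop results.
rents_per_m2 = {
    "Brazil": [30.0, 40.0, 50.0],        # BRL
    "Mexico": [150.0, 200.0, 250.0],     # MXN
    "Chile": [8000.0, 10000.0, 12000.0], # CLP
    "Peru": [20.0, 30.0, 40.0],          # PEN
}

base_country = "Brazil"  # assumed base; any of the four could be used

ppp_rates = {
    country: ppp_rate(prices, rents_per_m2[base_country])
    for country, prices in rents_per_m2.items()
}

for country, rate in ppp_rates.items():
    print(f"{country}: {rate:.2f} local currency units per BRL")
```

With these invented figures, the rate for the base country is 1 by construction, and each other rate says how many units of local currency rent the same square meter that one Brazilian real rents. A real exercise would also need to account for the selection bias of online listings mentioned above.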

On Day 2, we delivered four modules on the collection, analysis and visualization of Venezuelan migration flows. Using Facebook Marketing, we were able to study migration into American countries and disaggregate these flows across several Brazilian states. These modules were inspired by the amazing work of our colleagues from Qatar Computing Research Institute at HBKU, UNICEF, MIT Media Lab, iMMAP Colombia and the Global Protection Cluster from UNHCR, entitled “Real-Time Monitoring of the Venezuelan Exodus through Facebook’s Advertising Platform” (see publication here). Our visualizations shed light on demographics such as age, gender and education level, as well as on proxies of wealth and consumer behavior of Venezuelan migrants.

We are working to deliver many more technical workshops tailored to technical staff in national statistical offices. In the meantime, we welcome you to join our learning community on the Big Data for Sustainable Development Open Learning Hub, a free learning platform developed by Data-Pop Alliance and UNSSC that seeks to foster a global community of practitioners by leveraging local expertise and initiatives from data scientists, civil servants, UN experts and members of civil society.

For more information, including the workshop’s agenda, please visit this page.