Building data science capacities in statistical teams
Designed for professionals who help define, from a technical perspective, how data can drive social progress, the workshops introduce attendees to the data science skills needed to work with non-traditional sources of information, e.g. satellite images, Call Detail Records, bank transactional data or web-derived data.
The use of Big Data is accelerating within development and humanitarian practice. Used well, it can foster inclusion, improve efficiency and lower project costs, which may benefit the public and private organizations involved in development programs. Our courses therefore cover the aspects of data science and data engineering that are relevant to official statistics and sustainable development.
If programming is part of your daily tasks or you lead technical teams, these courses are for you (some programming experience is required; Python is preferable though not necessary).
All the programming material is provided in Python, using the conventional open-source libraries for data science. Most of the sessions are interactive and run in a Jupyter Notebook (.ipynb). Each session ends with a practical exercise.
We offer three different on-site courses, each with 18 teaching hours spread over 3 days and designed for 20 participants. Each course is delivered by a team of 2 training specialists.
Web data collection and analysis
Whether a website has interactive components or a fully fledged API, this course will teach you how to programmatically extract data from the web and which models can be applied to draw conclusions from these data. As case studies, we derive a purchasing power parity (PPP) exchange rate using only real-estate rent prices in some Latin American countries, and we study migration flows using the Facebook Marketing API.
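To make the PPP case study concrete, here is a minimal sketch of how a rent-based rate could be computed. It assumes the rate is taken as the ratio of median rent per square metre in the local currency to the same quantity in a reference currency. The function names and the listings are hypothetical, and the estimator used in the course may differ.

```python
from statistics import median


def median_rent_per_m2(listings):
    """Median monthly rent per square metre for (rent, area_m2) pairs."""
    return median(rent / area for rent, area in listings)


def rent_based_ppp(local_listings, reference_listings):
    """Local-currency units per reference-currency unit implied by rents."""
    return median_rent_per_m2(local_listings) / median_rent_per_m2(reference_listings)


# Hypothetical listings: (monthly rent, floor area in square metres).
local_listings = [(12000, 60), (9000, 50), (15000, 60)]    # local currency
reference_listings = [(1200, 60), (1000, 40), (900, 60)]   # reference currency

ppp_rate = rent_based_ppp(local_listings, reference_listings)
print(f"Implied PPP rate: {ppp_rate:.2f} local units per reference unit")
```

The median is used here instead of the mean so that a few luxury or mispriced listings do not dominate the estimate.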
“The whole course was excellent. It is great having the opportunity to participate in qualifications on modern issues relevant to IBGE’s work.”
Modern websites usually have interactive components. We focus on browser automation tools, namely Selenium WebDriver, to operate those components programmatically.
The use case of this module is a real-estate rental platform, from which rent prices are collected.
The collection methods used for this platform are applicable to many e-commerce websites that present their products in a similar catalogue structure.
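A minimal sketch of this kind of collection with Selenium WebDriver is shown below. The CSS selectors, the example URL and the function names are hypothetical placeholders, not those of any real platform. The price parser assumes labels carry whole amounts with no decimal part.

```python
import re


def parse_rent(label):
    """Turn a price label such as '$ 1.250.000 / mes' into an integer.

    Assumption: prices have no decimal part, so every non-digit
    character (currency symbol, thousands separator) can be dropped.
    """
    digits = re.sub(r"\D", "", label)
    return int(digits) if digits else None


def collect_rents(url, card_selector="div.listing-card", price_selector="span.price"):
    """Open a catalogue page in headless Chrome and return the listed rents.

    Both selectors are hypothetical defaults and must be adapted to the
    markup of the target site.
    """
    # Imported here so parse_rent can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the page's scripts have rendered the listing cards.
        WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, card_selector))
        )
        rents = []
        for card in driver.find_elements(By.CSS_SELECTOR, card_selector):
            label = card.find_element(By.CSS_SELECTOR, price_selector).text
            rents.append(parse_rent(label))
        return rents
    finally:
        driver.quit()  # always release the browser process
```

A call such as `collect_rents("https://example.com/rentals")` would return one value per listing card. Before collecting from a real site, check its terms of use and robots.txt.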
Machine learning on satellite imagery
Skills in Machine Learning (ML) are essential for 21st-century statisticians and data scientists. In this course you will be introduced to the field through the most popular tasks in supervised and unsupervised ML.
Remote-sensing imagery is already being used to address development issues, e.g. revealing changes in soil quality or water availability, informing agricultural interventions and even measuring poverty. We therefore structured this course around ML methods that can be applied to satellite imagery, aiming to help statistical teams leverage this modern and omnipresent data source.
Supervised machine learning methods are those that have progressed the most in both academic and industrial environments. We cover the task of classification: training a model to categorize the observations we give it. We examine algorithms such as Logistic Regression, Support Vector Machines, Gradient Boosted Trees and Neural Networks, going through their theoretical basis and applying them in practice.
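As an illustration of the classification task, the sketch below fits a logistic regression by batch gradient descent using only the standard library. The pixel values, labels and function names are hypothetical. In practice one would more likely use a library implementation such as scikit-learn's `LogisticRegression`.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(x, w, b):
    """Linear score w.x + b for one observation."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_logistic(X, y, lr=1.0, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(xi, w, b)) - yi  # predicted probability minus label
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Assign class 1 when the predicted probability is at least 0.5."""
    return [1 if sigmoid(score(xi, w, b)) >= 0.5 else 0 for xi in X]


# Hypothetical pixels: (red, near-infrared) reflectance.
# Label 1 = vegetation, label 0 = bare soil or built-up area.
X_train = [(0.05, 0.50), (0.08, 0.45), (0.06, 0.60), (0.04, 0.55),
           (0.30, 0.35), (0.25, 0.30), (0.35, 0.40), (0.28, 0.25)]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_logistic(X_train, y_train)
new_pixels = [(0.06, 0.52), (0.29, 0.32)]
print(predict(new_pixels, w, b))
```

The same fit and predict pattern carries over to the other algorithms listed above. Only the model that maps features to a class changes.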
Statistical methods for correcting selection bias
Just as questionnaires are the means of observing reality through surveys, electronic platforms play the same role for big data. Most big data sources offer a non-probabilistic sample of the study population, in which errors arise from the self-selection of individuals into the sample, from targeting decisions made by the owners of the electronic platform and from limitations in the coverage of that platform.
In this course you will learn the principal techniques for correcting bias through a statistical approach. It is based on previous research by Data-Pop Alliance on correcting bias in mobile network data and on a fundamental book published by Eurostat.
We define a statistical approach to big data and highlight the main challenges and opportunities that arise when using it as a statistical source. We go through the specific difficulties of different big data sources of interest, such as mobile network data, bank transactional data and social media, among others.
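One standard correction for self-selection is post-stratification: the sample is split into strata whose population shares are known from an external source such as a census, and each stratum mean is weighted by its population share. The sketch below illustrates the idea with hypothetical age groups, shares and values. It is not necessarily the technique taught in the course.

```python
from collections import defaultdict


def post_stratified_mean(sample, population_shares):
    """Estimate a population mean from (stratum, value) observations.

    Each stratum mean is weighted by the stratum's known share of the
    population instead of its share of the (biased) sample.
    """
    by_stratum = defaultdict(list)
    for stratum, value in sample:
        by_stratum[stratum].append(value)
    missing = set(population_shares) - set(by_stratum)
    if missing:
        raise ValueError(f"no observations for strata: {sorted(missing)}")
    return sum(
        share * sum(by_stratum[s]) / len(by_stratum[s])
        for s, share in population_shares.items()
    )


# Hypothetical platform sample in which young users are over-represented.
sample = [("18-34", 10), ("18-34", 12), ("18-34", 11), ("18-34", 9),
          ("18-34", 13), ("18-34", 11),
          ("35-54", 20), ("35-54", 22), ("35-54", 18),
          ("55+", 30)]
# Hypothetical population shares, e.g. taken from a census.
population_shares = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}

naive_mean = sum(value for _, value in sample) / len(sample)
corrected_mean = post_stratified_mean(sample, population_shares)
print(f"naive: {naive_mean:.1f}, post-stratified: {corrected_mean:.1f}")
```

The correction only removes bias explained by the stratifying variable. If selection also depends on unobserved characteristics within a stratum, the estimate remains biased.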