Technical Workshops

Building data science capacities in statistical teams

About

Designed for professionals who have a role in defining, from a technical perspective, how data can drive social progress, the workshops introduce attendees to the Data Science skills needed to work with non-traditional sources of information, e.g. satellite images, Call Detail Records, bank transaction data or web-derived data.

The use of Big Data is accelerating within development and humanitarian practice. Used well, it can foster inclusion, improve efficiency, and lower project costs, benefiting the public and private organizations involved in development programs. Our courses therefore cover different aspects of data science and data engineering relevant to official statistics and sustainable development.

“High quality level of both the course and the instructors”

Aline Visconti Rodrigues

Analyst, IBGE Brazil

“[I appreciated the] Promptness, cordiality and knowledge of the instructors.”

Gustavo Tavares Lameiro da Costa

Technologist, IBGE Brazil

Target audience

If programming is part of your daily tasks or you lead technical teams, these courses are for you (some programming experience is required; Python is preferable though not necessary).

Methodology

All the programming material is provided in Python using the conventional open-source libraries for Data Science. Most of the sessions are interactive and run in a Jupyter Notebook (.ipynb). Each session ends with a practical exercise.

Format

We offer three different on-site courses, each consisting of 18 teaching hours spread over 3 days and designed for 20 participants. Each course is delivered by a team of 2 training specialists.

Course 1

Web data collection and analysis

About

Whether the website has interactive components or a fully fledged API, this course will teach you how to programmatically extract data from the web and which models can be applied to draw conclusions from these data. As case studies, we develop a PPP exchange rate using only real estate rent prices in some Latin American countries, and we study migration flows using the Facebook Marketing API.

Testimonials

“The whole course was excellent. It is great having the opportunity to participate in qualifications on modern issues relevant to IBGE’s work.”

Syllabus

A. Collecting through a web browser emulator

B. Collecting through APIs

C. Analyzing and visualizing web-derived data

Course 2

Machine learning on satellite imagery

About

Skills in Machine Learning (ML) are essential for 21st-century statisticians and Data Scientists. In this course you will be introduced to the field by covering the most popular tasks in supervised and unsupervised ML.

Remote-sensing imagery is already being used to address development issues, e.g. revealing changes in soil quality or water availability, informing agricultural interventions and even measuring poverty. We therefore structured this course around ML methods that can be applied to satellite imagery, aiming to help statistical teams leverage this modern and omnipresent data source.

Syllabus

A. Introduction to supervised machine learning

B. Introduction to unsupervised machine learning

C. Case study: Satellite imagery for measuring urban extent (SDG 11.3.1)

Course 3

Statistical methods for correcting selection bias

About

Just as questionnaires are the means of observing reality through surveys, electronic platforms play the same role for big data. Most big data sources offer a non-probabilistic sample of the study population, in which errors are induced by the self-selection of the individuals in the sample, by targeting decisions made by the owners of the electronic platform, and by the limited coverage of that platform.

In this course you will learn the principal techniques for correcting bias through a statistical approach. The course builds on previous Data-Pop Alliance research on correcting bias in mobile network data and on a fundamental book published by Eurostat.
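To make the idea of self-selection bias concrete, here is a minimal Python sketch of post-stratification, one widely used reweighting technique. It is an illustration only and is not taken from the course material: the function name `poststratified_mean`, the strata labels and all numbers are hypothetical. It assumes the population share of each stratum is known from an external source such as a census.

```python
from collections import defaultdict


def poststratified_mean(sample, population_shares):
    """Estimate a population mean by reweighting per-stratum sample means.

    sample: iterable of (stratum, value) pairs from the non-probabilistic source.
    population_shares: dict mapping stratum -> known population share (sums to 1).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for stratum, value in sample:
        totals[stratum] += value
        counts[stratum] += 1

    # A stratum with no sample units cannot be corrected by reweighting.
    missing = set(population_shares) - set(counts)
    if missing:
        raise ValueError(f"no sample units in strata: {sorted(missing)}")

    # Weight each stratum mean by its share in the target population.
    return sum(
        share * totals[stratum] / counts[stratum]
        for stratum, share in population_shares.items()
    )


# Hypothetical platform sample: young users are over-represented (8 of 10 units),
# and the value is 1.0 if the unit has some characteristic of interest, else 0.0.
sample = (
    [("young", 1.0)] * 6 + [("young", 0.0)] * 2
    + [("old", 1.0)] * 1 + [("old", 0.0)] * 1
)
# Hypothetical known population shares, e.g. from a census.
population_shares = {"young": 0.4, "old": 0.6}

naive = sum(value for _, value in sample) / len(sample)
adjusted = poststratified_mean(sample, population_shares)
print(f"naive mean: {naive:.2f}, post-stratified mean: {adjusted:.2f}")
```

In this toy example the naive mean overstates the characteristic because the over-represented stratum has a higher rate; reweighting by the known shares pulls the estimate back towards the population composition. The correction only works for the variables used to define the strata, and it cannot repair strata that the platform does not cover at all.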

Syllabus

A. Main challenges of big data as a statistical source

B. Unit-level methods for correcting bias

C. Domain-level methods for correcting bias

Contact

For more information or questions regarding these courses, please contact us at trainings@datapopalliance.org