Resources & Methodologies

The Big Data available in the digital ecosystem comprises a volume of information that exceeds the human ability to capture, curate, manage, and process it within a tolerable span of time. There are three types of Big Data: unstructured (usually human-generated content on the internet, such as text, pictures, and audio); semi-structured (data generated by the interaction between a person and a machine, for example tags in a website or database); and structured (clean data in a database, usually the result of an automated process).

For the purpose of tracking SDGs and measuring specific Indicators, the main focus is on unstructured data, as the production of this type of data is increasing in developing countries. Although the volume of Big Data keeps rising, its analysis requires a set of techniques and technologies that can reveal insights from low-volume signals hidden within diverse, complex, massive-scale datasets.

 

Big Data Sources

The main data sources for working with Big Data are:

Online Social Networks (OSNs)

Platforms that provide semi-public user profiles and can be used for data collection and analysis.

Crowd-sourced data

Open projects such as Wikipedia and OpenStreetMap, which provide a valuable source of data for analysis.

CDRs & transactions

Call detail records (CDRs) and financial transactions stored in private databases, which can be a valuable source of information about users.

Mobile & GPS data

Locations that can be traced through the data collected via GPS and mobile phone usage.

Digital traces

Traces left by web navigation, such as IP addresses and cookies, which can be used to track users' behavior.

Learn more

To explore more resources and methodologies for using Big Data in development, refer to the Data-Pop Alliance Toolkit.

Methodologies

The main methodologies for working with Big Data are listed below:

A/B testing

An operation that compares experimental and control data sets. It is common in research and can be performed with the help of machine learning and natural language processing.

Business intelligence

A set of techniques for collecting, enriching, and visualizing data in order to extract meaningful information and support decision-making, typically delivered through reports and dashboards.

Automation of processes

An operation that consists of training a machine to automatically perform tasks involving data management or classification. It can be handled efficiently by tensor-based computation, such as multilinear subspace learning, or by massively parallel processing (MPP).

Databases and storage

A set of tools that optimize search-based applications and infrastructures, such as distributed file systems, distributed databases, cloud, and High-Performance Computing infrastructure.
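As an illustration of the A/B testing entry in the list above, the following is a minimal sketch of a comparison between a control and an experimental data set. It uses a permutation test on the difference in means, which is only one of several possible ways to make such a comparison; the function name `permutation_test` and the sample data are hypothetical and not taken from any specific project.

```python
import random
from statistics import mean


def permutation_test(control, experimental, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns the observed difference (experimental - control) and an
    estimated p-value.
    """
    rng = random.Random(seed)
    observed = mean(experimental) - mean(control)
    pooled = list(control) + list(experimental)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign observations to the two groups at random.
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        # Small tolerance guards against floating-point rounding.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical outcomes: 1 = user engaged with the content, 0 = did not.
control = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
experimental = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]

observed, p_value = permutation_test(control, experimental)
print(f"difference in means: {observed:.2f}, p-value: {p_value:.3f}")
```

A small p-value suggests that the difference between the two groups is unlikely to be due to chance alone. With samples as small as these, any conclusion should be treated with caution.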

For monitoring specific Indicators related to SDGs, the most useful technique for working with Big Data is business intelligence, and the best data source is public data from OSNs. In particular, we identified Facebook and Twitter as useful sources from which meaningful information can be extracted by means of keywords or queries. We also employ a set of techniques to enrich and analyze the data, such as sentiment analysis and topic extraction, as well as procedures to visualize the extracted and enriched data, such as word clouds, charts, and networks. Following the templates provided by Data-Pop Alliance, we designed a Lab-type project:

purpose

Create and automatically manage a data-driven monitoring platform that provides easy access to information pertaining to the public sector

resources

Technical: platform/dashboard, batch dataset; People: data scientists, governance specialists; Data: public data from OSNs

design

User-centered design and interface; public benefit rationale; transparency and accountability
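The keyword-based extraction, sentiment analysis, and word-cloud steps described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the platform's actual implementation: the posts are hypothetical sample strings rather than data retrieved from Facebook or Twitter, the keyword and sentiment word lists are invented for the example, and the lexicon-based score is a very basic stand-in for a real sentiment analysis method.

```python
import re
from collections import Counter

# Hypothetical sample posts; a real project would collect public OSN data.
POSTS = [
    "The new water supply project is great for our village",
    "Terrible road conditions again, the council does nothing",
    "Great news: the school opens next week",
    "Traffic was bad this morning",
]

# Assumed word lists, chosen only for this illustration.
KEYWORDS = {"water", "school", "road"}
POSITIVE = {"great", "good"}
NEGATIVE = {"terrible", "bad"}
STOPWORDS = {"the", "is", "for", "our", "again", "does", "was", "this", "new"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def matches_keywords(text, keywords):
    """Return True if any token of the text is one of the keywords."""
    return any(token in keywords for token in tokenize(text))


def sentiment_score(text):
    """Count positive words minus negative words (naive lexicon approach)."""
    tokens = tokenize(text)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


def term_frequencies(texts):
    """Count non-stopword terms; these counts could feed a word cloud."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in tokenize(text) if t not in STOPWORDS)
    return counts


# 1. Extraction: keep only the posts that mention a keyword of interest.
selected = [post for post in POSTS if matches_keywords(post, KEYWORDS)]
# 2. Enrichment: attach a sentiment score to each selected post.
scores = [sentiment_score(post) for post in selected]
# 3. Visualization input: the most frequent terms in the selected posts.
top_terms = term_frequencies(selected).most_common(5)

for post, score in zip(selected, scores):
    print(score, post)
print(top_terms)
```

In a dashboard setting, the scores and term counts would be recomputed for each new batch of data and passed on to the charting or word-cloud component.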