Resources & Methodologies

Big Data available in the digital ecosystem comprises a volume of information that exceeds the human ability to capture, curate, manage, and process it within a tolerable span of time. There are three types of Big Data: unstructured (usually human-generated content on the internet, such as text, pictures and audio); semi-structured (data generated by the interaction between a human and a machine, for example tags in the ecosystem of a website or database); and structured (clean data in a database, usually the result of an automated process).

For the purpose of tracking the SDGs and measuring specific Indicators, the main focus is on unstructured data, as production of this type of data is increasing in developing countries. Despite the rise in the volume of Big Data, its analysis requires a set of techniques and technologies that can reveal insights from low-volume signals hidden within diverse, complex, massive-scale datasets.

Big Data Sources

The main data sources for working with Big Data are:

Online platforms that provide semi-public profiles of users, which can be exploited for data collection and analysis.

Open projects like Wikipedia and OpenStreetMap, which provide a valuable source of data for analysis.

Call records and financial transactions stored in private databases, which can be a valuable source of information about users.

Locations can be traced by means of the data collected via GPS and mobile phone usage.

Traces left by web navigation, such as IP addresses and cookies, which can be used for tracking users’ behavior.

Learn more

For more resources and methodologies for using Big Data in development, refer to the Data-Pop Alliance Toolkit.

Methodologies

The main methodologies for working with Big Data in its ecosystem are listed below:

A/B testing

An operation that compares experimental and control data sets. It is common in research and can be performed by means of machine learning and natural language processing.
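The comparison between a control and an experimental data set can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: it uses a simple permutation test on the difference in means rather than machine learning or natural language processing, and the function name `permutation_test`, its parameters, and the sample conversion data are all hypothetical.

```python
import random


def permutation_test(control, experiment, n_permutations=2000, seed=0):
    """Return (observed_diff, p_value) for the difference in group means.

    Hypothetical helper: repeatedly shuffles the pooled observations to
    estimate how often a difference at least as large as the observed one
    would arise if group membership made no difference.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_control = len(control)
    observed = sum(experiment) / len(experiment) - sum(control) / n_control

    pooled = list(control) + list(experiment)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_control = pooled[:n_control]
        shuffled_experiment = pooled[n_control:]
        diff = (sum(shuffled_experiment) / len(shuffled_experiment)
                - sum(shuffled_control) / n_control)
        # Two-sided test: count differences at least as extreme as observed.
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


if __name__ == "__main__":
    # Made-up data: 1 = user converted, 0 = user did not convert.
    control = [0] * 90 + [1] * 10      # variant A, 10 of 100 converted
    experiment = [0] * 80 + [1] * 20   # variant B, 20 of 100 converted
    diff, p_value = permutation_test(control, experiment)
    print(f"difference in conversion rate: {diff:.2f}, p-value: {p_value:.3f}")
```

A small p-value suggests that the difference between the two groups is unlikely to be due to chance alone. On real data sets the sample sizes, the metric being compared, and the choice of statistical test would all need to be decided for the specific experiment.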

Business intelligence

Developed and delivered in partnership with the UNSSC Knowledge Centre for Sustainable Development, as a part of its SD Talks Special Series initiative, this webinar series aims to examine the critical role that data can play in achieving sustainable development.

Automation of processes

An operation that consists of training the machine to automatically perform tasks that involve data management or classification. It can be handled efficiently by tensor-based computation, such as multilinear subspace learning, or by Massively Parallel Processing (MPP).

Databases and storage

A set of tools that optimize search-based applications and infrastructures, such as distributed file systems, distributed databases, cloud and High-Performance Computing infrastructure.