Resources & Methodologies
The Big Data available in the digital ecosystem comprises a volume of information that exceeds the human ability to capture, curate, manage, and process it within a tolerable span of time. There are three types of Big Data: unstructured (usually human-generated content on the internet, such as text, pictures, and audio); semi-structured (data generated by the interaction between a person and a machine, for example tags in the ecosystem of a website or database); and structured (clean data in a database, usually the result of an automated process).
For the purpose of tracking SDGs and measuring specific Indicators, the main focus is on unstructured data, as the production of this type of data is increasing in developing countries. Despite the growing volume of Big Data, its analysis requires a set of techniques and technologies that can reveal insights from low-volume signals hidden within diverse, complex, massive-scale datasets.
The main data sources for working with Big Data are:
Online social networks, which provide semi-public profiles of users and can be exploited for data collection and analysis
Open projects like Wikipedia and OpenStreetMap, which provide a valuable source of data for analysis
Call records and financial transactions stored in private databases, which can be a valuable source of information about users
Locations, which can be traced by means of the data collected via GPS and mobile phone usage
Traces left by web navigation, such as IP addresses and cookies, which can be used for tracking users’ behavior
For more resources and methodologies for using Big Data in development, refer to the Data-Pop Alliance Toolkit.
The main methodologies for working with Big Data in its ecosystem are listed below:
An operation that consists of comparing experimental and control data sets. It is common in research and can be performed by means of machine learning and natural language processing.
Automation of processes
An operation that consists of training a machine to automatically perform tasks that involve data management or classification. It can be handled efficiently by tensor-based computation, such as multilinear subspace learning, or by Massively Parallel Processing (MPP).
Databases and storage
A set of tools that optimize search-based applications and infrastructures, such as distributed file systems, distributed databases, cloud, and High-Performance Computing infrastructure.
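As a minimal sketch of the comparison between experimental and control data sets mentioned in the list above, the following Python snippet runs a permutation test on the difference in means. The choice of a permutation test, the function name `permutation_test`, and the sample values (imagined as daily counts of posts mentioning a keyword) are illustrative assumptions, not part of the methodology described here.

```python
import random
from statistics import mean


def permutation_test(experimental, control, n_permutations=10_000, seed=0):
    """Return (observed difference in means, two-sided p-value).

    Hypothetical helper: it repeatedly shuffles the pooled values and
    counts how often a random split is at least as extreme as the
    observed split.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = mean(experimental) - mean(control)
    pooled = list(experimental) + list(control)
    n_exp = len(experimental)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_exp]) - mean(pooled[n_exp:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up example data (assumption): daily counts of keyword mentions.
experimental = [12, 15, 14, 18, 20, 17]
control = [10, 9, 11, 12, 8, 10]

observed, p_value = permutation_test(experimental, control)
print(f"difference in means: {observed:.2f}, p-value: {p_value:.4f}")
```

A small p-value suggests that the gap between the two sets is unlikely to come from chance alone. With real data, the values would typically be derived from the text first, for example through keyword matching or sentiment scoring.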
For monitoring specific Indicators related to SDGs, the most useful technique for working with Big Data is business intelligence, and the best data source is public data from online social networks (OSNs). In particular, we identified Facebook and Twitter as useful sources for extracting meaningful information by means of keywords or queries. We also employ a set of techniques to enrich and analyze the data, such as sentiment analysis and topic extraction, and procedures to visualize the extracted and enriched data, such as word clouds, charts, and networks. Following the templates provided by Data-Pop Alliance, we designed a Lab-type project: