In mathematics and computer science, an algorithm is a series of predefined instructions or rules, often written in a programming language for execution by a computer, that specifies step by step how to solve a recurring problem through calculation and data processing. The use of algorithms for decision-making has grown in several sectors and services, such as policing and banking.
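As a small illustration of "a series of predefined instructions", the sketch below implements Euclid's algorithm for the greatest common divisor. The example and the function name `gcd` are chosen here for illustration and are not taken from the source.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor, computed with Euclid's algorithm."""
    a, b = abs(a), abs(b)
    # Repeat one fixed rule until the remainder is zero:
    # replace (a, b) with (b, a mod b).
    while b != 0:
        a, b = b, a % b
    return a


print(gcd(48, 18))  # prints 6
```

The same fixed rule is applied to any pair of integers, which is what makes the procedure reusable for a recurring problem.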
The ecosystem created by the concomitant emergence of ‘the 3 Cs of Big Data’:
- Digital Crumbs: pieces of data passively emitted and/or collected by digital devices, which constitute very large data sets and streams and contain unique insights about their users’ behaviors and beliefs;
- Big Data Capacities: what has also been referred to as Big Data Analytics, that is, the set of tools and methods, hardware and software, know-how and skills necessary to process and analyze these new kinds of data, including visualization techniques, statistical machine learning, and algorithms;
- Big Data Communities: the various actors involved in the Big Data ecosystem, from the generators of data to their analysts and end users, i.e. potentially the whole population.
Call Detail Records
The technical name for mobile phone data recorded by all telecom operators. CDRs contain information about the locations of those sending and receiving calls or text messages through operators’ networks, as well as data on time and duration.
A type of technology that enables citizen engagement or makes government more accessible, effective, and efficient for the economic and social good of society. This specific type of technology helps to connect people to resources, ideas, and other people needed to improve their societies or communities.
An object, variable, or piece of information that has the perceived capacity to be collected, stored, and identified. It comes largely in two forms: structured and unstructured. Structured data are essentially answers to questions asked by the collector of data; they are generally easy to organize and identify and have a strict hierarchy that is not easily manipulated (e.g. responses to a survey organized in a table, or information about people’s years of education and income in a chart). Unstructured data (such as photos, videos, and tweets) are not readily amenable to automated analysis, are often used in ways that differ from the purpose intended when they were collected, and do not need to follow a hierarchical method of identification. “Data” is also used to refer to a policy concept and social phenomenon (e.g. “data is changing the world”), or as shorthand for data ecosystems, Big Data, etc.
Complex adaptive systems that include data infrastructure, tools, media, producers, consumers, curators, and sharers. They are complex organizations of dynamic social relationships through which data/information moves and transforms in flows.
Data that are passively emitted from cell phones, sensors, social media and other platforms as digital translations of human actions and interactions.
The universal ability of people to create, control, access and use data.
A new form of journalism stimulated by the open data movement, in which stories are presented or supplemented through graphics or visualizations of analyzed datasets. These static or interactive graphics include databases, maps, diagrams, grids, charts and many other forms of illustrations that have transformed the look of mainstream news media.
The desire and ability to engage constructively in society through and with data.
Using existing datasets to infer current conditions or predict future outcomes. The process involves resolving complex relationships among datasets in order to understand what data means and how the elements relate.
A term that has become mainstream in policy and development discourse since the High-Level Panel of Eminent Persons on the Post-2015 Development Agenda called for a “Data Revolution” to “strengthen data and statistics for accountability and decision-making purposes”. It refers to the applications and implications of data as a social phenomenon. The term “Industrial Revolution of Data” was coined by computer scientist Joseph Hellerstein in 2008.
A field of research and practice that focuses on solving real-world problems using large amounts of data by combining skills from often distinct areas of expertise: math, computer science (hacking and coding), statistics, social science, and even storytelling or art.
The differential access and ability to use information and communications technologies between individuals, communities and countries — and the resulting socioeconomic and political inequalities.
A false positive, or type I error, is a positive prediction or conclusion that turns out to be false: for example, a fire alarm going off when there is no fire, or an experiment indicating that a medical treatment has worked when it has not. A false negative, or type II error, occurs when a study or monitoring system fails to identify an event or effect that has occurred. Attempts to predict rare events, such as political revolutions, using increasingly rich data and powerful tools are expected to produce more false positives than false negatives (also known as over-prediction).
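The two error types can be counted directly from paired observations and predictions. The sketch below is a minimal illustration; the function names and all numbers (1,000 cases, a 1% event rate, 90% sensitivity and specificity) are assumptions made up for the example, not figures from the source.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives from paired booleans."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for happened, flagged in zip(actual, predicted):
        if flagged and happened:
            counts["tp"] += 1
        elif flagged and not happened:
            counts["fp"] += 1  # type I error: alarm without an event
        elif happened:
            counts["fn"] += 1  # type II error: event that was missed
        else:
            counts["tn"] += 1
    return counts


def expected_errors(n_cases, base_rate, sensitivity, specificity):
    """Expected (false positives, false negatives) for a given event rate."""
    events = n_cases * base_rate
    non_events = n_cases - events
    false_negatives = events * (1 - sensitivity)
    false_positives = non_events * (1 - specificity)
    return false_positives, false_negatives


# Hypothetical toy data: 8 non-events followed by 2 events.
actual = [False] * 8 + [True] * 2
predicted = [True, True] + [False] * 6 + [True, False]
print(confusion_counts(actual, predicted))

# Hypothetical rare event: 1% base rate, detector right 90% of the time.
fp, fn = expected_errors(1000, 0.01, 0.90, 0.90)
print(round(fp), round(fn))
```

Under these assumed numbers the few missed events are far outnumbered by false alarms, because the non-events that can trigger an alarm vastly outnumber the events that can be missed.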
Internal validity refers to the extent to which a causal relationship can be confidently established between two phenomena, for example a reduction in the speed limit and a fall in road deaths. This requires all other factors that may affect the outcome and offer alternative explanations to be taken into account; in this case, these would include a change in drinking habits. External validity refers to the extent to which a study’s conclusions can be confidently generalized to other situations and people, that is, whether they would hold beyond the area and time for which they were established.
As defined by UNESCO, "the ability to identify, understand, interpret, create, communicate and compute, using printed and written materials associated with varying contexts. Literacy involves a continuum of learning in enabling individuals to achieve their goals, to develop their knowledge and potential, and to participate fully in their community and wider society."
Data that is easily accessible, machine-readable, available for free or at negligible cost, and subject to minimal limitations on its use, transformation, and distribution.
The practice of engaging, empowering and participatory approaches to data-driven presentation and decision-making (R. Bhargava).
"Explicitly collected data – the data is collected in the open, with notice, and on purpose. Small Data can be analyzed by interested laymen. Small Data doesn’t depend on technology-assisted analysis, but can engage it as appropriate." (R. Bhargava)
Statistical Machine Learning
A subset of data science, falling at the intersection of traditional statistics and machine learning. Machine learning refers to the construction and study of computer algorithms — step-by-step procedures used for calculations and classification — that can ‘learn’ when exposed to new data. This enables better predictions and decisions to be made based on what was experienced in the past, as with filtering spam emails, for example. The addition of “statistical” reflects the emphasis on statistical analysis and methodology, which is the main approach to modern machine learning.
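To make the spam-filtering example concrete, the sketch below trains a very small naive Bayes text classifier, one common statistical learning method (chosen here for illustration; the source does not name a specific technique). The function names `train` and `classify` and the four training messages are hypothetical.

```python
import math
from collections import Counter


def train(messages):
    """Build word and label counts from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability score."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Start from the prior: how common this label was in training.
        score = math.log(label_counts[label] / total_messages)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][word] + 1) / (n_words + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training data, invented for this example.
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(training)
print(classify("claim free money", model))
print(classify("monday meeting agenda", model))
```

Calling `train` again with additional labeled messages updates the counts, which is the sense in which such a model "learns" when exposed to new data.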