Internet and historical snapshots
Internet Archive / Wayback Machine
The Internet Archive offers permanent access for researchers, historians,
scholars, people with disabilities, and the general public to historical
collections that exist in digital format. Founded in 1996, the Internet
Archive now includes texts, audio, moving images, and software as well as archived
Web pages.
Wikipedia
Wikipedia is the most famous cooperatively edited encyclopedia. Since every
change is stored, Web pages' history can offer a detailed subject-based
overview of the most important references of the past.
The Knowledge Centers
A collection of links to other resources for finding Web pages as they
existed in the past.
Whenago
Whenago provides quick access to historical information about what happened in
the past on a given day.
World Digital Library
The World Digital Library (WDL) makes available on the Internet, free of charge
and in multilingual format, significant primary materials from countries and
cultures around the world.
Information retrieval engines
Freebase
Freebase is an open, Creative Commons licensed repository of structured data of
more than 12 million entities. It provides collaborative tools to link entities
together and keep them updated.
Wolfram Alpha Computational Knowledge Engine
An attempt to compute whatever can be computed about anything. It aims to
provide a single source that can be relied on by everyone for definitive
answers to factual queries.
Text mining on the Web
Google Trends
Google Trends shows visual statistics about how often keywords have been searched
on Google over time. Google Trends also shows how frequently topics have
appeared in Google News stories, and in which geographic regions people have
searched for them most.
Google Flu Trends
Google Flu Trends uses aggregated Google search data to estimate flu activity.
The data are available for download as well.
The Observatorium
The Observatorium project focuses on complex network dynamics in the Internet,
proposing to monitor its evolution in real-time, with the general objective of
better understanding the processes of knowledge generation and opinion
dynamics.
We Feel Fine
A database of several million human feelings, harvested from blogs and social
pages in the Web. Using a series of playful interfaces, the feelings can be
searched and sorted across a number of demographic slices. A Web API is
available as well.
CyberEmotions
The CyberEmotions project focuses on the role of collective emotions in
creating, forming and breaking up e-communities. It makes available for download
three datasets containing news and comments from the BBC News forum, Digg and
MySpace, only for academic research and only after the submission of an
application form.
Social data sharing
Linked Data
Linked Data is about using the Web to connect related data that was not
previously linked, or using the Web to lower the barriers to linking data
currently linked using other methods.
Dataverse Network Project
The Dataverse Network is an application to publish, share, reference, extract
and analyze research data. It facilitates making data available to others, and
allows others' work to be replicated. Researchers and data authors get credit,
publishers and distributors get credit, affiliated institutions get credit.
Data360
Data360 is an open-source, collaborative and
free Web site. The site hosts a common and shared database,
which any person or organization, committed to neutrality and non-partisanship
(meaning let the data speak), can use for presentations and visualizations.
Swivel
Swivel is a web site where people share reports of charts and numbers. It is
free for public data, and charges a monthly fee to people who want to use it in
private.
Many Eyes
An IBM initiative that allows users to upload their datasets and use a
collection of tools to obtain meaningful visualizations from them. Each
visualization is publicly stored on a dedicated page, where users can comment,
rate and tag it. Reuse of the data is possible and encouraged.
Conflict data
CSCW Data on Armed Conflict
CSCW and Uppsala Conflict Data Program (UCDP) at the Department of Peace and
Conflict Research, Uppsala University, have collaborated in the production of a
dataset of armed conflicts, both internal and external, in the period 1946 to
the present. It is currently probably the most extensive dataset repository
available, in particular for historical data.
WarViews
The aim of the WarViews project is to create an easy-to-use front-end for the
exploration of GIS data on conflict. It can run on a Web browser or it can be
displayed using Google Earth.
The following are civil-war-specific datasets with additional empirical
information:
Ethnic group location dataset
Ethnic power balances dataset
Collection of updated datasets and codebooks from the Uppsala
Conflict Data Program (UCDP).
ACLED
Partially contained in the PRIO dataset, ACLED (Armed Conflict Location and
Events Dataset) is designed for disaggregated conflict analysis and crisis
mapping. This dataset codes the location of all reported conflict events in 50
countries in the developing world. Data are currently being coded from 1997 to
2009 and the project continues to backdate conflict information for African
states to the year of independence.
CERAC
The Conflict Analysis Resource Center hosts several cross country conflict data
sets and a few datasets of particular countries. Repositories also have
datasets of political instability and conflict.
The Cross-National Time-Series Data Archive
The Cross-National Time-Series Data Archive provides annual data for a range of
countries from 1815 to the present. Frequently cited, it is one of the "leading
datasets on political violence", according to Robert Bates at
Harvard University. It is "possibly the most widely used
event dataset" according to Henrik Urdal, International Peace Research
Institute, Oslo (PRIO).
Country-specific repositories:
Iraq, Afghanistan
Collection of datasets of terrorist acts.
Data in economics and finance
Bloomberg
International real-time data provider for decision makers in finance, business
and government.
Maddison Data
Historical statistics about GDP and population data.
UNCTAD Statistics
The UNCTAD Handbook of Statistics on-line provides time
series of economic data and development indicators, in some cases going back as
far as 1950; the Commodity Price Statistics Online Database; the UNCTAD-TRAINS
on the Internet (Trade Analysis and Information System) for trade control
measures as well as import flows by origin for over 130 countries; the Foreign
Direct Investment database (FDI).
OECD Statistics Portal
Large collection of datasets covering economics and demographics. Extractions are
freely available, full access requires subscription.
EUROSTAT
Detailed statistics on the EU and candidate countries, and various statistical
publications for sale.
Where's George?
Spatial tracking system for U.S. and Canadian dollars.
Eurobilltracker
Spatial tracking system for Euro banknotes.
Scientific collaboration data
ISI Web of Knowledge
Comprehensive source of information in the sciences, social sciences, arts, and
humanities. It encompasses several datasets, among which the following are
maybe the most noteworthy:
Journal Citation Reports. It allows one to evaluate and compare journals using
citation data drawn from over 7,500 scholarly and technical journals;
Web of Science. It consists of seven databases containing information gathered
from thousands of scholarly journals, books, book series, reports, conferences,
and more.
Google Scholar
Google Scholar is a search engine specialized in scholarly literature. It indexes
different sources (articles, books, abstracts, theses, etc.) from several
disciplines and sorts them according to number of citations, author and
journal impact factor.
Scholarometer
Scholarometer is a social tool to facilitate citation analysis and help
evaluate the impact of an author's publications. It works as a software plug-in
for the Firefox browser.
Scopus
Scopus is a very large abstract and citation database of research literature.
It is available only for registered users.
Living Science
Living Science is a real time global science observatory based on publications
submitted to arXiv.org. It covers real time (daily) submissions of publications
in areas as diverse as Physics, Astronomy, Computer Science, Mathematics and
Quantitative Biology. Currently, contents are dynamically updated each day.
Living Science is a powerful analysis tool to identify the magnitude and impact
of scientific work worldwide.
Social sciences
ICPSR of the University of Michigan
ICPSR offers more than 500,000 digital files containing social science
research data. Disciplines represented include political science, sociology,
demography, economics, history, gerontology, criminal justice, public health,
foreign policy, terrorism, health and medical care,
early education, education, racial and ethnic minorities,
psychology, law, substance abuse and mental health, and more.
UK Data Center of the University of Essex
The UK's largest collection of digital research data in the social sciences and
humanities.
Berkeley's UC DATA Archive
UC DATA's data holdings are primarily in the areas of Political, Social and
Health Sciences.
The Economic and Social Data Service (ESDS)
The Economic and Social Data Service (ESDS) is a national data service
providing access and support for an extensive range of key economic and social
data, both quantitative and qualitative, spanning many disciplines and themes.
It contains a map of additional datasets from several European countries.
CESSDA
Wide data collections including sociological surveys, election studies,
longitudinal studies, opinion polls, and census data. Among the materials are
international and European data such as the European Social Survey, the
Eurobarometers, and the International Social Survey Programme.
Gapminder Data
Gapminder is a popular technology and Web application for cross-visualisation
of trends in time series of data. It also opens an archive of multiple datasets
on diverse socio-economic indicators.
World Value Survey
The World Value Survey provides data about values and cultural changes in
societies all over the world.
Urban data
Global Urban Observatory database
The Global Urban Observatory (GUO) offers policy-oriented urban indicators,
statistics and other urban information.
Urban Observatory
U.S.-based datasets about wealth, innovation and crime across cities.
Traffic data
NGSIM
The Next Generation Simulation (NGSIM) program was initiated by the United
States Department of Transportation (US DOT). The program developed a core of
open behavioral algorithms in support of traffic simulation, and collected
high-quality primary traffic and trajectory data intended to support the research
and testing of the new algorithms.
Swiss Federal Roads Office FEDRO
The Swiss Federal Roads Office offers a comprehensive overview on traffic flows
in Switzerland. Data are collected by permanent automatic traffic counting
stations and complemented by regular manual checking since 1961.
TrafficData
The aim of the International Traffic Database (ITDb) project is to provide
traffic data to various groups (researchers, practitioners, public entities) in
a format according to their particular needs, ranging from raw measurement data
to statistical analysis. ITDb promotes a flexible traffic data provision format
based on user needs and standard habits.
Clearing House for Transport Data
The Clearing House for Transport Data in the German Aerospace Center is the
first point of contact for a quick overview of the available data. It is
targeted at both organizations who gather transport-relevant data and those who
wish to use the results of such research. The information offered includes the
preparation of detailed metadata on the data sets, as well as notes on possible
uses and sources.
Regiolab Delft
The regiolab-delft initiative started just after 2000 as a joint project led by
TU Delft in association with the Municipality of Delft, the TRAIL research
school, the Province of South Holland, the Ministry of Transport and several
industrial partners. The archived dataset consists of over 6 years of 1 minute
averaged speed and aggregate flow data from densely spaced inductive loops on
the freeway network in the province of South Holland and other data from
intersection controllers, license plate detection cameras and much more.
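One-minute loop data of this kind are often aggregated to coarser intervals before analysis. The sketch below is only an illustration under stated assumptions: the record layout (minute index, mean speed, vehicle count) and the function name `aggregate` are hypothetical and do not reflect the actual Regiolab Delft file format. It sums vehicle counts per window and computes a flow-weighted mean speed, which is one common choice among several.

```python
def aggregate(records, window=5):
    """Aggregate 1-minute loop records into fixed windows.

    records: iterable of (minute_index, mean_speed_kmh, vehicle_count).
    This layout is a hypothetical assumption for illustration.
    Returns a list of (window_start_minute, total_count, mean_speed),
    where mean_speed is weighted by vehicle count, or None when no
    vehicles were counted in the window.
    """
    buckets = {}
    for minute, speed, count in records:
        start = (minute // window) * window
        total, weighted = buckets.get(start, (0, 0.0))
        buckets[start] = (total + count, weighted + speed * count)

    result = []
    for start in sorted(buckets):
        total, weighted = buckets[start]
        mean_speed = weighted / total if total > 0 else None
        result.append((start, total, mean_speed))
    return result


# Made-up readings, not real measurements.
readings = [
    (0, 100.0, 10),
    (1, 90.0, 20),
    (2, 80.0, 10),
    (5, 60.0, 0),
    (6, 50.0, 4),
]
print(aggregate(readings))
```

Weighting by vehicle count keeps minutes with few vehicles from dominating the window average; an unweighted mean of the per-minute speeds would be a different, equally simple choice.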
RITA
The Research and Innovative Technology Administration (RITA) of the U.S. Department
of Transportation offers several datasets of traffic statistics covering
maritime, freight, airline, and passenger transport, among others.
ETH Travel Data Archive (ETHTDA)
The ETH Travel Data Archive (ETHTDA) is a virtual platform allowing end users
to browse the archived travel data over the Web and enabling simple statistical
analysis.
Metropolitan Travel Survey Archive
The Metropolitan Travel Survey Archive stores, preserves, and makes publicly
available, via the Internet, travel surveys conducted by metropolitan areas,
states and localities.
Infoblu
Infoblu is a private company providing real-time traffic monitoring services
for Italy. All services are available for a fee.
Open maps
Google Maps
World-famous map service. It offers several additional services such as: Street
View, user-uploaded content (photos, comments and ratings) and personalized
overlays through service APIs.
OpenStreetMap
OpenStreetMap (by UCL) is a free editable map of the whole world. OpenStreetMap
allows you to view, edit and use geographical data in a collaborative way from
anywhere on Earth.
Tracksource Brasil
Tracksource is a collaborative project aimed at creating and distributing
free maps of Brazil.
Logistics data
National Household Travel Survey
The National Household Travel Survey (NHTS) collects data on both long-distance
and local travel by the American public. The joint survey gathers trip-related
data such as mode of transportation, duration, distance and purpose of trip. It
also gathers demographic, geographic, and economic data for analysis purposes.
It is part of RITA.
Commodity Flow Survey
The Commodity Flow Survey (CFS) is the primary source of national and
state-level data on domestic freight shipments by American establishments in
mining, manufacturing, wholesale, auxiliaries, and selected retail industries.
Data are provided on the types, origins and destinations, values, weights,
modes of transport, distance shipped, and ton-miles of commodities shipped. It
is part of RITA and it is conducted every five years (last sampling in 2007).
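Ton-miles, one of the measures listed above, is the shipment weight in tons multiplied by the distance shipped in miles, summed over shipments. The following is a minimal sketch of that arithmetic; the `Shipment` record, its field names, and the sample values are hypothetical and are not taken from the CFS data or its file format.

```python
from dataclasses import dataclass


@dataclass
class Shipment:
    # Hypothetical record layout, for illustration only.
    commodity: str
    weight_tons: float
    distance_miles: float


def ton_miles(shipments):
    """Total ton-miles: sum of weight (tons) x distance (miles)."""
    return sum(s.weight_tons * s.distance_miles for s in shipments)


def ton_miles_by_commodity(shipments):
    """Ton-miles broken down by commodity type."""
    totals = {}
    for s in shipments:
        totals[s.commodity] = (
            totals.get(s.commodity, 0.0) + s.weight_tons * s.distance_miles
        )
    return totals


# Made-up shipments, not survey data.
sample = [
    Shipment("grain", 20.0, 150.0),
    Shipment("grain", 10.0, 300.0),
    Shipment("coal", 50.0, 80.0),
]
print(ton_miles(sample))
print(ton_miles_by_commodity(sample))
```

Note that the two grain shipments contribute the same ton-miles even though one is heavier and the other travels farther, which is why the measure is usually reported alongside weight and distance rather than instead of them.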
Climate data
Julich
Climate data from Julich Research Center.
Google.org
Google introduces its data-driven philanthropic projects, among which two
environmental satellite observatories:
the Earth Engine: for monitoring trends in world deforestation;
the Crisis Response: for monitoring the oil spill from the sunken Deepwater
Horizon platform.
Reality mining
Reality Mining
Behavioral data collected from 100 mobile phones over 9 months. Includes both
proximity and phone usage statistics. Two anonymized datasets available: single
user (MySQL) and global (Matlab).
Other open data initiatives
Data.gov
Wide collection of public US datasets available for research.
Data.gov.uk
Wide collection of public UK datasets available for research.
Digging Into Data
Launched by the National Science Foundation (NSF), it offers a collection of
diverse data repositories.
Guardian Data Blog
Data journalism initiative that posts public interest (primarily UK relevant)
datasets together with their analysis. A few collaborations with data
visualization artists are present as well.
Google Public Data
Google offers several large datasets on diverse world socio-economic indicators
and provides tools for easy visualization.