Here are some data sets in this area often used by researchers:
This page contains links to some network data sets: Les Miserables, Word adjacencies, American College football, Dolphin social network, Political blogs, Books about US politics, Neural network, Power grid, Condensed matter collaborations 1999, Condensed matter collaborations 2003, Condensed matter collaborations 2005, Astrophysics collaborations, High-energy theory collaborations, Coauthorships in network science, Internet
This page contains links to some online social network data sets: Flickr, LiveJournal, Orkut, YouTube
PPLive is a P2P IPTV streaming system that stands out for its heterogeneous channels and growing popularity. This project gathers data about the PPLive overlay by crawling the live PPLive network. The traces contain data on node degree in the PPLive overlay, overlay structure, channel population size, and node session lengths per overlay.
This resource contains traces of BitTorrent systems. It provides data on many thousands of torrents, including the hosts (by IP) observed during the entire torrent lifecycle. The IP is the anonymized IP address of the observed peer; the messages of the downloaders are also captured.
A collection of large network datasets: Social networks, Communication networks, Citation networks, Collaboration networks, Web graphs, Blog and Memetracker graphs, Amazon networks, Internet networks, Road networks, Autonomous systems, Signed networks
This anonymized data set consists of the voting records for 3553 stories promoted to the front page over a period of a month in 2009. The voting record for each story contains the ID of the voter and the time stamp of the vote. In addition, data about the friendship links of voters was collected from Digg.
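As a rough illustration of how voting records and friendship links can be combined, the sketch below counts, for each story, how many votes came from users with at least one friend among the earlier voters. The tuple layouts `(story_id, voter_id, timestamp)` and `(user_a, user_b)`, the function name, and the treatment of friendships as undirected are assumptions made for this example; they are not the documented format of the Digg files.

```python
from collections import defaultdict


def in_network_votes(votes, friendships):
    """Return {story_id: (in_network_votes, total_votes)}.

    votes:       iterable of (story_id, voter_id, timestamp)  -- assumed layout
    friendships: iterable of (user_a, user_b)                 -- assumed undirected
    """
    # Build an undirected friend lookup.
    friends = defaultdict(set)
    for a, b in friendships:
        friends[a].add(b)
        friends[b].add(a)

    # Group votes by story so each story can be replayed in time order.
    by_story = defaultdict(list)
    for story, voter, timestamp in votes:
        by_story[story].append((timestamp, voter))

    result = {}
    for story, records in by_story.items():
        records.sort()  # chronological order
        earlier_voters = set()
        in_network = 0
        for _, voter in records:
            # A vote is "in network" if a friend of the voter voted earlier.
            if friends[voter] & earlier_voters:
                in_network += 1
            earlier_voters.add(voter)
        result[story] = (in_network, len(records))
    return result


if __name__ == "__main__":
    demo_votes = [("s1", "u1", 10), ("s1", "u2", 20), ("s1", "u3", 30)]
    demo_friends = [("u1", "u2")]
    print(in_network_votes(demo_votes, demo_friends))
```

If the friendship links in the actual files are directed, the lookup would need to keep only one direction.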
Wrappers facilitate access to Web-based information sources by providing a uniform querying and data extraction capability. When a wrapper stops working due to changes in the layout of web pages, our task is to automatically reinduce the wrapper. The data sets used for experiments in our JAIR 2003 paper contain web pages downloaded from two dozen sources over a period of a year.
An ideal data set for learning tasks with rich social networking information. Especially suitable for prediction and community detection tasks with ground truth in place to verify your hypotheses. It has link information (i.e., friends), content information (e.g., tags, posts), and label information (i.e., user interests).
Flickr: a photo sharing dataset
It includes more than 35,000 users, with their joined groups, tags. It also includes the friendship and the commentship (i.e., who comments on whose photos) among the set of users. The joined groups can be treated as class labels in classification tasks, or ground truth for community detection tasks.
TWITTER’S SOCIAL GRAPH
We have been sharing a social graph (follow relationships) of Twitter as of 2009. It covers 41.7M users and 1.47B relationships.
You can download the file via torrent or HTTP (if you cannot use torrent).
We have been sharing metadata of videos uploaded to YouTube in 2006. Metadata for over 2M videos is available.
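A graph with over a billion follow relationships is usually processed as a stream instead of being loaded whole. The sketch below counts followers and followees per user from an iterable of text lines. It assumes each line holds two whitespace-separated IDs in the order `user follower`; that layout and the function name are assumptions for illustration, so check the dataset's own documentation before relying on them.

```python
from collections import Counter


def follower_counts(lines):
    """Stream 'user follower' lines and return (followers, following) counters.

    The two-column 'user follower' layout is an assumption for this sketch.
    """
    followers = Counter()   # user -> number of accounts following that user
    following = Counter()   # user -> number of accounts that user follows
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        user, follower = parts
        followers[user] += 1
        following[follower] += 1
    return followers, following


if __name__ == "__main__":
    sample = ["1\t2", "1\t3", "2\t3"]
    followers, following = follower_counts(sample)
    print(followers.most_common(1))
```

Because the function accepts any iterable of lines, an open file handle can be passed directly so that only the counters are held in memory.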
COMMUNITY IDENTIFICATION ALGORITHMS & NETWORKS
We have been sharing pointers to existing community identification algorithms that maximize modularity. We also provide pointers to well-known network data, including Karate, E. coli, WWW, Flickr, Orkut, etc.
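For readers unfamiliar with the objective these algorithms optimize, the sketch below evaluates the standard (Newman-Girvan) modularity of a given partition, Q = sum over communities c of [ L_c / m - (d_c / 2m)^2 ], where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of nodes in c. It assumes an undirected, unweighted graph given as a list of distinct edges; the function name and input layout are illustrative and not taken from any of the linked algorithms.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted graph.

    edges:     list of (u, v) pairs, each undirected edge listed once
    community: dict mapping every node to its community label
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")

    internal = defaultdict(int)     # L_c: edges with both ends in c
    degree_sum = defaultdict(int)   # d_c: sum of degrees of nodes in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1

    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(modularity(edges, split))
```

Placing all nodes in a single community always gives Q = 0, which makes a convenient sanity check when comparing partitions.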
Flickr personal taxonomies
This anonymized data set contains personal taxonomies constructed by 7,000+ Flickr users to organize their photos, as well as the tags they associated with the photos. Personal taxonomies are shallow hierarchies (trees) of collections, each containing sets (aka photo albums) and other collections.
Internet Topology: AS Graphs
Social network data set (datamob)
University of Zurich datasets: Mobile Web Standards 2011-05 and The Pirate Bay 2008-12
Distributed Artificial Intelligence Laboratory (DAI-Labor)
Jazz musicians network, PGP: Alex Arenas datasets
Center for Advanced Study of Communities and Information CASCI
Networking Group Wiki Page Athina Markopoulou datasets
Facebook: Emilio Ferrara