Sampling from Large Networks

Our research in this area focuses on making inferences from partially observed complex network structures. Most of our current work aims to understand the strengths and limitations of data sampled with link-tracing designs such as snowball sampling and random walk sampling.
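As a rough illustration of the two link-tracing designs named above, the following minimal Python sketch samples nodes from a small toy graph. The function names, the adjacency-list representation, and the toy graph are assumptions made for this example; the snowball variant is simplified to a breadth-first crawl that follows every neighbor, whereas practical designs often limit the number of referrals per wave.

```python
import random
from collections import deque


def snowball_sample(adj, seed, max_nodes):
    """Simplified snowball sampling: trace links breadth-first from a seed
    node until max_nodes distinct nodes have been collected."""
    visited = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(visited) < max_nodes:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                visited.append(nbr)
                queue.append(nbr)
                if len(visited) >= max_nodes:
                    break
    return visited


def random_walk_sample(adj, seed, steps, rng=None):
    """Simple random walk: at each step move to a uniformly chosen neighbor.
    Returns the visited sequence, which may contain repeated nodes."""
    rng = rng or random.Random()
    walk = [seed]
    current = seed
    for _ in range(steps):
        neighbors = adj[current]
        if not neighbors:  # dead end: stop the walk early
            break
        current = rng.choice(neighbors)
        walk.append(current)
    return walk


# Toy undirected graph as an adjacency list (made up for illustration).
adj = {
    0: [1, 2],
    1: [0, 2, 3],
    2: [0, 1],
    3: [1, 4],
    4: [3],
}

snowball = snowball_sample(adj, seed=0, max_nodes=4)
walk = random_walk_sample(adj, seed=0, steps=10, rng=random.Random(0))
print(snowball)
print(walk)
```

Both samplers only ever look at the neighbors of nodes already reached, which is why they can be used when no sampling frame (a complete list of nodes) is available.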

Ongoing Projects

Sampling from Activity Networks

Researchers have recently introduced the concept of an activity network to differentiate between strong and weak edges. An activity network is built on top of the underlying network, reflects the actual relationships between nodes, and has different statistical characteristics from the underlying network. Using the activity network can make network analysis more effective and remove some sources of error. In this project we study the sampling of complex networks with the activity network taken into account. The effect of this approach will be examined on real datasets by simulation or theoretical analysis.

People involved: Ali Khodadadi, Mostafa Salehi, Hamid R. Rabiee

___________________________________

Past Projects

Sampling from Complex Networks with High Community Structures

To analyze a large-scale complex network for which no sampling frame exists, such as a network of online social interactions, one has to rely on network data collected by link-tracing sampling methods. However, recent research indicates that community structure in a network (densely connected groups of nodes with only sparse connections between groups) leads to bias in network analysis. In this paper, we focus on this problem.

People involved: Mostafa Salehi, Arezo Rajabi, Hamid R. Rabiee

___________________________________

Sampling from Directed Large Complex Networks

Despite a considerable number of recent studies on the characterization of complex networks, little attention has been given to developing tools that can characterize directed networks. A network is directed when the links between its nodes may not be reciprocated. In this work we propose an importance sampling estimator to estimate network characteristics.
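The abstract does not spell out the estimator, so the sketch below is only a generic illustration of the importance sampling idea and should not be read as the project's method. It assumes that each sampled node was drawn with probability proportional to a known positive weight, and it re-weights each observation by the inverse of that weight (a self-normalized estimator). The function name, the toy out-degrees, and the weights are all hypothetical; how to obtain suitable weights for a directed network is not covered here.

```python
def importance_weighted_mean(samples, f, sampling_weight):
    """Self-normalized importance sampling estimate of the uniform node
    average of f, given nodes drawn (with replacement) with probability
    proportional to sampling_weight(node)."""
    if not samples:
        raise ValueError("need at least one sample")
    numerator = 0.0
    denominator = 0.0
    for node in samples:
        w = sampling_weight(node)
        if w <= 0:
            raise ValueError("sampling weights must be positive")
        numerator += f(node) / w      # down-weight over-sampled nodes
        denominator += 1.0 / w
    return numerator / denominator


# Toy data (made up): out-degree per node and an assumed sampling weight.
out_degree = {"a": 1, "b": 2, "c": 2}
weight = {"a": 1.0, "b": 2.0, "c": 2.0}

# A sample in which "b" and "c" appear twice as often as "a",
# matching the assumed weights.
sample = ["a", "b", "c", "b", "c"]

naive = sum(out_degree[n] for n in sample) / len(sample)
estimate = importance_weighted_mean(sample, out_degree.get, weight.get)
print(naive, estimate)
```

In this toy case the plain sample mean is pulled toward the over-sampled nodes, while the re-weighted estimate recovers the uniform average of the three out-degrees.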

People involved: Mostafa Salehi, Hamid R. Rabiee

___________________________________

Model-Based Data Gathering for Online Social Network Analysis

Online social networks (OSNs) with hundreds of millions of members, such as Twitter and Facebook, are considered powerful tools for information flow. Studying and analyzing various aspects of these networks is therefore important and has recently attracted many researchers. Because these OSNs are so large, with many users and relations among them, collecting complete data is not feasible. Network analysis is therefore based on incomplete data collected by link-tracing sampling methods. A recently proposed approach to network data collection assumes that the network graph follows a specific growth model and presents an estimator based on that model. In this project, we aim to collect data from OSNs based on this approach. For this purpose we will also consider sparse models. The effect of various graph traversal methods on the structural properties of the network will be examined by simulation or theoretical approaches on real datasets.

People involved: Nasim Nabavi, Mostafa Salehi, Hamid R. Rabiee

___________________________________

Diffusion-Aware Sampling and Estimation in Information Diffusion Networks

Partially observed data collected by sampling methods are often studied to obtain the characteristics of information diffusion networks. However, these methods usually do not consider the behavior of the diffusion process. In this paper, we propose a novel two-step (sampling/estimation) measurement framework that utilizes the characteristics of the diffusion process. To this end, we propose a link-tracing based sampling design which uses infection times as local information, without any knowledge of the latent structure of the diffusion network. To correct the bias of the sampled data, we introduce three estimators for different categories: link-based, node-based, and cascade-based. To the best of our knowledge, this is the first attempt to introduce a complete measurement framework for diffusion networks.

People involved: Motahareh Eslami Mehdiabadi, Mostafa Salehi, Hamid R. Rabiee

___________________________________

The Impact of Sampling Methods on Cooperativity Analysis in Complex Networks

Complex network science is the science of modeling real-world systems in order to analyze them better. Cooperation between selfish individuals is one of the most interesting collective phenomena in complex networks. Many researchers have considered game theory a powerful tool in this field, and games such as the Prisoner’s Dilemma (PD) have been used as metaphors to investigate the evolution of cooperation. However, running these algorithms on large networks is very time-consuming. A natural approach is therefore to simplify the systems by sampling from the networks and reducing their size. In this research we address the impact of sampling methods on the analysis of cooperativity in a number of real-world networks, including an online social network, Net-Science co-authorship, email communication, and yeast protein interactions, as well as some networks with small-world and scale-free properties. To this end, we consider the PD game and measure the difference between cooperativity in the original and sampled networks with the Kolmogorov-Smirnov (KS) test. We found that sampling has an undesirable impact on the cooperation analysis of a network, especially in real-world social networks.
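The comparison step can be sketched with the two-sample KS statistic, which is the largest gap between the empirical distribution functions of two samples. The code below is a minimal pure-Python sketch that computes only the statistic, not a p-value; the function name and the cooperation values are made up for illustration and are not results from this project.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the two empirical cumulative distribution functions."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x (handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


# Hypothetical per-node cooperation levels in an original network and in
# a sampled subnetwork (toy numbers, not measured data).
original_coop = [0.1, 0.4, 0.4, 0.7, 0.9]
sampled_coop = [0.2, 0.4, 0.8]

d = ks_statistic(original_coop, sampled_coop)
print(d)
```

A value near 0 means the two distributions look alike, while a value near 1 means the sampled network behaves very differently from the original one.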

People involved: Mostafa Salehi, Fouzhan Forotan, Hamid R. Rabiee

___________________________________

Sampling Techniques for Characterizing Twitter

The way people communicate has changed, and it is now possible to communicate through online social networks. Twitter is an online social network with a microblogging service that enables its users to post status updates via its website, cell phones, or other applications. These posts are visible on the user’s page and to the user’s followers. The network thus lets people communicate and freely distribute information. At the time of writing, Twitter has about 25 million users around the world and is growing. In this project, we extract several datasets from Twitter using several sampling algorithms and study network characteristics on them, such as degree distribution and diameter.
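The two characteristics named above can be computed on a sampled graph along the following lines. This is a minimal sketch that assumes an undirected graph stored as an adjacency list; the function names and the toy graph are hypothetical, and the all-pairs breadth-first search used for the diameter is only practical for small samples.

```python
from collections import Counter, deque


def degree_distribution(adj):
    """Fraction of nodes having each degree, as a dict degree -> fraction."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def diameter(adj):
    """Longest shortest-path length found by running a breadth-first search
    from every node. For a disconnected graph this is the largest distance
    within any single component."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


# Toy sampled graph: a path 0 - 1 - 2 - 3 (made up for illustration).
sample_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

dist = degree_distribution(sample_adj)
diam = diameter(sample_adj)
print(dist, diam)
```

Running the same two functions on samples produced by different algorithms gives a simple way to see how each sampling method distorts these characteristics.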

People involved: Mostafa Salehi, Nasim Nabavi, Shayan Pooya, Hamid R. Rabiee


Complex Networks | Digital Media Lab