Posts

Maximizing Contrasting Opinions in Signed Social Networks

Maximizing Contrasting Opinions in Signed Social Networks


A summary of the IEEE BigData 2019 research paper by Kaivalya Rawal and Arijit Khan.
Background: A central characteristic of social networks is that it facilitates rapid dissemination of information among large groups of individuals [1]. Online social networks, such as Facebook, Twitter, LinkedIn, Flickr, and Digg are used for spreading ideas and messages. Users’ behaviors and opinions are highly affected by their friends in social networks, which is defined as the social influence. Motivated by various real-world applications, e.g., viral marketing [2], social and political campaigning [3], social influence studies have attracted extensive research attention. The classic influence maximization problem [4], [2] identifies the top-k seed users in a social network such that the expected number of influenced users in the network, starting from those seeds and following an influence diffusion model, is maximized. The budget k on the…

Distance-generalized Core Decomposition

Image
Distance-generalized Core Decomposition

A summary of the SIGMOD 2019 research paper by Francesco Bonchi, Arijit Khan, and Lorenzo Severini
Background: Extracting dense structures from large graphs has emerged as a key graph-mining primitive in a variety of application scenarios, ranging from web mining, to biology, and finance. Many different definitions of dense subgraphs have been proposed (e.g., cliques, n-cliques, n-clans, k-plexes, f-groups, n-clubs, lambda sets), but most of them are NP-hard or at least quadratic. In this respect, the concept of core decomposition is particularly appealing because (i) it can be computed in linear time, and (ii) it is related to many of the various definitions of a dense subgraph and it can be used to speed-up or approximate their computation. The k-core of a graph is defined as a maximal subgraph in which every vertex is connected to at least k other vertices within that subgraph. The set of all k-cores of a graph, for each k, forms its core decom…

In-Depth Comparison of st Reliability Algorithms over Uncertain Graphs

Image
An In-Depth Comparison of s-t Reliability Algorithms over Uncertain Graphs

A summary of the PVLDB 2019 research paper by Xiangyu Ke, Arijit Khan, and Leroy Lim Hong Quan

Uncertain, or probabilistic, graphs have been increasingly used to represent noisy linked data in many emerging applications, and have recently attracted the attention of the database research community [7]. A fundamental problem on uncertain graphs is the s-t reliability, which measures the probability that a target node t is reachable from a source node s in a probabilistic (or uncertain) graph, i.e., a graph where every edge is assigned a probability of existence.  This s - t reliability estimation has been used in many applications such as measuring the quality of connections between two terminals in a sensor network, finding other proteins that are highly probable to be connected with a specific protein in a protein-protein interaction (PPI) network, identifying highly reliable peers containing some file to transfe…
Image
Notes on Expressibility of Vertex Centric Graph Processing Paradigm

A summary of the IEEE BigData 2018 research paper by Siyuan Liu and Arijit Khan

We study the vertex-centric (VC) paradigm for distributed graph processing, popularized by Google’s Pregel system [1]. In Pregel, which was inspired by the Bulk Synchronous Parallel (BSP) model [2], graph algorithms are expressed as a sequence of iterations called supersteps (Figure 1). Each superstep is an atomic unit of parallel computation. During a superstep, Pregel executes a user-defined function for each vertex in parallel. The user-defined function specifies the operation at a single vertex v and at a single superstep S. The supersteps are globally synchronous among all vertices, and messages are usually sent along the outgoing edges from each vertex. In 2012, Yahoo! launched the Apache Giraph [3] as an open-source project, which clones the concepts of Pregel. Since then, there were several attempts to implement many graph algorithms…

On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage

Image
On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage

A summary of the USENIX ATC 2018 research paper by Arijit Khan, Gustavo Segovia, and Donald Kossmann

Background: Distributed Graph Querying & Graph Partitioning
Graphs with millions of nodes and billions of edges are ubiquitous to represent highly interconnected structures including the World Wide Web, social networks, knowledge graphs, genome and scientific databases, medical and government records. To support online search and query services (possibly from many clients) with low latency and high throughput, data centers and cloud operators consider scale-out solutions, in which the graph and its data are partitioned horizontally across cheap commodity servers. We assume that the graph topology and the data associated with nodes and edges are co-located, since they are often accessed together. Keeping with the modern database trends to support low-latency operations, we target a fully in-memory system, and…