Posts

On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage

Image
On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage

A summary of the USENIX ATC 2018 research paper by Arijit Khan, Gustavo Segovia, and Donald Kossmann

Background: Distributed Graph Querying & Graph Partitioning
Graphs with millions of nodes and billions of edges are ubiquitous to represent highly interconnected structures including the World Wide Web, social networks, knowledge graphs, genome and scientific databases, medical and government records. To support online search and query services (possibly from many clients) with low latency and high throughput, data centers and cloud operators consider scale-out solutions, in which the graph and its data are partitioned horizontally across cheap commodity servers. We assume that the graph topology and the data associated with nodes and edges are co-located, since they are often accessed together. Keeping with the modern database trends to support low-latency operations, we target a fully in-memory system, and…

Jointly Finding Seeds and Relevant Tags in Social Networks

Image
Jointly Finding Seeds and Relevant Tags in Social Networks for Targeted Influence Maximization
A summary of the SIGMOD 2018 research paper by Xiangyu KE, Arijit KHAN and Gao CONG
[Influence Maximization]
Nowadays online social networks have become an important medium for campaigners to promote their products. It is quite normal that a social network has millions of users, and it is impossible for a campaigner to directly reach out to all of them via advertisements, free samples, or discounted prices. If we see our friends' tweets praising some products, we probably also adopt them. Therefore, it is important to identify a small set of users (with cardinality k) who can influence as many other users as possible in the social network. It is known as the Influence Maximization problem, and has been widely studied [1, 2]. Moreover, a campaigner can further specify her target customers, either explicitly, or via some constraints (e.g., people in age group 20~30), which is referred to as …
Image
Probabilistic Entity Resolution with Imperfect Crowd: PERC
A SUMMARY OF THE CIKM 2017 RESEARCH PAPER and VLDB 2018 DEMOnstration BY VIJAYA KRISHNA YALAVARTHI, XIANGYU KE, AND ARIJIT KHAN
Introduction: In real world scenarios, we encounter many situations where two objects may not look exactly identical, but they represent the same entity.
Example 1: 1. Vijaya Krishna Yalavarthi  2. Yalavarthi Vijaya Krishna  3. Xiangyu Ke  4. Arijit Khan
In the above table, among the four names given, first two names represent the same person - Vijaya Krishna Yalavarthi, though they are not written in an identical way.
Example 2:


Among the three images given, A and B represent Taj Mahal (India), whereas C represents Merlion (Singapore). Though A and B do not look identical, they represent the same entity - Taj Mahal.
What is entity resolution? Entity resolution (ER) [2] is the task of linking and clustering of all records that represent the same real-world entity. ER, also known as record linkage or deduplication…