Posts

Image
Most Probable Densest Subgraphs A summary of the IEEE International Conference on Data Engineering (ICDE) 2023 research  paper  by Arkaprava Saha, Xiangyu Ke, Arijit Khan, and Cheng Long Background:  The discovery of dense subgraphs has attracted extensive attention in the data management community [1], [2], [3], [4], [5].They may correspond to communities [6], filter bubbles and echo chambers [7], [8] in social networks, brain regions responding to stimuli [9] or related to diseases [10], and commercial value motifs in financial domains [11]. They also have wide applications in graph compression and visualization [12], [13], [14], indexing for reachability and distance queries [15], [16], and social piggybacking [17]. Densest subgraphs usually maximize some notion of density in a given graph, e.g., the edge density [1], defined as the ratio of the number of induced edges to the number of nodes in a subgraph. Although there are an exponential number of subgraphs, a densest subgraph
Image
  Voting-based Opinion Maximization A summary of the IEEE International Conference on Data Engineering (ICDE) 2023 research paper by Arkaprava Saha, Xiangyu Ke, Arijit Khan, and Laks V. S. Lakshmanan Background: We investigate the novel problem of voting-based opinion maximization in a social network. In the presence of competing campaigns, we find a given number of seed nodes for a target campaigner that maximize a voting-based score for the target campaigner at a given time-horizon. Our work bridges two different paradigms: (1) seed selection for opinion formation and diffusion till a given finite time-horizon, and (2) voting-based winning criteria (e.g., plurality, Copeland) with multiple campaigners. The classic influence maximization (IM) problem [1, 2] identifies the top-k seed users in a social network to maximize the expected number of influenced users in the network, starting from those seed nodes and following an influence diffusion model, e.g., independent cascade (IC
Image
Graph Analysis of the Ethereum Blockchain Data: A Survey of Datasets, Methods, and Future Work A summary of the IEEE Blockchain  2022 research  paper   by  Arijit Khan. Objective: Ethereum [1], currently the most actively-used and the second-largest blockchain platform [2], consists of a heterogeneous ecosystem, cohabited by human users, smart contracts (autonomous agents), ether (native cryptocurrency), tokens (digital assets), dApps (decentralized applications), and DeFi (decentralized finance). These key actors in the Ethereum interact with each other via transactions and contract calls. Given the highly connected structure, graph-based modeling is an optimal tool to analyze the data stored in Ethereum blockchain. Data stored in a public blockchain such as in Ethereum can be considered as big data -- Ethereum archive nodes that store a complete snapshot of the Ethereum blockchain, including all the transaction records, take up to 4TB of space [3], thus data analytic methods can
Image
Aggregate Queries on Knowledge Graphs: Fast Approximation with Semantic-aware Sampling A summary of the IEEE ICDE  2022 research  paper   by  Yuxiang Wang, Arijit Khan, Xiaoliang Xu, Jiahui Jin, Qifan Hong, and Tao Fu . Background: Knowledge graphs (KGs), such as DBpedia [1], YAGO [2], Freebase [3], and NELL [4], manage large-scale and real-world facts as big graphs in a schema-flexible manner. The same kind of information can be represented as diverse substructures [5], [6]. This schema-flexible nature should be carefully considered in the study of KG querying. Consider the factoid query [7]: e.g., “Find all cars produced in Germany” (Q117 from QALD-4 benchmark [8]). Given the KG in Figure 1(a), we expect answers as all entities having type Automobile that satisfy the semantic relation product to the specific entity Germany, e.g., Audi TT (u10), BMW 320 (u6), etc. Notice that these correct answers are linked with Germany in structurally different ways in Figure 1(a), for instance, u
Image
  Graph Classification with Minimum DFS Code:  Improving Graph Neural Network Expressivity A summary of the IEEE BigData  2021 research  paper   by  Jhalak Gupta and Arijit Khan  [presented in Machine Learning on Big Data (MLBD 2021), special session of IEEE BigData 2021]. Background: Graph Classification Given a set of graphs with different structures and sizes, the graph classification problem predicts the class labels of unseen graphs [1, 2, 3]. Developing machine learning tools for classifying graphs can be found in cheminformatics [1, 4] and bioinformatics [6], malware detection [7], telecommunication networks, internet-of-things [8], trajectories and social networks [9]. This is challenging because network data contain graphs with different numbers of nodes and edges, and a generic node order is often not available. Graphs do not have regular grid structures, since the neighborhood size of each node differs. The lack of ordered vector representation complicates machine learning