Neighborhood-based Hypergraph Core Decomposition

A summary of the PVLDB 2023 research paper by Naheed Anjum Arafat, Arijit Khan, Arpit Kumar Rai, and Bishwamittra Ghosh

Background: Many real-world relations consist of polyadic entities, e.g., relations between individuals in co-authorships [1], legislators in parliamentary voting [2], items in e-shopping carts [3], proteins in protein complexes, and metabolites in a metabolic process [4, 5]. In these scenarios, the hypergraph is a natural data model where an edge may connect more than two entities. Recently, there has been a growing interest in hypergraph data management [6, 7, 8, 9, 10, 11].

We study neighborhood-based hypergraph core decomposition (Figure 1(a)): a novel way of decomposing hypergraphs into hierarchical neighborhood-cohesive subhypergraphs. It decomposes a hypergraph into nested, strongly-induced maximal subhypergraphs such that all the nodes in every subhypergraph have at least a certain number of neighbors in that subhypergraph. Being strongly-induced means that a hyperedge is only present in a subhypergraph if and only if all its constituent nodes are present in that subhypergraph.

Example. The hypergraph 𝐻 in Figure 1(a) is constructed based on four events (𝑇, 𝑅, 𝑃1, 𝑃2) from the Nashville Meetup Network dataset [12]. Each hyperedge denotes an event, nodes in a hyperedge are participants of that event. Two nodes are neighbors if they co-participate in an event. We notice that Hall has 4 neighbors: Popiden, Hagler, Wallace, and Matson. Similarly, Matson, Hagler, Jeannie, Sumner, Annie, Newton, Wallace, and Popiden have 13, 10, 9, 9, 6, 6, 6, and 6 neighbors, respectively. As every node has ≥ 4 neighbors, the neighborhood based 4-core, denoted by 𝐻[𝑉4], is the hypergraph 𝐻 itself. The neighborhood based 6-core is the subhypergraph 𝐻[𝑉6] = {𝑇, 𝑅} because participants of 𝑇 (e.g., Newton, Annie) and 𝑅 (e.g., Matson) respectively have 6 and 7 neighbors in 𝐻[𝑉6]. Finally, the neighborhood based 7-core is the subhypergraph {𝑇}.

Earlier, core decomposition of graphs is shown to be an important tool for solving many graph data management problems, e.g., community detection [13], densest subgraph discovery [14], identifying influential nodes [22], and network visualization [15, 16].

Why Existing Approaches Do Not Work? Alternative approaches to decomposing hypergraphs, e.g., reduction to clique or bipartite graphs, are not meaningful in certain applications, the later also results in inefficient decomposition, while existing degree-based hypergraph decomposition does not distinguish nodes with different neighborhood sizes.

Existing Approach-1. Transform the hypergraph into other objects (e.g., a graph), apply existing decomposition approaches [17, 18] on that object, and then project the decomposition back to the hypergraph. For instance, a hypergraph is transformed to a clique graph by replacing the hyperedges with cliques, and classical graph core decomposition [17] is applied. A hypergraph can also be transformed into a bipartite graph by representing the hyperedges as nodes in the second partition and creating an edge between two cross-partition nodes if the hyperedge in the second partition contains a node in the first partition. Finally, distance-2 core decomposition [18, 19] is applied. In distance-2 core-decomposition, nodes in π‘˜-core have at least π‘˜ 2-hop neighbors in the subgraph. First, the decomposition they yield may be different from that of ours: 𝑐ore(π‘ƒπ‘œπ‘π‘–π‘‘π‘’π‘›) = 5 in both clique graph and dist-2 bipartite graph decompositions, whereas Popiden has core number 4 in our decomposition. Clique graph and bipartite graph decompositions fail to identify that the relation Popiden-Hagler should not exist without the existence of event 𝑃2, since 𝑃2 is the only event where they co-participate. Second, the resulting decomposition maybe unreasonable: low-importance events 𝑃1 and 𝑃2 by the same interest group are placed in different cores causing difficulty in separating less-important events from more-important ones.  Third, such transformations inflate the problem size. A hypergraph in [20] with 2M nodes and 15M hyperedges is converted to a bipartite graph with 17M nodes and 1B edges. A π‘˜-uniform hypergraph with π‘š hyperedges causes its clique graph to have O(π‘šπ‘˜²) edges. The bipartite graph representation also requires distance-2 core decomposition [18, 19], which is more expensive due to inflated problem-size.

Existing Approach-2. The degree 𝑑(𝑣) of a node 𝑣 in hypergraph 𝐻 is the number of hyperedges incident on 𝑣, i.e., 𝑑(𝑣) = |{𝑒∈𝐸|𝑣∈𝑒}|. Sun et al. [21] define the deg-π‘˜-core of a hypergraph: In degree-based core decomposition, every node in the π‘˜-core has degree at least π‘˜ in that core. It does not consider hyperedge sizes. As a result, nodes in the same core may have vastly different neighborhood sizes. For instance, Hall and Matson have 4 and 13 neighbors, respectively, yet they belong to the same 1-core in the degree-based decomposition (Figure1(b)). There are applications, e.g., propagation of contagions in epidemiology, diffusion of information in viral marketing, where it is desirable to capture such differences because nodes with the same number of neighbors in a subhypergraph are known to exhibit similar diffusion characteristics [23]. Indeed, from a practical viewpoint, if the infectious diseases control authority is looking for some key events that cause higher infection spread (e.g., meeting with 6 neighbors per participant) and hence such events need to be intervened, they are events {𝑇, 𝑅} (nbr-6-core) as they cause meeting with at least 6 neighbors per participant. Such distinction across various events and participants is not possible according to degree-based core decomposition of 𝐻. Moreover, the neighborhood-based decomposition is also logical: innermost core contains 𝑇, which is a tech event organized by a relatively popular group with 918 members, followed by the second innermost core containing {𝑅, 𝑇}. 𝑅 is an event organized by a secular organization with 525 members which often discusses religion matters. The outermost core contains two events held by a niche social activist group with only 185 members.

Challenges. A hyperedge can relate to more than two nodes, and a pair of nodes maybe related by multiple, distinct hyperedges. Thus, a trivial adaptation of core-decomposition algorithms for graphs to hypergraphs is difficult. In the classic peeling algorithm for graph core decomposition, when a node is removed, the degree of its neighbors is decreased by 1: this allows important optimizations which makes the peeling algorithm linear-time and efficient [17, 24, 25]. However, in a neighborhood-based hypergraph core decomposition, deleting a node may reduce the neighborhood size of its neighboring node by more than 1. Hence, to recompute the number of neighbors of a deleted node’s neighbor, one must construct the residual hypergraph after deletion, which makes the decomposition polynomial-time and expensive. Furthermore, the existing lower bound that makes graph core decomposition more efficient [18] does not work for neighborhood-based hypergraph core decomposition. In the following, we also discuss the challenges associated with adopting the local approach [26], one of the most efficient methods for graph core decomposition to hypergraphs.

In a local approach, a core-number estimate is updated iteratively [27, 26] or in a distributed manner [28] for every node in a graph. The initial value of a node’s core-number estimate is an upper bound of it score-number. In subsequent rounds, this estimate is iteratively decreased based on estimates of neighboring nodes. [26] uses β„Ž-index [29] for such update. They have shown that the following invariant must hold: every node with core-number π‘˜ has β„Ž-index at least π‘˜, and the subgraph induced by nodes with β„Ž-index at least π‘˜ has at least π‘˜ neighbors per node in that subgraph. The former holds but the later may not hold in a hypergraph, because the subhypergraph induced by nodes with β„Ž-index π‘˜ may not include hyperedges that partially contain other nodes. Due to those ‘missing’ hyperedges, the number of neighbors of some nodes in that subhypergraph may drop below π‘˜, violating the coreness condition. Thus, a local approach used for computing the π‘˜-core [27, 26, 28] or(π‘˜β„Ž)-core [19] in graphs results in incorrect neighborhood-based hypergraph cores.

Algorithms: We make three main contributions.

Exact algorithms. We propose three exact algorithms with their correctness and time-complexity analyses. Two of them, Peel and its enhancement E-Peel, adopt classic peeling [17], incurring global changes to the hypergraph. For E-Peel, we derive novel lower-bound on core-number that eliminates many redundant neighborhood re-computations. Our third algorithm, Local-core only makes node level local computations. Even though the existing local method [26, 28] fails to correctly find neighborhood-based core-numbers in a hypergraph, our algorithm Local-core applies a novel Core-correction procedure, ensuring correct core-numbers.

Optimization and parallelization strategies. We propose four optimization strategies to improve the efficiency of Local-core. Compressed representations for hypergraph and the family of optimizations for efficient Core-correction are novel to core-decomposition literature. The other optimizations, though inspired from graph literature, have not been adopted in earlier hypergraph-related works. We also propose parallelization of Local-core for the shared-memory programming paradigm.

(Neighborhood, degree)-core. We define a more general hypergraph-core model, (neighborhood, degree)-core by considering both neighborhood and degree constraints, propose its decomposition algorithm Local-core+Peel, and demonstrate its superiority in diffusion spread.

Experiments and applications: Our empirical evaluation on real and synthetic hypergraphs shows that the proposed algorithms are effective, efficient, and practical. Our OpenMP parallel implementation Local-core(P) decomposes hypergraph with 27M nodes, 17M hyperedges in 91 seconds.

We show our decomposition to be more effective in disrupting diffusion than other decompositions. Our greedy algorithm proposed for volume-densest subhypergraph recovery achieves (π‘‘π‘π‘Žπ‘–π‘Ÿ(π‘‘π‘π‘Žπ‘Ÿπ‘‘−2)+2)-approximation, where hyperedge cardinality (#nodes in that hyperedge) and node-pair co-occurrence (#hyperedges containing that pair) are at most π‘‘π‘π‘Žπ‘Ÿπ‘‘ and π‘‘π‘π‘Žπ‘–π‘Ÿ, respectively. If the hypergraph is a graph (π‘‘π‘π‘Žπ‘Ÿπ‘‘=2), our result generalizes Charikar’s 2-approximation guarantee for densest subgraph discovery [14]. Our volume-densest subhypergraphs capture differently important meetup events compared to degree and clique graph decomposition-based densest subhypergraphs.

For more information, click here for our paper.

References:

[1] Yi Han, Bin Zhou, Jian Pei, and Yan Jia. 2009. Understanding importance of collaborations in co-authorship networks: a supportiveness analysis approach. In SIAM International Conferenceon Data Mining (SDM). 1112–1123.

[2] Austin R. Benson, Rediet Abebe, Michael T. Schaub, Ali Jadbabaie, and Jon Kleinberg. 2018. Simplicial closure and higher-order link prediction. Proceedings of the National Academy of Sciences 115, 48(2018), E11221–E11230.

[3] Xin Xia, Hongzhi Yin, Junliang Yu, Qinyong Wang, Lizhen Cui, and Xiangliang Zhang. 2021. Self-supervised hypergraph convolutional networks for session based recommendation. In AAAI Conference on Artificial Intelligence. 45034511.

[4] Christoph Flamm, BΓ€rbel M. R. Stadler, and Peter F. Stadler. 2015. Generalized topologies: hypergraphs, chemical reactions, and biological evolution. In Advances in Mathematical Chemistry and Applications. Bentham Science, 300–328.

[5] Thomas Gaudelet, NoΓ«l Malod-Dognin, and Natasa Przulj. 2018. Higher-order molecular organization as a source of biological function. Bioinformatics 34, 17 (2018), i944–i953.

[6] Pit Fender and Guido Moerkotte. 2013. Counterstrike: generic top-down join enumeration for hypergraphs. Proc. VLDB Endow. 6, 14 (2013), 1822–1833.

[7] Igor Kabiljo, Brian Karrer, Mayank Pundir, Sergey Pupyrev, Alon Shalita, Yaroslav Akhremtsev, and Alessandro Presta. 2017. Social hash partitioner: a scalable distributed hypergraph partitioner. Proc. VLDB Endow. 10, 11 (2017), 1418–1429.

[8] Geon Lee, Jihoon Ko, and Kijung Shin. 2020. Hypergraph motifs: concepts, algorithms, and discoveries. Proc. VLDB Endow. 13, 11 (2020), 2256–2269.

[9] Julian Shun. 2020. Practical parallel hypergraph algorithms. In The ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 232–249.

[10] Justin Sybrandt, Ruslan Shaydulin, and Ilya Safro. 2022. Hypergraph partitioning with embeddings. IEEE Trans. Knowl. Data Eng. 34, 6 (2022), 2771–2782.

[11] Joyce Jiyoung Whang, Rundong Du, Sangwon Jung, Geon Lee, Barry L. Drake, Qingqing Liu, Seonggoo Kang, and Haesun Park. 2020. MEGA:multi-view semi-supervised clustering of hypergraphs. Proc. VLDB Endow. 13, 5 (2020), 698–711.

[12] Stephen Bailey. 2017. Nashville meetup network. https://www.kaggle.com/datasets/stkbailey/nashville-meetup .

[13] Irene Malvestio, Alessio Cardillo, and Naoki Masuda. 2020. Interplay between k-core and community structure in complex networks. Scientific Reports 10, 14702 (2020).

[14] Moses Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In Approximation Algorithms for Combinatorial Optimization, Third International Workshop (Lecture Notes in Computer Science), Vol. 1913. Springer,84–95.

[15] J. Ignacio Alvarez-Hamelin, Luca Dall’Asta, Alain Barrat, and Alessandro Vespignani. 2005. Large scale networks fingerprinting and visualization using the k-core decomposition. In Advances in Neural Information Processing Systems (NeurIPS). MITPress, 41–50.

[16] Vladimir Batagelj, Andrej Mrvar, and MatjaΕΎ ZaverΕ‘nik. 1999. Partitioning approach to visualization of large graphs. In Graph Drawing (Lecture Notes in Computer Science), Vol.1731.

[17] Vladimir Batagelj and MatjaΕΎ ZaverΕ‘nik. 2011. Fast algorithms for determining (generalized) core groups in social networks. Advances in Data Analysis and Classification 5, 2 (2011), 129–145.

[18] Francesco Bonchi, Arijit Khan, and Lorenzo Severini. 2019. Distance-generalized core decomposition. In The International Conference on Management of Data (SIGMOD). ACM, 1006–1023.

[19] Qing Liu, Xuliang Zhu, Xin Huang, and Jianliang Xu. 2021. Local algorithms for distance-generalized core decomposition over large dynamic graphs. Proc. VLDB Endow. 14, 9 (2021), 1531–1543.

[20] Jaewon Yang and Jure Leskovec. 2015. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems 42, 1 (2015), 181–213.

[21] Bintao Sun, T. -H. Hubert Chan, and Mauro Sozio. 2020. Fully dynamic approximate k-core decomposition in hypergraphs. ACM Trans. Knowl. Discov. Data 14, 4 (2020), 39:1–39:21.

[22] Fragkiskos D. Malliaros, Maria-Evgenia G. Rossi, and Michalis Vazirgiannis. 2016. Locating influential nodes in complex networks. Scientific Reports 6, 19307 (2016).

[23] Maksim Kitsak, Lazaros K. Gallos, Shlomo Havlin, Fredrik Liljeros, Lev Muchnik, H. Eugene Stanley, and HernΓ‘n A. Makse. 2010. Identification of influential spreaders in complex networks. Nature physics 6, 11 (2010), 888–893.

[24] James Cheng, Yiping Ke, Shumo Chu, and M. Tamer Γ–zsu. 2011. Efficient core decomposition in massive networks. In IEEE International Conference on Data Engineering (ICDE). 51–62.

[25] Ahmet Erdem SarΓ­yΓΌce, Bu˘gra Gedik, Gabriela Jacques-Silva, Kun-Lung Wu, and Ümit V. Γ‡atalyΓΌrek. 2013. Streaming algorithms for k-core decomposition. Proc. VLDB Endow. 6, 6 (2013), 433–444.

[26] Linyuan LΓΌ, Tao Zhou, Qian-Ming Zhang, and H. Eugene Stanley. 2016. The H-index of a network node and its relation to degree and coreness. Nature communications 7, 1 (2016), 1–7.

[27] Wissam Khaouid, Marina Barsky, Venkatesh Srinivasan, and Alex Thomo. 2015. K-core decomposition of large networks on a single PC. Proc. VLDB Endow. 9, 1 (2015), 13–23.

[28] Alberto Montresor, Francesco De Pellegrini, and Daniele Miorandi. 2012. Distributed k-core decomposition. IEEE Transactions on parallel and distributed systems 24, 2 (2012), 288–300.

 

Comments

Popular posts from this blog

Measurements, Analyses, and Insights on the Entire Ethereum Blockchain Network