Neighborhood-based Hypergraph Core Decomposition
A summary of the PVLDB
2023 research paper by Naheed Anjum Arafat, Arijit Khan, Arpit Kumar Rai, and
Bishwamittra Ghosh
Background: Many
real-world relations consist of polyadic entities, e.g., relations between
individuals in co-authorships [1], legislators in parliamentary voting [2],
items in e-shopping carts [3], proteins in protein complexes, and metabolites
in a metabolic process [4, 5]. In these scenarios, the hypergraph is a natural
data model where an edge may connect more than two entities. Recently, there
has been a growing interest in hypergraph data management [6, 7, 8, 9, 10, 11].
We study neighborhood-based hypergraph core decomposition (Figure 1(a)):
a novel way of decomposing hypergraphs into hierarchical neighborhood-cohesive
subhypergraphs. It decomposes a hypergraph into nested, strongly-induced
maximal subhypergraphs such that all the nodes in every subhypergraph have at
least a certain number of neighbors in that subhypergraph. Being
strongly-induced means that a hyperedge is only present in a subhypergraph if
and only if all its constituent nodes are present in that subhypergraph.
Example. The hypergraph π» in Figure 1(a) is constructed based on four events (π, π , π1, π2) from the Nashville Meetup Network dataset [12]. Each hyperedge denotes an event, nodes in a hyperedge are participants of that event. Two nodes are neighbors if they co-participate in an event. We notice that Hall has 4 neighbors: Popiden, Hagler, Wallace, and Matson. Similarly, Matson, Hagler, Jeannie, Sumner, Annie, Newton, Wallace, and Popiden have 13, 10, 9, 9, 6, 6, 6, and 6 neighbors, respectively. As every node has ≥ 4 neighbors, the neighborhood based 4-core, denoted by π»[π4], is the hypergraph π» itself. The neighborhood based 6-core is the subhypergraph π»[π6] = {π, π } because participants of π (e.g., Newton, Annie) and π (e.g., Matson) respectively have 6 and 7 neighbors in π»[π6]. Finally, the neighborhood based 7-core is the subhypergraph {π}.
Earlier, core decomposition of graphs is shown to be an important tool
for solving many graph data management problems, e.g., community detection [13],
densest subgraph discovery [14], identifying influential nodes [22], and network
visualization [15, 16].
Why Existing Approaches Do Not Work? Alternative approaches to decomposing hypergraphs, e.g., reduction to
clique or bipartite graphs, are not meaningful in certain applications, the
later also results in inefficient decomposition, while existing degree-based
hypergraph decomposition does not distinguish nodes with different neighborhood
sizes.
Existing Approach-1. Transform the hypergraph into other objects (e.g., a graph), apply existing decomposition approaches [17, 18] on that object, and then project the decomposition back to the hypergraph. For instance, a hypergraph is transformed to a clique graph by replacing the hyperedges with cliques, and classical graph core decomposition [17] is applied. A hypergraph can also be transformed into a bipartite graph by representing the hyperedges as nodes in the second partition and creating an edge between two cross-partition nodes if the hyperedge in the second partition contains a node in the first partition. Finally, distance-2 core decomposition [18, 19] is applied. In distance-2 core-decomposition, nodes in π-core have at least π 2-hop neighbors in the subgraph. First, the decomposition they yield may be different from that of ours: πore(πππππππ) = 5 in both clique graph and dist-2 bipartite graph decompositions, whereas Popiden has core number 4 in our decomposition. Clique graph and bipartite graph decompositions fail to identify that the relation ⟨Popiden-Hagler⟩ should not exist without the existence of event π2, since π2 is the only event where they co-participate. Second, the resulting decomposition maybe unreasonable: low-importance events π1 and π2 by the same interest group are placed in different cores causing difficulty in separating less-important events from more-important ones. Third, such transformations inflate the problem size. A hypergraph in [20] with 2M nodes and 15M hyperedges is converted to a bipartite graph with 17M nodes and 1B edges. A π-uniform hypergraph with π hyperedges causes its clique graph to have O(ππ²) edges. The bipartite graph representation also requires distance-2 core decomposition [18, 19], which is more expensive due to inflated problem-size.
Existing Approach-2. The degree π(π£) of a node π£ in hypergraph
π» is the number of hyperedges incident on π£, i.e., π(π£) = |{π∈πΈ|π£∈π}|. Sun et al. [21] define the deg-π-core of a hypergraph:
In degree-based core decomposition, every node in the π-core has
degree at least π in that core. It does not consider hyperedge sizes.
As a result, nodes in the same core may have vastly different neighborhood
sizes. For instance, Hall and Matson have 4 and 13 neighbors, respectively, yet
they belong to the same 1-core in the degree-based decomposition (Figure1(b)).
There are applications, e.g., propagation of contagions in epidemiology,
diffusion of information in viral marketing, where it is desirable to capture
such differences because nodes with the same number of neighbors in a
subhypergraph are known to exhibit similar diffusion characteristics [23]. Indeed,
from a practical viewpoint, if the infectious diseases control authority is
looking for some key events that cause higher infection spread (e.g., meeting
with 6 neighbors per participant) and hence such events need to be intervened,
they are events {π, π
} (nbr-6-core) as they cause meeting with at least 6
neighbors per participant. Such distinction across various events and
participants is not possible according to degree-based core decomposition of π». Moreover, the
neighborhood-based decomposition is also logical: innermost core contains π, which is a tech
event organized by a relatively popular group with 918 members, followed by the
second innermost core containing {π
, π}. π
is an event organized
by a secular organization with 525 members which often discusses religion matters.
The outermost core contains two events held by a niche social activist group with
only 185 members.
Challenges. A hyperedge can relate
to more than two nodes, and a pair of nodes maybe related by multiple, distinct
hyperedges. Thus, a trivial adaptation of core-decomposition algorithms for graphs
to hypergraphs is difficult. In the classic peeling algorithm for graph core
decomposition, when a node is removed, the degree of its neighbors is decreased
by 1: this allows important optimizations which makes the peeling algorithm linear-time
and efficient [17, 24, 25]. However, in a neighborhood-based hypergraph core decomposition,
deleting a node may reduce the neighborhood size of its neighboring node by more
than 1. Hence, to recompute the number of neighbors of a deleted node’s neighbor,
one must construct the residual hypergraph after deletion, which makes the decomposition
polynomial-time and expensive. Furthermore, the existing lower bound that makes
graph core decomposition more efficient [18] does not work for
neighborhood-based hypergraph core decomposition. In the following, we also discuss
the challenges associated with adopting the local approach [26], one of the most
efficient methods for graph core decomposition to hypergraphs.
In a local approach, a core-number estimate is updated iteratively [27, 26]
or in a distributed manner [28] for every node in a graph. The initial value of
a node’s core-number estimate is an upper bound of it score-number. In subsequent
rounds, this estimate is iteratively decreased based on estimates of neighboring
nodes. [26] uses β-index [29] for such update. They have shown that the following
invariant must hold: every node with core-number π has β-index at least
π, and the subgraph induced by nodes with β-index at least
π has at least π neighbors per node in that subgraph. The former holds but
the later may not hold in a hypergraph, because the subhypergraph induced by nodes
with β-index ≥ π may not include hyperedges that partially contain
other nodes. Due to those ‘missing’ hyperedges, the number of neighbors of some
nodes in that subhypergraph may drop below π, violating the
coreness condition. Thus, a local approach used for computing the π-core [27, 26,
28] or(π, β)-core [19] in graphs results in incorrect neighborhood-based
hypergraph cores.
Algorithms: We make three main
contributions.
Exact algorithms. We propose three
exact algorithms with their correctness and time-complexity analyses. Two of them,
Peel and its enhancement E-Peel, adopt classic peeling [17], incurring global changes
to the hypergraph. For E-Peel, we derive novel lower-bound on core-number that eliminates
many redundant neighborhood re-computations. Our third algorithm, Local-core only
makes node level local computations. Even though the existing local method [26,
28] fails to correctly find neighborhood-based core-numbers in a hypergraph, our
algorithm Local-core applies a novel Core-correction procedure, ensuring correct
core-numbers.
Optimization and parallelization strategies. We propose four optimization strategies to improve the
efficiency of Local-core. Compressed representations for hypergraph and the
family of optimizations for efficient Core-correction are novel to core-decomposition
literature. The other optimizations, though inspired from graph literature, have
not been adopted in earlier hypergraph-related works. We also propose parallelization
of Local-core for the shared-memory programming paradigm.
(Neighborhood, degree)-core. We define a more general hypergraph-core model, (neighborhood, degree)-core
by considering both neighborhood and degree constraints, propose its decomposition
algorithm Local-core+Peel, and demonstrate its superiority in diffusion spread.
Experiments and applications: Our empirical evaluation on real and synthetic hypergraphs shows that
the proposed algorithms are effective, efficient, and practical. Our OpenMP
parallel implementation Local-core(P) decomposes hypergraph with 27M nodes, 17M
hyperedges in 91 seconds.
We show our decomposition to be more effective in disrupting diffusion than
other decompositions. Our greedy algorithm proposed for volume-densest subhypergraph
recovery achieves (πππππ(πππππ−2)+2)-approximation, where hyperedge cardinality (#nodes
in that hyperedge) and node-pair co-occurrence (#hyperedges containing that pair)
are at most πππππ
and πππππ, respectively. If the hypergraph is a graph (πππππ=2), our result generalizes Charikar’s 2-approximation
guarantee for densest subgraph discovery [14]. Our volume-densest subhypergraphs
capture differently important meetup events compared to degree and clique graph
decomposition-based densest subhypergraphs.
For more information, click here for our paper.
References:
[1] Yi Han, Bin Zhou, Jian Pei, and Yan Jia. 2009. Understanding importance
of collaborations in co-authorship networks: a supportiveness analysis approach.
In SIAM International Conferenceon Data Mining (SDM). 1112–1123.
[2] Austin R. Benson, Rediet Abebe, Michael T. Schaub, Ali Jadbabaie,
and Jon Kleinberg. 2018. Simplicial closure and higher-order link prediction. Proceedings
of the National Academy of Sciences 115, 48(2018), E11221–E11230.
[3] Xin Xia, Hongzhi Yin, Junliang Yu, Qinyong Wang, Lizhen Cui, and Xiangliang
Zhang. 2021. Self-supervised hypergraph convolutional networks for session based
recommendation. In AAAI Conference on Artificial Intelligence. 45034511.
[4] Christoph Flamm, BΓ€rbel M. R. Stadler, and Peter F. Stadler. 2015.
Generalized topologies: hypergraphs, chemical reactions, and biological evolution.
In Advances in Mathematical Chemistry and Applications. Bentham Science, 300–328.
[5] Thomas Gaudelet, NoΓ«l Malod-Dognin, and Natasa Przulj. 2018. Higher-order
molecular organization as a source of biological function. Bioinformatics 34, 17
(2018), i944–i953.
[6] Pit Fender and Guido Moerkotte. 2013. Counterstrike: generic top-down
join enumeration for hypergraphs. Proc. VLDB Endow. 6, 14 (2013), 1822–1833.
[7] Igor Kabiljo, Brian Karrer, Mayank Pundir, Sergey Pupyrev, Alon Shalita,
Yaroslav Akhremtsev, and Alessandro Presta. 2017. Social hash partitioner: a
scalable distributed hypergraph partitioner. Proc. VLDB Endow. 10, 11 (2017),
1418–1429.
[8] Geon Lee, Jihoon Ko, and Kijung Shin. 2020. Hypergraph motifs: concepts,
algorithms, and discoveries. Proc. VLDB Endow. 13, 11 (2020), 2256–2269.
[9] Julian Shun. 2020. Practical parallel hypergraph algorithms. In The
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 232–249.
[10] Justin Sybrandt, Ruslan Shaydulin, and Ilya Safro. 2022. Hypergraph
partitioning with embeddings. IEEE Trans. Knowl. Data Eng. 34, 6 (2022), 2771–2782.
[11] Joyce Jiyoung Whang, Rundong Du, Sangwon Jung, Geon Lee, Barry L.
Drake, Qingqing Liu, Seonggoo Kang, and Haesun Park. 2020. MEGA:multi-view
semi-supervised clustering of hypergraphs. Proc. VLDB Endow. 13, 5 (2020),
698–711.
[12] Stephen Bailey.
2017.
Nashville meetup network. https://www.kaggle.com/datasets/stkbailey/nashville-meetup .
[13] Irene Malvestio, Alessio Cardillo, and Naoki Masuda. 2020. Interplay
between k-core and community structure in complex networks. Scientific Reports 10,
14702 (2020).
[14] Moses Charikar. 2000. Greedy approximation algorithms for finding
dense components in a graph. In Approximation Algorithms for Combinatorial Optimization,
Third International Workshop (Lecture Notes in Computer Science), Vol. 1913.
Springer,84–95.
[15] J. Ignacio Alvarez-Hamelin, Luca Dall’Asta, Alain Barrat, and Alessandro
Vespignani. 2005. Large scale networks fingerprinting and visualization using the
k-core decomposition. In Advances in Neural Information Processing Systems (NeurIPS).
MITPress, 41–50.
[16] Vladimir Batagelj, Andrej Mrvar, and MatjaΕΎ ZaverΕ‘nik. 1999. Partitioning
approach to visualization of large graphs. In Graph Drawing (Lecture Notes in
Computer Science), Vol.1731.
[17] Vladimir Batagelj and MatjaΕΎ ZaverΕ‘nik. 2011. Fast algorithms for
determining (generalized) core groups in social networks. Advances in Data Analysis
and Classification 5, 2 (2011), 129–145.
[18] Francesco Bonchi, Arijit Khan, and Lorenzo Severini. 2019. Distance-generalized
core decomposition. In The International Conference on Management of Data
(SIGMOD). ACM, 1006–1023.
[19] Qing Liu, Xuliang Zhu, Xin Huang, and Jianliang Xu. 2021. Local algorithms
for distance-generalized core decomposition over large dynamic graphs. Proc. VLDB
Endow. 14, 9 (2021), 1531–1543.
[20] Jaewon Yang and Jure Leskovec. 2015. Defining and evaluating network
communities based on ground-truth. Knowledge and Information Systems 42, 1 (2015),
181–213.
[21] Bintao Sun, T. -H. Hubert Chan, and Mauro Sozio. 2020. Fully dynamic
approximate k-core decomposition in hypergraphs. ACM Trans. Knowl. Discov. Data
14, 4 (2020), 39:1–39:21.
[22] Fragkiskos D. Malliaros, Maria-Evgenia G. Rossi, and Michalis Vazirgiannis.
2016. Locating influential nodes in complex networks. Scientific Reports 6, 19307
(2016).
[23] Maksim Kitsak, Lazaros K. Gallos, Shlomo Havlin, Fredrik Liljeros,
Lev Muchnik, H. Eugene Stanley, and HernΓ‘n A. Makse. 2010. Identification of influential
spreaders in complex networks. Nature physics 6, 11 (2010), 888–893.
[24] James Cheng, Yiping Ke, Shumo Chu, and M. Tamer Γzsu. 2011. Efficient
core decomposition in massive networks. In IEEE International Conference on Data
Engineering (ICDE). 51–62.
[25] Ahmet Erdem SarΓyΓΌce, Bu˘gra Gedik, Gabriela Jacques-Silva, Kun-Lung
Wu, and Γmit V. ΓatalyΓΌrek. 2013. Streaming algorithms for k-core decomposition.
Proc. VLDB Endow. 6, 6 (2013), 433–444.
[26] Linyuan LΓΌ, Tao Zhou, Qian-Ming Zhang, and H. Eugene Stanley. 2016.
The H-index of a network node and its relation to degree and coreness. Nature
communications 7, 1 (2016), 1–7.
[27] Wissam Khaouid, Marina Barsky, Venkatesh Srinivasan, and Alex Thomo.
2015. K-core decomposition of large networks on a single PC. Proc. VLDB Endow. 9,
1 (2015), 13–23.
[28] Alberto Montresor, Francesco De Pellegrini, and Daniele Miorandi.
2012. Distributed k-core decomposition. IEEE Transactions on parallel and distributed
systems 24, 2 (2012), 288–300.
Comments
Post a Comment