Reliability Maximization in Uncertain Graphs

Reliability Maximization in Uncertain Graphs


A summary of the IEEE TKDE 2020 research paper by Xiangyu Ke, Arijit Khan, Mohammad Al Hasan, and Rojin Rezvansangsari

[Background]  Rich expressiveness of probabilistic graphs and their utility to model the inherent uncertainty in a wide range of applications have prompted a large number of research works on probabilistic graphs by the data management research communities [1]. In an uncertain graph setting, Network Reliability is a well-studied problem [2], [3], which requires to measure the probability that a target node is reachable from a source node. Reliability has been widely studied in device networks, i.e., networks whose nodes are electronic devices and the (physical) links between such devices have a probability of failure [4]. More recently, the attention has been shifted to social, communication, transportation, genomic, and logistic networks [5]–[7]. 

Figure 1: Reliability in a sensor network: the probabilities of successful package delivery rate from a sensor to another.

[Our Problem] Given an uncertain graph, a source node s, a target node t, a probability threshold ζ ∈ (0, 1], and a small positive integer k, we investigate to find the top-k edges to add in the graph, each with probability ζ, so that the reliability from the source s to the target t is maximized.

[Application Scenarios] (1) In mobile ad-hoc networks, the connectivity between sensor nodes and devices is estimated using noisy measurements, thus leading to edges naturally associated with a probability of existence [7]. Road networks can be modeled as uncertain graphs because of unexpected traffic congestion [6]. In these networks, creating new connections between nodes (e.g., building new roads, flyovers, adding Ethernet cables) is limited by physical constraints and budget. One can introduce only k new edges where k is decided based on resource constraints. (2) In social networks, finding k best shortcut edges could maximize the information diffusion probability from an early adopter to a target customer [8], thus the network host can actively recommend these links to the respective users. (3) In case of protein-interaction networks, interactions are established for a limited number of proteins through noisy and error-prone experiments—each edge is associated with a probability accounting for the existence of the interaction. Therefore, finding the top-k shortcut edges can assist in de-noising protein-interaction networks [9].

[Challenges and Our Theoretical Contributions] (1) The computation of s-t reliability is #P-hard [2][3]. Even if polynomial time reliability estimation methods are employed, we prove the following. (2) Our problem is NP-hard, and does not admit any PTAS. (3) The underlying objective function is neither submodular, nor supermodular. (4) The optimal solution changes if input probabilities on new edges vary. (5) The optimal solution changes if the probabilities in the  input graph vary. (6) The optimal solution for budget (#new edges) k is not a subset of that for budget r when k<r. (7) If the direct edge from s to t, st, is missing in the input graph, and is allowed to be added, st will always be included in the top-k optimal solution.

[Algorithmic Contributions] In spite of such challenges, we devise efficient algorithms. (1) We investigate several baselines, analyse their complexities, and discuss the shortcomings. They include individual top-k edge selection, hill-climbing method, centrality-based method, and eigenvalue-based method. (2) We eliminate the search space (can be up to O(n^2) in sparse real networks) via finding the promising nodes with high original reliability from the source, or to the target. (3) After adding all possible missing edges only between pairs of promising nodes, we further filter out those paths with low probabilities from s to t. The intuition is that what really matters in computing the reliability between two nodes is the set of highly reliable paths between them [10]-[12]. (4) We adapt the single-path-inclusion algorithm in [10] to our problem setting, and refine it in a path-batch manner to significantly improve the accuracy. 

[Other Queries] We also consider a restricted and one extended version of our problem to support a wider family of queries. (1) In the restricted version, the reliability is estimated only by the most reliable path, thus it can be solved in polynomial-time. (2) In the extended version, multiple sources and targets can be provided as input. We investigate several possible reliability aggregation functions. The proposed algorithms are generalized to each case.

[Case Study] The Intel Lab Data (http://db.csail.mit.edu/labdata/labdata.html) dataset contains the sensor network information with 54 sensors deployed in the Intel Berkeley Research Lab (map given in Figures 2 and 3). The probabilities on links denote the percentages of messages from a sender successfully reached to a receiver. Only those links with probabilities higher than the average value (0.33) are shown in the figures, and the thickness represents link probabilities. Due to budget constraints, only 3 new links are allowed for each case. We further assume that the probability of each new link would be the same as the average edge probability of the original dataset, that is, 0.33. We notice that if two sensors are more than 20 meters away, the original link probability between them is usually close to 0. Thus, we only allow establishing new links between a pair of sensors that are at most 15 meters away.


Figure 2: Improving the reliability from sensor 21 (right) to 46 (left) with 3 new links (marked by dotted lines)
The original reliability is 0.4, and the resultant reliability is 0.88.

Figure 3: Improving the reliability from sensor 15 to 40 (on the diagonal) with 3 new links (marked by dotted lines)
The original reliability is 0.28, and the resultant reliability is 0.58.

    For case (1), sensor 46 has very weak connections from outside, while sensor 21 is connected with the bottom part of the lab, where a very dense network exists. Therefore, our solution for this case is to connect sensor 46 with sensors in the bottom part of the lab. By establishing three links 2 to 46, 35 to 46, and 37 to 46, we improve the reliability between 21 to 46 from 0.40 to 0.88.

    For case (2), notice in Figure 7 that the sensors in the center part of lab are well-connected, and the links in this region are thicker than those in the bottom part. The source sensor 15 has a few connections with sensors in the bottom part, but no link with those in the center part. The destination sensor 40 has limited connections beyond its physical neighbors. Existing configuration offers a poor reliability of 0.28 for the connection between source and destination which we like to improve. The smart decision made by our algorithm is as follows: First, connect sensor 35 to 40, thus making sensor 35 a bridge between the center and the bottom region of the network; Second, enable connection from sensor 15 to the center part (by establishing link from 15 to 10, and from 15 to 11). This results in 0.58 overall reliability from sensor 15 to 40, which is more than double of the original reliability value. 

    These results illustrate how our proposed solution for the budgeted reliability maximization problem can be useful in solving real-life problems.

For more information, click here for our paper. 
Blog post contributed by: Xiangyu Ke


[Reference]

[1] A. Khan, Y. Ye, and L. Chen, “On Uncertain Graphs”, Morgan & Claypool Publishers, Synthesis Lectures on Data Management, 2018.

[2] M. O. Ball, “Computational Complexity of Network Reliability Analysis: An Overview”, IEEE Tran. Rel., vol. 35, no. 3, pp. 230–239, 1986.

[3] L. G. Valiant, “The Complexity of Enumeration and Reliability Problems”, SIAM J. on Computing, vol. 8, no. 3, pp. 410–421, 1979.

[4] K. K. Aggarwal, K. B. Misra, and J. S. Gupta, “Reliability Evaluation: A Comparative Study of Different Techniques” Micro. Rel., vol. 14, no. 1, 1975.

[5] D. Kempe, J. M. Kleinberg, and E. Tardos, “Maximizing the Spread of Influence through a Social Network”, in KDD, 2003.

[6] M. Hua and J. Pei, “Probabilistic Path Queries in Road Networks: Traffic Uncertainty aware Path Selection”, in EDBT, 2010.

[7] J. Ghosh, H. Q. Ngo, S. Yoon, and C. Qiao, “On a Routing Problem Within Probabilistic Graphs and its Application to Intermittently Connected Networks”, in INFOCOM, 2007.

[8] C. Chen, H. Tong, B. A. Prakash, T. Eliassi-Rad, M. Faloutsos, and C. Faloutsos, “Eigen-Optimization on Large Graphs by Edge Manipulation”, ACM Trans. Knowl. Discov. Data, vol. 10, no. 4, pp. 49:1–49:30, 2016.

[9] O. Kuchaiev, M. Rasajski, D. J. Higham, and N. Przulj, “Geometric De-noising of Protein-Protein Interaction Networks”, PLOS Computational Biology, vol. 5, no. 8, pp. 1–10, 08 2009.

[10] A. Khan, F. Bonchi, F. Gullo, and A. Nufer, “Conditional Reliability in Uncertain Graphs”, IEEE Trans. Knowl. Data Eng., vol. 30, no. 1, pp. 2078–2092, 2018.

[11] W. Chen, C.Wang, and Y.Wang, “Scalable Influence Maximization for Prevalent Viral Marketing in Large-Scale Social Networks”, in KDD, 2010.

[12] A. Khan, B. Zehnder, and D. Kossmann, “Revenue Maximization by Viral Marketing: A Social Network Host’s Perspective”, in ICDE, 2016.

Comments

Popular posts from this blog

Measurements, Analyses, and Insights on the Entire Ethereum Blockchain Network