Jointly Finding Seeds and Relevant Tags in Social Networks
Jointly Finding Seeds and Relevant Tags in Social Networks for Targeted Influence Maximization
A summary of the SIGMOD 2018 research paper by Xiangyu KE, Arijit KHAN and Gao CONG
[Influence Maximization]
Nowadays online social networks have become an important medium for campaigners to promote their products. It is quite normal that a social network has millions of users, and it is impossible for a campaigner to directly reach out to all of them via advertisements, free samples, or discounted prices. If we see our friends' tweets praising some products, we probably also adopt them. Therefore, it is important to identify a small set of users (with cardinality k) who can influence as many other users as possible in the social network. It is known as the Influence Maximization problem, and has been widely studied [1, 2]. Moreover, a campaigner can further specify her target customers, either explicitly, or via some constraints (e.g., people in age group 20~30), which is referred to as the Targeted Influence Maximization [3].
[Conditional Influence Probability]
Users generally apply tags to characterize their contents in an online social network, e.g., hashtags in Twitter and Instagram. The probability that a tweet originated by a user will be re-tweeted by her followers clearly depends on the hashtags and other keywords in that tweet. In the above figure, the gentleman's tweets about arts can attract much more attention from the lady than those about bars. Assuming independence of tags, the influence probability between two users, for a given set of tags C1 can be calculated as follows:
Now during campaigning, setting a small budget on the number of tags is critical for the following reasons:
(a) to avoid information overload against campaign effectiveness (e.g., if one wants to write an advertisement or a blog, she may want to focus on a small number of relevant topics).
(b) for cost-effective planning and due to other physical constraints (e.g., an album can accommodate a limited number of genres/songs).
[Our Problem]
Given a set of target customers, a small budget k on the number of seed users, and a small budget r on the number of relevant tags, what are the top-k seed nodes and the top-r most relevant tags that maximize the expected spread of the campaign within the target set of customers?
[Case Study]
To demonstrate the effectiveness of our problem, we conduct a case study on the real-world reviewing-based Yelp social network (yelp.com.sg/dataset/challenge), where users can assign reviews to a business after visiting it. As shown in the above table, the most relevant tags (found by our method) to influence people in each target city are quite different, e.g., for Las Vegas, popular categories are more about tourism; whereas for Pittsburgh, various food items appear in the top-10 list. Furthermore, the above figure shows that an optimal set of advertisement tags for a specific city may not work in the same way for other cities. It is interesting to note that even only with 10 most relevant tags found by our method, we can obtain nearly 90% of the influence spread achieved by all 195 tags in the dataset. This demonstrates the applicability and usefulness of our novel query.
(a) The classic influence maximization problem is NP-hard [1], and the computation of influence spread is #P-hard [4], following the Independent Cascade (IC) model of influence spread. We consider the same IC model in this work.
(b) Finding the top-r tags that maximize the influence spread from a given set of seeds to target nodes in a social network is NP-hard, and also hard to approximate (does not admit any PTAS) [5]. This is neither sub-modular, nor super-modular.
[Our Solution]
We propose an iterative framework for solving this problem. The framework alternatively and in an iterative manner, optimizes the top-k seed users and the top-r relevant tags, assuming that the others are fixed. We prove that our method converges to a local optimum [6], because at every round, the expected influence spread within the target set monotonically increases until convergence.
For seed finding part, we refine the classic TIM algorithm [7] to adapt to targeted influence maximization setting, and equip it with smart indexing strategies for better efficiency. We build θc indexes for each tag beforehand and combine them as working graphs when processing online queries. We have both theoretical and empirical study on θc to ensure a good quality [6]. Moreover, we propose lazy and local indexing strategies for further efficiency improvement (30%).
For tag finding part, we modify the single path selection algorithm [5] in a batch manner and significantly improve its accuracy. We discuss the limitations of the original objective function and the algorithm in [5], and also the intuition of our modifications in details in our paper [6], which improves the accuracy by up to 30% with similar running time.
For more information, click here for our paper.
Blog post contributed by: Xiangyu KE
[Reference]
1. D. Kempe, J. M. Kleinberg, and E. Tardos. Maximizing the Spread of Influence through a Social Network. In KDD, 2003.
2. W. Chen, L.V.S. Lakshmanan, and C. Castillo. Information and Influence Propagation in Social Networks. Synthesis Lectures on Data Management. Morgan& Claypool Publishers, 2013.
3. T. Lappas, E. Terzi, D. Gunopulos, and H. Mannila. Finding Effectors in Social Networks. In KDD, 2010.
4. W. Chen, C. Wang, and Y. Wang. Scalable Influence Maximization for Prevalent Viral Marketing in Large-Scale Social Networks. In KDD, 2010.
5. A.Khan, F. Bonchi, F. Gullo, and A. Nufer. Conditional Reliability in Uncertain Graphs. In TKDE, 2018.
6. X. Ke, A. Khan, and G. Cong. Finding Seeds and Relevant Tags Jointly: For Targeted Influence Maximization in Social Networks. In SIGMOD, 2018.
7. Y. Tang, X. Xiao, and Y. Shi. Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency. In SIGMOD, 2014.
Comments
Post a Comment