Event Detection and Identification of Influential Spreaders in Social Media Data Streams
Abstract
Microblogging, a popular social media service platform, has become a new information channel through which users receive and exchange the most up-to-date information on current events. Consequently, it is a crucial platform both for detecting newly emerging events and for identifying influential spreaders, that is, users who have the potential to actively disseminate knowledge about events through microblogs. However, traditional event detection models require human intervention to specify the number of topics to be explored, which significantly reduces the efficiency and accuracy of event detection. In addition, most existing methods focus only on event detection and can identify neither influential spreaders nor key event-related posts, which makes it difficult to track momentous events in a timely manner. To address these problems, we propose a Hypertext-Induced Topic Search (HITS) based Topic-Decision method (TD-HITS) and a Latent Dirichlet Allocation (LDA) based Three-Step model (TS-LDA). TD-HITS automatically detects the number of topics and identifies the associated key posts in a large collection of posts. TS-LDA identifies influential spreaders of hot event topics based on both post and user information. Experimental results on a Twitter dataset demonstrate the effectiveness of the proposed methods for both detecting events and identifying influential spreaders.
Existing Works
In recent years, event detection has been the focus of a wide range of research, especially in social media, owing to its openness and data availability (e.g., Twitter data accessible through the Twitter API and Facebook data through the Facebook API). Existing event detection models for social media are categorized as either feature-pivot[14] or document-pivot[15] models. Feature-pivot models study the distributions of words and detect events by grouping words together, whereas document-pivot models detect events by clustering documents (posts) according to their content. For example, Mathioudakis and Koudas detected events by grouping bursty words. However, this method lacks a robust probabilistic foundation and focuses only on event detection; it fails to identify key event-related posts or the influential spreaders involved in these critical incidents. Wavelet analysis[21] has also been applied to the frequency-based raw signals of words to build a signal for each individual word, after which trivial words are filtered out by examining the auto-correlations of their corresponding signals. This method then detects events by applying a modularity-based graph partitioning technique to the retained words. However, it also focuses only on event detection and does not take key posts or influential spreaders into account, which makes it harder to track and control events promptly.
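The feature-pivot idea of grouping bursty words can be illustrated with a small sketch. It uses a simplified z-score burst test against earlier time windows and then groups bursty words that co-occur in enough posts (connected components via union-find). This is a generic stand-in, not the specific algorithm of Mathioudakis and Koudas or the wavelet-based method; the function names `doc_freq`, `bursty_words`, and `group_words`, the thresholds, and the toy posts are all hypothetical.

```python
import statistics
from collections import Counter
from itertools import combinations


def doc_freq(posts):
    """Count in how many posts each word appears (each post is a list of tokens)."""
    return Counter(tok for post in posts for tok in set(post))


def bursty_words(current_posts, history_windows, z_threshold=3.0, min_count=2):
    """Flag words whose post count in the current window is unusually high.

    Hypothetical helper: a word is bursty if its current count exceeds its mean
    count over earlier windows by more than z_threshold standard deviations.
    """
    current = doc_freq(current_posts)
    past = [doc_freq(window) for window in history_windows]
    bursty = set()
    for word, count in current.items():
        if count < min_count:
            continue  # ignore very rare words
        history = [c[word] for c in past]
        mean = statistics.mean(history) if history else 0.0
        std = statistics.pstdev(history) if history else 0.0
        # Fall back to a unit scale when the word never varied (e.g. never seen before).
        z = (count - mean) / (std if std > 0 else 1.0)
        if z > z_threshold:
            bursty.add(word)
    return bursty


def group_words(bursty, current_posts, min_cooccur=2):
    """Group bursty words into candidate events via co-occurrence connected components."""
    pair_counts = Counter()
    for post in current_posts:
        pair_counts.update(combinations(sorted(set(post) & bursty), 2))

    parent = {w: w for w in bursty}  # union-find forest

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for (a, b), n in pair_counts.items():
        if n >= min_cooccur:
            parent[find(a)] = find(b)

    groups = {}
    for w in bursty:
        groups.setdefault(find(w), []).append(w)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical toy data: three quiet earlier windows, then a burst of posts.
    history = [[["coffee", "morning"], ["traffic", "morning"]]] * 3
    current = [["earthquake", "tokyo", "magnitude"], ["earthquake", "tokyo"],
               ["tokyo", "earthquake", "tsunami"], ["earthquake", "tokyo", "magnitude"],
               ["coffee", "morning"]]
    words = bursty_words(current, history)
    print(group_words(words, current))  # [['earthquake', 'tokyo']]
```

Here only `earthquake` and `tokyo` exceed the default threshold; lowering `z_threshold` to 1.5 also admits `magnitude`, which joins the same group through co-occurrence. Note that the output is only a set of words: like the methods discussed above, it says nothing about which posts or users drive the event.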
Proposed System
In this paper, we propose the TD-HITS method, which automatically detects the number of topics and identifies key posts from a large number of noisy posts. Based on TD-HITS, we further propose the TS-LDA model, which is a document-pivot model. TD-HITS effectively removes noisy posts and ordinary users from the data, and because TS-LDA builds on it, the number of topics need not be set manually in advance; TS-LDA then detects hot events and identifies influential spreaders. By addressing the drawbacks of existing methods noted above, the proposed methods exhibit better efficiency and accuracy in both event detection and the identification of influential spreaders.
The TD-HITS method has two modules. First, the HITS algorithm is used to create a smaller, high-quality training data set by extracting high-quality posts and influential users from the large pool of posts and users. Second, a topic-decision method is used to automatically detect the number of topics and to discover key posts among the large number of posts.
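The HITS step of the first module can be sketched with the standard power iteration. This minimal illustration rests on assumptions the text does not state: users are treated as hubs and posts as authorities on a bipartite graph in which an edge (user, post) means the user authored or retweeted the post, scores are L2-normalized each iteration, and the names `hits` and `top_k` and the toy graph are hypothetical. It shows only generic HITS, not the paper's exact graph construction or selection thresholds.

```python
import math
from collections import defaultdict


def hits(edges, iterations=100, tol=1e-9):
    """HITS power iteration on a bipartite user -> post graph.

    edges: iterable of (user, post) pairs. An edge is ASSUMED to mean
    "user authored or retweeted post"; users get hub scores, posts get
    authority scores.
    """
    out_links = defaultdict(set)  # user -> posts the user points to
    in_links = defaultdict(set)   # post -> users pointing to the post
    for user, post in edges:
        out_links[user].add(post)
        in_links[post].add(user)

    hub = {u: 1.0 for u in out_links}
    auth = {p: 1.0 for p in in_links}

    for _ in range(iterations):
        # Authority update: a post scores highly if good hubs point to it.
        new_auth = {p: sum(hub[u] for u in users) for p, users in in_links.items()}
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        new_auth = {p: v / norm for p, v in new_auth.items()}

        # Hub update: a user scores highly if they point to authoritative posts.
        new_hub = {u: sum(new_auth[p] for p in posts) for u, posts in out_links.items()}
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        new_hub = {u: v / norm for u, v in new_hub.items()}

        # Stop once no score changes by more than tol.
        delta = max((abs(new_hub[u] - hub[u]) for u in hub), default=0.0)
        delta = max(delta, max((abs(new_auth[p] - auth[p]) for p in auth), default=0.0))
        hub, auth = new_hub, new_auth
        if delta < tol:
            break
    return hub, auth


def top_k(scores, k):
    """Return the k highest-scoring keys; ties are broken by key for determinism."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [key for key, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical toy graph: u1 and u2 engage with the same posts, u3 is an occasional user.
    edges = [("u1", "p1"), ("u1", "p2"), ("u1", "p3"),
             ("u2", "p1"), ("u2", "p2"), ("u3", "p4")]
    hub, auth = hits(edges)
    print(top_k(auth, 2), top_k(hub, 1))  # ['p1', 'p2'] ['u1']
```

In the toy graph, `u3` sits in a small disconnected component, so its hub score and the authority score of `p4` decay toward zero. Keeping only the top-ranked posts and users (the cut-off k is an assumed parameter) is one way to obtain the smaller training set that the first module describes.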
Conclusion
In this paper, we proposed the HITS-based topic-decision method, TD-HITS. This approach creates a smaller, high-quality training data set by selecting high-quality posts and high-quality users from the full collection of users and posts. It thereby greatly reduces the impact of unrelated posts and occasional users, improving the efficiency and accuracy of the event detection process. Moreover, it automatically detects the correct number of topics and identifies event-related key posts, achieving higher precision. In addition, we proposed an LDA-based three-step model, TS-LDA, which detects critical events using the automatically determined number of topics and identifies the influential spreaders linked to them. Because TS-LDA uses both post and user information, it enables a more timely and accurate understanding of the users involved in these critical incidents.
References
[1] X. M. Zhou and L. Chen, Event detection over Twitter social media streams, VLDB J., vol. 23, no. 3, pp. 381–400, 2014.
[2] A. Aldhaheri and J. Lee, Event detection on large social media using temporal analysis, in Proc. 7th Annu. Computing and Communication Workshop and Conf., Las Vegas, NV, USA, 2017, pp. 1–6.
[3] P. Yan, MapReduce and semantics enabled event detection using social media, J. Artif. Intell. Soft Comput. Res., vol. 7, no. 3, pp. 201–213, 2017.
[4] Y. D. Zhou, H. Xu, and L. Lei, Event detection based on interactive communication streams in social network, in Proc. 9th EAI Int. Conf. Mobile Multimedia Communications, Xi’an, China, 2016, pp. 54–57.
[5] T. Hofmann, Probabilistic latent semantic indexing, in Proc. 22nd Annu. Int. ACM SIGIR Conf. Research and Development in Information Retrieval, Berkeley, CA, USA, 1999, pp. 50–57.
[7] D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res., vol. 3, pp. 993–1022, 2003.
[8] Q. M. Diao, J. Jiang, F. D. Zhu, and E. P. Lim, Finding bursty topics from microblogs, in Proc. 50th Annu. Meeting of the Association for Computational Linguistics: Long Papers–Volume 1, Jeju Island, Korea, 2012, pp. 536–544.
[9] X. H. Wang, C. X. Zhai, X. Hu, and R. Sproat, Mining correlated bursty topic patterns from coordinated text streams, in Proc. 13th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, San Jose, CA, USA, 2007, pp. 784–793.
[10] L. AlSumait, D. Barbara, and C. Domeniconi, On-Line LDA: Adaptive topic models for mining text streams with applications to topic detection and tracking, in Proc. 8th IEEE Int. Conf. Data Mining, Pisa, Italy, 2008, pp. 3–12.
[11] J. X. Li, Z. Y. Tai, R. C. Zhang, W. R. Yu, and L. Liu, Online bursty event detection from microblog, in Proc. 7th IEEE/ACM Int. Conf. Utility and Cloud Computing, London, UK, 2014, pp. 865–870.
[12] S. Chakrabarti, B. Dom, P. Raghavan, S. Rajagopalan, D. Gibson, and J. Kleinberg, Automatic resource compilation by analyzing hyperlink structure and associated text, Comput. Netw. ISDN Syst., vol. 30, nos. 1–7, pp. 65–74, 1998.
[13] J. Bao, Y. Zheng, and M. F. Mokbel, Location-based and preference-aware recommendation using sparse geo-social networking data, in Proc. 20th Int. Conf. Advances in Geographic Information Systems, Redondo Beach, CA, USA, 2012, pp. 199–208.