Event Detection and Identification of Influential Spreaders in Social Media Data Streams


Microblogging, a popular social media service platform, has become a new information channel for users to receive and exchange the most up-to-date information on current events. Consequently, it is a crucial platform for detecting newly emerging events and for identifying influential spreaders who have the potential to actively disseminate knowledge about events through microblogs. However, traditional event detection models require human intervention to set the number of topics to be explored, which significantly reduces the efficiency and accuracy of event detection. In addition, most existing methods focus only on event detection and are unable to identify either influential spreaders or key event-related posts, which makes it challenging to track momentous events in a timely manner. To address these problems, we propose a Hypertext-Induced Topic Search (HITS) based Topic-Decision method (TD-HITS) and a Latent Dirichlet Allocation (LDA) based Three-Step model (TS-LDA). TD-HITS can automatically detect the number of topics and identify the associated key posts in a large number of posts. TS-LDA can identify influential spreaders of hot event topics based on both post and user information. The experimental results, using a Twitter dataset, demonstrate the effectiveness of our proposed methods for both detecting events and identifying influential spreaders.

Existing Works 

In recent years, event detection has been the focus of a wide range of research, especially from the social media perspective, due to the openness and data availability of these platforms (e.g., Twitter access through the Twitter API and Facebook access through the Facebook API). Existing event detection models for social media are categorized as either feature-pivot[14] or document-pivot[15] models. Feature-pivot models study the distributions of words and detect events by grouping words together. For example, Mathioudakis and Koudas detected events by grouping bursty words. However, this method does not have a robust probabilistic foundation and focuses only on event detection; it fails to identify key event-related posts or the influential spreaders involved in these critical incidents. Another approach[21] applies wavelet analysis to the frequency-based raw signals of words to build a signal for each word, filters out trivial words by examining their signal auto-correlations, and then detects events using a modularity-based graph partitioning technique. However, it also focuses only on event detection and does not take into account key posts or influential spreaders, which makes it harder to track and control events promptly.
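To make the feature-pivot idea concrete, the following is a minimal sketch of the burst-scoring step only. It is not the method of Mathioudakis and Koudas: the function name bursty_words, the z-score rule, and the threshold and min_count parameters are all assumptions chosen for illustration, and the grouping of bursty words into events is left out.

```python
from collections import Counter
from statistics import mean, pstdev


def bursty_words(windows, threshold=2.0, min_count=3):
    """Flag words whose count in the latest window is unusually high.

    windows: list of token lists, one per time window, oldest first
    (at least two windows are needed). A word is treated as bursty when
    its latest count exceeds the mean of its earlier counts by more than
    `threshold` standard deviations. This z-score rule is an assumption
    made for illustration, not a published burst model.
    """
    counts = [Counter(tokens) for tokens in windows]
    history, current = counts[:-1], counts[-1]
    bursts = {}
    for word, now in current.items():
        if now < min_count:
            continue  # ignore words that are rare even in the latest window
        past = [c[word] for c in history]  # Counter returns 0 if absent
        mu, sigma = mean(past), pstdev(past)
        # Floor sigma at 1 so a flat history does not inflate the score.
        score = (now - mu) / max(sigma, 1.0)
        if score > threshold:
            bursts[word] = score
    return bursts


# Toy stream: three time windows of already-tokenised posts.
windows = [
    "coffee morning traffic coffee".split(),
    "coffee lunch traffic weather".split(),
    "coffee earthquake earthquake earthquake tremor earthquake coffee".split(),
]
print(bursty_words(windows))  # {'earthquake': 4.0}
```

On this toy stream only "earthquake" is flagged. A rule of this kind says nothing about which posts or users matter for the event, which is the gap discussed above.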

Proposed System 

In this paper, we propose the TD-HITS method, which can automatically detect the number of topics and identify key posts from a large number of noisy posts. Based on the TD-HITS model, we further propose the TS-LDA model, which is a document-pivot model. In the proposed TD-HITS model, noisy posts and ordinary users are effectively removed from the selection, and with the proposed TS-LDA model, there is no need to set the number of topics manually in advance, as it effectively detects hot events and identifies influential spreaders. Finally, our proposed methods exhibit better efficiency and accuracy in event detection and the identification of influential spreaders by addressing the above-noted drawbacks of existing methods.

The TD-HITS method has two modules. First, the HITS algorithm is used to create a smaller, high-quality training data set by extracting high-quality posts and influential users from the large pool of posts and users. Second, a topic-decision method is used to automatically detect the number of topics and to discover key posts from a large number of posts.
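The first module can be illustrated with a small sketch of HITS on a user-post graph. This text does not specify how the graph is built, so the sketch assumes that an edge means a user published or reposted a post, that users play the hub role, and that posts play the authority role. The names hits_user_post and top_k are hypothetical helpers, not part of TD-HITS.

```python
from collections import defaultdict
from math import sqrt


def hits_user_post(edges, iterations=50):
    """Run HITS on a bipartite user -> post graph.

    edges: iterable of (user, post) pairs, meaning the user published or
    reposted the post (an assumed graph construction).
    Returns (hub, auth): user scores and post scores, each L2-normalised.
    """
    posts_of = defaultdict(set)  # user -> posts the user links to
    users_of = defaultdict(set)  # post -> users linking to it
    for user, post in edges:
        posts_of[user].add(post)
        users_of[post].add(user)

    hub = {u: 1.0 for u in posts_of}
    auth = {p: 1.0 for p in users_of}

    for _ in range(iterations):
        # Authority update: a post scores high if strong users link to it.
        auth = {p: sum(hub[u] for u in users) for p, users in users_of.items()}
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub update: a user scores high if they link to strong posts.
        hub = {u: sum(auth[p] for p in posts) for u, posts in posts_of.items()}
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {u: v / norm for u, v in hub.items()}
    return hub, auth


def top_k(scores, k):
    """Return the k highest-scoring keys, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: three users share popular posts; "dave" and "p4" are isolated.
edges = [
    ("alice", "p1"), ("alice", "p2"), ("alice", "p3"),
    ("bob", "p1"), ("bob", "p2"),
    ("carol", "p1"),
    ("dave", "p4"),
]
hub, auth = hits_user_post(edges)
key_posts = top_k(auth, 2)  # ['p1', 'p2']
key_users = top_k(hub, 2)   # ['alice', 'bob']
```

Keeping only the top-ranked posts and users yields the kind of smaller training set described above. In this toy graph the isolated pair ("dave", "p4") receives a score close to zero and would be filtered out. The cutoff k is a free parameter in this sketch.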


In this paper, we proposed the HITS-based topic-decision method, TD-HITS. This approach creates a smaller, high-quality training data set by filtering high-quality posts and high-quality users from a collection of users and posts. It largely reduces the impact of unrelated posts and occasional users, thereby improving the efficiency and accuracy of the event detection process. Moreover, it can automatically detect the correct number of topics and identify event-related key posts to achieve higher precision. In addition, we proposed an LDA-based three-step model, TS-LDA, which detects critical events by analyzing the number of topics and identifying the influential spreaders linked to them. This approach utilizes both post and user information, which enables a timely and accurate understanding of the users involved in these critical incidents.

