Automatic Segmentation of Dynamic Network Sequences with Node Labels
Abstract
Given a sequence of snapshots of flu propagating over a population network, can we find a segmentation when the patterns of the disease spread change, possibly due to interventions? In this paper, we study the problem of segmenting graph sequences with labeled nodes. Memes on the Twitter network, diseases over a contact network, movie-cascades over a social network, etc. are all graph sequences with labeled nodes. Most related work on this subject is on plain graphs and hence ignores the label dynamics. Others require fix parameters or feature engineering. We propose SNAPNETS, to automatically find segmentations of such graph sequences, with different characteristics of nodes of each label in adjacent segments. It satisfies all the desired properties (being parameter free, comprehensive and scalable) by leveraging a principled, multi-level, flexible framework which maps the problem to a path optimization problem over a weighted DAG. Also, we develop the parallel framework of SNAPNETS which speeds up its running time. Finally, we propose an extension of SNAPNETS to handle the dynamic graph structures and use it to detect anomalies (and events) in network sequences. Extensive experiments on several diverse real datasets show that it finds cut points matching ground-truth or meaningful external signals and detects anomalies outperforming non-trivial baselines. We also show that the segmentations are easily interpretable, and that SNAPNETS scales near-linearly with the size of the input. Finally, we show how to use SNAPNETS to detect anomaly in a sequence of dynamic networks.
Proposed System
In this paper, we study the problemof segmenting a graph sequence with varying node-label distributions. First, we assume binary labels and fixed structure for graphs.Next, we propose an extension to our algorithm to handle dynamic graphs with varying nodes and edges then leverage it to detect anomalies and important events. For diseases/memes, the labels can be ‘infected’/‘active’ (1) & ‘healthy’/‘inactive’ (0), and the network can be the underlying contact-network. SNAPNETS is a novel multi-level approach which summarizes the given networks/labels in a very general way at multiple different time-granularities, and then converts the problem into an appropriate optimization problem on a data structure. Further it gives naturally interpretable segments, enhancing its applicability. Also, we easily extend SNAPNETS to segment dynamic graph sequences which can be used to detect anomalies and important events. Finally, we demonstrate SNAPNETS’s usefulness via multiple experiments for segmentation and anomaly detection on diverse real-datasets.
CONCLUSIONS
We presented SNAPNETS, an intuitive and effective method to segment AS-Sequences with binary node labels and extended it to work with dynamic graph sequences as well. It is the first method to satisfy all the desired properties P1, P2, P3. It efficiently finds high-quality segmentations, detects anomalies and events and gives useful insights in diverse complex datasets. Also, we propose ANOMALYSNAPNETS as an expansion of SNAPNETS to detect anomalies and important events in dynamic graph sequences. Finally we parallelize SNAPNETS and ANOMALY-SNAPNETS to accelerate the computations. SNAPNETS is a ‘global’ method and not simply a change-point detection method. We are not just looking for local changes; rather we track the ‘total variation’ using Gs. The SNAPNETS framework is very flexible, as our formulations are very general
REFERENCES
[1] L. Li, J. McCann, N. S. Pollard, and C. Faloutsos, “Dynammo: Mining and summarization of coevolving sequences with missing values,” in Proc. 15th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2009, pp. 507–516.
[2] A. Likas, N. Vlassis, and J. J. Verbeek, “The global k-means clustering algorithm,” Pattern Recog., vol. 36, no. 2, pp. 451–461, 2003.
[3] K. Henderson, et al., “Metric forensics: A multi-level approach for mining volatile graphs,” in Proc. 16th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2010, pp. 163–172.
[4] N. Shah, D. Koutra, T. Zou, B. Gallagher, and C. Faloutsos, “TimeCrunch: Interpretable dynamic graph summarization,” in Proc. 21th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2015, pp. 1055–1064.
[5] S. Shih, “Vog: Summarizing and understanding large graphs,” in Proc. SIAM Int. Conf. Data Mining, 2014, pp. 91–99.
[6] J. Ferlez, C. Faloutsos, J. Leskovec, D. Mladenic, and M. Grobelnik, “Monitoring network evolution using MDL,” in Proc. IEEE 24th Int. Conf. Data Eng., 2008, pp. 1328–1330.
[7] Q. Qu, S. Liu, C. S. Jensen, F. Zhu, and C. Faloutsos, “Interestingness-driven diffusion process summarization in dynamic networks,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Berlin, Germany: Springer, 2014, pp. 597–613.
[8] R. M. Anderson, R. M. May, and B. Anderson, Infectious Diseases of Humans: Dynamics and Control. Hoboken, NJ, USA: Wiley Online Library, 1992.
[9] D. Kempe, J. Kleinberg, and _E. Tardos, “Maximizing the spread of influence through a social network,” in Proc. 9th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2003, pp. 137–146.
[10] M. Purohit, B. A. Prakash, C. Kang, Y. Zhang, and V. Subrahmanian, “Fast influence-based coarsening for large networks,” in Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2014, pp. 1296–1305.
[11] K. Henderson, et al., “Metric forensics: A multi-level approach for mining volatile graphs,” in Proc. 16th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2010, pp. 163–172.
[12] M. Mathioudakis, F. Bonchi, C. Castillo, A. Gionis, and A. Ukkonen, “Sparsification of influence networks,” in Proc. 17th ACM SIGKDD Int. Conf. Knowl. Discovery DataMining, 2011, pp. 529–537.
[13] B. A. Prakash, D. Chakrabarti, N. C. Valler, M. Faloutsos, and C. Faloutsos, “Threshold conditions for arbitrary cascade models on arbitrary networks,” Knowl. Inform. Syst., vol. 33, no. 3, pp. 549–575, 2012.
[14] G. Li, M. Semerci, B. Yener, and M. J. Zaki, “Effective graph classification based on topological and label attributes,” Statistical Anal. Data Mining, vol. 5, no. 4, pp. 265–283, 2012.
[15] D. Salvi, J. Zhou, J. Waggoner, and S. Wang, “Handwritten text segmentation using average longest path algorithm,” in Proc. IEEE Workshop Appl. Comput. Vis., 2013, pp. 505–512.