Online Internet Traffic Monitoring System Using Spark Streaming

 

Abstract

Owing to the explosive growth of Internet traffic, network operators must be able to monitor the state of the entire network and manage their network resources efficiently. Traditional network analysis methods, which usually run on a single machine, are no longer suitable for huge volumes of traffic data because of their limited processing capacity. Big data frameworks such as Hadoop and Spark can handle such analysis jobs even for large amounts of network traffic. However, Hadoop and Spark are inherently designed for offline data analysis. To cope with streaming data, various stream-processing frameworks have been proposed, such as Storm, Flink, and Spark Streaming. In this study, we propose an online Internet traffic monitoring system based on Spark Streaming. The system comprises three parts: the collector, the messaging system, and the stream processor. We use TCP performance monitoring as a use case to show how network monitoring can be performed with the proposed system. We conducted experiments on a cluster in standalone mode, which showed that the system performs well for large-scale Internet traffic measurement and monitoring.

Existing System 

Cyberspace is dynamic and vulnerable to attacks, so network providers must monitor the status of their networks in real time. Online Internet traffic monitoring technologies have been studied extensively. In 1999, Paxson [16] proposed the Bro system to detect network intruders in real time. Bro first captures a packet stream using libpcap and then reduces the incoming stream to a series of higher-level events using an event engine. The system also introduced a custom scripting language, Bro scripts, which are executed by a policy script interpreter to handle events. Although Bro is single-threaded, it can be set up in a high-throughput cluster environment. Similar systems include Snort and Suricata, which are inherently based on single-machine computing.

Various related studies have been conducted on online Internet traffic measurement and monitoring using Spark. Gupta et al. used Spark Streaming to analyze the network in real time.

Proposed System 

In this study, we propose an online Internet traffic monitoring system based on Spark Streaming, a big data platform that can process huge amounts of traffic data efficiently, so that the network status can be monitored in real time, and that is robust enough to tolerate a failure without aborting the entire monitoring process. Big data platforms such as Hadoop and Spark provide an efficient way of processing huge amounts of data. For example, the MapReduce model and its open-source implementation, Hadoop, have been widely adopted by the big data analytics community because of their simplicity and ease of programming. Our contributions in this study are as follows:

- We propose a distributed architecture for an online Internet traffic measurement and monitoring system.
- We implement a parallel algorithm that monitors TCP performance parameters, such as delay and retransmission ratio, with very low latency.
- We conduct experiments showing that the proposed system is feasible and efficient.
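To make the idea of monitoring a TCP retransmission ratio over a stream more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a hypothetical `Packet` record, treats a data segment whose sequence number was already seen on the same flow as a retransmission, and imitates micro-batching by bucketing packets into fixed time windows. The names `Packet`, `FlowState`, `micro_batches`, and `process_batch` are illustrative only and do not come from the paper or from Spark.

```python
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    ts: float          # capture timestamp in seconds
    src: str           # sender endpoint, "ip:port"
    dst: str           # receiver endpoint, "ip:port"
    seq: int           # TCP sequence number of the segment
    payload_len: int   # TCP payload size in bytes (0 for a pure ACK)


FlowKey = Tuple[str, str]  # (src, dst): one direction of a connection


class FlowState:
    """Per-flow counters carried from one batch to the next."""

    def __init__(self) -> None:
        self.seen_seqs = set()
        self.data_segments = 0
        self.retransmissions = 0


def micro_batches(packets: List[Packet], interval: float) -> List[List[Packet]]:
    """Bucket packets into fixed-length time windows, oldest window first."""
    buckets: Dict[int, List[Packet]] = {}
    for p in packets:
        buckets.setdefault(int(p.ts // interval), []).append(p)
    return [buckets[k] for k in sorted(buckets)]


def process_batch(state: Dict[FlowKey, FlowState],
                  batch: List[Packet]) -> Dict[FlowKey, float]:
    """Update per-flow state and return the cumulative retransmission
    ratio of every flow that sent data in this batch."""
    touched = set()
    for p in batch:
        if p.payload_len == 0:
            continue  # pure ACKs carry no data and are ignored here
        key = (p.src, p.dst)
        flow = state.setdefault(key, FlowState())
        flow.data_segments += 1
        if p.seq in flow.seen_seqs:
            flow.retransmissions += 1  # assumption: repeated seq = retransmission
        else:
            flow.seen_seqs.add(p.seq)
        touched.add(key)
    return {k: state[k].retransmissions / state[k].data_segments for k in touched}


# Usage with a handful of made-up packets and a 1-second batch interval.
packets = [
    Packet(0.10, "10.0.0.1:40000", "10.0.0.2:80", 1000, 100),
    Packet(0.20, "10.0.0.1:40000", "10.0.0.2:80", 1100, 100),
    Packet(0.30, "10.0.0.1:40000", "10.0.0.2:80", 1000, 100),  # repeated seq
    Packet(0.40, "10.0.0.2:80", "10.0.0.1:40000", 5000, 0),    # pure ACK
    Packet(1.20, "10.0.0.1:40000", "10.0.0.2:80", 1200, 100),
]
state: Dict[FlowKey, FlowState] = {}
for batch in micro_batches(packets, interval=1.0):
    for flow_key, ratio in process_batch(state, batch).items():
        print(flow_key, round(ratio, 3))
```

In an actual Spark Streaming job, the per-flow state would more likely be kept with a stateful DStream operation such as `updateStateByKey`, and the packets would arrive through the messaging system rather than from an in-memory list. The sketch also ignores sequence-number wraparound and partial-segment overlap, which a production detector would need to handle.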

 

Conclusion  

With the growth of Internet traffic, traditional network analysis methods that run on a single machine are no longer suitable. Existing approaches take advantage of big data frameworks to improve processing efficiency. However, these approaches mainly focus on offline data analysis. In this study, we proposed an online Internet traffic monitoring system that uses Spark Streaming. We demonstrated that Internet measurement and monitoring can be treated as a stream analysis problem and handled by a stream processing platform. Extensive experimental results show that our system achieves good performance and robustness. In the future, we will implement collectors that capture packets from switches through port mirroring, so that our system can analyze all the traffic passing through the monitored networks. Finally, we will test its performance in practice and compare it with traditional single-server systems in terms of scalability and reliability.

References 

[1] Cisco Visual Networking Index, Forecast and methodology, 2016-2021, White Paper, San Jose, CA, USA: Cisco, 2016.

[2] Y. Lee, W. Kang, and H. Son, An Internet traffic analysis method with MapReduce, in Proc. 2010 IEEE/IFIP Network Operations and Management Symposium Workshops (NOMS Wksps), Osaka, Japan, 2010, pp. 357–361.

[3] D. Brauckhoff, B. Tellenbach, A. Wagner, M. May, and A. Lakhina, Impact of packet sampling on anomaly detection metrics, in Proc. 6th ACM SIGCOMM Conf. Internet Measurement, Rio de Janeiro, Brazil, 2006, pp. 159–164.

[4] Y. Y. Qiao, Z. M. Lei, L. Yuan, and M. J. Guo, Offline traffic analysis system based on Hadoop, J. China Univ. Posts Telecommun., vol. 20, no. 5, pp. 97–103, 2013.

[5] Hadoop, http://hadoop.apache.org/, 2017.

[6] K. Kambatla, G. Kollias, V. Kumar, and A. Grama, Trends in big data analytics, J. Parallel Distrib. Comput., vol. 74, no. 7, pp. 2561–2573, 2014.

[7] Apache Spark, http://spark.apache.org/, 2017.

[8] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, Spark: Cluster computing with working sets, in Proc. 2nd USENIX Conf. Hot Topics in Cloud Computing, Boston, MA, USA, 2010, p. 10.

[9] J. Liu, F. Liu, and N. Ansari, Monitoring and analyzing big traffic data of a large-scale cellular network with Hadoop, IEEE Netw., vol. 28, no. 4, pp. 32–39, 2014.

[10] Y. Lee and Y. Lee, Toward scalable internet traffic measurement and analysis with Hadoop, ACM SIGCOMM Comput. Commun. Rev., vol. 43, no. 1, pp. 5–13, 2013.