Algorithm Optimization of Anomaly Detection Based on Data Mining

 

Abstract

 

This paper first introduces two improved algorithms based on LOF, namely INFLOF and COF, and describes the motivation, definition, and specific steps of each. Summarizing LOF, INFLOF, and COF then reveals the intrinsic link between them. INFLOF solves the misjudgment of edge points that arises when clusters of different densities lie close to each other in the data set, while COF addresses outliers that lie near low-density patterns. The two algorithms modify different steps of the outlier-factor computation: INFLOF changes the neighborhood over which densities are compared, whereas COF changes how distances within the neighborhood are measured. Finally, the advantages of the two algorithms are combined into the algorithm proposed in this paper, whose definition and specific steps are given together with an analysis of its time complexity.
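COF is only named above, not defined. As an illustration of what "changing how distances are measured" means, the sketch below follows the connectivity-based outlier factor as it is commonly defined in the literature: for each point p, a set-based nearest path is grown greedily through p and its k nearest neighbors, the edge lengths of that path are combined into a weighted average chaining distance, and COF(p) is the ratio of p's average chaining distance to the mean of its neighbors'. This is a minimal brute-force sketch, not the implementation used in this paper; the names `knn` and `cof_scores` are hypothetical, exactly k neighbors are used with ties broken by index, 1 <= k < n is assumed, and no point is assumed to have k or more exact duplicates.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn(data: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbors of data[i], excluding i (ties broken by index)."""
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: (math.dist(data[i], data[j]), j))
    return others[:k]


def cof_scores(data: Sequence[Point], k: int) -> List[float]:
    """Connectivity-based outlier factor of every point (hypothetical helper, brute force)."""
    n = len(data)
    neigh = [knn(data, i, k) for i in range(n)]

    def ac_dist(p: int) -> float:
        # Grow a set-based nearest path greedily from p through its k nearest neighbors.
        path = [p]
        remaining = list(neigh[p])
        costs = []
        while remaining:
            # Gap from each remaining point to the closest point already on the path.
            gaps = {o: min(math.dist(data[o], data[q]) for q in path) for o in remaining}
            nxt = min(remaining, key=lambda o: (gaps[o], o))
            costs.append(gaps[nxt])
            path.append(nxt)
            remaining.remove(nxt)
        r = len(path)  # r = k + 1 points on the path
        # Average chaining distance: earlier edges get larger weights; the weights sum to 1.
        return sum(2 * (r - i) / (r * (r - 1)) * e for i, e in enumerate(costs, start=1))

    ac = [ac_dist(p) for p in range(n)]
    # COF(p) = k * ac-dist(p) / sum of ac-dist over p's k nearest neighbors.
    return [k * ac[p] / sum(ac[o] for o in neigh[p]) for p in range(n)]
```

For example, on the four corners of a unit square plus its center and one far point (10, 10), with k = 3, the far point receives by far the largest COF, while the square's points score close to 1.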

 

Existing System

 

At present, data mining plays an irreplaceable role in all aspects of social life. Traditional data mining focuses on patterns shared by most of the data, such as frequent patterns and association rules, classification and description, and clustering analysis, whereas outlier detection finds the relatively sparse and isolated abnormal patterns in massive data. Since LOF [2] was proposed, many scholars have put forward improved algorithms, which fall into two categories: those that improve the efficiency of outlier detection and those that improve its accuracy on complex data distributions. Algorithms in the first category mainly use clustering or partitioning [1] to remove clusters or regions that cannot contain outliers, thereby reducing the amount of data to be examined.
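For reference, the baseline LOF score that these improved algorithms build on can be sketched as follows. It implements the standard chain of definitions (k-distance, reachability distance, local reachability density, and LOF as the mean ratio of the neighbors' densities to the point's own) in pure Python; it is an illustration, not the implementation used in this paper. The names `knn` and `lof_scores` are hypothetical, exactly k neighbors are used with ties broken by index (the original definition admits tied points into the k-distance neighborhood), and no point is assumed to have k or more exact duplicates. The brute-force neighbor search costs O(n² log n), which is the cost the efficiency-oriented improvements above try to reduce.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn(data: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbors of data[i], excluding i (ties broken by index)."""
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: (math.dist(data[i], data[j]), j))
    return others[:k]


def lof_scores(data: Sequence[Point], k: int) -> List[float]:
    """Local outlier factor of every point (hypothetical helper, brute-force neighbor search)."""
    n = len(data)
    neigh = [knn(data, i, k) for i in range(n)]
    # k-distance(p): distance from p to its k-th nearest neighbor.
    kdist = [math.dist(data[i], data[neigh[i][-1]]) for i in range(n)]

    def reach_dist(p: int, o: int) -> float:
        # reach-dist_k(p, o) = max(k-distance(o), d(p, o))
        return max(kdist[o], math.dist(data[p], data[o]))

    # Local reachability density: inverse of the mean reachability distance to the neighbors.
    lrd = [k / sum(reach_dist(p, o) for o in neigh[p]) for p in range(n)]
    # LOF(p): mean ratio of the neighbors' lrd to p's own lrd (about 1 inside a cluster).
    return [sum(lrd[o] for o in neigh[p]) / (k * lrd[p]) for p in range(n)]
```

A score near 1 means a point is about as dense as its neighbors; scores well above 1 mark candidate outliers.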

 

Proposed System 

 

In this paper, we first introduce the two main LOF-based algorithms, namely INFLOF and COF, and then, starting from the shortcomings of these two algorithms, focus on the proposed improved algorithm and analyze its time complexity; the next chapter verifies the effectiveness of the proposed algorithm experimentally. This paper addresses the second problem, namely how to improve the accuracy of outlier detection by improving the definition of the outlier factor, so that outlying data points receive higher outlier factors. Wen et al. proposed an outlier factor based on the symmetric neighborhood relationship. INFLOF (Influenced Local Outlier Factor) defines the outlier factor of a point over its symmetric neighborhood, that is, its k nearest neighbors together with its reverse k nearest neighbors (the points that count it among their own k nearest neighbors). The higher a point's INFLO value, the more likely it is to be an outlier.
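The symmetric-neighborhood idea can be sketched as below. Following the usual INFLO definition, density is estimated as den(p) = 1 / k-distance(p), the influence space IS_k(p) is the union of p's k nearest neighbors and its reverse k nearest neighbors, and the score is the mean density over IS_k(p) divided by den(p). This is an illustrative pure-Python sketch, not this paper's code; `knn` and `inflo_scores` are hypothetical names, ties are broken by index, and no point is assumed to have k or more exact duplicates (which would make a k-distance zero).

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def knn(data: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbors of data[i], excluding i (ties broken by index)."""
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: (math.dist(data[i], data[j]), j))
    return others[:k]


def inflo_scores(data: Sequence[Point], k: int) -> List[float]:
    """INFLO-style score over the symmetric neighborhood kNN(p) | RNN(p) (hypothetical helper)."""
    n = len(data)
    neigh = [knn(data, i, k) for i in range(n)]
    # Density estimate: den(p) = 1 / k-distance(p).
    den = [1.0 / math.dist(data[i], data[neigh[i][-1]]) for i in range(n)]
    # Reverse k nearest neighbors: q belongs to RNN(p) whenever p is in kNN(q).
    rnn: List[Set[int]] = [set() for _ in range(n)]
    for q in range(n):
        for p in neigh[q]:
            rnn[p].add(q)
    scores = []
    for p in range(n):
        influence_space = set(neigh[p]) | rnn[p]  # symmetric neighborhood IS_k(p)
        mean_den = sum(den[o] for o in influence_space) / len(influence_space)
        scores.append(mean_den / den[p])  # > 1 means p is sparser than its influence space
    return scores
```

Including reverse neighbors is what helps edge points: a point on the border of a sparse cluster next to a dense one is usually listed as a neighbor by points of its own sparse cluster, so their lower densities enter its reference set instead of only the dense cluster's densities.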

 

CONCLUSION 

 

This paper proposes an improved density-based local outlier detection algorithm. An in-depth analysis of two improved density-based outlier detection algorithms, INFLOF and COF, reveals their shortcomings; by integrating the advantages of both, we propose an improved algorithm, give its definition and specific steps, and analyze its time complexity.

 

REFERENCES 

 

[1] Sun Huanliang, Bao Yubin, Yu Ge, et al. An algorithm based on partition for outlier detection [J]. Journal of Software, 2006, 17(5): 1009-1016.

[2] Breunig M, Kriegel H, Ng R, et al. LOF: Identifying density-based local outliers [C]//Proc. SIGMOD Conf. ACM, 2000: 93-104.

[3] Knorr E M, Ng R T, Tucakov V. Distance-based outliers: Algorithms and applications [J]. The VLDB Journal, 2000, 8(3-4): 237-253.

[4] Ester M, Kriegel H, Sander J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise [C]//Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining. 1996: 226-231.