**OUTLIERS DISCOVERY FROM SMART METERS DATA USING A STATISTICAL BASED DATA MINING APPROACH**

*ABSTRACT*

**The paper presents a statistical approach for detecting outliers in load curves recorded at the electric substations of distribution networks. The load curves provided by smart meters were processed and their main indicators were calculated. After the outliers were eliminated, the remaining data led to the discovery of accurate patterns that characterize the load curves well through these indicators. The proposed approach was tested on a real database of 60 substations from a rural area. With the help of these patterns, the operation and planning of electric distribution systems could be made more efficient.**

**EXISTING SYSTEM:**

The analysis of data is the starting point for many applications, whether in the design or operation phase, for online control or for complex processes. Nowadays, it has become increasingly necessary to extend the capabilities of human analysis to handle the overwhelming quantity of data that we are able to collect. Since computers have made it possible to store large amounts of data, it is only natural to resort to computational techniques to help us discover meaningful patterns and structures in massive volumes of data.

The load curve plays a fundamental role in the operation and planning of power systems. Unfortunately, due to various random factors, load curves always contain abnormal, deviating, unrepresentative, noisy, strange, anomalous, and missing data. It is fundamental for power system operators to detect and repair anomalous or abnormal data before the load curve is used in planning and modelling processes. To extract useful load curve information (load factor, loss duration, loss factor, fill factor, etc.) from large databases, clustering or statistical techniques can be used.
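As an illustration of the indicators mentioned above, the following Python sketch computes a few of them from equally spaced power samples. It assumes common textbook definitions (load factor as average over peak load, loss factor as mean squared load over peak squared, and the corresponding durations as those factors multiplied by the observation period); the paper's exact formulas and naming may differ. The function name `load_curve_indicators` and its parameters are hypothetical.

```python
from statistics import mean


def load_curve_indicators(loads_kw, step_hours=1.0):
    """Compute a few load-curve indicators from equally spaced power samples.

    Assumed textbook definitions, not necessarily those used in the paper.
    """
    if not loads_kw or max(loads_kw) <= 0:
        raise ValueError("need at least one positive load sample")

    p_max = max(loads_kw)
    p_avg = mean(loads_kw)
    period_hours = len(loads_kw) * step_hours

    # Load factor: average load divided by peak load.
    load_factor = p_avg / p_max
    # Loss factor: mean of the squared load divided by the squared peak.
    loss_factor = mean([p * p for p in loads_kw]) / (p_max ** 2)

    return {
        "load_factor": load_factor,
        # Hours at peak load that would deliver the same energy.
        "max_load_duration_h": load_factor * period_hours,
        "loss_factor": loss_factor,
        # Hours at peak load that would produce the same losses.
        "loss_duration_h": loss_factor * period_hours,
        # Ratio of minimum to peak load (assumed meaning of irregularity).
        "irregularity_factor": min(loads_kw) / p_max,
    }


if __name__ == "__main__":
    # Made-up four-hour load curve, for illustration only.
    print(load_curve_indicators([2.0, 4.0, 4.0, 2.0]))
```

Each substation's load curve would yield one such set of indicators, which can then be compared across substations.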

**PROPOSED SYSTEM:**

The paper proposes a two-stage approach for the detection of outliers. In the first stage, a data mining technique is used to extract the main indicators of the load curves, computed from information provided by smart meters. The second stage consists of detecting outliers among the computed indicators using statistical processing. The proposed approach was tested on a real database of 60 substations serving a rural area. The results highlight that the proposed approach can be used efficiently by distribution operators in decision making. Thus, the characteristic information obtained from smart meter load curves with the help of data mining techniques can be highly useful in the operation and planning of distribution systems.
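The second stage can be sketched in Python as follows. The text does not specify which statistical test is used, so this example swaps in the interquartile range (Tukey fence) rule as an assumption; it is a minimal illustration, not the paper's procedure. The function name `iqr_outliers`, the multiplier `k`, and the sample values are all hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return indices of values outside the Tukey fences.

    Assumption: the IQR rule stands in for the unspecified
    statistical processing described in the text.
    """
    # Quartiles of the indicator across all substations.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


if __name__ == "__main__":
    # Made-up load factors for eight substations, for illustration only.
    load_factors = [0.50, 0.52, 0.48, 0.51, 0.49, 0.95, 0.50, 0.53]
    print(iqr_outliers(load_factors))
```

In this sketch, the rule would be applied to one indicator at a time across the substations, and the flagged substations would be set aside before patterns are extracted from the remaining data.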

**CONCLUSIONS**

The paper presents a comprehensive method that uses statistical based data mining to characterize load curves by detecting outliers, using information provided by smart meters in real distribution networks. A database of 60 rural substations was tested. The results demonstrate that, once outliers are eliminated, the proposed approach can be used efficiently by distribution operators to discover accurate patterns of the load curve characteristics, with the help of which the operation and planning of power systems could be made more efficient. The load curve characteristic information extracted from a large smart meter database by data mining techniques is useful both for distribution operators and for users. From the analysis of the results, it can be observed that some characteristics have a more important influence on the detection of outliers using the statistical approach, i.e. maximum load duration (ML), load factor (T), loss duration (LD), loss factor (LS), irregularity factor (I), and fill factor (K). Thus, planning the development of distribution substations knowing only a few indicators of the load curves could simplify the work of distribution operators.
