Privacy preserving big data mining: association rule hiding using fuzzy logic approach
Abstract
Privacy preserving data mining has recently been studied widely. Association rule mining can pose a threat to data privacy, so association rule hiding techniques are employed to avoid the leakage of sensitive knowledge. Much research has been done on association rule hiding, but most of it focuses on proposing algorithms with the least side effects for static databases (with no new data entering), whereas the authors are now confronted with streaming data, which arrive continuously. Furthermore, in the age of big data, existing methods must be optimised so that they can be executed on large volumes of data. In this study, data anonymisation is used to fit the proposed model to big data mining. In addition, special features of big data such as velocity make it necessary to treat each rule as a sensitive association rule with an appropriate membership degree. Finally, parallelisation techniques embedded in the proposed model can help to speed up the data mining process.
Existing System
By definition, big data refers to a high volume of structured, semi-structured and unstructured data arriving at high velocity, which can be mined for information. Big data mining refers to the capability of extracting information from massive datasets whose specific features prevent the use of existing data mining techniques. In many situations it is infeasible to store this huge amount of data, so knowledge extraction must be done in real time. Processing big data needs a cluster of computers with high computing performance, and such a framework becomes practical with parallel programming models such as MapReduce. Owing to the novel features of big data, such as high volume and variety in data structures, the techniques mentioned above need essential updates to satisfy the related requirements. In this model, the generalisation technique is used for anonymity, since the suppression technique is not suitable for quantitative data and the randomisation technique imposes significant overhead on systems.
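As an illustration of generalisation applied to quantitative data, the following minimal sketch replaces each numeric value with the interval that contains it. The function name `generalise`, the interval width and the sample ages are assumptions made for this example only; they are not taken from the proposed model.

```python
def generalise(value, width=10):
    """Replace a non-negative integer with the half-open interval
    [low, high) of the given width that contains it."""
    low = (value // width) * width
    return (low, low + width)


# Hypothetical quantitative attribute (ages of four individuals).
ages = [23, 27, 35, 41]
generalised = [generalise(age) for age in ages]
# 23 and 27 both fall in (20, 30), so they can no longer be told apart.
```

A wider interval hides more detail at the cost of more information loss, which is the trade-off any generalisation scheme has to balance.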
Proposed System
In this research, sensitive association rules in big data mining are hidden by anonymisation methods rather than by removing repeated instances of the sensitive rules. Besides hiding sensitive information, this avoids the undesired side effect that deleting frequent item-sets (ISs) has on newly arriving data. To make the approach suitable for big data analysis, parallelisation and scalability are also considered. The sensitivity degree of each association rule is determined using appropriate membership functions, and anonymisation is performed according to this degree. Although much research has been done on association rule hiding, most existing approaches have significant drawbacks:
• A Boolean logic (rather than fuzzy logic) approach to deciding whether an association rule is sensitive.
• An undesired side effect of hiding sensitive association rules on non-sensitive rules.
• Inapplicability to big data analysis.
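The contrast between a Boolean and a fuzzy decision on sensitivity can be sketched as follows. The text does not specify the membership functions, so the linear ramp used here, the sensitivity scores, the thresholds `low` and `high`, and the mapping from degree to generalisation level are all assumptions chosen for illustration.

```python
import math


def ramp_membership(score, low, high):
    """Hypothetical membership function: 0 at or below `low`,
    1 at or above `high`, and linear in between."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if score <= low:
        return 0.0
    if score >= high:
        return 1.0
    return (score - low) / (high - low)


def generalisation_level(degree, max_level=3):
    """Assumed mapping: a higher sensitivity degree leads to a
    coarser generalisation level (0 means no anonymisation)."""
    return math.ceil(degree * max_level)


# Hypothetical sensitivity scores for three rules.
scores = {"rule A": 0.40, "rule B": 0.75, "rule C": 1.00}
levels = {
    name: generalisation_level(ramp_membership(score, low=0.5, high=1.0))
    for name, score in scores.items()
}
# levels == {"rule A": 0, "rule B": 2, "rule C": 3}
```

Under a Boolean approach, rule B would be treated either exactly like rule A or exactly like rule C; with a membership degree it receives an intermediate level of anonymisation.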
Conclusion
Association rule mining, besides its benefits in discovering hidden relationships in data, can result in privacy violations. Association rule hiding helps to prevent sensitive association rules from being discovered. Many different techniques have been proposed to hide certain association rules, but most of them select ISs for removal in order to push the confidence value below a defined threshold. None of the existing approaches can be executed in a parallel and scalable manner, as big data mining requires. Moreover, removing ISs from the database can cause serious information loss as new data streams arrive. In this research, a new association rule hiding technique for big data is presented. It uses a fuzzy logic approach and tries to decrease the undesired side effect of sensitive rule hiding on non-sensitive rules in data streams. Parallelism and scalability are embedded in the proposed model so that it can be implemented for huge volumes of data. Results show that the proposed model can be more effective in big data mining than existing rule hiding approaches. As future work, we will try to reduce the undesired side effects of the proposed model further, so that less information is lost.
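The deletion-based hiding that the proposed model seeks to avoid can be illustrated with the standard definitions of support and confidence. The sketch below is not the proposed technique; the transactions and item names are made up for the example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    s_antecedent = support(transactions, antecedent)
    if s_antecedent == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / s_antecedent


# Hypothetical database: the rule {bread} -> {milk} has confidence 2/3.
before = [{"bread", "milk"}, {"bread", "milk"}, {"bread"}, {"milk"}]

# Deleting "milk" from one supporting transaction lowers the confidence to 1/3,
# which hides the rule if the mining threshold is, say, 0.5.
after = [{"bread", "milk"}, {"bread"}, {"bread"}, {"milk"}]
```

The deleted item is lost to every later analysis of the data, which is the information loss that motivates hiding rules through anonymisation instead.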