Fuzzy-folded Bloom Filter-as-a-Service for Big Data Storage in the Cloud

Abstract:

As smart, Internet-connected objects are deployed across a broad range of applications, the amount of data moving between geographical regions increases correspondingly. This poses a number of challenges for big data storage across multiple locations, including cloud computing platforms. For example, the underlying distributed file system organizes a large number of directories and files as gigantic trees, which are difficult to parse in polynomial time. Moreover, with the exponential increase of (big) data streams (i.e., unbounded sets of continuous data flows), the challenges associated with indexing and membership queries are compounded. The capability to process such significant amounts of data with high accuracy can have a significant impact on decision-making and on the formulation of business and risk-related strategies, particularly in the current Industrial Internet of Things (IIoT) environment. However, existing storage solutions are deterministic in nature; in other words, they tend to consume considerable memory and CPU time to yield accurate results. This necessitates the design of efficient quality of service (QoS)-aware IIoT applications that can deal with the challenges of data storage and retrieval in the cloud computing environment. In this paper, we present a space-efficient strategy for massive data storage using the Bloom filter (BF). Specifically, in the proposed scheme, the standard BF is extended to incorporate a fuzzy-enabled folding approach, hereafter referred to as the Fuzzy Folded BF (FFBF). In FFBF, fuzzy operations are used to accommodate the hashed data of one BF into another to reduce storage requirements. Evaluations on the UCI ML AReM and Facebook datasets demonstrate the efficacy of FFBF, which handles approximately 1.9 times more data than the standard BF without affecting the false positive rate or query time.

Existing System:

A drawback of dynamic BFs is that query complexity increases as the filter grows. The initial filter size is an important factor in dynamic BFs: a small initial array may lead to computational overhead, slice-addition overhead, and query complexity overhead, whereas a large initial size may waste memory. Further, streaming applications, such as approximate caching, duplicate detection, and membership queries, require one-pass processing of data, and results are required within a stipulated time bound. To serve this purpose, the BF should be small and of constant size so that it maps well onto the cache. To accommodate new data, some existing data must then be deleted from the BF. Thus, stale data must be evicted to manage the trade-off between false positives and false negatives [21].
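The sizing trade-off can be made concrete with the well-known approximation for the false positive rate of a standard BF, (1 - e^(-kn/m))^k, where m is the number of bits and k the number of hash functions. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and n (the number of inserted elements) is a symbol introduced here for illustration.

```python
import math


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Approximate false positive rate of a standard Bloom filter.

    m: number of bits in the filter
    n: number of inserted elements (symbol assumed for this sketch)
    k: number of hash functions
    """
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> int:
    """Hash-function count that minimises the approximation: (m / n) * ln 2."""
    return max(1, round((m / n) * math.log(2)))


if __name__ == "__main__":
    m = 1000  # fixed filter size, as required by cache-friendly streaming use
    k = optimal_k(m, 100)
    # Inserting more elements into the same m bits raises the error rate.
    for n in (50, 100, 200, 400):
        print(n, false_positive_rate(m, n, k))
```

With m held constant, the rate rises steadily as n grows, which is why a fixed-size filter for streaming data must eventually evict older entries.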

Proposed System:

We propose a novel technique for compressing two BFs into one filter without losing any data. The proposed approach uses fuzzy logic to store data optimally and to use the storage capacity efficiently. Its main features are: (1) compression of two BFs into one BF using a fuzzy fold operation, whereby a larger number of elements is accommodated in a single BF of size m; (2) slow decay of data, which allows streaming data to reside in memory for a substantial amount of time; (3) efficient and optimal utilization of storage space without any loss of accuracy; (4) a significant reduction in computational cost by leveraging double hashing to compute the k hash functions; and (5) a false positive rate that is not affected by the compression operation.
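The double hashing idea mentioned above can be sketched as follows: two base hash values h1 and h2 are computed once per item, and the i-th bit position is derived as (h1 + i * h2) mod m. This is only an illustration of a plain BF with double hashing, not the FFBF fold operation. The class name is hypothetical, and deriving h1 and h2 from a single SHA-256 digest is an assumption, since the source does not name the hash functions used.

```python
import hashlib
from typing import Iterator


class DoubleHashBloomFilter:
    """Standard Bloom filter whose k positions come from two base hashes."""

    def __init__(self, m: int, k: int) -> None:
        if m <= 0 or k <= 0:
            raise ValueError("m and k must be positive")
        self.m = m  # number of bits
        self.k = k  # number of derived hash functions
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Assumption: both base hashes are taken from one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        # Force h2 to be odd so the stride is never zero.
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: bytes) -> bool:
        # A query touches at most k bits, so its cost is O(k).
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(item)
        )


if __name__ == "__main__":
    bf = DoubleHashBloomFilter(m=1024, k=5)
    bf.add(b"sensor-17")
    print(b"sensor-17" in bf)  # inserted items are always reported present
```

Only one digest is computed per item regardless of k, which is where the saving in hashing time comes from.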

Conclusion:

IIoT is likely to increasingly become the norm in our society, particularly in critical infrastructure sectors such as the Chemical Sector, the Commercial Facilities Sector, the Communications Sector, the Critical Manufacturing Sector, the Dams Sector, the Defense Industrial Base Sector, the Emergency Services Sector, the Energy Sector, the Food and Agriculture Sector, the Government Facilities Sector, and so on. IIoT also has applications in conflict and adversarial environments, such as the Industrial Internet of Military Things. Hence, there is a pressing need to address some of the existing challenges, including the one addressed in this paper. Specifically, the proposed filter uses a novel fuzzy-based technique to resolve the space requirement problem in BFs. We demonstrated that the proposed approach can accommodate a higher number of elements in the same space than the standard BF (SBF). The cost of folding and its associated operations is almost negligible because the proposed filter involves only simple fuzzy operations on binary sets. The false positive rate of the compressed representation remains the same as that of the standard BF. The computational time for hashing is also significantly reduced by the double hashing technique, which uses only two hash functions to generate the k hash functions. The query complexity of FFBF depends on the number of blocks into which the BF is divided. The cost of searching for an element is the same in an m-sized BF and in a compressed representation of the same size (i.e., O(k)). Findings from our evaluations using both the UCI ML AReM and Facebook datasets also demonstrated the efficiency of FFBF.
