Finding Top-k Dominance on Incomplete Big Data Using MapReduce Framework
Abstract:
Incomplete data is one major kind of multi-dimensional dataset that has random-distributed missing nodes in its dimensions. It is very difficult to retrieve information from this type of dataset when it becomes large. Finding top-k dominant values in this type of dataset is a challenging procedure. Some algorithms are present to enhance this process, but most are efficient only when dealing with small incomplete data. One of the algorithms that make the application of TKD query possible is the Bitmap Index Guided (BIG) algorithm. This algorithm greatly improves the performance for incomplete data, but it is not designed to find top-k dominant values in incomplete big data. Several other algorithms have been proposed to find the TKD query, such as Skyband Based and Upper Bound Based algorithms, but their performance is also questionable. Algorithms developed previously were a the first attempts to apply TKD query on incomplete data; however, these algorithms suffered from weak performance. This paper proposes MapReduced Enhanced Bitmap Index Guided Algorithm (MRBIG) for dealing with the aforementioned issues. MRBIG uses the MapReduce framework to enhance the performance of applying top-k dominance queries on large incomplete datasets. The proposed approach uses the MapReduce parallel computing approach involving multiple computing nodes. The framework separates the tasks between several computing nodes to independently and simultaneously work to find the result. This method has achieved up to two times faster processing time in finding the TKD query result when compared to previously proposed algorithms.
Existing System:
the first and easiest approach that might come to mind is to compare each two items in the dataset individually. This naive approach implies the pair-wise comparison between the values of different objects in the same dimension. In this approach, a comparison is required for every two values existing in the dataset. The observed dominant value in each comparison can be used to find the final top-k dominant values later. Indicating the top-k dominant values can be challenging when facing big data; processing time can become astronomical even if the computational cost can be ignored. Therefore, this approach may not be the best way of solving this problem.
Proposed System:
to apply the TKD queries with performance more acceptable and efficient than the naive approach. Based on [2], [3], applying the top-k dominance query is possible in the incomplete dataset using Skyline query processing. The Skyline algorithm separates the uniform values into different buckets. The buckets group the dataset by dividing it into chunks that have same missing-value dimension. The buckets are easier and faster to process. By having separate top-k dominant values for each bucket and combining them together, determining the top-k values of the whole dataset becomes possible.
Conclusion :
we proposed an algorithm to apply top-k dominating queries using MapReduce framework on incomplete big data. MapReduced Enhanced Bitmap Indexed Guided algorithm (MRBIG) is the basis of the work that develops a new way to handle large incomplete data and uses the MapReduce framework to enable parallel computing to manage the problem faster. the single machine algorithm has been detailed, compared, and contrasted with the MRBIG algorithm. Based on the experiments, the single machine algorithm cannot be an optimal way for applying TKD queries on big files. Not being resource-efficient, process failure due to resource insufficiency, and having exponential processing time are a the major defects when it comes to finding top-k dominant values in massive incomplete data.
REFERENCES
[1] Taxicab Fact Book, NYC Taxi Limousine Commission, New York, NY, USA, 2014.
[2] Greenhouse Gases Equivalences Calculator, USA Environ. Protection Agency, Washington, DC, USA, 2017.
[3] After Hong Kong Failure, China’s BYD Joins Singapore Launch, South China Morning Post, Hong Kong, 2017.
[4] E. Wilhelm et al., “Cloudthink: A scalable secure platform for mirroring transportation systems in the cloud,” Transport, vol. 30, no. 3, pp. 320–329, 2015.
[5] D. Zhang et al., “Understanding taxi service strategies from taxi GPS traces,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 1, pp. 123–135, Feb. 2014.
[6] S. Liu, Y. Yue, and R. Krishnan, “Non-myopic adaptive route planning in uncertain congestion environments,” IEEE Trans. Knowl. Data Eng., vol. 27, no. 9, pp. 2438–2451, Sep. 2015.
[7] S. Liu and S. Wang, “Trajectory community discovery and recommendation by multi-source diffusion modeling,” IEEE Trans. Knowl. Data Eng., vol. 29, no. 4, pp. 898–911, Apr. 2017.
[8] J. Zhao, Q. Qu, F. Zhang, C. Xu, and S. Liu, “Spatio-temporal analysis of passenger travel patterns in massive smart card data,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 11, pp. 3135–3146, Nov. 2017.
[9] J. Yuan, Y. Zheng, L. Zhang, X. Xie, and G. Sun, “Where to find my next passenger,” in Proc. ACM Int. Conf. Ubiquitous Comput. (UbiComp), 2011, pp. 109–118.
[10] M. Qu, H. Zhu, J. Liu, G. Liu, and H. Xiong, “A cost-effective recommender system for taxi drivers,” in Proc. ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining (KDD), 2014, pp. 45–54.
[11] H. Rong, X. Zhou, C. Yang, Z. Shafiq, and A. Liu, “The rich and the poor: A Markov decision process approach to optimizing taxi driver revenue efficiency,” in Proc. ACM Int. Conf. Inf. Knowl. Manage. (CIKM), 2016, pp. 2329–2334.
[12] C.-M. Tseng and C.-K. Chau, “Viability analysis of electric taxis using New York city dataset,” in Proc. ACM e-Energy 8th Int. Conf. Future Energy Syst., 2017, pp. 328–333.
[13] C.-M. Tseng and C.-K. Chau, “Personalized prediction of vehicle energy consumption based on participatory sensing,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 11, pp. 3103–3113, Nov. 2017.
[14] C.-K. Chau, K. Elbassioni, and C.-M. Tseng, “Drive mode optimization and path planning for plug-in hybrid electric vehicles,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 12, pp. 3421–3432, Dec. 2017.
[15] T. Carpenter, A. R. Curtis, and S. Keshav, “The return on investment for taxi companies transitioning to electric vehicles—A case study in San Francisco,” Transportation, vol. 41, pp. 785–818, Jul. 2014.
[16] Electric Vehicle Charging Stations in New York, New York Government, New York, NY, USA, 2017.
[17] A. Furieri, “The Gaia-SINS federated projects,” Tech. Rep., 2017.
[18] Real-World Range Ramifications: Heating and Air Condition, FleetCarma, Waterloo, ON, Canada, 2014.
[19] C. Bingham, C. Walsh, and S. Carroll, “Impact of driving characteristics on electric vehicle energy consumption and range,” IET Intell. Transp. Syst., vol. 6, no. 1, pp. 29–35, 2012.
[20] L. Feng and B. Chen, “Study the impact of driver’s behavior on the energy efficiency of hybrid electric vehicles,” in Proc. ASME Int. Design Eng. Tech. Conf., 2013.