Evaluation of Predictive Data Mining Algorithms in Soil Data Classification for Optimized Crop Recommendation

Abstract 

Agricultural research contributes to economic profit worldwide and is a vast and important field. Its benefits can be enhanced further through the use of technological resources, tools, and procedures. Data mining is an interdisciplinary process of analyzing, processing, and evaluating real-world datasets and making predictions on the basis of the findings. Our case-based analysis provides empirical evidence that different data mining classification algorithms can be used to classify a dataset of agricultural regions on the basis of soil properties. Additionally, we have investigated which algorithm performs best, in terms of prediction accuracy, for recommending the most suitable crop for better yield.

Proposed System 

In this research, we set out to understand the related domain, analyze the behavior of different data mining classification algorithms on the soil dataset, and identify the most accurate predictive algorithm. The dataset was accumulated from soil surveys conducted at numerous agricultural areas located in Kasur District, Punjab, Pakistan. The aim is to maintain a system that can classify the soil in adequate quantities for best practices. The primary objectives of our study are:

i) To classify the soil under different agroecological zones in Kasur District, Punjab, Pakistan, using different classification algorithms available in data mining.
ii) To recommend relevant crops depending on the soil classification.
iii) To evaluate the performance of predictive algorithms for better knowledge extraction.
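Objective iii) can be illustrated with a minimal sketch of k-fold cross-validation, a common way to estimate prediction accuracy. This is not the evaluation procedure used in the study. The records, attribute values, class labels, and function names below are hypothetical and invented for illustration, and the majority-class baseline stands in for any real classifier.

```python
from collections import Counter

# Hypothetical toy records: ((pH class, texture), soil class).
# Invented for illustration; NOT taken from the Kasur soil surveys.
SAMPLES = [
    (("neutral", "loam"), "fertile"),
    (("neutral", "clay"), "fertile"),
    (("acidic", "sandy"), "poor"),
    (("alkaline", "loam"), "fertile"),
    (("acidic", "sandy"), "poor"),
    (("neutral", "loam"), "fertile"),
    (("alkaline", "sandy"), "poor"),
    (("neutral", "clay"), "fertile"),
    (("alkaline", "loam"), "fertile"),
]


def majority_classifier(train):
    """Baseline: always predict the most frequent class in the training fold."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: most_common


def k_fold_accuracy(samples, fit, k=3):
    """Average accuracy of fit() over k folds; fold i holds out every k-th record."""
    scores = []
    for i in range(k):
        test = samples[i::k]
        train = [s for j, s in enumerate(samples) if j % k != i]
        predict = fit(train)
        correct = sum(predict(x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / k


baseline_accuracy = k_fold_accuracy(SAMPLES, majority_classifier)
print(f"Majority baseline accuracy: {baseline_accuracy:.3f}")
```

Any candidate classifier can be passed as `fit`, provided it takes a training list and returns a prediction function. Comparing each candidate against the baseline shows whether it extracts knowledge beyond the class frequencies.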

Conclusion

In this study, we have presented the research possibilities for soil classification using well-known data mining classification algorithms: J48, BF Tree, OneR, and Naïve Bayes. The experiment was conducted on data instances from Kasur District, Pakistan. Our comparative analysis shows that these algorithms achieve different levels of accuracy, which determines the effectiveness and efficiency of their predictions. A better understanding of soil classes can improve productivity in farming, reduce dependence on fertilizers, and produce better predictive rules for recommendations that increase yield. In the future, we plan to create a Soil Management and Recommendation System that can be used effectively by agriculturists and soil testing laboratories. This system will help recommend a suitable fertilizer and support predictions for better yield.
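Of the algorithms named above, OneR is the simplest to show in a few lines: it builds one rule set per attribute and keeps the attribute whose rules make the fewest training errors. The following is a minimal sketch for nominal attributes only, not the implementation used in the experiment. The attribute names (`ph`, `texture`), the class labels, and the records are hypothetical and invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (soil attributes, fertility class).
# Invented for illustration; NOT taken from the Kasur soil surveys.
RECORDS = [
    ({"ph": "acidic", "texture": "sandy"}, "low"),
    ({"ph": "neutral", "texture": "sandy"}, "low"),
    ({"ph": "acidic", "texture": "loam"}, "high"),
    ({"ph": "neutral", "texture": "loam"}, "high"),
    ({"ph": "alkaline", "texture": "clay"}, "medium"),
    ({"ph": "neutral", "texture": "clay"}, "medium"),
]


def one_r(records):
    """Return (best_attribute, rules), where rules maps each value to a class."""
    best = None
    for attr in records[0][0]:
        # Count class labels for every value of this attribute.
        counts = defaultdict(Counter)
        for features, label in records:
            counts[features[attr]][label] += 1
        # Each value predicts its most frequent class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors are the records that fall outside the majority class of their value.
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[0]:
            best = (errors, attr, rules)
    return best[1], best[2]


def predict(model, features, default=None):
    """Apply the single learned rule; unseen values fall back to default."""
    attr, rules = model
    return rules.get(features[attr], default)


model = one_r(RECORDS)
print(model[0], predict(model, {"ph": "alkaline", "texture": "loam"}))
```

On this toy data, `texture` separates the classes without error while `ph` misclassifies three records, so the learned rule uses `texture` alone. Numeric soil measurements would need to be discretized before this sketch could be applied.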
