Comparative Analysis of Data Mining Classification Algorithms in Type-2 Diabetes Prediction Data Using WEKA Approach

Kawsar Ahmed, Tasnuba Jesmin



This paper compares the accuracy of data mining classification algorithms that are widely used to extract significant knowledge from large amounts of data. We evaluate 20 supervised classification algorithms on a type-2 diabetes dataset drawn from the Bangladeshi population, measuring the accuracy, speed and robustness of each algorithm with the WEKA toolkit, version 3.6.5. Accuracy is measured in three settings: on the total training data set, under 10-fold cross-validation, and with a percentage split (66% taken). Speed (CPU execution time) and error rate are measured in the same way as accuracy. We first identify the top-performing algorithms in each setting and then rank those top performers. Finally, we rank the best 5 of the 20 algorithms by accuracy.


Accuracy; Classification Algorithms; Confusion Matrix; Data Mining; Error Rate; Type-2 Diabetes in Bangladesh; WEKA toolkit
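The three evaluation settings named in the abstract can be illustrated with a minimal Python sketch. It is not the WEKA 3.6.5 procedure used in the paper: the data are synthetic, the `NearestMean` classifier and all function names are hypothetical stand-ins, and the 66% figure is assumed to be the training portion of the split. WEKA's own shuffling and fold construction may differ in detail.

```python
import random
import time


def make_toy_data(n=200, seed=0):
    """Synthetic two-class data (assumption: NOT the paper's diabetes dataset)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # One noisy numeric feature whose mean depends on the class.
        rows.append(([rng.gauss(2.0 * label, 1.0)], label))
    return rows


class NearestMean:
    """Toy stand-in classifier: predicts the class with the closest feature mean."""

    def fit(self, rows):
        sums, counts = {}, {}
        for features, label in rows:
            sums[label] = sums.get(label, 0.0) + features[0]
            counts[label] = counts.get(label, 0) + 1
        self.means = {c: sums[c] / counts[c] for c in sums}
        return self

    def predict(self, features):
        return min(self.means, key=lambda c: abs(features[0] - self.means[c]))


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(1 for f, y in rows if model.predict(f) == y) / len(rows)


def eval_training_set(rows):
    """Train and test on the same data (an optimistic estimate)."""
    return accuracy(NearestMean().fit(rows), rows)


def eval_cross_validation(rows, k=10, seed=1):
    """k-fold cross-validation: each row is tested exactly once."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    correct = 0
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = NearestMean().fit(train)
        correct += sum(1 for f, y in folds[i] if model.predict(f) == y)
    return correct / len(rows)


def eval_percentage_split(rows, train_fraction=0.66, seed=1):
    """Train on the first 66% of the shuffled rows, test on the remainder."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return accuracy(NearestMean().fit(shuffled[:cut]), shuffled[cut:])


def run_all(rows):
    """Return accuracy, error rate and CPU time for each evaluation setting."""
    settings = [
        ("training_set", eval_training_set),
        ("cross_validation_10", eval_cross_validation),
        ("percentage_split_66", eval_percentage_split),
    ]
    results = {}
    for name, evaluate in settings:
        start = time.process_time()  # CPU time, not wall-clock time
        acc = evaluate(rows)
        results[name] = {
            "accuracy": acc,
            "error_rate": 1.0 - acc,
            "cpu_seconds": time.process_time() - start,
        }
    return results


if __name__ == "__main__":
    for name, metrics in run_all(make_toy_data()).items():
        print(name, metrics)
```

Accuracy on the training set is usually the most optimistic of the three figures because the model is scored on data it has already seen, which is one reason to report all three settings side by side.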





Published by Department of Chemical Engineering University of Diponegoro Semarang

IJSE is licensed under a Creative Commons Attribution 3.0 License.