Optimizing Machine Learning Models for Graduation on Time Prediction: A Comparative Study with Resampling and Hyperparameter Tuning

Authors

  • Rizal Bakri, Department of Digital Business, Makassar State University, Indonesia
  • Syamsu Alam, Department of Digital Business, Makassar State University, Indonesia
  • Niken Probondani Astuti, Statistics Research Group, STIEM Bongaya, Makassar, Indonesia
  • Muhammad Ilham Bakhtiar, Department of Guidance and Counseling Education, Makassar State University, Indonesia

DOI:

https://doi.org/10.15575/join.v10i2.1590

Keywords:

Educational Data Mining, Graduation on Time, Hyperparameter Tuning, Machine Learning, Resampling Methods

Abstract

Predicting on-time graduation is an important problem in higher education, particularly because academic, demographic, and behavioral factors interact in complex ways. Many previous studies, however, rely on default machine learning (ML) parameters and do not address class imbalance, which leads to suboptimal predictions. This study builds a comprehensive framework for evaluating seven ML algorithms (AdaBoost, K-Nearest Neighbors, Naïve Bayes, Neural Network, Random Forest, SVM-RBF, and XGBoost) for predicting graduation on time, combined with five resampling techniques and hyperparameter tuning. The resampling methods are Random Undersampling (RUS), Random Oversampling (ROS), SMOTENC, and two hybrid approaches (RUS-ROS and SMOTENC-RUS). Hyperparameters were tuned using Grid Search, and model performance was evaluated with cross-validation and hold-out methods. Random Forest combined with RUS-ROS achieved the best performance, with an average metric score of 0.948. Statistical analysis using PERMANOVA (p = 0.009) and Bonferroni-corrected post-hoc pairwise tests confirmed significant differences between certain models. This study contributes to the educational data mining literature by demonstrating that combining resampling with hyperparameter tuning improves classification performance on imbalanced educational datasets.
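
To make the hybrid RUS-ROS idea concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not state the target class size, so the sketch assumes every class is resampled to the mean class size: larger classes are randomly undersampled and smaller classes are randomly oversampled. The function name `rus_ros`, the toy features, and the labels `late` and `on_time` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def rus_ros(rows, labels, seed=0):
    """Hybrid random under/oversampling toward a common per-class size.

    Assumption (not taken from the paper): the target size is the mean
    class size, so large classes shrink and small classes grow.
    """
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = round(len(rows) / len(by_class))

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target:
            # RUS: keep a random subset, drawn without replacement.
            chosen = rng.sample(members, target)
        else:
            # ROS: keep all rows and add random duplicates (with replacement).
            extra = rng.choices(members, k=target - len(members))
            chosen = members + extra
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Toy imbalanced data set: 90 "late" rows and 10 "on_time" rows.
rows = [[i, i % 7] for i in range(100)]
labels = ["late"] * 90 + ["on_time"] * 10

bal_rows, bal_labels = rus_ros(rows, labels, seed=42)
print(Counter(bal_labels))
```

With this toy input each class ends up with 50 rows. In a tuning workflow, resampling of this kind is normally applied to the training portion of each cross-validation fold only, so that the validation and hold-out data keep the original class distribution.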

Published

2025-08-17

Section

Article
