Random Forest Method Approach to Customer Classification Based on Non-Performing Loan in Micro Business





Classification, Imbalanced Data, Improved Random Forest, Oversampling Technique


This study aims to classify the characteristics of potential customers based on non-performing loans using the random forest method. The data were obtained from the Syariah Mandiri Bank branch in Jambi and cover micro-financing customers from 2016 to 2020. The novelty of this work is that, unlike existing research that used other soft-computing methods, we employ the random forest method together with a sampling technique for imbalanced classes. The results show that credit risk can be estimated by taking into account factors such as age, monthly installments, margin, insurance price, loan principal, occupation, and installment duration. The sensitivity, precision, and G-mean values all increase when oversampling is applied, compared with using the original data. Random forest with the oversampling technique achieves a high Area Under the ROC Curve (AUC) score of 66.69%.
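As an illustration of the ideas named in the abstract, the following is a minimal sketch of random oversampling and of the sensitivity, precision, and G-mean metrics. It is not the authors' implementation: the abstract does not say which oversampling variant was used, so plain random duplication of minority-class rows is assumed here, and G-mean is assumed to follow the common definition (the square root of sensitivity times specificity). The function names `random_oversample` and `imbalance_metrics` and the toy labels are hypothetical.

```python
import math
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (assumed variant)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def imbalance_metrics(y_true, y_pred, positive=1):
    """Compute sensitivity, specificity, precision, and G-mean from
    binary labels, treating `positive` as the non-performing class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Assumed definition: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "g_mean": g_mean,
    }


# Toy data (hypothetical): 8 performing loans (0), 2 non-performing (1).
toy_rows = [[i] for i in range(10)]
toy_labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_oversample(toy_rows, toy_labels)
balanced_counts = Counter(balanced_labels)

# Hypothetical predictions: one hit, one miss, one false alarm.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
toy_metrics = imbalance_metrics(y_true, y_pred)
```

In practice the oversampling step would be applied to the training split only, before fitting the random forest, so that duplicated rows do not leak into the evaluation data.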








