Random Forest Method Approach to Customer Classification Based on Non-Performing Loan in Micro Business

Muhammad Muhajir; Julia Widiastuti

doi:10.15575/join.v7i2.842

Authors

Muhammad Muhajir, Department of Statistics, Universitas Islam Indonesia, Indonesia (ORCID: http://orcid.org/0000-0001-7576-2630)
Julia Widiastuti, Service Operation, Electronic Payment Artajasa, Indonesia (ORCID: http://orcid.org/0000-0001-9442-8283)

Keywords:

Classification, Imbalanced Data, Improved Random Forest, Oversampling Technique

Abstract

This study aims to classify the characteristics of potential customers based on non-performing loans using the random forest method. The research uses data obtained from the Syariah Mandiri Bank branch in Jambi, covering micro-financing customers from 2016 to 2020. The novelty of this work is that, unlike existing research that used other soft-computing methods, we employ the random forest method combined with a sampling technique for imbalanced classes. The results show that credit risk can be estimated by taking into account factors such as age, monthly installment, margin, insurance price, loan principal, occupation, and installment duration. Sensitivity, precision, and G-mean all increase compared with using the original data, and the random forest with oversampling achieves a high Area Under the ROC Curve (AUC) score of 66.69%.
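The abstract evaluates the models with sensitivity, precision, G-mean, and AUC after oversampling the minority (non-performing) class. Below is a minimal Python sketch of these steps using only the standard library. It assumes plain random oversampling (duplicating minority rows), which may differ from the exact technique used in the study, and assumes the non-performing class is encoded as label 1. The helper names (`random_oversample`, `imbalance_metrics`, `auc`) and the toy data are hypothetical. The random forest itself is not shown: its predicted labels and scores would be passed to the metric functions.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Plain random oversampling (an assumption, not necessarily the study's method):
    duplicate randomly chosen rows of each smaller class until every class
    has as many rows as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(rows)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def imbalance_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, precision and G-mean for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall of the positive class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall of the negative class
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "g_mean": math.sqrt(sensitivity * specificity),  # geometric mean of both recalls
    }


def auc(y_true, scores, positive=1):
    """ROC AUC as the probability that a random positive example gets a higher
    score than a random negative one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: one NPL customer (label 1) among five, one feature (age).
    X, y = [[25], [40], [33], [51], [29]], [0, 0, 0, 1, 0]
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))  # Counter({0: 4, 1: 4})

    # Hypothetical test labels, predictions and scores standing in for a forest's output.
    y_true = [1, 0, 0, 1, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.2, 0.8]
    m = imbalance_metrics(y_true, y_pred)
    print({k: round(v, 3) for k, v in m.items()})
    # {'sensitivity': 0.667, 'specificity': 0.8, 'precision': 0.667, 'g_mean': 0.73}
    print(round(auc(y_true, scores), 3))  # 0.933
```

Oversampling is normally applied to the training split only, so that duplicated minority rows do not leak into the data used to compute these metrics. Because G-mean is the geometric mean of sensitivity and specificity, it stays low unless both classes are recognized reasonably well, which is why it is a common companion to AUC on imbalanced data.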
