Performance Comparative Study of Machine Learning Classification Algorithms for Food Insecurity Experience by Households in West Java

Authors

  • Khusnia Nurul Khikmah IPB University, Indonesia http://orcid.org/0000-0002-9142-6968
  • Bagus Sartono IPB University, Indonesia
  • Budi Susetyo IPB University, Indonesia
  • Gerry Alfa Dito IPB University, Indonesia

DOI:

https://doi.org/10.15575/join.v9i1.1012

Keywords:

Extremely randomized tree, Food insecurity, Gradient boosting, Random forest, Rotation forest

Abstract

This study compares the classification performance of the random forest, gradient boosting, rotation forest, and extremely randomized trees methods in classifying the food insecurity experience scale of households in West Java. The dataset is based on the 2020 Socio-Economic Survey conducted by Statistics Indonesia. The novelty of this research lies in comparing these four methods, all of which are tree ensemble approaches. In addition, because the classes are imbalanced, the authors also applied three imbalance-handling techniques. The results show that the combination of the random forest algorithm and random undersampling is the best classifier, with a balanced accuracy of 65.795%. According to this best-performing model, the food insecurity experience scale in West Java can be identified by considering floor area (house size), the number of depositors, type of floor, health insurance ownership status, and internet access capability.
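As a rough illustration of two ideas named in the abstract, the sketch below shows why balanced accuracy is preferred over plain accuracy on imbalanced classes and what random undersampling does to the class counts. It is not the authors' code: the function names `random_undersample` and `balanced_accuracy`, the labels "secure" and "insecure", and the toy data are all assumptions made for this example, and only the Python standard library is used.

```python
import random
from collections import Counter, defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class matches the smallest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample without replacement so no row is duplicated.
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of its size."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy, made-up labels (hypothetical): 8 "secure" households and 2 "insecure" ones.
y_true = ["secure"] * 8 + ["insecure"] * 2
y_pred = ["secure"] * 10  # a classifier that always predicts the majority class

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5

rows = list(range(len(y_true)))  # placeholder feature rows
_, balanced_labels = random_undersample(rows, y_true)
print(Counter(balanced_labels))  # two rows per class remain
```

In this toy case the majority-only classifier looks good on plain accuracy but scores no better than chance on balanced accuracy, which is the kind of gap that motivates both the metric and the resampling step. In a real pipeline the undersampling would be applied to the training split only, before fitting a classifier such as a random forest.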

Author Biographies

Khusnia Nurul Khikmah, IPB University

Department of Statistics

Bagus Sartono, IPB University

Department of Statistics

Budi Susetyo, IPB University

Department of Statistics

Gerry Alfa Dito, IPB University

Department of Statistics

Published

2024-06-25

Section

Article
