Performance Comparative Study of Machine Learning Classification Algorithms for Food Insecurity Experience by Households in West Java

Khusnia Nurul Khikmah; Bagus Sartono; Budi Susetyo; Gerry Alfa Dito

doi:10.15575/join.v9i1.1012

Authors

Khusnia Nurul Khikmah IPB University, Indonesia http://orcid.org/0000-0002-9142-6968
Bagus Sartono IPB University, Indonesia
Budi Susetyo IPB University, Indonesia
Gerry Alfa Dito IPB University, Indonesia

Keywords:

Extremely randomized tree, Food insecurity, Gradient boosting, Random forest, Rotation forest

Abstract

This study compares the performance of four methods (random forest, gradient boosting, rotation forest, and extremely randomized trees) in classifying the food insecurity experience scale of households in West Java. The dataset comes from the 2020 Socio-Economic Survey conducted by Statistics Indonesia. The novelty of this research lies in comparing these four methods, all of which are tree ensemble approaches. Because the classes are imbalanced, the authors also applied three imbalance handling techniques. The results show that the random forest algorithm combined with random undersampling is the best classifier, achieving a balanced accuracy of 65.795%. According to this best-performing model, the food insecurity experience scale in West Java can be identified by considering floor area (house size), the number of depositors, type of floor, health insurance ownership status, and internet access capabilities.
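The two ideas the abstract relies on, random undersampling and balanced accuracy, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy two-class labels ("secure" and "insecure"), and the 90/10 split are assumptions chosen only for illustration. Random undersampling is shown here as shrinking every class to the size of the smallest one, and balanced accuracy as the mean of per-class recall.

```python
import random
from collections import Counter, defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until each class matches the smallest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))  # sample without replacement
    keep.sort()
    return [rows[i] for i in keep], [labels[i] for i in keep]


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class carries equal weight."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical toy data: 90 "secure" households and 10 "insecure" ones.
labels = ["secure"] * 90 + ["insecure"] * 10
rows = [[i] for i in range(100)]

X_bal, y_bal = random_undersample(rows, labels)
class_counts = Counter(y_bal)  # both classes now have 10 rows

# A classifier that always predicts the majority class looks good on
# plain accuracy but not on balanced accuracy.
always_majority = ["secure"] * 100
plain_accuracy = sum(t == p for t, p in zip(labels, always_majority)) / 100
bal_acc = balanced_accuracy(labels, always_majority)
```

On this toy data the majority-only predictor reaches a plain accuracy of 0.9 but a balanced accuracy of only 0.5, which is why balanced accuracy is a reasonable yardstick when classes are imbalanced.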

Author Biographies

Khusnia Nurul Khikmah, IPB University

Department of Statistics

Bagus Sartono, IPB University

Department of Statistics

Budi Susetyo, IPB University

Department of Statistics

Gerry Alfa Dito, IPB University

Department of Statistics

