Modality-based Modeling with Data Balancing and Dimensionality Reduction for Early Stunting Detection

Yohanes Setiawan; Mohammad Hamim Zajuli Al Faroby; Mochamad Nizar Palefi Ma’ady; I Made Wisnu Adi Sanjaya; Cisa Valentino Cahya Ramadhani

doi:10.15575/join.v10i1.1495

Authors

Yohanes Setiawan, Department of Information Technology, Telkom University, Surabaya Campus, Surabaya, Indonesia
Mohammad Hamim Zajuli Al Faroby, Department of Data Science, Telkom University, Surabaya Campus, Surabaya, Indonesia (ORCID: https://orcid.org/0000-0001-6500-270X)
Mochamad Nizar Palefi Ma’ady, Department of Information Systems, Telkom University, Surabaya Campus, Surabaya, Indonesia
I Made Wisnu Adi Sanjaya, Department of Data Science, Telkom University, Surabaya Campus, Surabaya, Indonesia
Cisa Valentino Cahya Ramadhani, Department of Information Technology, Telkom University, Surabaya Campus, Surabaya, Indonesia

Keywords:

Data Balancing, Dimensionality Reduction, Multimodal, Stunting, Unimodal

Abstract

In Indonesia, the stunting rate has reached 36%, well above the World Health Organization (WHO) standard of 20%. This high prevalence underscores the urgent need for effective early detection methods. Existing data mining approaches to stunting detection have focused primarily on unimodal data, using either tabular or image data alone, which limits the comprehensiveness and accuracy of the resulting models. Modality-based modeling, which integrates image and tabular data, can provide a more holistic view and improve detection accuracy. This research analyzes modality-based modeling for the early detection of stunting and compares two modeling approaches, unimodal and multimodal. Its main contributions are a comprehensive framework for modality-based analysis, the application of advanced data preprocessing techniques, and a comparison of several machine learning algorithms to identify the best model for stunting detection. The dataset, comprising images and tabular data, was collected from Posyandu (community health posts) in Sidoarjo, Indonesia. The image data are preprocessed with background segmentation, and features are extracted using the Gray Level Co-occurrence Matrix (GLCM); the tabular data are processed through categorical encoding. The Synthetic Minority Oversampling Technique (SMOTE) is used to address class imbalance, and Principal Component Analysis (PCA) is used for dimensionality reduction. Unimodal modeling uses tabular or image data alone, while multimodal modeling combines both before classification. The best F1 scores are 0.96 for tabular-only, 0.91 for image-only, and 0.90 for combined image-tabular modeling, demonstrating the effectiveness of the data balancing and dimensionality reduction techniques.
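
To make the texture-feature and fusion steps more concrete, the following is a minimal sketch of how GLCM features could be computed from a quantized grayscale image and concatenated with encoded tabular values before classification. It is not the authors' implementation: the pixel offset, the number of gray levels, the three texture properties, the toy image, and the names `glcm`, `glcm_features`, and `tabular_row` are all assumptions chosen for illustration.

```python
# Illustrative sketch only: the offset, gray-level count, and feature choices
# below are assumptions, not the settings used in the study.

def glcm(image, levels, dx=1, dy=0):
    """Return a symmetric, normalized gray-level co-occurrence matrix.

    image  : list of rows of integer gray levels in range(levels)
    dx, dy : pixel offset that pairs each pixel with one neighbor
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1  # pair (i, j)
                counts[j][i] += 1  # mirrored pair keeps the matrix symmetric
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Compute three common texture descriptors from a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        # Large when co-occurring gray levels differ strongly.
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        # Angular second moment: large for uniform textures.
        "asm": sum(v * v for _, _, v in cells),
        # Large when mass concentrates near the diagonal.
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


# Toy 4x4 image already quantized to 4 gray levels (hypothetical data).
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

p = glcm(toy_image, levels=4)
texture = glcm_features(p)

# Hypothetical encoded tabular values for the same child.
tabular_row = [1.0, 0.0, 24.0]

# Early fusion: concatenate both modalities into one feature vector.
fused = tabular_row + [texture[k] for k in ("contrast", "asm", "homogeneity")]
```

In a full pipeline along the lines described above, vectors like `fused` would be stacked into a feature matrix, with oversampling and dimensionality reduction applied before a classifier is trained. Fitting SMOTE and PCA on the training split only is a common precaution against leakage, though the abstract does not state how the split was handled.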
