Comparative Analysis of IndoBERT and LSTM for Multi-Label Text Classification of Indonesian Motivation Letter

Authors

  • Yosep Setiawan, Department of Computer Science, Binus University, Jakarta, Indonesia
  • Lili Ayu Wulandhari, Department of Computer Science, Binus University, Jakarta, Indonesia

DOI:

https://doi.org/10.15575/join.v10i2.1499

Keywords:

Comparative analysis, IndoBERT, LSTM, Motivation letter, Multi-label text classification

Abstract

The evaluation of motivation letters is a crucial step in the student admission process at a vocational institution in Indonesia. However, the current manual assessment method is prone to subjectivity and inconsistency, making it less reliable for fair student selection. This research presents a comparative analysis of two deep learning models, IndoBERT and Long Short-Term Memory (LSTM), for multi-label text classification of motivation letters written in Indonesian. Using a dataset of 676 motivation letters labeled with nine predefined categories, we compare the two models' classification performance. The results indicate that IndoBERT outperforms LSTM, achieving an F1-score of 81% compared to 76% for LSTM. This research provides insights into the effectiveness of IndoBERT for multi-label classification tasks in the Indonesian language and serves as a benchmark for future research in automating motivation letter evaluations.
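For readers unfamiliar with how a BERT-style encoder is adapted to a multi-label task, the sketch below shows one common way to configure IndoBERT with nine independent sigmoid outputs, one per category. It is a minimal illustration only: the checkpoint name (indolem/indobert-base-uncased), the 0.5 decision threshold, and the example sentence are assumptions, not details taken from the paper.

```python
# Minimal sketch of a multi-label IndoBERT classifier (illustrative, not the
# authors' exact training configuration).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_LABELS = 9  # the nine predefined motivation-letter categories

# Assumed checkpoint; problem_type selects BCEWithLogitsLoss for multi-label training.
tokenizer = AutoTokenizer.from_pretrained("indolem/indobert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "indolem/indobert-base-uncased",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",
)

# Hypothetical snippet of a motivation letter in Indonesian.
text = "Saya termotivasi untuk melanjutkan studi di bidang teknik karena ..."
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Sigmoid per label, then an assumed 0.5 threshold gives the multi-label prediction.
probs = torch.sigmoid(logits)
predicted = (probs > 0.5).int().tolist()[0]
print(predicted)  # e.g. [1, 0, 0, 1, 0, 0, 0, 1, 0]
```

An LSTM baseline for the same task would typically replace the transformer encoder with an embedding layer followed by an LSTM and a dense layer of nine sigmoid units, trained with the same binary cross-entropy objective.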



Published

2025-08-17
