Study of the Application of Text Augmentation with Paraphrasing to Overcome Imbalanced Data in Indonesian Text Classification
DOI: https://doi.org/10.15575/join.v10i1.1472

Keywords: Imbalanced dataset, Paraphrase, Pre-trained model, Text augmentation, Text classification
License
Copyright (c) 2025 Mutiara Indryan Sari, Lya Hulliyyatus Suadaa

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
You are free to:
- Share — copy and redistribute the material in any medium or format for any purpose, even commercially.

The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Notices:
- You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
- No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.