Implementation of Recurrent Neural Network (RNN) for Question Similarity Identification in Indonesian Language
DOI:
https://doi.org/10.15575/join.v8i2.1138
Keywords:
Manhattan Distance, Question in Indonesian, Similarity question, RNN
Abstract
In a question-and-answer (Q&A) forum, question similarity identification determines how similar two questions are. It lets an online Q&A platform compare each user-submitted question against the questions already in its database and find matches, which improves system performance. To date, most work on question similarity has targeted languages other than Indonesian. This research identifies question similarity in Indonesian-language questions and evaluates the effectiveness of the method used. The data is a public dataset of question pairs labeled 0 or 1, where 0 marks a pair of different questions and 1 marks a pair of questions that are the same. The method is a Recurrent Neural Network (RNN) that uses the Manhattan distance to measure the similarity between two questions. Each question pair is fed to the model as two inputs, together with its reference label. We evaluated the model with three optimizers: RMSprop, Adam, and Adagrad. The best results were obtained with the Adam optimizer and an 80:20 data split: 76% accuracy, 74% precision, 98.8% recall, and an F1-score of 85.1%.
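To illustrate the Manhattan distance step described in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes each question has already been encoded by the RNN into a fixed-length vector, and it assumes the common Siamese-network convention of mapping the L1 distance to a score with exp(-distance). The function names, the example vectors, and the 0.5 threshold are hypothetical and do not come from the paper.

```python
import math


def manhattan_similarity(h1, h2):
    """Map the Manhattan (L1) distance between two encodings to a score in (0, 1].

    Assumption: similarity = exp(-L1 distance), a convention often used with
    Siamese recurrent networks; the paper may define it differently.
    """
    if len(h1) != len(h2):
        raise ValueError("encodings must have the same length")
    distance = sum(abs(a - b) for a, b in zip(h1, h2))
    return math.exp(-distance)


def predict_label(h1, h2, threshold=0.5):
    """Return 1 (same question) or 0 (different), using a hypothetical threshold."""
    return 1 if manhattan_similarity(h1, h2) >= threshold else 0


# Hypothetical encodings standing in for RNN outputs.
close_pair = ([0.10, 0.20], [0.10, 0.25])  # L1 distance 0.05
far_pair = ([0.0, 0.0], [1.0, 1.0])        # L1 distance 2.0

print(predict_label(*close_pair))
print(predict_label(*far_pair))
```

Identical encodings give a score of exactly 1, and the score decays toward 0 as the encodings move apart, so the score can be compared directly against the 0/1 reference labels during training.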
License
Copyright (c) 2023 Muhammad Iqbal, Hasmawati, Ade Romadhony
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.