Implementation of Recurrent Neural Network (RNN) for Question Similarity Identification in Indonesian Language

Muhammad Iqbal; Hasmawati; Ade Romadhony

doi:10.15575/join.v8i2.1138

Authors

Muhammad Iqbal School of Computing, Informatics, Telkom University, Bandung, Indonesia
Hasmawati School of Computing, Informatics, Telkom University, Bandung, Indonesia
Ade Romadhony School of Computing, Informatics, Telkom University, Bandung, Indonesia

Keywords:

Manhattan Distance, Indonesian questions, Question similarity, RNN

Abstract

In a question-and-answer forum, question similarity identification determines how similar two questions are. Comparing user-submitted questions against the questions already stored in a database to find matches can improve system performance on online Q&A platforms. To date, most question similarity research has targeted languages other than Indonesian. The purpose of this research is to identify similarities between Indonesian-language questions and to evaluate the effectiveness of the methods used. The data is a public dataset of question pairs labeled 0 or 1, where label 0 marks pairs of different questions and label 1 marks pairs of equivalent questions. The method is a Recurrent Neural Network (RNN) with a Manhattan Distance approach for computing the similarity distance between two questions: each question pair is fed to the model as two inputs, together with its reference label, so that the model learns the similarity distance between the two inputs. We evaluated the model with three optimizers: RMSprop, Adam, and Adagrad. The best results were obtained with the Adam optimizer and an 80:20 data split, giving an accuracy of 76%, a precision of 74%, a recall of 98.8%, and an F1-score of 85.1%.
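The pairwise setup in the abstract can be illustrated with a small, dependency-free sketch. It assumes a Siamese arrangement in which one RNN encoder with shared weights reads both questions, and it turns the Manhattan (L1) distance between the two final hidden states into a similarity score with exp(-‖h1 − h2‖₁), the form used in MaLSTM-style networks. The paper's actual cell type, embedding layer, layer sizes, and decision rule are not stated here, so `SimpleRNNEncoder`, `EMBED_DIM`, `HIDDEN_DIM`, the toy word vectors, and the 0.5 threshold are all hypothetical. Weights are random and untrained; training with RMSprop, Adam, or Adagrad is not shown. A helper for the four metrics reported in the abstract is included.

```python
import math
import random

# Hypothetical sizes for illustration only; the paper's dimensions are not given here.
EMBED_DIM = 4
HIDDEN_DIM = 3


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class SimpleRNNEncoder:
    """Hypothetical Elman RNN encoder: h_t = tanh(W_x x_t + W_h h_{t-1} + b).

    Weights are random and untrained; this only illustrates the data flow.
    """

    def __init__(self, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.w_x = [[rng.uniform(-0.5, 0.5) for _ in range(embed_dim)] for _ in range(hidden_dim)]
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
        self.b = [0.0] * hidden_dim
        self.hidden_dim = hidden_dim

    def encode(self, sequence):
        """Run the RNN over a sequence of word vectors and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        for x_t in sequence:
            pre = [a + c + d for a, c, d in zip(matvec(self.w_x, x_t), matvec(self.w_h, h), self.b)]
            h = [math.tanh(v) for v in pre]
        return h


def manhattan_similarity(h1, h2):
    """exp(-||h1 - h2||_1): 1.0 for identical encodings, approaching 0 as they diverge."""
    return math.exp(-sum(abs(a - b) for a, b in zip(h1, h2)))


def predict_pair(encoder, q1, q2, threshold=0.5):
    """Encode both questions with the SAME encoder (shared weights) and label the pair."""
    score = manhattan_similarity(encoder.encode(q1), encoder.encode(q2))
    return score, int(score >= threshold)  # 1 = similar pair, 0 = different pair


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = similar pair)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    encoder = SimpleRNNEncoder(EMBED_DIM, HIDDEN_DIM)
    q_a = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, -0.2]]  # toy word vectors
    q_b = [[-0.4, 0.3, 0.9, -0.1]]
    print(predict_pair(encoder, q_a, q_a))  # (1.0, 1): identical questions
    print(predict_pair(encoder, q_a, q_b))
    print(classification_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
```

Because both questions pass through the same encoder and the L1 distance is symmetric, swapping the two inputs gives the same score. In a trained model, the shared weights would be adjusted so that pairs labeled 1 map to nearby hidden states and pairs labeled 0 map to distant ones.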
