Multi Rule-based and Corpus-based for Sundanese Stemmer

Authors

  • Ade Sutedi, Department of Informatics Engineering, Institut Teknologi Garut, Indonesia
  • Muhammad Rikza Nasrulloh, Department of Informatics Engineering, Institut Teknologi Garut, Indonesia
  • Rickard Elsen, Department of Informatics Engineering, Institut Teknologi Garut, Indonesia

DOI:

https://doi.org/10.15575/join.v7i2.846

Keywords:

Corpus-based, Multi Rule-based, Stemmer, Sundanese

Abstract

This study develops a stemming method for Sundanese that combines several techniques: morphological stripping (removal of affixes and pro-lexemes), syllable (canonical) pattern checking, and comparison against corpus data to decide the final stemming result. The algorithm first checks the number of characters in the string and removes affixes, then checks the syllable pattern of the stripped result, and finally compares it with the corpus data, which determines the final stem. The corpus data were taken from a Sundanese dictionary, consisting of single words used as root words, and from a dataset extracted from an online Sundanese magazine. The results show that the stripping step removes the corresponding affixes and pro-lexemes, that matching words against syllable patterns yields root words quickly, and that the use of the corpus improves accuracy and reduces the over-stemming problems that occur during stemming.
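The pipeline described in the abstract (length check, affix stripping, syllable pattern check, corpus lookup) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the affix lists, the tiny root lexicon `ROOTS`, the threshold `MIN_LENGTH`, and the simplified consonant/vowel check are all assumptions made for the example, and pro-lexeme removal and Sundanese digraphs such as "ng" and "eu" are not handled.

```python
# Hypothetical, illustrative data: not the affix rules or corpus used in the paper.
PREFIXES = ("pang", "di", "ka", "sa")          # small subset of prefixes
SUFFIXES = ("keun", "an", "na")                # small subset of suffixes
ROOTS = {"dahar", "bantu", "tulis", "imah"}    # stand-in for the dictionary corpus
VOWELS = set("aeiou")
MIN_LENGTH = 3                                 # assumed minimum length of a root


def cv_pattern(word):
    """Map each letter to C or V (digraphs are not merged in this sketch)."""
    return "".join("V" if ch in VOWELS else "C" for ch in word)


def plausible_root(word):
    """Assumed syllable check: long enough and containing at least one vowel."""
    return len(word) >= MIN_LENGTH and "V" in cv_pattern(word)


def strip_candidates(word):
    """Return the word plus its prefix/suffix strippings, least stripped first."""
    stems = [word]
    stems += [word[len(p):] for p in PREFIXES if word.startswith(p)]
    for stem_ in list(stems):
        stems += [stem_[:-len(s)] for s in SUFFIXES if stem_.endswith(s)]
    return stems


def stem(word):
    word = word.lower()
    if len(word) < MIN_LENGTH:                 # step 1: length check
        return word
    for candidate in strip_candidates(word):   # step 2: affix removal
        # steps 3 and 4: syllable pattern check, then corpus comparison
        if plausible_root(candidate) and candidate in ROOTS:
            return candidate
    return word  # no corpus match: keep the word rather than over-stem it
```

With these toy lists, `stem("didahar")` returns `"dahar"` and `stem("imahna")` returns `"imah"`, while `stem("kana")` is returned unchanged because none of its stripped forms appears in the lexicon. Falling back to the original word when the corpus has no match is one simple way to express the abstract's point that the corpus comparison limits over-stemming.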

Author Biography

Ade Sutedi, Department of Informatics Engineering, Institut Teknologi Garut

Informatics Engineering

Published

2022-12-29

Section

Article