Sundanese Stemming using Syllable Pattern
DOI:
https://doi.org/10.15575/join.v6i2.812
Keywords:
Phonology, Stemming, Sundanese, Syllable
Abstract
Stemming is a technique that reduces a derived word to its root or base word. It is widely used in data processing tasks such as searching word indexes, translation, and information retrieval from documents in a database. In general, stemming uses the morphological pattern of a derived word to recover the original or root word. In previous research, this technique suffered from over-stemming and under-stemming. In this study, the stemming process is improved by using the (canonical) syllable pattern based on the phonological rules of Sundanese. Stemming with syllable patterns achieved an accuracy of 89%, and execution on the test data yielded 95% across all the basic words. This simple algorithm has the advantage of being able to adjust the position of the syllable pattern to the word being stemmed. Accuracy is limited by shortcomings in the data (typos, loan words, and words that are non-deterministic with respect to the syllable pattern) and could be increased by measures such as adjusting words and adding reference dictionaries. The algorithm also has a drawback: it can cause over-stemming.
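The idea of checking a stripped candidate against a syllable pattern can be illustrated with a minimal Python sketch. It is not the algorithm evaluated in this study. The affix lists, the treatment of "eu", "ng", and "ny" as single sounds, the set of allowed syllable shapes, and the two-syllable minimum are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import re

# Assumption: "eu" counts as one vowel and "ng"/"ny" as single consonants.
TOKEN = re.compile(r"eu|ng|ny|[a-zé]")
VOWELS = {"a", "i", "u", "e", "o", "é", "eu"}

# Assumption: a word is a sequence of syllables shaped (C)(C)V(C).
CANONICAL = re.compile(r"(?:C{0,2}VC?)+")

# Illustrative affix lists only; a real stemmer needs far more rules.
PREFIXES = ("di", "ka", "pa")
SUFFIXES = ("keun", "an", "na")


def cv_pattern(word: str) -> str:
    """Map a word to its consonant/vowel pattern, e.g. 'dahar' -> 'CVCVC'."""
    tokens = TOKEN.findall(word.lower())
    return "".join("V" if t in VOWELS else "C" for t in tokens)


def is_plausible_root(word: str, min_syllables: int = 2) -> bool:
    """Accept a candidate only if its pattern splits into canonical syllables.

    The two-syllable minimum is an assumption used here to block
    over-stemming of short words.
    """
    pattern = cv_pattern(word)
    enough_syllables = pattern.count("V") >= min_syllables
    return enough_syllables and CANONICAL.fullmatch(pattern) is not None


def stem(word: str) -> str:
    """Strip at most one suffix and one prefix, keeping only plausible roots."""
    candidate = word.lower()
    for suffix in SUFFIXES:
        rest = candidate[: -len(suffix)]
        if candidate.endswith(suffix) and is_plausible_root(rest):
            candidate = rest
            break
    for prefix in PREFIXES:
        rest = candidate[len(prefix):]
        if candidate.startswith(prefix) and is_plausible_root(rest):
            candidate = rest
            break
    return candidate


if __name__ == "__main__":
    for w in ("didahar", "daharkeun", "imahna", "kana"):
        print(w, "->", stem(w))
```

In this sketch "kana" is left unchanged because stripping either "ka" or "na" would leave a one-syllable remainder, which shows how a syllable-pattern check can guard against over-stemming. Under-stemming and ambiguous cases would still need a reference dictionary.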
License
Copyright (c) 2021 Jurnal Online Informatika
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.