An Analysis of Spam Email Detection Performance Assessment Using Machine Learning
DOI:
https://doi.org/10.15575/join.v4i1.298
Keywords:
spam detection, e-mail, machine learning, performance
Abstract
Spam email makes it harder for email account users to find relevant information. Public email services already apply spam detection using various methods. However, not all email servers that host a limited number of company accounts provide spam detection features, so the server administrator must add a separate or modular spam detection feature to protect those accounts. This study aims to identify the best method for detecting spam email. Machine learning methods such as Logistic Regression, Decision Tree, and Random Forest are applied and their results compared to find the most efficient one. Efficiency is measured by the speed of the training and testing processes and by the accuracy of spam detection. The results indicate that the Random Forest method performs best, with a testing time of 0.19 seconds and an accuracy of 98%. This result can serve as a reference for developing spam detection with other methods.
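The efficiency measurement described in the abstract (training time, testing time, and accuracy per method) can be sketched as a small timing harness. This is a minimal illustration under stated assumptions, not the study's actual code: the `evaluate` function, the `Result` record, the `KeywordBaseline` stand-in model, and the toy messages are all hypothetical and do not come from the paper.

```python
import time
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    train_seconds: float
    test_seconds: float
    accuracy: float


class KeywordBaseline:
    """Hypothetical stand-in model, NOT one of the study's methods.

    Flags a message as spam (1) if it contains a word that appeared
    only in spam messages during training.
    """

    def fit(self, texts, labels):
        spam_words, ham_words = set(), set()
        for text, label in zip(texts, labels):
            target = spam_words if label == 1 else ham_words
            target.update(text.lower().split())
        self.spam_only_ = spam_words - ham_words
        return self

    def predict(self, texts):
        return [
            int(any(word in self.spam_only_ for word in text.lower().split()))
            for text in texts
        ]


def evaluate(name, model, X_train, y_train, X_test, y_test):
    """Time fit() and predict() separately, then compute accuracy."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = model.predict(X_test)
    test_seconds = time.perf_counter() - start

    correct = sum(int(p == y) for p, y in zip(predictions, y_test))
    return Result(name, train_seconds, test_seconds, correct / len(y_test))


# Toy data made up for illustration (1 = spam, 0 = not spam).
train_texts = [
    "win free prize now",
    "meeting agenda for monday",
    "claim your free reward",
    "please review the attached report",
]
train_labels = [1, 0, 1, 0]
test_texts = [
    "free prize inside",
    "monday report attached",
    "claim reward today",
    "agenda for the meeting",
]
test_labels = [1, 0, 1, 0]

results = [
    evaluate("keyword baseline", KeywordBaseline(),
             train_texts, train_labels, test_texts, test_labels),
]
for r in results:
    print(f"{r.name}: train={r.train_seconds:.4f}s "
          f"test={r.test_seconds:.4f}s accuracy={r.accuracy:.2%}")
```

Any object exposing `fit` and `predict` can be passed to `evaluate`. To compare the three methods named in the abstract, one option (an assumption, since the abstract does not name a library) is scikit-learn's `LogisticRegression`, `DecisionTreeClassifier`, and `RandomForestClassifier`, each given numeric features or wrapped in a pipeline with a text vectorizer. Absolute timings depend on hardware and dataset size, so only comparisons made on the same machine and data are meaningful.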
License
Copyright (c) 2019 Jurnal Online Informatika
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
You are free to:
- Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NoDerivatives: If you remix, transform, or build upon the material, you may not distribute the modified material.
- No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Notices:
- You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
- No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.