An Analysis of Spam Email Detection Performance Assessment Using Machine Learning

DOI:

https://doi.org/10.15575/join.v4i1.298

Keywords:

spam detection, e-mail, machine learning, performance

Abstract

Spam email is a nuisance that makes it harder for email account users to find relevant information. Public email services already apply spam detection using a variety of methods. However, not all email servers that host a limited number of company email accounts provide a spam detection feature, so the server administrator must add a separate or modular spam detection component to protect those accounts. This study aims to identify the best method for detecting spam email. Several machine learning methods, such as Logistic Regression, Decision Tree, and Random Forest, are applied and their results compared to find the most efficient one. Efficiency is measured by the speed of the training and testing processes and by the accuracy of spam detection. The results indicate that the Random Forest method performs best, with a testing time of 0.19 seconds and an accuracy of 98%. This result can serve as a reference for developing spam detection with other methods.
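The efficiency measurement described in the abstract (training time, testing time, and accuracy per method) can be sketched as a small timing harness. This is only an illustration under stated assumptions, not the study's implementation: the function `evaluate`, the two toy classifiers, and the sample messages are all hypothetical stand-ins. Any object with `fit` and `predict` methods could be passed in their place, for example scikit-learn's `LogisticRegression`, `DecisionTreeClassifier`, or `RandomForestClassifier` applied to vectorized email features.

```python
import time
from collections import Counter


def evaluate(model, X_train, y_train, X_test, y_test):
    """Time fit() and predict() for one model and compute its test accuracy."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = model.predict(X_test)
    test_seconds = time.perf_counter() - start

    correct = sum(1 for p, y in zip(predictions, y_test) if p == y)
    return {
        "train_seconds": train_seconds,
        "test_seconds": test_seconds,
        "accuracy": correct / len(y_test),
    }


class MajorityClassifier:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class KeywordClassifier:
    """Hypothetical baseline: predicts spam (1) when a message contains
    a word that appeared only in spam messages during training."""

    def fit(self, X, y):
        spam_words, ham_words = set(), set()
        for text, label in zip(X, y):
            target = spam_words if label == 1 else ham_words
            target.update(text.lower().split())
        self.spam_only_ = spam_words - ham_words
        return self

    def predict(self, X):
        return [1 if self.spam_only_ & set(t.lower().split()) else 0 for t in X]


# Made-up toy data for illustration only (1 = spam, 0 = not spam).
train_texts = [
    "win free prize now",
    "meeting at noon",
    "free money offer",
    "project report attached",
    "lunch tomorrow",
]
train_labels = [1, 0, 1, 0, 0]
test_texts = ["free prize inside", "report for the meeting"]
test_labels = [1, 0]

models = {"majority": MajorityClassifier(), "keyword": KeywordClassifier()}
results = {
    name: evaluate(model, train_texts, train_labels, test_texts, test_labels)
    for name, model in models.items()
}

for name, r in results.items():
    print(f"{name}: train={r['train_seconds']:.6f}s "
          f"test={r['test_seconds']:.6f}s accuracy={r['accuracy']:.2f}")
```

Timing `fit` and `predict` separately mirrors the abstract's distinction between training speed and testing speed. Absolute timings depend on hardware and data size, so they are comparable only between methods run under the same conditions.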

Published

2019-09-06

Section

Article
