Cyberbullying Detection in the Libyan Dialect Using Convolutional Neural Networks

Sara M. Elgoud; Mustafa Ali Abuzaraida; Zainab S. Attarbashi; Mohamed Ali Saip

doi:10.15575/join.v10i2.1631

Authors

Sara M. Elgoud Department of Computer Science, College of Information Technology, Al-Asmaria Islamic University, Libya https://orcid.org/0009-0002-9920-0629
Mustafa Ali Abuzaraida Department of Computer Science, Faculty of Information Technology, Misurata University, Libya https://orcid.org/0000-0002-9327-8639
Zainab S. Attarbashi Faculty of Information and Communication Technology, International Islamic University Malaysia, Selangor, Malaysia https://orcid.org/0000-0002-1452-8098
Mohamed Ali Saip School of Computing, Universiti Utara Malaysia, Sintok, Kedah, Malaysia https://orcid.org/0000-0002-9777-172X

DOI:

https://doi.org/10.15575/join.v10i2.1631

Keywords:

Arabic Dialect, Convolutional Neural Network, Cyberbullying, Deep Learning, Meta-Learning, Natural Language Processing, Removing Stopwords

Abstract

The use of social media has grown rapidly in recent years, raising concerns about cyberbullying. It has become imperative to strengthen efforts and methods for detecting and managing cyberbullying on social media. The classification of Arabic texts has recently received increasing attention. However, Arabic speakers use many dialects on social media platforms to express their opinions and communicate with each other, and the structural and morphological complexity of the language makes detection in Arabic extremely challenging. Analyzing Arabic dialects with Natural Language Processing (NLP) tools can be more challenging than analyzing Modern Standard Arabic (MSA). This paper examines the impact of stopword removal and derivation techniques on detecting cyberbullying in the Libyan dialect. Text classification performance was compared when using a Libyan dialect stopword list alongside pre-generated MSA lists and when using the MSA lists alone. The texts were classified using Convolutional Neural Network (CNN) classifiers. With the Libyan dialect stopwords, the accuracy results were 92% and 83%; with only MSA stopwords, they dropped to 91% and 77%. These results indicate that the presented stopword list, which is specific to the Libyan dialect, had a positive impact and yielded higher accuracy than MSA stopwords alone.
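The comparison described in the abstract, filtering with MSA stopwords alone versus MSA plus dialect-specific stopwords, can be illustrated with a minimal sketch. The two word sets, the sample sentence, and the function name `remove_stopwords` are illustrative assumptions; they are not the lists or code used in the paper, and the sketch covers only the preprocessing step, not the CNN classifier.

```python
# Illustrative stopword sets (assumptions, NOT the lists used in the paper).
MSA_STOPWORDS = {"في", "من", "على", "إلى"}
LIBYAN_STOPWORDS = {"شن", "توا", "هلبا"}


def remove_stopwords(text, stopwords):
    """Return the whitespace-delimited tokens of text not found in stopwords."""
    return [token for token in text.split() if token not in stopwords]


# Hypothetical sample sentence mixing dialect words and an MSA preposition.
sample = "شن تدير توا في البيت"

# Setting 1: only the MSA list is applied, so dialect function words survive.
msa_only = remove_stopwords(sample, MSA_STOPWORDS)

# Setting 2: the MSA list is extended with the dialect-specific list.
combined = remove_stopwords(sample, MSA_STOPWORDS | LIBYAN_STOPWORDS)

print(msa_only)
print(combined)
```

In this toy case the MSA-only filter leaves the dialect function words in the token stream, while the combined list removes them, which is the kind of difference the reported accuracy gap would plausibly stem from.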
