Improving Imbalanced Data Handling in Intrusion Detection Systems using SMOTE with an Extended Kalman Filter

Authors

  • Guntoro Guntoro, Department of Informatics Engineering, Universitas Lancang Kuning, Indonesia, and School of Computing, Universiti Utara Malaysia, Malaysia
  • Mohd. Nizam Omar, School of Computing, Universiti Utara Malaysia, Malaysia
  • Mohamad Farhan Mohamad Mohsin, School of Computing, Universiti Utara Malaysia, Malaysia

DOI:

https://doi.org/10.15575/join.v11i1.1687

Keywords:

Extended Kalman Filter, Imbalanced Data, Intrusion Detection System, Machine Learning, NSL-KDD, SMOTE-EKF

Abstract

Class imbalance is a major obstacle in building intrusion detection systems (IDS). Most network traffic is normal, while some attack types are very rare. This skewed distribution biases machine learning models toward the majority traffic, so they often miss infrequent but critical attacks such as Remote to Local (R2L) and User to Root (U2R). To address this problem, this study proposes an improved oversampling method, SMOTE-EKF, which combines the Synthetic Minority Oversampling Technique (SMOTE) with the Extended Kalman Filter (EKF). By treating synthetic data generation as a nonlinear estimation problem, the EKF refines the generated samples, making them more accurate and reducing noise and overly broad class boundaries. The method was tested on the NSL-KDD dataset with a Random Forest classifier and evaluated using Accuracy, Precision, Recall, F1-score, G-Mean, and AUC-ROC, along with runtime analysis and cross-validation. SMOTE-EKF outperforms the baseline approaches, achieving 99.70% accuracy, 98.33% precision, 98.38% recall, 98.35% F1-score, 98.29% G-Mean, and an AUC-ROC of 0.993. It also improves detection of rare attacks, with F1-scores of 96.76% for R2L and 93.65% for U2R. The SMOTE-EKF model detects all attack classes in a more balanced way without overfitting. The results also suggest that incorporating predictive methods into the oversampling process is a valuable strategy for improving machine learning-based intrusion detection systems.
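
As a rough illustration of two building blocks named in the abstract, the sketch below shows standard SMOTE interpolation between a minority sample and one of its nearest minority neighbours, and the G-Mean computed as the geometric mean of per-class recall. It does not reproduce the EKF refinement step, because the abstract does not specify the state and measurement models. The function names (smote_interpolate, g_mean) and parameters (k, n_new, seed) are hypothetical and are not taken from the paper.

```python
import numpy as np


def smote_interpolate(X_min, k=5, n_new=10, seed=None):
    """Plain SMOTE-style oversampling of one minority class (illustrative only).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbours than other samples
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                   # seed samples
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                            # position on segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    labels = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in labels]
    return float(np.prod(recalls) ** (1.0 / len(labels)))
```

Because G-Mean multiplies the per-class recalls, a single poorly detected class such as U2R pulls the score down sharply, which is why it is commonly reported alongside accuracy on imbalanced data.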

Published

2026-04-24

Section

Article