Analysis of Data and Feature Processing on Stroke Prediction using Wide Range Machine Learning Model
DOI:
https://doi.org/10.15575/join.v9i1.1249
Keywords:
Stroke Prediction, Machine Learning, Sampling Data, Pearson Correlation, PCA
Abstract
License
Copyright (c) 2024 Untari Novia Wisesty, Tjokorda Agung Budi Wirayuda, Febryanti Sthevanie, Rita Rismala
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
You are free to:
- Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
- The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NoDerivatives: If you remix, transform, or build upon the material, you may not distribute the modified material.
- No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Notices:
- You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
- No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.