Stroke is a leading cause of mortality and disability globally, and accurate risk prediction models are needed. In this study, machine learning classifiers are used to predict stroke occurrence from a publicly accessible healthcare dataset of demographic, lifestyle, and clinical variables. Logistic Regression and Random Forest classifiers were trained under class imbalance after preprocessing that included imputation of missing values and encoding of categorical variables. The Random Forest model achieved 95.4% accuracy, 79.2% precision, 62.7% recall, a 70.1% F1-score, and a ROC-AUC of 91.6%, outperforming the Logistic Regression model on every metric (92.8% accuracy, 64.8% precision, 55.1% recall, a 59.6% F1-score, and a ROC-AUC of 87.9%). Confusion matrices and ROC/Precision-Recall curves further illustrated the discriminative power of the models. The feature importance plot indicated that the primary predictors were hypertension, body mass index, average glucose level, and age. The results demonstrate the potential of machine learning to support early stroke risk prediction and thereby enable timely clinical intervention and resource planning. The study adds to the evidence for the use of predictive analytics in healthcare decision support systems.
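The workflow summarized above (imputation, categorical encoding, training two classifiers under class imbalance, and scoring them on accuracy, precision, recall, F1-score, and ROC-AUC) can be illustrated with a minimal sketch. It is not the study's implementation. Several details are assumptions made for illustration: the data are synthetic and generated by a hypothetical `make_synthetic_data` helper, the column names only loosely mirror a stroke dataset, median imputation and one-hot encoding stand in for the unspecified preprocessing choices, and `class_weight="balanced"` is one possible way to handle imbalance. The resulting scores will not match the figures reported above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["age", "hypertension", "avg_glucose_level", "bmi"]
CATEGORICAL = ["smoking_status"]


def make_synthetic_data(n=2000, seed=0):
    """Hypothetical stand-in data: imbalanced binary target, some missing BMI."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.uniform(18, 90, n),
        "hypertension": rng.integers(0, 2, n),
        "avg_glucose_level": rng.normal(105, 40, n).clip(55, 270),
        "bmi": rng.normal(28, 6, n).clip(12, 60),
        "smoking_status": rng.choice(["never", "formerly", "smokes"], n),
    })
    # Assumed risk model, chosen only so that positives are the minority class.
    logit = (-7.5 + 0.06 * df["age"] + 0.8 * df["hypertension"]
             + 0.008 * df["avg_glucose_level"])
    df["stroke"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
    # Blank out roughly 5% of BMI values so the imputer has work to do.
    df.loc[rng.random(n) < 0.05, "bmi"] = np.nan
    return df


def build_pipeline(classifier):
    """Impute and scale numeric columns, one-hot encode categoricals, then classify."""
    preprocess = ColumnTransformer([
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([("preprocess", preprocess), ("model", classifier)])


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit on the training split and score on the held-out split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    prob = model.predict_proba(X_test)[:, 1]
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, prob),
    }


def run_experiment(seed=0):
    df = make_synthetic_data(seed=seed)
    X, y = df[NUMERIC + CATEGORICAL], df["stroke"]
    # Stratify so the minority class appears in both splits.
    split = train_test_split(X, y, test_size=0.25, stratify=y, random_state=seed)
    models = {
        "logistic_regression": LogisticRegression(
            class_weight="balanced", max_iter=1000),
        "random_forest": RandomForestClassifier(
            n_estimators=200, class_weight="balanced", random_state=seed),
    }
    return {name: evaluate(build_pipeline(clf), *split)
            for name, clf in models.items()}


if __name__ == "__main__":
    for name, scores in run_experiment().items():
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

Reporting precision, recall, F1-score, and ROC-AUC alongside accuracy matters in this setting: with few positive cases, a model that never predicts stroke can still reach high accuracy while having zero recall.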