Background: Type 2 diabetes mellitus (T2DM) is a growing global health challenge, and robust predictive models are needed to enable timely intervention. This study aimed to develop and validate a logistic regression-based framework for predicting diabetes risk from electronic health record (EHR) data. A retrospective cohort of 10,000 adults without prior diabetes was derived from anonymized EHRs. Candidate predictors included demographics, vital signs, laboratory biomarkers, comorbidities, and medication history. Preprocessing comprised outlier handling, missing-value imputation, and feature standardization. The data were divided into training, validation, and independent test sets, and a logistic regression model with elastic net regularization was fitted. Model performance was assessed using AUROC, AUPRC, calibration, Brier score, and decision curve analysis. On the test set, the model achieved an AUROC of 0.81 and an AUPRC of 0.46, with good calibration and consistent performance across subgroups. Compared with the machine learning comparators, logistic regression achieved similar accuracy while being easier to interpret. An interpretable, EHR-based logistic regression model therefore offers a practical and clinically meaningful approach to diabetes risk prediction. Future research should extend validation to diverse populations and explore integration with more advanced AI methods.
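To make the test-set metrics listed above concrete, the following is a minimal, dependency-free Python sketch. It assumes binary labels (1 = incident T2DM, 0 = no diabetes) and predicted probabilities from an already-fitted model; the function names and toy values are hypothetical illustrations, not the study's code or data. The average-precision estimate is a simple step-wise version that ignores tied scores, and calibration is summarized only as calibration-in-the-large.

```python
"""Dependency-free sketch of the test-set metrics named in the abstract.

Assumption: labels are 1 (incident T2DM) or 0 (no diabetes) and probabilities
come from an already-fitted model; all names and toy values are hypothetical.
"""
from typing import Sequence


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared error between predicted probability and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def auroc(y: Sequence[int], p: Sequence[float]) -> float:
    """Chance that a random case scores above a random non-case (ties count 0.5)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y: Sequence[int], p: Sequence[float]) -> float:
    """Step-wise AUPRC estimate: mean precision at each case's rank (ignores ties)."""
    order = sorted(range(len(y)), key=lambda i: p[i], reverse=True)
    tp, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y[i] == 1:
            tp += 1
            total += tp / rank  # precision at the point this case is recalled
    return total / sum(y)


def calibration_in_the_large(y: Sequence[int], p: Sequence[float]) -> float:
    """Observed event rate minus mean predicted risk (0 means no overall bias)."""
    return sum(y) / len(y) - sum(p) / len(p)


def net_benefit(y: Sequence[int], p: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit of treating everyone with p >= threshold."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Hypothetical toy data, not study results.
    y = [0, 0, 1, 1, 0, 1, 0, 0]
    p = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.05, 0.30]
    t = 0.3  # illustrative risk threshold for intervention
    print(f"AUROC {auroc(y, p):.3f}, AUPRC {average_precision(y, p):.3f}")
    print(f"Brier {brier_score(y, p):.4f}, CITL {calibration_in_the_large(y, p):+.4f}")
    print(f"Net benefit at t={t}: model {net_benefit(y, p, t):.3f}, "
          f"treat-all {net_benefit(y, [1.0] * len(y), t):.3f}")
```

A decision curve repeats the `net_benefit` comparison across a range of thresholds, plotting the model against the treat-all strategy (all probabilities set to 1.0) and the treat-none strategy (net benefit 0). Calibration plots or a calibration slope would usually complement the single calibration-in-the-large summary shown here.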
Keywords
Type 2 Diabetes; Electronic Health Records; Logistic Regression; Predictive Modeling; Calibration; Clinical Decision Support.