Early and accurate cancer identification remains one of the most significant challenges in clinical data analysis. This paper presents and evaluates a hybrid classification model that combines Random Forest (RF) and Support Vector Machine (SVM) classifiers to improve diagnostic accuracy on publicly available cancer data from Kaggle. The hybrid soft-voting ensemble (RF + SVM) is compared against the standalone models to measure gains in predictive performance and robustness. Feature scaling, a stratified data split, and probabilistic voting were applied to support balanced learning and generalization. In the comparative evaluation, the hybrid model outperformed both base classifiers, achieving an accuracy of 0.972, precision of 1.000, recall of 0.925, F1-score of 0.961, and AUC of 0.998. ROC, Precision–Recall, and calibration plots confirmed the hybrid model's superior discriminative performance and calibration robustness. A threshold sweep analysis further improved classification sensitivity for identifying malignancies. The results indicate that combining tree-based ensemble learning with kernel-based separation enhances classification robustness in medical diagnosis tasks. The hybrid approach is a promising foundation for next-generation intelligent diagnostic systems that integrate multiple learning paradigms to handle complex biomedical data.
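The soft-voting and threshold-sweep steps named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the equal weights, the thresholds, and the eight toy probabilities are all hypothetical and chosen only to show the mechanics. In practice the per-model probabilities would come from fitted RF and SVM classifiers, for example through scikit-learn's `VotingClassifier` with `voting="soft"`.

```python
from typing import Dict, List, Sequence, Tuple


def soft_vote(p_rf: Sequence[float], p_svm: Sequence[float],
              w_rf: float = 0.5, w_svm: float = 0.5) -> List[float]:
    """Weighted average of the two models' malignant-class probabilities."""
    total = w_rf + w_svm
    return [(w_rf * a + w_svm * b) / total for a, b in zip(p_rf, p_svm)]


def precision_recall_f1(y_true: Sequence[int], p: Sequence[float],
                        threshold: float) -> Tuple[float, float, float]:
    """Metrics when a case is called malignant (1) if its probability >= threshold."""
    y_pred = [1 if prob >= threshold else 0 for prob in p]
    tp = sum(1 for t, y in zip(y_true, y_pred) if t == 1 and y == 1)
    fp = sum(1 for t, y in zip(y_true, y_pred) if t == 0 and y == 1)
    fn = sum(1 for t, y in zip(y_true, y_pred) if t == 1 and y == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


def threshold_sweep(y_true: Sequence[int], p: Sequence[float],
                    thresholds: Sequence[float]
                    ) -> Dict[float, Tuple[float, float, float]]:
    """Evaluate precision, recall and F1 at each candidate decision threshold."""
    return {t: precision_recall_f1(y_true, p, t) for t in thresholds}


# Toy data (hypothetical): 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
p_rf = [0.9, 0.8, 0.6, 0.3, 0.5, 0.2, 0.1, 0.1]     # assumed RF probabilities
p_svm = [0.8, 0.9, 0.4, 0.5, 0.26, 0.3, 0.1, 0.2]   # assumed SVM probabilities

p_hybrid = soft_vote(p_rf, p_svm)
sweep = threshold_sweep(y_true, p_hybrid, [0.35, 0.45, 0.55])

for t, (prec, rec, f1) in sweep.items():
    print(f"threshold={t:.2f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

On this toy data, lowering the threshold raises recall (sensitivity) at the cost of precision, which is the trade-off a threshold sweep is meant to expose.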
Keywords
Cancer Classification, Hybrid Model, Ensemble Learning, Precision-Recall Analysis, ROC Curve, Soft Voting, Machine Learning, Medical Data Analytics.
References
H. Zerouaoui and A. Idri, “Reviewing machine learning and image processing based decision-making systems for breast cancer imaging,” Journal of Medical Systems, vol. 45, 2021, [Online]. Available: https://doi.org/10.1007/s10916-020-01689-1.
C. D. Lehman, S. Mercaldo, L. R. Lamb, T. A. King, L. W. Ellisen, M. Specht, and R. M. Tamimi, “Deep learning vs traditional breast cancer risk models to support risk-based mammography screening,” Journal of the National Cancer Institute, vol. 114, pp. 1355-1363, 2022, [Online]. Available: https://doi.org/10.1093/jnci/djac142.
A. Yala, P. G. Mikhael, F. Strand, G. Lin, K. Smith, Y.-L. Wan, L. Lamb, K. Hughes, C. Lehman, and R. Barzilay, “Toward robust mammography-based models for breast cancer risk,” Science Translational Medicine, vol. 13, 2021, [Online]. Available: https://doi.org/10.1126/scitranslmed.aba4373.
F. Janan and M. Brady, “RICE: A method for quantitative mammographic image enhancement,” Medical Image Analysis, vol. 71, 2021, [Online]. Available: https://doi.org/10.1016/j.media.2021.102043.
J. H. Yoon and E. K. Kim, “Deep learning-based artificial intelligence for mammography,” Korean Journal of Radiology, vol. 22, pp. 1225-1239, 2021, [Online]. Available: https://doi.org/10.3348/KJR.2020.1210.
P. E. Jebarani, N. Umadevi, H. Dang, and M. Pomplun, “A novel hybrid K-means and GMM machine learning model for breast cancer detection,” IEEE Access, vol. 9, pp. 146153-146162, 2021, [Online]. Available: https://doi.org/10.1109/ACCESS.2021.3123425.
S. Prakash, M. V. Kumar, R. S. Ram, M. Zivkovic, N. Bacanin, and M. Antonijevic, “Hybrid GLFIL enhancement and encoder animal migration classification for breast cancer detection,” Computer Systems Science & Engineering, vol. 41, pp. 735-749, 2022, [Online]. Available: https://doi.org/10.32604/csse.2022.020533.
N. Dhungel, G. Carneiro, and A. P. Bradley, “Deep structured learning for mass segmentation from mammograms,” in Proceedings of the IEEE International Conference on Image Processing (ICIP), 2015, pp. 2950-2954.
D. Arefan, A. A. Mohamed, W. A. Berg, M. L. Zuley, J. H. Sumkin, and S. Wu, “Deep learning modeling using normal mammograms for predicting breast cancer risk,” Medical Physics, vol. 47, pp. 110-118, 2020, [Online]. Available: https://doi.org/10.1002/mp.13886.
J. Dabass, S. Arora, R. Vig, and M. Hanmandlu, “Mammogram image enhancement using entropy and CLAHE based intuitionistic fuzzy method,” in Proceedings of the 2019 6th International Conference on Signal Processing and Integrated Networks (SPIN), 2019, pp. 24-29, [Online]. Available: https://doi.org/10.1109/SPIN.2019.8711673.
N. Kharel, A. Alsadoon, P. W. C. Prasad, and A. Elchouemi, “Early diagnosis of breast cancer using contrast limited adaptive histogram equalization (CLAHE) and Morphology methods,” in Proceedings of the 2017 8th International Conference on Information and Communication Systems (ICICS), 2017, pp. 120-124.
V. D. P. Jasti, A. S. Zamani, K. Arumugam, M. Naved, F. Sammy, A. Raghuvanshi, and K. Kaliyaperumal, “Computational technique based on machine learning and image processing for medical image analysis of breast cancer diagnosis,” Security and Communication Networks, 2022, [Online]. Available: https://doi.org/10.1155/2022/1918379.
X. Wang, I. Ahmad, D. Javeed, S. A. Zaidi, F. M. Alotaibi, M. E. Ghoneim, Y. I. Daradkeh, J. Asghar, and E. T. Eldin, “Intelligent hybrid deep learning model for breast cancer detection,” Electronics, vol. 11, 2022, [Online]. Available: https://doi.org/10.3390/electronics11172767.
A. Akselrod-Ballin, M. Chorev, Y. Shoshan, A. Spiro, A. Hazan, R. Melamed, E. Barkan, E. Herzel, S. Naor, E. Karavani, G. Koren, Y. Goldschmidt, V. Shalev, M. Guindy, and M. Rosen-Zvi, “Predicting breast cancer by applying deep learning to linked health records and mammograms,” Radiology, vol. 292, pp. 331-342, 2019, [Online]. Available: https://doi.org/10.1148/radiol.2019182622.
T. Kyono, F. J. Gilbert, and M. van der Schaar, “Improving workflow efficiency for mammography using machine learning,” Journal of the American College of Radiology, vol. 17, pp. 56-63, 2020, [Online]. Available: https://doi.org/10.1016/j.jacr.2019.05.012.
D. Oyewola, D. Hakimi, K. Adeboye, and M. D. Shehu, “Using five machine learning for breast cancer biopsy predictions based on mammographic diagnosis,” International Journal of Engineering Technologies (IJET), vol. 2, pp. 142-145, 2017, [Online]. Available: https://doi.org/10.19072/ijet.280563.
M. L. Giger, N. Karssemeijer, and J. A. Schnabel, “Breast image analysis for risk assessment, detection, diagnosis, and treatment of cancer,” Annual Review of Biomedical Engineering, vol. 15, pp. 327-357, 2013, [Online]. Available: https://doi.org/10.1146/annurev-bioeng-071812-152416.
M. A. Elshafey and T. E. Ghoniemy, “A hybrid ensemble deep learning approach for reliable breast cancer detection,” International Journal of Advances in Intelligent Informatics, vol. 7, p. 112, 2021.
S. Zhang, L. Yao, A. Sun, and Y. Tay, “Deep learning based recommender system: a survey and new perspectives,” ACM Computing Surveys, vol. 52, 2019, [Online]. Available: https://doi.org/10.1145/3285029.
T. Liu, J. Huang, T. Liao, R. Pu, S. Liu, and Y. Peng, “A hybrid deep learning model for predicting molecular subtypes of human breast cancer using multimodal data,” IRBM, vol. 43, pp. 62-74, 2022, [Online]. Available: https://doi.org/10.1016/j.irbm.2020.12.002.
G. Jayandhi, J. S. Leena Jasmine, R. Seetharaman, S. M. Joans, and R. Priscilla Joy, “Efficient breast cancer prediction using hybrid deep learning in mammographic images,” in Proceedings of the International Conference on Electronics and Renewable Systems (ICEARS 2022), 2022, pp. 1366-1371.
L. Narayanan, S. Krishnan, and H. Robinson, “A hybrid deep learning based assist system for detection and classification of breast cancer from mammogram images,” International Arab Journal of Information Technology, vol. 19, 2022.
F. Yan, H. Huang, W. Pedrycz, and K. Hirota, “A disease diagnosis system for smart healthcare based on fuzzy clustering and battle royale optimization,” Applied Soft Computing, vol. 151, 2024, [Online]. Available: https://doi.org/10.1016/J.ASOC.2023.111123.
J. de Nazaré Silva, A. O. de Carvalho Filho, A. Corrêa Silva, A. Cardoso de Paiva, and M. Gattass, “Automatic detection of masses in mammograms using quality threshold clustering, correlogram function, and SVM,” Journal of Digital Imaging, vol. 28, p. 323, 2015, [Online]. Available: https://doi.org/10.1007/S10278-014-9739-3.
W. Borges De Sampaio, A. Corrêa Silva, A. Cardoso De Paiva, and M. Gattass, “Detection of masses in mammograms with adaption to breast density using genetic algorithm, phylogenetic trees, LBP and SVM,” Expert Systems with Applications, vol. 42, pp. 8911-8928, 2015, [Online]. Available: https://doi.org/10.1016/j.eswa.2015.07.046.
M. A. Al-antari, M. A. Al-masni, M. T. Choi, S. M. Han, and T. S. Kim, “A fully integrated computer-aided diagnosis system for digital X-ray mammograms via deep learning detection, segmentation, and classification,” International Journal of Medical Informatics, vol. 117, pp. 44-54, 2018, [Online]. Available: https://doi.org/10.1016/J.IJMEDINF.2018.06.003.
E. Taha, “Cancer Data,” Kaggle Datasets, 2023, [Online]. Available: https://www.kaggle.com/datasets/erdemtaha/cancer-data.