Anomaly detection in system logs is crucial for cybersecurity; however, the size and complexity of these logs make manual analysis impractical. Traditional supervised methods require large labelled datasets, whereas unsupervised methods often lack robustness. To address these difficulties, we propose a novel semi-supervised framework for anomaly detection in cybersecurity logs that combines hybrid feature representation, deep learning models, traditional machine learning models, and ensemble techniques. The framework is organised in several layers. The feature representation layer combines sparse TF-IDF features, semantic SBERT embeddings, and statistical features. The detection layer employs an autoencoder, a Bi-LSTM module, and two traditional models: an Isolation Forest and a One-Class Support Vector Machine. The outputs of these models are then integrated in two stages: weighted averaging (soft voting), followed by stacking with a random forest meta-learner. Experimental results on the HDFS dataset demonstrate that this hybrid semi-supervised approach improves detection accuracy, scalability, and robustness, offering an efficient method for strengthening cybersecurity through log-based anomaly detection.