Evaluation Framework for Crowd Counting Analysis Using Deep Learning Techniques

Mousa, Fatima

doi:10.25673/123511

Proceedings of International Conference on Applied Innovation in IT · 2026/03/31 · Vol. 14 · Issue 1 · pp. 19–31

Fatima Jawad Kadhim and Ayad Hameed Mousa

Abstract

Crowd counting is the process of determining the number of people in a specific area. It has broad applications in urban planning, healthcare, emergency management, security, and military strategy. The task is complicated by visual distortions, perspective variation, and the uneven distribution of individuals, all of which make accurate counting harder, especially in densely populated areas. Convolutional neural networks (CNNs) and the creation of large datasets have driven significant progress in crowd counting methods in recent years. Although deep learning has greatly advanced the field, a comprehensive analysis of the methodologies, challenges, and limitations of recent studies is lacking. This paper addresses this gap by: (1) conducting a Systematic Literature Review (SLR) of crowd-counting research; (2) proposing a novel evaluation framework (EFC2ADL) to classify and assess studies based on key criteria such as datasets, loss functions, and metrics; and (3) validating the framework's relevance and comprehensiveness through expert review before applying it to evaluate 50 recent papers. The proposed framework provides a structured basis for understanding trends and guiding future research in deep learning-based crowd counting.

Keywords

Crowd Counting, Convolutional Neural Network, Deep Learning, Evaluation Framework, Systematic Literature Review
