Proceedings of International Conference on Applied Innovation in IT  ·  2026/03/31  ·  Vol. 14  ·  Issue 1  ·  pp. 19–31
Evaluation Framework for Crowd Counting Analysis Using Deep Learning Techniques
Fatima Jawad Kadhim and Ayad Hameed Mousa
Crowd counting is the task of determining the number of people in a specific area. It has broad applications in urban planning, healthcare, emergency management, security, and military strategy. The task is complicated by visual distortions, perspective variation, and the uneven distribution of individuals, all of which make accurate counting harder, especially in densely populated areas. In recent years, convolutional neural networks (CNNs) and the creation of large datasets have driven significant progress in crowd counting methods. Although deep learning has greatly advanced the field, a comprehensive analysis of the methodologies, challenges, and limitations of recent studies is lacking. This paper addresses this gap by: (1) conducting a Systematic Literature Review (SLR) of crowd-counting research; (2) proposing a novel evaluation framework (EFC2ADL) that classifies and assesses studies on key criteria such as datasets, loss functions, and metrics; and (3) validating the framework's relevance and comprehensiveness through expert review before using it to evaluate 50 recent papers. The proposed framework provides a structured basis for understanding trends and guiding future research in deep learning-based crowd counting.
Keywords: Crowd Counting, Convolutional Neural Network, Deep Learning, Evaluation Framework, Systematic Literature Review.
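As a rough illustration of what classifying studies along the criteria named in the abstract (datasets, loss functions, metrics) could look like in practice, the following Python sketch tallies how often each value appears across a set of reviewed papers. It is not the EFC2ADL framework itself: the `StudyRecord` type, the `tally` helper, and every entry in `records` are hypothetical placeholders introduced here for illustration only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyRecord:
    """One reviewed paper, described by the criteria the abstract names.

    Hypothetical structure; the paper's actual framework may use other fields.
    """
    title: str
    year: int
    datasets: tuple[str, ...]
    loss_functions: tuple[str, ...]
    metrics: tuple[str, ...]


def tally(records, criterion):
    """Count how many studies report each value of one criterion."""
    counts = Counter()
    for record in records:
        # set() ensures a study is counted once per value, even if listed twice.
        counts.update(set(getattr(record, criterion)))
    return counts


# Placeholder entries only; these are not taken from the reviewed papers.
records = [
    StudyRecord("Study A", 2021, ("Dataset-1", "Dataset-2"), ("L2",), ("MAE", "RMSE")),
    StudyRecord("Study B", 2023, ("Dataset-1",), ("L2", "Bayesian"), ("MAE",)),
]

dataset_counts = tally(records, "datasets")
metric_counts = tally(records, "metrics")
print(dataset_counts.most_common())
print(metric_counts.most_common())
```

Keeping each criterion as a tuple of labels makes it straightforward to add further criteria later without changing the tallying logic.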

Proceedings of the International Conference on Applied Innovations in IT by Anhalt University of Applied Sciences is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).
