Proceedings of International Conference on Applied Innovation in IT
2026/03/31, Volume 14, Issue 1, pp.651-660

Recognizing Gesture Images with ViT and Spatial Attention Regularization


Zahraa Thamer, Noor S. Sagheer, Ashwan A. Abdulmunem, Hawraa Thamer and Oğuz Ata


Abstract: Image-based gesture recognition is an important area of Human-Computer Interaction (HCI). Despite substantial advances, reliable and accurate gesture recognition in unrestricted, real-world settings remains difficult. Conventional techniques often struggle with changes in lighting, background noise, occlusions, size variations, and the inherent similarity between different gestures. To enhance the discriminative ability of the Vision Transformer (ViT) model for intricate hand gestures, this work presents a carefully designed fine-tuning methodology. To encourage ViT to concentrate on salient gesture regions while remaining resilient to environmental noise, the proposed method combines an adaptive learning rate scheduling scheme with a novel spatial attention regularizer during fine-tuning. Experiments on a challenging and varied gesture dataset show that the proposed approach performs significantly better than state-of-the-art methods, with accuracy reaching 100%, and demonstrate its ability to generalize. By providing an effective and flexible framework for image-based gesture recognition, this study opens the door to more user-friendly human-computer interaction systems.
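
The abstract does not define the spatial attention term or the learning rate schedule, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes the regularizer penalizes diffuse attention through the entropy of each head's attention weights over image patches, and it uses linear warmup with cosine decay as a stand-in schedule. The names `attention_entropy`, `regularized_loss`, `warmup_cosine_lr`, `lam`, `base_lr`, and `warmup_steps` are hypothetical.

```python
import math


def attention_entropy(attn):
    """Shannon entropy of one attention distribution over image patches."""
    total = sum(attn)
    probs = [a / total for a in attn]
    # Zero-weight patches contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0)


def regularized_loss(task_loss, head_attention, lam=0.01):
    """Task loss plus an entropy penalty on diffuse attention.

    ASSUMPTION: the entropy form is illustrative; the abstract does not
    state how the spatial attention term is computed.
    head_attention: one list of patch weights per attention head.
    lam: hypothetical regularization strength.
    """
    mean_entropy = sum(attention_entropy(a) for a in head_attention) / len(head_attention)
    return task_loss + lam * mean_entropy


def warmup_cosine_lr(step, total_steps, base_lr=1e-4, warmup_steps=100):
    """Linear warmup, then cosine decay (ASSUMPTION: stand-in schedule)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    focused = [[1.0, 0.0, 0.0, 0.0]]      # all weight on one patch
    diffuse = [[0.25, 0.25, 0.25, 0.25]]  # weight spread evenly
    print(regularized_loss(1.0, focused), regularized_loss(1.0, diffuse))
    print(warmup_cosine_lr(0, 1000), warmup_cosine_lr(500, 1000))
```

Under this assumed form, attention concentrated on a few patches adds almost no penalty, while evenly spread attention adds the largest one, which is one way to push a model toward salient regions.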

Keywords: Computer Vision, ViT, Gesture Images, Spatial Attention Regularization, Image Analysis, Hand Gesture Recognition.

DOI: Under indexing





 

        

         Proceedings of the International Conference on Applied Innovations in IT by Anhalt University of Applied Sciences is licensed under CC BY-SA 4.0




           ISSN 2199-8876
           Publisher: Edition Hochschule Anhalt
           Location: Anhalt University of Applied Sciences
           Email: leiterin.hsb@hs-anhalt.de
           Phone: +49 (0) 3496 67 5611
           Address: Building 01 - Red Building, Top floor, Room 425, Bernburger Str. 55, D-06366 Köthen, Germany
