Proceedings of International Conference on Applied Innovation in IT  ·  2026/04/22  ·  Vol. 14  ·  Issue 2  ·  pp. 103–110
A Comparative Study of Noise Effects, VAD, and ASR Methods in
Anastasiia Sapeha, Ibrahim Kovan, Subashkumar Rajanayagam, Kirill Karpov, Maksim Gering,
Automatic speech recognition (ASR) deployed in real environments is strongly affected by background noise
Speech Recognition, Noise-Robust Speech Processing, Voice Activity Detection, Automatic Speech Recog-

Proceedings of the International Conference on Applied Innovations in IT by Anhalt University of Applied Sciences is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).