2025/04/26, Volume 13, Issue 1, pp.1-7

Enhancing Voice Activity Detection for an Elderly-Centric Self-Learning Conversational Robot Partner in Noisy Environments


Subashkumar Rajanayagam, Max Andreas Ingrisch, Pascal Müller, Patrick Jahn and Stefan Twieg


Abstract: Voice Activity Detection (VAD) is a core component of Human-Robot Interaction (HRI), especially for use cases such as a self-learning, personalized conversational robot partner designed to support elderly users with high acceptance. While state-of-the-art lightweight deep-learning-based VAD models achieve high precision, they often suffer from low recall in environments with significant background noise or music. Traditional lightweight rule-based VAD methods, in contrast, tend to yield higher recall at the expense of precision. These limitations can degrade the user experience, particularly for elderly individuals, by causing frustration over missed spoken inputs and reducing the overall usability and acceptance of conversational robot partners. This study investigates noise-suppressing preprocessing techniques to enhance both the recall and precision of existing VAD systems. Experimental results demonstrate that effective noise suppression prior to VAD processing substantially improves voice detection accuracy in noisy settings, ultimately promoting better interaction quality in elderly-centric robotic applications. Moreover, optimal settings for sample rate, frame duration, threshold, and voice activity mode were identified for the robot Double3, the conversational robot partner platform for seniors in a care home, developed co-creatively in reflection sessions with the nursing staff. Robustness was benchmarked on an open-source dataset and on a dataset collected and annotated in-house with the Double3 robot.
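The rule-based end of the spectrum contrasted in the abstract, and the tuning parameters identified for Double3 (sample rate, frame duration, threshold), can be illustrated with a minimal energy-threshold detector. This is a hypothetical sketch for illustration only, not one of the evaluated systems; the function names, parameter names, and default values are assumptions.

```python
import math


def frame_energies(samples, sample_rate=16000, frame_ms=30):
    """Split a mono PCM signal (floats in [-1, 1]) into fixed-length
    frames and return the RMS energy of each complete frame."""
    frame_len = int(sample_rate * frame_ms / 1000)
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        energies.append(rms)
    return energies


def simple_vad(samples, sample_rate=16000, frame_ms=30, threshold=0.05):
    """Per-frame speech/non-speech decision: a frame counts as speech
    when its RMS energy exceeds the threshold."""
    return [e > threshold
            for e in frame_energies(samples, sample_rate, frame_ms)]


# Example: 30 ms of silence followed by 30 ms of a 440 Hz tone.
silence = [0.0] * 480
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(480)]
print(simple_vad(silence + tone))  # -> [False, True]
```

Lowering `threshold` raises recall at the cost of precision, mirroring the trade-off the abstract describes for rule-based VAD; conversely, noise suppression applied before such a detector pushes noise-frame energies below the threshold, which is the intuition behind the preprocessing pipeline studied here.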

Keywords: Voice Activity Detection, Human-Robot Interaction, Conversational Robot Partner, Elderly-Centric.

DOI: Under Indexing





         Proceedings of the International Conference on Applied Innovations in IT by Anhalt University of Applied Sciences is licensed under CC BY-SA 4.0




           ISSN 2199-8876
           Publisher: Edition Hochschule Anhalt
           Location: Anhalt University of Applied Sciences
           Email: leiterin.hsb@hs-anhalt.de
           Phone: +49 (0) 3496 67 5611
           Address: Building 01 - Red Building, Top floor, Room 425, Bernburger Str. 55, D-06366 Köthen, Germany

