RespVAD: Voice Activity Detection via Video-Extracted Respiration Patterns
Arnab Kumar Mondal, Prathosh A.P

TL;DR
This paper introduces RespVAD, a novel voice activity detection method that leverages respiration patterns extracted from video to improve detection accuracy in noisy environments, independent of lip or mouth region analysis.
Contribution
The paper presents a respiration-based VAD approach that uses optical flow to extract breathing patterns from video and neural sequence-to-sequence models to detect speech activity, offering an alternative to lip-based visual methods.
Findings
Outperforms previous audio-visual VAD methods in noisy conditions
Effective respiration pattern extraction from abdominal-thoracic video regions
Neural sequence-to-sequence models accurately detect speech activity
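As a rough illustration of how a respiration-like signal might be read off an abdominal-thoracic video region, the sketch below estimates one global vertical displacement per frame pair and accumulates it over time. It is not the authors' pipeline: the single-parameter Lucas-Kanade estimate stands in for whatever optical flow method the paper uses, and the function names, the `roi` tuple layout, and the grayscale-frame input are all assumptions made for this example.

```python
import numpy as np

def vertical_flow(prev, curr, eps=1e-8):
    """Estimate one global vertical shift (in pixels) between two grayscale frames.

    Assumption: a single-parameter Lucas-Kanade fit, i.e. the least-squares
    solution of Iy * v + It = 0 over the whole crop. Positive v means the
    image content moved toward larger row indices (downward).
    """
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)
    iy = np.gradient(0.5 * (prev + curr), axis=0)  # spatial gradient along rows
    it = curr - prev                               # temporal difference
    return -np.sum(iy * it) / (np.sum(iy * iy) + eps)

def respiration_signal(frames, roi):
    """Accumulate per-frame vertical shifts inside a region of interest.

    frames: sequence of 2D arrays (grayscale frames of equal size).
    roi: hypothetical (top, bottom, left, right) crop covering the chest/abdomen.
    Returns a zero-mean 1D array with len(frames) - 1 samples.
    """
    top, bottom, left, right = roi
    crops = [f[top:bottom, left:right] for f in frames]
    flows = [vertical_flow(a, b) for a, b in zip(crops[:-1], crops[1:])]
    signal = np.cumsum(flows)  # integrate velocity into displacement
    return signal - signal.mean()
```

In a full system this one-dimensional signal would be the input to the sequence model that labels speech and non-speech frames. A dense optical flow method would likely be more robust to clothing texture and camera motion than this global estimate.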
Abstract
Voice Activity Detection (VAD) refers to the task of identification of regions of human speech in digital signals such as audio and video. While VAD is a necessary first step in many speech processing systems, it poses challenges when there are high levels of ambient noise during the audio recording. To improve the performance of VAD in such conditions, several methods utilizing the visual information extracted from the region surrounding the mouth/lip region of the speakers' video recording have been proposed. Even though these provide advantages over audio-only methods, they depend on faithful extraction of lip/mouth regions. Motivated by these, a new paradigm for VAD based on the fact that respiration forms the primary source of energy for speech production is proposed. Specifically, an audio-independent VAD technique using the respiration pattern extracted from the speakers' video…
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Blind Source Separation Techniques
