# SincQDR-VAD: A Noise-Robust Voice Activity Detection Framework Leveraging Learnable Filters and Ranking-Aware Optimization

**Authors:** Chien-Chun Wang, En-Lun Yu, Jeih-Weih Hung, Shih-Chieh Huang, Berlin Chen

arXiv: 2508.20885 · 2025-08-29

## TL;DR

SincQDR-VAD is a compact, noise-robust voice activity detection (VAD) framework that pairs a learnable bandpass-filter front-end with a ranking loss aimed at AUROC, improving detection in noisy conditions while using fewer parameters than prior methods.

## Contribution

The paper proposes a VAD framework that combines a learnable Sinc-extractor front-end with a quadratic disparity ranking loss, improving noise robustness while reducing the parameter count.
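
To make the idea of a learnable bandpass front-end concrete, here is a minimal sketch of a single windowed-sinc band-pass filter. It assumes the common construction in which each filter is defined only by a low cutoff and a bandwidth; this summary does not give the paper's exact parameterization, so the function names, the Hamming window, the kernel size, and the 16 kHz sample rate are all illustrative assumptions. In a trainable model, `low_hz` and `band_hz` would be parameters updated by gradient descent in an autodiff framework rather than fixed numbers.

```python
import math


def sinc_bandpass(low_hz, band_hz, kernel_size=101, sample_rate=16000):
    """Windowed-sinc band-pass taps for the band [low_hz, low_hz + band_hz].

    Hypothetical helper: only the two cutoff-related values define the filter,
    which is what keeps a sinc-based front-end small.
    """
    f1 = abs(low_hz) / sample_rate                   # normalized low cutoff
    f2 = (abs(low_hz) + abs(band_hz)) / sample_rate  # normalized high cutoff
    half = (kernel_size - 1) // 2
    taps = []
    for i in range(kernel_size):
        n = i - half
        if n == 0:
            ideal = 2.0 * (f2 - f1)  # limit of the expression below as n -> 0
        else:
            # Difference of two ideal low-pass impulse responses.
            ideal = (math.sin(2 * math.pi * f2 * n)
                     - math.sin(2 * math.pi * f1 * n)) / (math.pi * n)
        # Hamming window to reduce ripple from truncating the ideal response.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        taps.append(ideal * window)
    return taps


def gain(taps, freq_hz, sample_rate=16000):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / sample_rate
    re = sum(t * math.cos(w * k) for k, t in enumerate(taps))
    im = sum(t * math.sin(w * k) for k, t in enumerate(taps))
    return math.hypot(re, im)


if __name__ == "__main__":
    taps = sinc_bandpass(low_hz=300.0, band_hz=3000.0)
    print("gain inside band (1500 Hz): ", round(gain(taps, 1500.0), 3))
    print("gain outside band (6000 Hz):", round(gain(taps, 6000.0), 3))
```

Because each filter is described by two scalars instead of a full set of free taps, a bank of such filters adds very few parameters, which is in line with the compactness the paper emphasizes.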

## Key findings

- Considerably improves both AUROC and F2-Score on representative benchmark datasets.
- Uses only 69% of the parameters of prior methods.
- Demonstrates robustness in noisy and resource-limited environments.
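
For readers unfamiliar with the two metrics named above, the following sketch computes them from frame-level scores using their standard definitions: AUROC as the fraction of (speech, non-speech) frame pairs ranked correctly, and F2 as the F-beta score with beta = 2, which weights recall more heavily than precision. The function names, the 0.5 threshold, and the toy scores are illustrative assumptions, not values from the paper.

```python
def auroc(scores, labels):
    """AUROC as the probability that a speech frame outscores a non-speech frame.

    Ties count as half a correct pair. Quadratic in the number of frames,
    so this form is meant for illustration rather than large datasets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f_beta(scores, labels, threshold=0.5, beta=2.0):
    """F-beta score after thresholding the scores; beta=2 gives the F2-Score."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy frame scores: three speech frames (1) and three non-speech frames (0).
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    print("AUROC:", auroc(scores, labels))   # 8 of 9 pairs are ordered correctly
    print("F2:   ", f_beta(scores, labels))  # TP=2, FP=1, FN=1 at threshold 0.5
```

Note that AUROC depends only on the ordering of scores, not on any threshold, whereas F2 is computed at a fixed operating point.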

## Abstract

Voice activity detection (VAD) is essential for speech-driven applications, but remains far from perfect in noisy and resource-limited environments. Existing methods often lack robustness to noise, and their frame-wise classification losses are only loosely coupled with the evaluation metric of VAD. To address these challenges, we propose SincQDR-VAD, a compact and robust framework that combines a Sinc-extractor front-end with a novel quadratic disparity ranking loss. The Sinc-extractor uses learnable bandpass filters to capture noise-resistant spectral features, while the ranking loss optimizes the pairwise score order between speech and non-speech frames to improve the area under the receiver operating characteristic curve (AUROC). A series of experiments conducted on representative benchmark datasets show that our framework considerably improves both AUROC and F2-Score, while using only 69% of the parameters compared to prior art, confirming its efficiency and practical viability.
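
The abstract describes the ranking loss only at a high level, so the sketch below is not the paper's formula. It is a generic pairwise surrogate with a quadratic penalty, shown to illustrate what "optimizing the pairwise score order between speech and non-speech frames" can look like in practice. The function name, the `margin` parameter, and the squared-hinge form are assumptions made for illustration.

```python
def pairwise_quadratic_ranking_loss(scores, labels, margin=1.0):
    """Mean squared-hinge penalty over all (speech, non-speech) frame pairs.

    A pair contributes zero once the speech score exceeds the non-speech score
    by at least `margin`; otherwise the shortfall is penalized quadratically.
    Generic illustrative surrogate, NOT the loss defined in the paper.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.0  # no pairs to rank in this batch
    total = sum(max(0.0, margin - (p - n)) ** 2 for p in pos for n in neg)
    return total / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [2.0, 1.5, 0.0, -0.5]
    # Speech frames score higher than non-speech frames by at least the margin.
    print(pairwise_quadratic_ranking_loss(scores, [1, 1, 0, 0]))
    # Same scores with the labels swapped: every pair is mis-ordered.
    print(pairwise_quadratic_ranking_loss(scores, [0, 0, 1, 1]))
```

Unlike a frame-wise cross-entropy loss, a pairwise objective of this kind is driven by score differences between the two classes, which is the same quantity that determines AUROC.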

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20885/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20885/full.md

## References

59 references — full list in the complete paper: https://tomesphere.com/paper/2508.20885/full.md

---
Source: https://tomesphere.com/paper/2508.20885