SparseVSR: Lightweight and Noise Robust Visual Speech Recognition
Adriana Fernandez-Lopez, Honglie Chen, Pingchuan Ma, Alexandros Haliassos, Stavros Petridis, Maja Pantic

TL;DR
SparseVSR introduces a lightweight, noise-robust visual speech recognition model using magnitude-based pruning, achieving state-of-the-art results and superior noise resistance compared to dense models.
Contribution
The paper presents a novel sparse model for visual speech recognition that outperforms dense models in noisy conditions and maintains high accuracy at high sparsity levels.
Findings
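Sparse models outperform their dense equivalent at sparsity levels up to 70%.
Sparse networks are more resistant to visual noise: across 7 noise types, the 50% sparse model improves on the dense equivalent by more than 2% absolute WER overall.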
Achieved state-of-the-art results at 10% sparsity on LRS3.
Abstract
Recent advances in deep neural networks have achieved unprecedented success in visual speech recognition. However, there remains substantial disparity between current methods and their deployment in resource-constrained devices. In this work, we explore different magnitude-based pruning techniques to generate a lightweight model that achieves higher performance than its dense model equivalent, especially under the presence of visual noise. Our sparse models achieve state-of-the-art results at 10% sparsity on the LRS3 dataset and outperform the dense equivalent up to 70% sparsity. We evaluate our 50% sparse model on 7 different visual noise types and achieve an overall absolute improvement of more than 2% WER compared to the dense equivalent. Our results confirm that sparse networks are more resistant to noise than dense networks.
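The abstract does not say which magnitude-based pruning variants were explored, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes one-shot, unstructured pruning of a single flat weight list. The names magnitude_prune and measured_sparsity and the example weights are hypothetical.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> Tuple[List[float], List[bool]]:
    """Zero out the fraction `sparsity` of weights with the smallest absolute value.

    Returns the pruned weights and a keep-mask (True = weight kept).
    Assumption: one-shot, unstructured pruning over a single flat list.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(round(sparsity * len(weights)))
    # Rank indices by magnitude, smallest first, and mark the first n_prune for removal.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    mask = [i not in pruned for i in range(len(weights))]
    sparse = [w if keep else 0.0 for w, keep in zip(weights, mask)]
    return sparse, mask


def measured_sparsity(weights: List[float]) -> float:
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


# Hypothetical weights for one layer, pruned to 50% sparsity.
layer = [0.8, -0.05, 0.3, -1.2, 0.01, 0.6, -0.4, 0.02, 0.9, -0.1]
sparse_layer, mask = magnitude_prune(layer, sparsity=0.5)
print(sparse_layer)                     # [0.8, 0.0, 0.0, -1.2, 0.0, 0.6, -0.4, 0.0, 0.9, 0.0]
print(measured_sparsity(sparse_layer))  # 0.5
```

In this sketch, "50% sparsity" means half of the weights are set to zero and kept at zero through the mask. Pruning pipelines commonly apply such a mask gradually during training and fine-tune the surviving weights; whether the paper does so is not stated in the abstract.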
Taxonomy
Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Music and Audio Processing
Methods: Pruning
