A Fast and Lightweight Model for Causal Audio-Visual Speech Separation
Wendi Sang, Kai Li, Runxuan Yang, Jianqiang Huang, Xiaolin Hu

TL;DR
Swift-Net is a lightweight, causal audio-visual speech separation model built for real-time use. It combines visual cues with historical information to improve separation in complex environments.
Contribution
The paper introduces Swift-Net, a causal, lightweight audio-visual speech separation (AVSS) model. A new fusion module and Grouped SRUs enable real-time speech separation with improved efficiency and performance.
Findings
Outperforms existing models on benchmark datasets
Operates effectively in real-time scenarios
Demonstrates robustness in complex environments
Abstract
Audio-visual speech separation (AVSS) aims to extract a target speech signal from a mixed signal by leveraging both auditory and visual (lip movement) cues. However, most existing AVSS methods exhibit complex architectures and rely on future context, operating offline, which renders them unsuitable for real-time applications. Inspired by the pipeline of RTFSNet, we propose a novel streaming AVSS model, named Swift-Net, which enhances the causal processing capabilities required for real-time applications. Swift-Net adopts a lightweight visual feature extraction module and an efficient fusion module for audio-visual integration. Additionally, Swift-Net employs Grouped SRUs to integrate historical information across different feature spaces, thereby improving the utilization efficiency of historical information. We further propose a causal transformation template to facilitate the…
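The grouped recurrent idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that "Grouped SRUs" means splitting the feature channels into groups and running an independent Simple Recurrent Unit on each group. The names `SRUCell` and `GroupedSRU`, the random weights, and the sizes are hypothetical. The demo at the end checks the causal property: changing a future frame leaves all earlier outputs unchanged.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, x):
    # Plain matrix-vector product on nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class SRUCell:
    """One Simple Recurrent Unit over d channels (hypothetical helper)."""

    def __init__(self, d, rng):
        def mat():
            return [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)]

        def vec():
            return [rng.uniform(-0.5, 0.5) for _ in range(d)]

        self.W, self.Wf, self.Wr = mat(), mat(), mat()
        self.vf, self.vr, self.bf, self.br = vec(), vec(), vec(), vec()
        self.c = [0.0] * d  # running state: the only history that is kept

    def step(self, x):
        # Consume one frame; uses only the current input and the past state.
        u, uf, ur = matvec(self.W, x), matvec(self.Wf, x), matvec(self.Wr, x)
        h, c_new = [], []
        for i in range(len(x)):
            c_prev = self.c[i]
            f = sigmoid(uf[i] + self.vf[i] * c_prev + self.bf[i])  # forget gate
            r = sigmoid(ur[i] + self.vr[i] * c_prev + self.br[i])  # reset gate
            c = f * c_prev + (1.0 - f) * u[i]
            c_new.append(c)
            h.append(r * c + (1.0 - r) * x[i])  # highway connection to input
        self.c = c_new
        return h


class GroupedSRU:
    """Assumed grouping scheme: one independent SRU per channel group."""

    def __init__(self, channels, groups, seed=0):
        assert channels % groups == 0
        rng = random.Random(seed)
        self.size = channels // groups
        self.cells = [SRUCell(self.size, rng) for _ in range(groups)]

    def reset(self):
        for cell in self.cells:
            cell.c = [0.0] * self.size

    def step(self, frame):
        out = []
        for g, cell in enumerate(self.cells):
            out.extend(cell.step(frame[g * self.size:(g + 1) * self.size]))
        return out


# Demo: stream 6 frames, then replace the last frame and stream again.
rng = random.Random(1)
frames = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(6)]
model = GroupedSRU(channels=8, groups=4)
out_a = [model.step(f) for f in frames]

model.reset()
changed = frames[:5] + [[0.0] * 8]
out_b = [model.step(f) for f in changed]

# Outputs for frames 0..4 cannot depend on frame 5.
causal_ok = out_a[:5] == out_b[:5]
```

In this sketch each group holds smaller weight matrices than a single full-width unit would, so the parameter count of the projections drops by roughly the number of groups. Whether Swift-Net realizes its efficiency gain in this way is not stated in the text above.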
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Advanced Adaptive Filtering Techniques
