Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models
Christopher Simic, Korbinian Riedhammer, Tobias Bocklet

TL;DR
This paper introduces an adapter-based audio-visual speech recognition (AVSR) extension for a pre-trained, audio-only ASR model (Whisper). It enables noise-specific adaptation by training only a small number of parameters while maintaining high performance.
Contribution
The work presents an adapter-based AVSR extension that supports noise-specific adaptation, trains far fewer parameters than full fine-tuning, and leaves the pre-trained base model unchanged.
Findings
Achieves up to 88.5% reduction in trainable parameters.
Maintains near state-of-the-art performance across various noise scenarios.
Supports extension with additional noise-specific adapters.
Abstract
We present an approach to Audio-Visual Speech Recognition that builds on a pre-trained Whisper model. To infuse visual information into this audio-only model, we extend it with an AV fusion module and LoRA adapters, one of the most up-to-date adapter approaches. One advantage of adapter-based approaches is that only a relatively small number of parameters are trained, while the base model remains unchanged. Common AVSR approaches train single models to handle several noise categories and noise levels simultaneously. Taking advantage of the lightweight nature of adapter approaches, we train noise-scenario-specific adapter sets, each covering individual noise categories or a specific noise-level range. The most suitable adapter set is selected by first classifying the noise scenario. This enables our models to achieve optimum coverage across different noise categories and…
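The adapter idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it applies a generic LoRA-style low-rank update to a single frozen linear map rather than to Whisper, and every name in it (`LoRAAdapter`, `AdaptedLinear`, `make_adapter`, `classify_noise`, the scenario labels, the 0 dB threshold) is a hypothetical chosen for illustration. In particular, `classify_noise` is a placeholder that thresholds a given SNR value; the paper's actual noise classifier is not described in the text above.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass
class LoRAAdapter:
    """Low-rank update: delta(x) = (alpha / r) * B @ (A @ x)."""
    A: Matrix      # shape r x d_in
    B: Matrix      # shape d_out x r
    alpha: float

    @property
    def rank(self) -> int:
        return len(self.A)

    def delta(self, x: List[float]) -> List[float]:
        scale = self.alpha / self.rank
        return [scale * v for v in matvec(self.B, matvec(self.A, x))]

    def num_trainable(self) -> int:
        # Only A and B are trained; the base weight stays frozen.
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)


def make_adapter(d_in: int, d_out: int, rank: int,
                 alpha: float = 1.0, seed: int = 0) -> LoRAAdapter:
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    # Zero-initialised B makes a fresh adapter a no-op on the base model.
    B = [[0.0] * rank for _ in range(d_out)]
    return LoRAAdapter(A, B, alpha)


class AdaptedLinear:
    """A frozen weight W plus one adapter per noise scenario."""

    def __init__(self, W: Matrix, adapters: Dict[str, LoRAAdapter]):
        self.W = W
        self.adapters = adapters

    def forward(self, x: List[float], scenario: str) -> List[float]:
        base = matvec(self.W, x)
        update = self.adapters[scenario].delta(x)
        return [b + u for b, u in zip(base, update)]


def classify_noise(snr_db: float) -> str:
    """Hypothetical stand-in for a noise-scenario classifier."""
    return "low_snr" if snr_db < 0.0 else "high_snr"


# Usage: classify the scenario first, then route through the matching adapter.
d = 8
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
layer = AdaptedLinear(W, {
    "low_snr": make_adapter(d, d, rank=2, seed=1),
    "high_snr": make_adapter(d, d, rank=2, seed=2),
})
x = [float(i) for i in range(d)]
y = layer.forward(x, classify_noise(snr_db=-5.0))
```

In this toy setting each rank-2 adapter holds 32 trainable values against 64 in the frozen weight. The parameter savings reported for the paper depend on its actual architecture and adapter placement, which this sketch does not model.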
Taxonomy
Topics: Fault Detection and Control Systems · Speech Recognition and Synthesis · Advanced Data Processing Techniques
Methods: Adapter
