MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders
Wenyu Zhang, Shuo Sun, Bin Wang, Xunlong Zou, Zhuohan Liu, Yingxu He, Geyu Lin, Nancy F. Chen, Ai Ti Aw

TL;DR
This paper introduces MoWE-Audio, a multitask AudioLLM framework that uses a mixture of lightweight encoders to improve feature extraction and performance across diverse audio tasks.
Contribution
It proposes integrating mixtures of weak encoders into AudioLLMs, enhancing their capacity to handle new tasks without significantly increasing model size.
Findings
MoWE-Audio improves multi-task performance.
The mixture of encoders broadens task applicability.
The lightweight encoders enhance feature extraction without significantly increasing model size.
Abstract
The rapid advancements in large language models (LLMs) have significantly enhanced natural language processing capabilities, facilitating the development of AudioLLMs that process and understand speech and audio inputs alongside text. Existing AudioLLMs typically combine a pre-trained audio encoder with a pre-trained LLM, which are subsequently finetuned on specific audio tasks. However, the pre-trained audio encoder has constrained capacity to capture features for new tasks and datasets. To address this, we propose to incorporate mixtures of 'weak' encoders (MoWE) into the AudioLLM framework. MoWE supplements a base encoder with a pool of relatively lightweight encoders, selectively activated based on the audio input to enhance feature extraction without significantly increasing model size. Our empirical results demonstrate that MoWE effectively improves multi-task performance,…
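The idea of a base encoder supplemented by selectively activated weak encoders can be illustrated with a toy sketch. This is not the authors' implementation: the abstract says only that weak encoders are activated based on the audio input, so the top-k softmax router over mean-pooled input, the random linear "encoders", the concatenation of base and weak features, and every name below (`MixtureOfWeakEncoders`, `make_linear_encoder`, `top_k`, and so on) are assumptions made for illustration.

```python
import math
import random


def make_linear_encoder(in_dim, out_dim, rng):
    """Toy stand-in for an encoder: a fixed random linear map applied per frame."""
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(frames):
        return [[sum(w * x for w, x in zip(row, frame)) for row in weights]
                for frame in frames]

    return encode


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(frames):
    """Average over the time axis to get one vector per clip."""
    return [sum(col) / len(frames) for col in zip(*frames)]


class MixtureOfWeakEncoders:
    """Hypothetical sketch: one base encoder plus a routed pool of small encoders."""

    def __init__(self, in_dim, base_dim, weak_dim, num_weak, top_k, seed=0):
        rng = random.Random(seed)
        self.base = make_linear_encoder(in_dim, base_dim, rng)
        self.weak = [make_linear_encoder(in_dim, weak_dim, rng) for _ in range(num_weak)]
        # Assumed router: scores each weak encoder from the pooled input.
        self.router = make_linear_encoder(in_dim, num_weak, rng)
        self.top_k = top_k
        self.weak_dim = weak_dim

    def route(self, frames):
        probs = softmax(self.router([mean_pool(frames)])[0])
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        return ranked[:self.top_k], probs

    def encode(self, frames):
        chosen, probs = self.route(frames)
        base_out = self.base(frames)  # the base encoder always runs
        norm = sum(probs[i] for i in chosen)
        mixed = [[0.0] * self.weak_dim for _ in frames]
        for i in chosen:  # only the selected weak encoders are evaluated
            gate = probs[i] / norm
            for t, vec in enumerate(self.weak[i](frames)):
                for d, value in enumerate(vec):
                    mixed[t][d] += gate * value
        # Assumed fusion: concatenate base and mixed weak features per frame.
        return [b + m for b, m in zip(base_out, mixed)], chosen


# Usage: 5 frames of 8-dimensional toy "audio" features.
rng = random.Random(1)
audio = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
model = MixtureOfWeakEncoders(in_dim=8, base_dim=6, weak_dim=3, num_weak=4, top_k=2)
features, chosen = model.encode(audio)
print(len(features), len(features[0]), chosen)
```

Because only `top_k` of the weak encoders run for a given input, the per-input compute grows with `top_k` rather than with the size of the pool, which is one way to read the abstract's claim about not significantly increasing model size.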
Code & Models
- 🤗 MERaLiON/MERaLiON-3-10B-preview · model · 322 dl · ♡ 1
- 🤗 MERaLiON/MERaLiON-2-10B · model · 711 dl · ♡ 11
- 🤗 MERaLiON/MERaLiON-2-3B · model · 2.6k dl · ♡ 5
- 🤗 MERaLiON/MERaLiON-2-10B-ASR · model · 1.4k dl · ♡ 10
- 🤗 lewiswoncy/m_test_9 · model · 42 dl
- 🤗 lewiswoncy/m_test_9_11 · model · 2 dl
- 🤗 MERaLiON/MERaLiON-2-3B-MLX · model · 8 dl
- 🤗 MERaLiON/MERaLiON-2-10B-MLX · model · 12 dl
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Adaptive Filtering Techniques
Methods: Balanced Selection
