WavLLM: Towards Robust and Adaptive Speech Large Language Model
Shujie Hu, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Hongkun Hao, Jing Pan, Xunying Liu, Jinyu Li, Sunit Sivasankaran, Linquan Liu, Furu Wei

TL;DR
WavLLM is a novel speech large language model that integrates dual encoders and curriculum learning to achieve state-of-the-art performance across diverse speech tasks, demonstrating robustness and adaptability without task-specific training.
Contribution
The paper introduces WavLLM, a robust and adaptive speech LLM with dual encoders and a prompt-aware LoRA adapter, trained via curriculum learning for broad speech task generalization.
Findings
Achieves state-of-the-art results on multiple speech benchmarks.
Successfully completes Gaokao English listening tasks without specialized training.
Demonstrates robust generalization and complex task execution capabilities.
Abstract
The recent advancements in large language models (LLMs) have revolutionized the field of natural language processing, progressively broadening their scope to multimodal perception and generation. However, effectively integrating listening capabilities into LLMs poses significant challenges, particularly with respect to generalizing across varied contexts and executing complex auditory tasks. In this work, we introduce WavLLM, a robust and adaptive speech large language model with dual encoders, and a prompt-aware LoRA weight adapter, optimized by a two-stage curriculum learning approach. Leveraging dual encoders, we decouple different types of speech information, utilizing a Whisper encoder to process the semantic content of speech, and a WavLM encoder to capture the unique characteristics of the speaker's identity. Within the curriculum learning framework, WavLLM first builds its…
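The two architectural ideas named in the abstract, dual encoders and a prompt-aware LoRA weight adapter, can be illustrated with a toy sketch. This is not the paper's implementation: the dimensions, the random weights, the frame-wise concatenation of the two encoder outputs, and the scalar `gate` are all assumptions made for illustration, and every function name here is hypothetical.

```python
# Illustrative sketch only: toy dimensions and random weights, not the paper's code.
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of small random values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def fuse_frames(semantic_frames, speaker_frames):
    """Concatenate per-frame features from two encoders (assumes equal frame counts)."""
    if len(semantic_frames) != len(speaker_frames):
        raise ValueError("encoders must yield the same number of frames")
    return [s + a for s, a in zip(semantic_frames, speaker_frames)]


def lora_linear(x, w, a, b, gate):
    """Frozen projection w plus a low-rank update b(a x), scaled by a gate."""
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    return [y + gate * u for y, u in zip(base, update)]


rng = random.Random(0)
# Stand-ins for encoder outputs: 3 frames each.
semantic = [[rng.random() for _ in range(4)] for _ in range(3)]  # Whisper-like features
speaker = [[rng.random() for _ in range(2)] for _ in range(3)]   # WavLM-like features
fused = fuse_frames(semantic, speaker)  # 3 frames, 6 features per frame

w = random_matrix(6, 6, rng)  # frozen weight of the base model
a = random_matrix(2, 6, rng)  # LoRA down-projection (rank 2)
b = random_matrix(6, 2, rng)  # LoRA up-projection
gate = 0.5  # hypothetical value; a prompt-aware adapter would derive it from the prompt
out = [lora_linear(frame, w, a, b, gate) for frame in fused]
```

With `gate` set to 0.0 the layer reduces to the frozen projection, which is the sense in which a prompt-dependent gate can adapt how strongly the LoRA update is applied for a given instruction.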
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
Methods: Sparse Evolutionary Training · Adapter
