FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model

Yichen Lu; Jiaqi Song; Chao-Han Huck Yang; Shinji Watanabe

arXiv:2410.03007 · eess.AS · October 7, 2024

TL;DR

FastAdaSP is a weighted token merging framework tailored to speech language models; it improves inference efficiency while maintaining performance on several speech tasks.

Contribution

The paper presents a weighted token merging method designed specifically for speech language models, which accounts for the temporal dependencies unique to speech data.

Findings

1. Achieved 7x memory efficiency and 1.83x decoding throughput improvements.
2. Maintained performance on Emotion Recognition and Spoken Question Answering tasks.
3. Outperformed baseline methods in the efficiency-performance trade-off.

Abstract

In this study, we explore efficient inference for multitask Speech Language Models (SpeechLMs) via token reduction. Unlike other modalities such as vision or text, speech has unique temporal dependencies, so previous efficient inference methods developed for other modalities are not directly applicable. Furthermore, methods for efficient SpeechLM inference on long sequences and sparse signals remain largely unexplored. We therefore propose FastAdaSP, a weighted token merging framework specifically designed for various speech-related tasks to improve the trade-off between efficiency and performance. Experimental results on WavLLM and Qwen-Audio show that our method achieves a state-of-the-art (SOTA) efficiency-performance trade-off compared with other baseline methods. Specifically, FastAdaSP achieved 7x memory efficiency and 1.83x decoding throughput without any degradation on tasks like Emotion…
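
As a rough illustration of what weighted merging of temporally adjacent tokens can look like, the sketch below greedily folds each token into its left neighbor when the two are similar enough, using a weighted average. It is not the FastAdaSP algorithm: the cosine-similarity criterion, the `threshold` value, the per-token `weights`, and the function names `cosine` and `merge_adjacent` are all assumptions made for this example. Merging only neighbors keeps the temporal order of the sequence intact.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def merge_adjacent(tokens, weights, threshold=0.9):
    """Greedy left-to-right merge of adjacent similar tokens.

    tokens:    list of feature vectors in temporal order
    weights:   one positive importance weight per token (hypothetical input,
               e.g. derived from attention scores)
    threshold: assumed similarity cutoff above which neighbors are merged
    """
    if not tokens:
        return [], []
    merged_tokens = [list(tokens[0])]
    merged_weights = [weights[0]]
    for tok, w in zip(tokens[1:], weights[1:]):
        last = merged_tokens[-1]
        if cosine(last, tok) >= threshold:
            # Weighted average: more important tokens dominate the merged vector.
            prev_w = merged_weights[-1]
            total = prev_w + w
            merged_tokens[-1] = [(prev_w * x + w * y) / total for x, y in zip(last, tok)]
            merged_weights[-1] = total
        else:
            # Dissimilar neighbor: start a new token, preserving temporal order.
            merged_tokens.append(list(tok))
            merged_weights.append(w)
    return merged_tokens, merged_weights


# Four toy 2-D "frames": two near-duplicates followed by two identical frames.
tokens = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 1.0]]
weights = [1.0, 3.0, 1.0, 1.0]
out_tokens, out_weights = merge_adjacent(tokens, weights)
print(len(out_tokens), out_weights)
```

In this toy input the sequence shrinks from four tokens to two, and each merged token carries the summed weight of the frames it absorbed, so later merges remain properly weighted.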

Code & Models

Repositories

yichen14/fastadasp
PyTorch · Official

Videos

FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model · underline

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling