BLSP-KD: Bootstrapping Language-Speech Pre-training via Knowledge Distillation
Chen Wang, Minpeng Liao, Zhongqiang Huang, Jiajun Zhang

TL;DR
BLSP-KD introduces a novel method for aligning speech and text in large language models using knowledge distillation and a segmentation strategy, enabling better speech-to-text alignment and instruction-following capabilities.
Contribution
Proposes a pretraining approach that achieves fine-grained speech-text alignment through knowledge distillation and continuous integrate-and-fire segmentation, along with Partial LoRA (PLoRA), a finetuning method that supports speech inputs.
Findings
Outperforms previous end-to-end and cascaded systems in speech-text alignment.
Enables LLMs to better follow instructions with speech inputs.
Facilitates extension of LLMs to spoken language interactions.
Abstract
Recent end-to-end approaches have shown promise in extending large language models (LLMs) to speech inputs, but face limitations in directly assessing and optimizing alignment quality and fail to achieve fine-grained alignment due to speech-text length mismatch. We introduce BLSP-KD, a novel approach for Bootstrapping Language-Speech Pretraining via Knowledge Distillation, which addresses these limitations through two key techniques. First, it optimizes speech-text alignment by minimizing the divergence between the LLM's next-token prediction distributions for speech and text inputs using knowledge distillation. Second, it employs a continuous integrate-and-fire strategy to segment speech into tokens that correspond one-to-one with text tokens, enabling fine-grained alignment. We also introduce Partial LoRA (PLoRA), a new adaptation method supporting LLM finetuning for speech inputs…
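The two techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`kd_loss`, `cif_segment`), the choice of KL divergence with the text branch as teacher, the firing threshold of 1.0, and the toy inputs are all assumptions made for illustration. The sketch also assumes the per-frame weights already sum to the number of text tokens, which is what makes the fired segments line up one-to-one with text positions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kd_loss(text_logits, speech_logits):
    """Mean KL(text || speech) over aligned positions.

    Assumption: the text-input distribution acts as the teacher and the
    speech-input distribution as the student. Both arguments are lists of
    per-position logit lists of equal length.
    """
    if len(text_logits) != len(speech_logits):
        raise ValueError("positions must align one-to-one")
    total = 0.0
    for t_row, s_row in zip(text_logits, speech_logits):
        lp = log_softmax(t_row)  # teacher log-probabilities
        lq = log_softmax(s_row)  # student log-probabilities
        total += sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))
    return total / len(text_logits)


def cif_segment(frames, weights, threshold=1.0):
    """Integrate-and-fire segmentation of frame features (hypothetical helper).

    Weights are accumulated frame by frame. Each time the running sum
    reaches `threshold`, one segment is emitted as the weighted sum of the
    frames integrated so far. A frame that crosses the boundary is split:
    the part needed to reach the threshold goes to the current segment and
    the remainder starts the next one. Leftover weight below the threshold
    at the end is discarded.
    """
    dim = len(frames[0])
    segments = []
    acc_w = 0.0
    acc_vec = [0.0] * dim
    for frame, w in zip(frames, weights):
        while acc_w + w >= threshold:
            used = threshold - acc_w
            segments.append([a + used * f for a, f in zip(acc_vec, frame)])
            w = max(w - used, 0.0)
            acc_w = 0.0
            acc_vec = [0.0] * dim
        acc_w += w
        acc_vec = [a + w * f for a, f in zip(acc_vec, frame)]
    return segments


# Toy example: four speech frames whose weights sum to 2, so two segments
# fire, matching two text-token positions.
frames = [[1.0], [3.0], [2.0], [4.0]]
weights = [0.5, 0.5, 0.5, 0.5]
segments = cif_segment(frames, weights)

# Made-up next-token logits for the two positions (vocabulary of size 3).
text_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
speech_logits = [[1.8, 0.7, 0.0], [0.1, 1.2, 0.6]]
loss = kd_loss(text_logits, speech_logits)
```

Because segmentation yields exactly one speech segment per text token, the distillation loss can be computed position by position, which is the fine-grained alignment the abstract describes. In a real system the logits would come from the LLM and the weights from a learned predictor; here they are hand-written placeholders.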
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
Methods: Knowledge Distillation
