Context-Driven Dynamic Pruning for Large Speech Foundation Models

Masao Someki; Shikhar Bharadwaj; Atharva Anand Joshi; Chyi-Jiunn Lin; Jinchuan Tian; Jee-weon Jung; Markus Müller; Nathan Susanj; Jing Liu; Shinji Watanabe

arXiv:2505.18860 · eess.AS · May 27, 2025 · Interspeech



TL;DR

This paper introduces a context-driven dynamic pruning method for large speech foundation models that reduces inference cost while improving output quality (measured in BLEU) by leveraging external context, such as speaker, acoustic-event, and language information, during inference.

Contribution

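It extends existing context-aware dynamic pruning for speech foundation models by incorporating more diverse contextual information (speaker embeddings, acoustic event embeddings, and language information) so that the model's computation adapts to each input.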

Findings

01

Reduces inference computation by 56.7 GFLOPs when speaker embeddings are used as context.

02

Improves BLEU scores by a relative 25.7% in the same speaker-embedding setting.

03

Demonstrates effective use of speaker and acoustic context.
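The BLEU gain in the findings is relative, not an absolute number of BLEU points. For reference, a relative improvement is measured against the baseline's own score, where the baseline is the comparison model named in the abstract:

```latex
\Delta_{\text{rel}} = \frac{\text{BLEU}_{\text{context}} - \text{BLEU}_{\text{baseline}}}{\text{BLEU}_{\text{baseline}}} \times 100\% = 25.7\%
```

As a purely hypothetical illustration, a baseline of 20.0 BLEU would correspond to about 25.14 BLEU under a 25.7% relative gain (20.0 × 1.257).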

Abstract

Speech foundation models achieve strong generalization across languages and acoustic conditions, but require significant computational resources for inference. For speech foundation models, prior work has studied pruning techniques that dynamically optimize the model structure for the target audio by leveraging external context. In this work, we extend this line of research and propose context-driven dynamic pruning, a technique that adapts the model's computation during inference to both the context across input frames and additional external context. We employ the Open Whisper-style Speech Model (OWSM) and incorporate speaker embeddings, acoustic event embeddings, and language information as additional context. By incorporating the speaker embedding, our method achieves a reduction of 56.7 GFLOPs while improving BLEU scores by a relative 25.7% compared to the fully…
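The abstract describes the mechanism only at a high level. The following is a minimal, hypothetical sketch of context-conditioned module gating, not the authors' implementation. It assumes that each prunable module (an attention or feed-forward block) receives a keep/skip decision from a small linear gate over a mean-pooled summary of the input frames concatenated with an external context vector such as a speaker embedding. The module names, per-module GFLOP costs, gate weights, mean pooling, and the 0.5 hard threshold are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Module:
    name: str      # hypothetical module name
    gflops: float  # hypothetical per-module cost


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mean_pool(frames: list[list[float]]) -> list[float]:
    # Summarize the input frames by averaging each feature dimension over time
    # (an assumed stand-in for "context between different input frames").
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def gate_decisions(frames, context, weights, biases, threshold=0.5):
    # Concatenate the audio summary with the external context vector
    # (e.g. a speaker embedding) and score every prunable module.
    features = mean_pool(frames) + list(context)
    keep = []
    for w, b in zip(weights, biases):
        score = sigmoid(sum(wi * xi for wi, xi in zip(w, features)) + b)
        keep.append(score > threshold)  # hard keep/skip at inference time
    return keep


def active_gflops(modules, keep):
    # Only kept modules are executed, so only they contribute compute.
    return sum(m.gflops for m, k in zip(modules, keep) if k)


# Hypothetical 2-layer encoder with attention and feed-forward blocks.
modules = [Module("enc0.attn", 2.0), Module("enc0.ffn", 4.0),
           Module("enc1.attn", 2.0), Module("enc1.ffn", 4.0)]
frames = [[1.0, 0.0], [0.0, 1.0]]   # two 2-dim frames -> pooled [0.5, 0.5]
weights = [[1.0, 1.0, 0.0, 0.0],    # depends only on the audio summary
           [0.0, 0.0, 2.0, -2.0],   # favored by context A
           [0.0, 0.0, -2.0, 2.0],   # favored by context B
           [0.0, 0.0, 1.0, 1.0]]
biases = [0.0, 0.0, 0.0, -2.0]      # module 3 is skipped for both contexts

context_a, context_b = [1.0, 0.0], [0.0, 1.0]
print(sum(m.gflops for m in modules))  # 12.0
print(active_gflops(modules, gate_decisions(frames, context_a, weights, biases)))  # 6.0
print(active_gflops(modules, gate_decisions(frames, context_b, weights, biases)))  # 4.0
```

Here the same audio yields different pruned sub-networks depending on the context vector, which is the intuition behind using speaker or acoustic context to decide what to compute. In a trained system the gate weights would be learned rather than hand-set; the sketch fixes them only so the effect is easy to see.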


Taxonomy

Topics: Speech and dialogue systems · Speech Recognition and Synthesis · Multi-Agent Systems and Negotiation

Methods: Pruning