Beyond Prompting: Efficient and Robust Contextual Biasing for Speech LLMs via Logit-Space Integration (LOGIC)
Peidong Wang

TL;DR
This paper introduces LOGIC, a novel logit-space integration method for speech LLMs that efficiently and robustly incorporates new entities during decoding, outperforming prompting and post-processing approaches in multilingual settings.
Contribution
LOGIC provides a scalable, decoding-layer approach to bias speech LLMs with new entities, overcoming prompt limitations and reducing errors without increasing inference time.
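To make "decoding-layer" biasing concrete, here is a minimal sketch of one common way to bias in logit space: add a fixed bonus to the logits of tokens that start or continue an entry in an entity list. It is not the paper's algorithm, whose details are not given in this summary. The names `build_trie`, `bias_logits`, and `bonus`, and the use of a prefix trie over token IDs, are all assumptions made for illustration.

```python
from typing import List, Sequence


def build_trie(entities: Sequence[Sequence[int]]) -> dict:
    """Build a prefix trie over entity token-ID sequences (hypothetical helper)."""
    root: dict = {}
    for tokens in entities:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
    return root


def bias_logits(
    logits: List[float],
    history: Sequence[int],
    trie: dict,
    bonus: float = 2.0,  # assumed value; a real system would tune this
) -> List[float]:
    """Return a copy of `logits` with `bonus` added to entity-consistent tokens."""
    # Any token that begins an entity is always a candidate.
    boosted = set(trie)
    # For each suffix of the decoded history that is a live path in the trie,
    # also boost the tokens that would extend that partial entity.
    for start in range(len(history)):
        node = trie
        for tok in history[start:]:
            node = node.get(tok)
            if node is None:
                break
        else:
            boosted.update(node)
    out = list(logits)
    for tok in boosted:
        out[tok] += bonus
    return out


# Toy vocabulary of 5 token IDs; two entities spelled [3, 1] and [3, 2, 4].
trie = build_trie([[3, 1], [3, 2, 4]])
step_logits = [0.0] * 5
after_start = bias_logits(step_logits, [0, 3], trie)  # boosts 3, then 1 and 2
after_prefix = bias_logits(step_logits, [3, 2], trie)  # boosts 3 and 4
```

Because the entity list lives in a lookup structure consulted at each decoding step rather than in the prompt, its size does not consume context window. That is the scalability argument for this family of methods in general.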
Findings
9% relative reduction in Entity WER
Negligible 0.30% increase in False Alarm Rate
Effective across 11 multilingual locales
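A relative reduction is measured against the baseline value, not in absolute percentage points. The short sketch below shows the arithmetic with made-up numbers. The baseline and improved Entity WER values are hypothetical and are not taken from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical values for illustration only: an Entity WER of 20.0% falling
# to 18.2% is a 1.8-point absolute drop but a 9% relative reduction.
example = relative_reduction(20.0, 18.2)
```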
Abstract
The rapid emergence of new entities -- driven by cultural shifts, evolving trends, and personalized user data -- poses a significant challenge for existing Speech Large Language Models (Speech LLMs). While these models excel at general conversational tasks, their static training knowledge limits their ability to recognize domain-specific terms such as contact names, playlists, or technical jargon. Existing solutions primarily rely on prompting, which suffers from poor scalability: as the entity list grows, prompting encounters context window limitations, increased inference latency, and the "lost-in-the-middle" phenomenon. An alternative approach, Generative Error Correction (GEC), attempts to rewrite transcripts via post-processing but frequently suffers from "over-correction", introducing hallucinations of entities that were never spoken. In this work, we introduce LOGIC…
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques
