ProGRes: Prompted Generative Rescoring on ASR n-Best
Ada Defne Tur, Adel Moumen, Mirco Ravanelli

TL;DR
This paper introduces ProGRes, a zero-shot rescoring method for automatic speech recognition (ASR) that uses instruction-tuned LLMs to generate additional hypotheses and an LLM to score them, reducing word error rate (WER) by 5% to 25% relative across several recognizers.
Contribution
It presents a zero-shot method for rescoring ASR n-best lists that combines recognizer confidence scores, LLM sequence scoring, and prompt-based hypothesis generation.
Findings
Achieved 5% to 25% relative WER reduction
Compared Llama-3-Instruct, GPT-3.5 Turbo, and GPT-4 Turbo as prompt-based generators, with Llama-3 as the sequence scorer
Demonstrated effectiveness across different speech recognizers
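For readers unfamiliar with the metric, a relative WER reduction expresses the improvement as a fraction of the baseline WER rather than as an absolute difference. The sketch below shows the arithmetic; the function name and the numbers are illustrative assumptions, not values reported in the paper.

```python
def relative_wer_reduction(baseline_wer: float, rescored_wer: float) -> float:
    """Return the WER improvement as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - rescored_wer) / baseline_wer


# Illustrative numbers only (not results from the paper):
# going from 8.0% WER to 6.0% WER is a 2-point absolute drop,
# which is a 25% relative reduction.
example = relative_wer_reduction(baseline_wer=8.0, rescored_wer=6.0)
print(f"{example:.0%}")  # prints "25%"
```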
Abstract
Large Language Models (LLMs) have shown their ability to improve the performance of speech recognizers by effectively rescoring the n-best hypotheses generated during the beam search process. However, the best way to exploit recent generative instruction-tuned LLMs for hypothesis rescoring is still unclear. This paper proposes a novel method that uses instruction-tuned LLMs to dynamically expand the n-best speech recognition hypotheses with new hypotheses generated through appropriately-prompted LLMs. Specifically, we introduce a new zero-shot method for ASR n-best rescoring, which combines confidence scores, LLM sequence scoring, and prompt-based hypothesis generation. We compare Llama-3-Instruct, GPT-3.5 Turbo, and GPT-4 Turbo as prompt-based generators with Llama-3 as sequence scorer LLM. We evaluated our approach using different speech recognizers and observed significant relative…
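The pipeline described in the abstract (expand the n-best list with LLM-generated hypotheses, then rank candidates using recognizer confidence and an LLM sequence score) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Hypothesis` class, the `rescore` function, the linear interpolation with weight `alpha`, and the choice to give generated hypotheses the mean n-best confidence are all hypothetical, since the truncated abstract does not say how the scores are combined. The generator and scorer are passed in as plain callables so that no specific LLM API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    text: str
    asr_confidence: float  # assumed to be a log-domain score from the recognizer


def rescore(
    n_best: List[Hypothesis],
    generate: Callable[[List[str]], List[str]],
    llm_log_score: Callable[[str], float],
    alpha: float = 0.5,
) -> str:
    """Expand the n-best list with generated hypotheses and return the best text.

    Assumption: scores are combined by linear interpolation with weight `alpha`.
    """
    if not n_best:
        raise ValueError("n_best must contain at least one hypothesis")

    # Assumption: generated hypotheses have no recognizer confidence,
    # so they are given the mean confidence of the original n-best list.
    fallback = sum(h.asr_confidence for h in n_best) / len(n_best)

    seen = {h.text for h in n_best}
    candidates = list(n_best)
    for text in generate([h.text for h in n_best]):
        if text not in seen:  # skip duplicates of existing candidates
            seen.add(text)
            candidates.append(Hypothesis(text, fallback))

    def combined(h: Hypothesis) -> float:
        return alpha * h.asr_confidence + (1.0 - alpha) * llm_log_score(h.text)

    return max(candidates, key=combined).text


# Toy stand-ins for the prompted generator and the sequence scorer.
toy_scores = {
    "the cat sad on the mat": -9.0,
    "the cat sat on a mat": -6.0,
    "the cat sat on the mat": -4.0,
}
toy_n_best = [
    Hypothesis("the cat sad on the mat", -1.0),
    Hypothesis("the cat sat on a mat", -1.5),
]
best = rescore(
    toy_n_best,
    generate=lambda texts: ["the cat sat on the mat"],
    llm_log_score=lambda text: toy_scores[text],
)
print(best)
```

In this toy run the generated hypothesis wins because its LLM score outweighs the small confidence gap. Setting `alpha=1.0` ignores the LLM score and falls back to the recognizer's top hypothesis.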
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Algorithms
Methods: Attention Is All You Need · Cosine Annealing · Absolute Position Encodings · Label Smoothing · Position-Wise Feed-Forward Layer · Residual Connection · Linear Warmup With Cosine Annealing · Transformer
