Lattention: Lattice-attention in ASR rescoring
Prabhat Pandey, Sergio Duarte Torres, Ali Orkan Bayer, Ankur Gandhe, Volker Leutnant

TL;DR
This paper introduces Lattention, a lattice-attention model for second-pass speech recognition rescoring. The model attends to first-pass lattices while rescoring n-best lists and reaches a lower word error rate than rescoring models that attend to the n-best hypotheses.
Contribution
The paper proposes an attention encoder-decoder model for n-best rescoring that encodes first-pass lattices with a recurrent network and attends to them. It outperforms rescoring models that attend to n-best hypotheses.
Findings
Achieves 4-5% relative WER reduction over first-pass
Attending to both lattices and acoustic features yields 6-8% relative WER reduction over first-pass
Incorporating lattice weights significantly improves rescoring performance
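The findings above are stated as relative word error rate (WER) reductions. As a reminder of what that measure means, here is a minimal sketch of the arithmetic. The function name and the WER values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which WER drops relative to the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical values: a first-pass WER of 10.0% reduced to 9.5% after rescoring.
reduction = relative_wer_reduction(10.0, 9.5)
print(f"{reduction:.1%}")
```

With these illustrative numbers, an absolute drop of 0.5 WER points corresponds to a 5% relative reduction.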
Abstract
Lattices form a compact representation of multiple hypotheses generated from an automatic speech recognition system and have been shown to improve performance of downstream tasks like spoken language understanding and speech translation, compared to using one-best hypothesis. In this work, we look into the effectiveness of lattice cues for rescoring n-best lists in second-pass. We encode lattices with a recurrent network and train an attention encoder-decoder model for n-best rescoring. The rescoring model with attention to lattices achieves 4-5% relative word error rate reduction over first-pass and 6-8% with attention to both lattices and acoustic features. We show that rescoring models with attention to lattices outperform models with attention to n-best hypotheses. We also study different ways to incorporate lattice weights in the lattice encoder and demonstrate their importance for…
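The abstract refers to rescoring n-best lists in a second pass. The general idea can be sketched as follows, assuming a simple linear interpolation of first-pass and second-pass log-scores. The `Hypothesis` class, the `weight` parameter, and all scores are hypothetical; the paper's actual scoring function and the lattice-attention model itself are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    first_pass_score: float  # log-score from the first-pass recognizer
    rescoring_score: float   # log-score from a second-pass model (placeholder values)


def rescore_nbest(nbest, weight=0.5):
    """Return hypotheses sorted best-first by an interpolated log-score.

    `weight` is a hypothetical interpolation weight, not a value from the paper.
    """
    def combined(h):
        return (1.0 - weight) * h.first_pass_score + weight * h.rescoring_score

    return sorted(nbest, key=combined, reverse=True)


# Made-up n-best list: the first pass prefers the first entry,
# while the second-pass score favors the second entry.
nbest = [
    Hypothesis("play the beetles", -4.0, -9.0),
    Hypothesis("play the beatles", -4.5, -5.0),
    Hypothesis("play the beat else", -6.0, -8.0),
]

best = rescore_nbest(nbest, weight=0.5)[0]
print(best.text)
```

Setting `weight=0.0` reduces this to the first-pass ranking, which makes it easy to see how much the second-pass score changes the selected hypothesis.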
