Lattention: Lattice-attention in ASR rescoring

Prabhat Pandey; Sergio Duarte Torres; Ali Orkan Bayer; Ankur Gandhe; Volker Leutnant

arXiv:2111.10157·cs.CL·November 22, 2021

TL;DR

This paper introduces Lattention, a lattice-attention model for second-pass ASR rescoring. It uses lattice structure to lower word error rate relative to the first pass and to outperform rescoring models that attend only to n-best hypotheses.

Contribution

The paper proposes a lattice-attention encoder-decoder model for n-best rescoring. Lattices are encoded with a recurrent network, and the resulting model outperforms rescoring models that attend to n-best hypotheses.

Findings

01

Attention to lattices achieves 4-5% relative WER reduction over first-pass

02

Attending to both lattices and acoustic features yields 6-8% relative WER reduction over first-pass

03

Incorporating lattice weights in the lattice encoder is shown to be important for rescoring
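
The reductions above are relative, not absolute, so it may help to see the arithmetic spelled out. The following is a minimal sketch; the function name `relative_wer_reduction` and the WER values are hypothetical and chosen only for illustration, not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, rescored_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline (first-pass) WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - rescored_wer) / baseline_wer


# Hypothetical numbers, not results from the paper:
# a first-pass WER of 10.0% that drops to 9.5% after rescoring.
baseline = 10.0
rescored = 9.5
reduction = relative_wer_reduction(baseline, rescored)
print(f"relative WER reduction: {reduction:.1%}")
```

Under these assumed values, an absolute drop of 0.5 points corresponds to a 5% relative reduction, which is the sense in which the findings report 4-5% and 6-8%.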

Abstract

Lattices form a compact representation of multiple hypotheses generated from an automatic speech recognition system and have been shown to improve performance of downstream tasks like spoken language understanding and speech translation, compared to using one-best hypothesis. In this work, we look into the effectiveness of lattice cues for rescoring n-best lists in second-pass. We encode lattices with a recurrent network and train an attention encoder-decoder model for n-best rescoring. The rescoring model with attention to lattices achieves 4-5% relative word error rate reduction over first-pass and 6-8% with attention to both lattices and acoustic features. We show that rescoring models with attention to lattices outperform models with attention to n-best hypotheses. We also study different ways to incorporate lattice weights in the lattice encoder and demonstrate their importance for…
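
To make the n-best rescoring setting concrete, here is a minimal sketch of how a second-pass score can reorder a first-pass n-best list. It does not implement the lattice-attention model: the `rescoring_score` values stand in for whatever log-probability a second-pass model would assign, and the linear interpolation with `weight` is an assumption for illustration, since the abstract does not say how scores are combined. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    first_pass_score: float  # log-score from the first-pass recognizer (placeholder)
    rescoring_score: float   # log-probability from a second-pass model (placeholder)


def rescore_nbest(nbest: list[Hypothesis], weight: float = 0.5) -> list[Hypothesis]:
    """Sort hypotheses by an interpolated score, best first.

    The interpolation is an assumed combination rule, not the paper's.
    """
    def combined(h: Hypothesis) -> float:
        return (1.0 - weight) * h.first_pass_score + weight * h.rescoring_score

    return sorted(nbest, key=combined, reverse=True)


# Made-up 3-best list: the first pass prefers a misrecognized hypothesis.
nbest = [
    Hypothesis("play the beetles", -4.0, -9.0),
    Hypothesis("play the beatles", -4.5, -5.0),
    Hypothesis("pay the beatles", -6.0, -7.0),
]

first_pass_best = rescore_nbest(nbest, weight=0.0)[0]
rescored_best = rescore_nbest(nbest, weight=0.5)[0]
print(first_pass_best.text)
print(rescored_best.text)
```

With `weight=0.0` the ordering is the first pass alone; with a nonzero weight the second-pass score can promote a hypothesis that the first pass ranked lower, which is the mechanism by which rescoring can reduce WER.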
