End2End Acoustic to Semantic Transduction

Valentin Pelloin; Nathalie Camelin; Antoine Laurent; Renato De Mori; Antoine Caubrière; Yannick Estève; Sylvain Meignier

arXiv:2102.01013·cs.CL·May 20, 2021

TL;DR

This paper introduces an end-to-end acoustic-to-semantic transduction model using attention mechanisms, achieving state-of-the-art results on the French MEDIA corpus for spoken language understanding.

Contribution

It presents a novel end-to-end sequence-to-sequence model with attention that selects contextual acoustic features to hypothesize semantic content, improving on the state of the art on the French MEDIA corpus by an absolute 2.8 points.

Findings

01

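The initial model, combined with a shallow fusion language model, reaches a 13.6% concept error rate (CER) and an 18.5% concept value error rate (CVER) on the French MEDIA corpus.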

02

This result is an absolute reduction of 2.8 points compared to the previous state of the art.

03

Proposed a second model for hypothesizing concepts and their values, reaching 15.4% CER and 21.6% CVER without any new type of context.

Abstract

In this paper, we propose a novel end-to-end sequence-to-sequence spoken language understanding model using an attention mechanism. It reliably selects contextual acoustic features in order to hypothesize semantic contents. An initial architecture capable of extracting all pronounced words and concepts from acoustic spans is designed and tested. With a shallow fusion language model, this system reaches a 13.6 concept error rate (CER) and an 18.5 concept value error rate (CVER) on the French MEDIA corpus, achieving an absolute 2.8 points reduction compared to the state-of-the-art. Then, an original model is proposed for hypothesizing concepts and their values. This transduction reaches a 15.4 CER and a 21.6 CVER without any new type of context.
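
The abstract reports results as CER and CVER without spelling out how such rates are computed. The sketch below assumes the usual convention: an edit-distance error rate, counting substitutions, deletions, and insertions against the reference sequence and dividing by the reference length, as for word error rate. For CER the tokens are concept labels; for CVER they are (concept, value) pairs. The function names and the example concept labels are hypothetical, and this is not the authors' scoring tool.

```python
from typing import Hashable, Sequence


def edit_distance(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis: i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(refs: Sequence[Sequence[Hashable]],
               hyps: Sequence[Sequence[Hashable]]) -> float:
    """Corpus-level error rate in percent: total edits / total reference tokens."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of utterances")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference is empty")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_errors / total_ref


# Hypothetical concept labels, for illustration only.
ref_concepts = [["command", "room-number", "room-type"]]
hyp_concepts = [["command", "room-type"]]  # one concept deleted
cer = error_rate(ref_concepts, hyp_concepts)

# For a CVER-style score, each token is a (concept, value) pair.
ref_pairs = [[("command", "reservation"), ("room-number", "2")]]
hyp_pairs = [[("command", "reservation"), ("room-number", "3")]]  # wrong value
cver = error_rate(ref_pairs, hyp_pairs)
```

Under this convention a hypothesis with the right concept but the wrong value counts as correct for CER and as a substitution for CVER, which is consistent with the CVER figures in the abstract being higher than the CER figures.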
