Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models

Siddhant Arora; Siddharth Dalmia; Brian Yan; Florian Metze; Alan W Black; Shinji Watanabe

arXiv:2210.15734·cs.CL·October 31, 2022

TL;DR

This paper introduces compositional end-to-end spoken language understanding systems that explicitly separate speech recognition from language understanding, improving performance and making the systems directly compatible with pre-trained ASR and NLU models.

Contribution

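The authors propose a modular end-to-end SLU framework in which intermediate ASR decoders turn speech into token-level representations, so that sequence labeling can be done as conventional token-level tagging rather than as free-form sequence prediction; this improves performance on named entity recognition.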
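Token-level tagging assigns exactly one label per word, whereas a sequence-prediction formulation typically emits entity markup interleaved with the transcript as a single output sequence. The minimal sketch below converts the latter into the former. It assumes a hypothetical bracket notation (`[TYPE ... ]`) and the common BIO tagging scheme; neither the notation nor the helper `bracketed_to_bio` is taken from the paper.

```python
from typing import List, Optional, Tuple


def bracketed_to_bio(target: str) -> List[Tuple[str, str]]:
    """Convert a hypothetical bracketed sequence-prediction target, e.g.
    "[PERSON john smith ] flew to [CITY boston ]", into (word, BIO tag) pairs.

    Assumed notation: "[TYPE" opens an entity span, "]" closes it.
    """
    pairs: List[Tuple[str, str]] = []
    current_type: Optional[str] = None  # entity type of the open span, if any
    first_in_span = False               # next word inside a span gets a B- tag
    for piece in target.split():
        if piece.startswith("[") and len(piece) > 1:
            current_type = piece[1:]    # open a span, e.g. "[PERSON" -> "PERSON"
            first_in_span = True
        elif piece == "]":
            current_type = None         # close the current span
        elif current_type is None:
            pairs.append((piece, "O"))  # word outside any entity
        else:
            prefix = "B" if first_in_span else "I"
            pairs.append((piece, f"{prefix}-{current_type}"))
            first_in_span = False
    return pairs


pairs = bracketed_to_bio("[PERSON john smith ] flew to [CITY boston ]")
print(pairs)
# → [('john', 'B-PERSON'), ('smith', 'I-PERSON'), ('flew', 'O'), ('to', 'O'), ('boston', 'B-CITY')]
```

The bracketed target has 9 output symbols for 5 words, while the BIO sequence has exactly 5 tags, one per word. That one-to-one alignment with tokens is what lets a standard sequence labeler operate on the representation.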

Findings

01

Outperforms cascaded and direct end-to-end models on named entity recognition (NER) tasks

02

Allows use of pre-trained ASR and NLU components

03

Enables performance monitoring of individual modules

Abstract

End-to-end spoken language understanding (SLU) systems are gaining popularity over cascaded approaches due to their simplicity and ability to avoid error propagation. However, these systems model sequence labeling as a sequence prediction task causing a divergence from its well-established token-level tagging formulation. We build compositional end-to-end SLU systems that explicitly separate the added complexity of recognizing spoken mentions in SLU from the NLU task of sequence labeling. By relying on intermediate decoders trained for ASR, our end-to-end systems transform the input modality from speech to token-level representations that can be used in the traditional sequence labeling framework. This composition of ASR and NLU formulations in our end-to-end SLU system offers direct compatibility with pre-trained ASR and NLU systems, allows performance monitoring of individual…
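In the "traditional sequence labeling framework" the abstract refers to, per-token tag scores are usually decoded jointly, and this page's taxonomy lists a Conditional Random Field as the method. As a minimal sketch only (plain Python, hand-picked scores, not the paper's ESPnet implementation), Viterbi decoding of a linear-chain CRF over token-level emission scores looks like this. The tag set, scores, and the function name `crf_viterbi` are illustrative assumptions.

```python
from typing import List, Sequence


def crf_viterbi(emissions: Sequence[Sequence[float]],
                transitions: Sequence[Sequence[float]],
                start: Sequence[float]) -> List[int]:
    """Return the highest-scoring tag index sequence for a linear-chain CRF.

    emissions[t][j]:   score of tag j at token t (e.g. from a token-level encoder)
    transitions[i][j]: score of moving from tag i to tag j
    start[j]:          score of starting the sequence with tag j
    """
    num_tags = len(start)
    # scores[j] = best total score of any path ending in tag j at the current token
    scores = [start[j] + emissions[0][j] for j in range(num_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for j in range(num_tags):
            # best previous tag i to transition into tag j
            best_i = max(range(num_tags), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # follow backpointers from the best final tag to recover the full path
    best = max(range(num_tags), key=lambda j: scores[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]


# Illustrative tag set and scores for the tokens "john smith flew"
tags = ["O", "B-PERSON", "I-PERSON"]
emissions = [[0.1, 2.0, 0.5], [0.2, 0.3, 1.5], [2.0, 0.1, 0.1]]
transitions = [[0.0, 0.0, -100.0],  # O -> I-PERSON is effectively forbidden
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
start = [0.0, 0.0, -100.0]          # a sequence cannot start with I-PERSON
path = crf_viterbi(emissions, transitions, start)
print([tags[i] for i in path])
# → ['B-PERSON', 'I-PERSON', 'O']
```

The transition matrix is what distinguishes a CRF from independent per-token classification: it lets the decoder rule out invalid BIO sequences such as `O` followed by `I-PERSON`.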

Code & Models

Repositories

espnet/espnet · PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Methods: Conditional Random Field