Towards end-to-end spoken language understanding

Dmitriy Serdyuk, Yongqiang Wang, Christian Fuegen, Anuj Kumar, Baiyang Liu, Yoshua Bengio

arXiv:1802.08395·cs.CL·February 26, 2018

TL;DR

This paper explores an end-to-end approach for spoken language understanding that directly infers semantic meaning from audio features, bypassing traditional intermediate transcription steps, and demonstrates promising results.

Contribution

It introduces a unified end-to-end model for spoken language understanding that captures semantics directly from audio, improving integration and potentially reducing error propagation.

Findings

01 Achieved reasonable accuracy in semantic inference from audio

02 Model captures semantic attention directly from audio features

03 Demonstrated feasibility of end-to-end spoken language understanding

Abstract

A spoken language understanding system is traditionally designed as a pipeline of several components. First, the audio signal is processed by an automatic speech recognizer to produce a transcription or n-best hypotheses. Given the recognition results, a natural language understanding system classifies the text into structured data such as domain, intent, and slots for downstream consumers, such as dialog systems and hands-free applications. These components are usually developed and optimized independently. In this paper, we present our study on an end-to-end learning system for spoken language understanding. With this unified approach, we can infer the semantic meaning directly from audio features without the intermediate text representation. This study showed that the trained model can achieve reasonably good results and demonstrated that the model can capture the semantic attention directly from the…
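To make the contrast with the pipeline concrete, the following is a minimal sketch of the end-to-end idea: a single function maps a sequence of audio feature frames straight to a semantic label, with no transcription step in between. It is not the authors' model. The mean pooling over time, the single linear layer, the label set, the feature dimension, and the random weights are all assumptions chosen for illustration; a real system would learn its parameters and use a much stronger encoder.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(frames):
    """Collapse a variable-length sequence of frames into one fixed-size vector."""
    count = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / count for d in range(dim)]


def classify_utterance(frames, weights, biases, labels):
    """Map audio feature frames directly to a label, with no text in between.

    Assumption: a mean-pooled linear classifier stands in for the encoder.
    """
    pooled = mean_pool(frames)
    scores = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical setup: labels, feature size, and weights are illustrative only.
rng = random.Random(0)
LABELS = ["music", "weather", "timer"]
FEATURE_DIM = 4
weights = [[rng.uniform(-1, 1) for _ in range(FEATURE_DIM)] for _ in LABELS]
biases = [0.0] * len(LABELS)

# Fifty random frames stand in for the audio features of one utterance.
frames = [[rng.uniform(-1, 1) for _ in range(FEATURE_DIM)] for _ in range(50)]
label, probs = classify_utterance(frames, weights, biases, LABELS)
print(label, [round(p, 3) for p in probs])
```

In a pipeline system, the input to the classifier would be recognized text rather than `frames`, so recognition errors would carry over into the classification step. Here the classifier sees the audio features themselves, which is the property the abstract describes.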

Code & Models

Repositories

dmitriy-serdyuk/arxiv2kindle
