Integration of Pre-trained Networks with Continuous Token Interface for End-to-End Spoken Language Understanding

Seunghyun Seo; Donghyun Kwak; Bowon Lee

arXiv:2104.07253 · cs.CL · February 18, 2022 · 5 citations

TL;DR

This paper introduces a simple, robust integration method called Continuous Token Interface (CTI) for end-to-end spoken language understanding, enabling effective utilization of pre-trained ASR and NLU networks without extra modules.

Contribution

The paper proposes CTI, a novel interface that joins pre-trained ASR and NLU networks into a single end-to-end SLU network, achieving state-of-the-art results without additional complex modules.

Findings

01

Achieved state-of-the-art scores on the SLURP dataset for intent classification and slot filling.

02

Demonstrated that a pre-trained NLU network can be used effectively on noisy textual representations.

03

Enabled multi-task learning with heterogeneous data after CTI integration.
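The page does not say which losses or datasets were combined. One common way to train a single network on heterogeneous data is to sum only the loss terms whose targets are present in each example. The sketch below illustrates that generic pattern, assuming two hypothetical targets (an ASR transcript and an intent label) and hypothetical weights `asr_weight` and `slu_weight`. It is not the authors' training objective.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Example:
    # Hypothetical record: either target may be missing when data sources differ.
    transcript_ids: Optional[Sequence[int]] = None  # ASR target token ids
    intent_id: Optional[int] = None                 # SLU intent label


def nll(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])


def multitask_loss(example: Example,
                   asr_probs: Sequence[Sequence[float]],
                   intent_probs: Sequence[float],
                   asr_weight: float = 1.0,
                   slu_weight: float = 1.0) -> float:
    """Sum only the loss terms whose targets exist in this example (assumed scheme)."""
    loss = 0.0
    if example.transcript_ids is not None:
        # Mean token-level cross-entropy on the ASR output distributions.
        token_losses = [nll(p, t) for p, t in zip(asr_probs, example.transcript_ids)]
        loss += asr_weight * sum(token_losses) / len(token_losses)
    if example.intent_id is not None:
        # Cross-entropy on the intent classifier's output.
        loss += slu_weight * nll(intent_probs, example.intent_id)
    return loss
```

An example that carries only a transcript contributes only the ASR term, an example that carries only a label contributes only the SLU term, and one with both contributes the weighted sum.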

Abstract

Most End-to-End (E2E) SLU networks leverage pre-trained ASR networks but still lack the capability to understand the semantics of utterances, which is crucial for the SLU task. To address this, recent studies use pre-trained NLU networks. However, fully utilizing both pre-trained networks is not trivial, and many solutions have been proposed, such as Knowledge Distillation, cross-modal shared embedding, and network integration with an Interface. We propose a simple and robust integration method for the E2E SLU network with a novel Interface, the Continuous Token Interface (CTI), which is the junctional representation of the ASR and NLU networks when both networks are pre-trained with the same vocabulary. Because the only difference between the ASR network's output and the NLU network's input is the noise level, we directly feed the ASR network's output to the NLU network. Thus, we can train our SLU network in an E2E manner without additional modules, such as…
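The abstract's key step is feeding the ASR output straight into the NLU network, which works because both share one vocabulary. The sketch below shows one plausible realization. It assumes the ASR network emits a softmax distribution over the shared vocabulary at each position, and that the NLU network consumes each distribution as the probability-weighted average of its own token embeddings. All function names here are hypothetical, and the paper's exact formulation may differ.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def continuous_tokens(asr_logits: Sequence[Sequence[float]]) -> Matrix:
    """Assumed CTI: one probability distribution over the shared vocabulary per position."""
    return [softmax(row) for row in asr_logits]


def embed_continuous(token_probs: Matrix, embedding: Matrix) -> Matrix:
    """Soft embedding: expected NLU embedding under each position's distribution.

    token_probs is T x V and embedding is V x D, so the result is T x D.
    """
    dim = len(embedding[0])
    out = []
    for probs in token_probs:
        vec = [0.0] * dim
        for p, emb in zip(probs, embedding):
            for d in range(dim):
                vec[d] += p * emb[d]
        out.append(vec)
    return out


def embed_discrete(token_ids: Sequence[int], embedding: Matrix) -> Matrix:
    """Ordinary embedding lookup for clean (text) token ids."""
    return [list(embedding[i]) for i in token_ids]
```

A one-hot distribution reproduces the ordinary lookup exactly, so under this assumption clean text is the zero-noise case of the same interface. Noisier ASR outputs blend several token embeddings instead of selecting one.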

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Knowledge Distillation