Integration of Pre-trained Networks with Continuous Token Interface for End-to-End Spoken Language Understanding
Seunghyun Seo, Donghyun Kwak, Bowon Lee

TL;DR
This paper introduces a simple, robust integration method called Continuous Token Interface (CTI) for end-to-end spoken language understanding, enabling effective utilization of pre-trained ASR and NLU networks without extra modules.
Contribution
The paper proposes a novel CTI method that seamlessly combines pre-trained ASR and NLU networks for end-to-end SLU, achieving state-of-the-art results without additional complex modules.
Findings
Achieved state-of-the-art scores on the SLURP dataset for intent classification and slot filling.
Demonstrated effective use of pre-trained NLU with noisy textual representations.
Enabled multi-task learning with heterogeneous data after CTI integration.
Abstract
Most End-to-End (E2E) SLU networks leverage pre-trained ASR networks but still lack the capability to understand the semantics of utterances, which is crucial for the SLU task. To address this, recent studies use pre-trained NLU networks. However, fully utilizing both pre-trained networks is not trivial; proposed solutions include Knowledge Distillation, cross-modal shared embedding, and network integration with an Interface. We propose a simple and robust integration method for the E2E SLU network with a novel Interface, the Continuous Token Interface (CTI), which is the junctional representation of the ASR and NLU networks when both networks are pre-trained with the same vocabulary. Because the only difference is the noise level, we directly feed the ASR network's output to the NLU network. Thus, we can train our SLU network in an E2E manner without additional modules, such as…
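The idea of feeding the ASR output directly to the NLU network can be illustrated with a small sketch. It assumes, as one plausible reading of the abstract and not as the authors' confirmed formulation, that a continuous token is the ASR probability distribution over the shared vocabulary, turned into an NLU input vector by taking a probability-weighted mixture of the NLU embedding rows. The function names, the toy vocabulary of three tokens, and the two-dimensional embeddings are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def continuous_tokens(asr_logits, nlu_embeddings):
    """Turn per-step ASR logits into soft NLU input embeddings.

    Assumption (not stated in the source): each step's embedding is the
    expectation of the NLU embedding rows under the ASR distribution.
    Both networks must index the same vocabulary for this to make sense.
    """
    dim = len(nlu_embeddings[0])
    outputs = []
    for step_logits in asr_logits:
        probs = softmax(step_logits)
        outputs.append([
            sum(p * row[d] for p, row in zip(probs, nlu_embeddings))
            for d in range(dim)
        ])
    return outputs


# Hypothetical shared vocabulary of 3 tokens with 2-dimensional NLU embeddings.
NLU_EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# A confident ASR step behaves like an ordinary embedding lookup of token 0.
clean_vec = continuous_tokens([[50.0, 0.0, 0.0]], NLU_EMBEDDINGS)[0]

# An uncertain ASR step yields a blend of several token embeddings.
noisy_vec = continuous_tokens([[1.0, 1.0, 0.0]], NLU_EMBEDDINGS)[0]
```

Under this reading, a confident ASR prediction reduces to the clean token embedding the NLU network saw during pre-training, while an uncertain prediction is a noisier point in the same embedding space. Because every step is differentiable, gradients from an SLU loss could flow back into the ASR network without an extra bridging module.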
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Knowledge Distillation
