Enhance Incomplete Utterance Restoration by Joint Learning Token Extraction and Text Generation
Shumpei Inoue, Tsungwei Liu, Nguyen Hong Son, Minh-Tien Nguyen

TL;DR
This paper presents JET, a joint learning model for incomplete utterance restoration that effectively identifies omitted tokens and generates restored text, outperforming existing methods across multiple datasets.
Contribution
Introduces a novel joint learning approach combining token extraction and text generation for IUR, applicable to both extraction and abstraction scenarios.
Findings
Outperforms pretrained T5 and non-generative models on four benchmark datasets.
Effective in both rich and limited training data settings.
Uses label creation methods that do not require annotations of the omitted tokens.
Abstract
This paper introduces a model for incomplete utterance restoration (IUR) called JET (Joint learning token Extraction and Text generation). Unlike prior studies that work only on extraction or only on abstraction datasets, we design a simple but effective model that works for both IUR scenarios. Our design simulates the nature of IUR, in which tokens omitted from the context contribute to the restoration. From this, we construct a Picker that identifies the omitted tokens. To support the Picker, we design two label creation methods (soft and hard labels), which work even when no annotation data for the omitted tokens is available. Restoration is performed by a Generator that is trained jointly with the Picker. Promising results on four benchmark datasets in extraction and abstraction scenarios show that our model is better than the pretrained T5 and…
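The abstract describes the Picker, the label creation, and the joint training only at a high level. The sketch below illustrates two of these ideas under stated assumptions, not as the authors' implementation: a hard-label rule that marks a context token as omitted (label 1) when it appears in the restored utterance but not in the incomplete utterance, and a joint objective taken as the Generator's mean negative log-likelihood plus a weighted Picker binary cross-entropy. The function names, the exact-match rule, and the weight lam are all hypothetical; the paper's soft labels are not sketched, and its actual label rule and loss weighting may differ.

```python
import math
from typing import List, Sequence


def hard_labels(context: Sequence[str], incomplete: Sequence[str],
                restored: Sequence[str]) -> List[int]:
    """Hypothetical hard-label rule: a context token gets 1 if it occurs in the
    restored utterance but not in the incomplete utterance, else 0.
    Uses exact token matching, so no annotation of omitted tokens is needed."""
    omitted = set(restored) - set(incomplete)
    return [1 if tok in omitted else 0 for tok in context]


def picker_bce(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy between Picker probabilities and 0/1 labels."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and the same length")
    eps = 1e-12  # clip probabilities to avoid log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def generator_nll(target_token_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood of the gold restored tokens under the Generator."""
    if not target_token_probs:
        raise ValueError("need at least one target token")
    return -sum(math.log(p) for p in target_token_probs) / len(target_token_probs)


def joint_loss(gen_probs: Sequence[float], pick_probs: Sequence[float],
               pick_labels: Sequence[int], lam: float = 1.0) -> float:
    """Hypothetical joint objective: generation loss plus lam times the Picker loss."""
    return generator_nll(gen_probs) + lam * picker_bce(pick_probs, pick_labels)


if __name__ == "__main__":
    context = ["i", "really", "like", "titanic"]
    incomplete = ["why", "?"]
    restored = ["why", "do", "you", "like", "titanic", "?"]
    print(hard_labels(context, incomplete, restored))  # [0, 0, 1, 1]
```

Because the labels are derived only from the incomplete and restored utterances, a rule like this needs no extra annotation. Exact matching misses omitted content that is paraphrased rather than copied, which is a limitation of this simplified rule.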
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
Methods: Gated Linear Unit · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Multi-Head Attention · SentencePiece · Dropout · Layer Normalization · Adafactor · Inverse Square Root Schedule
