Jointly Improving Language Understanding and Generation with Quality-Weighted Weak Supervision of Automatic Labeling

Ernie Chang; Vera Demberg; Alex Marin

arXiv:2102.03551 · cs.CL · February 9, 2021

TL;DR

This paper introduces a semi-supervised framework that jointly improves language understanding and generation by using quality-weighted weak supervision from automatically labeled data, enhancing performance especially in low-resource settings.

Contribution

It proposes a novel semi-supervised training method that adapts parameter updates according to estimated label quality, leveraging large-scale weakly-labeled data generated by a fine-tuned GPT-2.

Findings

01 Outperforms benchmark systems on E2E and Weather datasets.

02 Effective in low-resource scenarios.

03 Achieves state-of-the-art results with full data.

Abstract

Neural natural language generation (NLG) and understanding (NLU) models are data-hungry and require massive amounts of annotated data to be competitive. Recent frameworks address this bottleneck with generative models that synthesize weak labels at scale, where a small number of training labels are expert-curated and the rest of the data is automatically annotated. We follow that approach by automatically constructing large-scale weakly-labeled data with a fine-tuned GPT-2, and employ a semi-supervised framework to jointly train the NLG and NLU models. The proposed framework adapts the parameter updates to the models according to the estimated label quality. On both the E2E and Weather benchmarks, we show that this weakly supervised training paradigm is effective under low-resource scenarios and outperforms benchmark systems on both datasets when 100% of training data…
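
The idea of adapting parameter updates to estimated label quality can be illustrated with a toy sketch. The abstract does not specify how quality is estimated or how it enters the update, so everything below is an assumption made for illustration: the function names, the normalized weighting scheme, and the one-parameter regression model are hypothetical and are not the paper's method. Here each example's loss and gradient are scaled by a quality score in [0, 1], with expert-curated labels assumed to have quality 1.0.

```python
from typing import Sequence


def quality_weighted_loss(losses: Sequence[float], qualities: Sequence[float]) -> float:
    """Average per-example losses, weighting each by its estimated label quality.

    Hypothetical helper: the normalized weighting is an assumption,
    not the scheme described in the paper.
    """
    if len(losses) != len(qualities):
        raise ValueError("losses and qualities must have the same length")
    total_weight = sum(qualities)
    if total_weight == 0:
        return 0.0  # no trusted examples in this batch
    return sum(q * l for q, l in zip(qualities, losses)) / total_weight


def weighted_sgd_step(
    w: float,
    xs: Sequence[float],
    ys: Sequence[float],
    qualities: Sequence[float],
    lr: float = 0.1,
) -> float:
    """One gradient step for the toy model y ~ w * x under quality-weighted squared error."""
    total_weight = sum(qualities)
    if total_weight == 0:
        return w  # nothing to learn from; leave the parameter unchanged
    # d/dw of q * (w*x - y)^2 is q * 2 * (w*x - y) * x
    grad = sum(
        q * 2.0 * (w * x - y) * x for x, y, q in zip(xs, ys, qualities)
    ) / total_weight
    return w - lr * grad


if __name__ == "__main__":
    # One expert-curated example (quality 1.0) and one weak label judged unreliable (quality 0.0).
    print(quality_weighted_loss([1.0, 3.0], [1.0, 0.0]))
    print(weighted_sgd_step(0.0, xs=[1.0, 1.0], ys=[1.0, 100.0], qualities=[1.0, 0.0]))
```

In this sketch a weak label with quality 0.0 contributes nothing to the update, while intermediate scores shrink its influence proportionally. A real system would compute the per-example losses from the NLG and NLU models and obtain the quality scores from a separate estimator.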

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Linear Layer · Cosine Annealing · Layer Normalization · Residual Connection · Attention Dropout · Discriminative Fine-Tuning · Multi-Head Attention · Adam · Linear Warmup With Cosine Annealing · Weight Decay