Multi-Task Deep Neural Networks for Natural Language Understanding

Xiaodong Liu; Pengcheng He; Weizhu Chen; Jianfeng Gao

arXiv:1901.11504 · cs.CL · May 31, 2019 · 220 cites

TL;DR

This paper introduces MT-DNN, a model that combines the pre-trained BERT encoder with multi-task learning across NLU tasks. It achieves state-of-the-art results on ten NLU tasks and adapts to new domains with fewer in-domain labels.

Contribution

The paper presents a multi-task learning framework that extends the model of Liu et al. (2015) with a pre-trained BERT encoder. The representations it learns are shared across NLU tasks, which improves both task performance and domain adaptation.

Findings

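01 Achieved new state-of-the-art results on ten NLU tasks, including SNLI, SciTail, and eight out of nine GLUE tasks.

02 Raised the GLUE benchmark score to 82.7% (a 2.2% absolute improvement).

03 Enabled domain adaptation with substantially fewer in-domain labels than pre-trained BERT.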

Abstract

In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks. MT-DNN not only leverages large amounts of cross-task data, but also benefits from a regularization effect that leads to more general representations in order to adapt to new tasks and domains. MT-DNN extends the model proposed in Liu et al. (2015) by incorporating a pre-trained bidirectional transformer language model, known as BERT (Devlin et al., 2018). MT-DNN obtains new state-of-the-art results on ten NLU tasks, including SNLI, SciTail, and eight out of nine GLUE tasks, pushing the GLUE benchmark to 82.7% (2.2% absolute improvement). We also demonstrate using the SNLI and SciTail datasets that the representations learned by MT-DNN allow domain adaptation with substantially fewer in-domain labels than the pre-trained BERT…
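The abstract does not spell out how training over several tasks is organized, so the following is only a minimal sketch of one common way to do it, not the authors' implementation. It assumes each task's data is split into mini-batches, the batches from all tasks are pooled and shuffled, and every batch updates a shared encoder plus the head of the task it came from. The task names, dataset sizes, and the functions `make_batches` and `train_one_epoch` are hypothetical, and the actual model and loss computation are replaced by update counters.

```python
import random
from typing import Dict, List, Tuple

Batch = Tuple[str, List[int]]  # (task name, example indices in that task)


def make_batches(task_sizes: Dict[str, int], batch_size: int) -> List[Batch]:
    """Split every task's examples into mini-batches tagged with the task name."""
    batches: List[Batch] = []
    for task, n_examples in task_sizes.items():
        for start in range(0, n_examples, batch_size):
            stop = min(start + batch_size, n_examples)
            batches.append((task, list(range(start, stop))))
    return batches


def train_one_epoch(task_sizes: Dict[str, int], batch_size: int,
                    seed: int = 0) -> Dict[str, int]:
    """Count the updates received by the shared encoder and each task head."""
    rng = random.Random(seed)
    batches = make_batches(task_sizes, batch_size)
    rng.shuffle(batches)  # interleave the tasks within the epoch

    updates = {"shared_encoder": 0}
    updates.update({f"head:{task}": 0 for task in task_sizes})
    for task, _indices in batches:
        # Placeholder for a real step: compute the task-specific loss on this
        # batch, then update the shared encoder and this task's head only.
        updates["shared_encoder"] += 1
        updates[f"head:{task}"] += 1
    return updates


# Hypothetical dataset sizes, chosen only for illustration.
toy_sizes = {"nli": 10, "similarity": 6, "sentiment": 4}
counts = train_one_epoch(toy_sizes, batch_size=4)
print(counts)
```

With these toy sizes the shared encoder receives six updates in the epoch, while the three heads receive three, two, and one. This illustrates how shared parameters can draw on cross-task data that no single task-specific head sees.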

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Byte Pair Encoding · Dense Connections