BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova

arXiv:1810.04805·cs.CL·May 28, 2019

TL;DR

BERT is a deep bidirectional Transformer model pre-trained on unlabeled text. Fine-tuned with minimal task-specific modifications, it achieves state-of-the-art results across multiple NLP tasks.

Contribution

The paper presents a pre-training method for deep bidirectional representations that conditions on both left and right context in all layers and improves performance on NLP benchmarks including GLUE, MultiNLI, and SQuAD.

Findings

01

Achieves new state-of-the-art results on eleven NLP tasks.

02

Pushes the GLUE score to 80.5%, a 7.7 percentage point absolute improvement.

03

Improves SQuAD v1.1 and v2.0 F1 scores substantially.

Abstract

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1…
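The contrast the abstract draws between left-to-right models and "jointly conditioning on both left and right context" can be illustrated with attention visibility masks. The following is a minimal sketch, not BERT's actual implementation: the function names `attention_mask` and `visible_context` and the example sentence are hypothetical, and the excerpt above does not describe how BERT is trained to use this context.

```python
def attention_mask(seq_len, bidirectional):
    """Build a seq_len x seq_len visibility matrix (illustrative helper).

    Entry [i][j] is 1 when position i may attend to position j, else 0.
    """
    return [
        [1 if bidirectional or j <= i else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


def visible_context(tokens, position, bidirectional):
    """List the tokens that `position` can condition on under the chosen mask."""
    mask = attention_mask(len(tokens), bidirectional)
    return [tok for tok, allowed in zip(tokens, mask[position]) if allowed]


tokens = ["the", "bank", "of", "the", "river"]  # hypothetical example sentence

# Left-to-right: "bank" (index 1) sees only itself and the token before it.
left_only = visible_context(tokens, 1, bidirectional=False)

# Bidirectional: "bank" also sees the tokens to its right, including "river".
both_sides = visible_context(tokens, 1, bidirectional=True)

print(left_only)
print(both_sides)
```

In the left-to-right case the representation of "bank" cannot draw on "river", which is the kind of right-side context a bidirectional model is able to use at every layer.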

Taxonomy

Methods: Linear Layer · mBERT · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam