# Cloze-driven Pretraining of Self-attention Networks

**Authors:** Alexei Baevski, Sergey Edunov, Yinhan Liu, Luke Zettlemoyer, Michael Auli

arXiv: 1903.07785 · 2019-03-20

## TL;DR

This paper introduces a method for pretraining a bi-directional transformer with a cloze-style word reconstruction task. It reports large gains on GLUE, new state-of-the-art results on NER and constituency parsing, and an analysis of the factors that make pretraining effective.

## Contribution

The paper proposes a cloze-driven pretraining approach for bi-directional transformers, reports state-of-the-art results on NER and constituency parsing, and analyzes how data domain and size, model capacity, and variations on the cloze objective affect pretraining.

## Key findings

- Large performance gains on the GLUE benchmark
- New state-of-the-art results on NER and constituency parsing
- Effective pretraining depends on data domain and size, model capacity, and the variant of the cloze objective used

## Abstract

We present a new approach for pretraining a bi-directional transformer model that provides significant performance gains across a variety of language understanding problems. Our model solves a cloze-style word reconstruction task, where each word is ablated and must be predicted given the rest of the text. Experiments demonstrate large performance gains on GLUE and new state of the art results on NER as well as constituency parsing benchmarks, consistent with the concurrently introduced BERT model. We also present a detailed analysis of a number of factors that contribute to effective pretraining, including data domain and size, model capacity, and variations on the cloze objective.
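The cloze task described in the abstract can be pictured from the data side with a minimal sketch: every position in a sentence yields one training example, in which that word is hidden and becomes the prediction target. The function name `cloze_examples` and the `<mask>` placeholder are assumptions made for illustration only; this summary does not say how the model hides the word internally, so the sketch should not be read as the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical placeholder symbol for the ablated word (not taken from the paper).
MASK = "<mask>"


def cloze_examples(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Build one (context, target) pair per position by ablating that token."""
    examples = []
    for i, target in enumerate(tokens):
        # Keep everything to the left and right; hide only position i.
        context = tokens[:i] + [MASK] + tokens[i + 1:]
        examples.append((context, target))
    return examples


if __name__ == "__main__":
    for context, target in cloze_examples(["the", "cat", "sat"]):
        print(" ".join(context), "->", target)
```

Each word therefore serves as a target exactly once, with both its left and right context available, which is the sense in which the objective is bi-directional.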

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.07785/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1903.07785/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/1903.07785/full.md

---
Source: https://tomesphere.com/paper/1903.07785