Attending to Entities for Better Text Understanding

Pengxiang Cheng; Katrin Erk

arXiv:1911.04361 · cs.CL · November 12, 2019 · 1 citation

TL;DR

This paper demonstrates that injecting coreference knowledge into self-attention models improves performance on LAMBADA, a task that requires long-distance reasoning, and surpasses GPT-2 while using fewer parameters.

Contribution

A method for incorporating coreference information into self-attention models as supervision on their attention, which achieves state-of-the-art results on the LAMBADA task with fewer parameters.
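One way to turn coreference information into something a self-attention head can be trained on is to build, for each mention, a target attention distribution over its antecedents. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the helper name `coref_attention_targets`, the representation of each coreference chain as a list of token positions (for example, mention head words), the restriction to earlier mentions (a left-to-right reading order), and spreading probability uniformly over all antecedents are all hypothetical choices.

```python
from typing import Dict, List, Sequence


def coref_attention_targets(n_tokens: int,
                            clusters: Sequence[Sequence[int]]) -> Dict[int, List[float]]:
    """Build target attention rows from coreference chains (hypothetical helper).

    Each cluster lists the token positions of one chain's mentions. A mention
    is supervised to attend uniformly to all earlier mentions in its chain;
    first mentions have no antecedent and therefore get no target.
    """
    targets: Dict[int, List[float]] = {}
    for cluster in clusters:
        mentions = sorted(set(cluster))
        if mentions and (mentions[0] < 0 or mentions[-1] >= n_tokens):
            raise ValueError("mention position outside the sequence")
        for k, pos in enumerate(mentions):
            antecedents = mentions[:k]  # assumption: only earlier mentions count
            if not antecedents:
                continue  # first mention of a chain stays unsupervised
            row = [0.0] * n_tokens
            for a in antecedents:
                row[a] = 1.0 / len(antecedents)  # uniform mass over antecedents
            targets[pos] = row
    return targets


tokens = "Mary went home . She said she was tired".split()
targets = coref_attention_targets(len(tokens), [[0, 4, 6]])
```

Here position 4 (`She`) puts all of its target mass on position 0 (`Mary`), while position 6 (`she`) splits it evenly between positions 0 and 4. Pointing each mention only at its most recent antecedent would be an equally plausible design; uniform mass over the whole chain is simply the choice made for this sketch.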

Findings

1. Model with coreference supervision outperforms GPT-2 on LAMBADA
2. Achieves state-of-the-art results with fewer parameters
3. Analysis of architecture and supervision variants

Abstract

Recent progress in NLP witnessed the development of large-scale pre-trained language models (GPT, BERT, XLNet, etc.) based on Transformer (Vaswani et al. 2017), and in a range of end tasks, such models have achieved state-of-the-art results, approaching human performance. This demonstrates the power of the stacked self-attention architecture when paired with a sufficient number of layers and a large amount of pre-training data. However, on tasks that require complex and long-distance reasoning where surface-level cues are not enough, there is still a large gap between the pre-trained models and human performance. Strubell et al. (2018) recently showed that it is possible to inject knowledge of syntactic structure into a model through supervised self-attention. We conjecture that a similar injection of semantic knowledge, in particular, coreference information, into an existing model…
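Supervised self-attention in the style of Strubell et al. (2018) trains one attention head so that its attention distribution matches a target structure, by adding a cross-entropy term to the task loss. The sketch below shows that idea for a single head in plain Python. It is an illustration under assumptions, not the paper's training code: the function names, the causal mask, the averaging over supervised positions and the `aux_weight` coefficient are hypothetical, and the target distributions (for example, uniform over a token's coreferent antecedents) are assumed to be given.

```python
import math
from typing import Dict, List, Sequence


def attention_weights(queries: Sequence[Sequence[float]],
                      keys: Sequence[Sequence[float]],
                      causal: bool = True) -> List[List[float]]:
    """Scaled dot-product attention probabilities for a single head."""
    d = len(queries[0])
    rows: List[List[float]] = []
    for i, q in enumerate(queries):
        # With a causal mask, position i may only look at positions 0..i.
        limit = i + 1 if causal else len(keys)
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
                  for j in range(limit)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        rows.append([e / z for e in exps] + [0.0] * (len(keys) - limit))
    return rows


def supervised_attention_loss(attn: List[List[float]],
                              targets: Dict[int, List[float]],
                              eps: float = 1e-12) -> float:
    """Mean cross-entropy between target rows and the head's attention rows.

    Only positions listed in `targets` (e.g. mentions with an antecedent)
    contribute; all other positions are left unsupervised.
    """
    if not targets:
        return 0.0
    total = 0.0
    for pos, target in targets.items():
        total -= sum(t * math.log(attn[pos][j] + eps)
                     for j, t in enumerate(target) if t > 0.0)
    return total / len(targets)


def combined_loss(task_loss: float, attn: List[List[float]],
                  targets: Dict[int, List[float]],
                  aux_weight: float = 1.0) -> float:
    """Task loss (e.g. language modeling) plus the weighted attention loss."""
    return task_loss + aux_weight * supervised_attention_loss(attn, targets)
```

In a full model the queries and keys would come from one designated head in one layer, and `aux_weight` would be tuned alongside the other hyperparameters; which layer and head to supervise is a further design choice that this sketch does not fix.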

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Cosine Annealing · WordPiece · Linear Warmup With Linear Decay · SentencePiece · BERT · XLNet · Residual Connection