Attending to Entities for Better Text Understanding
Pengxiang Cheng, Katrin Erk

TL;DR
This paper demonstrates that injecting coreference knowledge into self-attention models significantly improves performance on tasks that require long-distance reasoning, such as LAMBADA, where the resulting model outperforms larger models while using fewer parameters.
Contribution
A method for incorporating coreference information into self-attention models, which achieves state-of-the-art results on the LAMBADA task with fewer parameters than GPT-2.
Findings
Model with coreference supervision outperforms GPT-2 on LAMBADA
Achieves state-of-the-art results with fewer parameters
Analysis of architecture and supervision variants
Abstract
Recent progress in NLP has witnessed the development of large-scale pre-trained language models (GPT, BERT, XLNet, etc.) based on the Transformer (Vaswani et al. 2017), and in a range of end tasks, such models have achieved state-of-the-art results, approaching human performance. This demonstrates the power of the stacked self-attention architecture when paired with a sufficient number of layers and a large amount of pre-training data. However, on tasks that require complex and long-distance reasoning where surface-level cues are not enough, there is still a large gap between the pre-trained models and human performance. Strubell et al. (2018) recently showed that it is possible to inject knowledge of syntactic structure into a model through supervised self-attention. We conjecture that a similar injection of semantic knowledge, in particular, coreference information, into an existing model…
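To make "supervised self-attention" concrete, here is a minimal sketch of one plausible form of coreference supervision. It is not the authors' implementation: the page does not specify their exact targets or loss. The sketch assumes that one attention head is trained so that each entity mention attends to earlier mentions of the same entity, with a target distribution uniform over those antecedent tokens and a cross-entropy penalty under causal (left-to-right) attention. This follows the spirit of Strubell et al. (2018), who supervised a head to attend to each token's syntactic parent. The names `coref_targets` and `coref_attention_loss` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coref_targets(cluster_ids: Sequence[int]) -> Dict[int, List[float]]:
    """Hypothetical target construction (an assumption, not the paper's spec).

    cluster_ids[i] is the entity cluster of token i, or -1 if the token is not
    part of any mention. Every token that has at least one earlier token in the
    same cluster gets a target distribution uniform over those antecedents.
    """
    n = len(cluster_ids)
    targets: Dict[int, List[float]] = {}
    for i, c in enumerate(cluster_ids):
        if c < 0:
            continue
        antecedents = [j for j in range(i) if cluster_ids[j] == c]
        if not antecedents:
            continue  # first mention of an entity: nothing to attend back to
        dist = [0.0] * n
        for j in antecedents:
            dist[j] = 1.0 / len(antecedents)
        targets[i] = dist
    return targets


def coref_attention_loss(
    attn_scores: Sequence[Sequence[float]], cluster_ids: Sequence[int]
) -> float:
    """Mean cross-entropy between one head's causal attention and coref targets.

    attn_scores[i][j] is the raw (pre-softmax) score of query i for key j.
    Under causal masking, query i only sees keys 0..i. In training, this
    auxiliary term would be added to the main task loss with some weight
    (the weighting scheme is an assumption, not taken from the paper).
    """
    targets = coref_targets(cluster_ids)
    if not targets:
        return 0.0
    total = 0.0
    for i, target in targets.items():
        probs = softmax(attn_scores[i][: i + 1])
        # zip stops at i + 1; every antecedent index j < i lies in that range
        total -= sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)
    return total / len(targets)


# Usage: "Mary said she would call her sister", with Mary/she/her in cluster 0.
tokens = ["Mary", "said", "she", "would", "call", "her", "sister"]
clusters = [0, -1, 0, -1, -1, 0, -1]
uniform = [[0.0] * len(tokens) for _ in tokens]
print(round(coref_attention_loss(uniform, clusters), 4))  # 1.4452
```

With uniform scores, "she" spreads attention over 3 visible tokens and "her" over 6, so the loss is (ln 3 + ln 6) / 2. Raising the scores on the antecedent positions lowers it. Supervising a single head this way leaves the other heads free to learn whatever the main objective rewards, which is the design trade-off Strubell et al. (2018) made for syntax.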
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Cosine Annealing · WordPiece · Linear Warmup With Linear Decay · SentencePiece · BERT · XLNet · Residual Connection
