E.T.: Entity-Transformers. Coreference augmented Neural Language Model for richer mention representations via Entity-Transformer blocks

Nikolaos Stylianou; Ioannis Vlahavas

arXiv:2011.05431·cs.CL·November 12, 2020

TL;DR

This paper introduces GPT2E, an extension of GPT2 that incorporates coreference information via Entity-Transformer blocks, resulting in richer entity mention representations with minimal additional training cost.

Contribution

The paper presents a novel Entity-Transformer architecture that integrates coreference annotations into GPT2, enhancing entity representations without significant computational overhead.

Findings

01 GPT2E outperforms GPT2 in perplexity on CoNLL 2012 and LAMBADA datasets.

02 Entity-Transformers improve downstream tasks like Named Entity Recognition.

03 The approach is adaptable to most Transformer-based language models.
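
The perplexity comparison in the first finding uses a standard language-modelling metric: the exponential of the mean negative log-likelihood per token, where lower is better. The sketch below illustrates that definition only. The function name `perplexity` and the probability values are hypothetical and are not results from the paper.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative natural-log likelihood per token)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Made-up per-token log-probabilities for two models scoring the same
# four-token text (illustrative values, not figures from the paper).
baseline = [math.log(0.25)] * 4   # each token predicted with p = 0.25
augmented = [math.log(0.5)] * 4   # each token predicted with p = 0.5

print(perplexity(baseline), perplexity(augmented))
```

A model that assigns higher probability to each observed token gets a lower perplexity, which is the sense in which one model "outperforms" another on this metric.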

Abstract

In the last decade, the field of Neural Language Modelling has witnessed enormous changes, with the development of novel models through the use of Transformer architectures. However, even these models struggle to model long sequences due to memory constraints and increasing computational complexity. Coreference annotations over the training data can provide context far beyond the modelling limitations of such language models. In this paper we present an extension over the Transformer-block architecture used in neural language models, specifically in GPT2, in order to incorporate entity annotations during training. Our model, GPT2E, extends the Transformer layers architecture of GPT2 to Entity-Transformers, an architecture designed to handle coreference information when present. To that end, we achieve richer representations for entity mentions, with insignificant training cost. We show…
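
To make the idea of coreference annotations over training data concrete, here is a minimal sketch of one way such annotations could be aligned with tokens. The cluster format (lists of inclusive token spans per entity) and the helper names `entity_ids_per_token` and `previous_mention` are assumptions for illustration. This summary does not describe the paper's actual input format or how Entity-Transformer blocks consume it.

```python
def entity_ids_per_token(num_tokens, clusters):
    """Map each token index to the id of the entity it mentions, or None.

    `clusters` is assumed to be a list of entities, each a list of
    inclusive (start, end) token spans. If spans overlap, later spans
    overwrite earlier ones.
    """
    ids = [None] * num_tokens
    for entity_id, spans in enumerate(clusters):
        for start, end in spans:
            if not (0 <= start <= end < num_tokens):
                raise ValueError(f"span {(start, end)} is out of range")
            for i in range(start, end + 1):
                ids[i] = entity_id
    return ids


def previous_mention(ids):
    """For each token, give the index of the latest earlier token that
    mentions the same entity, or None if there is no such token."""
    last_seen = {}
    prev = []
    for i, entity in enumerate(ids):
        prev.append(last_seen.get(entity) if entity is not None else None)
        if entity is not None:
            last_seen[entity] = i
    return prev


tokens = ["Mary", "said", "she", "would", "call", "her", "brother", "."]
clusters = [[(0, 0), (2, 2), (5, 5)]]  # one entity: "Mary", "she", "her"

ids = entity_ids_per_token(len(tokens), clusters)
links = previous_mention(ids)
print(ids)
print(links)
```

Links like these connect a mention to earlier mentions of the same entity regardless of how many tokens separate them, which is the kind of long-range context the abstract attributes to coreference annotations.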

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Healthcare

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Dropout · Softmax · Multi-Head Attention · Residual Connection · Dense Connections · Label Smoothing