WikiCREM: A Large Unsupervised Corpus for Coreference Resolution

Vid Kocijan; Oana-Maria Camburu; Ana-Maria Cretu; Yordan Yordanov; Phil Blunsom; Thomas Lukasiewicz

arXiv:1908.08025·cs.CL·October 15, 2019

TL;DR

WikiCREM is a large, automatically generated dataset for pronoun coreference resolution. Models trained with it match or outperform previous state-of-the-art approaches on 6 out of 7 coreference benchmarks.

Contribution

The paper introduces WikiCREM, a novel large-scale unsupervised dataset for pronoun resolution, and demonstrates its effectiveness with models that outperform previous approaches.

Findings

01

Matched or outperformed previous state-of-the-art approaches on 6 out of 7 coreference datasets, including GAP, DPR, WNLI, PDP, WinoBias, and WinoGender.

02

Demonstrated the effectiveness of the WikiCREM dataset for training coreference models.

03

Provided an off-the-shelf model for pronoun disambiguation.

Abstract

Pronoun resolution is a major area of natural language understanding. However, large-scale training sets are still scarce, since manually labelling data is costly. In this work, we introduce WikiCREM (Wikipedia CoREferences Masked), a large-scale, yet accurate dataset of pronoun disambiguation instances. We use a language-model-based approach for pronoun resolution in combination with our WikiCREM dataset. We compare a series of models on a collection of diverse and challenging coreference resolution problems, where we match or outperform previous state-of-the-art approaches on 6 out of 7 datasets, such as GAP, DPR, WNLI, PDP, WinoBias, and WinoGender. We release our model to be used off-the-shelf for solving pronoun disambiguation.
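
The abstract does not spell out how a "masked" pronoun disambiguation instance is built, so the following is only an illustrative sketch under stated assumptions. It assumes an instance is made by taking a sentence that mentions at least two names, one of them more than once, and replacing a repeated mention with a mask token, leaving the names as candidate answers. The `MaskedInstance` type, the `make_instance` helper, and the `[MASK]` placeholder are hypothetical names introduced here. They are not taken from the paper or from the released repository.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: a BERT-style placeholder; the token actually used is not stated in the abstract.
MASK = "[MASK]"


@dataclass
class MaskedInstance:
    text: str              # sentence with one mention replaced by MASK
    candidates: List[str]  # names the masked span could refer to
    answer: str            # the name that was masked out


def make_instance(sentence: str, names: List[str]) -> Optional[MaskedInstance]:
    """Mask the second occurrence of the first repeated name, if any.

    Hypothetical helper: uses plain substring matching, so it would need
    proper tokenisation and name detection in any real pipeline.
    """
    present = [n for n in names if n in sentence]
    if len(present) < 2:
        return None  # disambiguation needs at least two candidates
    for name in present:
        first = sentence.find(name)
        second = sentence.find(name, first + len(name))
        if second != -1:
            masked = sentence[:second] + MASK + sentence[second + len(name):]
            return MaskedInstance(text=masked, candidates=present, answer=name)
    return None  # no name is repeated, so there is nothing to mask


example = make_instance(
    "Alice told Bob that Alice would be late.", ["Alice", "Bob"]
)
print(example)
```

A language-model-based resolver would then score each candidate as a filler for the masked position and pick the highest-scoring one. That scoring step is omitted here because it depends on the specific model.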

Code & Models

Repositories

vid-koci/bert-commonsense (PyTorch, official)
