BERT for Coreference Resolution: Baselines and Analysis

Mandar Joshi; Omer Levy; Daniel S. Weld; Luke Zettlemoyer

arXiv:1908.09091·cs.CL·December 24, 2019

TL;DR

This paper applies BERT to coreference resolution. It reports gains of +3.9 F1 on OntoNotes and +11.5 F1 on GAP, and it analyzes the model's strengths (distinguishing related but distinct entities) and limitations (document-level context, conversations, and mention paraphrasing).

Contribution

The paper introduces BERT-based coreference resolution models, reports baseline results, and analyzes model behavior to identify areas for future improvement.

Findings

1. BERT-large outperforms ELMo and BERT-base on coreference resolution.

2. The BERT-based models achieve +3.9 F1 on OntoNotes and +11.5 F1 on GAP.

3. Challenges remain in modeling document-level context, conversations, and mention paraphrasing.

Abstract

We apply BERT to coreference resolution, achieving strong improvements on the OntoNotes (+3.9 F1) and GAP (+11.5 F1) benchmarks. A qualitative analysis of model predictions indicates that, compared to ELMo and BERT-base, BERT-large is particularly better at distinguishing between related but distinct entities (e.g., President and CEO). However, there is still room for improvement in modeling document-level context, conversations, and mention paraphrasing. Our code and models are publicly available.

Taxonomy

Methods: Linear Layer · Sigmoid Activation · Tanh Activation · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam