# Neural Mention Detection

**Authors:** Juntao Yu, Bernd Bohnet, Massimo Poesio

arXiv: 1907.12524 · 2020-06-23

## TL;DR

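This paper proposes and compares three neural models for mention detection. The best one, built on a biaffine classifier, improves mention recall by up to 1.8 percentage points over a strong baseline and improves mention detection F1 over the best reported results by up to 5.3 points on CoNLL and 6.2 points on CRAC. Its predicted mentions also improve coreference resolution, and it matches or outperforms state-of-the-art models on nested NER (GENIA).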

## Contribution

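The paper proposes and compares three neural mention detectors: one derived from the mention detection component of a state-of-the-art coreference resolution system, one combining ELMo embeddings with a bidirectional LSTM and a biaffine classifier, and one based on BERT. The biaffine model performs best, exceeding the best previously reported mention detection F1 on the CoNLL and CRAC coreference datasets.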

## Key findings

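- The best model (biaffine classifier) improves mention recall by up to 1.8 percentage points over a strong baseline in the high-recall annotation setting.
- The same model improves mention detection F1 over the best reported results by up to 5.3 percentage points on CoNLL and 6.2 percentage points on CRAC in the high-F1 annotation setting.
- Feeding its predicted mentions to state-of-the-art coreference systems improves coreference resolution by up to 1.7 percentage points (pipeline system) and 0.7 percentage points (end-to-end system).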
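
The recall and F1 figures above are span-level scores. As a minimal sketch of how such scores are commonly computed, assuming exact-match comparison of `(start, end)` token spans (the paper's own scorer may differ in its details), consider the hypothetical helper `mention_prf`:

```python
def mention_prf(gold, predicted):
    """Precision, recall and F1 over exact-match (start, end) mention spans.

    Assumption: a predicted mention counts as correct only if both of its
    boundaries match a gold mention exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (illustrative only): (6, 8) is nested inside (5, 8).
gold = {(0, 1), (3, 3), (5, 8), (6, 8)}
predicted = {(0, 1), (3, 3), (6, 8), (9, 10)}
precision, recall, f1 = mention_prf(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Because spans are compared as sets of boundary pairs, nested mentions such as `(5, 8)` and `(6, 8)` are scored independently of each other.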

## Abstract

Mention detection is an important preprocessing step for annotation and interpretation in applications such as NER and coreference resolution, but few stand-alone neural models have been proposed able to handle the full range of mentions. In this work, we propose and compare three neural network-based approaches to mention detection. The first approach is based on the mention detection part of a state-of-the-art coreference resolution system; the second uses ELMo embeddings together with a bidirectional LSTM and a biaffine classifier; the third approach uses the recently introduced BERT model. Our best model (using a biaffine classifier) achieves gains of up to 1.8 percentage points on mention recall when compared with a strong baseline in a HIGH RECALL coreference annotation setting. The same model achieves improvements of up to 5.3 and 6.2 p.p. when compared with the best-reported mention detection F1 on the CoNLL and CRAC coreference data sets respectively in a HIGH F1 annotation setting. We then evaluate our models for coreference resolution by using mentions predicted by our best model in state-of-the-art coreference systems. The enhanced model achieved absolute improvements of up to 1.7 and 0.7 p.p. when compared with our strong baseline systems (pipeline system and end-to-end system) respectively. For nested NER, the evaluation of our model on the GENIA corpora shows that our model matches or outperforms state-of-the-art models despite not being specifically designed for this task.
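
To illustrate what a biaffine classifier over candidate spans can look like, here is a minimal pure-Python sketch. It assumes the general biaffine form `h_start^T W h_end + b` followed by a sigmoid, with start and end token representations supplied by some upstream encoder (for example a bidirectional LSTM). The function names, the `max_width` limit, and the `threshold` are hypothetical choices for illustration, not details taken from the paper.

```python
import math


def biaffine_score(h_start, h_end, W, b):
    """Bilinear score h_start^T W h_end + b for one candidate span."""
    return sum(
        h_start[i] * W[i][j] * h_end[j]
        for i in range(len(h_start))
        for j in range(len(h_end))
    ) + b


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def detect_mentions(start_reps, end_reps, W, b, max_width=10, threshold=0.5):
    """Score every span (i, j) with i <= j and keep those above threshold.

    start_reps[i] / end_reps[j] are the (assumed) representations of token i
    as a span start and token j as a span end.
    """
    mentions = []
    n = len(start_reps)
    for i in range(n):
        for j in range(i, min(n, i + max_width)):
            p = sigmoid(biaffine_score(start_reps[i], end_reps[j], W, b))
            if p >= threshold:
                mentions.append((i, j, p))
    return mentions


# Toy two-token example with hand-picked weights (illustrative only).
token_reps = [[1.0, 0.0], [0.0, 1.0]]
W = [[2.0, 0.0], [0.0, -2.0]]
mentions = detect_mentions(token_reps, token_reps, W, b=0.0, threshold=0.6)
print(mentions)
```

Scoring start and end positions jointly lets overlapping and nested spans receive independent probabilities. Under this sketch, lowering `threshold` trades precision for recall, which is one plausible way to move between a high-recall and a high-F1 operating point.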

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.12524/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1907.12524/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/1907.12524/full.md

---
Source: https://tomesphere.com/paper/1907.12524