Multilingual Coreference Resolution with Harmonized Annotations

Ondřej Pražák; Miloslav Konopík; Jakub Sido

arXiv:2107.12088 · cs.CL · September 6, 2021 · 1 citation


Open Access

TL;DR

This paper introduces a multilingual coreference resolution approach using a new harmonized corpus, demonstrating that combined models improve performance, especially for low-resource languages.

Contribution

The paper reports coreference resolution experiments on CorefUD, a newly created multilingual corpus with harmonized annotations, and shows that jointly trained models benefit low-resource languages.

Findings

1. Harmonized annotations improve coreference resolution performance.
2. Joint models outperform monolingual models for languages with smaller datasets.
3. Multilingual training benefits low-resource languages.

Abstract

In this paper, we present coreference resolution experiments with the newly created multilingual corpus CorefUD. We focus on the following languages: Czech, Russian, Polish, German, Spanish, and Catalan. In addition to monolingual experiments, we combine the training data in multilingual experiments and train two joint models: one for the Slavic languages and one for all the languages together. We rely on an end-to-end deep learning model that we slightly adapted for the CorefUD corpus. Our results show that we can profit from harmonized annotations and that using joint models helps significantly for the languages with smaller training data.
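The abstract does not say exactly how the training data are combined. A minimal sketch, assuming joint training simply pools the per-language training documents into one shuffled set, is shown below. The language codes, the `build_training_set` helper, and the toy document dictionaries are hypothetical illustrations, not code from the paper or from CorefUD tooling.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical ISO codes for the six CorefUD languages named in the abstract.
SLAVIC = ("cs", "ru", "pl")
ALL_LANGS = ("cs", "ru", "pl", "de", "es", "ca")


def build_training_set(
    corpora: Dict[str, List[dict]],
    languages: Sequence[str],
    seed: int = 0,
) -> List[dict]:
    """Pool the training documents of the given languages and shuffle them.

    Each document is tagged with its language so that per-language
    evaluation remains possible after joint training.
    """
    combined = []
    for lang in languages:
        for doc in corpora.get(lang, []):
            combined.append({**doc, "lang": lang})
    # Seeded shuffle so languages are interleaved, reproducibly.
    random.Random(seed).shuffle(combined)
    return combined


# Toy stand-in data: a few fake documents per language (not real CorefUD sizes).
corpora = {
    lang: [{"doc_id": f"{lang}-{i}"} for i in range(n)]
    for lang, n in zip(ALL_LANGS, (3, 2, 2, 2, 2, 2))
}

mono_pl = build_training_set(corpora, ("pl",))   # monolingual Polish model
slavic = build_training_set(corpora, SLAVIC)     # joint Slavic model
joint = build_training_set(corpora, ALL_LANGS)   # joint model for all languages
print(len(mono_pl), len(slavic), len(joint))     # 2 7 13
```

Tagging each document with `lang` keeps per-language scoring possible after joint training. Whether the authors also rebalance or sample languages during training is not stated in the abstract.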


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis