Multilingual Coreference Resolution with Harmonized Annotations
Ondřej Pražák, Miloslav Konopík, Jakub Sido

TL;DR
This paper reports coreference resolution experiments on CorefUD, a newly created multilingual corpus with harmonized annotations, and shows that jointly trained models help most for languages with smaller training data.
Contribution
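The paper evaluates an end-to-end deep learning coreference model, slightly adapted for the CorefUD corpus, on Czech, Russian, Polish, German, Spanish, and Catalan. It compares monolingual models against two joint models, one trained on the Slavic languages and one on all six languages, and shows that joint training benefits the languages with smaller training sets.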
Findings
Harmonized annotations allow training data from different languages to be combined, and the models profit from this.
Joint models help significantly for the languages with smaller training data.
Abstract
In this paper, we present coreference resolution experiments with a newly created multilingual corpus CorefUD. We focus on the following languages: Czech, Russian, Polish, German, Spanish, and Catalan. In addition to monolingual experiments, we combine the training data in multilingual experiments and train two joined models -- for Slavic languages and for all the languages together. We rely on an end-to-end deep learning model that we slightly adapted for the CorefUD corpus. Our results show that we can profit from harmonized annotations, and using joined models helps significantly for the languages with smaller training data.
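The three training setups named in the abstract (monolingual, joint Slavic, joint over all languages) can be sketched as a simple data-grouping step. This is only an illustrative sketch, not the authors' code: the function name `build_training_sets`, the setup labels, and the `docs_by_language` mapping are assumptions introduced here, and the documents are treated as opaque items.

```python
# Illustrative sketch only: names and data layout are hypothetical,
# not taken from the paper or from any CorefUD tooling.

# Languages listed in the abstract; Czech, Russian, and Polish are the Slavic ones.
SLAVIC = ("Czech", "Russian", "Polish")
ALL_LANGUAGES = SLAVIC + ("German", "Spanish", "Catalan")


def build_training_sets(docs_by_language):
    """Map each experimental setup to its list of training documents.

    docs_by_language: dict from language name to a list of documents
    (assumed to be already loaded; the loading step is out of scope here).
    """
    setups = {}

    # Monolingual setups: one training set per language.
    for lang in ALL_LANGUAGES:
        setups[f"mono:{lang}"] = list(docs_by_language.get(lang, []))

    # Joint Slavic setup: concatenate the Slavic training sets.
    setups["joint:slavic"] = [
        doc for lang in SLAVIC for doc in docs_by_language.get(lang, [])
    ]

    # Joint setup over all languages: concatenate every training set.
    setups["joint:all"] = [
        doc for lang in ALL_LANGUAGES for doc in docs_by_language.get(lang, [])
    ]

    return setups
```

Concatenating the datasets in this way is only meaningful because the annotations are harmonized, so a single model sees one consistent annotation scheme across languages.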
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
