RuCoCo: a new Russian corpus with coreference annotation
Vladimir Dobrovolskii, Mariia Michurina, Alexandra Ivoylova

TL;DR
RuCoCo is a large Russian corpus of news texts with coreference annotation. Part of it was annotated from scratch, and for the rest, human annotators refined machine-generated annotations. The corpus is meant to facilitate research in Russian NLP.
Contribution
This paper introduces RuCoCo, a new large-scale Russian coreference corpus. It combines manual annotation with human-corrected machine annotation while maintaining high inter-annotator agreement.
Findings
The corpus contains about one million words and around 150,000 mentions.
High inter-annotator agreement is maintained at this scale.
The corpus is publicly available for NLP research.
Abstract
We present a new corpus with coreference annotation, the Russian Coreference Corpus (RuCoCo). The goal of RuCoCo is to obtain a large number of annotated texts while maintaining high inter-annotator agreement. RuCoCo contains news texts in Russian. Part of them were annotated from scratch, and for the rest, machine-generated annotations were refined by human annotators. The corpus comprises one million words and around 150,000 mentions. We make the corpus publicly available.
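To make the reported sizes concrete, here is a minimal sketch of one way to represent a coreference-annotated document and tally word and mention counts like the ones above. The `Document` class, the character-span mention format, and whitespace tokenization are all hypothetical assumptions for illustration. They are not RuCoCo's actual release format.

```python
from dataclasses import dataclass, field

# Hypothetical format (an assumption, not RuCoCo's): a mention is a
# character span [start, end) in the text, and an entity (coreference
# chain) is a list of such spans referring to the same thing.
Span = tuple[int, int]


@dataclass
class Document:
    text: str
    entities: list[list[Span]] = field(default_factory=list)

    def mentions(self) -> list[str]:
        """Return the surface strings of all mentions, chain by chain."""
        return [self.text[s:e] for chain in self.entities for s, e in chain]


def corpus_stats(docs: list[Document]) -> tuple[int, int]:
    """Count whitespace-separated words and annotated mentions."""
    words = sum(len(d.text.split()) for d in docs)
    mentions = sum(len(chain) for d in docs for chain in d.entities)
    return words, mentions


doc = Document(
    text="Мэр Москвы посетил выставку. Он похвалил её организаторов.",
    entities=[
        [(0, 10), (29, 31)],   # "Мэр Москвы" ... "Он"
        [(19, 27), (41, 43)],  # "выставку" ... "её"
    ],
)

print(doc.mentions())       # ['Мэр Москвы', 'Он', 'выставку', 'её']
print(corpus_stats([doc]))  # (8, 4)
```

Whitespace splitting is a crude word count because it keeps punctuation attached to tokens. Real corpus statistics would depend on the tokenizer actually used.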
