# AGRR-2019: A Corpus for Gapping Resolution in Russian

**Authors:** Maria Ponomareva, Kira Droganova, Ivan Smurov, Tatiana Shavrina

arXiv: 1906.04099 · 2019-06-11

## TL;DR

This paper introduces a Russian gapping dataset of 7.5k sentences with gapping plus 15k relevant negative sentences, drawn from news, fiction, social media, and technical texts, and prepared for the AGRR-2019 shared task to support machine learning approaches to ellipsis resolution.

## Contribution

It provides a large, diverse corpus for Russian gapping resolution and discusses methods developed within the AGRR-2019 shared task.

## Key findings

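- The dataset contains 7.5k sentences with gapping and 15k relevant negative sentences across several genres: news, fiction, social media, and technical texts.
- An alternative test set illustrates that the corpus is a diverse and representative subset of Russian gapping, sufficient for effective use of machine learning techniques.
- The paper reviews the gapping resolution methods introduced within the shared task.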

## Abstract

This paper provides a comprehensive overview of the gapping dataset for Russian that consists of 7.5k sentences with gapping (as well as 15k relevant negative sentences) and comprises data from various genres: news, fiction, social media and technical texts. The dataset was prepared for the Automatic Gapping Resolution Shared Task for Russian (AGRR-2019) - a competition aimed at stimulating the development of NLP tools and methods for processing of ellipsis. In this paper, we pay special attention to the gapping resolution methods that were introduced within the shared task as well as an alternative test set that illustrates that our corpus is a diverse and representative subset of Russian language gapping sufficient for effective utilization of machine learning techniques.
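The counts in the abstract imply a roughly 1:2 ratio of gapping to non-gapping sentences, which matters when the corpus is used for binary gapping detection. The sketch below works through that arithmetic. It assumes the rounded figures from the abstract (7.5k and 15k) rather than exact corpus sizes, and the `ClassBalance` helper is hypothetical, not part of any AGRR-2019 tooling.

```python
from dataclasses import dataclass

# Rounded counts from the abstract ("7.5k" and "15k").
# The exact corpus sizes may differ; these are assumptions for illustration.
APPROX_GAPPING = 7_500
APPROX_NEGATIVE = 15_000


@dataclass(frozen=True)
class ClassBalance:
    """Hypothetical helper describing a two-class corpus split."""

    positive: int  # sentences with gapping
    negative: int  # sentences without gapping

    @property
    def total(self) -> int:
        return self.positive + self.negative

    @property
    def positive_share(self) -> float:
        # Fraction of sentences that contain gapping.
        return self.positive / self.total

    @property
    def majority_baseline_accuracy(self) -> float:
        # Accuracy of a classifier that always predicts the larger class.
        return max(self.positive, self.negative) / self.total


balance = ClassBalance(APPROX_GAPPING, APPROX_NEGATIVE)
print(f"total sentences:   {balance.total}")
print(f"gapping share:     {balance.positive_share:.3f}")
print(f"majority baseline: {balance.majority_baseline_accuracy:.3f}")
```

Under these assumed counts, a detector that always answers "no gapping" would already be right about two thirds of the time, so accuracy alone says little; precision, recall, or F-measure on the gapping class is usually more informative for a split like this.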

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.04099/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1906.04099/full.md

---
Source: https://tomesphere.com/paper/1906.04099