So What's the Plan? Mining Strategic Planning Documents

Ekaterina Artemova; Tatiana Batura; Anna Golenkovskaya; Vitaly Ivanin; Vladimir Ivanov; Veronika Sarkisyan; Ivan Smurov; Elena Tutubalina

arXiv:2007.00257·cs.CL·July 8, 2020

TL;DR

This paper introduces RuREBus, a large corpus of Russian strategic planning documents created through a semi-automated annotation process, enabling new language technology applications and insights for e-government research.

Contribution

It presents a novel corpus creation pipeline combining machine learning and manual correction for Russian strategic planning texts.

Findings

01 Successful creation of a large annotated corpus

02 Demonstrated pipeline for semi-automated text annotation

03 Potential for new insights in e-government research

Abstract

In this paper we present RuREBus, a corpus of Russian strategic planning documents. The project is motivated by both language technology and e-government perspectives: we develop not only new language resources and tools but also their applications to e-government research. We demonstrate a pipeline for creating a text corpus from scratch. First, the annotation schema is designed. Next, texts are marked up using a human-in-the-loop strategy, in which preliminary annotations are produced by a machine learning model and then manually corrected. The amount of annotated text is large enough to showcase what insights can be gained from RuREBus.
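
The human-in-the-loop step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Span` type, the `pre_annotate` and `apply_corrections` helpers, the toy lexicon standing in for the machine learning model, and the `OBJECT`/`ACTION` labels are all hypothetical and chosen only to show the flow from a model draft to a manually corrected annotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str   # hypothetical label, not the RuREBus schema


def pre_annotate(text, lexicon):
    """Stand-in for the machine learning model: tag every lexicon hit."""
    spans = []
    for phrase, label in lexicon.items():
        pos = text.find(phrase)
        while pos != -1:
            spans.append(Span(pos, pos + len(phrase), label))
            pos = text.find(phrase, pos + 1)
    return sorted(spans, key=lambda s: (s.start, s.end))


def apply_corrections(spans, rejected, added):
    """Human pass: drop rejected draft spans and add spans the model missed."""
    kept = {s for s in spans if s not in rejected}
    return sorted(kept | set(added), key=lambda s: (s.start, s.end))


text = "The program will build new schools and repair roads by 2025."
# Toy lexicon (assumption); "will" is a deliberate false positive.
lexicon = {"schools": "OBJECT", "roads": "OBJECT", "will": "OBJECT"}

draft = pre_annotate(text, lexicon)

# Simulated annotator decisions: reject the false positive, add a missed span.
rejected = {s for s in draft if text[s.start:s.end] == "will"}
build_at = text.index("build")
added = [Span(build_at, build_at + len("build"), "ACTION")]

final = apply_corrections(draft, rejected, added)
for s in final:
    print(s.label, repr(text[s.start:s.end]))
```

In a real setting the annotator's decisions would come from an annotation tool rather than being hard-coded, and the corrected spans could be fed back to retrain the model before the next batch.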


Code & Models

Repositories

dialogue-evaluation/RuREBus (official)
