The Russian Legislative Corpus

Denis Saveliev; Ruslan Kuchakov

arXiv:2406.04855·cs.CL·April 29, 2026

TL;DR

This paper introduces a comprehensive Russian legislative corpus spanning from 1991 to 2025, available in basic and detailed versions with extensive linguistic annotations.

Contribution

The paper provides a large-scale annotated corpus of Russian legislation for linguistic and legal research, with part-of-speech, morphological, and syntactic dependency annotations.

Findings

01

Contains 304,382 texts and over 194 million tokens.

02

The detailed version pairs each original text with its conversion to the Universal Dependencies CoNLL-U format.

03

The part-of-speech, morphological, and dependency annotations support NLP research on Russian legal language.

Abstract

We present a comprehensive corpus of Russian primary and secondary legislation adopted between 1991 and 2025, comprising 304,382 texts (194,425,905 tokens). The corpus is available in two versions: the basic version contains texts with simple metadata, while the detailed version includes both the original texts and their equivalents converted to the Universal Dependencies CoNLL-U format, annotated with parts of speech, morphological features, and syntactic dependencies.
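
To illustrate what the CoNLL-U annotations in the detailed version look like in practice, here is a minimal sketch of a reader for the format, using only the Python standard library. The ten column names follow the Universal Dependencies CoNLL-U specification. The sample sentence and its annotations are hand-written for illustration and are not taken from the corpus, and the helper names `parse_conllu` and `parse_feats` are hypothetical, not part of the RusLawOD repository.

```python
from collections import Counter
from typing import Dict, Iterator, List

# The ten tab-separated columns defined by the CoNLL-U format.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence of a CoNLL-U string as a list of token dicts."""
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." are skipped.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        token = dict(zip(FIELDS, cols))
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if token["id"].isdigit():
            sentence.append(token)
    if sentence:
        yield sentence


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Hand-written illustrative sentence (assumption: NOT copied from the corpus).
SAMPLE = (
    "# text = Закон вступает в силу.\n"
    "1\tЗакон\tзакон\tNOUN\t_\tAnimacy=Inan|Case=Nom|Gender=Masc|Number=Sing\t2\tnsubj\t_\t_\n"
    "2\tвступает\tвступать\tVERB\t_\tAspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act\t0\troot\t_\t_\n"
    "3\tв\tв\tADP\t_\t_\t4\tcase\t_\t_\n"
    "4\tсилу\tсила\tNOUN\t_\tAnimacy=Inan|Case=Acc|Gender=Fem|Number=Sing\t2\tobl\t_\tSpaceAfter=No\n"
    "5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

sentences = list(parse_conllu(SAMPLE))
upos_counts = Counter(tok["upos"] for sent in sentences for tok in sent)
# In CoNLL-U the syntactic root is the token whose head is "0".
root = next(tok for tok in sentences[0] if tok["head"] == "0")

print(upos_counts)
print(root["form"], parse_feats(root["feats"]))
```

The same loop could be pointed at a file from the detailed version by reading its contents into a string first. For large-scale processing a dedicated CoNLL-U library would likely be a better fit than this hand-rolled reader.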

Code & Models

Repositories

irlcode/RusLawOD
github

Datasets

irlspbru/RusLawOD
dataset · 831 downloads