The Russian Legislative Corpus
Denis Saveliev, Ruslan Kuchakov

TL;DR
This paper introduces a corpus of Russian primary and secondary legislation adopted between 1991 and 2025. It is released in a basic version (texts with simple metadata) and a detailed version that adds part-of-speech, morphological, and syntactic dependency annotations.
Contribution
It provides a large-scale corpus of Russian legislation whose detailed version is annotated with parts of speech, morphological features, and syntactic dependencies, supporting both linguistic and legal research.
Findings
Contains 304,382 texts and over 194 million tokens.
The detailed version includes each text in its original form and in the Universal Dependencies CoNLL-U format.
Supports NLP research on Russian legal language.
Abstract
We present a comprehensive corpus of Russian primary and secondary legislation adopted between 1991 and 2025, comprising 304,382 texts (194,425,905 tokens). The corpus is available in two versions: the basic version contains texts with simple metadata, while the detailed version includes both the original texts and their equivalents converted to the Universal Dependencies CoNLL-U format, annotated with parts of speech, morphological features, and syntactic dependencies.
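To illustrate what the CoNLL-U annotation layer looks like in practice, the sketch below reads a CoNLL-U string and tallies part-of-speech tags using only the Python standard library. It relies on the standard ten-column CoNLL-U layout; the sample sentence is hypothetical and not taken from the corpus, and the helper names (`parse_feats`, `parse_conllu`) are illustrative. How the corpus files are actually named and distributed is not stated here, so loading from disk is left out.

```python
from collections import Counter

# Standard CoNLL-U column names, in order (10 tab-separated fields per token).
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Hypothetical sample sentence ("The law enters into force."),
# written by hand for illustration; NOT taken from the corpus.
SAMPLE = (
    "# text = Закон вступает в силу.\n"
    "1\tЗакон\tзакон\tNOUN\t_\tAnimacy=Inan|Case=Nom|Gender=Masc|Number=Sing\t2\tnsubj\t_\t_\n"
    "2\tвступает\tвступать\tVERB\t_\tAspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres\t0\troot\t_\t_\n"
    "3\tв\tв\tADP\t_\t_\t4\tcase\t_\t_\n"
    "4\tсилу\tсила\tNOUN\t_\tAnimacy=Inan|Case=Acc|Gender=Fem|Number=Sing\t2\tobl\t_\tSpaceAfter=No\n"
    "5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)


def parse_feats(raw):
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if raw == "_":
        return {}
    return dict(pair.split("=", 1) for pair in raw.split("|"))


def parse_conllu(text):
    """Parse CoNLL-U text into a list of sentences (lists of token dicts)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comment or metadata line; skipped in this sketch.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        token = dict(zip(FIELDS, cols))
        # Skip multiword-token ranges (e.g. '1-2') and empty nodes (e.g. '1.1').
        if not token["id"].isdigit():
            continue
        token["id"] = int(token["id"])
        token["head"] = int(token["head"]) if token["head"].isdigit() else None
        token["feats"] = parse_feats(token["feats"])
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


sentences = parse_conllu(SAMPLE)
upos_counts = Counter(tok["upos"] for sent in sentences for tok in sent)
root = next(tok for tok in sentences[0] if tok["deprel"] == "root")
print(upos_counts)
print(root["form"], root["feats"])
```

For a corpus of this size, the same function could be applied file by file rather than to one in-memory string; a dedicated CoNLL-U library would also be a reasonable choice in place of this hand-rolled parser.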
