Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches

Daria Blinova; Gayathri Emuru; Rakesh Emuru; Kushagradheer Shridheer Srivastava; Mina Rulis; Sunita Chandrasekaran; Benjamin E. Bagozzi

arXiv:2605.15886 · cs.CL · May 18, 2026

TL;DR

This paper presents a multimodal dataset of Russian government speeches that links Russian- and English-language texts, images, and metadata, enabling systematic analysis of authoritarian political communication.

Contribution

The paper introduces a novel, linked multimodal dataset of Russian government speeches with validated topical annotations, supporting both social science and LLM research.

Findings

1. The dataset spans multiple decades of speeches, with Russian- and English-language texts and associated images.
2. Linked data enables multimodal, multilingual, temporal, and spatial analysis.
3. The annotated data supports research on authoritarian politics and LLM applications.

Abstract

This paper introduces a dataset of interlinked multimodal political communications from the Russian government, addressing persistent deficiencies in the availability of social text- and image-based data for authoritarian politics contexts. The dataset comprises two large corpora of official speeches delivered by senior actors within the Kremlin and the Russian Ministry of Foreign Affairs over multiple decades. For each speech, we provide Russian- and English-language texts, associated images and captions where available, and harmonized metadata such as dates, speakers, (geo)locations, and official government content tags. Unique identifiers link images to speeches and align the Russian and English versions of the same communication. We further augment these linked datasets with validated topical annotations for both speech texts and speech images, which are generated via…
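The abstract describes records joined by unique identifiers: images point to a speech, and Russian and English versions share an identifier. The sketch below (Python, standard library only) shows one way such links could be resolved into a single record per speech. It assumes a hypothetical flat schema; the field names `speech_id`, `language`, `image_id`, and `caption`, the language codes `"ru"`/`"en"`, and the helper `link` are illustrative and are not the dataset's published schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Speech:
    speech_id: str  # hypothetical identifier shared by the RU and EN versions
    language: str   # hypothetical language code: "ru" or "en"
    date: str       # ISO date string, e.g. "2020-01-15"
    speaker: str
    text: str


@dataclass
class Image:
    image_id: str
    speech_id: str  # hypothetical link back to the speech the image belongs to
    caption: str = ""


@dataclass
class LinkedSpeech:
    speech_id: str
    ru: Optional[Speech] = None
    en: Optional[Speech] = None
    images: List[Image] = field(default_factory=list)

    @property
    def is_bilingual(self) -> bool:
        # True only when both language versions were found for this identifier
        return self.ru is not None and self.en is not None


def link(speeches: Iterable[Speech],
         images: Iterable[Image]) -> Tuple[Dict[str, LinkedSpeech], List[Image]]:
    """Group language versions and images under their shared speech identifier."""
    linked: Dict[str, LinkedSpeech] = {}
    for s in speeches:
        rec = linked.setdefault(s.speech_id, LinkedSpeech(s.speech_id))
        if s.language == "ru":
            rec.ru = s
        elif s.language == "en":
            rec.en = s
        else:
            raise ValueError(f"unexpected language code: {s.language!r}")
    orphans: List[Image] = []
    for img in images:
        rec = linked.get(img.speech_id)
        if rec is None:
            orphans.append(img)  # image whose speech_id matches no known speech
        else:
            rec.images.append(img)
    return linked, orphans


# Toy placeholder records for illustration only (not real dataset content).
speeches = [
    Speech("s1", "ru", "2020-01-15", "Speaker A", "<Russian text>"),
    Speech("s1", "en", "2020-01-15", "Speaker A", "<English text>"),
    Speech("s2", "ru", "2020-02-01", "Speaker B", "<Russian text>"),
]
images = [Image("i1", "s1", "<caption>"), Image("i2", "s9")]

linked, orphans = link(speeches, images)
print(sorted(k for k, v in linked.items() if v.is_bilingual))  # ['s1']
print(len(linked["s1"].images), [i.image_id for i in orphans])  # 1 ['i2']
```

Returning unmatched images separately, rather than silently dropping them, makes linkage gaps visible. This is useful when checking how complete the image-to-speech links are before running multimodal analyses.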
