RuBQ: A Russian Dataset for Question Answering over Wikidata
Vladislav Korablinov, Pavel Braslavski

TL;DR
RuBQ is the first Russian KBQA dataset: 1,500 questions with English machine translations, SPARQL queries to Wikidata, and reference answers, intended to support research on Russian question answering over Wikidata.
Contribution
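The paper introduces RuBQ, a high-quality Russian KBQA dataset that pairs each question with an English machine translation, a SPARQL query to Wikidata, reference answers, and linked entities. It also provides a Wikidata sample of triples containing entities with Russian labels.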
Findings
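The dataset contains 1,500 Russian questions of varying complexity.
Construction combined automatic filtering, crowd-assisted entity linking, automatic generation of SPARQL queries, and in-house verification of those queries.
The dataset is intended to support development of Russian KBQA systems.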
Abstract
The paper presents RuBQ, the first Russian knowledge base question answering (KBQA) dataset. The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, as well as a Wikidata sample of triples containing entities with Russian labels. The dataset creation started with a large collection of question-answer pairs from online quizzes. The data underwent automatic filtering, crowd-assisted entity linking, automatic generation of SPARQL queries, and their subsequent in-house verification.
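To make the abstract's description concrete, here is a minimal Python sketch of what one dataset entry and its SPARQL query might look like. It is an illustration under stated assumptions, not the authors' schema or generation code: the `KBQAExample` class, its field names, the `build_one_hop_query` helper, and the sample question are all hypothetical. The only outside facts used are the Wikidata identifiers Q159 (Russia), P36 (capital), and Q649 (Moscow), and the `wd:` / `wdt:` prefixes, which the Wikidata Query Service predefines.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Wikidata identifiers: items look like "Q159", properties like "P36".
_ITEM_ID = re.compile(r"Q[1-9]\d*")
_PROPERTY_ID = re.compile(r"P[1-9]\d*")


def build_one_hop_query(subject_id: str, property_id: str) -> str:
    """Build a single-triple SPARQL query (hypothetical helper, not from the paper)."""
    if not _ITEM_ID.fullmatch(subject_id):
        raise ValueError(f"not a Wikidata item id: {subject_id!r}")
    if not _PROPERTY_ID.fullmatch(property_id):
        raise ValueError(f"not a Wikidata property id: {property_id!r}")
    # wd: and wdt: are assumed to be predefined by the query endpoint.
    return f"SELECT ?answer WHERE {{ wd:{subject_id} wdt:{property_id} ?answer }}"


@dataclass
class KBQAExample:
    """One question with the parts the abstract lists (field names are assumptions)."""
    question_ru: str          # original Russian question
    question_en: str          # English translation
    sparql: str               # query against Wikidata
    answers: List[str] = field(default_factory=list)  # reference answers as item ids


# Illustrative entry written for this sketch; it is not taken from RuBQ.
example = KBQAExample(
    question_ru="Какой город является столицей России?",
    question_en="Which city is the capital of Russia?",
    sparql=build_one_hop_query("Q159", "P36"),
    answers=["Q649"],
)

print(example.sparql)
```

Questions of higher complexity would need more than one triple pattern, which this single-hop helper does not cover.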
