A Russian Jeopardy! Data Set for Question-Answering Systems
Elena Mikhalkova

TL;DR
This paper introduces a large Russian question-answering dataset derived from quiz shows, facilitating research in NLP tasks such as fact extraction and semantic search, especially for Russian-language applications.
Contribution
It provides a novel, sizable Russian QA dataset from quiz shows, enabling new research opportunities in NLP and question-answering systems for Russian.
Findings
The dataset contains 379,284 questions, including 29,375 from "Own Game", the Russian analogue of Jeopardy!
Analysis of linguistic features relevant to QA tasks
Potential for QA competitions based on the dataset
Abstract
Question answering (QA) is one of the most common NLP tasks and is related to named entity recognition, fact extraction, semantic search and several other fields. In industry, it is widely used in chatbots and corporate information systems. It is also a challenging task that has drawn the attention of a broad general audience through the quiz show Jeopardy! In this article we describe a Jeopardy!-like Russian QA data set collected from the official Russian quiz database Chgk (che ge ka). The data set includes 379,284 quiz-like questions, of which 29,375 come from "Own Game", the Russian analogue of Jeopardy! We examine its linguistic features and the associated QA task. We conclude with a discussion of the prospects for a QA competition based on the data set collected from this database.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
