A Russian Jeopardy! Data Set for Question-Answering Systems

Elena Mikhalkova

arXiv:2112.02325·cs.CL·October 8, 2024

TL;DR

This paper introduces a large Russian question-answering dataset derived from quiz shows, facilitating research in NLP tasks like fact extraction and semantic search, especially for Russian language applications.

Contribution

It provides a novel, sizable Russian QA dataset from quiz shows, enabling new research opportunities in NLP and question-answering systems for Russian.

Findings

01

The dataset contains 379,284 questions, including 29,375 from 'Own Game'

02

Analysis of linguistic features relevant to QA tasks

03

Potential for QA competitions based on the dataset

Abstract

Question answering (QA) is one of the most common NLP tasks and is related to named entity recognition, fact extraction, semantic search, and other fields. In industry, it is valued in chatbots and corporate information systems. It is also a challenging task that drew the attention of a broad audience through the quiz show Jeopardy! In this article, we describe a Jeopardy!-like Russian QA data set collected from the official Russian quiz database Chgk (che ge ka). The data set includes 379,284 quiz-like questions, 29,375 of which come from the Russian analogue of Jeopardy!, "Own Game". We examine its linguistic features and the related QA task, and we discuss the prospects for a QA competition based on the data set collected from this database.
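
The two counts in the abstract imply that "Own Game" questions make up a little under 8% of the collection. The Python sketch below checks that arithmetic and illustrates one plausible way to score free-text answers in a Jeopardy!-style QA task: normalized exact match. It is not taken from the paper or its repository. The function names and the sample answer strings are hypothetical, and exact match is an assumed metric, not one the abstract specifies.

```python
import re

# Counts as stated in the abstract.
TOTAL_QUESTIONS = 379_284
OWN_GAME_QUESTIONS = 29_375


def own_game_share(own_game: int, total: int) -> float:
    """Fraction of the data set that comes from "Own Game"."""
    return own_game / total


def normalize(answer: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace.

    Python's `\\w` is Unicode-aware for str patterns, so Cyrillic
    letters are preserved.
    """
    cleaned = re.sub(r"[^\w\s]", " ", answer.lower())
    return " ".join(cleaned.split())


def exact_match(predicted: str, gold: str) -> bool:
    """Hypothetical scoring rule: answers match after normalization."""
    return normalize(predicted) == normalize(gold)


share = own_game_share(OWN_GAME_QUESTIONS, TOTAL_QUESTIONS)
print(f"Own Game share: {share:.2%}")  # prints "Own Game share: 7.74%"

# Toy answer strings, invented for illustration only.
print(exact_match("Пушкин.", "  пушкин "))  # True
print(exact_match("Пушкин", "Лермонтов"))   # False
```

Exact match is deliberately strict. Quiz answers often admit several accepted variants, so a real evaluation would likely need a list of acceptable answers per question or a softer similarity measure.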

Code & Models

Repositories

evrog/russian-qa-jeopardy (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques