DaNetQA: a yes/no Question Answering Dataset for the Russian Language

Taisia Glushkova; Alexey Machnev; Alena Fenogenova; Tatiana Shavrina; Ekaterina Artemova; Dmitry I. Ignatov

arXiv:2010.02605·cs.CL·October 4, 2023

TL;DR

DaNetQA is a Russian yes/no question-answering dataset built from Wikipedia paragraphs. The paper also studies transfer learning across tasks and languages as a way to improve model performance on it.

Contribution

The paper introduces DaNetQA, a reproducible dataset for Russian yes/no QA, and explores transfer learning techniques for task and language transfer.

Findings

01 Transfer learning improves QA performance.

02 Multilingual fine-tuning enhances cross-language transfer.

03 Task transfer from related tasks boosts accuracy.

Abstract

DaNetQA, a new question-answering corpus, follows the design of Clark et al. (2019): it comprises natural yes/no questions. Each question is paired with a paragraph from Wikipedia and an answer derived from the paragraph. The task is to take both the question and the paragraph as input and produce a yes/no answer, i.e. a binary output. In this paper, we present a reproducible approach to DaNetQA creation and investigate transfer learning methods for task and language transfer. For task transfer we leverage three similar sentence modelling tasks: 1) a corpus of paraphrases, Paraphraser, 2) an NLI task, for which we use the Russian part of XNLI, 3) another question answering task, SberQUAD. For language transfer we use English to Russian translation together with multilingual language fine-tuning.
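
The task format described in the abstract (question plus paragraph in, binary answer out) can be sketched as below. This is a minimal illustration, not the authors' code: the `YesNoExample` class, its field names, and the helper functions are hypothetical, and the majority-class predictor is only a reference point that any trained model should beat, not a method from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class YesNoExample:
    """One instance of a yes/no QA task (field names are assumptions)."""
    question: str  # natural yes/no question
    passage: str   # Wikipedia paragraph paired with the question
    label: bool    # True = "yes", False = "no"


# A predictor maps (question, passage) to a binary answer.
Predictor = Callable[[str, str], bool]


def majority_baseline(train: List[YesNoExample]) -> Predictor:
    """Build a predictor that ignores its input and always returns
    the most frequent label seen in the training examples."""
    most_common_label = Counter(ex.label for ex in train).most_common(1)[0][0]
    return lambda question, passage: most_common_label


def accuracy(predict: Predictor, data: List[YesNoExample]) -> float:
    """Fraction of examples whose predicted answer matches the label."""
    correct = sum(predict(ex.question, ex.passage) == ex.label for ex in data)
    return correct / len(data)
```

Any real model, for instance a fine-tuned multilingual encoder, would plug in as a `Predictor` with the same two-string signature, so `accuracy` can compare it directly against the majority baseline.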
