QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers
Aleksandr Perevalov, Dennis Diefenbach, Ricardo Usbeck, Andreas Both

TL;DR
QALD-9-plus extends the QALD-9 KGQA benchmark with high-quality, native-speaker translations of its questions into 8 languages, including endangered ones, and transfers its SPARQL queries from DBpedia to Wikidata to improve accessibility and usability.
Contribution
The paper introduces QALD-9-plus, an extension of the QALD-9 KGQA benchmark that adds native-speaker translations of the questions into 8 languages and transfers the SPARQL queries from DBpedia to Wikidata, addressing the lack of multilingual KGQA benchmarks.
Findings
Includes translations in 8 languages, 5 of which are new to KGQA research.
Transfers the SPARQL queries of QALD-9 from DBpedia to Wikidata, increasing the dataset's usability and relevance.
Provides high-quality translations written by native speakers, improving accessibility.
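To make the two extensions concrete, the following is a minimal sketch of what a single benchmark entry could look like once it carries questions in several languages and one SPARQL query per knowledge graph. The `BenchmarkEntry` class, its field names, the `question_in` helper, and the example question are all assumptions made for illustration; they are not the dataset's actual schema or content.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkEntry:
    """Hypothetical record layout (an assumption, not the dataset's real schema)."""
    questions: dict[str, str]  # language code -> question text
    sparql: dict[str, str]     # knowledge graph name -> SPARQL query


# Illustrative entry written for this sketch; it is not taken from the dataset.
entry = BenchmarkEntry(
    questions={
        "en": "What is the capital of Germany?",
        "uk": "Яка столиця Німеччини?",
    },
    sparql={
        # DBpedia identifies resources and properties by readable names.
        "dbpedia": (
            "PREFIX dbo: <http://dbpedia.org/ontology/> "
            "PREFIX dbr: <http://dbpedia.org/resource/> "
            "SELECT ?capital WHERE { dbr:Germany dbo:capital ?capital }"
        ),
        # Wikidata uses opaque identifiers: Q183 is Germany, P36 is "capital".
        "wikidata": (
            "PREFIX wd: <http://www.wikidata.org/entity/> "
            "PREFIX wdt: <http://www.wikidata.org/prop/direct/> "
            "SELECT ?capital WHERE { wd:Q183 wdt:P36 ?capital }"
        ),
    },
)


def question_in(entry: BenchmarkEntry, lang: str, fallback: str = "en") -> str:
    """Return the question in `lang`, or in `fallback` if no translation exists."""
    return entry.questions.get(lang, entry.questions[fallback])


print(question_in(entry, "uk"))
print(entry.sparql["wikidata"])
```

The sketch shows why the transfer is more than a prefix swap: the same question needs a differently shaped identifier for both the entity and the property in each knowledge graph.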
Abstract
The ability to have the same experience for different user groups (i.e., accessibility) is one of the most important characteristics of Web-based systems. The same is true for Knowledge Graph Question Answering (KGQA) systems that provide access to Semantic Web data via a natural language interface. While following our research agenda on the multilingual aspect of accessibility of KGQA systems, we identified several ongoing challenges. One of them is the lack of multilingual KGQA benchmarks. In this work, we extend one of the most popular KGQA benchmarks, QALD-9, by introducing high-quality translations of the questions into 8 languages provided by native speakers, and by transferring the SPARQL queries of QALD-9 from DBpedia to Wikidata, such that the usability and relevance of the dataset are strongly increased. Five of the languages - Armenian, Ukrainian, Lithuanian, Bashkir and Belarusian - to…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
