WDV: A Broad Data Verbalisation Dataset Built from Wikidata

Gabriel Amaral; Odinaldo Rodrigues; Elena Simperl

arXiv:2205.02627 · cs.CL · May 6, 2022


TL;DR

This paper introduces WDV, a comprehensive dataset for verbalising Wikidata's knowledge graph triples into human-readable text, addressing existing gaps and supporting NLP research.

Contribution

We created WDV, a large and diverse Wikidata claim verbalisation dataset with tight, high-quality coupling between triples and text, together with an evaluation workflow for fluency and adequacy.

Findings

1. WDV covers a wide range of entities and predicates.

2. The dataset receives high quality scores in human-centred evaluation.

3. Open data and tools support further KG verbalisation research.

Abstract

Data verbalisation is a task of great importance in the current field of natural language processing, as there is great benefit in the transformation of our abundant structured and semi-structured data into human-readable formats. Verbalising Knowledge Graph (KG) data focuses on converting interconnected triple-based claims, formed of subject, predicate, and object, into text. Although KG verbalisation datasets exist for some KGs, there are still gaps in their fitness for use in many scenarios. This is especially true for Wikidata, where available datasets either loosely couple claim sets with textual information or heavily focus on predicates around biographies, cities, and countries. To address these gaps, we propose WDV, a large KG claim verbalisation dataset built from Wikidata, with a tight coupling between triples and text, covering a wide variety of entities and predicates. We…
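To make the triple structure concrete, here is a minimal Python sketch of a Wikidata claim as a (subject, predicate, object) triple paired with a text verbalisation. The `Claim` and `AlignedExample` classes and the `verbalise` template function are hypothetical illustrations, not WDV's actual schema or the authors' verbalisation method. The claim uses human-readable labels rather than Wikidata IDs for readability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One KG claim as a triple of human-readable labels (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class AlignedExample:
    """A claim tightly coupled with one piece of text describing it."""
    claim: Claim
    text: str


def verbalise(claim: Claim) -> str:
    """Naive template baseline: concatenate the three labels into one sentence."""
    return f"{claim.subject} {claim.predicate} {claim.obj}."


if __name__ == "__main__":
    claim = Claim("Douglas Adams", "educated at", "St John's College")
    example = AlignedExample(claim, verbalise(claim))
    print(example.text)  # Douglas Adams educated at St John's College.
```

The template output drops the auxiliary verb ("was educated at"), which shows why simply concatenating labels reads poorly and why a dataset pairing triples with fluent, adequate human-quality text is useful for training and evaluating verbalisation models.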


Code & Models

Repositories

gabrielmaia7/wdv
PyTorch · Official


Taxonomy

Topics: Advanced Graph Neural Networks · Topic Modeling · Semantic Web and Ontologies