Ontology-Free General-Domain Knowledge Graph-to-Text Generation Dataset Synthesis using Large Language Model

Daehee Kim; Deokhyung Kang; Sangwon Ryu; Gary Geunbae Lee

arXiv:2409.07088 · cs.CL · September 12, 2024

TL;DR

This paper introduces WikiOFGraph, a large-scale, high-quality dataset for knowledge graph-to-text (G2T) generation synthesized with a large language model. Pretrained language models fine-tuned on it perform better on this task than those trained on other datasets.

Contribution

The paper presents WikiOFGraph, a new large-scale dataset for knowledge graph-to-text (G2T) generation that is synthesized without relying on external ontologies, and shows that it improves the performance of pretrained language models (PLMs) fine-tuned on it.
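To make the idea of a graph-text pair concrete, here is a minimal Python sketch of how such a pair might be represented and flattened into a single input string for a PLM. Everything in it is an assumption for illustration: the `Triple` and `GraphTextPair` classes, the `linearize` helper, the `<H>`/`<R>`/`<T>` marker tokens, and the example facts are hypothetical and not taken from WikiOFGraph's actual file format. "Ontology-free" is reflected here only in the sense that relations are plain strings rather than IDs from a fixed schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triple:
    """One (head, relation, tail) edge; the relation is free text, not an ontology ID."""
    head: str
    relation: str
    tail: str


@dataclass
class GraphTextPair:
    """A knowledge (sub)graph paired with the text that verbalizes it (hypothetical schema)."""
    triples: list[Triple] = field(default_factory=list)
    text: str = ""


def linearize(triples: list[Triple]) -> str:
    """Flatten triples into a single PLM input string.

    The <H>/<R>/<T> marker tokens are an assumed convention for this sketch,
    not necessarily the format used by WikiOFGraph.
    """
    return " ".join(f"<H> {t.head} <R> {t.relation} <T> {t.tail}" for t in triples)


if __name__ == "__main__":
    # Hypothetical example pair, written by hand for illustration.
    pair = GraphTextPair(
        triples=[
            Triple("Marie Curie", "field of work", "physics"),
            Triple("Marie Curie", "award received", "Nobel Prize in Physics"),
        ],
        text="Marie Curie was a physicist who received the Nobel Prize in Physics.",
    )
    # The linearized graph is the model's source string; pair.text is the target.
    print(linearize(pair.triples))
```

In a G2T fine-tuning setup, the linearized string would serve as the encoder input and `text` as the reference output.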

Findings

01. PLMs fine-tuned on WikiOFGraph outperform models trained on other datasets.

02. The dataset contains 5.85 million graph-text pairs.

03. The LLM-based synthesis method is scalable and produces G2T data with high graph-text consistency.

Abstract

Knowledge Graph-to-Text (G2T) generation involves verbalizing structured knowledge graphs into natural language text. Recent advancements in Pretrained Language Models (PLMs) have improved G2T performance, but their effectiveness depends on datasets with precise graph-text alignment. However, the scarcity of high-quality, general-domain G2T generation datasets restricts progress in general-domain G2T generation research. To address this issue, we introduce the Wikipedia Ontology-Free Graph-text dataset (WikiOFGraph), a new large-scale G2T dataset generated using a novel method that leverages a Large Language Model (LLM) and Data-QuestEval. Our new dataset, which contains 5.85M general-domain graph-text pairs, offers high graph-text consistency without relying on external ontologies. Experimental results demonstrate that PLMs fine-tuned on WikiOFGraph outperform those trained on other…
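The abstract names Data-QuestEval as part of the synthesis method but does not spell out its exact role. Assuming it acts as a graph-text consistency check on LLM-generated pairs, the generic shape of a score-and-filter step can be sketched as follows. This is not the authors' pipeline and does not reimplement Data-QuestEval: `toy_overlap_score` is a deliberately simple stand-in (word overlap between graph and text), and `filter_pairs`, the `<H>`/`<R>`/`<T>` markers, and the 0.5 threshold are all hypothetical.

```python
import re
from typing import Callable

# A pair is (linearized graph, text); graphs use assumed <H>/<R>/<T> markers.
Pair = tuple[str, str]
MARKER_RE = re.compile(r"<[HRT]>")


def toy_overlap_score(graph: str, text: str) -> float:
    """Stand-in consistency score: share of graph word tokens that occur in the text.

    This is NOT Data-QuestEval; it only illustrates the interface of a
    graph-text consistency scorer that returns a value in [0, 1].
    """
    graph_tokens = re.findall(r"\w+", MARKER_RE.sub(" ", graph).lower())
    if not graph_tokens:
        return 0.0
    text_tokens = set(re.findall(r"\w+", text.lower()))
    hits = sum(1 for tok in graph_tokens if tok in text_tokens)
    return hits / len(graph_tokens)


def filter_pairs(
    pairs: list[Pair],
    score_fn: Callable[[str, str], float],
    threshold: float,
) -> list[Pair]:
    """Keep only pairs whose consistency score reaches the (hypothetical) threshold."""
    return [(g, t) for g, t in pairs if score_fn(g, t) >= threshold]


if __name__ == "__main__":
    # Hand-written candidates: the first is consistent, the second is not.
    candidates = [
        ("<H> Marie Curie <R> field <T> physics", "Marie Curie worked in physics."),
        ("<H> Paris <R> capital of <T> France", "The Eiffel Tower is tall."),
    ]
    kept = filter_pairs(candidates, toy_overlap_score, threshold=0.5)
    print(len(kept))  # only the first candidate passes
```

Passing the scorer as a callable keeps the filtering logic independent of the metric, so a real consistency metric could be swapped in without changing `filter_pairs`.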

Code & Models

Repositories

daehuikim/WikiOFGraph (official)

Taxonomy

Topics: Topic Modeling · Semantic Web and Ontologies