Construction of Paired Knowledge Graph-Text Datasets Informed by Cyclic Evaluation
Ali Mousavi, Xin Zhan, He Bai, Peng Shi, Theo Rekatsinas, Benjamin Han, Yunyao Li, Jeff Pound, Josh Susskind, Natalie Schluter, Ihab Ilyas, Navdeep Jaitly

TL;DR
This paper investigates how dataset quality affects KG-text model hallucinations, introduces cyclic evaluation as a proxy for data equivalence, and constructs improved datasets to enhance model performance.
Contribution
It introduces cyclic evaluation to measure KG-text dataset quality and creates a new dataset, LAGRANGE, with heuristics to improve data equivalence.
Findings
Noisier datasets increase hallucination in models.
Manually created datasets outperform automatic ones in cyclic evaluation.
Synthetic datasets improve text generation but not KG regeneration.
Abstract
Datasets that pair Knowledge Graphs (KG) and text together (KG-T) can be used to train forward and reverse neural models that generate text from KG and vice versa. However, models trained on datasets where KG and text pairs are not equivalent can suffer from more hallucination and poorer recall. In this paper, we verify this empirically by generating datasets with different levels of noise and find that noisier datasets do indeed lead to more hallucination. We argue that the ability of forward and reverse models trained on a dataset to cyclically regenerate the source KG or text is a proxy for the equivalence between the KG and the text in the dataset. Using cyclic evaluation, we find that the manually created WebNLG is much better than the automatically created TeKGen and T-REx. Guided by these observations, we construct a new, improved dataset called LAGRANGE using heuristics meant to improve…
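The KG side of cyclic evaluation can be illustrated with a short sketch. A forward model turns the source KG into text. A reverse model turns that text back into a KG. The regenerated triples are then scored against the originals. This summary does not give the paper's exact metrics, so a few choices here are assumptions:

* Triples are compared by exact match after lowercasing.
* Hallucination is approximated as the share of regenerated triples that are absent from the source.
* `toy_forward` and `toy_reverse` are hypothetical stand-ins for trained models.

The text side of the cycle (text to KG to text) would need a text-similarity metric instead, and it is not shown here.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A KG is modeled here as a list of (subject, predicate, object) string triples.
Triple = Tuple[str, str, str]


def normalize(triple: Triple) -> Triple:
    # Lowercase and trim each element so trivial surface differences still match.
    s, p, o = (part.strip().lower() for part in triple)
    return (s, p, o)


def triple_scores(source: Iterable[Triple], regenerated: Iterable[Triple]) -> Dict[str, float]:
    """Exact-match precision/recall/F1 between source and regenerated triple sets."""
    src = {normalize(t) for t in source}
    gen = {normalize(t) for t in regenerated}
    overlap = len(src & gen)
    precision = overlap / len(gen) if gen else 0.0
    recall = overlap / len(src) if src else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Assumed hallucination proxy: share of regenerated triples absent from the source.
    hallucination = 1.0 - precision if gen else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "hallucination": hallucination}


def cyclic_kg_score(
    kg: List[Triple],
    forward: Callable[[List[Triple]], str],
    reverse: Callable[[str], List[Triple]],
) -> Dict[str, float]:
    # KG -> text with the forward model, then text -> KG with the reverse model,
    # and compare the regenerated KG with the original one.
    return triple_scores(kg, reverse(forward(kg)))


# Hypothetical stand-ins for trained models, used only to exercise the metric.
def toy_forward(kg: List[Triple]) -> str:
    return " ".join(f"{s} | {p} | {o} ." for s, p, o in kg)


def toy_reverse(text: str) -> List[Triple]:
    triples: List[Triple] = []
    for sentence in text.split(" ."):
        parts = [part.strip() for part in sentence.split("|")]
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples


if __name__ == "__main__":
    kg = [("Alan Turing", "field", "computer science"), ("Alan Turing", "born in", "London")]
    print(cyclic_kg_score(kg, toy_forward, toy_reverse))
    # A lossy reverse model that recovers only the first triple lowers recall.
    print(cyclic_kg_score(kg, toy_forward, lambda t: toy_reverse(t)[:1]))
```

With the lossless toy pair, the round trip reproduces the KG exactly, giving an F1 of 1.0. The lossy reverse model keeps precision at 1.0 but cuts recall to 0.5. Under this proxy, a dataset whose KG and text are less equivalent would show up as lower cyclic scores.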
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks
