InvertiTune: High-Quality Data Synthesis for Cost-Effective Single-Shot Text-to-Knowledge Graph Generation
Faezeh Faez, Marzieh S. Tahaei, Yaochen Hu, Ali Pourranjbar, Mahdi Biparva, Mark Coates, Yingxue Zhang

TL;DR
InvertiTune is a data synthesis and fine-tuning framework for single-shot text-to-knowledge graph (Text2KG) generation. It builds realistic training datasets by generating text from knowledge-base subgraphs, and models fine-tuned on this data outperform larger non-fine-tuned LLMs and state-of-the-art Text2KG methods.
Contribution
The paper presents a framework that combines controlled data generation with supervised fine-tuning (SFT) to improve Text2KG performance and the realism of the training data.
Findings
Outperforms larger non-fine-tuned LLMs and state-of-the-art Text2KG methods.
Generates more realistic datasets reflecting real-world scenarios.
Demonstrates strong cross-dataset generalization.
Abstract
Large Language Models (LLMs) have revolutionized the ability to understand and generate text, enabling significant progress in automatic knowledge graph construction from text (Text2KG). Many Text2KG methods, however, rely on iterative LLM prompting, making them computationally expensive and prone to overlooking complex relations distributed throughout the text. To address these limitations, we propose InvertiTune, a framework that combines a controlled data generation pipeline with supervised fine-tuning (SFT). Within this framework, the data-generation pipeline systematically extracts subgraphs from large knowledge bases, applies noise filtering, and leverages LLMs to generate corresponding natural text descriptions, a task more aligned with LLM capabilities than direct KG generation from text. This pipeline enables generating datasets composed of longer texts paired with larger KGs…
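The data-generation direction described in the abstract (KG subgraph first, text second) can be illustrated with a minimal sketch. The abstract does not specify the sampling strategy, the noise-filtering rules, or the prompt wording, so everything below is an assumption made for illustration: the breadth-first sampler, the relation blocklist, the prompt template, and the names `extract_subgraph`, `filter_noise`, `build_prompt`, and `make_sft_example` are hypothetical and are not taken from the paper. The LLM call itself is left out; the sketch only builds the prompt and the resulting (text, KG) training pair.

```python
import random
from collections import defaultdict

Triple = tuple[str, str, str]  # (subject, relation, object)


def extract_subgraph(triples: list[Triple], seed_entity: str,
                     max_triples: int, rng: random.Random) -> list[Triple]:
    """Assumed sampler: breadth-first expansion from a seed entity."""
    by_subject = defaultdict(list)
    for triple in triples:
        by_subject[triple[0]].append(triple)

    selected: list[Triple] = []
    seen = {seed_entity}
    frontier = [seed_entity]
    while frontier and len(selected) < max_triples:
        entity = frontier.pop(0)
        candidates = list(by_subject[entity])
        rng.shuffle(candidates)  # randomize which outgoing edges are kept
        for triple in candidates:
            if len(selected) >= max_triples:
                break
            selected.append(triple)
            if triple[2] not in seen:
                seen.add(triple[2])
                frontier.append(triple[2])
    return selected


def filter_noise(triples: list[Triple], blocked_relations: set[str]) -> list[Triple]:
    """Assumed filter: drop blocklisted relations, empty fields, and duplicates."""
    kept: list[Triple] = []
    seen: set[Triple] = set()
    for s, r, o in triples:
        if r in blocked_relations or not (s.strip() and r.strip() and o.strip()):
            continue
        if (s, r, o) not in seen:
            seen.add((s, r, o))
            kept.append((s, r, o))
    return kept


def serialize(triples: list[Triple]) -> str:
    """One triple per line, in a simple parenthesized format."""
    return "\n".join(f"({s}, {r}, {o})" for s, r, o in triples)


def build_prompt(triples: list[Triple]) -> str:
    """Hypothetical prompt asking an LLM to verbalize the subgraph."""
    return ("Write a coherent passage that states every fact below "
            "and adds no other facts.\n\n" + serialize(triples))


def make_sft_example(generated_text: str, triples: list[Triple]) -> dict[str, str]:
    """Invert the pair: the generated text is the input, the KG is the target."""
    return {"input": generated_text, "target": serialize(triples)}
```

In this sketch the LLM is only asked to turn triples into text, and the fine-tuning example then swaps the roles so that a model learns to emit the whole graph from the text in one pass.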
Taxonomy
Topics: Advanced Graph Neural Networks · Topic Modeling · Multimodal Machine Learning Applications
