TL;DR
This paper introduces a scalable, modular pipeline for generating natural language descriptions from structured data, avoiding task-specific training and leveraging basic NLP tools for adaptability across domains.
Contribution
A novel pipeline-based approach that generates coherent paragraphs from structured data without requiring task-specific parallel data, enhancing scalability and domain adaptability.
Findings
Outperforms existing data-to-text systems on benchmark datasets.
Demonstrates robustness across diverse input types such as knowledge graphs and key-value maps.
Operates effectively without task-specific labeled data.
Abstract
We present a framework for generating natural language descriptions from structured data such as tables; the problem falls under the category of data-to-text natural language generation (NLG). Modern data-to-text NLG systems typically employ end-to-end statistical and neural architectures that learn from a limited amount of task-specific labeled data and therefore exhibit limited scalability, domain adaptability, and interpretability. Unlike these systems, ours is a modular, pipeline-based approach that does not require task-specific parallel data. Instead, it relies on monolingual corpora and basic off-the-shelf NLP tools. This makes our system more scalable and easily adaptable to newer domains. Our system employs a three-stage pipeline that: (i) converts entries in the structured data to canonical form, (ii) generates simple sentences for each atomic entry in the canonicalized…
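To make the first two stages concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the triple representation, the function names (`canonicalize`, `simple_sentence`, `describe`), and the fixed sentence template are all hypothetical, and the abstract says the real system relies on monolingual corpora and off-the-shelf NLP tools rather than a hand-written template. Only stages (i) and (ii) are sketched, because the description of the remaining stage is cut off above.

```python
from typing import Dict, List, Tuple

# Hypothetical canonical form: (subject, relation, value). The paper's actual
# canonical representation is not given in the text above.
Triple = Tuple[str, str, str]


def canonicalize(subject: str, record: Dict[str, str]) -> List[Triple]:
    """Stage (i), sketched: turn a key-value map into normalized triples."""
    triples: List[Triple] = []
    for key, value in record.items():
        # Normalize the key into a readable relation name (assumption).
        relation = key.strip().lower().replace("_", " ")
        triples.append((subject.strip(), relation, str(value).strip()))
    return triples


def simple_sentence(triple: Triple) -> str:
    """Stage (ii), sketched: one simple sentence per atomic entry.

    A fixed template stands in for the paper's sentence generator.
    """
    subject, relation, value = triple
    return f"The {relation} of {subject} is {value}."


def describe(subject: str, record: Dict[str, str]) -> List[str]:
    """Run the two sketched stages in order."""
    return [simple_sentence(t) for t in canonicalize(subject, record)]


sentences = describe("Albert Einstein", {"birth_place": "Ulm", "field": "physics"})
for sentence in sentences:
    print(sentence)
```

Keeping each stage as a separate function mirrors the modular design the abstract argues for: the template in `simple_sentence` could be swapped for a stronger generator without touching canonicalization.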
