DART: Open-Domain Structured Data Record to Text Generation
Linyong Nan, Dragomir Radev, Rui Zhang, Amrit Rau, Abhinand Sivaprasad, Chiachun Hsieh, Xiangru Tang, Aadit Vyas, Neha Verma, Pranav Krishna, Yangxiaokang Liu, Nadia Irwanto, Jessica Pan, Faiaz Rahman, Ahmad Zaidi, Mutethia Mutuma, Yasin Tarabar, Ankit Gupta, Tao Yu

TL;DR
DART is a large, open-domain dataset for generating text from structured data records. It is built with a novel extraction procedure that encodes table semantics, and it aims to improve out-of-domain generalization and to pose challenges that existing data-to-text datasets do not.
Contribution
The paper introduces DART, a new large-scale dataset for data-to-text generation with a novel extraction framework that captures table semantics and merges heterogeneous sources.
Findings
DART poses new challenges for data-to-text models.
New state-of-the-art results are reported on WebNLG 2017.
DART facilitates out-of-domain generalization.
Abstract
We present DART, an open domain structured DAta Record to Text generation dataset with over 82k instances (DARTs). Data-to-Text annotations can be a costly process, especially when dealing with tables which are the major source of structured data and contain nontrivial structures. To this end, we propose a procedure of extracting semantic triples from tables that encodes their structures by exploiting the semantic dependencies among table headers and the table title. Our dataset construction framework effectively merged heterogeneous sources from open domain semantic parsing and dialogue-act-based meaning representation tasks by utilizing techniques such as: tree ontology annotation, question-answer pair to declarative sentence conversion, and predicate unification, all with minimum post-editing. We present systematic evaluation on DART as well as new state-of-the-art results on WebNLG…
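The triple-extraction idea described in the abstract can be illustrated with a small sketch. It assumes a hypothetical parent-child mapping over column headers (standing in for the tree ontology annotation) and, as a simplification, uses the table title as the subject of root-level columns. The function name `row_to_triples`, the toy table, and this exact convention are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def row_to_triples(
    row: Dict[str, str],
    parent_of: Dict[str, Optional[str]],
    title: str,
) -> List[Triple]:
    """Turn one table row into (subject, predicate, object) triples.

    Assumed convention (hypothetical, for illustration only):
      - predicate = the column header
      - object    = the cell value in that column
      - subject   = the cell value of the header's parent column,
                    or the table title when the header has no parent
    """
    triples: List[Triple] = []
    for header, value in row.items():
        parent = parent_of.get(header)
        subject = title if parent is None else row[parent]
        triples.append((subject, header, value))
    return triples


# Toy table (made-up data): "Team" is the root header,
# "Coach" and "Wins" both depend on "Team".
example_title = "Example League"
example_row = {"Team": "Lions", "Coach": "A. Smith", "Wins": "10"}
example_ontology = {"Team": None, "Coach": "Team", "Wins": "Team"}

example_triples = row_to_triples(example_row, example_ontology, example_title)
for triple in example_triples:
    print(triple)
```

In this sketch the header hierarchy decides which cell becomes the subject of each triple, which is how the structure of the table ends up encoded in the triple set rather than being flattened away.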
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
