Neural Pipeline for Zero-Shot Data-to-Text Generation
Zdeněk Kasner, Ondřej Dušek

TL;DR
This paper introduces a pipeline for zero-shot data-to-text generation that avoids fine-tuning pretrained language models on task-specific datasets. Instead, it transforms single-item descriptions with a sequence of modules (ordering, aggregation, and paragraph compression) trained on general-domain text operations.
Contribution
The authors propose a pipeline of modules trained on general-domain text operations that performs data-to-text generation without dataset-specific fine-tuning.
Findings
Enables zero-shot data-to-text generation from RDF triples.
Evaluated on the WebNLG and E2E triple-to-text datasets, with no fine-tuning on either.
Trains the modules on WikiFluent, a synthetic corpus built from English Wikipedia.
Abstract
In data-to-text (D2T) generation, training on in-domain data leads to overfitting to the data representation and repeating training data noise. We examine how to avoid finetuning pretrained language models (PLMs) on D2T generation datasets while still taking advantage of surface realization capabilities of PLMs. Inspired by pipeline approaches, we propose to generate text by transforming single-item descriptions with a sequence of modules trained on general-domain text-based operations: ordering, aggregation, and paragraph compression. We train PLMs for performing these operations on a synthetic corpus WikiFluent which we build from English Wikipedia. Our experiments on two major triple-to-text datasets -- WebNLG and E2E -- show that our approach enables D2T generation from RDF triples in zero-shot settings.
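The module sequence named in the abstract can be pictured as a chain of text-to-text steps. The sketch below is only an illustration of that control flow, not the authors' implementation: the template table, the function names, and the trivial stand-in logic for each stage are all hypothetical, and in the paper each stage is a trained PLM rather than a hand-written function. It also assumes that single-item descriptions are produced by filling a simple per-predicate template.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical per-predicate templates (an assumption for illustration).
TEMPLATES: Dict[str, str] = {
    "birthPlace": "{s} was born in {o}.",
    "occupation": "{s} was {o}.",
}


def triples_to_facts(triples: List[Triple]) -> List[str]:
    """Turn each triple into a single-item description via its template."""
    return [TEMPLATES[p].format(s=s, o=o) for s, p, o in triples]


def order_facts(facts: List[str]) -> List[str]:
    """Stand-in for the ordering module: keeps the input order."""
    return list(facts)


def aggregate_facts(facts: List[str]) -> List[List[str]]:
    """Stand-in for the aggregation module: one fact per sentence group."""
    return [[fact] for fact in facts]


def compress_groups(groups: List[List[str]]) -> str:
    """Stand-in for paragraph compression: joins the groups unchanged."""
    return " ".join(" ".join(group) for group in groups)


def generate(
    triples: List[Triple],
    order: Callable[[List[str]], List[str]] = order_facts,
    aggregate: Callable[[List[str]], List[List[str]]] = aggregate_facts,
    compress: Callable[[List[List[str]]], str] = compress_groups,
) -> str:
    """Run the stages in sequence: facts -> ordering -> aggregation -> compression."""
    facts = triples_to_facts(triples)
    return compress(aggregate(order(facts)))


example = [
    ("Ada Lovelace", "birthPlace", "London"),
    ("Ada Lovelace", "occupation", "a mathematician"),
]
print(generate(example))
```

Because each stage is passed in as a callable, a trained model could replace any stand-in without changing `generate`; none of the stages needs to see the target dataset's training data, which is the point of the zero-shot setup.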
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
