Neural Pipeline for Zero-Shot Data-to-Text Generation

Zdeněk Kasner; Ondřej Dušek

arXiv:2203.16279 · cs.CL · March 31, 2022

TL;DR

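This paper proposes a pipeline for zero-shot data-to-text generation: single-item descriptions of the input data are transformed by a sequence of pretrained language model modules trained on general-domain text operations (ordering, aggregation, and paragraph compression), so no fine-tuning on data-to-text datasets is needed.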

Contribution

The authors propose a pipeline of three modules (ordering, aggregation, and paragraph compression) that are trained on general-domain text operations rather than on data-to-text datasets, allowing generation without dataset-specific fine-tuning.

Findings

01

Enables zero-shot data-to-text generation from RDF triples.

02

Evaluated on WebNLG and E2E, two major triple-to-text datasets, without fine-tuning on either.

03

Trains the modules on WikiFluent, a synthetic corpus built from English Wikipedia.

Abstract

In data-to-text (D2T) generation, training on in-domain data leads to overfitting to the data representation and repeating training data noise. We examine how to avoid finetuning pretrained language models (PLMs) on D2T generation datasets while still taking advantage of surface realization capabilities of PLMs. Inspired by pipeline approaches, we propose to generate text by transforming single-item descriptions with a sequence of modules trained on general-domain text-based operations: ordering, aggregation, and paragraph compression. We train PLMs for performing these operations on a synthetic corpus WikiFluent which we build from English Wikipedia. Our experiments on two major triple-to-text datasets -- WebNLG and E2E -- show that our approach enables D2T generation from RDF triples in zero-shot settings.
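The data flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the templates, the function names, and the rule-based logic inside each stage are hypothetical placeholders, whereas the paper trains PLMs on WikiFluent to perform ordering, aggregation, and paragraph compression. The sketch only shows how single-item descriptions pass through the three stages in sequence.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical hand-written templates, one per predicate (illustrative only).
TEMPLATES: Dict[str, str] = {
    "birthPlace": "{s} was born in {o}.",
    "occupation": "{s} works as a {o}.",
    "country": "{s} is located in {o}.",
}


def verbalize(triples: List[Triple]) -> List[str]:
    """Turn each triple into a single-item description (one fact per sentence)."""
    return [TEMPLATES[p].format(s=s, o=o) for s, p, o in triples]


def order(facts: List[str]) -> List[str]:
    """Placeholder for the ordering module: alphabetical sort.

    A trained model would predict the order instead.
    """
    return sorted(facts)


def aggregate(facts: List[str]) -> List[List[str]]:
    """Placeholder for the aggregation module: group consecutive facts in pairs.

    A trained model would decide which facts belong in the same sentence.
    """
    return [facts[i:i + 2] for i in range(0, len(facts), 2)]


def compress(groups: List[List[str]]) -> str:
    """Placeholder for paragraph compression: join each group with 'and'.

    A trained model would rewrite each group into one fluent sentence.
    """
    sentences = [" and ".join(f.rstrip(".") for f in g) + "." for g in groups]
    return " ".join(sentences)


def run_pipeline(triples: List[Triple]) -> str:
    """Apply the stages in sequence: verbalize, order, aggregate, compress."""
    return compress(aggregate(order(verbalize(triples))))


if __name__ == "__main__":
    example = [
        ("John Smith", "occupation", "pilot"),
        ("London", "country", "England"),
        ("John Smith", "birthPlace", "London"),
    ]
    print(run_pipeline(example))
```

Because each stage takes and returns plain text, any stage can be swapped for a trained model without changing the others, which is what allows the modules to be trained on general-domain text rather than on a specific D2T dataset.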

Code & Models

Repositories

kasnerz/zeroshot-d2t-pipeline (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications