Neural data-to-text generation: A comparison between pipeline and end-to-end architectures

Thiago Castro Ferreira; Chris van der Lee; Emiel van Miltenburg; Emiel Krahmer

arXiv:1908.09022·cs.CL·November 28, 2019

TL;DR

This paper compares neural pipeline and end-to-end approaches for generating text from RDF triples, finding that pipeline models produce higher-quality texts and generalize better to unseen inputs.

Contribution

The paper provides a systematic comparison of neural pipeline and end-to-end architectures, both implemented with encoder-decoder GRU and Transformer models, and evaluates them with automatic metrics, human judgments, and a qualitative analysis.

Findings

01

Pipeline models generate more accurate and coherent texts.

02

Pipeline approaches generalize better to unseen data.

03

End-to-end models generate lower-quality texts than their pipeline counterparts.

Abstract

Traditionally, most data-to-text applications have been designed using a modular pipeline architecture, in which non-linguistic input data is converted into natural language through several intermediate transformations. In contrast, recent neural models for data-to-text generation have been proposed as end-to-end approaches, where the non-linguistic input is rendered in natural language with much less explicit intermediate representations in-between. This study introduces a systematic comparison between neural pipeline and end-to-end data-to-text approaches for the generation of text from RDF triples. Both architectures were implemented making use of state-of-the-art deep learning methods such as the encoder-decoder Gated-Recurrent Units (GRU) and Transformer. Automatic and human evaluations together with a qualitative analysis suggest that having explicit intermediate steps in the…
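The contrast drawn in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the `<triple>` delimiter tokens, and the rule-based steps are all assumptions made for illustration, and in a neural pipeline each step would be a trained model rather than a hand-written rule. The point is only that an end-to-end model consumes one flat sequence, while a pipeline exposes inspectable intermediate representations.

```python
from typing import Dict, List, Tuple

# An RDF triple as (subject, predicate, object).
Triple = Tuple[str, str, str]


def linearize(triples: List[Triple]) -> List[str]:
    """Flatten triples into one token sequence, the kind of input an
    end-to-end encoder-decoder could consume directly.
    The <triple> delimiters are a hypothetical convention."""
    tokens: List[str] = []
    for subj, pred, obj in triples:
        tokens += ["<triple>", subj, pred, obj, "</triple>"]
    return tokens


def order_triples(triples: List[Triple]) -> List[Triple]:
    """Hypothetical ordering step: a toy rule (sort by predicate)
    standing in for a learned ordering model."""
    return sorted(triples, key=lambda t: t[1])


def group_into_sentences(
    triples: List[Triple], max_per_sentence: int = 2
) -> List[List[Triple]]:
    """Hypothetical structuring step: chunk the ordered triples into
    sentence-sized groups."""
    return [
        triples[i : i + max_per_sentence]
        for i in range(0, len(triples), max_per_sentence)
    ]


def run_pipeline(triples: List[Triple]) -> Dict[str, object]:
    """Run the toy steps and keep every intermediate representation."""
    ordered = order_triples(triples)
    plan = group_into_sentences(ordered)
    return {"ordered": ordered, "plan": plan}


# Made-up example input.
triples: List[Triple] = [
    ("Alan_Bean", "occupation", "Test_pilot"),
    ("Alan_Bean", "birthPlace", "Wheeler,_Texas"),
    ("Alan_Bean", "nationality", "United_States"),
]

end_to_end_input = linearize(triples)
pipeline_state = run_pipeline(triples)
```

Here `end_to_end_input` is a single flat list of tokens, whereas `pipeline_state` holds the ordered triples and the sentence plan separately, so each intermediate decision can be inspected or corrected before text is produced.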

Code & Models

Repositories

ThiagoCF05/webnlg
Official

Datasets

GEM/dart
dataset · 117 dl

Taxonomy

Methods

Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Adam · Dropout · Multi-Head Attention · Byte Pair Encoding · Dense Connections