High Recall Data-to-text Generation with Progressive Edit

Choonghan Kim; Gary Geunbae Lee

arXiv:2208.04558 · cs.CL · August 10, 2022

Open Access

TL;DR

This paper introduces ProEdit, a progressive editing method that leverages asymmetric sentence generation to enhance data-to-text output recall, achieving state-of-the-art results on the ToTTo dataset.

Contribution

The paper proposes a novel progressive editing approach that exploits asymmetric generation phenomena to improve recall in data-to-text generation tasks.

Findings

01 ProEdit significantly improves recall in D2T generation.

02 Achieves state-of-the-art results on the ToTTo dataset.

03 Simple yet effective method for structured input coverage.

Abstract

Data-to-text (D2T) generation is the task of generating text from structured inputs. We observed that when the same target sentence is repeated twice, a Transformer (T5) based model generates an output made up of asymmetric sentences from structured inputs; that is, the sentences differ in length and quality. We call this phenomenon "Asymmetric Generation" and exploit it in D2T generation. Once asymmetric sentences are generated, we combine the first part of the output with a non-repeated target. As this goes through progressive edit (ProEdit), the recall increases. Hence, this method covers structured inputs better than before editing. ProEdit is a simple but effective way to improve performance in D2T generation, and it achieves a new state-of-the-art result on the ToTTo dataset.
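
The abstract leaves the data construction implicit, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It shows how a repeated target, the first part of a model output, and a follow-up edit target might be assembled as strings. The `[SEP]` separator, all function names, and the substring-based `cell_recall` proxy are assumptions introduced for illustration; the abstract does not specify a separator or how recall is computed.

```python
from __future__ import annotations

# Assumption: a literal separator token. The abstract does not say how the
# two copies of the target are delimited.
SEP = " [SEP] "


def repeat_target(target: str, sep: str = SEP) -> str:
    """First-stage training target: the same reference sentence twice."""
    return f"{target}{sep}{target}"


def first_part(output: str, sep: str = SEP) -> str:
    """Keep only the text before the first separator in a model output."""
    return output.split(sep, 1)[0].strip()


def edit_target(draft: str, target: str, sep: str = SEP) -> str:
    """Hypothetical next-stage target: the draft followed by the single
    (non-repeated) reference sentence."""
    return f"{draft}{sep}{target}"


def cell_recall(cells: list[str], text: str) -> float:
    """Crude coverage proxy: fraction of input cell values that occur
    verbatim (case-insensitive) in the text. Not the paper's metric."""
    if not cells:
        return 0.0
    lowered = text.lower()
    hits = sum(1 for cell in cells if cell.lower() in lowered)
    return hits / len(cells)


if __name__ == "__main__":
    cells = ["Alice Kim", "2019", "Seoul"]  # made-up table cells
    draft = "Alice Kim won in 2019."
    edited = "Alice Kim won in Seoul in 2019."
    print(repeat_target(edited))
    print(cell_recall(cells, draft), cell_recall(cells, edited))
```

In this toy example the edited sentence mentions all three made-up cell values while the draft mentions two, which is the kind of coverage gain the abstract attributes to progressive editing.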

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Dropout · Softmax · Adam · Absolute Position Encodings · Label Smoothing · Position-Wise Feed-Forward Layer