Data-to-text Generation by Splicing Together Nearest Neighbors

Sam Wiseman; Arturs Backurs; Karl Stratos

arXiv:2101.08248·cs.CL·November 1, 2021

TL;DR

This paper introduces a novel data-to-text generation method that splices retrieved text segments using learned policies, resulting in more interpretable and controllable outputs while maintaining competitive quality.

Contribution

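The paper presents a learned policy that directly inserts or replaces segments of neighbor text in a partially constructed generation, in contrast to methods that condition on retrieved neighbors but still generate text token by token, left to right.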

Findings

01 Learned policies perform on par with strong baselines in automatic and human evaluation.

02 Splicing neighbor segments allows for more interpretable and controllable generation.

03 Finding the shortest oracle derivation for a generation reduces to parsing under a particular weighted context-free grammar.

Abstract

We propose to tackle data-to-text generation tasks by directly splicing together retrieved segments of text from "neighbor" source-target pairs. Unlike recent work that conditions on retrieved neighbors but generates text token-by-token, left-to-right, we learn a policy that directly manipulates segments of neighbor text, by inserting or replacing them in partially constructed generations. Standard techniques for training such a policy require an oracle derivation for each generation, and we prove that finding the shortest such derivation can be reduced to parsing under a particular weighted context-free grammar. We find that policies learned in this way perform on par with strong baselines in terms of automatic and human evaluation, but allow for more interpretable and controllable generation.
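
The insert and replace actions described in the abstract can be illustrated with a minimal sketch. It is not taken from the official repository: the `Splice` record, the `apply_splice` and `run_derivation` helpers, and the example sentences are all hypothetical, and the derivation is written by hand here, whereas in the paper a learned policy chooses each action.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Splice:
    """Hypothetical action: copy neighbors[neighbor][src_start:src_end]
    over canvas[dst_start:dst_end]. An empty destination span
    (dst_start == dst_end) is a pure insertion; otherwise it is a replacement."""
    neighbor: int
    src_start: int
    src_end: int
    dst_start: int
    dst_end: int


def apply_splice(canvas: List[str],
                 neighbors: Sequence[Sequence[str]],
                 action: Splice) -> List[str]:
    """Return a new canvas with the chosen neighbor segment spliced in."""
    segment = neighbors[action.neighbor][action.src_start:action.src_end]
    return canvas[:action.dst_start] + list(segment) + canvas[action.dst_end:]


def run_derivation(neighbors: Sequence[Sequence[str]],
                   actions: Sequence[Splice]) -> List[str]:
    """Replay a sequence of splice actions, starting from an empty canvas."""
    canvas: List[str] = []
    for action in actions:
        canvas = apply_splice(canvas, neighbors, action)
    return canvas


# Target texts of two made-up retrieved neighbors, already tokenized.
neighbors = [
    "The Mill is a pub near the river .".split(),
    "Aromi is a cheap coffee shop .".split(),
]

# A hand-written derivation (assumption: a trained policy would pick these).
derivation = [
    Splice(neighbor=0, src_start=0, src_end=9, dst_start=0, dst_end=0),  # insert whole neighbor 0
    Splice(neighbor=1, src_start=3, src_end=4, dst_start=4, dst_end=4),  # insert "cheap" before "pub"
    Splice(neighbor=1, src_start=4, src_end=6, dst_start=5, dst_end=6),  # replace "pub" with "coffee shop"
]

result = " ".join(run_derivation(neighbors, derivation))
print(result)
```

Because every output token is traceable to a span of a specific neighbor, a derivation of this kind records where each piece of the generation came from, which is the sense in which the abstract calls the approach more interpretable and controllable.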

Code & Models

Repositories

swiseman/neighbor-splicing (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications