Data-to-text Generation by Splicing Together Nearest Neighbors
Sam Wiseman, Arturs Backurs, Karl Stratos

TL;DR
This paper introduces a novel data-to-text generation method that splices retrieved text segments using learned policies, resulting in more interpretable and controllable outputs while maintaining competitive quality.
Contribution
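The paper learns a policy that directly manipulates segments of neighbor text, inserting them into or replacing spans of a partially constructed generation, in contrast to methods that condition on retrieved neighbors but still generate token-by-token, left-to-right.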
Findings
Learned policies perform on par with strong baselines in both automatic and human evaluation.
Generated texts are more interpretable and controllable.
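Training requires an oracle derivation for each generation; the authors prove that finding the shortest such derivation reduces to parsing under a particular weighted context-free grammar.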
Abstract
We propose to tackle data-to-text generation tasks by directly splicing together retrieved segments of text from "neighbor" source-target pairs. Unlike recent work that conditions on retrieved neighbors but generates text token-by-token, left-to-right, we learn a policy that directly manipulates segments of neighbor text, by inserting or replacing them in partially constructed generations. Standard techniques for training such a policy require an oracle derivation for each generation, and we prove that finding the shortest such derivation can be reduced to parsing under a particular weighted context-free grammar. We find that policies learned in this way perform on par with strong baselines in terms of automatic and human evaluation, but allow for more interpretable and controllable generation.
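The insert and replace operations described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Action` class, the `apply_action` and `run_derivation` helpers, and the example sentences are all hypothetical, and the action sequence is written by hand, whereas the paper uses a learned policy to choose each action. The sketch only shows how a text can be built by splicing spans of neighbor text into a growing canvas.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Action:
    """One splice step (hypothetical representation, not taken from the paper)."""
    kind: str              # "insert" or "replace"
    neighbor: int          # index of the neighbor text to copy from
    src: Tuple[int, int]   # [start, end) token span inside that neighbor
    dst: Tuple[int, int]   # [start, end) span in the canvas; start == end for an insert


def apply_action(canvas: List[str], neighbors: List[List[str]], action: Action) -> List[str]:
    """Return a new canvas with the neighbor segment spliced in."""
    s, e = action.src
    segment = neighbors[action.neighbor][s:e]
    i, j = action.dst
    if action.kind == "insert":
        if i != j:
            raise ValueError("an insert must target an empty canvas span")
    elif action.kind != "replace":
        raise ValueError(f"unknown action kind: {action.kind}")
    # An insert adds the segment at position i; a replace overwrites canvas[i:j].
    return canvas[:i] + segment + canvas[j:]


def run_derivation(neighbors: List[List[str]], actions: List[Action]) -> List[str]:
    """Apply a sequence of splice actions, starting from an empty canvas."""
    canvas: List[str] = []
    for action in actions:
        canvas = apply_action(canvas, neighbors, action)
    return canvas


# Two made-up neighbor target texts, tokenized on whitespace.
neighbors = [
    "The Golden Palace is a coffee shop".split(),
    "Blue Spice is located near the river".split(),
]

# A hand-written derivation; in the paper a learned policy would pick these steps.
actions = [
    Action("insert", neighbor=0, src=(0, 7), dst=(0, 0)),   # copy the whole first neighbor
    Action("replace", neighbor=1, src=(0, 2), dst=(1, 3)),  # "Golden Palace" -> "Blue Spice"
    Action("insert", neighbor=1, src=(4, 7), dst=(7, 7)),   # append "near the river"
]

result = run_derivation(neighbors, actions)
print(" ".join(result))
```

Because every token in the output is copied from a specific neighbor span by a specific action, the derivation itself records where each piece of the text came from, which is one way to read the abstract's claim about interpretability.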
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
