Generating High-Quality Surface Realizations Using Data Augmentation and Factored Sequence Models
Henry Elder, Chris Hokamp

TL;DR
This paper tackles surface realization from obfuscated text by augmenting the training data with synthetic examples and preprocessing the input so that its structure is more accessible to sequence models. The resulting models ranked first on all evaluation metrics in the English portion of the 2018 Surface Realization shared task.
Contribution
It presents a method for generating synthetic training data and preprocessing techniques that expose the structure of the input features to sequence models, both aimed at improving surface realization performance.
Findings
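The models ranked first on all evaluation metrics in the English portion of the 2018 Surface Realization shared task.
Generating large amounts of synthetic training data addresses the lack of sufficient training data, which the authors identify as the major obstacle to training high-performing models.
Preprocessing makes the structure contained in the input features more accessible to sequence models.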
Abstract
This work presents a new state of the art in reconstruction of surface realizations from obfuscated text. We identify the lack of sufficient training data as the major obstacle to training high-performing models, and solve this issue by generating large amounts of synthetic training data. We also propose preprocessing techniques which make the structure contained in the input features more accessible to sequence models. Our models were ranked first on all evaluation metrics in the English portion of the 2018 Surface Realization shared task.
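
The abstract does not spell out how the synthetic data is produced. The following is a minimal sketch of how one synthetic training pair might be built from plain text, assuming that "obfuscated" input means tokens reduced to lemmas with their word order discarded. The function name `make_synthetic_example`, the toy `LEMMAS` table, and the sample sentence are hypothetical and chosen only for illustration; this is not the authors' pipeline.

```python
import random


def make_synthetic_example(tokens, lemma_of, rng):
    """Build one (obfuscated source, target) pair from a tokenized sentence.

    Assumption: obfuscation = lemmatize every token, then discard word order.
    """
    # Map each token to its lemma; fall back to the lowercased token
    # when the toy lemma table has no entry for it.
    lemmas = [lemma_of.get(tok.lower(), tok.lower()) for tok in tokens]

    # Discard the original word order by shuffling a copy of the lemmas.
    source = lemmas[:]
    rng.shuffle(source)

    # The untouched sentence is the target the model should reconstruct.
    return source, list(tokens)


# Hypothetical toy lemma table; a real setup would use a lemmatizer or parser.
LEMMAS = {"models": "model", "were": "be", "ranked": "rank"}

rng = random.Random(0)  # seeded so repeated runs give the same shuffle
sentence = ["Our", "models", "were", "ranked", "first"]
source, target = make_synthetic_example(sentence, LEMMAS, rng)
```

Applied to a large collection of unlabeled sentences, a routine of this kind would yield as many source and target pairs as there are sentences, which is the sense in which synthetic data can offset a small annotated training set.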
