A Self-Supervised Automatic Post-Editing Data Generation Tool
Hyeonseok Moon, Chanjun Park, Sugyeong Eo, Jaehyung Seo, SeungJun Lee, Heuiseok Lim

TL;DR
This paper introduces a self-supervised, web-deployable tool that automatically generates data for automatic post-editing, reducing human effort and enabling research across multiple language pairs.
Contribution
The paper presents a self-supervised data generation tool for APE that minimizes human supervision and builds APE data from a parallel corpus for several language pairs with English as the target language.
Findings
Enables large-scale APE data creation with minimal human effort
Supports several language pairs with English as the target language, including pairs not studied so far for lack of suitable data
Facilitates data-centric APE research and development
Abstract
Data building for automatic post-editing (APE) requires extensive and expert-level human effort, as it contains an elaborate process that involves identifying errors in sentences and providing suitable revisions. Hence, we develop a self-supervised data generation tool, deployable as a web application, that minimizes human supervision and constructs personalized APE data from a parallel corpus for several language pairs with English as the target language. Data-centric APE research can be conducted using this tool, involving many language pairs that have not been studied thus far owing to the lack of suitable data.
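The abstract does not describe the generation procedure itself. As a rough illustration only, the sketch below shows one generic way to derive APE-style triplets (source, pseudo machine translation, post-edit) from a parallel corpus: the reference target is kept as the post-edit and a randomly corrupted copy of it stands in for the erroneous translation. The names `ApeTriplet`, `add_noise`, and `build_triplets`, the three noise operations, and the `rate` parameter are all assumptions made for this example and are not taken from the paper or its tool.

```python
import random
from typing import List, NamedTuple, Tuple


class ApeTriplet(NamedTuple):
    src: str  # source-language sentence
    mt: str   # pseudo machine-translation output (noised target)
    pe: str   # post-edit, here simply the original reference target


def add_noise(tokens: List[str], rng: random.Random, rate: float = 0.2) -> List[str]:
    """Hypothetical noising step: delete, duplicate, or swap tokens at random."""
    noised: List[str] = []
    for tok in tokens:
        if rng.random() < rate:
            op = rng.choice(["delete", "duplicate", "swap"])
            if op == "delete":
                continue  # drop the token
            if op == "duplicate":
                noised.extend([tok, tok])  # repeat the token
                continue
            if op == "swap" and noised:
                # place the token before the previous one (local reordering)
                noised.insert(len(noised) - 1, tok)
                continue
        noised.append(tok)
    # keep at least one token if every token was deleted
    return noised or tokens[:1]


def build_triplets(
    pairs: List[Tuple[str, str]], seed: int = 0, rate: float = 0.2
) -> List[ApeTriplet]:
    """Turn (source, target) sentence pairs into (src, mt, pe) triplets."""
    rng = random.Random(seed)  # fixed seed makes the output reproducible
    triplets = []
    for src, tgt in pairs:
        mt = " ".join(add_noise(tgt.split(), rng, rate))
        triplets.append(ApeTriplet(src=src, mt=mt, pe=tgt))
    return triplets


if __name__ == "__main__":
    corpus = [("Das ist ein kleiner Test .", "This is a small test .")]
    for triplet in build_triplets(corpus, seed=1, rate=0.3):
        print(triplet)
```

Because the post-edit side is the untouched reference, no human annotation is needed in this sketch; how realistic the synthetic errors are depends entirely on the chosen noise operations, which is where an actual tool would differ from this toy version.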
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research
