Translation of Multifaceted Data without Re-Training of Machine Translation Systems
Hyeonseok Moon, Seungyoon Lee, Seongtae Hong, Seungjun Lee, Chanjun Park, Heuiseok Lim

TL;DR
This paper introduces a novel machine translation pipeline that preserves intra-data relations by concatenating data components, improving translation quality and downstream task performance without re-training existing systems.
Contribution
It proposes an MT pipeline that uses a Catalyst Statement (CS) and an Indicator Token (IT) to maintain intra-data relations, improving both translation quality and the usefulness of the translated text as training data.
Findings
Improved translation quality over the conventional approach of translating each component separately.
Enhanced downstream task performance in web page ranking (WPR) and question generation (QG).
No re-training of existing MT systems required.
Abstract
Translating major language resources to build minor language resources has become a widely used approach. Particularly in translating complex data points composed of multiple components, it is common to translate each component separately. However, we argue that this practice often overlooks the interrelation between components within the same data point. To address this limitation, we propose a novel MT pipeline that considers the intra-data relation in implementing MT for training data. In our MT pipeline, all the components in a data point are concatenated to form a single translation sequence and subsequently reconstructed into the data components after translation. We introduce a Catalyst Statement (CS) to enhance the intra-data relation, and an Indicator Token (IT) to assist the decomposition of a translated sequence into its respective data components. Through our approach, we have…
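The concatenate, translate, decompose flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Indicator Token string, the Catalyst Statement wording, and the function names `build_sequence`, `decompose`, and `translate_data_point` are all hypothetical, and the `translate` argument stands in for any existing MT system used as a black box.

```python
from typing import Callable, List, Optional

# Hypothetical placeholders: the actual IT and CS strings used in the paper
# are not given in this summary.
INDICATOR_TOKEN = "@"
CATALYST_STATEMENT = "The following sentences are related."


def build_sequence(components: List[str],
                   it: str = INDICATOR_TOKEN,
                   cs: str = CATALYST_STATEMENT) -> str:
    """Concatenate all components of one data point into a single sequence.

    The CS is placed first, and each component is prefixed with the IT so
    the boundaries can be recovered after translation.
    """
    marked = [f"{it} {c}" for c in components]
    return " ".join([cs] + marked)


def decompose(translated: str, n_components: int,
              it: str = INDICATOR_TOKEN) -> Optional[List[str]]:
    """Split a translated sequence back into its components.

    Returns None when the IT did not survive translation intact, so the
    caller can discard the data point or fall back to another strategy.
    """
    parts = translated.split(it)
    # parts[0] holds the translated CS, which is not part of the data.
    components = [p.strip() for p in parts[1:]]
    if len(components) != n_components or any(not c for c in components):
        return None
    return components


def translate_data_point(components: List[str],
                         translate: Callable[[str], str]) -> Optional[List[str]]:
    """Translate a multi-component data point with one call to the MT system."""
    sequence = build_sequence(components)
    return decompose(translate(sequence), len(components))
```

Because the MT system is only called through `translate`, nothing about it needs to be re-trained. The sketch assumes the IT does not occur inside the components themselves; a real pipeline would need to pick a token that is both absent from the data and reliably preserved by the translator.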
Taxonomy
Topics: Natural Language Processing Techniques
