Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation

Seugnjun Lee; Hyeonseok Moon; Chanjun Park; Heuiseok Lim

arXiv:2306.14514 · cs.CL · June 28, 2023


Open Access

TL;DR

This paper presents a data-driven method for formality-sensitive machine translation that combines language-specific data handling with synthetic data generation by large language models. It reports a considerable improvement in translation quality over the baseline.

Contribution

It introduces a novel combination of language-specific data handling and LLM-based synthetic data generation for formality-sensitive machine translation (FSMT), improving translation performance.

Findings

01. Significant improvement over baseline models

02. Effective use of large-scale language models for synthetic data generation

03. Prompt engineering further improves translation quality

Abstract

In this paper, we introduce a data-driven approach for Formality-Sensitive Machine Translation (FSMT) that caters to the unique linguistic properties of four target languages. Our methodology centers on two core strategies: 1) language-specific data handling, and 2) synthetic data generation using large-scale language models and empirical prompt engineering. This approach demonstrates a considerable improvement over the baseline, highlighting the effectiveness of data-centric techniques. Our prompt engineering strategy further improves performance by producing superior synthetic translation examples.
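The abstract describes synthetic data generation with large language models and prompt engineering only at a high level. What follows is a minimal sketch of one generic way to set up few-shot prompting for formality-controlled synthetic data. It is not the authors' prompt or pipeline. The template wording, the names `Example`, `build_prompt` and `parse_completion`, the filter that drops identical formal/informal outputs, and the German sentences are all hypothetical assumptions. German is used only for illustration and is not necessarily one of the paper's four target languages. The actual LLM call is deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Example:
    """One hypothetical few-shot demonstration: a source sentence plus two registers."""
    source: str    # English source sentence
    formal: str    # formal target-language translation
    informal: str  # informal target-language translation


def build_prompt(examples: list[Example], source: str, target_lang: str) -> str:
    """Assemble a hypothetical few-shot prompt asking an LLM for a formal and an
    informal translation of `source` into `target_lang`."""
    lines = [
        f"Translate the English sentence into {target_lang} twice: "
        "once in a formal register and once in an informal register.",
        "",
    ]
    for ex in examples:
        lines += [f"English: {ex.source}", f"Formal: {ex.formal}",
                  f"Informal: {ex.informal}", ""]
    # The query ends with an open "Formal:" slot for the model to complete.
    lines += [f"English: {source}", "Formal:"]
    return "\n".join(lines)


def parse_completion(completion: str) -> Optional[Tuple[str, str]]:
    """Split a completion of the form '<formal>\\nInformal: <informal>' into a
    (formal, informal) pair. Return None if the format is not followed or if the
    two outputs are identical (hypothetical filter: no formality contrast)."""
    head, sep, tail = completion.partition("Informal:")
    if not sep:
        return None
    formal = head.strip()
    tail = tail.strip()
    informal = tail.splitlines()[0].strip() if tail else ""
    if not formal or not informal or formal == informal:
        return None
    return formal, informal


if __name__ == "__main__":
    demo = [Example("Are you coming tomorrow?", "Kommen Sie morgen?", "Kommst du morgen?")]
    print(build_prompt(demo, "Can you help me?", "German"))
    # A completion an LLM might return for the open "Formal:" slot.
    print(parse_completion(" Können Sie mir helfen?\nInformal: Kannst du mir helfen?"))
```

Keeping prompt construction and output parsing separate makes it easy to swap the LLM backend. It also lets malformed or contrast-free generations be discarded before they enter the synthetic training set.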

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification