Controlling Formality in Low-Resource NMT with Domain Adaptation and Re-Ranking: SLT-CDT-UoS at IWSLT2022
Sebastian T. Vincent, Loïc Barrault, Carolina Scarton

TL;DR
This paper presents a method for controlling formality in low-resource spoken language translation by combining domain adaptation, data engineering, and hypothesis re-ranking. It is evaluated on English-to-German and English-to-Spanish, and in a zero-shot setting on English-to-Russian and English-to-Italian.
Contribution
The authors introduce a novel approach that leverages language-independent data extraction and hypothesis re-ranking to improve formality control in low-resource NMT, including zero-shot scenarios.
Findings
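Achieved 93.5% average accuracy across English-to-German and English-to-Spanish in the constrained setting
Reached 99.5% average accuracy across the same two language pairs in the unconstrained setting
Demonstrated zero-shot formality control for English-to-Russian and English-to-Italian, with reported average accuracies of 59% and 66%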
Abstract
This paper describes the SLT-CDT-UoS group's submission to the first Special Task on Formality Control for Spoken Language Translation, part of the IWSLT 2022 Evaluation Campaign. Our efforts were split between two fronts: data engineering and altering the objective function for best hypothesis selection. We used language-independent methods to extract formal and informal sentence pairs from the provided corpora; using English as a pivot language, we propagated formality annotations to languages treated as zero-shot in the task; we also further improved formality controlling with a hypothesis re-ranking approach. On the test sets for English-to-German and English-to-Spanish, we achieved an average accuracy of .935 within the constrained setting and .995 within unconstrained setting. In a zero-shot setting for English-to-Russian and English-to-Italian, we scored average accuracy of .590…
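The hypothesis re-ranking idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify their scoring function, so the additive combination of model score and formality score, the `weight` parameter, the marker lists, and the names `rerank` and `toy_formality_score` are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical German marker lists, chosen only for this toy example.
FORMAL_MARKERS = {"Sie", "Ihnen", "Ihre"}
INFORMAL_MARKERS = {"du", "dich", "dir", "dein", "deine"}


def toy_formality_score(text: str, target: str = "formal") -> float:
    """Count formal minus informal markers; flip the sign for an informal target."""
    for ch in "?.,!":
        text = text.replace(ch, " ")
    tokens = text.split()
    formal = sum(tok in FORMAL_MARKERS for tok in tokens)
    informal = sum(tok.lower() in INFORMAL_MARKERS for tok in tokens)
    score = float(formal - informal)
    return score if target == "formal" else -score


def rerank(
    nbest: List[Tuple[str, float]],
    formality_score: Callable[[str], float],
    weight: float = 1.0,
) -> str:
    """Pick the hypothesis maximising model log-probability + weight * formality score.

    `nbest` holds (hypothesis, model log-probability) pairs, e.g. from beam search.
    The additive objective is an assumed stand-in, not the paper's exact formula.
    """
    if not nbest:
        raise ValueError("nbest must contain at least one hypothesis")
    best_text, _ = max(nbest, key=lambda h: h[1] + weight * formality_score(h[0]))
    return best_text


if __name__ == "__main__":
    nbest = [("Wie geht es dir?", -1.0), ("Wie geht es Ihnen?", -1.4)]
    print(rerank(nbest, lambda t: toy_formality_score(t, "formal"), weight=2.0))
    print(rerank(nbest, lambda t: toy_formality_score(t, "informal"), weight=2.0))
```

With `weight` set to 0 the selection falls back to the hypothesis with the highest model score, so the weight controls how strongly the formality signal can override the translation model's own ranking.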
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
