TSMind: Alibaba and Soochow University's Submission to the WMT22 Translation Suggestion Task
Xin Ge, Ke Wang, Jiayi Wang, Nini Xiao, Xiangyu Duan, Yu Zhao, Yuqi Zhang

TL;DR
This paper presents TSMind, a translation suggestion system from Alibaba and Soochow University that fine-tunes large pre-trained models on filtered, augmented data and ranks first in three of four language directions of the WMT22 shared task.
Contribution
It extends the WeTS data augmentation strategies for translation suggestion by filtering the augmented data with a dual conditional cross-entropy model and a GPT-2 language model.
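As a rough illustration of how such a filter can work, the sketch below scores a sentence pair with the commonly used dual conditional cross-entropy formulation (the disagreement between a forward and a backward translation model plus their average cross-entropy, mapped to (0, 1] with a negative exponential) and combines it with a language-model perplexity cutoff. The function names, the thresholds `min_score` and `max_ppl`, and the way the two signals are combined are assumptions made for illustration; the page does not state the settings TSMind used. The cross-entropies and the perplexity are taken as precomputed, length-normalized inputs rather than produced by real models.

```python
import math


def dual_xent_score(h_fwd: float, h_bwd: float) -> float:
    """Score a sentence pair from two length-normalized cross-entropies.

    h_fwd: cross-entropy of the target given the source (forward model).
    h_bwd: cross-entropy of the source given the target (backward model).
    Returns a value in (0, 1]; higher means the two models agree and
    both find the pair likely.
    """
    disagreement = abs(h_fwd - h_bwd)
    average = 0.5 * (h_fwd + h_bwd)
    return math.exp(-(disagreement + average))


def keep_pair(h_fwd: float, h_bwd: float, lm_ppl: float,
              min_score: float = 0.3, max_ppl: float = 200.0) -> bool:
    """Keep a pair only if it passes both filters.

    min_score and max_ppl are hypothetical thresholds, not values
    reported for TSMind. lm_ppl stands in for the perplexity a
    language model such as GPT-2 would assign to the target side.
    """
    return dual_xent_score(h_fwd, h_bwd) >= min_score and lm_ppl <= max_ppl
```

In practice the thresholds would be tuned on held-out data; a pair on which the two translation models disagree strongly is rejected even when one of them scores it well.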
Findings
Ranked first in three out of four language directions
Effective use of data filtering improves translation suggestion quality
Demonstrates success of fine-tuning large pre-trained models for TS tasks
Abstract
This paper describes TSMind, the joint submission of Alibaba and Soochow University to the WMT 2022 Shared Task on Translation Suggestion (TS). We participate in the English-German and English-Chinese tasks. We adopt the paradigm of fine-tuning large-scale pre-trained models on the downstream task, which has recently achieved great success. As pre-trained models, we choose FAIR's WMT19 English-German news translation system for English-German and MBART50 for English-Chinese. Because the task restricts the training data that may be used, we follow the data augmentation strategies proposed by WeTS to boost the performance of our TS model. Unlike WeTS, we additionally use a dual conditional cross-entropy model and a GPT-2 language model to filter the augmented data. The final leaderboard shows that our submissions rank first in three of four language directions in the…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Layer · Dropout · Byte Pair Encoding · Attention Dropout · Linear Warmup With Cosine Annealing · Dense Connections · Layer Normalization
