Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation
Wenjie Hao, Hongfei Xu, Lingling Mu, Hongying Zan

TL;DR
This paper improves low-resource Chinese-Thai translation by first tuning the experiment settings on a 6-layer Transformer and then deepening the model to 24 layers, achieving state-of-the-art Chinese-to-Thai results in the CCMT 2022 constrained evaluation.
Contribution
It applies a 24-layer Transformer, configured with the best settings found on a 6-layer model, to low-resource Chinese-Thai translation and reports improved translation quality over the shallower model.
Findings
24-layer Transformer outperforms shallower models
Achieved state-of-the-art Chinese-to-Thai translation results in the constrained evaluation
Optimal experiment settings identified for low-resource scenarios
Abstract
In this paper, we study the use of a deep Transformer translation model for the CCMT 2022 Chinese-Thai low-resource machine translation task. We first explore the experiment settings (including the number of BPE merge operations, dropout probability, embedding size, etc.) for the low-resource scenario with a 6-layer Transformer. Considering that increasing the number of layers also increases the regularization on the new model parameters (dropout modules are also introduced when using more layers), we adopt the highest-performing setting but increase the depth of the Transformer to 24 layers to obtain improved translation quality. Our work obtains SOTA performance in Chinese-to-Thai translation in the constrained evaluation.
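The two-stage procedure in the abstract (search settings at 6 layers, then keep the best setting and change only the depth) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the class `TransformerConfig`, the helper names, and every numeric value are hypothetical placeholders. The dropout count assumes the original Transformer layout, with one residual dropout per sub-layer (two sub-layers per encoder layer, three per decoder layer), and assumes the encoder and decoder share the same depth.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class TransformerConfig:
    # All defaults are illustrative placeholders, not the paper's settings.
    bpe_merge_ops: int = 8000
    dropout: float = 0.1
    embed_dim: int = 512
    num_layers: int = 6  # assumed to apply to both encoder and decoder


def search_grid(bpe_options, dropout_options, embed_options):
    """Enumerate 6-layer candidate settings for the low-resource search."""
    return [
        TransformerConfig(bpe_merge_ops=b, dropout=d, embed_dim=e)
        for b, d, e in product(bpe_options, dropout_options, embed_options)
    ]


def deepen(best, num_layers=24):
    """Keep the best shallow setting unchanged and modify only the depth."""
    return replace(best, num_layers=num_layers)


def residual_dropout_count(cfg, per_encoder_layer=2, per_decoder_layer=3):
    """Count residual dropout modules under the assumed sub-layer layout."""
    return cfg.num_layers * (per_encoder_layer + per_decoder_layer)


# Hypothetical search space; each candidate would be trained and scored.
candidates = search_grid([4000, 8000], [0.1, 0.3], [512])
best_shallow = candidates[0]  # stand-in for the best-scoring candidate
deep = deepen(best_shallow)

print(residual_dropout_count(best_shallow), residual_dropout_count(deep))
```

Under these assumptions, the number of dropout modules grows linearly with depth, which is one way to read the abstract's remark that adding layers also adds regularization on the new parameters.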
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Layer Normalization · Adam · Byte Pair Encoding · Residual Connection · Label Smoothing · Position-Wise Feed-Forward Layer
