XDLM: Cross-lingual Diffusion Language Model for Machine Translation
Linyao Chen, Aosong Feng, Boming Yang, Zihui Li

TL;DR
XDLM introduces a novel cross-lingual diffusion model for machine translation, leveraging a new training objective and outperforming existing diffusion and Transformer models on multiple benchmarks.
Contribution
The paper presents XDLM, the first cross-lingual diffusion model for machine translation, with a new pretraining objective (TLDM) and a fine-tuning stage that builds the translation system on the pretrained model.
Findings
Outperforms diffusion and Transformer baselines on several machine translation benchmarks
Introduces TLDM training objective for cross-lingual mapping
Demonstrates effectiveness of diffusion models in machine translation
Abstract
Recently, diffusion models have excelled in image generation tasks and have also been applied to natural language processing (NLP) for controllable text generation. However, the application of diffusion models in a cross-lingual setting remains largely unexplored. Additionally, while pretraining with diffusion models has been studied within a single language, the potential of cross-lingual pretraining remains understudied. To address these gaps, we propose XDLM, a novel cross-lingual diffusion model for machine translation, consisting of pretraining and fine-tuning stages. In the pretraining stage, we propose TLDM, a new training objective for mastering the mapping between different languages; in the fine-tuning stage, we build the translation system on top of the pretrained model. We evaluate on several machine translation benchmarks and outperform both diffusion and Transformer…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Softmax · Dense Connections · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Residual Connection
