Is Translation Helpful? An Empirical Analysis of Cross-Lingual Transfer in Low-Resource Dialog Generation
Lei Shen, Shuai Yu, Xiaoyu Shen

TL;DR
This paper investigates whether machine translation helps cross-lingual transfer in dialog generation. It finds that English dialog data used in its original form often outperforms its translated version, a gap attributed to cultural and linguistic biases introduced by translation.
Contribution
It provides empirical evidence that using English dialog data directly is more beneficial than using its machine-translated version for low-resource Chinese dialog generation, challenging the common assumption that translation is needed for cross-lingual transfer.
Findings
English dialog data improves naturalness, relevance, and cross-domain transferability in Chinese dialog generation
Translated data can introduce cultural biases and unnaturalness
Using original English data is recommended over translation for better transfer
Abstract
Cross-lingual transfer is important for developing high-quality chatbots in multiple languages due to the strongly imbalanced distribution of language resources. A typical approach is to leverage off-the-shelf machine translation (MT) systems to utilize either the training corpus or developed models from high-resource languages. In this work, we investigate whether it is helpful to utilize MT at all in this task. To do so, we simulate a low-resource scenario assuming access to limited Chinese dialog data in the movie domain and large amounts of English dialog data from multiple domains. Experiments show that leveraging English dialog corpora can indeed improve the naturalness, relevance and cross-domain transferability in Chinese. However, directly using English dialog corpora in its original form, surprisingly, is better than using its translated version. As the topics and wording…
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
