A Multi-task Multi-stage Transitional Training Framework for Neural Chat Translation

Chulun Zhou; Yunlong Liang; Fandong Meng; Jie Zhou; Jinan Xu; Hongji Wang; Min Zhang; Jinsong Su

arXiv:2301.11749 · cs.CL · January 30, 2023

TL;DR

This paper introduces a multi-task multi-stage training framework for neural chat translation that leverages bilingual and monolingual dialogues, auxiliary tasks, and gradual transition strategies to improve translation quality in cross-lingual chats.

Contribution

It proposes a novel multi-stage training process with auxiliary tasks and a gradual transition strategy to enhance neural chat translation performance.

Findings

01 Outperforms existing models on two language pairs.

02 Effectively models dialogue coherence and speaker characteristics.

03 Alleviates training discrepancy between stages.

Abstract

Neural chat translation (NCT) aims to translate a cross-lingual chat between speakers of different languages. Existing context-aware NMT models cannot achieve satisfactory performances due to the following inherent problems: 1) limited resources of annotated bilingual dialogues; 2) the neglect of modelling conversational properties; 3) training discrepancy between different stages. To address these issues, in this paper, we propose a multi-task multi-stage transitional (MMT) training framework, where an NCT model is trained using the bilingual chat translation dataset and additional monolingual dialogues. We elaborately design two auxiliary tasks, namely utterance discrimination and speaker discrimination, to introduce the modelling of dialogue coherence and speaker characteristic into the NCT model. The training process consists of three stages: 1) sentence-level pre-training on…
