mmT5: Modular Multilingual Pre-Training Solves Source Language Hallucinations

Jonas Pfeiffer; Francesco Piccinno; Massimo Nicosia; Xinyi Wang; Machel Reid; Sebastian Ruder

arXiv:2305.14224 · cs.CL · May 24, 2023 · 2 cites

Open Access

TL;DR

mmT5 is a modular multilingual model that significantly reduces language hallucinations and improves zero-shot language generation accuracy across 40+ languages by disentangling language-specific and language-agnostic information.

Contribution

The paper introduces mmT5, a modular pre-training approach with language-specific modules that enhances multilingual generation and reduces hallucinations compared to existing models.
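A minimal, dependency-free sketch of the modular idea: a shared (language-agnostic) transform followed by a small per-language bottleneck module that is selected by a language ID. The bottleneck shape, the residual connections, the zero-initialised up-projection, and all names (`ModularLayer`, `BottleneckModule`, `matvec`) are assumptions for illustration, not mmT5's published configuration or code.

```python
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


class BottleneckModule:
    """Hypothetical per-language block: down-project, ReLU, up-project, plus residual."""

    def __init__(self, d_model: int, d_bottleneck: int, rng: random.Random):
        self.down = [[rng.gauss(0, 0.02) for _ in range(d_model)] for _ in range(d_bottleneck)]
        # Zero-initialised up-projection: the module starts out as the identity map.
        self.up = [[0.0] * d_bottleneck for _ in range(d_model)]

    def __call__(self, h: Vector) -> Vector:
        delta = matvec(self.up, relu(matvec(self.down, h)))
        return [x + d for x, d in zip(h, delta)]


class ModularLayer:
    """Shared transform (used for every language) followed by a language-specific module."""

    def __init__(self, d_model: int, d_bottleneck: int, languages: List[str], seed: int = 0):
        rng = random.Random(seed)
        self.shared = [[rng.gauss(0, 0.02) for _ in range(d_model)] for _ in range(d_model)]
        self.modules: Dict[str, BottleneckModule] = {
            lang: BottleneckModule(d_model, d_bottleneck, rng) for lang in languages
        }

    def __call__(self, h: Vector, lang: str) -> Vector:
        # Language-agnostic part with a residual connection.
        shared_out = [x + y for x, y in zip(h, matvec(self.shared, h))]
        # Language-specific part: route through the module for `lang`.
        return self.modules[lang](shared_out)


layer = ModularLayer(d_model=8, d_bottleneck=2, languages=["en", "de", "sw"])
hidden = [1.0] * 8
out_de = layer(hidden, "de")  # routed through the German module
```

Because every up-projection starts at zero, each language module begins as the identity, so outputs for different languages only diverge once the modules are trained. Requesting a language with no module raises a `KeyError`, reflecting that a modular model can only route to languages it has modules for.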

Findings

01

In zero-shot settings, the rate of generating text in the correct target language rises from 7% (mT5) to 99% (mmT5).

02

Outperforms mT5 at the same parameter sizes on representative natural language understanding and generation tasks in 40+ languages.

03

Largely alleviates source language hallucination in multilingual models, i.e., generating text in the fine-tuning (source) language instead of the requested target language.
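The 7% to 99% figure is a rate: the share of zero-shot outputs that are written in the requested target language. A sketch of that metric, assuming some language-identification function is available; `toy_detector` is a hypothetical keyword stub standing in for a real language-ID model, and the example outputs are invented for illustration.

```python
from typing import Callable, Sequence


def language_correctness(
    outputs: Sequence[str],
    target_langs: Sequence[str],
    detect_language: Callable[[str], str],
) -> float:
    """Fraction of generated outputs whose detected language equals the target language."""
    if len(outputs) != len(target_langs):
        raise ValueError("outputs and target_langs must have the same length")
    if not outputs:
        return 0.0
    hits = sum(detect_language(o) == t for o, t in zip(outputs, target_langs))
    return hits / len(outputs)


def toy_detector(text: str) -> str:
    # Hypothetical stand-in for a real language identifier: checks for German function words.
    german = {"der", "die", "das", "und", "ist"}
    return "de" if set(text.lower().split()) & german else "en"


# Invented outputs of a model fine-tuned on English and asked to generate German.
outputs = ["Das ist gut", "This is good", "Der Hund und die Katze"]
rate = language_correctness(outputs, ["de", "de", "de"], toy_detector)
```

Here the second output is a source language hallucination (English instead of German), so the rate is 2/3.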

Abstract

Multilingual sequence-to-sequence models perform poorly with increased language coverage and fail to consistently generate text in the correct target language in few-shot settings. To address these challenges, we propose mmT5, a modular multilingual sequence-to-sequence model. mmT5 utilizes language-specific modules during pre-training, which disentangle language-specific information from language-agnostic information. We identify representation drift during fine-tuning as a key limitation of modular generative models and develop strategies that enable effective zero-shot transfer. Our model outperforms mT5 at the same parameter sizes by a large margin on representative natural language understanding and generation tasks in 40+ languages. Compared to mT5, mmT5 raises the rate of generating text in the correct language under zero-shot settings from 7% to 99%, thereby greatly alleviating…
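One plausible way to counter representation drift is to keep some parameters fixed during fine-tuning, so the language-specific modules stay compatible with the shared weights they were pre-trained alongside. The sketch below only partitions parameter names into trainable and frozen sets using glob patterns; the parameter names and the particular freezing policy (embeddings, all language modules, decoder feed-forward layers) are hypothetical assumptions, not necessarily the strategy the paper settles on.

```python
import fnmatch
from typing import Iterable, List, Tuple


def split_trainable(
    param_names: Iterable[str], frozen_patterns: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Partition parameter names into (trainable, frozen) using glob patterns."""
    patterns = list(frozen_patterns)
    trainable, frozen = [], []
    for name in param_names:
        if any(fnmatch.fnmatchcase(name, p) for p in patterns):
            frozen.append(name)
        else:
            trainable.append(name)
    return trainable, frozen


# Hypothetical parameter names for a one-layer encoder-decoder with per-language modules.
PARAMS = [
    "shared.embedding",
    "encoder.layer0.attention",
    "encoder.layer0.ffn",
    "encoder.layer0.lang_module.en",
    "encoder.layer0.lang_module.sw",
    "decoder.layer0.attention",
    "decoder.layer0.ffn",
    "decoder.layer0.lang_module.en",
    "decoder.layer0.lang_module.sw",
]

# Assumed freezing policy: embeddings, every language module, and decoder feed-forward layers.
FROZEN = ["shared.embedding", "*.lang_module.*", "decoder.*.ffn"]

trainable, frozen = split_trainable(PARAMS, FROZEN)
```

Under these assumptions, a model fine-tuned on source-language data would then be run zero-shot with the target language's module selected at inference time.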

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Gated Linear Unit · Attention Is All You Need · Softmax · Layer Normalization · Inverse Square Root Schedule · Byte Pair Encoding · Dropout · Linear Layer · Attention Dropout