Extrapolating Multilingual Understanding Models as Multilingual Generators
Bohong Wu, Fei Yuan, Hai Zhao, Lei Li, Jingjing Xu

TL;DR
This paper introduces Semantic-Guided Alignment-then-Denoising (SGA), a method for adapting multilingual understanding models into multilingual generators, and reports improvements over initialization-based methods on translation, question generation, and story generation.
Contribution
The paper proposes SGA, which turns encoder-based models into generators with a small number of new parameters and outperforms initialization-based methods.
Findings
SGA outperforms initialization-based methods in translation and question/story generation.
XLM-R shows strong zero-shot performance but lags behind mBART in supervised tasks.
More research is needed to improve the generation capabilities of understanding models.
Abstract
Multilingual understanding models (i.e., encoder-based models such as mBERT), pre-trained via masked language modeling, have achieved promising results on many language understanding tasks. However, these non-autoregressive (NAR) models still struggle to generate high-quality text compared with autoregressive (AR) models. Considering that encoder-based models offer efficient generation and self-correction abilities, this paper explores methods to endow multilingual understanding models with generation abilities and thereby obtain a unified model. Specifically, we start from a multilingual encoder (XLM-R) and propose a Semantic-Guided Alignment-then-Denoising (SGA) approach to adapt an encoder to a multilingual generator with a small number of new parameters. Experiments show that the proposed approach is an effective adaptation method, outperforming widely-used…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: mBART · XLM-R
