BARThez: a Skilled Pretrained French Sequence-to-Sequence Model

Moussa Kamal Eddine; Antoine J.-P. Tixier; Michalis Vazirgiannis

arXiv:2010.12321·cs.CL·February 10, 2021

TL;DR

This paper introduces BARThez, a large-scale pretrained French sequence-to-sequence model based on BART. BARThez performs strongly on both discriminative and generative NLP tasks, and continuing the pretraining of a multilingual BART on BARThez' corpus yields mBARThez, which further improves generative performance.

Contribution

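The paper presents BARThez, the first large-scale pretrained seq2seq model for French, and evaluates it on five discriminative tasks from the FLUE benchmark and two generative tasks from the new OrangeSum summarization dataset. It also shows that continuing the pretraining of a multilingual BART on BARThez' corpus (mBARThez) further improves generative performance.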

Findings

01. BARThez is competitive with state-of-the-art French BERT-based models such as CamemBERT and FlauBERT.

02. Continuing the pretraining of a multilingual BART on BARThez' corpus (mBARThez) improves generative performance.

03. BARThez performs well on both discriminative and generative tasks.

Abstract

Inductive transfer learning has taken the entire NLP field by storm, with models such as BERT and BART setting new state of the art on countless NLU tasks. However, most of the available models and research have been conducted for English. In this work, we introduce BARThez, the first large-scale pretrained seq2seq model for French. Being based on BART, BARThez is particularly well-suited for generative tasks. We evaluate BARThez on five discriminative tasks from the FLUE benchmark and two generative tasks from a novel summarization dataset, OrangeSum, that we created for this research. We show BARThez to be very competitive with state-of-the-art BERT-based French language models such as CamemBERT and FlauBERT. We also continue the pretraining of a multilingual BART on BARThez' corpus, and show our resulting model, mBARThez, to significantly boost BARThez' generative performance. Code,…

Taxonomy

Methods: Linear Layer · mBARThez · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence · Byte Pair Encoding · Adam · Softmax · Layer Normalization