AraBART: a Pretrained Arabic Sequence-to-Sequence Model for Abstractive Summarization
Moussa Kamal Eddine, Nadi Tomeh, Nizar Habash, Joseph Le Roux, Michalis Vazirgiannis

TL;DR
AraBART is the first end-to-end pretrained Arabic sequence-to-sequence model based on BART, significantly improving abstractive summarization performance over existing models and baselines.
Contribution
Introduces AraBART, the first Arabic BART-based model whose encoder and decoder are pretrained end-to-end, which outperforms existing Arabic and multilingual models on abstractive summarization.
Findings
Achieves state-of-the-art results on multiple Arabic summarization datasets.
Outperforms Arabic BERT-based, mBART, and mT5 models.
Demonstrates the effectiveness of end-to-end pretraining for Arabic summarization.
Abstract
Like most natural language understanding and generation tasks, state-of-the-art models for summarization are transformer-based sequence-to-sequence architectures that are pretrained on large corpora. While most existing models focused on English, Arabic remained understudied. In this paper we propose AraBART, the first Arabic model in which the encoder and the decoder are pretrained end-to-end, based on BART. We show that AraBART achieves the best performance on multiple abstractive summarization datasets, outperforming strong baselines including a pretrained Arabic BERT-based model and multilingual mBART and mT5 models.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Multi-Head Attention · Linear Layer · SentencePiece · Adafactor · Dropout · Layer Normalization · Adam · Inverse Square Root Schedule · Byte Pair Encoding
