IndicBART: A Pre-trained Model for Indic Natural Language Generation

Raj Dabre; Himani Shrotriya; Anoop Kunchukuttan; Ratish Puduppully; Mitesh M. Khapra; Pratyush Kumar

arXiv:2109.02903·cs.CL·October 28, 2022



TL;DR

IndicBART is a compact, multilingual pre-trained model tailored for Indic languages that leverages script similarities to enhance natural language generation tasks like translation and summarization, especially in low-resource settings.

Contribution

This paper introduces IndicBART, a novel pre-trained sequence-to-sequence model specifically designed for Indic languages, utilizing script sharing to improve transfer learning and performance.

Findings

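01 IndicBART is competitive with much larger pre-trained models such as mBART50 on translation and extreme summarization.

02 It performs well in very low-resource translation scenarios, including languages that were not seen during pre-training or fine-tuning.

03 Script sharing and multilingual training enhance model efficiency.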

Abstract

In this paper, we study pre-trained sequence-to-sequence models for a group of related languages, with a focus on Indic languages. We present IndicBART, a multilingual, sequence-to-sequence pre-trained model focusing on 11 Indic languages and English. IndicBART utilizes the orthographic similarity between Indic scripts to improve transfer learning between similar Indic languages. We evaluate IndicBART on two NLG tasks: Neural Machine Translation (NMT) and extreme summarization. Our experiments on NMT and extreme summarization show that a model specific to related languages like IndicBART is competitive with large pre-trained models like mBART50 despite being significantly smaller. It also performs well on very low-resource translation scenarios where languages are not included in pre-training or fine-tuning. Script sharing, multilingual training, and better utilization of limited model…
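The script-sharing idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that script sharing means rewriting text from several Indic scripts into one common script (Devanagari is used here as an example), and it relies on the fact that the main Indic Unicode blocks follow a parallel layout, so a fixed code-point offset aligns most corresponding letters. The function name `to_devanagari` and the constants are hypothetical and chosen for illustration; this is not the authors' preprocessing pipeline, and the offset mapping is only approximate for characters that have no counterpart in Devanagari.

```python
# Hypothetical sketch of script unification via Unicode block offsets.
# Assumption: the blocks from Devanagari (U+0900) through Malayalam (U+0D7F)
# are each 0x80 code points wide and share a parallel character layout.
DEVANAGARI_START = 0x0900
BLOCK_SIZE = 0x80
INDIC_START = 0x0900  # first code point of the Devanagari block
INDIC_END = 0x0D7F    # last code point of the Malayalam block


def to_devanagari(text: str) -> str:
    """Map characters from the covered Indic blocks onto Devanagari."""
    out = []
    for ch in text:
        cp = ord(ch)
        if INDIC_START <= cp <= INDIC_END:
            # Position of the character inside its own script block.
            offset = (cp - INDIC_START) % BLOCK_SIZE
            out.append(chr(DEVANAGARI_START + offset))
        else:
            # Leave Latin text, digits, punctuation, etc. untouched.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Bengali KA (U+0995) and Tamil KA (U+0B95) both map to Devanagari KA (U+0915).
    print(to_devanagari("\u0995") == "\u0915")  # True
    print(to_devanagari("\u0b95") == "\u0915")  # True
```

With a shared script, cognate words from related languages are more likely to receive overlapping subword tokens, which is one plausible way a compact vocabulary could support transfer between languages.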
