Sequence-to-Sequence Resources for Catalan
Ona de Gibert, Ksenia Kharitonova, Blanca Calvo Figueras, Jordi Armengol-Estapé, Maite Melero

TL;DR
This paper introduces new sequence-to-sequence language resources for Catalan, including datasets for summarization and machine translation, along with baselines and a Catalan BART model to foster NLP development in the language.
Contribution
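The work provides two new abstractive summarization datasets in the newswire domain and a parallel Catalan-English corpus with three new test sets, along with baselines built on a newly created Catalan BART. All resources are released under an open license to advance Catalan NLP research.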
Findings
Two new abstractive summarization datasets created for Catalan newswire
A parallel Catalan-English corpus developed, with three new test sets
Baselines established for summarization and translation using a new Catalan BART
Abstract
In this work, we introduce sequence-to-sequence language resources for Catalan, a moderately under-resourced language, towards two tasks, namely: Summarization and Machine Translation (MT). We present two new abstractive summarization datasets in the domain of newswire. We also introduce a parallel Catalan-English corpus, paired with three different brand-new test sets. Finally, we evaluate the data presented with competing state-of-the-art models, and we develop baselines for these tasks using a newly created Catalan BART. We release the resulting resources of this work under an open license to encourage the development of language technology in Catalan.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Dense Connections · Byte Pair Encoding · Dropout · Adam · Residual Connection
