Models and Datasets for Cross-Lingual Summarisation

Laura Perez-Beltrachini; Mirella Lapata

arXiv:2202.09583 · cs.CL · February 22, 2022

TL;DR

This paper introduces a cross-lingual summarisation dataset derived from Wikipedia that covers twelve language pairs and directions across Czech, English, French and German, and evaluates multilingual pre-trained models in supervised, zero- and few-shot, and out-of-domain scenarios.

Contribution

The paper contributes a cross-lingual summarisation corpus, a creation methodology that can be applied to several other languages, an analysis of the task with automatic metrics and a human study, and experiments with multilingual pre-trained models.

Findings

01

Multilingual pre-trained models are benchmarked on the corpus for cross-lingual summarisation.

02

The dataset enables evaluation in supervised, zero- and few-shot, and out-of-domain settings.

03

A human study validates the proposed cross-lingual summarisation task.

Abstract

We present a cross-lingual summarisation corpus with long documents in a source language associated with multi-sentence summaries in a target language. The corpus covers twelve language pairs and directions for four European languages, namely Czech, English, French and German, and the methodology for its creation can be applied to several other languages. We derive cross-lingual document-summary instances from Wikipedia by combining lead paragraphs and articles' bodies from language aligned Wikipedia titles. We analyse the proposed cross-lingual summarisation task with automatic metrics and validate it with a human study. To illustrate the utility of our dataset we report experiments with multi-lingual pre-trained models in supervised, zero- and few-shot, and out-of-domain scenarios.
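
The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record layout (`lead` and `body` keys), the language codes, and the `build_instances` helper are hypothetical names chosen for illustration, and the sketch assumes that the article body in the source language serves as the document while the lead paragraph of the aligned article in the target language serves as the summary. It also shows why four languages yield twelve pairs and directions: every ordered pair of distinct languages counts once.

```python
from itertools import permutations

# Assumed language codes for Czech, English, French and German.
LANGS = ["cs", "en", "fr", "de"]

# Ordered (source, target) pairs of distinct languages: 4 * 3 = 12.
PAIRS = list(permutations(LANGS, 2))


def build_instances(aligned_article, pairs=PAIRS):
    """Build cross-lingual document-summary instances for one aligned title.

    aligned_article is a hypothetical record mapping a language code to a
    dict with the keys "lead" (lead paragraph) and "body" (rest of the
    article). Languages missing from the record are skipped.
    """
    instances = []
    for src, tgt in pairs:
        if src in aligned_article and tgt in aligned_article:
            instances.append(
                {
                    "src_lang": src,
                    "tgt_lang": tgt,
                    # Document: article body in the source language.
                    "document": aligned_article[src]["body"],
                    # Summary: lead paragraph in the target language.
                    "summary": aligned_article[tgt]["lead"],
                }
            )
    return instances


# Toy example: one title aligned across English and German only.
toy = {
    "en": {"lead": "EN lead paragraph.", "body": "EN article body."},
    "de": {"lead": "DE Einleitung.", "body": "DE Artikeltext."},
}
toy_instances = build_instances(toy)
```

For the toy record, only the en-to-de and de-to-en directions are produced, since the other languages have no aligned article. A real pipeline would also need to extract and clean Wikipedia text and filter instances, none of which is covered here.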

Code & Models

Repositories

lauhaide/clads (PyTorch, official)
