M2DS: Multilingual Dataset for Multi-document Summarisation
Kushan Hewapathirana (1, 2), Nisansa de Silva (1), C.D. Athuraliya (2) ((1) Department of Computer Science & Engineering, University of Moratuwa, Sri Lanka, (2) ConscientAI, Sri Lanka)

TL;DR
This paper introduces M2DS, the first multilingual dataset for multi-document summarisation (MDS). It is built from BBC articles in five languages and aims to promote inclusive research across diverse linguistic contexts.
Contribution
The paper presents M2DS, a novel multilingual dataset for MDS that addresses the shortage of non-English datasets, and provides baseline evaluations for future research.
Findings
M2DS includes document-summary pairs in five languages.
Baseline models achieve varying performance across languages.
The dataset promotes inclusive multilingual MDS research.
Abstract
In the rapidly evolving digital era, there is an increasing demand for concise information as individuals seek to distil key insights from various sources. Recent attention from researchers on Multi-document Summarisation (MDS) has resulted in diverse datasets covering customer reviews, academic papers, medical and legal documents, and news articles. However, the English-centric nature of these datasets has created a conspicuous void for multilingual datasets in today's globalised digital landscape, where linguistic diversity is celebrated. Media platforms such as British Broadcasting Corporation (BBC) have disseminated news in 20+ languages for decades. With only 380 million people speaking English natively as their first language, accounting for less than 5% of the global population, the vast majority primarily relies on other languages. These facts underscore the need for inclusivity…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Softmax · Attention Is All You Need
