Large Scale Multi-Lingual Multi-Modal Summarization Dataset
Yash Verma, Anubhav Jangra, Raghvendra Kumar, Sriparna Saha

TL;DR
This paper introduces M3LS, the largest multi-lingual multi-modal summarization dataset to date, with over a million document-image pairs across 20 languages, each accompanied by a professionally annotated multi-modal summary.
Contribution
It provides the first large-scale, diverse multi-lingual multi-modal dataset for summarization, along with a formal task definition and baseline evaluations.
Findings
M3LS is the largest multi-lingual multi-modal summarization dataset to date.
Baseline models show varying performance across languages and modalities.
The dataset offers new challenges and opportunities for multi-modal, multi-lingual summarization research.
Abstract
Significant developments in techniques such as encoder-decoder models have enabled us to represent information comprising multiple modalities. This information can further enhance many downstream tasks in the field of information retrieval and natural language processing; however, improvements in multi-modal techniques and their performance evaluation require large-scale multi-modal data which offers sufficient diversity. Multi-lingual modeling for a variety of tasks like multi-modal summarization, text generation, and translation leverages information derived from high-quality multi-lingual annotated data. In this work, we present the current largest multi-lingual multi-modal summarization dataset (M3LS), and it consists of over a million instances of document-image pairs along with a professionally annotated multi-modal summary for each pair. It is derived from news articles published…
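As a rough illustration of what one instance described in the abstract might look like in code, here is a minimal Python sketch. The class name, field names, and helper function are hypothetical and chosen only for illustration; they are not the dataset's actual schema or loader. The sketch also assumes that a multi-modal summary consists of a text summary plus an optional image.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class SummarizationInstance:
    """Hypothetical record layout; not the published M3LS schema."""
    language: str                     # language code of the article, e.g. "en"
    document: str                     # full article text
    images: List[str] = field(default_factory=list)  # paths/URLs of article images
    summary_text: str = ""            # annotated text summary
    summary_image: Optional[str] = None  # image chosen for the summary (assumed optional)


def count_by_language(instances: Iterable[SummarizationInstance]) -> Counter:
    """Count how many instances are available per language."""
    return Counter(inst.language for inst in instances)
```

A per-language count of this kind is a natural first check on a multi-lingual corpus, since baseline results are reported to vary across languages.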
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
