WikiMulti: a Corpus for Cross-Lingual Summarization
Pavel Tikhonov, Valentin Malykh

TL;DR
WikiMulti is a new multilingual dataset, derived from Wikipedia articles in 15 languages, that supports research in cross-lingual summarization and the evaluation of existing models.
Contribution
The paper introduces WikiMulti, a comprehensive dataset for cross-lingual summarization across 15 languages, and provides baseline evaluations of current methods.
Findings
Existing models show varied performance across languages.
The dataset enables standardized benchmarking for CLS.
Baseline results highlight areas for future improvement.
Abstract
Cross-lingual summarization (CLS) is the task of producing a summary in one language for a source document written in a different language. We introduce WikiMulti, a new dataset for cross-lingual summarization based on Wikipedia articles in 15 languages. As a set of baselines for further studies, we evaluate the performance of existing cross-lingual abstractive summarization methods on our dataset. We make our dataset publicly available here: https://github.com/tikhonovpavel/wikimulti
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
