GlobeSumm: A Challenging Benchmark Towards Unifying Multi-lingual, Cross-lingual and Multi-document News Summarization
Yangfan Ye, Xiachong Feng, Xiaocheng Feng, Weitao Ma, Libo Qin, Dongliang Xu, Qing Yang, Hongtao Liu, Bing Qin

TL;DR
This paper introduces GLOBESUMM, a comprehensive benchmark dataset for multi-lingual, cross-lingual, and multi-document news summarization, addressing real-world complexities and enabling better evaluation of language models.
Contribution
The paper presents a new unified task, MCMS, constructs the GLOBESUMM dataset from event-centric multilingual news, and proposes protocol-guided prompting for high-quality annotations.
Findings
GLOBESUMM captures real-world summarization challenges.
Experimental results validate dataset quality and task complexity.
The results highlight the need for more advanced models that can handle conflicts and redundancies across sources.
Abstract
News summarization in today's global scene can be daunting with its flood of multilingual content and varied viewpoints from different sources. However, current studies often neglect such real-world scenarios as they tend to focus solely on either single-language or single-document tasks. To bridge this gap, we aim to unify Multi-lingual, Cross-lingual and Multi-document Summarization into a novel task, i.e., MCMS, which encapsulates the real-world requirements all-in-one. Nevertheless, the lack of a benchmark inhibits researchers from adequately studying this invaluable problem. To tackle this, we have meticulously constructed the GLOBESUMM dataset by first collecting a wealth of multilingual news reports and restructuring them into event-centric format. Additionally, we introduce the method of protocol-guided prompting for high-quality and cost-effective reference annotation. In MCMS,…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
