GlobalBench: A Benchmark for Global Progress in Natural Language Processing
Yueqi Song, Catherine Cui, Simran Khanuja, Pengfei Liu, Fahim Faisal,, Alissa Ostapenko, Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya,, Yulia Tsvetkov, Antonios Anastasopoulos, Graham Neubig

TL;DR
GlobalBench is a dynamic, comprehensive benchmark that tracks progress and promotes equitable development of NLP technology across all languages, emphasizing utility and under-served languages.
Contribution
It introduces a continuously expanding benchmark that measures not only accuracy but also utility and equity, incentivizing research on under-represented languages.
Findings
Covers 966 datasets in 190 languages.
Includes 1,128 system submissions across 62 languages.
Identifies under-served languages with high populations.
Abstract
Despite the major advances in NLP, significant disparities in NLP system performance across languages still exist. Arguably, these are due to uneven resource allocation and sub-optimal incentives to work on less resourced languages. To track and further incentivize the global development of equitable language technology, we introduce GlobalBench. Prior multilingual benchmarks are static and have focused on a limited number of tasks and languages. In contrast, GlobalBench is an ever-expanding collection that aims to dynamically track progress on all NLP datasets in all languages. Rather than solely measuring accuracy, GlobalBench also tracks the estimated per-speaker utility and equity of technology across all languages, providing a multi-faceted view of how language technology is serving people of the world. Furthermore, GlobalBench is designed to identify the most under-served…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
