Contamination Report for Multilingual Benchmarks

Sanchit Ahuja; Varun Gumma; Sunayana Sitaram

arXiv:2410.16186·cs.CL·October 22, 2024

TL;DR

This paper investigates contamination of popular multilingual benchmarks in large language models and finds that almost all tested models show signs of contamination on almost all tested benchmarks, which undermines the reliability of multilingual evaluation results.

Contribution

The paper presents a systematic analysis of benchmark contamination in multilingual LLMs, applying the Black Box test to 7 frequently used multilingual benchmarks across 7 popular open and closed models and documenting how prevalent the leakage is.

Findings

01. Almost all tested models show signs of benchmark contamination.

02. Most benchmarks are contaminated across multiple models.

03. Contamination impacts the validity of multilingual evaluation results.

Abstract

Benchmark contamination refers to the presence of test datasets in Large Language Model (LLM) pre-training or post-training data. Contamination can lead to inflated scores on benchmarks, compromising evaluation results and making it difficult to determine the capabilities of models. In this work, we study the contamination of popular multilingual benchmarks in LLMs that support multiple languages. We use the Black Box test to determine whether 7 frequently used multilingual benchmarks are contaminated in 7 popular open and closed LLMs and find that almost all models show signs of being contaminated with almost all the benchmarks we test. Our findings can help the community determine the best set of benchmarks to use for multilingual evaluation.
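The abstract names the Black Box test but does not describe it. The sketch below is an illustration only. It assumes the test is an ordering-based permutation test: a model that has seen a benchmark file during training should assign a higher likelihood to the examples in their published order than to the same examples shuffled. The names `permutation_p_value`, `make_memorizing_scorer`, and `order_blind_scorer` are hypothetical, and the two scorers are toy stand-ins for a real model's log-likelihood, not the authors' setup.

```python
import random
from typing import Callable, Sequence


def permutation_p_value(
    examples: Sequence[str],
    score_fn: Callable[[Sequence[str]], float],
    num_shuffles: int = 200,
    seed: int = 0,
) -> float:
    """Estimate how unusual the canonical ordering's score is.

    score_fn is an assumed stand-in for "log-likelihood of the examples
    concatenated in this order under the model being audited".
    A small p-value suggests the model prefers the published order,
    which is the signal this kind of test reads as contamination.
    """
    rng = random.Random(seed)
    canonical_score = score_fn(examples)
    at_least_as_high = 0
    for _ in range(num_shuffles):
        shuffled = list(examples)
        rng.shuffle(shuffled)
        if score_fn(shuffled) >= canonical_score:
            at_least_as_high += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (1 + at_least_as_high) / (1 + num_shuffles)


def make_memorizing_scorer(canonical: Sequence[str]) -> Callable[[Sequence[str]], float]:
    """Toy 'contaminated' scorer: rewards adjacent pairs seen in the canonical order."""
    seen_pairs = set(zip(canonical, canonical[1:]))

    def score(ordering: Sequence[str]) -> float:
        return float(sum((a, b) in seen_pairs for a, b in zip(ordering, ordering[1:])))

    return score


def order_blind_scorer(ordering: Sequence[str]) -> float:
    """Toy 'clean' scorer: depends only on the examples, not on their order."""
    return float(-sum(len(text) for text in ordering))


benchmark = [f"example {i}" for i in range(20)]
p_contaminated = permutation_p_value(benchmark, make_memorizing_scorer(benchmark))
p_clean = permutation_p_value(benchmark, order_blind_scorer)
print(f"memorizing scorer p-value: {p_contaminated:.4f}")
print(f"order-blind scorer p-value: {p_clean:.4f}")
```

In a real audit, `score_fn` would be replaced by a call that sums token log-probabilities from the model under test. For closed models this requires API access to log-probabilities, which is an assumption about the provider rather than something stated here.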
