When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs

Ammar Khairi; Daniel D'souza; Ye Shen; Julia Kreutzer; Sara Hooker

arXiv:2506.20544 · cs.CL · June 26, 2025

TL;DR

This paper demonstrates that adapting sampling and selection strategies for inference in multilingual large language models significantly improves performance across diverse languages and tasks, especially at larger scales.

Contribution

The paper introduces novel multilingual, multi-task inference strategies that outperform existing methods and improve performance without retraining the model.

Findings

01. Sampling and selection strategies must be tailored to different languages and domains.

02. The proposed methods achieve win-rate improvements of up to +9.0 at large scale.

03. The strategies improve performance across open-ended, formally verifiable, and multilingual tasks.

Abstract

Recent advancements in large language models (LLMs) have shifted focus toward scaling inference-time compute, improving performance without retraining the model. A common approach is to sample multiple outputs in parallel, and select one of these as the final output. However, work to date has focused on English and a handful of domains such as math and code. In contrast, we are most interested in techniques that generalize across open-ended tasks, formally verifiable tasks, and across languages. In this work, we study how to robustly scale inference-time compute for open-ended generative tasks in a multilingual, multi-task setting. Our findings show that both sampling strategy based on temperature variation and selection strategy must be adapted to account for diverse domains and varied language settings. We evaluate existing selection methods, revealing that strategies effective in…
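The sample-then-select pattern the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's method: `best_of_n`, `toy_generate`, and `toy_score` are hypothetical names, the generator and scorer are deterministic stand-ins for a real model call and a real selection method, and the temperature values are arbitrary.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[float, str, float]  # (temperature, text, score)


def best_of_n(
    prompt: str,
    generate: Callable[[str, float], str],
    score: Callable[[str, str], float],
    temperatures: Sequence[float],
) -> Tuple[str, List[Candidate]]:
    """Draw one sample per temperature and return the highest-scoring text."""
    if not temperatures:
        raise ValueError("need at least one temperature")
    candidates: List[Candidate] = []
    for t in temperatures:
        text = generate(prompt, t)  # in practice, these calls can run in parallel
        candidates.append((t, text, score(prompt, text)))
    # Selection step: keep the single candidate with the highest score.
    best = max(candidates, key=lambda c: c[2])
    return best[1], candidates


def toy_generate(prompt: str, temperature: float) -> str:
    # Hypothetical stand-in for a model call: higher temperature, longer output.
    return prompt + " answer" * (1 + int(temperature * 4))


def toy_score(prompt: str, text: str) -> float:
    # Hypothetical stand-in for a selection method: prefer outputs of 5 words.
    return float(-abs(len(text.split()) - 5))


if __name__ == "__main__":
    best, candidates = best_of_n(
        "Translate: hello", toy_generate, toy_score, [0.0, 0.5, 1.0]
    )
    for t, text, s in candidates:
        print(f"T={t}: score={s} text={text!r}")
    print("selected:", best)
```

Passing the temperatures as a list keeps the sampling strategy separate from the selection strategy, so either can be swapped per language or task without touching the other.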

Code & Models

Datasets

CohereLabs/m-ArenaHard-v2.0 (dataset, 427 downloads)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Focus