XProvence: Zero-Cost Multilingual Context Pruning for Retrieval-Augmented Generation

Youssef Mohamed; Mohamed Elhoseiny; Thibault Formal; Nadezhda Chirkova

arXiv:2601.18886 · cs.IR · January 28, 2026

TL;DR

XProvence is a multilingual zero-cost context pruning model for retrieval-augmented generation (RAG). It is trained on 16 languages, supports 100+ languages through cross-lingual transfer, and prunes retrieved contexts with minimal-to-no performance degradation.

Contribution

XProvence extends the Provence framework, which integrates context pruning directly into the re-ranking model, beyond English, enabling zero-cost context pruning in multilingual RAG systems with minimal performance impact.

Findings

01. XProvence prunes RAG contexts with minimal-to-no performance degradation.

02. It outperforms strong baselines on four multilingual question answering benchmarks.

03. It supports over 100 languages through cross-lingual transfer.

Abstract

This paper introduces XProvence, a multilingual zero-cost context pruning model for retrieval-augmented generation (RAG), trained on 16 languages and supporting 100+ languages through effective cross-lingual transfer. Motivated by the growing use of RAG systems across diverse languages, we explore several strategies to generalize the Provence framework, which first integrated efficient zero-cost context pruning directly into the re-ranking model, beyond English. Across four multilingual question answering benchmarks, we show that XProvence prunes RAG contexts with minimal-to-no performance degradation and outperforms strong baselines. Our model is available at https://huggingface.co/naver/xprovence-reranker-bgem3-v2.
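
To make the idea of pruning inside the re-ranking step concrete, here is a minimal Python sketch. It is not the XProvence model or its API: the word-overlap scorer `toy_sentence_score`, the naive sentence splitter, the `threshold` and `top_k` parameters, and every function name are hypothetical stand-ins chosen for illustration. The one property it tries to mirror is that the same scoring pass yields both a passage-level re-ranking score and a decision about which sentences to keep, so pruning adds no separate model call.

```python
import re
from dataclasses import dataclass


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def toy_sentence_score(question: str, sentence: str) -> float:
    # Hypothetical stand-in for a learned scorer:
    # fraction of distinct question words that also appear in the sentence.
    q_words = set(re.findall(r"\w+", question.lower()))
    s_words = set(re.findall(r"\w+", sentence.lower()))
    return len(q_words & s_words) / len(q_words) if q_words else 0.0


@dataclass
class ScoredPassage:
    pruned_text: str     # passage with low-scoring sentences removed
    rerank_score: float  # passage-level relevance from the same scoring pass


def prune_and_rerank(question: str, passages: list[str],
                     threshold: float = 0.3, top_k: int = 2) -> list[ScoredPassage]:
    results = []
    for passage in passages:
        sentences = split_sentences(passage)
        # One scoring pass per passage; its output is reused for both tasks.
        scores = [toy_sentence_score(question, s) for s in sentences]
        kept = [s for s, sc in zip(sentences, scores) if sc >= threshold]
        results.append(ScoredPassage(" ".join(kept), max(scores, default=0.0)))
    # Drop passages that were pruned to nothing, then keep the best top_k.
    results = [r for r in results if r.pruned_text]
    results.sort(key=lambda r: r.rerank_score, reverse=True)
    return results[:top_k]


if __name__ == "__main__":
    question = "Where is the Eiffel Tower located?"
    passages = [
        "The Eiffel Tower is located in Paris. It was completed in 1889. "
        "Tickets can be bought online.",
        "Bananas are rich in potassium. They grow in tropical climates.",
    ]
    for item in prune_and_rerank(question, passages):
        print(round(item.rerank_score, 3), "|", item.pruned_text)
```

Word overlap cannot match a question and a passage written in different languages, so this toy scorer says nothing about cross-lingual transfer; a trained multilingual model would take its place in practice.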

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Multimodal Machine Learning Applications