Multilingual Retrieval Augmented Generation for Culturally-Sensitive Tasks: A Benchmark for Cross-lingual Robustness

Bryan Li; Fiona Luo; Samar Haider; Adwait Agashe; Tammy Li; Runqi Liu; Muqing Miao; Shriya Ramakrishnan; Yuan Yuan; Chris Callison-Burch

arXiv:2410.01171·cs.CL·June 24, 2025

TL;DR

This paper introduces BordIRLines, a multilingual dataset for evaluating retrieval-augmented generation (RAG) in culturally-sensitive tasks, demonstrating that multilingual retrieval enhances robustness and reduces bias in large language models.

Contribution

The paper presents BordIRLines, a new multilingual dataset for RAG evaluation, and analyzes how cross-lingual retrieval impacts model robustness and bias in sensitive contexts.

Findings

01 Multilingual retrieval improves response consistency.

02 Retrieving documents in diverse languages reduces geopolitical bias.

03 Low-resource languages show greater variance in response citations.

Abstract

The paradigm of retrieval-augmented generation (RAG) helps mitigate hallucinations of large language models (LLMs). However, RAG also introduces the biases contained within the retrieved documents. These biases can be amplified in scenarios that are multilingual and culturally sensitive, such as territorial disputes. We thus introduce BordIRLines, a dataset of territorial disputes paired with retrieved Wikipedia documents, across 49 languages. We evaluate the cross-lingual robustness of this RAG setting by formalizing several modes for multilingual retrieval. Our experiments on several LLMs show that incorporating perspectives from diverse languages can in fact improve robustness: compared with RAG over purely in-language documents, retrieving multilingual documents best improves response consistency and decreases geopolitical bias. We also consider how RAG responses utilize presented documents,…
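To make the notion of response consistency concrete, here is a minimal sketch in Python. It assumes that, for one territorial dispute, each query language yields a single answer naming a claimant, and it scores consistency as the fraction of language pairs that agree. The function name `consistency_score`, the pairwise-agreement definition, and the sample answers are illustrative assumptions, not the metric or data from the paper.

```python
from itertools import combinations


def consistency_score(responses_by_language):
    """Fraction of language pairs whose answers match.

    Hypothetical helper: `responses_by_language` maps a language code to
    the claimant chosen by the model when queried in that language.
    """
    answers = list(responses_by_language.values())
    pairs = list(combinations(answers, 2))
    if not pairs:
        # Zero or one response: nothing to disagree with.
        return 1.0
    agreeing = sum(1 for a, b in pairs if a == b)
    return agreeing / len(pairs)


# Made-up answers for one dispute between claimants "A" and "B".
in_language = {"en": "A", "hi": "B", "zh": "A"}   # in-language retrieval only
multilingual = {"en": "A", "hi": "A", "zh": "A"}  # multilingual retrieval

print(consistency_score(in_language))
print(consistency_score(multilingual))
```

Under this assumed definition, a higher score means the model gives the same answer regardless of the query language, which is the kind of cross-lingual robustness the abstract describes.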

Code & Models

Repositories

manestay/bordirlines
Official

Datasets

borderlines/bordirlines
dataset · 15 dl

Videos

Multilingual Retrieval Augmented Generation for Culturally-Sensitive Tasks: A Benchmark for Cross-lingual Robustness · Underline

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Attention Dropout · WordPiece · Linear Warmup With Linear Decay · Linear Layer · Weight Decay · Byte Pair Encoding · BERT · Softmax · Dropout