Multilingual Retrieval Augmented Generation for Culturally-Sensitive Tasks: A Benchmark for Cross-lingual Robustness
Bryan Li, Fiona Luo, Samar Haider, Adwait Agashe, Tammy Li, Runqi Liu, Muqing Miao, Shriya Ramakrishnan, Yuan Yuan, Chris Callison-Burch

TL;DR
This paper introduces BordIRLines, a multilingual dataset for evaluating retrieval-augmented generation (RAG) in culturally-sensitive tasks, demonstrating that multilingual retrieval enhances robustness and reduces bias in large language models.
Contribution
The paper presents BordIRLines, a new multilingual dataset for RAG evaluation, and analyzes how cross-lingual retrieval impacts model robustness and bias in sensitive contexts.
Findings
Multilingual retrieval improves response consistency.
Retrieving documents in diverse languages reduces geopolitical bias.
Low-resource languages show greater variance in response citations.
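To make "response consistency" concrete, here is a minimal sketch of one way it could be scored: the fraction of language pairs in which a model gives the same answer to the same dispute query. This is an illustrative assumption, not necessarily the metric the paper uses. The function name `pairwise_consistency`, the claimant labels, and the example answers are all hypothetical and are not results from BordIRLines.

```python
from itertools import combinations


def pairwise_consistency(answers_by_language):
    """Return the fraction of language pairs whose answers agree.

    `answers_by_language` maps a language code to the claimant the model
    chose when the query was posed in that language. A value of 1.0 means
    the model answered identically in every language.
    """
    answers = list(answers_by_language.values())
    pairs = list(combinations(answers, 2))
    if not pairs:
        # Zero or one language: there is nothing to disagree with.
        return 1.0
    agreeing = sum(1 for a, b in pairs if a == b)
    return agreeing / len(pairs)


# Made-up answers for one dispute query under two hypothetical retrieval
# settings (illustrative only, not data from the paper).
in_language_answers = {"en": "Claimant X", "es": "Claimant Y", "zh": "Claimant X"}
multilingual_answers = {"en": "Claimant X", "es": "Claimant X", "zh": "Claimant X"}

print(pairwise_consistency(in_language_answers))
print(pairwise_consistency(multilingual_answers))
```

Under this scoring, a retrieval setting is more consistent when its value is closer to 1.0. Exact string matching is the simplest possible agreement test; a real evaluation would need to normalize answers across languages first.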
Abstract
The paradigm of retrieval-augmented generation (RAG) helps mitigate hallucinations of large language models (LLMs). However, RAG also introduces biases contained within the retrieved documents. These biases can be amplified in scenarios that are multilingual and culturally-sensitive, such as territorial disputes. We thus introduce BordIRLines, a dataset of territorial disputes paired with retrieved Wikipedia documents, across 49 languages. We evaluate the cross-lingual robustness of this RAG setting by formalizing several modes for multilingual retrieval. Our experiments on several LLMs show that incorporating perspectives from diverse languages can in fact improve robustness; retrieving multilingual documents best improves response consistency and decreases geopolitical bias over RAG with purely in-language documents. We also consider how RAG responses utilize presented documents,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Attention Dropout · WordPiece · Linear Warmup With Linear Decay · Linear Layer · Weight Decay · Byte Pair Encoding · BERT · Softmax · Dropout
