Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection

Zhiwei Liu; Yupen Cao; Yuechen Jiang; Mohsinul Kabir; Polydoros Giannouris; Chen Xu; Ziyang Xu; Tianlei Zhu; Md. Tariquzzaman; Triantafillos Papadopoulos; Yan Wang; Lingfei Qian; Xueqing Peng; Zhuohan Xie; Ye Yuan; Saeed Almheiri; Abdulrazzaq Alnajjar; Mingbin Chen; Harry Stuart; Paul Thompson; Prayag Tiwari; Alejandro Lopez-Lira; Xue Liu; Jimin Huang; Sophia Ananiadou

arXiv:2601.05403 · cs.CL · April 21, 2026

TL;DR

This paper introduces MFMDScen, a benchmark for evaluating behavioral biases of multilingual LLMs in complex financial misinformation detection scenarios, revealing persistent biases across models.

Contribution

The paper presents a comprehensive multilingual benchmark with diverse economic scenarios for systematically assessing the biases LLMs exhibit when detecting financial misinformation.

Findings

01

Behavioral biases are prevalent across both commercial and open-source LLMs.

02

The benchmark covers complex, multilingual financial scenarios involving ethnicity, region, and personality.

03

Persistent biases highlight the need for bias mitigation in financial LLM applications.

Abstract

Large language models (LLMs) have been widely applied across many domains of finance. Because their training data are largely derived from human-authored corpora, LLMs may inherit a range of human biases. Behavioral biases can lead to instability and uncertainty in decision-making, particularly when models process financial information. However, existing research on LLM bias has mainly focused on direct questioning or simplified, general-purpose settings, with limited consideration of complex real-world financial environments and of high-risk, context-sensitive multilingual financial misinformation detection (MFMD) tasks. In this work, we propose MFMDScen, a comprehensive benchmark for evaluating the behavioral biases of LLMs in MFMD across diverse economic scenarios. In collaboration with financial experts, we construct three types of complex financial scenarios: (i) role- and…
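The title's premise, that a model may return different verdicts on the same claim depending on the surrounding scenario, can be made concrete with a simple consistency check. The sketch below is illustrative only and is not the paper's evaluation protocol. It assumes each model prediction is stored as a hypothetical `(claim_id, scenario_type, label)` tuple, with `"none"` marking the scenario-free baseline prompt, and reports for each scenario type the fraction of claims whose label differs from that baseline. The function name `flip_rates`, the scenario names, and the label strings are all hypothetical.

```python
from collections import defaultdict


def flip_rates(predictions):
    """Per scenario type, return the fraction of claims whose label differs from baseline.

    Hypothetical input format: an iterable of (claim_id, scenario_type, label) tuples,
    where scenario_type "none" marks the baseline prompt with no added scenario.
    """
    baseline = {}
    by_scenario = defaultdict(list)
    for claim_id, scenario, label in predictions:
        if scenario == "none":
            baseline[claim_id] = label
        else:
            by_scenario[scenario].append((claim_id, label))

    rates = {}
    for scenario, items in by_scenario.items():
        # Only compare claims that also have a baseline judgment.
        paired = [(cid, lab) for cid, lab in items if cid in baseline]
        if not paired:
            continue
        flips = sum(1 for cid, lab in paired if lab != baseline[cid])
        rates[scenario] = flips / len(paired)
    return rates


if __name__ == "__main__":
    # Toy predictions for two claims under a baseline and two hypothetical scenarios.
    preds = [
        ("c1", "none", "false"), ("c1", "role", "true"), ("c1", "region", "false"),
        ("c2", "none", "true"), ("c2", "role", "false"), ("c2", "region", "true"),
    ]
    print(flip_rates(preds))  # {'role': 1.0, 'region': 0.0}
```

A flip rate near zero would indicate that the scenario framing leaves the judgment unchanged; the bias measures actually used in MFMDScen may be defined differently.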

Code & Models

Repositories

lzw108/FMD (GitHub)