FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios

Yutao Hou; Yihan Jiang; Yuhan Xie; Jian Yang; Liwen Zhang; Hailiang Huang; Guanhua Chen; Yun Chen

arXiv:2605.00706 · cs.CL · May 4, 2026

TL;DR

FinSafetyBench is a bilingual benchmark designed to evaluate large language models' ability to refuse requests violating financial compliance, based on real-world cases and ethics standards.

Contribution

The paper introduces a benchmark for assessing LLM safety in financial scenarios, grounded in real-world financial crime cases and ethics standards, and uses it to expose model vulnerabilities and language-specific susceptibilities.

Findings

01 Adversarial prompts can bypass LLMs' compliance safeguards.

02 LLMs are more susceptible to compliance violations in Chinese contexts.

03 Prompt-level defenses have limited effectiveness against sophisticated manipulations.

Abstract

Large language models (LLMs) are increasingly applied in financial scenarios. However, they may produce harmful outputs, including facilitating illegal activities or unethical behavior, posing serious compliance risks. To systematically evaluate LLM safety in finance, we propose FinSafetyBench, a bilingual (English-Chinese) red-teaming benchmark designed to test an LLM's refusal of requests that violate financial compliance. Grounded in real-world financial crime cases and ethics standards, the benchmark comprises 14 subcategories spanning financial crimes and ethical violations. Through extensive experiments on general-purpose and finance-specialized LLMs under three representative attack settings, we identify critical vulnerabilities that allow adversarial prompts to bypass compliance safeguards. Further analysis reveals stronger susceptibility in Chinese contexts and highlights the…
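
To make the evaluation idea concrete, here is a minimal sketch of how refusal outcomes might be aggregated per language and attack setting. It is not the authors' evaluation code: the `Trial` record, the attack labels `"direct"` and `"jailbreak"`, and the metric definition (fraction of harmful requests that were not refused) are all assumptions for illustration, and the toy records are made up rather than taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One harmful request sent to a model (hypothetical record layout)."""
    language: str   # "en" or "zh"
    attack: str     # hypothetical attack-setting label
    refused: bool   # True if the model declined the request


def attack_success_rate(trials):
    """Return {(language, attack): fraction of trials that were NOT refused}."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for t in trials:
        key = (t.language, t.attack)
        totals[key] += 1
        if not t.refused:
            successes[key] += 1
    return {key: successes[key] / totals[key] for key in totals}


# Made-up toy records, only to exercise the function; not results from the paper.
trials = [
    Trial("en", "direct", True),
    Trial("en", "direct", True),
    Trial("en", "jailbreak", False),
    Trial("en", "jailbreak", True),
    Trial("zh", "direct", True),
    Trial("zh", "direct", False),
    Trial("zh", "jailbreak", False),
    Trial("zh", "jailbreak", False),
]

rates = attack_success_rate(trials)
for (language, attack), rate in sorted(rates.items()):
    print(f"{language:>2} {attack:<10} {rate:.2f}")
```

Keying the result by `(language, attack)` keeps the English and Chinese figures side by side for each setting, which is the kind of comparison the abstract describes.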
