FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios
Yutao Hou, Yihan Jiang, Yuhan Xie, Jian Yang, Liwen Zhang, Hailiang Huang, Guanhua Chen, Yun Chen

TL;DR
FinSafetyBench is a bilingual benchmark designed to evaluate large language models' ability to refuse requests violating financial compliance, based on real-world cases and ethics standards.
Contribution
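The paper introduces a bilingual benchmark for assessing LLM safety in financial scenarios, grounded in real-world financial crime cases and ethics standards and covering 14 subcategories. Experiments with it expose vulnerabilities to adversarial prompts and language-specific susceptibilities.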
Findings
Both general-purpose and finance-specialized LLMs remain vulnerable to adversarial prompts that bypass compliance safeguards.
LLMs are more susceptible to compliance violations when prompted in Chinese-language contexts.
Prompt-level defenses have limited effectiveness against sophisticated manipulations.
Abstract
Large language models (LLMs) are increasingly applied in financial scenarios. However, they may produce harmful outputs, including facilitating illegal activities or unethical behavior, posing serious compliance risks. To systematically evaluate LLM safety in finance, we propose FinSafetyBench, a bilingual (English-Chinese) red-teaming benchmark designed to test an LLM's refusal of requests that violate financial compliance. Grounded in real-world financial crime cases and ethics standards, the benchmark comprises 14 subcategories spanning financial crimes and ethical violations. Through extensive experiments on general-purpose and finance-specialized LLMs under three representative attack settings, we identify critical vulnerabilities that allow adversarial prompts to bypass compliance safeguards. Further analysis reveals stronger susceptibility in Chinese contexts and highlights the…
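
As a rough illustration of how per-language susceptibility could be compared on a bilingual refusal benchmark, the sketch below aggregates refusal rates by prompt language. It is not the authors' evaluation code: the record fields (`lang`, `refused`), the function name `refusal_rates`, and the toy records are all hypothetical, and the numbers are made up for demonstration rather than taken from the paper.

```python
from collections import defaultdict

def refusal_rates(records):
    """Return {language: fraction of prompts the model refused}.

    Each record is a dict with hypothetical keys:
      "lang"    - prompt language code, e.g. "en" or "zh"
      "refused" - True if the model's response was judged a refusal
    """
    totals = defaultdict(int)
    refused = defaultdict(int)
    for rec in records:
        totals[rec["lang"]] += 1
        if rec["refused"]:
            refused[rec["lang"]] += 1
    return {lang: refused[lang] / totals[lang] for lang in totals}

# Toy, made-up judgments (NOT results from the paper).
records = [
    {"lang": "en", "refused": True},
    {"lang": "en", "refused": True},
    {"lang": "en", "refused": False},
    {"lang": "en", "refused": True},
    {"lang": "zh", "refused": True},
    {"lang": "zh", "refused": False},
    {"lang": "zh", "refused": False},
    {"lang": "zh", "refused": True},
]

rates = refusal_rates(records)
for lang, rate in sorted(rates.items()):
    print(f"{lang}: refusal rate = {rate:.2f}")
```

A lower refusal rate for one language on matched harmful prompts would indicate stronger susceptibility in that language. How a response is judged to be a refusal is left outside this sketch.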
