Crisis-Bench: Benchmarking Strategic Ambiguity and Reputation Management in Large Language Models

Cooper Lin; Maohao Ran; Yanting Zhang; Zhenglin Wan; Hongwei Fan; Yibo Xu; Yike Guo; Wei Xue; Jun Song

arXiv:2601.05570 · cs.AI · January 12, 2026

TL;DR

Crisis-Bench is a new benchmark that evaluates large language models' ability to manage reputation and strategic ambiguity in high-stakes corporate crises, highlighting the gap between general safety and professional utility.

Contribution

The paper introduces Crisis-Bench, a multi-agent Partially Observable Markov Decision Process (POMDP) framework with a novel economic incentive metric for assessing how LLMs manage reputation in complex crisis scenarios.

Findings

01. Some models exhibit Machiavellian strategic withholding.

02. Models vary in their ability to balance ethics and strategic ambiguity.

03. The benchmark reveals a gap between general safety and professional utility.

Abstract

Standard safety alignment optimizes Large Language Models (LLMs) for universal helpfulness and honesty, effectively instilling a rigid "Boy Scout" morality. While robust for general-purpose assistants, this one-size-fits-all ethical framework imposes a "transparency tax" on professional domains requiring strategic ambiguity and information withholding, such as public relations, negotiation, and crisis management. To measure this gap between general safety and professional utility, we introduce Crisis-Bench, a multi-agent Partially Observable Markov Decision Process (POMDP) that evaluates LLMs in high-stakes corporate crises. Spanning 80 diverse storylines across 8 industries, Crisis-Bench tasks an LLM-based Public Relations (PR) Agent with navigating a dynamic 7-day corporate crisis simulation while managing strictly separated Private and Public narrative states to enforce rigorous…
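The separation of Private and Public narrative states over a 7-day simulation can be illustrated with a small sketch. This is not the authors' implementation: the class `CrisisState`, the functions `toy_pr_policy` and `run_episode`, the disclosure schedule, and the statement wording are all hypothetical, and a fixed rule stands in for the LLM-based PR Agent. The only details taken from the abstract are the 7-day horizon and the idea that outside observers see the public state only.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

DAYS = 7  # the abstract describes a 7-day crisis simulation


@dataclass
class CrisisState:
    """Hypothetical container keeping private and public narratives apart."""
    private_facts: List[str]                           # visible to the PR agent only
    disclosed: List[str] = field(default_factory=list)  # facts released so far
    public_statements: List[str] = field(default_factory=list)

    def public_view(self) -> Tuple[str, ...]:
        """Observation for outside agents: public statements only."""
        return tuple(self.public_statements)


def toy_pr_policy(state: CrisisState, day: int) -> Optional[str]:
    """Toy stand-in for an LLM agent: release one fact every third day."""
    undisclosed = [f for f in state.private_facts if f not in state.disclosed]
    if day % 3 == 0 and undisclosed:
        return undisclosed[0]
    return None  # withhold and issue a holding statement instead


def run_episode(
    state: CrisisState,
    policy: Callable[[CrisisState, int], Optional[str]],
) -> CrisisState:
    """Step through the simulated days, updating only the public narrative."""
    for day in range(1, DAYS + 1):
        fact = policy(state, day)
        if fact is not None:
            state.disclosed.append(fact)
            state.public_statements.append(f"Day {day}: we confirm {fact}.")
        else:
            state.public_statements.append(
                f"Day {day}: the investigation is ongoing."
            )
    return state


if __name__ == "__main__":
    final = run_episode(
        CrisisState(private_facts=["a recall", "a lawsuit", "an audit"]),
        toy_pr_policy,
    )
    for line in final.public_view():
        print(line)
```

Keeping `public_view` as the only accessor handed to outside observers is one simple way to model partial observability: anything the policy withholds never appears in what other agents can see.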

Taxonomy

Topics: Public Relations and Crisis Communication · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)