Do Reasoning LLMs Refuse What They Infer in Long Contexts?

Yu Fu; Haz Sameen Shahgir; Huanli Gong; Zhipeng Wei; N. Benjamin Erichson; Yue Dong

arXiv:2602.08874 · cs.CL · May 15, 2026

TL;DR

This paper investigates how long-context language models infer harmful objectives that are never stated outright but are spread across fragments of the context. It reveals a safety gap: models often fail to refuse a harmful request when the request has to be inferred through multi-step reasoning.

Contribution

It introduces compositional reasoning attacks, a long-context threat model for evaluating LLM safety, and highlights the models' limitations in refusing harmful requests that are inferred rather than stated explicitly.

Findings

1. Models refuse direct harmful requests effectively.
2. Refusal rates drop when harmful objectives are reconstructed compositionally.
3. Longer contexts increase the likelihood of harmful inferences and failures to refuse.
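
The findings above are phrased in terms of refusal rates. As a rough illustration of how such a rate might be tallied per evaluation condition, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the keyword-based `is_refusal` check, the condition labels `"direct"` and `"compositional"`, and the sample responses are hypothetical placeholders, not the paper's judge, taxonomy, or data.

```python
from collections import defaultdict

# Hypothetical refusal markers; the paper's actual refusal judge is not
# described in this listing, so a simple phrase match stands in for it.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry")


def is_refusal(response: str) -> bool:
    """Return True if the response contains any assumed refusal phrase."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rates(records):
    """Map each condition to the fraction of its responses that are refusals.

    `records` is an iterable of (condition, response) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [refusals, total]
    for condition, response in records:
        counts[condition][0] += is_refusal(response)
        counts[condition][1] += 1
    return {cond: refused / total for cond, (refused, total) in counts.items()}


# Illustrative placeholder responses only (not results from the paper).
records = [
    ("direct", "I'm sorry, but I can't help with that."),
    ("direct", "I cannot assist with this request."),
    ("compositional", "I can't help with that."),
    ("compositional", "Sure, here is a summary of the documents."),
]
rates = refusal_rates(records)
```

Comparing `rates["direct"]` with `rates["compositional"]` gives the kind of gap the second finding describes. A real evaluation would likely need a stronger refusal judge than phrase matching, since a model can decline without using any of these phrases.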

Abstract

Long-context LLMs can infer objectives that are not stated explicitly. This capability is useful for reasoning over documents, code, retrieved evidence, and tool traces, but it also creates a safety risk: harmful intent can be distributed across a context and become visible only after the model composes the relevant pieces. Existing safety evaluations mostly test explicit harmful requests, and therefore miss this failure mode. We introduce compositional reasoning attacks, a long-context threat model in which harmful requests are decomposed into semantically incomplete fragments and embedded in long contexts. The final query is neutral; the harmful objective emerges only if the model retrieves the fragments, composes them, and infers the implied goal. We instantiate this setting using AdvBench requests, varying the required reasoning from Direct Retrieval to Single-hop Aggregation, Chain…
