State-Dependent Refusal and Learned Incapacity in RLHF-Aligned Language Models

TK Lee

arXiv:2512.13762 · cs.AI · December 17, 2025

TL;DR

This paper introduces a qualitative framework for auditing language models' behavior over long interactions, revealing patterns of selective refusal in sensitive domains and proposing learned incapacity as a behavioral concept.

Contribution

It presents a novel interaction-level auditing methodology and introduces learned incapacity as a new behavioral descriptor for analyzing model responses.

Findings

01

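Within a single 86-turn session, the same model performs normally in broad, non-sensitive domains but repeatedly produces functional refusals in provider- or policy-sensitive domains.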

02

Meta-narrative role framing correlates with refusal behavior.

03

The framework enables qualitative analysis of long-horizon model interactions.

Abstract

Large language models (LLMs) are widely deployed as general-purpose tools, yet extended interaction can reveal behavioral patterns not captured by standard quantitative benchmarks. We present a qualitative case-study methodology for auditing policy-linked behavioral selectivity in long-horizon interaction. In a single 86-turn dialogue session, the same model shows Normal Performance (NP) in broad, non-sensitive domains while repeatedly producing Functional Refusal (FR) in provider- or policy-sensitive domains, yielding a consistent asymmetry between NP and FR across domains. Drawing on learned helplessness as an analogy, we introduce learned incapacity (LI) as a behavioral descriptor for this selective withholding without implying intentionality or internal mechanisms. We operationalize three response regimes (NP, FR, Meta-Narrative; MN) and show that MN role-framing narratives tend to…
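As a rough illustration of how the NP/FR asymmetry described above could be tallied once turns have been hand-labelled, the sketch below counts regimes per domain type and converts them to fractions. The function name `regime_rates`, the domain labels `"sensitive"` and `"non_sensitive"`, and the example turns are all hypothetical; they are not the paper's coding scheme or its data.

```python
from collections import Counter
from enum import Enum


class Regime(Enum):
    """The three response regimes named in the abstract."""
    NP = "Normal Performance"
    FR = "Functional Refusal"
    MN = "Meta-Narrative"


def regime_rates(turns):
    """Return {domain_kind: {Regime: fraction of turns}}.

    `turns` is an iterable of (domain_kind, Regime) pairs that a human
    annotator has already labelled; this function only counts them.
    """
    counts = {}
    for kind, regime in turns:
        counts.setdefault(kind, Counter())[regime] += 1
    return {
        kind: {r: c[r] / sum(c.values()) for r in Regime}
        for kind, c in counts.items()
    }


# Hypothetical labels for illustration only, not data from the paper.
example = [
    ("non_sensitive", Regime.NP),
    ("non_sensitive", Regime.NP),
    ("sensitive", Regime.FR),
    ("sensitive", Regime.MN),
    ("sensitive", Regime.FR),
    ("sensitive", Regime.NP),
]
rates = regime_rates(example)
```

Comparing `rates["sensitive"][Regime.FR]` with `rates["non_sensitive"][Regime.FR]` gives one simple view of the asymmetry; the labelling itself remains a qualitative judgment that this sketch does not automate.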

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)