CBMAS: Cognitive Behavioral Modeling via Activation Steering
Ahmed H. Ismail, Anthony Kuang, Ayo Akinkugbe, Kevin Zhu, Sean O'Brien

TL;DR
CBMAS introduces a diagnostic framework that uses continuous activation steering to analyze and interpret cognitive behaviors in large language models, revealing tipping points and behavioral trajectories.
Contribution
The paper presents a continuous diagnostic approach that combines steering vector construction with dense α-sweeps, logit lens-based bias curves, and layer-site sensitivity analysis to understand LLM cognitive behaviors.
Findings
Reveals tipping points where small interventions flip model behavior
Shows how steering effects evolve across model layers
Provides a CLI and datasets for cognitive behavior analysis
Abstract
Large language models (LLMs) often encode cognitive behaviors unpredictably across prompts, layers, and contexts, making them difficult to diagnose and control. We present CBMAS, a diagnostic framework for continuous activation steering, which extends cognitive bias analysis from discrete before/after interventions to interpretable trajectories. By combining steering vector construction with dense α-sweeps, logit lens-based bias curves, and layer-site sensitivity analysis, our approach can reveal tipping points where small intervention strengths flip model behavior and show how steering effects evolve across layer depth. We argue that these continuous diagnostics offer a bridge between high-level behavioral evaluation and low-level representational dynamics, contributing to the cognitive interpretability of LLMs. Lastly, we provide a CLI and datasets for various cognitive…
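The pipeline named in the abstract (steering vector, dense α-sweep, bias curve, tipping point) can be illustrated with a toy NumPy sketch. This is not the CBMAS implementation or its CLI: every function name, array shape, and value below is hypothetical, the steering vector is assumed to be a difference of mean activations between two contrastive prompt sets, and the "logit lens" is reduced to a single unembedding matrix with no final layer norm.

```python
import numpy as np

def steering_vector(pos_acts, neg_acts):
    """Assumed construction: difference of mean activations between two prompt sets."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def bias_curve(hidden, vector, unembed, tok_a, tok_b, alphas):
    """Logit difference (tok_a minus tok_b) after adding alpha * vector to hidden."""
    steered = hidden[None, :] + alphas[:, None] * vector[None, :]  # (n_alphas, d_model)
    logits = steered @ unembed                                     # (n_alphas, vocab)
    return logits[:, tok_a] - logits[:, tok_b]

def tipping_points(alphas, curve):
    """Alphas where the curve changes sign, located by linear interpolation.

    Grid points where the curve is exactly zero are not counted as crossings.
    """
    signs = np.sign(curve)
    idx = np.where(signs[:-1] * signs[1:] < 0)[0]
    a0, a1 = alphas[idx], alphas[idx + 1]
    c0, c1 = curve[idx], curve[idx + 1]
    return a0 - c0 * (a1 - a0) / (c1 - c0)

if __name__ == "__main__":
    # Toy 2-dimensional "residual stream" and 2-token vocabulary (hypothetical values).
    pos = np.array([[0.0, 1.0], [0.0, 3.0]])   # activations for one behavior
    neg = np.array([[1.0, 0.0], [3.0, 0.0]])   # activations for the contrasting behavior
    v = steering_vector(pos, neg)
    h = np.array([1.5, 0.0])                   # unsteered hidden state
    W_U = np.eye(2)                            # stand-in unembedding matrix
    alphas = np.linspace(-2.0, 2.0, 41)        # dense alpha sweep
    curve = bias_curve(h, v, W_U, tok_a=0, tok_b=1, alphas=alphas)
    print(tipping_points(alphas, curve))
```

In this linear toy the bias curve is a straight line in α, so there is at most one tipping point. In a real model the steered activation passes through the remaining layers, so the curve need not be linear, which is the reason for sweeping α densely rather than comparing a single before/after pair.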
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare
