ActivationReasoning: Logical Reasoning in Latent Activation Spaces
Lukas Helff, Ruben Härle, Wolfgang Stammer, Felix Friedrich, Manuel Brack, Antonia Wüst, Hikaru Shindo, Patrick Schramowski, Kristian Kersting

TL;DR
ActivationReasoning introduces a framework embedding explicit logical reasoning into LLM latent spaces, enhancing interpretability, control, and reasoning capabilities across diverse tasks.
Contribution
The paper presents a method for incorporating logical reasoning into LLMs' latent representations, enabling systematic reasoning and model control.
Findings
ActivationReasoning (AR) scales with reasoning complexity and generalizes well.
It improves transparency and enables structured reasoning.
AR transfers effectively across different model backbones.
Abstract
Large language models (LLMs) excel at generating fluent text, but their internal reasoning remains opaque and difficult to control. Sparse autoencoders (SAEs) make hidden activations more interpretable by exposing latent features that often align with human concepts. Yet these features are fragile and passive, offering no mechanism for systematic reasoning or model control. To address this, we introduce ActivationReasoning (AR), a framework that embeds explicit logical reasoning into the latent space of LLMs. It proceeds in three stages: (1) Finding latent representations: latent concept representations are first identified (e.g., via SAEs) and organized into a dictionary; (2) Activating propositions: at inference time, AR detects activating concepts and maps them to logical propositions; and (3) Logical reasoning: applying logical rules over these propositions to infer higher-order…
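The three stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the concept names, feature indices, the fixed activation threshold, and the use of simple forward chaining over conjunctive rules are all assumptions made here for illustration, and the latent vector stands in for SAE feature activations that a real system would read from the model.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

# Stage 1 (assumed form): a concept dictionary mapping a human-readable
# concept name to the index of a latent feature (e.g., an SAE feature).
# Names and indices are hypothetical.
CONCEPT_DICT: Dict[str, int] = {"bridge": 0, "san_francisco": 1, "red": 2}

# Stage 3 input (assumed form): rules as (premises, conclusion) pairs,
# read as "if all premises hold, the conclusion holds".
Rule = Tuple[FrozenSet[str], str]
RULES: List[Rule] = [
    (frozenset({"bridge", "san_francisco"}), "golden_gate_bridge"),
    (frozenset({"golden_gate_bridge", "red"}), "landmark_photo"),
]


def activate_propositions(
    latents: Sequence[float],
    concept_dict: Dict[str, int],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> Set[str]:
    """Stage 2: treat a concept as a true proposition when its latent
    feature activation exceeds the threshold."""
    return {name for name, idx in concept_dict.items() if latents[idx] > threshold}


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Stage 3: apply rules repeatedly until no new proposition is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


# Toy latent activations standing in for SAE features at inference time.
latents = [0.9, 0.8, 0.1]
facts = activate_propositions(latents, CONCEPT_DICT)
inferred = forward_chain(facts, RULES)
print(sorted(inferred))
```

In this sketch the higher-order proposition `golden_gate_bridge` is derived from two detected concepts even though no single latent feature encodes it, which is the kind of compositional inference the abstract attributes to the reasoning stage. How the derived propositions are used to steer the model is not covered here.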
