Conversational Context Classification: A Representation Engineering Approach

Jonathan Pan

arXiv:2601.12286·cs.CL·January 21, 2026

Open Access

TL;DR

This paper explores a representation engineering approach that uses a One-Class Support Vector Machine (OCSVM) to identify context-specific subspaces within LLMs' internal states, with the aim of detecting out-of-context responses and improving interpretability.

Contribution

The paper introduces a method that combines representation engineering with OCSVM to locate context-relevant subspaces in LLM internal states, and applies it to detecting out-of-context responses.

Findings

01 Effective identification of context-specific subspaces in Llama and Qwen models.

02 Promising results in detecting out-of-context conversational responses.

03 Improved interpretability of LLM internal states.

Abstract

The increasing prevalence of Large Language Models (LLMs) demands effective safeguards for their operation, particularly concerning their tendency to generate out-of-context responses. A key challenge is accurately detecting when LLMs stray from expected conversational norms, manifesting as topic shifts, factual inaccuracies, or outright hallucinations. Traditional anomaly detection methods are difficult to apply directly to contextual semantics. This paper outlines our experiment in exploring the use of Representation Engineering (RepE) and One-Class Support Vector Machine (OCSVM) to identify subspaces within the internal states of LLMs that represent a specific context. By training OCSVM on in-context examples, we establish a robust boundary within the LLM's hidden state latent space. We evaluate our study with two open-source LLMs, Llama and Qwen models, in a specific contextual domain. Our…
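The core idea in the abstract, fitting a one-class boundary on in-context hidden states and flagging vectors that fall outside it, can be illustrated with a minimal sketch. This is not the author's implementation. It assumes scikit-learn's `OneClassSVM`, and it uses synthetic Gaussian vectors in place of real LLM hidden states; the dimension, the RBF kernel, the `nu` value, and the helper `is_in_context` are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
HIDDEN_DIM = 32  # hypothetical; real LLM hidden sizes are far larger

# Synthetic stand-ins for hidden-state vectors (assumption: in practice these
# would be extracted from a chosen layer of the LLM for each response).
train_in_context = rng.normal(0.0, 1.0, size=(1000, HIDDEN_DIM))
test_in_context = rng.normal(0.0, 1.0, size=(200, HIDDEN_DIM))
test_out_of_context = rng.normal(3.0, 1.0, size=(200, HIDDEN_DIM))

# Fit the scaler and the one-class boundary on in-context examples only.
scaler = StandardScaler().fit(train_in_context)
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)  # nu is an assumed setting
ocsvm.fit(scaler.transform(train_in_context))


def is_in_context(hidden_states: np.ndarray) -> np.ndarray:
    """Return a boolean array: True where a vector falls inside the boundary."""
    return ocsvm.predict(scaler.transform(hidden_states)) == 1


in_rate = is_in_context(test_in_context).mean()
out_rate = is_in_context(test_out_of_context).mean()
print(f"accepted as in-context: held-out {in_rate:.2f}, shifted {out_rate:.2f}")
```

In this toy setup the shifted vectors play the role of out-of-context responses. With a real model, the separation would depend on which layer and token positions the hidden states are taken from, which the abstract excerpt here does not specify.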

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications