Conversational Context Classification: A Representation Engineering Approach
Jonathan Pan

TL;DR
This paper explores a novel representation engineering approach that uses a One-Class Support Vector Machine (OCSVM) to identify context-specific subspaces within LLMs' internal states, helping to detect out-of-context responses and to improve interpretability.
Contribution
The paper introduces a method that combines representation engineering with OCSVM to locate context-relevant subspaces in LLMs, improving the detection of responses that fall outside a given context.
Findings
Effective identification of context-specific subspaces in Llama and Qwen models.
Promising results in detecting out-of-context conversational responses.
Improved interpretability of LLM internal states.
Abstract
The increasing prevalence of Large Language Models (LLMs) demands effective safeguards for their operation, particularly against their tendency to generate out-of-context responses. A key challenge is accurately detecting when LLMs stray from expected conversational norms, whether through topic shifts, factual inaccuracies, or outright hallucinations. Traditional anomaly detection methods are difficult to apply directly to contextual semantics. This paper outlines our experiment exploring the use of Representation Engineering (RepE) and a One-Class Support Vector Machine (OCSVM) to identify subspaces within the internal states of LLMs that represent a specific context. By training the OCSVM on in-context examples, we establish a robust boundary within the LLM's hidden-state latent space. We evaluate our study with two open-source LLMs, Llama and Qwen, in a specific contextual domain. Our…
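As a rough illustration of the boundary-learning step described in the abstract, the following fits scikit-learn's `OneClassSVM` (RBF kernel) to hidden-state vectors from in-context examples and flags vectors that fall outside the learned boundary. It is a minimal sketch under stated assumptions, not the author's implementation. The vectors are synthetic Gaussian stand-ins. The helper names `fit_context_boundary` and `is_in_context`, the standardization step, and the `nu`/`gamma` values are all hypothetical choices. In practice the vectors might come from a Hugging Face `transformers` model called with `output_hidden_states=True` (for example, one layer's final-token hidden state), but the paper's exact layer and pooling choices are not given here.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def fit_context_boundary(in_context_states: np.ndarray, nu: float = 0.1):
    """Fit a one-class boundary around hidden-state vectors from in-context examples.

    nu upper-bounds the fraction of training vectors left outside the boundary.
    (Hypothetical helper for illustration.)
    """
    detector = make_pipeline(
        StandardScaler(),  # put every hidden dimension on a comparable scale
        OneClassSVM(kernel="rbf", nu=nu, gamma="scale"),
    )
    detector.fit(in_context_states)
    return detector


def is_in_context(detector, states: np.ndarray) -> np.ndarray:
    """Return a boolean mask: True where a vector falls inside the learned boundary."""
    # OneClassSVM.predict returns +1 for inliers and -1 for outliers.
    return detector.predict(states) == 1


# Synthetic stand-ins for hidden states (hypothetical data, not from the paper):
# in-context vectors cluster around the origin, off-topic ones around a shifted mean.
rng = np.random.default_rng(0)
dim = 16
train_in = rng.normal(0.0, 1.0, size=(1000, dim))
test_in = rng.normal(0.0, 1.0, size=(200, dim))
test_out = rng.normal(3.0, 1.0, size=(200, dim))

detector = fit_context_boundary(train_in)
in_rate = is_in_context(detector, test_in).mean()
out_rate = is_in_context(detector, test_out).mean()
print(f"accepted in-context: {in_rate:.2f}, accepted out-of-context: {out_rate:.2f}")
```

The `nu` parameter trades false alarms against coverage. It upper-bounds the fraction of in-context training vectors left outside the boundary, so a smaller `nu` accepts more genuine in-context responses at the risk of a looser boundary that may also admit some off-context ones.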
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications
