Concept Tokens: Learning Behavioral Embeddings Through Concept Definitions

Ignacio Sastre; Aiala Rosá

arXiv:2601.04465 · cs.CL · January 9, 2026

Open Access

TL;DR

Concept Tokens is a lightweight method that learns the embedding of a single new special token from natural language definitions of a concept and uses that token to steer a frozen large language model, improving interpretability and, when the token is negated, reducing hallucinations.

Contribution

It introduces an approach that compresses a concept's natural language definitions into a single learned token embedding, enabling behavioral control while the model's weights stay frozen.

Findings

01 Negating the hallucination token reduces hallucinated answers, mainly by increasing abstentions.

02 Asserting the token increases hallucinations and lowers precision.

03 Concept tokens preserve instruction compliance better than providing the full definitions in-context.

Abstract

We propose Concept Tokens, a lightweight method that adds a new special token to a pretrained LLM and learns only its embedding from multiple natural language definitions of a target concept, where occurrences of the concept are replaced by the new token. The LLM is kept frozen and the embedding is optimized with the standard language-modeling objective. We evaluate Concept Tokens in three settings. First, we study hallucinations in closed-book question answering on HotpotQA and find a directional effect: negating the hallucination token reduces hallucinated answers mainly by increasing abstentions, whereas asserting it increases hallucinations and lowers precision. Second, we induce recasting, a pedagogical feedback strategy for second language teaching, and observe the same directional effect. Moreover, compared to providing the full definitional corpus in-context, concept tokens…
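
The training recipe in the abstract (add one new token, keep the model frozen, optimize only that token's embedding with the language-modeling loss) can be illustrated with a dependency-free toy. This is a minimal sketch under loud assumptions, not the authors' implementation: the "LLM" here is a hypothetical one-layer next-token predictor, and the names `train_concept_embedding`, `out_weights`, `frozen_emb`, and `definitions` are invented for illustration. The token ids stand in for definition sentences in which the concept word has already been replaced by the new token.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_concept_embedding(out_weights, frozen_emb, sequences, concept_id,
                            steps=200, lr=0.2, seed=0):
    """Learn one new embedding row; every other parameter is only read."""
    dim = len(out_weights[0])
    rng = random.Random(seed)
    concept = [rng.uniform(-0.1, 0.1) for _ in range(dim)]  # the only trainable vector
    losses = []
    for _ in range(steps):
        grad = [0.0] * dim
        total_loss, count = 0.0, 0
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                if nxt == concept_id:
                    continue  # toy simplification: no output row for the new token
                emb = concept if prev == concept_id else frozen_emb[prev]
                logits = [sum(w * e for w, e in zip(row, emb)) for row in out_weights]
                probs = softmax(logits)
                total_loss -= math.log(probs[nxt])  # next-token cross-entropy
                count += 1
                if prev == concept_id:
                    # d(loss)/d(concept) = W^T (probs - onehot(target))
                    for j, row in enumerate(out_weights):
                        coeff = probs[j] - (1.0 if j == nxt else 0.0)
                        for d in range(dim):
                            grad[d] += coeff * row[d]
        losses.append(total_loss / count)
        for d in range(dim):
            concept[d] -= lr * grad[d] / count  # update the concept embedding only
    return concept, losses


# Hypothetical frozen "model": random input embeddings and output weights.
rng = random.Random(0)
VOCAB, DIM = 6, 4
frozen_emb = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]
out_weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]
CONCEPT_ID = VOCAB  # the new special token takes the next free id

# Toy "definitions": token-id sequences containing the concept token.
definitions = [[0, CONCEPT_ID, 2, 3], [1, CONCEPT_ID, 2, 4], [CONCEPT_ID, 2, 5]]

concept, losses = train_concept_embedding(out_weights, frozen_emb, definitions, CONCEPT_ID)
print(f"mean loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In this toy, gradients reach the concept vector only at positions where the new token is the context, because the model looks at one previous token. In a real transformer the new embedding would influence every later position through attention, and the frozen weights would typically be excluded from the optimizer rather than simply left unmodified by hand.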

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning