Concept Attractors in LLMs and their Applications
Sotirios Panagiotis Chytas, Vikas Singh

TL;DR
This paper shows that LLMs have concept-specific attractors in their internal representations, and that simple, training-free methods operating on these attractors match or outperform specialized baselines on tasks such as translation, hallucination reduction, guardrailing, and synthetic data generation.
Contribution
It introduces a framework based on Iterated Function Systems (IFS) that explains why LLM layers map semantically related prompts to similar internal representations, and it develops training-free, attractor-based methods for practical applications.
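The IFS view rests on contraction. If a layer behaves like a contractive map, repeated application pulls any starting representation toward the same fixed point, regardless of where it began. The toy sketch below illustrates only that mechanism, under loud assumptions: each hypothetical concept gets its own fixed 2×2 affine contraction x ↦ Ax + b (real transformer layers are nonlinear and differ from layer to layer), and the names `CONCEPT_MAPS`, `iterate` and `fixed_point` are illustrative, not taken from the paper.

```python
import math

Vector = list[float]
Matrix = list[list[float]]

# Hypothetical toy setup, not the paper's model: one shared matrix A, and each
# concept has its own offset b, so each concept has its own fixed point
# (its toy "attractor"). The max absolute row sum of A is 0.6 < 1, so
# x -> A x + b is a contraction in the infinity norm.
A: Matrix = [[0.5, 0.1],
             [0.0, 0.4]]
CONCEPT_MAPS: dict[str, tuple[Matrix, Vector]] = {
    "cat": (A, [1.0, 2.0]),
    "dog": (A, [-3.0, 0.5]),
}


def apply_map(a: Matrix, b: Vector, x: Vector) -> Vector:
    """One toy 'layer': the affine map x -> a x + b in two dimensions."""
    return [a[0][0] * x[0] + a[0][1] * x[1] + b[0],
            a[1][0] * x[0] + a[1][1] * x[1] + b[1]]


def iterate(concept: str, x0: Vector, steps: int = 60) -> Vector:
    """Apply the concept's map `steps` times, starting from x0."""
    a, b = CONCEPT_MAPS[concept]
    x = x0
    for _ in range(steps):
        x = apply_map(a, b, x)
    return x


def fixed_point(concept: str) -> Vector:
    """Closed-form fixed point: solve (I - a) x = b for the 2x2 case."""
    a, b = CONCEPT_MAPS[concept]
    m00, m01 = 1.0 - a[0][0], -a[0][1]
    m10, m11 = -a[1][0], 1.0 - a[1][1]
    det = m00 * m11 - m01 * m10
    return [(m11 * b[0] - m01 * b[1]) / det,
            (-m10 * b[0] + m00 * b[1]) / det]


if __name__ == "__main__":
    # Two very different starting points ("surface forms") for the same concept.
    p = iterate("cat", [100.0, -50.0])
    q = iterate("cat", [-7.0, 9.0])
    print("cat runs:", p, q)
    print("cat fixed point:", fixed_point("cat"))
    print("dog fixed point:", fixed_point("dog"))
```

Both "cat" runs end at the same point, and that point is far from the "dog" fixed point. This is only a toy analogue of semantically related prompts converging to a shared, concept-specific representation.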
Findings
Attractors explain semantic clustering in LLMs
Attractor-based methods match or outperform specialized baselines
Approach is training-free and generalizes to settings where baselines underperform
Abstract
Large language models (LLMs) often map semantically related prompts to similar internal representations at specific layers, even when their surface forms differ widely. We show that this behavior can be explained through Iterated Function Systems (IFS), where layers act as contractive mappings toward concept-specific Attractors. We leverage this insight and develop simple, training-free methods that operate directly on these Attractors to solve a wide range of practical tasks, including language translation, hallucination reduction, guardrailing, and synthetic data generation. Despite their simplicity, these Attractor-based interventions match or exceed specialized baselines, offering an efficient alternative to heavy fine-tuning that also generalizes to scenarios where baselines underperform.
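The abstract does not spell out how the Attractor-based interventions are implemented. As one hedged illustration of "operating directly on Attractors", the sketch below assumes (our assumption, not the paper's stated method) that an attractor at a chosen layer can be approximated by the centroid of hidden states from prompts that share a concept; that a guardrail-style check can assign a new state to its nearest attractor; and that a translation-style edit can shift a state by the difference between a target and a source attractor. The vectors are made-up 3-dimensional stand-ins for real hidden states, and `centroid`, `nearest_attractor` and `steer` are hypothetical helpers.

```python
import math

Vector = list[float]


def centroid(states: list[Vector]) -> Vector:
    """Approximate an attractor as the mean of hidden states (hypothetical estimator)."""
    n = len(states)
    return [sum(column) / n for column in zip(*states)]


def nearest_attractor(h: Vector, attractors: dict[str, Vector]) -> str:
    """Return the name of the attractor closest to h in Euclidean distance."""
    return min(attractors, key=lambda name: math.dist(h, attractors[name]))


def steer(h: Vector, source: Vector, target: Vector, alpha: float = 1.0) -> Vector:
    """Shift h by alpha * (target - source); alpha = 1 moves it by the full attractor offset."""
    return [x + alpha * (t - s) for x, s, t in zip(h, source, target)]


# Made-up hidden states for prompts in two languages (stand-ins for real activations).
ENGLISH_STATES = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0], [1.1, -0.1, 0.2]]
FRENCH_STATES = [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [-0.1, 1.1, 0.2]]
ATTRACTORS = {"en": centroid(ENGLISH_STATES), "fr": centroid(FRENCH_STATES)}

if __name__ == "__main__":
    h = [0.95, 0.05, 0.1]  # a new state lying near the English attractor
    print("before:", nearest_attractor(h, ATTRACTORS))
    h_fr = steer(h, ATTRACTORS["en"], ATTRACTORS["fr"])
    print("after:", nearest_attractor(h_fr, ATTRACTORS))
```

In this toy, the steered state moves from the "en" basin to the "fr" basin. A smaller `alpha` gives a partial nudge instead of a full shift. Either way the edit stays training-free, because it needs only hidden states collected from forward passes.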
