Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze, Sarthak Munshi, Bryan Sukidi, Jennifer Yen, Zejia Yang, David Williams-King, Linh Le, Kosi Asuzu, and Carsten Maple

TL;DR
This paper surveys representation engineering for large-language models, discussing its methods, challenges, and future research directions to improve model predictability, safety, and personalization.
Contribution
It formalizes the goals and methods of representation engineering, comparing it with other approaches and outlining research challenges and future directions.
Findings
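Representation engineering uses samples of contrasting inputs to detect and edit high-level representations of concepts such as honesty, harmfulness, or power-seeking.
Its risks include decreased model performance, increased compute time, and steerability issues.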
The paper proposes a research agenda for safe, predictable, and personalized LLMs.
Abstract
Large-language models are capable of completing a variety of tasks, but remain unpredictable and intractable. Representation engineering seeks to resolve this problem through a new approach utilizing samples of contrasting inputs to detect and edit high-level representations of concepts such as honesty, harmfulness or power-seeking. We formalize the goals and methods of representation engineering to present a cohesive picture of work in this emerging discipline. We compare it with alternative approaches, such as mechanistic interpretability, prompt-engineering and fine-tuning. We outline risks such as performance decrease, compute time increases and steerability issues. We present a clear agenda for future research to build predictable, dynamic, safe and personalizable LLMs.
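To make the idea of contrasting inputs concrete, here is a minimal sketch of one commonly used technique, a difference-of-means direction. It is an illustration under stated assumptions, not the survey's own method: the function names are hypothetical, and the three-dimensional vectors are toy stand-ins for hidden states that would normally be read from a model layer.

```python
from statistics import fmean

def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]

def concept_direction(positive, negative):
    """Difference of mean activations between two contrasting input sets."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - n for p, n in zip(pos_mean, neg_mean)]

def project(activation, direction):
    """Detection: dot product of an activation with the concept direction."""
    return sum(a * d for a, d in zip(activation, direction))

def steer(activation, direction, alpha):
    """Editing: shift an activation along the concept direction by alpha."""
    return [a + alpha * d for a, d in zip(activation, direction)]

# Toy activations (assumed data, not taken from any model).
honest = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
dishonest = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

direction = concept_direction(honest, dishonest)
score_honest = project(honest[0], direction)
score_dishonest = project(dishonest[0], direction)
steered = steer(dishonest[0], direction, alpha=0.5)
```

In this toy setup the projection score is higher for the honest example than for the dishonest one, and steering moves the dishonest activation toward the honest set. In practice the choice of layer, the scaling factor `alpha`, and the way the direction is extracted all vary between methods.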
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies
