Identifying and Manipulating Personality Traits in LLMs Through Activation Engineering

Rumi Allbert; James K. Wiles; Vlad Grankovsky

arXiv:2412.10427·cs.CL·August 26, 2025

TL;DR

This paper introduces an activation engineering method for identifying and manipulating personality traits in large language models, with the aim of improving interpretability and enabling dynamic personality adjustment.

Contribution

The paper presents a technique for modifying LLM personality traits by adjusting activation directions, intended to advance interpretability and to examine the ethical implications of such control.

Findings

01 Successful identification of activation directions linked to personality traits

02 Demonstrated ability to fine-tune LLM personalities dynamically

03 Insights into ethical implications of personality manipulation in LLMs

Abstract

The field of large language models (LLMs) has grown rapidly in recent years, driven by the desire for better efficiency, interpretability, and safe use. Building on the novel approach of "activation engineering," this study explores personality modification in LLMs, drawing inspiration from research such as "Refusal in LLMs Is Mediated by a Single Direction" (arXiv:2406.11717) and "Steering Llama 2 via Contrastive Activation Addition" (arXiv:2312.06681). We leverage activation engineering to develop a method for identifying and adjusting activation directions related to personality traits, which may allow for dynamic LLM personality fine-tuning. This work aims to further our understanding of LLM interpretability while examining the ethical implications of such developments.
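
As a rough illustration of what "identifying and adjusting activation directions" can look like, the sketch below uses a difference-of-means direction in the spirit of the two cited papers. It is not taken from the paper or its repository: the function names (`trait_direction`, `steer`, `ablate`), the scaling factor `alpha`, and the toy three-dimensional "activations" are all assumptions made for illustration. In practice the vectors would be hidden states captured at a chosen layer of the model.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def trait_direction(trait_acts, neutral_acts):
    """Unit-norm difference between mean trait and mean neutral activations.

    Hypothetical helper: assumes activations were already collected from
    prompts that do and do not elicit the trait.
    """
    trait_mean = mean_vector(trait_acts)
    neutral_mean = mean_vector(neutral_acts)
    diff = [t - n for t, n in zip(trait_mean, neutral_mean)]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0.0:
        raise ValueError("trait and neutral means are identical")
    return [d / norm for d in diff]


def steer(activation, direction, alpha):
    """Add alpha times the direction to push an activation toward the trait."""
    return [a + alpha * d for a, d in zip(activation, direction)]


def ablate(activation, direction):
    """Remove the component of an activation along a unit-norm direction."""
    proj = sum(a * d for a, d in zip(activation, direction))
    return [a - proj * d for a, d in zip(activation, direction)]


# Toy data (assumed): the trait only shifts the first coordinate.
trait_acts = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
neutral_acts = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

direction = trait_direction(trait_acts, neutral_acts)
steered = steer([0.5, 0.5, 0.5], direction, alpha=2.0)
ablated = ablate([0.5, 0.5, 0.5], direction)
```

A positive `alpha` strengthens the trait and a negative one weakens it, while `ablate` projects the direction out entirely. Which layer to intervene on and how large `alpha` should be are empirical choices that this sketch does not address.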

Code & Models

Repositories

RumiAllbert/llm-abliterator (jax, Official)

Taxonomy

Topics: Online Learning and Analytics

Methods: LLaMA