DPN-LE: Dual Personality Neuron Localization and Editing for Large Language Models
Lifan Zheng, Xue Yang, Jiawei Chen, Chenyan Wu, Jingyuan Zhang, Fanheng Kong, Xinyi Zeng, Xiang Chen, Yu Tian

TL;DR
This paper introduces DPN-LE, a method for precise personality editing in large language models by identifying and modifying a small subset of personality-specific neurons, improving control and preserving capabilities.
Contribution
DPN-LE is a neuron localization and editing technique: it uses contrastive activation analysis to isolate mutually exclusive neurons for opposing personality traits, then edits only those neurons through sparse interventions.
Findings
DPN-LE intervenes on about 0.5% of neurons for personality control.
It achieves competitive personality editing with minimal performance degradation.
Experiments show effectiveness on LLaMA-3-8B-Instruct and Qwen2.5-7B-Instruct.
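As a rough illustration of what contrastive localization followed by a sparse intervention could look like, the sketch below scores neurons by the difference in mean activation between two opposing trait conditions, keeps the top 0.5% by magnitude, and splits them into two disjoint sets by sign. This is not the authors' implementation. The mean-difference score, the amplify-and-zero intervention, the `scale` parameter, and the function names `localize_dual_neurons` and `edit_activations` are all assumptions made for illustration, and the activations are synthetic.

```python
import numpy as np

def localize_dual_neurons(acts_pos, acts_neg, fraction=0.005):
    """Hypothetical localization step.

    acts_pos, acts_neg: arrays of shape (n_samples, n_neurons) holding
    activations collected under two opposing trait conditions.
    Returns two disjoint index arrays: neurons more active under the
    first condition, and neurons more active under the second.
    """
    # Contrastive score (assumption): difference of per-neuron means.
    diff = acts_pos.mean(axis=0) - acts_neg.mean(axis=0)
    # Keep only a small fraction of neurons, ranked by |score|.
    k = max(1, int(round(fraction * diff.size)))
    top = np.argsort(np.abs(diff))[::-1][:k]
    pos_idx = top[diff[top] > 0]
    neg_idx = top[diff[top] < 0]
    return pos_idx, neg_idx

def edit_activations(h, pos_idx, neg_idx, scale=2.0):
    """Hypothetical sparse intervention: amplify the target-trait neurons
    and zero the opposing-trait neurons, leaving all others untouched."""
    out = np.array(h, dtype=float, copy=True)
    out[..., pos_idx] *= scale
    out[..., neg_idx] = 0.0
    return out

# Synthetic demo: 1000 neurons, with three planted trait-specific neurons.
rng = np.random.default_rng(0)
n_neurons = 1000
acts_pos = rng.normal(0.0, 0.1, size=(64, n_neurons))
acts_neg = rng.normal(0.0, 0.1, size=(64, n_neurons))
acts_pos[:, [3, 7]] += 5.0   # fire mainly under the first condition
acts_neg[:, 11] += 5.0       # fires mainly under the opposing condition

pos_idx, neg_idx = localize_dual_neurons(acts_pos, acts_neg, fraction=0.005)
edited = edit_activations(np.ones(n_neurons), pos_idx, neg_idx)
```

Splitting by the sign of the score makes the two neuron sets disjoint by construction, which mirrors the mutually exclusive pattern the paper reports for opposing traits. With 1000 neurons and a 0.5% budget, only five units are touched, so the rest of the hidden state passes through unchanged.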
Abstract
With the widespread adoption of large language models (LLMs), understanding their personality representation mechanisms has become critical. As a novel paradigm in Personality Editing, most existing methods employ neuron-editing to locate and modify LLM neurons, requiring changes to numerous neurons and leading to significant performance degradation. This raises a fundamental question: Are all modified neurons directly related to personality representation? In this work, we investigate and quantify this specificity through assessments of general capability impact and representation-level patterns. We find that: 1) Current methods can change personalities but reduce overall performance. 2) Neurons are multifunctional, connecting personality traits and general knowledge. 3) Opposing personality traits demonstrate distinctly mutually exclusive representation patterns. Motivated by these…
