Personality Editing for Language Models through Adjusting Self-Referential Queries
Seojin Hwang, Yumin Kim, Byeongjeong Kim, Donghoon Shin, Hwanhee Lee

TL;DR
This paper introduces PALETTE, a novel method for editing the personality of large language models using self-referential queries, requiring minimal data and providing stable, balanced personality control.
Contribution
PALETTE edits the personality of an LLM through adjustment queries grounded in psychological constructs. It requires only 12 editing samples, whereas fine-tuning approaches depend on large-scale training data.
Findings
Achieves significant personality alignment with minimal data
Outperforms prompt-based approaches in stability and balance
Validated by both automatic and human evaluations
Abstract
Large Language Models (LLMs) are integral to applications such as conversational agents and content creation, where precise control over a model's personality is essential for maintaining tone, consistency, and user engagement. However, prevailing prompt-based or fine-tuning approaches either lack robustness or demand large-scale training data, making them costly and impractical. In this paper, we present PALETTE (Personality Adjustment by LLM SElf-TargeTed quEries), a novel method for personality editing in LLMs. Our approach introduces adjustment queries, where self-referential statements grounded in psychological constructs are treated analogously to factual knowledge, enabling direct editing of personality-related responses. Unlike fine-tuning, PALETTE requires only 12 editing samples to achieve substantial improvements in personality alignment across personality dimensions.…
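The abstract describes adjustment queries as self-referential statements that are edited in the same way as factual knowledge. A minimal sketch of how such an edit set might be assembled is given below. The prompt template, the field names, and the example statements are all hypothetical illustrations, not the authors' data or code; only the budget of 12 editing samples comes from the abstract.

```python
from dataclasses import dataclass

# Number of editing samples reported in the abstract.
EDIT_BUDGET = 12

# Hypothetical template: poses a statement to the model about itself.
PROMPT_TEMPLATE = "Statement about yourself: {statement} Your answer:"


@dataclass(frozen=True)
class AdjustmentQuery:
    """One edit request (hypothetical structure): a self-referential
    prompt paired with the answer the edited model should give."""
    prompt: str
    target: str


def build_edit_set(items, budget=EDIT_BUDGET):
    """Turn (statement, target_answer) pairs into edit requests.

    Raises ValueError if more pairs are supplied than the budget allows.
    """
    if len(items) > budget:
        raise ValueError(f"expected at most {budget} items, got {len(items)}")
    return [
        AdjustmentQuery(prompt=PROMPT_TEMPLATE.format(statement=s), target=t)
        for s, t in items
    ]


# Illustrative items only; a real set would come from a psychological instrument.
example_items = [
    ("I prefer making decisions based on logic rather than feelings.", "Agree"),
    ("I feel energized after spending time with large groups.", "Disagree"),
]
edit_set = build_edit_set(example_items)
```

Each resulting record could then be handed to a knowledge-editing method of the reader's choice; which editor PALETTE uses is not stated in the portion of the abstract shown here.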
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Topic Modeling
