From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs
Navya Jain, Zekun Wu, Cristian Munoz, Airlie Hilliard, Xin Guan, Adriano Koshiyama, Emre Kazim, Philip Treleaven

TL;DR
This paper demonstrates how PEFT, specifically QLoRA, can manipulate LLMs to generate emojis representing personality traits, revealing latent behaviors and enhancing personality control over models.
Contribution
Introduces an Opinion QA dataset for PEFT-driven personality manipulation and develops benchmarks and explainability methods to analyze emoji-based personality expression in LLMs.
Findings
LLMs generate emojis for personality traits after PEFT manipulation
PEFT outperforms IKE in personality trait manipulation
Specific neurons are linked to emoji-based trait expressions
Abstract
The manipulation of the personality traits of large language models (LLMs) has emerged as a key area of research. Methods like prompt-based In-Context Knowledge Editing (IKE) and gradient-based Model Editor Networks (MEND) have been explored but show irregularity and variability; IKE depends on the prompt, leading to variability and sensitivity, while MEND yields inconsistent and gibberish outputs. To address this, we employed Opinion QA Based Parameter-Efficient Fine-Tuning (PEFT), specifically Quantized Low-Rank Adaptation (QLoRA), to manipulate the Big Five personality traits: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. After PEFT, models such as Mistral-7B-Instruct and LLaMA-2-7B-chat showed a latent behaviour by generating emojis for certain traits, despite no emojis being present in the PEFT data. For instance, LLaMA-2-7B-chat generated emojis in…
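The abstract names QLoRA, a low-rank adaptation method, as the PEFT technique. As a rough illustration of why low-rank adaptation is parameter-efficient, the sketch below counts the trainable parameters of a rank-r adapter and applies the standard update W + (alpha / r) · B · A to a toy matrix. The 4096 × 4096 layer size, the rank of 8, and the scaling factor alpha are illustrative assumptions, not values reported by the paper, and the 4-bit quantization step that distinguishes QLoRA from plain LoRA is omitted.

```python
from typing import List

Matrix = List[List[float]]


def full_params(d_out: int, d_in: int) -> int:
    """Parameters updated when fine-tuning a dense d_out x d_in weight matrix."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product, adequate for toy-sized matrices."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def lora_effective_weight(w: Matrix, b: Matrix, a: Matrix, alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * B @ A; W itself stays frozen during training."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Illustrative sizes only (assumed, not taken from the paper).
d_out, d_in, rank = 4096, 4096, 8
trainable = lora_trainable_params(d_out, d_in, rank)
total = full_params(d_out, d_in)
print(f"trainable: {trainable}, full: {total}, fraction: {trainable / total:.6f}")

# Toy rank-1 update on a 2 x 2 identity matrix.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]
a = [[3.0, 4.0]]
print(lora_effective_weight(w, b, a, alpha=2.0, rank=1))
```

Under these assumed sizes the adapter trains well under 1% of the parameters of the dense layer, which is the property that makes fine-tuning 7B-parameter models such as those named in the abstract practical on modest hardware.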
Taxonomy
Topics: Digital Communication and Language · Linguistics, Language Diversity, and Identity · Second Language Acquisition and Learning
Methods: Model Editor Networks with Gradient Decomposition
