From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs

Navya Jain; Zekun Wu; Cristian Munoz; Airlie Hilliard; Xin Guan; Adriano Koshiyama; Emre Kazim; Philip Treleaven

arXiv:2409.10245·cs.CL·February 26, 2025



TL;DR

This paper shows that PEFT, specifically QLoRA, can manipulate the personality traits of LLMs and that the manipulated models begin generating emojis that express those traits. This reveals latent behaviors and improves control over model personality.

Contribution

Introduces an Opinion QA dataset for PEFT-driven personality manipulation and develops benchmarks and explainability methods to analyze emoji-based personality expression in LLMs.

Findings

01

LLMs generate emojis for personality traits after PEFT manipulation

02

PEFT outperforms IKE in personality trait manipulation

03

Specific neurons are linked to emoji-based trait expressions
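Finding 01 concerns how often emojis appear in generated text. One simple way to quantify this, assuming you already have model outputs grouped by target trait, is to compute the fraction of responses that contain at least one emoji. The sketch below is a hypothetical illustration rather than the paper's detection method. The code-point ranges are a rough heuristic, and the function names `is_emoji`, `emoji_rate`, and `emoji_rate_by_trait` are invented for this example.

```python
# Hypothetical heuristic: treat a character as an emoji if it falls in
# common emoji-bearing Unicode blocks. This is approximate; it also matches
# some non-emoji symbols and misses multi-code-point sequences' structure.
EMOJI_RANGES = [
    (0x1F300, 0x1FAFF),  # pictographs, emoticons, transport, supplemental symbols
    (0x2600, 0x27BF),    # miscellaneous symbols and dingbats (e.g. U+2705)
]


def is_emoji(ch: str) -> bool:
    """Return True if the single character ch lies in one of EMOJI_RANGES."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def emoji_rate(responses: list[str]) -> float:
    """Fraction of responses that contain at least one emoji character."""
    if not responses:
        return 0.0
    hits = sum(1 for text in responses if any(is_emoji(c) for c in text))
    return hits / len(responses)


def emoji_rate_by_trait(outputs: dict[str, list[str]]) -> dict[str, float]:
    """Map each trait name to the emoji rate of its generated responses."""
    return {trait: emoji_rate(resps) for trait, resps in outputs.items()}


if __name__ == "__main__":
    # Toy strings for illustration only; these are not outputs from the paper.
    toy = {
        "extraversion": ["Love parties! 🎉", "Sure thing 😄"],
        "conscientiousness": ["I plan ahead.", "Lists help ✅"],
    }
    print(emoji_rate_by_trait(toy))
```

On the toy input this prints a rate of 1.0 for extraversion and 0.5 for conscientiousness. A production version would more likely rely on the Unicode emoji property data than on hand-picked ranges.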

Abstract

The manipulation of the personality traits of large language models (LLMs) has emerged as a key area of research. Methods such as prompt-based In-Context Knowledge Editing (IKE) and gradient-based Model Editor Networks (MEND) have been explored, but they behave irregularly: IKE depends on the prompt, which makes it variable and sensitive, while MEND yields inconsistent and gibberish outputs. To address this, we employed Opinion QA-based Parameter-Efficient Fine-Tuning (PEFT), specifically Quantized Low-Rank Adaptation (QLoRA), to manipulate the Big Five personality traits: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. After PEFT, models such as Mistral-7B-Instruct and LLaMA-2-7B-chat showed a latent behaviour by generating emojis for certain traits, despite no emojis being present in the PEFT data. For instance, LLaMA-2-7B-chat generated emojis in…
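The abstract names QLoRA as the PEFT method. As a rough illustration of the low-rank part only, the pure-Python sketch below adds a trainable update B·A, scaled by alpha/rank, to a frozen weight matrix W. B is initialized to zero, so the adapted layer starts out identical to the base layer. The sketch deliberately omits the 4-bit quantization of the base weights that puts the "Q" in QLoRA. The class name `LoRALinear` and the rank/alpha values are hypothetical and are not taken from the paper.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen; QLoRA would store this in 4-bit
        self.scale = alpha / rank     # standard LoRA scaling factor
        # A (rank x d_in): small random init; B (d_out x rank): zeros,
        # so B @ A == 0 and training starts from the unmodified base layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        """y = W x + scale * B (A x); only A and B would receive gradients."""
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        """Number of adapter parameters (A and B), excluding the frozen W."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def merged_weight(self):
        """Fold the adapter into W: W + scale * B @ A."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(wr, dr)]
                for wr, dr in zip(self.weight, delta)]


if __name__ == "__main__":
    d = 64
    W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    layer = LoRALinear(W, rank=4, alpha=8)
    print(layer.trainable_params(), "adapter params vs", d * d, "in W")
```

With d = 64 and rank 4, the adapter has 4·64 + 64·4 = 512 trainable parameters, compared with 4,096 in W. This ratio is the reason PEFT is cheap. Because the update is additive, it can be merged back into W after training.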


Taxonomy

Topics: Digital Communication and Language · Linguistics, Language Diversity, and Identity · Second Language Acquisition and Learning

Methods: Model Editor Networks with Gradient Decomposition