Exploring Safety-Utility Trade-Offs in Personalized Language Models

Anvesh Rao Vijjini; Somnath Basu Roy Chowdhury; Snigdha Chaturvedi

arXiv:2406.11107 · cs.CL · February 12, 2025 · 1 citation

TL;DR

This paper investigates how personalized large language models exhibit biases affecting safety and utility, revealing significant performance variance across user identities and proposing mitigation strategies.

Contribution

The paper quantifies personalization bias in LLMs along two axes, safety and utility, evaluates its impact across multiple models, and introduces mitigation approaches.

Findings

01 LLMs show significant safety-utility trade-off variance based on user identity

02 Personalization bias affects models like Llama, Mistral, GPT-3.5, and GPT-4o

03 Mitigation strategies such as preference tuning can reduce personalization bias

Abstract

As large language models (LLMs) become increasingly integrated into daily applications, it is essential to ensure they operate fairly across diverse user demographics. In this work, we show that LLMs suffer from personalization bias, where their performance is impacted when they are personalized to a user's identity. We quantify personalization bias by evaluating the performance of LLMs along two axes - safety and utility. We measure safety by examining how benign LLM responses are to unsafe prompts with and without personalization. We measure utility by evaluating the LLM's performance on various tasks, including general knowledge, mathematical abilities, programming, and reasoning skills. We find that various LLMs, ranging from open-source models like Llama (Touvron et al., 2023) and Mistral (Jiang et al., 2023) to API-based ones like GPT-3.5 and GPT-4o (Ouyang et al., 2022), exhibit…
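The measurement idea in the abstract (compare scores with and without personalization, on a safety axis and a utility axis) can be illustrated with a minimal sketch. This is not the authors' metric or code: the function name `personalization_bias`, the identity labels, and every number below are hypothetical placeholders chosen only to show the shape of such a comparison.

```python
from statistics import mean, pstdev


def personalization_bias(baseline, per_identity):
    """Summarize how a score shifts when a model is personalized.

    baseline: score without personalization (e.g. fraction of benign
        responses to unsafe prompts, or task accuracy).
    per_identity: dict mapping a user-identity label to the score
        obtained when the model is personalized to that identity.
    """
    # Signed change relative to the unpersonalized model, per identity.
    deltas = {name: score - baseline for name, score in per_identity.items()}
    scores = list(per_identity.values())
    return {
        "mean_shift": mean(deltas.values()),   # average change vs. baseline
        "spread": pstdev(scores),              # variation across identities
        "max_gap": max(scores) - min(scores),  # best vs. worst identity
        "deltas": deltas,
    }


# Placeholder scores, invented for illustration; NOT results from the paper.
safety = personalization_bias(
    0.80, {"identity_a": 0.85, "identity_b": 0.70, "identity_c": 0.82}
)
utility = personalization_bias(
    0.60, {"identity_a": 0.58, "identity_b": 0.63, "identity_c": 0.55}
)

for axis, summary in (("safety", safety), ("utility", utility)):
    print(axis, round(summary["mean_shift"], 3), round(summary["max_gap"], 3))
```

Reporting the two axes separately keeps a trade-off visible: an identity can raise the safety score while lowering the utility score, which a single combined number would hide.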

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Cosine Annealing · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Adam · Attention Dropout · Weight Decay