PersoBench: Benchmarking Personalized Response Generation in Large Language Models

Saleh Afzoon; Zahra Jamali; Usman Naseem; Amin Beheshti

arXiv:2410.03198 · cs.CL · February 5, 2026

TL;DR

This paper introduces PersoBench, an automated benchmarking pipeline to evaluate the personalization ability of large language models in persona-aware dialogue generation, revealing current models' strengths and weaknesses.

Contribution

The paper presents a novel automated benchmarking framework, PersoBench, for assessing LLMs' personalization in dialogue, addressing a gap in existing evaluation methods.

Findings

01. LLMs generate fluent and diverse responses.

02. Current LLMs struggle with personalization and coherence.

03. Evaluation across multiple models and datasets highlights these limitations.
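The first finding refers to response diversity. This summary does not say which diversity metric PersoBench uses. Distinct-n is one common choice: it measures the fraction of unique n-grams among all n-grams in a set of responses. The following is a minimal sketch that assumes whitespace tokenization and lowercasing. The function name `distinct_n` is our own and is not taken from the paper.

```python
def distinct_n(responses, n):
    """Distinct-n: unique n-grams divided by total n-grams across all responses.

    Assumption: lowercase whitespace tokenization; the paper's tokenizer may differ.
    """
    total = 0
    unique = set()
    for text in responses:
        tokens = text.lower().split()
        # Collect every contiguous n-gram in this response
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    # Guard against empty input, where no n-grams exist
    return len(unique) / total if total else 0.0


responses = ["I love hiking", "I love cooking"]
print(distinct_n(responses, 1))  # 4 unique of 6 unigrams
print(distinct_n(responses, 2))  # 3 unique of 4 bigrams
```

Higher values indicate less repetition across responses. Because this score says nothing about whether a response fits the persona, the summary reports diversity separately from personalization.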

Abstract

While large language models (LLMs) have exhibited impressive conversational capabilities, their proficiency in delivering personalized responses remains unclear. Although recent benchmarks automatically evaluate persona consistency in role-playing contexts using LLM-based judgment, the evaluation of personalization in response generation remains underexplored. To address this gap, we present an automated benchmarking pipeline, PersoBench, to evaluate the personalization ability of LLMs in persona-aware dialogue generation within a zero-shot setting. Our framework employs a structured pipeline comprising speaker-aware annotation, task-specific and context-driven prompt construction, response post-processing, and automated evaluation across multiple dimensions of generation quality. In particular, the pipeline performs text preprocessing and speaker labeling, constructs structured prompts…
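The abstract names the pipeline stages (text preprocessing and speaker labeling, prompt construction, response post-processing), but the excerpt is truncated before it describes them in detail. The following is a minimal sketch of how such stages could fit together for a persona-based dialogue sample. The function names (`label_speakers`, `build_prompt`, `postprocess`), the "You"/"Partner" labels and the prompt wording are all hypothetical. They are not PersoBench's actual templates.

```python
def label_speakers(turns):
    """Normalize whitespace and tag each turn with a speaker label.

    Assumption: turns alternate and the final turn comes from the partner,
    so the model replies next as "You".
    """
    labeled = []
    for i, turn in enumerate(turns):
        from_end = len(turns) - 1 - i
        speaker = "Partner" if from_end % 2 == 0 else "You"
        labeled.append(f"{speaker}: {' '.join(turn.split())}")
    return labeled


def build_prompt(persona, turns):
    """Assemble a zero-shot prompt from persona statements and labeled history."""
    persona_block = "\n".join(f"- {p.strip()}" for p in persona)
    history_block = "\n".join(label_speakers(turns))
    return (
        "You are chatting with a partner. Stay consistent with your persona.\n"
        f"Your persona:\n{persona_block}\n\n"
        f"Dialogue so far:\n{history_block}\n\n"
        "Reply with a single response as 'You'.\nYou:"
    )


def postprocess(raw):
    """Clean a raw model output into a single response."""
    text = raw.strip()
    # Drop an echoed speaker label if the model repeats it
    if text.lower().startswith("you:"):
        text = text[4:].strip()
    # Keep only the first line, discarding any extra turns the model invents
    lines = text.splitlines()
    return lines[0].strip() if lines else ""


turns = ["hi  there!", "hello, how are you?", "great, I just got back from a hike"]
print(build_prompt(["I like hiking.", "I have two dogs."], turns))
print(postprocess("  You: I love hiking too!\nPartner: nice"))
```

The cleaned response would then go to the automated evaluation stage, which scores it along several quality dimensions.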

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis