Uncovering Factor Level Preferences to Improve Human-Model Alignment

Juhyun Oh; Eunsu Kim; Jiseon Kim; Wenda Xu; Inha Cha; William Yang Wang; Alice Oh

arXiv:2410.06965 · cs.CL · November 18, 2025

Open Access

TL;DR

This paper introduces PROFILE, an automated framework for uncovering and measuring factor-level preference alignment between humans and large language models. It shows where model preferences diverge from human preferences and how that gap can guide alignment improvements.

Contribution

The paper presents PROFILE, a novel automated method for analyzing factor-level preferences in LLMs, enabling targeted improvements in human-model alignment.

Findings

01 LLMs show poor factor-level alignment with human preferences when generating text.

02 LLMs show strong factor-level alignment with human preferences in discrimination tasks.

03 Leveraging the generation-discrimination gap can improve LLM alignment.

Abstract

Large language models (LLMs) often exhibit tendencies that diverge from human preferences, such as favoring certain writing styles or producing overly verbose outputs. While crucial for improvement, identifying the factors driving these misalignments remains challenging due to existing evaluation methods' reliance on coarse-grained comparisons and lack of explainability. To address this, we introduce PROFILE, an automated framework to uncover and measure factor-level preference alignment of humans and LLMs. Using PROFILE, we analyze preference alignment across three key tasks: summarization, instruction-following, and document-based QA. We find a significant discrepancy: while LLMs show poor factor-level alignment with human preferences when generating texts, they demonstrate strong alignment in discrimination tasks. We demonstrate how leveraging the identified generation-discrimination…

Taxonomy

Topics: Human-Automation Interaction and Safety