Same Words, Different Judgments: How Preferences Vary Across Modalities

Aaron Broukhim; Nadir Weibel; Eshin Jolly

arXiv:2602.22710·cs.SD·May 8, 2026

TL;DR

This study investigates how human and synthetic preferences differ across text and speech modalities in AI evaluation, revealing significant modality-specific differences and the need for tailored protocols.

Contribution

The paper provides the first controlled cross-modal comparison of preference annotations, highlighting differences in how preferences are reported and in how well raters agree between text and audio evaluations.

Findings

01

Audio raters exhibit narrower decision thresholds and reduced length bias.

02

Synthetic ratings can predict inter-rater agreement effectively.

03

Modality-specific evaluation protocols are necessary for audio data.

Abstract

Preference-based reinforcement learning (PbRL) is the dominant framework for aligning AI systems to human preferences. However, evaluation protocols for such data were designed for text and have not been validated for speech. We present the first ICC-based, controlled cross-modal study of human and synthetic preference annotations, comparing text and audio evaluations of identical semantic content across 100 prompts. We show that achieving "good" agreement within either modality (ICC(2,k) ≈ .80) requires ~9 raters. At the same time, modalities show marked differences in how people report preferences: audio raters exhibit narrower decision thresholds, reduced length bias, and more user-oriented evaluation criteria, with near-chance cross-modality agreement. We demonstrate that synthetic ratings can be used to effectively predict inter-rater agreement, thus serving…
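
To make the ICC(2,k) figure in the abstract concrete, here is a minimal sketch of the standard two-way random-effects ICC (the usual Shrout and Fleiss definition) and of the Spearman-Brown relation that links single-rater and k-rater reliability. It is an illustration only: the abstract does not say how the authors computed their estimates, and the function names `icc2` and `raters_needed` and the toy ratings table are hypothetical, not taken from the paper.

```python
import math
from statistics import mean


def icc2(ratings):
    """Return (ICC(2,1), ICC(2,k)) for an n-items x k-raters table of scores.

    Standard two-way random-effects, absolute-agreement ICC computed from
    ANOVA mean squares. Assumes a complete table (every rater scores every item).
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between items
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average


def raters_needed(single_icc, target=0.80):
    """Smallest k with k*r / (1 + (k-1)*r) >= target (Spearman-Brown)."""
    k = target * (1 - single_icc) / (single_icc * (1 - target))
    return math.ceil(k - 1e-9)  # small guard against float round-off


# Toy table (made up for illustration, NOT data from the paper):
# 6 items rated by 4 raters.
toy = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
single, average = icc2(toy)
```

Under the Spearman-Brown relation, reaching ICC(2,k) ≈ .80 with about 9 raters corresponds to a single-rater ICC of roughly .31. That value is arithmetic implied by the formula, not a number reported in the abstract; `raters_needed(0.31)` reproduces the count of 9.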
