Omni-Persona: Systematic Benchmarking and Improving Omnimodal Personalization

Yeongtak Oh; Dongwook Lee; Sangkwon Park; Heeseung Kim; Sungroh Yoon

arXiv:2605.09996 · cs.CV · May 12, 2026

TL;DR

Omni-Persona introduces a comprehensive benchmark for omnimodal personalization, diagnosing grounding behaviors and evaluating models across text, image, and audio modalities with a focus on absent-persona scenarios.

Contribution

The paper formalizes a new cross-modal routing task, proposes Calibrated Accuracy for better grounding evaluation, and provides diagnostic insights into model behaviors across modalities.

Findings

01 Open-source models show an audio-visual grounding gap.

02 Calibration exposes limitations of answerable recall and model size.

03 RLVR generalizes well but tends to be conservative and lower quality.

Abstract

While multimodal large language models have advanced across text, image, and audio, personalization research has remained primarily vision-language. Unified omnimodal benchmarking that jointly covers text, image, and audio is still limited and lacks the methodological rigor to account for absent-persona scenarios or to study grounding systematically. We introduce Omni-Persona, the first comprehensive benchmark for omnimodal personalization. We formalize the task as cross-modal routing over the Persona Modality Graph, encompassing 4 task groups and 18 fine-grained tasks across approximately 750 items. To rigorously diagnose grounding behavior, we propose Calibrated Accuracy (Cal), which jointly rewards correct grounding and appropriate abstention, incorporating absent-persona queries within a unified evaluation framework. In our dedicated experiments, three diagnostic…
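The abstract describes Calibrated Accuracy only at a high level, and the exact formula is not given here. As a rough illustration of the idea, the sketch below assumes a simple form: an item is rewarded when the prediction matches the gold answer, or when the model abstains on an absent-persona query. The names `Item`, `calibrated_accuracy`, and `answerable_recall` are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Item:
    # gold is None when the queried persona is absent (abstention is expected).
    gold: Optional[str]
    # prediction is None when the model abstains.
    prediction: Optional[str]


def is_rewarded(item: Item) -> bool:
    """Assumed scoring rule: correct grounding or appropriate abstention."""
    if item.gold is None:
        return item.prediction is None
    return item.prediction == item.gold


def calibrated_accuracy(items: Sequence[Item]) -> float:
    """Fraction of rewarded items over ALL items, absent-persona ones included."""
    if not items:
        raise ValueError("items must be non-empty")
    return sum(is_rewarded(i) for i in items) / len(items)


def answerable_recall(items: Sequence[Item]) -> float:
    """Accuracy restricted to items whose persona is present."""
    answerable = [i for i in items if i.gold is not None]
    if not answerable:
        raise ValueError("no answerable items")
    return sum(i.prediction == i.gold for i in answerable) / len(answerable)


# Toy data: two answerable items and two absent-persona items.
demo = [
    Item(gold="alice", prediction="alice"),
    Item(gold="bob", prediction="bob"),
    Item(gold=None, prediction="alice"),  # should have abstained
    Item(gold=None, prediction=None),     # appropriate abstention
]
print(answerable_recall(demo), calibrated_accuracy(demo))
```

On this toy data the model answers every answerable item correctly, yet the calibrated score is lower because one absent-persona query received an answer instead of an abstention. This is the kind of gap between answerable recall and calibrated evaluation that the Findings section refers to, under the assumed scoring rule above.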

Code & Models

Repositories

oyt9306/Omni-Persona (GitHub)