Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?

Jiwan Chung; Janghan Yoon; Junhyeong Park; Sangeyl Lee; Joowon Yang; Sooyeon Park; Youngjae Yu

arXiv:2505.24211·cs.CL·June 2, 2025

TL;DR

This paper evaluates whether any-to-any generative models are more consistent across modality transfers than specialized models. It finds no consistent advantage in pointwise evaluations such as cyclic consistency, but weak, observable consistency in equivariance evaluations.

Contribution

Introduces ACON, a new dataset and evaluation framework for assessing cross-modal consistency in unified generative models.

Findings

01 No significant advantage of any-to-any models in cyclic consistency

02 Weak equivariance signals suggest some structured cross-modal coherence

03 Structured analysis of latent space reveals potential for improved consistency

Abstract

Any-to-any generative models aim to enable seamless interpretation and generation across multiple modalities within a unified framework, yet their ability to preserve relationships across modalities remains uncertain. Do unified models truly achieve cross-modal coherence, or is this coherence merely perceived? To explore this, we introduce ACON, a dataset of 1,000 images (500 newly contributed) paired with captions, editing instructions, and Q&A pairs to evaluate cross-modal transfers rigorously. Using three consistency criteria (cyclic consistency, forward equivariance, and conjugated equivariance), our experiments reveal that any-to-any models do not consistently demonstrate greater cross-modal consistency than specialized models in pointwise evaluations such as cyclic consistency. However, equivariance evaluations uncover weak but observable consistency through structured analyses of…
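As a rough illustration of the first criterion, cyclic consistency can be read as a round-trip check: transfer an input to another modality, transfer it back, and compare the result with the original. The sketch below is not the paper's implementation. The function `cyclic_consistency`, the toy transfer functions, and the choice of cosine similarity as the score are all assumptions made for illustration, with a tuple of floats standing in for an image and a string standing in for a caption.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cyclic_consistency(
    x: Vector,
    forward: Callable[[Vector], str],
    backward: Callable[[str], Vector],
) -> float:
    """Score how well x survives the round trip x -> forward -> backward.

    Hypothetical helper: `forward` and `backward` stand in for the two
    directions of a cross-modal transfer (e.g. captioning and generation).
    """
    reconstructed = backward(forward(x))
    return cosine_similarity(x, reconstructed)


# Toy stand-ins for the two transfer directions (assumptions, not real models).
def exact_caption(v: Vector) -> str:
    return ",".join(f"{t:.1f}" for t in v)


def lossy_caption(v: Vector) -> str:
    # Rounds to whole numbers, so detail is lost in the "text" modality.
    return ",".join(f"{t:.0f}" for t in v)


def parse_caption(s: str) -> Tuple[float, ...]:
    return tuple(float(t) for t in s.split(","))


score_exact = cyclic_consistency((1.0, 2.0, 3.0), exact_caption, parse_caption)
score_lossy = cyclic_consistency((1.4, 2.0), lossy_caption, parse_caption)
```

In this toy setup the exact round trip scores at the maximum, while the lossy one scores lower because information is dropped in the intermediate modality. A real evaluation would replace the toy functions with model calls and compare in a learned embedding space rather than on raw values.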

Code & Models

Repositories

jiwanchung/acon
Official

Datasets

jiwan-chung/ACON
dataset · 12 downloads

Videos

Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists? · Underline

Taxonomy

Topics

Topic Modeling