Evaluating Generative Models via One-Dimensional Code Distributions
Zexi Jia, Pengcheng Luo, Yijia Zhong, Jinchao Zhang, Jie Zhou

TL;DR
This paper introduces token-based metrics for evaluating generative models. By analyzing discrete visual tokens, the metrics correlate better with human perception than feature-distribution metrics such as FID. It also presents a benchmark dataset for stress-testing these metrics.
Contribution
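It proposes two token-based evaluation metrics: Codebook Histogram Distance (CHD), a training-free distribution metric in token space, and Code Mixture Model Score (CMMS), a no-reference quality metric learned from synthetic degradations of token sequences. It also introduces VisForm, a benchmark of 210K images spanning 62 visual forms and 12 generative models with expert annotations.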
Findings
Token-based metrics outperform feature-distribution metrics in correlating with human judgments.
CHD is training-free, while CMMS is learned from synthetic degradations of token sequences and requires no reference images.
The VisForm benchmark enables stress-testing of metrics under broad distribution shifts.
Abstract
Most evaluations of generative models rely on feature-distribution metrics such as FID, which operate on continuous recognition features that are explicitly trained to be invariant to appearance variations, and thus discard cues critical for perceptual quality. We instead evaluate models in the space of discrete visual tokens, where modern 1D image tokenizers compactly encode both semantic and perceptual information and quality manifests as predictable token statistics. We introduce Codebook Histogram Distance (CHD), a training-free distribution metric in token space, and Code Mixture Model Score (CMMS), a no-reference quality metric learned from synthetic degradations of token sequences. To stress-test metrics under broad distribution shifts, we further propose VisForm, a benchmark of 210K images spanning 62 visual forms and 12 generative models with expert annotations. Across AGIQA,…
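To make the idea of a distribution metric in token space concrete, here is a minimal sketch of comparing two image sets through their codebook-index histograms. It is an illustration under stated assumptions, not the paper's CHD definition, which the abstract does not give: the choice of Hellinger distance, the function names, and the toy token sequences are all hypothetical, and the image tokenizer that would produce the integer sequences is not shown.

```python
from collections import Counter
from math import sqrt
from typing import Iterable, List, Sequence


def code_histogram(token_seqs: Iterable[Sequence[int]], codebook_size: int) -> List[float]:
    """Normalized frequency of each codebook index over a set of token sequences."""
    counts: Counter = Counter()
    for seq in token_seqs:
        for tok in seq:
            if not 0 <= tok < codebook_size:
                raise ValueError(f"token {tok} outside codebook of size {codebook_size}")
            counts[tok] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tokens provided")
    return [counts.get(k, 0) / total for k in range(codebook_size)]


def hellinger_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return sqrt(0.5 * sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q)))


def histogram_distance(real_seqs, gen_seqs, codebook_size: int) -> float:
    """Assumed stand-in for a token-histogram metric: distance between code histograms."""
    p = code_histogram(real_seqs, codebook_size)
    q = code_histogram(gen_seqs, codebook_size)
    return hellinger_distance(p, q)


# Toy token sequences (hypothetical; a real pipeline would obtain them from a 1D tokenizer).
real = [[0, 1, 1, 2], [2, 2, 3, 3]]
same = [[0, 1, 1, 2], [2, 2, 3, 3]]
disjoint = [[4, 5, 6, 7], [4, 4, 7, 7]]

d_same = histogram_distance(real, same, codebook_size=8)
d_disjoint = histogram_distance(real, disjoint, codebook_size=8)
```

In this sketch, identical token statistics give a distance of zero and sets that share no codes give the maximum of one. No training is involved, which is the property the abstract attributes to CHD.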
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Domain Adaptation and Few-Shot Learning
