QPT V2: Masked Image Modeling Advances Visual Scoring

Qizhi Xie; Kun Yuan; Yunpeng Qu; Mingda Wu; Ming Sun; Chao Zhou; Jihong Zhu

arXiv:2407.16541·cs.CV·March 30, 2026

TL;DR

QPT V2 is a masked image modeling (MIM) pretraining framework for visual quality and aesthetics assessment. It combines curated pretraining data, degradation, and multi-scale modeling, and outperforms state-of-the-art methods on 11 benchmarks.

Contribution

QPT V2 is the first MIM-based pretraining framework designed specifically for unified quality and aesthetics assessment of visual content.

Findings

01. QPT V2 outperforms state-of-the-art methods on 11 benchmarks.

02. The framework effectively captures high-level semantics and fine-grained details.

03. Extensive experiments demonstrate superior generalization capabilities.

Abstract

Quality assessment and aesthetics assessment aim to evaluate the perceived quality and aesthetics of visual content. Current learning-based methods suffer greatly from the scarcity of labeled data and usually perform sub-optimally in terms of generalization. Masked image modeling (MIM) has achieved noteworthy advancements across various high-level tasks (e.g., classification and detection). In this work, we take a novel perspective and investigate its capabilities in terms of quality- and aesthetics-awareness. To this end, we propose Quality- and aesthetics-aware pretraining (QPT V2), the first pretraining framework based on MIM that offers a unified solution to quality and aesthetics assessment. To perceive the high-level semantics and fine-grained details, pretraining data is curated. To comprehensively encompass quality- and aesthetics-related factors, degradation is…
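For readers unfamiliar with masked image modeling, the generic objective that the abstract builds on can be sketched as follows. This is a minimal illustration only, not the QPT V2 pipeline: the patch size, the 0.75 mask ratio, the helper names, and the all-zero stand-in prediction are assumptions made for the example, and the paper's data curation, degradation, and multi-scale components are not modeled here.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping, flattened patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)

def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean vector where True marks a masked (hidden) patch."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

def masked_mse(pred, target, mask):
    """Mean squared reconstruction error over masked patches only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))          # toy image with values in [0, 1)
patches = patchify(image, patch=8)       # 16 patches of 8 * 8 * 3 = 192 values
mask = random_mask(len(patches), mask_ratio=0.75, rng=rng)  # assumed ratio
visible = patches[~mask]                 # what an encoder would be given
pred = np.zeros_like(patches)            # stand-in for a decoder's output
loss = masked_mse(pred, patches, mask)   # loss ignores the visible patches
```

In an actual MIM setup, `pred` would come from a network that sees only `visible` and is trained to minimize `loss`; the zero prediction here only keeps the sketch runnable without a model.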
