CLIP Brings Better Features to Visual Aesthetics Learners

Liwu Xu; Jinjin Xu; Yuzhe Yang; Xilu Wang; Yijie Huang; Yaqian Li

arXiv:2307.15640 · cs.CV · August 5, 2025 · 1 cites

Open Access

TL;DR

This paper introduces a novel semi-supervised knowledge distillation framework leveraging CLIP to improve lightweight image aesthetics assessment models, especially in low-data scenarios, achieving state-of-the-art results.

Contribution

It proposes a unified two-phase CLIP-based semi-supervised knowledge distillation method that effectively transfers features from CLIP to lightweight IAA models.
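
As a rough illustration of what transferring features from a frozen teacher such as CLIP to a small student can look like, here is a minimal sketch of a feature-alignment loss. It is not the paper's CSKD implementation: the linear projection, the cosine-distance objective, and every name below (`project`, `cosine_distance`, `alignment_loss`) are assumptions chosen for illustration.

```python
import math


def project(features, weight):
    """Linearly map a student feature vector into the teacher's feature space.

    `weight` is a hypothetical projection matrix with one row per teacher dimension.
    """
    return [sum(w * x for w, x in zip(row, features)) for row in weight]


def cosine_distance(a, b):
    """Return 1 - cosine similarity: near 0 when aligned, near 2 when opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + 1e-12)


def alignment_loss(student_batch, teacher_batch, weight):
    """Average cosine distance between projected student and teacher features."""
    losses = [
        cosine_distance(project(s, weight), t)
        for s, t in zip(student_batch, teacher_batch)
    ]
    return sum(losses) / len(losses)
```

A loss of this kind needs no aesthetic labels, only teacher features, so in this sketch it could be evaluated on unlabeled images as well as labeled ones.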

Findings

1. Achieves state-of-the-art performance on IAA benchmarks.

2. Effective transfer of CLIP features enhances IAA model representations.

3. Model initialization is guided by CLIP's feature transfer.

Abstract

Image Aesthetics Assessment (IAA) is a challenging task due to its subjective nature and expensive manual annotations. Recent large-scale vision-language models, such as Contrastive Language-Image Pre-training (CLIP), have shown promising representation capability for various downstream tasks. However, the application of CLIP to resource-constrained and low-data IAA tasks remains limited. While the few attempts to leverage CLIP in IAA have mainly focused on carefully designed prompts, we extend beyond this by allowing models from different domains and with different model sizes to acquire knowledge from CLIP. To achieve this, we propose a unified and flexible two-phase CLIP-based Semi-supervised Knowledge Distillation (CSKD) paradigm, aiming to learn a lightweight IAA model while leveraging CLIP's strong generalization capability. Specifically, CSKD employs a feature alignment…

Taxonomy

Topics: Visual Attention and Saliency Detection · Image Processing Techniques and Applications · Cell Image Analysis Techniques

Methods: ALIGN · Contrastive Language-Image Pre-training