CLIP Brings Better Features to Visual Aesthetics Learners
Liwu Xu, Jinjin Xu, Yuzhe Yang, Xilu Wang, Yijie Huang, Yaqian Li

TL;DR
This paper introduces a semi-supervised knowledge distillation framework that uses CLIP to improve lightweight image aesthetics assessment (IAA) models, especially in low-data scenarios, and reports state-of-the-art results on IAA benchmarks.
Contribution
It proposes a unified two-phase CLIP-based semi-supervised knowledge distillation method that effectively transfers features from CLIP to lightweight IAA models.
Findings
Achieves state-of-the-art performance on IAA benchmarks.
Effective transfer of CLIP features enhances IAA model representations.
Feature transfer from CLIP provides the initialization for the lightweight model.
Abstract
Image Aesthetics Assessment (IAA) is a challenging task due to its subjective nature and expensive manual annotations. Recent large-scale vision-language models, such as Contrastive Language-Image Pre-training (CLIP), have shown promising representation capability for various downstream tasks. However, the application of CLIP to resource-constrained and low-data IAA tasks remains limited. While the few attempts to leverage CLIP in IAA have mainly focused on carefully designed prompts, we extend beyond this by allowing models from different domains and with different model sizes to acquire knowledge from CLIP. To achieve this, we propose a unified and flexible two-phase CLIP-based Semi-supervised Knowledge Distillation (CSKD) paradigm, aiming to learn a lightweight IAA model while leveraging CLIP's strong generalization capability. Specifically, CSKD employs a feature alignment…
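The abstract is cut off before it describes the feature alignment step, so the following is only a minimal sketch of what a feature alignment objective between a teacher (e.g. a CLIP image encoder) and a lightweight student could look like. The cosine-distance loss, the function names `cosine_similarity` and `alignment_loss`, and the assumption that student features have already been projected to the teacher's dimension are all illustrative choices, not details confirmed by the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_loss(student_feats, teacher_feats):
    """Mean (1 - cosine similarity) over paired feature vectors.

    Hypothetical objective: the loss is 0 when every student feature points
    in the same direction as its teacher feature, and grows up to 2 when
    they point in opposite directions. No labels are needed, so it can be
    computed on unlabeled images.
    """
    if len(student_feats) != len(teacher_feats):
        raise ValueError("student and teacher batches must have the same size")
    distances = [
        1.0 - cosine_similarity(s, t)
        for s, t in zip(student_feats, teacher_feats)
    ]
    return sum(distances) / len(distances)


# Toy batch of two "images" with 3-dimensional features (made-up numbers).
teacher = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
student = [[0.5, 0.0, 0.0], [0.0, 0.0, 1.0]]
loss = alignment_loss(student, teacher)
```

In this toy batch the first pair is perfectly aligned and the second is orthogonal, so the loss averages a distance of 0 and a distance of 1. In a real setup the feature vectors would come from the two networks and the loss would be minimized with respect to the student's parameters.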
Taxonomy
Topics: Visual Attention and Saliency Detection · Image Processing Techniques and Applications · Cell Image Analysis Techniques
Methods: ALIGN · Contrastive Language-Image Pre-training
