Visual-Language Model Knowledge Distillation Method for Image Quality Assessment
Yongkang Hou, Jiarun Song

TL;DR
This paper introduces a knowledge distillation approach from CLIP to smaller models for image quality assessment, improving efficiency and accuracy across multiple datasets.
Contribution
It proposes a novel visual-language knowledge distillation method that enhances IQA models by leveraging CLIP's capabilities while reducing complexity.
Findings
Significantly reduces model complexity.
Outperforms existing IQA methods.
Demonstrates strong potential for practical deployment.
Abstract
Image Quality Assessment (IQA) is a core task in computer vision. Multimodal methods based on vision-language models, such as CLIP, have demonstrated exceptional generalization capabilities in IQA tasks. To address CLIP's excessive parameter count and its limited ability to identify local distortion features in IQA, this study proposes a visual-language model knowledge distillation method that uses CLIP's IQA knowledge to guide the training of models with architectural advantages. First, quality-graded prompt templates are designed to guide CLIP to output quality scores. Then, CLIP is fine-tuned to enhance its capabilities in IQA tasks. Finally, a modality-adaptive knowledge distillation strategy is proposed to achieve guidance from the CLIP teacher model to the student model. Our experiments were conducted on multiple IQA datasets, and the results show that the…
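The abstract does not give the prompt templates or the form of the distillation loss, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes five hypothetical quality-graded prompts, toy embedding vectors in place of real CLIP features, a softmax-weighted average of quality levels as the teacher score, and a generic squared-error blend of teacher score and ground-truth mean opinion score (MOS) as the student loss. The names QUALITY_PROMPTS, prompt_quality_score, distillation_loss, temperature, and alpha are all illustrative assumptions.

```python
import math

# Hypothetical quality-graded prompts (the paper's actual templates are not shown here).
QUALITY_PROMPTS = [
    "a photo of bad quality",
    "a photo of poor quality",
    "a photo of fair quality",
    "a photo of good quality",
    "a photo of perfect quality",
]
# Numeric quality level assigned to each prompt, in the same order (assumed 1..5 scale).
QUALITY_LEVELS = [1.0, 2.0, 3.0, 4.0, 5.0]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def prompt_quality_score(image_emb, prompt_embs, temperature=0.01):
    """Turn image-to-prompt similarities into one score on the assumed 1..5 scale.

    image_emb and prompt_embs stand in for CLIP image and text embeddings;
    in this sketch they are plain lists of floats.
    """
    sims = [cosine(image_emb, p) for p in prompt_embs]
    probs = softmax([s / temperature for s in sims])
    return sum(p * q for p, q in zip(probs, QUALITY_LEVELS))


def distillation_loss(student_score, teacher_score, mos, alpha=0.5):
    """Generic score-level distillation loss (an assumption, not the paper's
    modality-adaptive strategy): blend imitation of the teacher with
    regression to the ground-truth MOS."""
    imitation = (student_score - teacher_score) ** 2
    supervised = (student_score - mos) ** 2
    return alpha * imitation + (1.0 - alpha) * supervised
```

In this sketch, an image embedding that aligns most closely with the "good quality" prompt yields a teacher score close to 4, and alpha controls how strongly the student follows the teacher relative to the human label.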
Taxonomy
Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Advanced Neural Network Applications
