Visual-Language Model Knowledge Distillation Method for Image Quality Assessment

Yongkang Hou; Jiarun Song

arXiv:2507.15680 · cs.CV · July 24, 2025

TL;DR

This paper introduces a knowledge distillation approach from CLIP to smaller models for image quality assessment, improving efficiency and accuracy across multiple datasets.

Contribution

It proposes a visual-language knowledge distillation method in which a fine-tuned CLIP teacher guides a lighter student model for IQA, retaining CLIP's quality-assessment knowledge while reducing model complexity.

Findings

01. Significantly reduces model complexity.

02. Outperforms existing IQA methods.

03. Demonstrates strong potential for practical deployment.

Abstract

Image Quality Assessment (IQA) is a core task in computer vision. Multimodal methods based on vision-language models, such as CLIP, have demonstrated exceptional generalization capabilities in IQA tasks. However, CLIP carries an excessive parameter burden and has limited ability to identify locally distorted features. To address these issues, this study proposes a visual-language model knowledge distillation method that uses CLIP's IQA knowledge to guide the training of models with architectural advantages. First, quality-graded prompt templates are designed to guide CLIP to output quality scores. Then, CLIP is fine-tuned to enhance its capabilities in IQA tasks. Finally, a modality-adaptive knowledge distillation strategy is proposed to achieve guidance from the CLIP teacher model to the student model. Our experiments were conducted on multiple IQA datasets, and the results show that the…
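
The first and last steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt wording, the five grade values, the temperature, and the plain mean-squared-error loss (used here in place of the paper's modality-adaptive strategy, whose details are not given on this page) are all assumptions, and the embeddings are placeholder vectors standing in for the outputs of CLIP's image and text encoders.

```python
import math

# Hypothetical quality-graded prompts; the paper's actual templates are not shown here.
QUALITY_LEVELS = ["bad", "poor", "fair", "good", "perfect"]
PROMPTS = [f"a photo of {level} quality" for level in QUALITY_LEVELS]
GRADE_VALUES = [1.0, 2.0, 3.0, 4.0, 5.0]  # assumed numeric value of each grade


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def quality_score(image_emb, prompt_embs, temperature=0.01):
    """Turn image-prompt similarities into one score on the grade scale.

    Each prompt embedding corresponds to one quality grade; the score is the
    probability-weighted average of the grade values.
    """
    logits = [cosine(image_emb, t) / temperature for t in prompt_embs]
    probs = softmax(logits)
    return sum(p * g for p, g in zip(probs, GRADE_VALUES))


def distillation_loss(student_scores, teacher_scores):
    """Mean squared error between student and teacher scores.

    A simple stand-in for the paper's distillation objective (assumption).
    """
    n = len(student_scores)
    return sum((s - t) ** 2 for s, t in zip(student_scores, teacher_scores)) / n


# Toy usage: one-hot vectors play the role of the five prompt embeddings.
toy_prompt_embs = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
teacher_score = quality_score(toy_prompt_embs[4], toy_prompt_embs)
loss = distillation_loss([4.0], [teacher_score])
```

In a real pipeline the teacher scores would come from the fine-tuned CLIP model and the student scores from the smaller network, with the loss averaged over a training batch.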

Taxonomy

Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Advanced Neural Network Applications