Image and Video Quality Assessment using Prompt-Guided Latent Diffusion Models for Cross-Dataset Generalization

Shankhanil Mitra; Diptanu De; Shika Rao; Rajiv Soundararajan

arXiv:2406.04654 · eess.IV · December 30, 2025

TL;DR

This paper assesses image and video quality with latent diffusion models by measuring how well learnable quality-aware text prompts align with images or video frames. The stated goals are better generalization across datasets and lower cost on video through frame-rate sub-sampling.

Contribution

It leverages diffusion model denoising processes and cross-attention maps to develop a generalized quality assessment method for images and videos, incorporating a temporal quality modulator for efficiency.

Findings

01 Superior generalization across multiple datasets

02 Effective handling of diverse content types

03 Enhanced efficiency with frame-rate sub-sampling

Abstract

The design of image and video quality assessment (QA) algorithms is extremely important to benchmark and calibrate user experience in modern visual systems. A major drawback of the state-of-the-art QA methods is their limited ability to generalize across diverse image and video datasets with reasonable distribution shifts. In this work, we leverage the denoising process of diffusion models for generalized image QA (IQA) and video QA (VQA) by understanding the degree of alignment between learnable quality-aware text prompts and images or video frames. In particular, we learn cross-attention maps from intermediate layers of the denoiser of latent diffusion models (LDMs) to capture quality-aware representations of images or video frames. Since applying text-to-image LDMs for every video frame is computationally expensive for videos, we only estimate the quality of a frame-rate sub-sampled…
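
The two mechanisms the abstract names, cross-attention between prompt tokens and spatial features and frame-rate sub-sampling, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: in the paper the maps come from intermediate layers of an LDM denoiser, whereas here the features, prompt embeddings, and projection matrices are random placeholders, and the names `cross_attention_map` and `subsample_frames` are hypothetical.

```python
import numpy as np


def cross_attention_map(image_feats, prompt_embeds, w_q, w_k):
    """Scaled dot-product cross-attention weights.

    image_feats:   (H*W, C) spatial features (placeholder for denoiser features)
    prompt_embeds: (T, D) token embeddings (placeholder for learnable prompts)
    w_q, w_k:      projections to a shared dimension d (assumed, random here)
    Returns an (H*W, T) array whose rows each sum to 1.
    """
    q = image_feats @ w_q                      # queries from spatial locations
    k = prompt_embeds @ w_k                    # keys from prompt tokens
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (H*W, T) similarity scores
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


def subsample_frames(num_frames, step):
    """Indices of every `step`-th frame, so only those frames are processed."""
    return list(range(0, num_frames, step))


# Toy dimensions, chosen only for illustration.
rng = np.random.default_rng(0)
h, w, c, t, d_text, d = 8, 8, 16, 4, 12, 10
feats = rng.standard_normal((h * w, c))
prompts = rng.standard_normal((t, d_text))
attn = cross_attention_map(
    feats,
    prompts,
    rng.standard_normal((c, d)),
    rng.standard_normal((d_text, d)),
)

# One spatial map per prompt token: how strongly each location attends to it.
token_maps = attn.T.reshape(t, h, w)
frame_ids = subsample_frames(num_frames=10, step=4)
```

Each slice of `token_maps` is a spatial map for one prompt token, which is the kind of quantity a downstream quality regressor could consume. How the paper pools these maps and how it compensates for the skipped frames is not covered by this sketch.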

Taxonomy

Topics: Image and Video Quality Assessment

Methods: Diffusion