DiffVQA: Video Quality Assessment Using Diffusion Feature Extractor

Wei-Ting Chen; Yu-Jiet Vong; Yi-Tsung Lee; Sy-Yen Kuo; Qiang Gao; Sizhuo Ma; Jian Wang

arXiv:2505.03261 · cs.CV · May 7, 2025

TL;DR

This paper introduces DiffVQA, a novel video quality assessment framework that leverages diffusion models for robust feature extraction, significantly improving alignment with human perception and generalization across diverse datasets.

Contribution

The paper proposes a diffusion model-based feature extractor for VQA, enhancing performance and generalization over existing CNN and ViT methods.

Findings

01. DiffVQA outperforms existing methods on multiple datasets.

02. DiffVQA demonstrates strong cross-dataset generalization.

03. The diffusion-based features improve correlation with human perceptual scores.
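
As an illustration of what "correlation with human perceptual scores" can mean in practice, the sketch below computes Pearson and Spearman correlation between predicted quality scores and human ratings. This page does not state which metrics the paper reports, so the choice of these two coefficients is an assumption, as are the function names and the example numbers, which are made up for demonstration only.

```python
import math


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical numbers, not results from the paper.
predicted = [0.10, 0.40, 0.35, 0.80]   # model quality predictions
human = [1.0, 3.0, 2.5, 4.5]           # human ratings for the same videos
rank_corr = spearman(predicted, human)
linear_corr = pearson(predicted, human)
```

Spearman only looks at ordering, so a model that ranks videos exactly as people do scores 1.0 even when its raw scale differs from the rating scale. Pearson additionally rewards a linear relationship between the two.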

Abstract

Video Quality Assessment (VQA) aims to evaluate video quality based on perceptual distortions and human preferences. Despite the promising performance of existing methods using Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), they often struggle to align closely with human perceptions, particularly in diverse real-world scenarios. This challenge is exacerbated by the limited scale and diversity of available datasets. To address this limitation, we introduce a novel VQA framework, DiffVQA, which harnesses the robust generalization capabilities of diffusion models pre-trained on extensive datasets. Our framework adapts these models to reconstruct identical input frames through a control module. The adapted diffusion model is then used to extract semantic and distortion features from a resizing branch and a cropping branch, respectively. To enhance the model's ability…
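
The abstract mentions a resizing branch for semantic features and a cropping branch for distortion features. The sketch below shows only the input preparation such a two-branch design might use: one downscaled view of the whole frame and one native-resolution crop. It is a minimal illustration under stated assumptions, not the authors' pipeline. The function names, the nearest-neighbor resize, the center crop, and the 224-pixel size are all hypothetical, and frames are plain nested lists so the example runs without dependencies.

```python
def resize_nearest(frame, out_h, out_w):
    """Downscale or upscale a frame (list of rows) by nearest-neighbor sampling."""
    h, w = len(frame), len(frame[0])
    return [
        [frame[r * h // out_h][c * w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def center_crop(frame, out_h, out_w):
    """Cut an out_h x out_w window from the middle of the frame, unscaled."""
    h, w = len(frame), len(frame[0])
    if out_h > h or out_w > w:
        raise ValueError("crop size exceeds frame size")
    top = (h - out_h) // 2
    left = (w - out_w) // 2
    return [row[left:left + out_w] for row in frame[top:top + out_h]]


def two_branch_inputs(frame, size=224):
    """Return (resized view, cropped view) of one frame.

    The resized view keeps the global layout (assumed useful for semantics);
    the crop keeps original pixels (assumed useful for distortions).
    """
    return resize_nearest(frame, size, size), center_crop(frame, size, size)


# Synthetic 480x640 frame whose pixel value encodes its (row, col) position.
frame = [[(r, c) for c in range(640)] for r in range(480)]
resized, cropped = two_branch_inputs(frame, size=224)
```

Resizing preserves the overall scene but blurs away fine-grained artifacts, while cropping preserves pixel-level detail but loses global context, which is the usual motivation for pairing the two views.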

Taxonomy

Topics: Image and Video Quality Assessment · Advanced Image Processing Techniques · Video Analysis and Summarization

Methods: Diffusion · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · ALIGN