Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment

Kai Zhao; Kun Yuan; Ming Sun; Xing Wen

arXiv:2304.06440 · cs.CV · April 14, 2023 · 1 citation

TL;DR

Zoom-VQA is a multi-level video quality assessment model that integrates patch, frame, and clip information. It achieves state-of-the-art results on four VQA benchmarks and placed 2nd in the NTIRE 2023 VQA challenge.

Contribution

The paper proposes a multi-level architecture that combines a patch attention module, frame pyramid alignment, and a clip ensemble strategy for video quality assessment.

Findings

01. Achieved state-of-the-art results on four VQA benchmarks.
02. Secured 2nd place in the NTIRE 2023 VQA challenge.
03. Outperformed previous methods on LSVQ subsets.

Abstract

Video quality assessment (VQA) aims to simulate the human perception of video quality, which is influenced by factors ranging from low-level color and texture details to high-level semantic content. To effectively model these complicated quality-related factors, in this paper, we decompose a video into three levels (i.e., patch level, frame level, and clip level), and propose a novel Zoom-VQA architecture to perceive spatio-temporal features at different levels. It integrates three components: a patch attention module, frame pyramid alignment, and a clip ensemble strategy, which respectively capture regions of interest in the spatial dimension, multi-level information at different feature levels, and distortions distributed over the temporal dimension. Owing to the comprehensive design, Zoom-VQA obtains state-of-the-art results on four VQA benchmarks and achieves 2nd place in the NTIRE 2023 VQA…
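
The three-level decomposition described in the abstract (patches within a frame, frames within a clip, clips within a video) can be illustrated with a small toy sketch. This is not the authors' implementation: the function names, the mean-intensity "quality" proxy, and the uniform attention logits are all hypothetical placeholders chosen only to show how scores might flow from patch level to clip level.

```python
import math

def split_clips(frames, clip_len):
    """Group a list of frames into consecutive, non-overlapping clips."""
    return [frames[i:i + clip_len]
            for i in range(0, len(frames) - clip_len + 1, clip_len)]

def split_patches(frame, patch):
    """Cut a 2-D frame (list of rows) into non-overlapping patch x patch tiles."""
    h, w = len(frame), len(frame[0])
    return [
        [row[x:x + patch] for row in frame[y:y + patch]]
        for y in range(0, h - patch + 1, patch)
        for x in range(0, w - patch + 1, patch)
    ]

def toy_patch_score(tile):
    """Placeholder quality proxy: mean intensity of the tile (not a real metric)."""
    values = [v for row in tile for v in row]
    return sum(values) / len(values)

def attention_pool(scores, logits):
    """Softmax-weighted average, a stand-in for a learned patch attention module."""
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def video_score(frames, clip_len=2, patch=2):
    """Aggregate toy scores from patch level to frame level to clip level."""
    clips = split_clips(frames, clip_len)
    if not clips:
        raise ValueError("video is shorter than one clip")
    clip_scores = []
    for clip in clips:
        frame_scores = []
        for frame in clip:
            scores = [toy_patch_score(t) for t in split_patches(frame, patch)]
            # Assumption: uniform logits; a trained model would predict one per patch.
            frame_scores.append(attention_pool(scores, [0.0] * len(scores)))
        clip_scores.append(sum(frame_scores) / len(frame_scores))
    # Clip-level ensemble: here simply the mean of the per-clip predictions.
    return sum(clip_scores) / len(clip_scores)
```

Calling `video_score` on a list of equally sized 2-D frames returns a single scalar. In a real model, the toy patch score and the uniform logits would be replaced by learned networks, and the plain averages are only one possible aggregation choice.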

Code & Models

Repositories

k-zha14/zoom-vqa
PyTorch · Official

Taxonomy

Topics: Image and Video Quality Assessment · Advanced Image Fusion Techniques · Image Enhancement Techniques

Methods: Contrastive Language-Image Pre-training