Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment
Kai Zhao, Kun Yuan, Ming Sun, Xing Wen

TL;DR
Zoom-VQA introduces a multi-level video quality assessment model that integrates patch, frame, and clip information, achieving state-of-the-art results on multiple benchmarks and excelling in the NTIRE 2023 VQA challenge.
Contribution
The paper proposes a novel multi-level architecture that combines a patch attention module, frame pyramid alignment, and a clip ensemble strategy for improved video quality assessment.
Findings
Achieved state-of-the-art results on four VQA benchmarks.
Secured 2nd place in the NTIRE 2023 VQA challenge.
Outperformed previous methods on LSVQ subsets.
Abstract
Video quality assessment (VQA) aims to simulate the human perception of video quality, which is influenced by factors ranging from low-level color and texture details to high-level semantic content. To effectively model these complicated quality-related factors, in this paper, we decompose video into three levels (i.e., patch level, frame level, and clip level), and propose a novel Zoom-VQA architecture to perceive spatio-temporal features at different levels. It integrates three components: patch attention module, frame pyramid alignment, and clip ensemble strategy, respectively for capturing region-of-interest in the spatial dimension, multi-level information at different feature levels, and distortions distributed over the temporal dimension. Owing to the comprehensive design, Zoom-VQA obtains state-of-the-art results on four VQA benchmarks and achieves 2nd place in the NTIRE 2023 VQA…
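The patch, frame, and clip decomposition described in the abstract can be illustrated with a toy aggregation. This is only a sketch under stated assumptions, not the authors' implementation: it assumes each patch already has a scalar quality score and an attention logit, pools patches with softmax attention, and then averages over frames and clips. The function names (`attend_patches`, `video_score`) and the input layout are hypothetical, and frame pyramid alignment is not modelled here.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical input layout: a frame is (patch_scores, patch_attention_logits).
Frame = Tuple[Sequence[float], Sequence[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attend_patches(patch_scores: Sequence[float],
                   patch_logits: Sequence[float]) -> float:
    """Patch level: attention-weighted average of per-patch scores."""
    weights = softmax(patch_logits)
    return sum(w * s for w, s in zip(weights, patch_scores))


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def video_score(clips: Sequence[Sequence[Frame]]) -> float:
    """Frame level: average frame scores within each clip.
    Clip level: average the clip scores (a simple ensemble assumption)."""
    clip_scores = []
    for frames in clips:
        frame_scores = [attend_patches(s, l) for s, l in frames]
        clip_scores.append(mean(frame_scores))
    return mean(clip_scores)
```

With uniform attention logits the patch pooling reduces to a plain mean, while a large logit on one patch lets that region dominate the frame score, which is the intuition behind attending to regions of interest.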
Taxonomy
Topics: Image and Video Quality Assessment · Advanced Image Fusion Techniques · Image Enhancement Techniques
Methods: Contrastive Language-Image Pre-training
