Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment

Kun Yuan; Zishang Kong; Chuanchuan Zheng; Ming Sun; Xing Wen

arXiv:2307.16813 · cs.CV · August 1, 2023

TL;DR

This paper introduces Visual Quality Transformer (VQT), a novel approach for no-reference video quality assessment that efficiently captures co-existing distortions and outperforms existing methods on multiple datasets.

Contribution

The paper proposes a new transformer-based model with sparse temporal attention and multi-pathway architecture to better detect multiple distortions in user-generated videos.

Findings

01. VQT achieves superior accuracy on three no-reference VQA datasets.

02. VQT outperforms industrial algorithms like VMAF and AVQT on full-reference datasets.

03. The proposed Sparse Temporal Attention (STA) reduces computational complexity from O(T^2) to O(T log T), where T is the number of frames.
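To give a feel for the gap between O(T^2) and O(T log T), the following minimal sketch counts query-key pairs for a clip of T frames. It is an illustration only, not the authors' implementation: the function names are hypothetical, and the assumption that each frame attends to roughly ceil(log2 T) keyframes is ours. This page does not describe how STA actually selects frames.

```python
import math


def dense_pairs(num_frames: int) -> int:
    """Query-key pairs when every frame attends to every frame: T * T."""
    return num_frames * num_frames


def sparse_pairs(num_frames: int) -> int:
    """Query-key pairs when each frame attends to about log2(T) keyframes.

    Assumption (not from the paper): ceil(log2(T)) keys per query,
    with a floor of one key so a single-frame clip still attends to itself.
    """
    keys_per_query = max(1, math.ceil(math.log2(num_frames)))
    return num_frames * keys_per_query


if __name__ == "__main__":
    # Compare the two counts for a few clip lengths.
    for t in (16, 64, 256, 1024):
        print(f"T={t}: dense={dense_pairs(t)}, sparse={sparse_pairs(t)}")
```

Under these assumptions, a 1024-frame clip needs about a hundred times fewer attention pairs with the sparse scheme than with dense attention, and the ratio keeps growing with T.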

Abstract

Video Quality Assessment (VQA), which aims to predict the perceptual quality of a video, has attracted rising attention with the rapid development of streaming media platforms such as Facebook, TikTok, and Kwai. Compared with other sequence-based visual tasks (e.g., action recognition), VQA faces two underestimated challenges that remain unresolved for User Generated Content (UGC) videos. First, it is not rare that several frames containing serious distortions (e.g., blocking, blurriness) determine the perceptual quality of the whole video, while other sequence-based tasks require more frames of equal importance for representations. Second, the perceptual quality of a video exhibits a multi-distortion distribution, due to the differences in the duration and probability of occurrence for various distortions. In order to solve the above challenges, we…

Taxonomy

Topics: Image and Video Quality Assessment · Visual Attention and Saliency Detection · Advanced Image Fusion Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Position-Wise Feed-Forward Layer · Layer Normalization · Linear Layer · Dense Connections · Label Smoothing · Dropout · Adam