PriorFormer: A UGC-VQA Method with content and distortion priors

Yajing Pei; Shiyu Huang; Yiting Lu; Xin Li; Zhibo Chen

arXiv:2406.16297·cs.CV·June 25, 2024

TL;DR

PriorFormer is a novel UGC video quality assessment method that uses content and distortion priors to improve adaptability and performance across diverse content and degradations, achieving state-of-the-art results.

Contribution

The paper introduces a new prior-augmented vision transformer that incorporates content and distortion priors for improved blind video quality assessment of UGC.

Findings

01. Achieves state-of-the-art performance on three UGC VQA datasets.

02. Effectively models diverse content and distortions in UGC videos.

03. Utilizes pre-trained feature extractors for content and distortion embeddings.

Abstract

User-generated content (UGC) videos are subject to complex and varied degradations and contents, which prevents existing blind video quality assessment (BVQA) models from performing well because they lack adaptability to distortions and contents. To mitigate this, we propose a novel prior-augmented perceptual vision transformer (PriorFormer) for the BVQA of UGC, which boosts its adaptability and representation capability for diverse contents and distortions. Concretely, we introduce two powerful priors, i.e., the content and distortion priors, by extracting content and distortion embeddings from two pre-trained feature extractors. We then adopt these two embeddings as adaptive prior tokens, which are fed to the vision transformer backbone jointly with implicit quality features. Based on the above strategy, the proposed PriorFormer achieves…
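
The token-level idea in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not say how the prior tokens are combined with the quality features, so this sketch assumes they are linearly projected to the token width and prepended to the sequence. The function names (build_token_sequence, self_attention), all dimensions, the identity query/key/value projections, the random weights, and the mean-pooled regression head are hypothetical choices made only for illustration.

```python
import math
import random


def linear(x, w):
    """Project vector x (length d_in) with weight matrix w (d_in x d_out)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def rand_matrix(rows, cols, rng):
    """Small random weights standing in for learned parameters (assumption)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (simplification)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def build_token_sequence(quality_tokens, content_emb, distortion_emb, w_content, w_distortion):
    """Prepend projected content and distortion priors as two extra tokens (assumption)."""
    content_token = linear(content_emb, w_content)
    distortion_token = linear(distortion_emb, w_distortion)
    return [content_token, distortion_token] + quality_tokens


rng = random.Random(0)
D = 8  # hypothetical token width

# Stand-ins for the implicit quality features and the two pre-trained extractors' outputs.
quality_tokens = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(4)]
content_emb = [rng.gauss(0, 1) for _ in range(16)]
distortion_emb = [rng.gauss(0, 1) for _ in range(12)]

w_content = rand_matrix(16, D, rng)
w_distortion = rand_matrix(12, D, rng)

tokens = build_token_sequence(quality_tokens, content_emb, distortion_emb, w_content, w_distortion)
attended = self_attention(tokens)  # every token can now attend to both priors

# Mean-pool the attended tokens and regress a scalar quality score (hypothetical head).
w_head = [rng.gauss(0, 0.02) for _ in range(D)]
pooled = [sum(t[j] for t in attended) / len(attended) for j in range(D)]
score = sum(p * w for p, w in zip(pooled, w_head))
print(len(tokens), score)
```

Because the prior tokens share the attention sequence with the quality tokens, every quality token can weight its representation by the content and distortion embeddings, which is one plausible reading of "adaptive prior tokens". A real model would use learned multi-head attention, normalization, and residual connections instead of this single unparameterized attention step.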

Taxonomy

Topics: Speech and Audio Processing · Neural Networks and Applications · Image and Signal Denoising Methods

Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Dense Connections · Multi-Head Attention · Residual Connection · Vision Transformer