VQA$^2$: Visual Question Answering for Video Quality Assessment

Ziheng Jia; Zicheng Zhang; Jiaying Qian; Haoning Wu; Wei Sun; Chunyi Li; Xiaohong Liu; Weisi Lin; Guangtao Zhai; Xiongkuo Min

arXiv:2411.03795 · cs.CV · December 3, 2024 · 2 cites

TL;DR

This paper introduces the VQA2 Instruction Dataset and accompanying models, which frame video quality assessment as visual question answering. The models achieve state-of-the-art results and surpass GPT-4o on visual quality understanding tasks.

Contribution

It presents the first visual question answering instruction dataset focused on video quality assessment and develops models that integrate spatial-temporal perception, advancing low-level video quality understanding.

Findings

01 The VQA2 dataset contains 157,755 question-answer pairs across various video types.

02 VQA2-Assistant outperforms GPT-4o on visual quality understanding tasks.

03 The models achieve strong performance in both video quality scoring and video quality understanding.

Abstract

The advent and proliferation of large multi-modal models (LMMs) have introduced new paradigms to computer vision, transforming various tasks into a unified visual question answering framework. Video Quality Assessment (VQA), a classic field in low-level visual perception, focused initially on quantitative video quality scoring. However, driven by advances in LMMs, it is now progressing toward more holistic visual quality understanding tasks. Recent studies in the image domain have demonstrated that Visual Question Answering (VQA) can markedly enhance low-level visual quality evaluation. Nevertheless, related work has not been explored in the video domain, leaving substantial room for improvement. To address this gap, we introduce the VQA2 Instruction Dataset - the first visual question answering instruction dataset that focuses on video quality assessment. This dataset consists of 3…

Code & Models

Repositories

q-future/visual-question-answering-for-video-quality-assessment
PyTorch · Official

Models

🤗 q-future/VQA-Assistant-llava_qwen

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection