Multilevel Semantic-Aware Model for AI-Generated Video Quality Assessment

Jiaze Li; Haoran Xu; Shiding Zhu; Junwei He; Haozhao Wang

arXiv:2501.02706·cs.CV·January 7, 2025

Open Access

TL;DR

This paper introduces MSA-VQA, a hierarchical, semantic-aware model leveraging CLIP for assessing AI-generated video quality, achieving state-of-the-art performance through multi-level analysis and semantic supervision.

Contribution

The paper presents a novel multilevel framework with semantic supervision and mutation-aware modules specifically designed for AI-generated video quality assessment.

Findings

01. Achieves state-of-the-art results on video quality benchmarks.

02. Effectively captures semantic consistency and subtle frame variations.

03. Demonstrates robustness across different AI-generated video datasets.

Abstract

The rapid development of diffusion models has recently advanced AI-generated videos considerably in terms of length and consistency, yet assessing such videos remains challenging. Previous approaches have mostly focused on User-Generated Content (UGC), and few have targeted AI-generated video quality assessment. In this work, we introduce MSA-VQA, a Multilevel Semantic-Aware Model for AI-Generated Video Quality Assessment, which leverages CLIP-based semantic supervision and cross-attention mechanisms. Our hierarchical framework analyzes video content at three levels: frame, segment, and video. We propose a Prompt Semantic Supervision Module that uses the text encoder of CLIP to ensure semantic consistency between videos and conditional prompts. Additionally, we propose a Semantic Mutation-aware Module to capture subtle variations between frames. Extensive experiments…
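To make the three ideas in the abstract concrete, here is a minimal sketch. It is not the authors' implementation. Several things are assumed for illustration only: random vectors stand in for CLIP image and text embeddings, cosine similarity to the prompt stands in for prompt semantic consistency, one minus the cosine similarity of adjacent frames stands in for frame-to-frame semantic change, and mean pooling stands in for the frame, segment, and video levels. All function names are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis` (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def prompt_consistency(frame_emb, text_emb):
    """Cosine similarity between every frame embedding and the prompt embedding.

    Assumption: a simple proxy for video/prompt semantic consistency.
    """
    return l2_normalize(frame_emb) @ l2_normalize(text_emb)  # (num_frames,)


def adjacent_frame_change(frame_emb):
    """1 - cosine similarity between consecutive frame embeddings.

    Assumption: a simple proxy for semantic change between neighbouring frames.
    """
    f = l2_normalize(frame_emb)
    return 1.0 - np.sum(f[:-1] * f[1:], axis=1)  # (num_frames - 1,)


def multilevel_pool(frame_emb, segment_len):
    """Build segment-level and video-level features by mean pooling.

    Assumption: mean pooling is used here only to show the three levels;
    trailing frames that do not fill a whole segment are dropped.
    """
    num_frames, dim = frame_emb.shape
    usable = (num_frames // segment_len) * segment_len
    segments = frame_emb[:usable].reshape(-1, segment_len, dim).mean(axis=1)
    video = segments.mean(axis=0)
    return segments, video


# Placeholder embeddings (hypothetical sizes: 16 frames, 512 dimensions).
rng = np.random.default_rng(0)
frames = rng.normal(size=(16, 512))
prompt = rng.normal(size=512)

consistency = prompt_consistency(frames, prompt)
change = adjacent_frame_change(frames)
segments, video = multilevel_pool(frames, segment_len=4)
print(consistency.shape, change.shape, segments.shape, video.shape)
```

In a real pipeline the placeholder arrays would be replaced by embeddings from a CLIP image encoder and text encoder, and the pooled features would feed a learned quality regressor rather than being used directly.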

Taxonomy

Topics: Image and Video Quality Assessment · Visual Attention and Saliency Detection

Methods: Diffusion · Contrastive Language-Image Pre-training