VABench: A Comprehensive Benchmark for Audio-Video Generation

Daili Hua; Xizhi Wang; Bohan Zeng; Xinyi Huang; Hao Liang; Junbo Niu; Xinlong Chen; Quanqing Xu; Wentao Zhang

arXiv:2512.09299 · cs.CV · April 7, 2026



TL;DR

VABench is a new comprehensive benchmark framework designed to evaluate the quality and synchronization of audio-video generation across multiple task types and content categories.

Contribution

It introduces a multi-dimensional evaluation framework, organized into two evaluation modules covering 15 dimensions and spanning 7 content categories, for assessing models that generate synchronized audio and video.

Findings

01

Provides a systematic analysis and visualization of the evaluation results.

02

Establishes a new standard for assessing audio-video generation models.

03

Addresses a gap left by existing benchmarks, which lack convincing evaluations of synchronized audio-video outputs.

Abstract

Recent advances in video generation have been remarkable, enabling models to produce visually compelling videos with synchronized audio. While existing video generation benchmarks provide comprehensive metrics for visual quality, they lack convincing evaluations for audio-video generation, especially for models aiming to generate synchronized audio-video outputs. To address this gap, we introduce VABench, a comprehensive and multi-dimensional benchmark framework designed to systematically evaluate the capabilities of synchronous audio-video generation. VABench encompasses three primary task types: text-to-audio-video (T2AV), image-to-audio-video (I2AV), and stereo audio-video generation. It further establishes two major evaluation modules covering 15 dimensions. These dimensions specifically assess pairwise similarities (text-video, text-audio, video-audio), audio-video synchronization,…
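The abstract lists pairwise similarities (text-video, text-audio, video-audio) and audio-video synchronization among the evaluated dimensions but does not say how they are computed. The sketch below shows one common way such scores can be computed, assuming precomputed embeddings that share one space and per-frame audio and visual activity envelopes. The helpers `pairwise_scores` and `best_lag` are hypothetical and are not taken from the VABench code; its actual metrics and encoders may differ.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for a zero vector; treat as "no similarity"
    return dot / (norm_a * norm_b)


def pairwise_scores(
    text_emb: Sequence[float],
    video_emb: Sequence[float],
    audio_emb: Sequence[float],
) -> Dict[str, float]:
    """Hypothetical helper: the three pairwise similarities named in the abstract.

    Assumes all three embeddings already live in one shared space
    (e.g. produced by a joint multimodal encoder).
    """
    return {
        "text-video": cosine_similarity(text_emb, video_emb),
        "text-audio": cosine_similarity(text_emb, audio_emb),
        "video-audio": cosine_similarity(video_emb, audio_emb),
    }


def best_lag(audio_env: Sequence[float], video_env: Sequence[float], max_lag: int) -> int:
    """Hypothetical sync check: the lag (in frames) in [-max_lag, max_lag] that
    maximizes the cross-correlation of two mean-removed per-frame envelopes.

    A positive result means the audio envelope trails the video envelope.
    """
    n = min(len(audio_env), len(video_env))
    if n == 0:
        raise ValueError("envelopes must be non-empty")
    mean_a = sum(audio_env[:n]) / n
    mean_v = sum(video_env[:n]) / n
    a = [x - mean_a for x in audio_env[:n]]
    v = [x - mean_v for x in video_env[:n]]

    best_k, best_score = 0, -math.inf
    for k in range(-max_lag, max_lag + 1):
        # Correlate video frame t with audio frame t + k over the overlapping range.
        score = sum(v[t] * a[t + k] for t in range(n) if 0 <= t + k < n)
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

For example, with a visual spike at frame 2 and an audio spike at frame 4, `best_lag(audio, video, max_lag=3)` returns 2, meaning the audio lags the video by two frames. Learned synchronization models are a common alternative to hand-crafted envelopes, so treat this only as an illustration of the idea.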


Code & Models

Repository: tanabcc/VABench (GitHub)
