TL;DR
This paper introduces VADB, a large-scale video aesthetic database with multi-dimensional annotations, and VADB-Net, a dual-modal pre-training framework that advances video aesthetic assessment by leveraging rich annotations and multimodal data.
Contribution
The paper presents VADB, the largest professionally annotated video aesthetic database to date (10,490 videos annotated by 37 professionals), and VADB-Net, a dual-modal pre-training framework with a two-stage training strategy for improved aesthetic assessment.
Findings
VADB-Net outperforms existing video quality assessment models in scoring tasks.
The database enables multi-dimensional aesthetic analysis through overall and attribute-specific aesthetic scores, language comments, and objective tags.
Rich annotations facilitate downstream aesthetic assessment tasks.
Abstract
Video aesthetic assessment, a vital area in multimedia computing, integrates computer vision with human cognition. Its progress is limited by the lack of standardized datasets and robust models, as the temporal dynamics of video and multimodal fusion challenges hinder direct application of image-based methods. This study introduces VADB, the largest video aesthetic database with 10,490 diverse videos annotated by 37 professionals across multiple aesthetic dimensions, including overall and attribute-specific aesthetic scores, rich language comments and objective tags. We propose VADB-Net, a dual-modal pre-training framework with a two-stage training strategy, which outperforms existing video quality assessment models in scoring tasks and supports downstream video aesthetic assessment tasks. The dataset and source code are available at https://github.com/BestiVictory/VADB.