TL;DR
This paper introduces a deep multimodality learning framework for assessing the aesthetic quality of UAV videos. It combines several data modalities (spatial appearance, drone camera motion, and scene structure) with a novel motion stream network to improve aesthetic judgment and scene classification.
Contribution
It presents a new multistream deep learning model, including a specialized motion stream network, for UAV video aesthetic assessment. It also constructs a dataset of 6,000 UAV video shots for training and evaluation.
Findings
Outperforms traditional SVM and classification methods.
Accurately distinguishes videos shot by professional photographers from those shot by amateurs.
Enables applications such as video grading and path planning.
Abstract
Despite the growing number of unmanned aerial vehicles (UAVs) and aerial videos, few studies have examined the aesthetics of aerial videos, even though such studies could provide valuable information for improving the aesthetic quality of aerial photography. In this article, we present a deep multimodality learning method for UAV video aesthetic quality assessment. More specifically, we design a multistream framework that exploits aesthetic attributes from multiple modalities, including spatial appearance, drone camera motion, and scene structure. For this framework, we propose a novel, specially designed motion stream network. We also construct a dataset of 6,000 UAV video shots captured by drone cameras. Our model judges whether a UAV video was shot by professional photographers or by amateurs and also classifies its scene type. The experimental results reveal that our…
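The abstract does not say how the three streams are combined into one prediction. As a purely illustrative sketch, assume each stream (spatial appearance, drone camera motion, scene structure) outputs class probabilities for the same task, such as professional vs. amateur, and that these are merged by weighted late fusion. The function names, stream keys, and averaging rule below are hypothetical and are not the authors' method.

```python
from typing import Dict, List, Optional

# Hypothetical stream keys, named after the modalities listed in the abstract.
STREAMS = ("spatial_appearance", "camera_motion", "scene_structure")


def late_fusion(stream_probs: Dict[str, List[float]],
                weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Weighted average of per-stream class probabilities (an assumed fusion rule)."""
    if weights is None:
        # Default: every stream contributes equally.
        weights = {name: 1.0 for name in stream_probs}
    total = sum(weights[name] for name in stream_probs)
    n_classes = len(next(iter(stream_probs.values())))
    fused = [0.0] * n_classes
    for name, probs in stream_probs.items():
        if len(probs) != n_classes:
            raise ValueError(f"stream {name!r} has {len(probs)} classes, expected {n_classes}")
        w = weights[name] / total  # normalize so the fused scores still sum to 1
        for i, p in enumerate(probs):
            fused[i] += w * p
    return fused


def predict(fused: List[float]) -> int:
    """Return the index of the highest fused score."""
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Hypothetical per-stream outputs for the classes [amateur, professional].
    probs = {
        "spatial_appearance": [0.4, 0.6],
        "camera_motion": [0.2, 0.8],
        "scene_structure": [0.5, 0.5],
    }
    fused = late_fusion(probs)
    print([round(p, 3) for p in fused])  # [0.367, 0.633]
    print(predict(fused))  # 1, i.e. "professional"
```

Late fusion keeps each stream independently trainable. A learned fusion layer or joint feature concatenation are equally plausible alternatives, and the paper's actual choice is not stated here.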
