
TL;DR
This paper presents a data-driven approach to automatic video editing that leverages neural networks and imitation learning to select and cut footage based on learned cinematography rules, aiming to produce engaging visual stories.
Contribution
It introduces a novel method combining visual feature extraction and imitation learning for automatic video editing, mimicking professional editing principles.
Findings
At test time, the controller shows signs of observing basic cinematography editing rules
Selects footage and cuts it into a brief, coherent visual story
Learns its editing rules from a corpus of motion picture masterpieces
Abstract
Automatic video editing is implemented in a purely data-driven manner. It involves at least two steps: selecting the most valuable footage in terms of visual quality and the importance of the action filmed, and cutting that footage into a brief, coherent visual story that is interesting to watch. Visual semantic and aesthetic features are extracted by an ImageNet-trained convolutional neural network, and the editing controller is trained by an imitation learning algorithm. As a result, at test time the controller shows signs of observing basic cinematography editing rules learned from a corpus of motion picture masterpieces.
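The abstract does not say which imitation learning algorithm trains the controller, so the following is only a minimal sketch of the general idea, using behavior cloning (supervised imitation of expert decisions) as a stand-in. Everything in it is an assumption made for illustration: the per-frame feature vectors are random placeholders for CNN features, the "expert" is a hypothetical rule that cuts whenever the first feature is positive, and the function names (`make_demo`, `train_behavior_cloning`, and so on) are invented here rather than taken from the paper.

```python
import math
import random


def make_demo(num_frames, dim, rng):
    """Build a synthetic demonstration (NOT real film data).

    Each frame gets a random feature vector, a placeholder for CNN features.
    The hypothetical expert cuts (action 1) when the first feature is positive.
    """
    feats = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_frames)]
    actions = [1 if f[0] > 0.0 else 0 for f in feats]
    return feats, actions


def predict_proba(weights, bias, feat):
    """Probability that the controller cuts at this frame (logistic model)."""
    z = sum(w * x for w, x in zip(weights, feat)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_behavior_cloning(feats, actions, epochs=200, lr=0.5):
    """Fit the controller to the expert's actions by full-batch gradient descent
    on the logistic (cross-entropy) loss."""
    dim = len(feats[0])
    n = len(feats)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for feat, act in zip(feats, actions):
            err = predict_proba(weights, bias, feat) - act
            for i in range(dim):
                grad_w[i] += err * feat[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def accuracy(weights, bias, feats, actions):
    """Fraction of frames where the controller matches the expert's action."""
    hits = 0
    for feat, act in zip(feats, actions):
        pred = 1 if predict_proba(weights, bias, feat) >= 0.5 else 0
        hits += int(pred == act)
    return hits / len(feats)


if __name__ == "__main__":
    rng = random.Random(0)
    train_feats, train_actions = make_demo(300, 4, rng)
    held_out_feats, held_out_actions = make_demo(100, 4, rng)
    w, b = train_behavior_cloning(train_feats, train_actions)
    print("agreement with expert on held-out frames:",
          accuracy(w, b, held_out_feats, held_out_actions))
```

In a real pipeline the placeholder vectors would be replaced by features from a pretrained network, and the expert actions would come from cut points observed in the film corpus. The sketch only shows how a controller can be fitted to demonstrated editing decisions.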
