V-Trans4Style: Visual Transition Recommendation for Video Production Style Adaptation
Pooja Guhan, Tsung-Wei Huang, Guan-Ming Su, Subhadra Gopalakrishnan, Dinesh Manocha

TL;DR
V-Trans4Style is a transformer-based algorithm that recommends visually seamless transitions for videos so that they can be adapted to different production styles. It shows improved accuracy on a newly introduced dataset.
Contribution
The paper introduces a novel transformer-based model with style conditioning for adaptive video transition recommendation, outperforming existing methods and providing a new dataset for style-specific evaluation.
Findings
Outperforms state-of-the-art methods by 10-80% on Recall@K and mean rank
Style conditioning improves style similarity by around 12%
Introduces AutoTransition++ dataset with 6k videos categorized by style
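For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how Recall@K and mean rank are commonly computed for a recommendation task. It assumes each test example comes with one score per candidate transition and a single ground-truth index; the function names and the tie-breaking rule are illustrative choices, and the paper's exact evaluation protocol is not given in this summary.

```python
from typing import Sequence


def rank_of_truth(scores: Sequence[float], truth: int) -> int:
    """1-based rank of the ground-truth candidate under descending score order."""
    target = scores[truth]
    # Count candidates scored strictly higher; ties resolve in favour of the truth
    # (an assumption made here for simplicity).
    return 1 + sum(1 for s in scores if s > target)


def recall_at_k(all_scores: Sequence[Sequence[float]], truths: Sequence[int], k: int) -> float:
    """Fraction of examples whose ground truth appears in the top k candidates."""
    ranks = [rank_of_truth(s, t) for s, t in zip(all_scores, truths)]
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_rank(all_scores: Sequence[Sequence[float]], truths: Sequence[int]) -> float:
    """Average 1-based rank of the ground truth (lower is better)."""
    ranks = [rank_of_truth(s, t) for s, t in zip(all_scores, truths)]
    return sum(ranks) / len(ranks)


# Toy data: three examples, three candidate transitions each (made-up numbers).
scores = [
    [0.1, 0.7, 0.2],  # truth is index 1, ranked 1st
    [0.5, 0.3, 0.2],  # truth is index 1, ranked 2nd
    [0.2, 0.3, 0.5],  # truth is index 0, ranked 3rd
]
truths = [1, 1, 0]

r1 = recall_at_k(scores, truths, k=1)
r2 = recall_at_k(scores, truths, k=2)
mr = mean_rank(scores, truths)
```

Higher Recall@K and lower mean rank both indicate that the correct transition is placed nearer the top of the recommended list.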
Abstract
We introduce V-Trans4Style, an innovative algorithm tailored for dynamic video content editing needs. It is designed to adapt videos to different production styles like documentaries, dramas, feature films, or a specific YouTube channel's video-making technique. Our algorithm recommends optimal visual transitions to help achieve this flexibility using a more bottom-up approach. We first employ a transformer-based encoder-decoder network to learn to recommend temporally consistent and visually seamless sequences of visual transitions using only the input videos. We then introduce a style conditioning module that leverages this model to iteratively adjust the visual transitions obtained from the decoder through activation maximization. We demonstrate the efficacy of our method through experiments conducted on our newly introduced AutoTransition++ dataset. It is a 6k video version of…
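The style conditioning step described above relies on activation maximization, which in general means adjusting an input by gradient ascent so that a fixed scoring model responds more strongly. The sketch below illustrates only that general idea on a toy problem and is not the authors' module: the logistic `style_score`, the weights `w` and `b`, the proximity penalty `lam`, and the vector `z0` standing in for a decoder output are all hypothetical.

```python
import math
from typing import List, Sequence


def style_score(z: Sequence[float], w: Sequence[float], b: float) -> float:
    """Hypothetical style scorer: logistic function of a linear projection."""
    logit = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1.0 / (1.0 + math.exp(-logit))


def maximize_activation(
    z0: Sequence[float],
    w: Sequence[float],
    b: float,
    steps: int = 50,
    lr: float = 0.5,
    lam: float = 0.1,
) -> List[float]:
    """Gradient ascent on style_score(z) - (lam / 2) * ||z - z0||^2.

    The penalty keeps the adjusted vector close to the starting point z0,
    so the result is a nudge toward the style rather than a full replacement.
    """
    z = list(z0)
    for _ in range(steps):
        s = style_score(z, w, b)
        # d(score)/dz = s * (1 - s) * w ; d(penalty)/dz = lam * (z - z0)
        z = [
            zi + lr * (s * (1.0 - s) * wi - lam * (zi - z0i))
            for zi, wi, z0i in zip(z, w, z0)
        ]
    return z


# Toy usage: a 2-D stand-in for a transition embedding (made-up numbers).
z0 = [0.0, 0.0]
w, b = [1.0, -1.0], 0.0
z_adapted = maximize_activation(z0, w, b)
before = style_score(z0, w, b)
after = style_score(z_adapted, w, b)
```

In this toy setting the score starts at 0.5 and rises after the iterations, while the penalty term limits how far the vector drifts from its starting value.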
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Video Analysis and Summarization
