Automating Video Thumbnails Selection and Generation with Multimodal and Multistage Analysis
Elia Fantini

TL;DR
This thesis introduces an automated, multimodal, multistage pipeline for selecting and generating video thumbnails that meet aesthetic and representational criteria, improving efficiency and user engagement.
Contribution
It presents a novel multistage pipeline combining state-of-the-art models and large language models for automated thumbnail selection and generation, with a GUI tool for rapid evaluation.
Findings
Over 53% of proposed thumbnails matched professional choices.
Participants preferred our method's thumbnails 45.77% of the time.
Professionals saw a 3.57-fold increase in valid candidates.
Abstract
This thesis presents an innovative approach to automate video thumbnail selection for traditional broadcast content. Our methodology establishes stringent criteria for diverse, representative, and aesthetically pleasing thumbnails, considering factors like logo placement space, incorporation of vertical aspect ratios, and accurate recognition of facial identities and emotions. We introduce a sophisticated multistage pipeline that can select candidate frames or generate novel images by blending video elements or using diffusion models. The pipeline incorporates state-of-the-art models for various tasks, including downsampling, redundancy reduction, automated cropping, face recognition, closed-eye and emotion detection, shot scale and aesthetic prediction, segmentation, matting, and harmonization. It also leverages large language models and visual transformers for semantic consistency. A…
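To make the multistage idea concrete, here is a minimal sketch of a candidate-selection cascade, not the thesis's implementation. It assumes frames are flattened grayscale pixel lists and covers only three of the stages named above: downsampling, redundancy reduction, and ranking. Every function name, threshold, and default is hypothetical, and the population standard deviation of pixel values merely stands in for a learned aesthetic predictor.

```python
from statistics import pstdev
from typing import Callable, List, Sequence

Frame = Sequence[float]  # hypothetical representation: a flattened grayscale frame


def downsample(frames: List[Frame], step: int) -> List[Frame]:
    """Stage 1: temporal downsampling, keep every `step`-th frame."""
    return frames[::step]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equal-sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def drop_redundant(frames: List[Frame], min_diff: float) -> List[Frame]:
    """Stage 2: drop frames that are too similar to the last kept frame."""
    kept: List[Frame] = []
    for frame in frames:
        if not kept or mean_abs_diff(kept[-1], frame) >= min_diff:
            kept.append(frame)
    return kept


def rank(frames: List[Frame], score: Callable[[Frame], float], top_k: int) -> List[Frame]:
    """Stage 3: order frames by a scoring function and keep the best `top_k`."""
    return sorted(frames, key=score, reverse=True)[:top_k]


def select_candidates(
    frames: List[Frame],
    step: int = 2,
    min_diff: float = 10.0,
    top_k: int = 2,
    score: Callable[[Frame], float] = pstdev,  # placeholder for an aesthetic model
) -> List[Frame]:
    """Run the three stages in order, cheapest first."""
    return rank(drop_redundant(downsample(frames, step), min_diff), score, top_k)


if __name__ == "__main__":
    toy_video = [
        [0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [5, 5, 5, 5],
        [0, 100, 0, 100], [9, 9, 9, 9], [0, 50, 0, 50], [0, 50, 0, 50],
    ]
    for candidate in select_candidates(toy_video):
        print(candidate)
```

Ordering the stages from cheapest to most expensive means that costly per-frame models, such as face recognition or aesthetic prediction, only run on the frames that survive the earlier filters.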
Taxonomy
Topics: Video Analysis and Summarization · Human Motion and Animation · Advanced Image and Video Retrieval Techniques
Methods: Diffusion
