Automatic Funny Scene Extraction from Long-form Cinematic Videos
Sibendu Paul, Haotian Jiang, Caren Chen

TL;DR
This paper presents an end-to-end system for automatically extracting and ranking humorous scenes from long cinematic videos, improving scene detection and humor tagging to enhance content creation and user engagement.
Contribution
The paper introduces multimodal scene localization and humor tagging methods tailored to cinematic content, and reports significant improvements over existing techniques.
Findings
18.3% AP improvement in scene detection
F1 score of 0.834 for humor detection
87% of extracted clips are humorous
Abstract
Automatically extracting engaging and high-quality humorous scenes from cinematic titles is pivotal for creating captivating video previews and snackable content, boosting user engagement on streaming platforms. Long-form cinematic titles, with their extended duration and complex narratives, challenge scene localization, while humor's reliance on diverse modalities and its nuanced style add further complexity. This paper introduces an end-to-end system for automatically identifying and ranking humorous scenes from long-form cinematic titles, featuring shot detection, multimodal scene localization, and humor tagging optimized for cinematic content. Key innovations include a novel scene segmentation approach combining visual and textual cues, improved shot representations via guided triplet mining, and a multimodal humor tagging framework leveraging both audio and text. Our system…
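The abstract mentions improved shot representations via guided triplet mining but does not describe how triplets are selected. As a rough illustration only, the sketch below uses a standard triplet margin loss and a hypothetical mining rule: shots from the same scene serve as positives and shots from a different scene as negatives. The function names, the toy embeddings, and the mining rule are all assumptions made for illustration, not the authors' method.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def mine_triplets(scene_ids):
    """Hypothetical mining rule (an assumption, not from the paper):
    for each anchor shot, every other shot in the same scene is a positive
    and every shot in a different scene is a negative."""
    triplets = []
    n = len(scene_ids)
    for a in range(n):
        for p in range(n):
            if p == a or scene_ids[p] != scene_ids[a]:
                continue
            for neg in range(n):
                if scene_ids[neg] != scene_ids[a]:
                    triplets.append((a, p, neg))
    return triplets

# Toy 2-D shot embeddings: shots 0 and 1 belong to scene 0, shot 2 to scene 1.
embeddings = [(0.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
scene_ids = [0, 0, 1]

triplets = mine_triplets(scene_ids)
losses = [
    triplet_margin_loss(embeddings[a], embeddings[p], embeddings[n])
    for a, p, n in triplets
]
```

With these toy values the negative is already far from both anchors, so every loss is zero at the default margin. Raising the margin makes the same triplets contribute a positive loss, which is the signal a trainer would use to pull same-scene shots together and push different-scene shots apart.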
Taxonomy
Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Humor Studies and Applications
