From Trailers to Storylines: An Efficient Way to Learn from Movies
Qingqiu Huang, Yuanjun Xiong, Yu Xiong, Yuqi Zhang, Dahua Lin

TL;DR
This paper introduces a novel framework for learning visual and temporal features from movies by using trailers for visual learning and full movies for temporal analysis, significantly reducing training costs.
Contribution
It proposes learning the visual module and the temporal module from different data sources (trailers and full movies, respectively), enabling efficient learning from long movies.
Findings
Reduces training time substantially
Learns effective visual features from trailers
Preserves long-term temporal structures
Abstract
The millions of movies produced throughout human history are a valuable resource for computer vision research. However, learning a vision model from movie data poses serious difficulties. A major obstacle is the computational cost -- a movie is often over an hour long, substantially longer than the short video clips that previous studies mostly focus on. In this paper, we explore an alternative approach to learning vision models from movies. Specifically, we consider a framework comprising a visual module and a temporal analysis module. Unlike conventional learning methods, the proposed approach learns these modules from different sets of data -- the former from trailers and the latter from movies. This allows distinctive visual features to be learned within a reasonable budget while still preserving long-term temporal structures across an entire movie. We…
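The two-module split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, dimensions, random-projection "visual module", and plain recurrent "temporal module" are all hypothetical stand-ins, chosen only to show how a feature extractor learned on one data source can be frozen and applied once per shot, so that the temporal model only ever sees a sequence of compact features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
FRAME_DIM = 512    # flattened per-shot visual input
FEAT_DIM = 32      # per-shot feature size
HIDDEN_DIM = 16    # temporal state size
NUM_SHOTS = 200    # number of shots in one "movie"


class VisualModule:
    """Stand-in for a network trained on trailers (assumption).

    Here it is just a fixed random projection followed by tanh.
    """

    def __init__(self, frame_dim, feat_dim):
        self.w = rng.standard_normal((frame_dim, feat_dim)) / np.sqrt(frame_dim)

    def extract(self, frames):
        # frames: (num_shots, frame_dim) -> (num_shots, feat_dim)
        return np.tanh(frames @ self.w)


class TemporalModule:
    """Stand-in for a temporal model over a full movie (assumption).

    A plain recurrent pass over cached shot features.
    """

    def __init__(self, feat_dim, hidden_dim):
        self.hidden_dim = hidden_dim
        self.w_in = rng.standard_normal((feat_dim, hidden_dim)) / np.sqrt(feat_dim)
        self.w_h = rng.standard_normal((hidden_dim, hidden_dim)) / np.sqrt(hidden_dim)

    def summarize(self, feats):
        # feats: (num_shots, feat_dim) -> (hidden_dim,)
        h = np.zeros(self.hidden_dim)
        for x in feats:
            h = np.tanh(x @ self.w_in + h @ self.w_h)
        return h


visual = VisualModule(FRAME_DIM, FEAT_DIM)
temporal = TemporalModule(FEAT_DIM, HIDDEN_DIM)

# Synthetic "movie": one random vector per shot.
movie = rng.standard_normal((NUM_SHOTS, FRAME_DIM))

# Features are extracted once with the frozen visual module and cached.
feats = visual.extract(movie)

# The temporal module works on the compact feature sequence only.
summary = temporal.summarize(feats)
print(feats.shape, summary.shape)
```

In this sketch the expensive per-frame computation happens once, outside the temporal loop, which is the intuition behind keeping the cost of processing a full-length movie manageable.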
Taxonomy
Topics: Video Analysis and Summarization · Human Pose and Action Recognition · Multimodal Machine Learning Applications
