Long-range Multimodal Pretraining for Movie Understanding
Dawit Mureja Argaw, Joon-Young Lee, Markus Woodson, In So Kweon, Fabian Caba Heilbron

TL;DR
This paper introduces a long-range multimodal pretraining approach for movies that improves transferability and achieves state-of-the-art performance across multiple movie understanding benchmarks.
Contribution
The work presents a new pretraining strategy and a model that learn long-range relationships across all modalities in a movie, improving transferability and data efficiency.
Findings
Achieves state-of-the-art results on several LVU benchmark tasks.
Demonstrates significant data efficiency over previous methods.
Sets new state-of-the-art in five different transfer benchmarks.
Abstract
Learning computer vision models from (and for) movies has a long-standing history. While great progress has been attained, there is still a need for a pretrained multimodal model that can perform well in the ever-growing set of movie understanding tasks the community has been establishing. In this work, we introduce Long-range Multimodal Pretraining, a strategy and a model that leverage movie data to train transferable multimodal and cross-modal encoders. Our key idea is to learn from all modalities in a movie by observing and extracting relationships over a long range. After pretraining, we run ablation studies on the LVU benchmark and validate our modeling choices and the importance of learning from long-range time spans. Our model achieves state-of-the-art on several LVU tasks while being much more data efficient than previous works. Finally, we evaluate our model's transferability…
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques
