MovieCLIP: Visual Scene Recognition in Movies

Digbalay Bose; Rajat Hebbar; Krishna Somandepalli; Haoyang Zhang; Yin Cui; Kree Cole-McLaughlin; Huisheng Wang; Shrikanth Narayanan

arXiv:2210.11065·cs.CV·October 25, 2022

TL;DR

This paper introduces MovieCLIP, a large-scale dataset and model for recognizing complex visual scenes in movies, leveraging weak supervision from CLIP to improve scene understanding and downstream tasks.

Contribution

The work curates a new, extensive movie-centric scene taxonomy and uses CLIP to build a weakly labeled dataset, enabling improved visual scene recognition in movies.

Findings

01. A weakly labeled dataset of 1.12 million shots from 32K movie clips.
02. Baseline models trained on MovieCLIP outperform previous methods.
03. Features from MovieCLIP enhance downstream scene and genre classification.

Abstract

Longform media such as movies have complex narrative structures, with events spanning a rich variety of ambient visual scenes. Domain-specific challenges associated with visual scenes in movies include transitions, person coverage, and a wide array of real-life and fictional scenarios. Existing visual scene datasets in movies have limited taxonomies and do not account for visual scene transitions within movie clips. In this work, we address the problem of visual scene recognition in movies by first automatically curating a new and extensive movie-centric taxonomy of 179 scene labels derived from movie scripts and auxiliary web-based video datasets. Instead of manual annotations, which can be expensive, we use CLIP to weakly label 1.12 million shots from 32K movie clips based on our proposed taxonomy. We provide baseline visual models trained on the weakly labeled dataset called MovieCLIP…
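The abstract says CLIP is used to weakly label shots against the 179-label taxonomy but does not give the details. The following is a minimal sketch of CLIP-style zero-shot weak labeling under stated assumptions, not the authors' pipeline. It assumes image embeddings for a few frames of a shot and text embeddings for each label prompt have already been computed with a CLIP image and text encoder. The function names, the frame averaging, `logit_scale=100.0` (roughly CLIP's learned temperature), and the top-k selection are all hypothetical choices for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weak_label_shot(frame_embs, label_embs, labels, logit_scale=100.0, top_k=5):
    """Hypothetical weak labeling: average per-frame label distributions over a shot.

    frame_embs: CLIP image embeddings for sampled frames of one shot (assumed precomputed).
    label_embs: CLIP text embeddings, one per taxonomy label prompt (assumed precomputed).
    Returns the top_k (label, score) pairs; scores over all labels sum to 1.
    """
    shot_scores = [0.0] * len(labels)
    for frame in frame_embs:
        # Scaled cosine similarity to every label prompt, as in CLIP zero-shot classification.
        logits = [logit_scale * cosine(frame, text) for text in label_embs]
        for i, p in enumerate(softmax(logits)):
            shot_scores[i] += p / len(frame_embs)
    ranked = sorted(zip(labels, shot_scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]
```

Averaging frame-level distributions is one simple way to get a single score per shot; the real system may aggregate, threshold, or prompt differently.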


Code & Models

Repositories

usc-sail/mica-MovieCLIP (PyTorch, official)

Videos

MovieCLIP: Visual Scene Recognition in Movies (YouTube)

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Language-Image Pre-training
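Contrastive Language-Image Pre-training (CLIP) is the method MovieCLIP builds on; it is not introduced by this paper. As a sketch of the standard CLIP objective, the symmetric contrastive loss over a batch of matched image/text pairs is shown below in plain Python. The function names are hypothetical, and `temperature=0.07` follows the initial value reported for CLIP; in practice the temperature is learned and the embeddings come from trained encoders.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cross_entropy(logits, target):
    """Cross-entropy of a single row of logits against an integer target index."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss: pair i (image i, text i) is the positive for row and column i."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # Similarity matrix: logits[i][j] = cos(image_i, text_j) / temperature.
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
              for i in range(n)]
    # Image-to-text: each row should pick its own text.
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image: each column should pick its own image.
    loss_t2i = sum(cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2
```

Aligned pairs yield a loss near zero, while shuffled pairs are penalized; this pull-together/push-apart structure is what later makes zero-shot label scoring with text prompts possible.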