TCAM: Temporal Class Activation Maps for Object Localization in Weakly-Labeled Unconstrained Videos
Soufiane Belharbi, Ismail Ben Ayed, Luke McCaffrey, Eric Granger

TL;DR
This paper introduces TCAM, a novel method leveraging temporal class activation maps for improved weakly-supervised object localization in unconstrained videos, achieving state-of-the-art accuracy with real-time processing capabilities.
Contribution
The paper proposes a new Temporal CAM approach that aggregates frame-wise CAMs for better spatio-temporal localization in videos, using pseudo-labels and constraints for enhanced accuracy.
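The aggregation of frame-wise CAMs can be illustrated with a small sketch. It assumes the aggregation is an element-wise maximum over a sliding window of consecutive CAMs, which is suggested by the "Max Pooling" method tag but not spelled out in this summary. The function name `temporal_max_pool_cams` and the `window` parameter are hypothetical, not taken from the paper's code.

```python
import numpy as np


def temporal_max_pool_cams(cams, window):
    """Aggregate per-frame CAMs over time with an element-wise maximum.

    Assumption (not confirmed by the summary): frame t is aggregated with
    up to `window` frames that precede it.

    cams   : array of shape (T, H, W), one activation map per frame.
    window : number of previous frames to include for each frame.
    Returns an array of shape (T, H, W).
    """
    cams = np.asarray(cams, dtype=float)
    out = np.empty_like(cams)
    for t in range(cams.shape[0]):
        start = max(0, t - window)  # clip the window at the start of the video
        out[t] = cams[start:t + 1].max(axis=0)
    return out


# Three tiny 1x2 "frames": activation moves from the left pixel to the right.
example = np.array([[[1.0, 0.0]], [[0.0, 2.0]], [[0.0, 0.0]]])
pooled = temporal_max_pool_cams(example, window=1)
```

With `window=1`, each output map keeps the strongest activation seen in the current and previous frame, so a region that fades in one frame can still be recovered from its neighbour.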
Findings
Achieves state-of-the-art localization accuracy on YouTube-Objects datasets.
Enables real-time object localization by processing independent frames in parallel.
Demonstrates adaptability for tasks like object tracking and detection.
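Because frames are treated independently at inference, their CAMs can be computed in a single batched operation. The sketch below uses the classic CAM formulation (a class-weighted sum of the last convolutional feature maps), which may differ from the architecture used in the paper. The names `batched_cams`, `features`, and `class_weights` are illustrative assumptions.

```python
import numpy as np


def batched_cams(features, class_weights, class_idx):
    """Compute classic CAMs for a batch of independent frames at once.

    features      : (N, K, H, W) feature maps for N frames, K channels.
    class_weights : (C, K) classifier weights for C classes.
    class_idx     : index of the class to localize.
    Returns (N, H, W) maps, min-max normalized per frame to [0, 1].
    """
    w = class_weights[class_idx]                      # (K,)
    cams = np.einsum("k,nkhw->nhw", w, features)      # weighted channel sum
    mins = cams.min(axis=(1, 2), keepdims=True)
    rng = cams.max(axis=(1, 2), keepdims=True) - mins
    # Avoid division by zero for constant maps.
    return (cams - mins) / np.where(rng > 0, rng, 1.0)


# Two frames, two channels, 1x2 spatial grid (toy values).
feats = np.arange(8, dtype=float).reshape(2, 2, 1, 2)
weights = np.array([[1.0, 0.0], [0.0, 1.0]])
maps = batched_cams(feats, weights, class_idx=1)
```

No frame depends on another in this computation, which is what makes parallel, per-frame inference possible.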
Abstract
Weakly supervised video object localization (WSVOL) allows locating objects in videos using only global video tags such as the object class. State-of-the-art methods rely on multiple independent stages, where initial spatio-temporal proposals are generated using visual and motion cues, then prominent objects are identified and refined. Localization is done by solving an optimization problem over one or more videos, and video tags are typically used for video clustering. This requires a model per video or per class, which makes inference costly. Moreover, localized regions are not necessarily discriminant because of unsupervised motion methods like optical flow, or because video tags are discarded from optimization. In this paper, we leverage the successful class activation mapping (CAM) methods, designed for WSOL based on still images. A new Temporal CAM (TCAM) method is introduced to train a…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Robotics and Sensor-Based Localization
Methods: Contrastive Language-Image Pre-training · Class-activation map · Max Pooling · Conditional Random Field
