PIVOT: Prompting for Video Continual Learning

Andrés Villa; Juan León Alcázar; Motasem Alfarra; Kumail Alhamoud; Julio Hurtado; Fabian Caba Heilbron; Alvaro Soto; Bernard Ghanem

arXiv:2212.04842·cs.CV·April 6, 2023

TL;DR

PIVOT is a novel prompting-based method for video continual learning that leverages pre-trained image models, reducing training complexity and forgetting, and significantly outperforms existing methods on ActivityNet.

Contribution

Introduces PIVOT, the first prompting approach for video continual learning that uses pre-trained image models without in-domain pre-training.

Findings

01. PIVOT improves on the state of the art by 27% on ActivityNet.

02. Effective use of prompting reduces forgetting in continual learning.

03. PIVOT leverages pre-trained image models to reduce the number of trainable parameters.

Abstract

Modern machine learning pipelines are limited due to data availability, storage quotas, privacy regulations, and expensive annotation processes. These constraints make it difficult or impossible to train and update large-scale models on such dynamic annotated sets. Continual learning directly approaches this problem, with the ultimate goal of devising methods where a deep neural network effectively learns relevant patterns for new (unseen) classes, without significantly altering its performance on previously learned ones. In this paper, we address the problem of continual learning for video data. We introduce PIVOT, a novel method that leverages extensive knowledge in pre-trained models from the image domain, thereby reducing the number of trainable parameters and the associated forgetting. Unlike previous methods, ours is the first approach that effectively uses prompting mechanisms…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition