Open-World Dynamic Prompt and Continual Visual Representation Learning

Youngeun Kim; Jun Fang; Qin Zhang; Zhaowei Cai; Yantao Shen; Rahul Duggal; Dripta S. Raychaudhuri; Zhuowen Tu; Yifan Xing; Onkar Dabeer

arXiv:2409.05312 · cs.CV · October 1, 2024

TL;DR

This paper introduces DPaRL, a prompt-based continual learning method for open-world visual representation, which dynamically generates prompts and learns discriminative features to improve image retrieval in evolving environments.

Contribution

The paper proposes a novel dynamic prompt generation approach for continual learning, enhancing open-world visual representation learning beyond static prompt methods.

Findings

01. DPaRL outperforms state-of-the-art methods by an average of 4.7% in Recall@1 on open-world image retrieval benchmarks.

02. Dynamic prompts improve generalization to unseen classes.

03. Joint learning of prompts and representations enhances retrieval accuracy.
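For readers unfamiliar with the Recall@1 metric cited in the findings, the following is a minimal sketch of how it is commonly computed for image retrieval: each test embedding is used as a query, and the query counts as a hit when its nearest other embedding shares its class label. The function names, the use of cosine similarity, and the toy data are assumptions made for illustration; the paper's exact evaluation protocol is not given on this page.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_1(embeddings, labels):
    """Fraction of queries whose nearest other item has the same label."""
    hits = 0
    for i, query in enumerate(embeddings):
        best_j, best_sim = None, -math.inf
        for j, candidate in enumerate(embeddings):
            if j == i:
                continue  # a query never retrieves itself
            sim = cosine(query, candidate)
            if sim > best_sim:
                best_j, best_sim = j, sim
        hits += labels[best_j] == labels[i]
    return hits / len(embeddings)


# Toy data (hypothetical): five 2-D embeddings from two classes.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.6, 0.8]]
labels = ["a", "a", "b", "b", "a"]

score = recall_at_1(embeddings, labels)
print(f"Recall@1 = {score:.2f}")
```

In this toy set the last embedding is labeled "a" but lies closest to a "b" item, so four of five queries are hits.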

Abstract

The open world is inherently dynamic, characterized by ever-evolving concepts and distributions. Continual learning (CL) in this dynamic open-world environment presents a significant challenge in effectively generalizing to unseen test-time classes. To address this challenge, we introduce a new practical CL setting tailored for open-world visual representation learning. In this setting, subsequent data streams systematically introduce novel classes that are disjoint from those seen in previous training phases, while also remaining distinct from the unseen test classes. In response, we present Dynamic Prompt and Representation Learner (DPaRL), a simple yet effective Prompt-based CL (PCL) method. Our DPaRL learns to generate dynamic prompts for inference, as opposed to relying on a static prompt pool in previous PCL methods. In addition, DPaRL jointly learns dynamic prompt generation and…
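The contrast the abstract draws between a static prompt pool and dynamically generated prompts can be illustrated with a toy sketch. This is not the DPaRL architecture, which this page does not describe: the key-matched pool lookup is one common form of static prompt pool, and the single linear map used as a "generator" is a stand-in assumption. All names, shapes, and values below are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def select_static_prompt(query, keys, pool):
    """Static pool: return the stored prompt whose key best matches the query."""
    best = max(range(len(keys)), key=lambda i: cosine(query, keys[i]))
    return pool[best]


def generate_dynamic_prompt(query, weights, num_tokens, token_dim):
    """Dynamic generation (assumed linear map): compute prompt tokens from the query."""
    flat = [sum(w * q for w, q in zip(row, query)) for row in weights]
    return [flat[t * token_dim:(t + 1) * token_dim] for t in range(num_tokens)]


NUM_TOKENS, TOKEN_DIM, FEATURE_DIM = 2, 3, 4

# Hypothetical pool of three fixed prompts, each NUM_TOKENS x TOKEN_DIM.
keys = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
pool = [[[float(p)] * TOKEN_DIM for _ in range(NUM_TOKENS)] for p in range(3)]

# Hypothetical generator weights; in practice these would be learned.
weights = [[0.1 * (r + c) for c in range(FEATURE_DIM)]
           for r in range(NUM_TOKENS * TOKEN_DIM)]

# Two similar but distinct query features.
query_a = [0.9, 0.1, 0.0, 0.0]
query_b = [0.8, 0.2, 0.0, 0.0]

static_a = select_static_prompt(query_a, keys, pool)
static_b = select_static_prompt(query_b, keys, pool)
dynamic_a = generate_dynamic_prompt(query_a, weights, NUM_TOKENS, TOKEN_DIM)
dynamic_b = generate_dynamic_prompt(query_b, weights, NUM_TOKENS, TOKEN_DIM)

print("same static prompt:", static_a is static_b)
print("same dynamic prompt:", dynamic_a == dynamic_b)
```

Both queries match the same key, so the pool returns the identical stored prompt for each, whereas the generator produces a different prompt per input. That input-specific behavior is the property the abstract attributes to dynamic prompts.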

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Multimodal Machine Learning Applications