Caption, Create, Continue: Continual Learning with Pre-trained Generative Vision-Language Models
Indu Solomon, Aye Phyu Phyu Aung, Uttam Kumar, Senthilnath Jayavelu

TL;DR
This paper introduces CLTS, a continual learning framework that uses pre-trained vision-language models to mitigate forgetting without storing real data, improving average task accuracy by up to 54% and memory efficiency by 63 times over recent baselines.
Contribution
CLTS is a novel continual learning approach that leverages generative models and task routing to reduce data storage needs and improve performance.
Findings
Up to 54% improvement in average task accuracy
63 times better memory efficiency than recent baselines
Effective handling of class-incremental tasks without real data storage
Abstract
Continual learning (CL) enables models to adapt to evolving data streams without catastrophic forgetting, a fundamental requirement for real-world AI systems. However, current methods often depend on large replay buffers or heavily annotated datasets, which are impractical due to storage, privacy, and cost constraints. We propose CLTS (Continual Learning via Text-Image Synergy), a novel class-incremental framework that mitigates forgetting without storing real task data. CLTS leverages two pre-trained generative models: BLIP (Bootstrapping Language-Image Pre-training) for caption generation and Stable Diffusion for sample generation. Each task is handled by a dedicated Task Head, while a Task Router learns to assign inputs to the correct Task Head using the generated data. On three benchmark datasets, CLTS improves average task accuracy by up to 54% and achieves 63 times better…
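The routing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifiers, the plain feature vectors standing in for images, and the names `CentroidClassifier`, `RoutedLearner`, and `learn_task` are all assumptions made for illustration. It also assumes that each Task Head is fit on the current task's data, which is then discarded, and that the router sees only generated samples (here, hand-written vectors standing in for Stable Diffusion outputs).

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CentroidClassifier:
    """Hypothetical stand-in for a learned head: nearest class centroid."""

    def __init__(self) -> None:
        self.centroids: Dict[int, List[float]] = {}

    def fit(self, samples_by_label: Dict[int, List[Vector]]) -> None:
        # Adds or overwrites one centroid per label; earlier labels are kept.
        for label, vectors in samples_by_label.items():
            self.centroids[label] = mean(vectors)

    def predict(self, x: Vector) -> int:
        return min(self.centroids, key=lambda k: sq_dist(self.centroids[k], x))


class RoutedLearner:
    """One head per task, plus a router that picks the head at inference."""

    def __init__(self) -> None:
        self.router = CentroidClassifier()  # labels are task ids
        self.heads: Dict[int, CentroidClassifier] = {}

    def learn_task(
        self,
        task_id: int,
        task_data: Dict[int, List[Vector]],   # assumed: used once, not stored
        generated: Dict[int, List[Vector]],   # stand-in for generated samples
    ) -> None:
        head = CentroidClassifier()
        head.fit(task_data)
        self.heads[task_id] = head
        # The router is updated from generated samples only.
        pooled = [v for vectors in generated.values() for v in vectors]
        self.router.fit({task_id: pooled})

    def predict(self, x: Vector) -> int:
        task_id = self.router.predict(x)       # step 1: choose a Task Head
        return self.heads[task_id].predict(x)  # step 2: classify within task
```

In this sketch, calling `learn_task` once per task and then `predict` on a new input first selects a task and then a class within that task, so no task identity is needed at test time, which is the class-incremental setting the abstract describes.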
Taxonomy
Topics: Problem and Project Based Learning · Intelligent Tutoring Systems and Adaptive Learning
Methods: Diffusion
