Continually Learn to Map Visual Concepts to Large Language Models in Resource-constrained Environments

Clea Rebillard; Julio Hurtado; Andrii Krutsylo; Lucia Passaro; Vincenzo Lomonaco

arXiv:2407.08279 · cs.AI · July 30, 2025

Open Access

TL;DR

This paper introduces CVM, a continual learning method that maps visual representations into a language-based knowledge space, enabling resource-efficient, robust visual learning on constrained devices.

Contribution

CVM is a novel approach that leverages fixed large language models to improve continual visual learning in resource-limited environments.

Findings

01. CVM outperforms state-of-the-art continual learning methods on five benchmarks.

02. CVM enables effective visual concept mapping without large visual model updates.

03. CVM demonstrates robustness and generalization in resource-constrained settings.

Abstract

Learning continually from a stream of non-i.i.d. data is an open challenge in deep learning, even more so when working in resource-constrained environments such as embedded devices. Visual models that are continually updated through supervised learning are often prone to overfitting, catastrophic forgetting, and biased representations. On the other hand, large language models contain knowledge about multiple concepts and their relations, which can foster a more robust, informed, and coherent learning process. This work proposes Continual Visual Mapping (CVM), an approach that continually grounds vision representations in a knowledge space extracted from a fixed language model. Specifically, CVM continually trains a small and efficient visual model to map its representations into a conceptual space established by a fixed large language model. Due to their smaller nature, CVM can be used…
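The mapping idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the hand-written `CONCEPTS` vectors stand in for embeddings that a frozen language model would produce, and the linear `LinearMapper`, the squared-error loss, the `PROTOTYPES` toy features, and the two-task stream are all assumptions chosen to keep the example runnable with the Python standard library alone. Only the mapper is trained. The concept space never changes, and classes are predicted by cosine similarity to the fixed concept vectors.

```python
import random

# Hypothetical stand-in for embeddings from a frozen language model.
# Real concept vectors would come from encoding class names with an LLM.
CONCEPTS = {
    "cat":   [1.0, 0.0, 0.0],
    "dog":   [0.0, 1.0, 0.0],
    "truck": [0.0, 0.0, 1.0],
}

# Hypothetical toy "visual features": one prototype per class.
PROTOTYPES = {
    "cat":   [1.0, 0.0, 0.0, 0.0],
    "dog":   [0.0, 1.0, 0.0, 0.0],
    "truck": [0.0, 0.0, 1.0, 0.0],
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    norm = (dot(a, a) ** 0.5) * (dot(b, b) ** 0.5)
    return dot(a, b) / norm if norm else 0.0


class LinearMapper:
    """Tiny trainable map from visual features into the fixed concept space."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                  for _ in range(out_dim)]

    def forward(self, x):
        return [dot(row, x) for row in self.w]

    def step(self, x, target, lr=0.1):
        # One SGD step on the squared error between the mapped feature
        # and the fixed concept embedding of the true class.
        pred = self.forward(x)
        for i, row in enumerate(self.w):
            err = pred[i] - target[i]
            for j in range(len(row)):
                row[j] -= lr * err * x[j]

    def predict(self, x, known):
        # Classify by nearest concept among the classes seen so far.
        z = self.forward(x)
        return max(known, key=lambda name: cosine(z, CONCEPTS[name]))


def sample(label, rng, noise=0.05):
    # Noisy copy of the class prototype, standing in for an image feature.
    return [v + rng.gauss(0.0, noise) for v in PROTOTYPES[label]]


def train_stream(tasks, steps_per_class=200, seed=0):
    # Tasks arrive one after another; earlier classes are never revisited.
    rng = random.Random(seed)
    model = LinearMapper(in_dim=4, out_dim=3, seed=seed)
    seen = []
    for task in tasks:
        seen.extend(task)
        for _ in range(steps_per_class):
            for label in task:
                model.step(sample(label, rng), CONCEPTS[label])
    return model, seen


model, seen = train_stream([["cat", "dog"], ["truck"]])
for label in seen:
    print(label, "->", model.predict(PROTOTYPES[label], seen))
```

Because the toy prototypes are nearly orthogonal, training on the second task barely disturbs what was learned in the first, so this sketch says nothing about how CVM behaves on real data streams. It only shows the structure of the approach: fixed language-derived targets and a small trainable visual mapper.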

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling