CoLLIE: Continual Learning of Language Grounding from Language-Image Embeddings

Gabriel Skantze; Bram Willemsen

arXiv:2111.07993 · cs.CL · July 12, 2022

TL;DR

CoLLIE is a continual learning model that adjusts the language embeddings of a pre-trained multimodal model such as CLIP. It learns new language use from few examples with minimal impact on the model's original zero-shot performance, as demonstrated on two referring expression tasks.

Contribution

The paper introduces a learned transformation of language embeddings for continual language grounding in vision. The transformation supports few-shot learning and generalizes to similar language use.

Findings

01  Effective adaptation to new language use with few examples

02  Minimal impact on original zero-shot performance

03  Successful application to referring expression tasks

Abstract

This paper presents CoLLIE: a simple, yet effective model for continual learning of how language is grounded in vision. Given a pre-trained multimodal embedding model, where language and images are projected in the same semantic space (in this case CLIP by OpenAI), CoLLIE learns a transformation function that adjusts the language embeddings when needed to accommodate new language use. This is done by predicting the difference vector that needs to be applied, as well as a scaling factor for this vector, so that the adjustment is only applied when needed. Unlike traditional few-shot learning, the model does not just learn new classes and labels, but can also generalize to similar language use and leverage semantic compositionality. We verify the model's performance on two different tasks of identifying the targets of referring expressions, where it has to learn new language use. The…
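The adjustment described in the abstract (a predicted difference vector, gated by a predicted scaling factor) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the function names, the toy 2-D embeddings, the hand-written stand-in predictors, and the choice of a scaling factor in [0, 1]. Candidates are ranked by cosine similarity, as is standard for CLIP-style embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def adjust(text_emb, predict_diff, predict_scale):
    """Return text_emb + scale * diff.

    predict_diff and predict_scale are hypothetical stand-ins for the two
    learned predictors; a scale near 0 leaves the embedding unchanged.
    """
    diff = predict_diff(text_emb)
    scale = predict_scale(text_emb)
    return [t + scale * d for t, d in zip(text_emb, diff)]

def pick_target(text_emb, image_embs):
    """Index of the candidate image most similar to the text embedding."""
    scores = [cosine(text_emb, img) for img in image_embs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy data (illustrative only): one expression, two candidate images.
text = [1.0, 0.0]
images = [[1.0, 0.1], [0.0, 1.0]]

# Stand-in predictors: the difference vector moves the expression toward
# the second image; the scale decides whether that shift is applied.
shift = lambda emb: [-1.0, 1.0]
apply_fully = lambda emb: 1.0
leave_alone = lambda emb: 0.0

before = pick_target(text, images)
after = pick_target(adjust(text, shift, apply_fully), images)
untouched = adjust(text, shift, leave_alone)
```

In this toy setup the unadjusted expression selects the first image, the fully adjusted one selects the second, and a zero scale returns the original embedding. That last case is how the scaling factor keeps existing language use intact.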

Code & Models

Repositories

gabriel-skantze/CoLLIE (PyTorch, official)

Taxonomy

Methods: Contrastive Language-Image Pre-training