MetaReVision: Meta-Learning with Retrieval for Visually Grounded Compositional Concept Acquisition
Guangyue Xu, Parisa Kordjamshidi, Joyce Chai

TL;DR
MetaReVision is a retrieval-augmented meta-learning framework that enables rapid recognition of novel compositional concepts in images by leveraging primitive concepts from past experiences, advancing grounded visual understanding.
Contribution
The paper introduces a retrieval-enhanced meta-learning model for grounded compositional concept learning, which retrieves primitive concepts to improve generalization to unseen compositions.
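As a rough illustration of primitive concept retrieval, the sketch below builds a support set for a target attribute-object pair (for example "red apple") by pulling pool concepts that share exactly one primitive with it, so the target itself stays novel. The (attribute, object) tuple representation, the exact-string matching rule, and the names `retrieve_support` and `build_episode` are hypothetical assumptions. The summary does not specify how the retriever scores candidates, and the real one may well use learned embeddings.

```python
from typing import Dict, List, Tuple

Concept = Tuple[str, str]  # hypothetical (attribute, object) pair, e.g. ("red", "apple")


def retrieve_support(target: Concept, pool: List[Concept], k: int) -> List[Concept]:
    """Return up to k pool concepts sharing exactly one primitive with target.

    Assumption: a "primitive" is the attribute word or the object word, matched
    as exact strings. The target pair itself is never returned.
    """
    attr, obj = target
    shares_attr = [c for c in pool if c[0] == attr and c[1] != obj]
    shares_obj = [c for c in pool if c[1] == obj and c[0] != attr]
    # Interleave the two lists so both primitives are represented in the support set.
    support: List[Concept] = []
    for i in range(max(len(shares_attr), len(shares_obj))):
        if i < len(shares_attr):
            support.append(shares_attr[i])
        if i < len(shares_obj):
            support.append(shares_obj[i])
    return support[:k]


def build_episode(target: Concept, pool: List[Concept], k: int) -> Dict[str, object]:
    """Hypothetical episode: the retrieved support set plus the target as the query."""
    return {"support": retrieve_support(target, pool, k), "query": target}


pool = [("red", "car"), ("green", "apple"), ("red", "apple"),
        ("red", "shirt"), ("yellow", "apple"), ("blue", "sky")]
episode = build_episode(("red", "apple"), pool, k=3)
print(episode["support"])
```

Interleaving is one simple way to balance attribute-sharing and object-sharing neighbors. A weighted or similarity-ranked retriever would be an equally plausible choice.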
Findings
MetaReVision outperforms baseline models on CompCOCO and CompFlickr datasets.
The retrieval module significantly enhances compositional learning performance.
MetaReVision enables fast adaptation to recognize new compositional concepts.
Abstract
Humans have the ability to learn novel compositional concepts by recalling and generalizing primitive concepts acquired from past experiences. Inspired by this observation, in this paper, we propose MetaReVision, a retrieval-enhanced meta-learning model to address the visually grounded compositional concept learning problem. The proposed MetaReVision consists of a retrieval module and a meta-learning module, which are designed to incorporate retrieved primitive concepts as a supporting set to meta-train vision-language models for grounded compositional concept recognition. Through meta-learning from episodes constructed by the retriever, MetaReVision learns a generic compositional representation that can be quickly updated to recognize novel compositional concepts. We create CompCOCO and CompFlickr to benchmark grounded compositional concept learning. Our experimental results show that…
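The abstract describes meta-learning over retrieved episodes so that a shared representation can be quickly updated for a new concept. A minimal sketch of that pattern follows. It assumes a MAML-style gradient-based meta-learner (the abstract does not name the algorithm) and uses a toy linear scorer as a stand-in for the vision-language model. The names `adapt` and `meta_step`, the squared-error loss, and the first-order approximation are all illustrative assumptions, not MetaReVision's implementation.

```python
from typing import Dict, List, Tuple

Example = Tuple[List[float], float]  # (feature vector, target score); toy stand-in for image-text pairs
Episode = Dict[str, List[Example]]   # hypothetical {"support": [...], "query": [...]}


def predict(w: List[float], x: List[float]) -> float:
    # Linear scorer standing in for a vision-language model's concept score.
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w: List[float], data: List[Example]) -> float:
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def grad(w: List[float], data: List[Example]) -> List[float]:
    # Analytic gradient of the mean squared error with respect to w.
    g = [0.0] * len(w)
    for x, y in data:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            g[i] += 2.0 * err * xi / len(data)
    return g


def adapt(w_init: List[float], support: List[Example],
          lr: float = 0.1, steps: int = 5) -> List[float]:
    # Inner loop: a few gradient steps on the retrieved support set.
    w = list(w_init)
    for _ in range(steps):
        g = grad(w, support)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def meta_step(w_meta: List[float], episodes: List[Episode],
              meta_lr: float = 0.1) -> List[float]:
    # Outer loop (first-order MAML approximation): average the query-set
    # gradients taken at each episode's adapted weights, then update the shared init.
    total = [0.0] * len(w_meta)
    for ep in episodes:
        adapted = adapt(w_meta, ep["support"])
        g = grad(adapted, ep["query"])
        total = [t + gi for t, gi in zip(total, g)]
    return [wi - meta_lr * t / len(episodes) for wi, t in zip(w_meta, total)]


episode: Episode = {"support": [([1.0, 0.0], 1.0), ([0.0, 1.0], 1.0)],
                    "query": [([1.0, 1.0], 2.0)]}
w_meta = [0.0, 0.0]
before = mse(adapt(w_meta, episode["support"]), episode["query"])
w_meta = meta_step(w_meta, [episode])
after = mse(adapt(w_meta, episode["support"]), episode["query"])
print(round(before, 4), round(after, 4))
```

The first-order approximation drops the second-order terms of the meta-gradient to keep the sketch short. A fuller implementation would typically differentiate through the inner loop with an autodiff framework.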
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
Methods: Sparse Evolutionary Training
