An Image is Worth Multiple Words: Discovering Object Level Concepts using Multi-Concept Prompt Learning

Chen Jin; Ryutaro Tanno; Amrutha Saseendran; Tom Diethe; Philip Teare

arXiv:2310.12274·cs.CV·May 28, 2024·1 cites

TL;DR

This paper introduces Multi-Concept Prompt Learning (MCPL), a method to learn multiple unknown object-level concepts simultaneously from a single image-text pair without annotations, improving concept disentanglement and efficiency.

Contribution

The paper proposes MCPL, a novel prompt learning approach that learns multiple concepts at once without prior knowledge or annotations, with regularisation techniques to improve accuracy.

Findings

01. Learns semantically disentangled concepts

02. Requires less than 10% of the storage space of existing methods

03. Effective on real-world and biomedical images

Abstract

Textual Inversion, a prompt learning method, learns a single text embedding for a new "word" to represent image style and appearance, allowing it to be integrated into natural language sentences to generate novel synthesised images. However, identifying multiple unknown object-level concepts within one scene remains a complex challenge. While recent methods have resorted to cropping or masking individual images to learn multiple concepts, these techniques often require prior knowledge of the new concepts and are labour-intensive. To address this challenge, we introduce Multi-Concept Prompt Learning (MCPL), where multiple unknown "words" are simultaneously learned from a single sentence-image pair, without any image annotations. To enhance the accuracy of word-concept correlation and refine attention mask boundaries, we propose three regularisation techniques: Attention Masking, Prompts…
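The core idea in the abstract, optimising several new "word" embeddings jointly from one sentence while every other embedding stays frozen, can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the function names `learn_new_words` and `loss` are hypothetical, the frozen "model" is just the mean of the prompt's token embeddings, and a squared error against a fixed target vector stands in for the image-generation loss.

```python
import random

def predict(prompt, emb, dim):
    # Toy frozen "model" (assumption): the mean of the prompt's token embeddings.
    n = len(prompt)
    return [sum(emb[w][d] for w in prompt) / n for d in range(dim)]

def loss(prompt, emb, target):
    # Squared error against a fixed target, standing in for the real training loss.
    pred = predict(prompt, emb, len(target))
    return 0.5 * sum((p - t) ** 2 for p, t in zip(pred, target))

def learn_new_words(prompt, vocab, new_words, target, steps=300, lr=1.0):
    """Jointly optimise only the embeddings of `new_words`; other rows stay frozen."""
    dim = len(target)
    n = len(prompt)
    emb = {w: list(v) for w, v in vocab.items()}  # copy so the input table is untouched
    for _ in range(steps):
        pred = predict(prompt, emb, dim)
        # d(loss)/d(embedding of one token occurrence) = (pred - target) / n
        grad = [(pred[d] - target[d]) / n for d in range(dim)]
        for w in new_words:  # only the placeholder tokens receive updates
            count = prompt.count(w)
            for d in range(dim):
                emb[w][d] -= lr * count * grad[d]
    return emb

random.seed(0)
prompt = ["a", "photo", "of", "*", "and", "@"]  # "*" and "@" are the new "words"
vocab = {w: [random.uniform(-1, 1) for _ in range(4)] for w in dict.fromkeys(prompt)}
target = [1.0, -1.0, 0.5, 0.0]
learned = learn_new_words(prompt, vocab, ["*", "@"], target)
```

In this toy both placeholder tokens receive identical gradients, so nothing pushes them toward different concepts. That is only a property of the simplified setup, but it gives some intuition for why the abstract pairs joint learning with additional regularisation.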

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques