Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering

Jiawei Yao; Qi Qian; Juhua Hu

arXiv:2404.15655·cs.CV·April 25, 2024

TL;DR

Multi-MaP is a multi-modal proxy learning framework that uses CLIP and GPT-4 to match a user's keyword of interest to the corresponding clustering of visual data, improving multiple clustering performance.

Contribution

The paper presents a multi-modal proxy learning approach that uses large language models to personalize multiple clustering of visual data according to a user's stated interest.

Findings

01. Outperforms state-of-the-art multi-clustering methods on benchmarks.

02. Effectively captures user interests through keyword-based text proxies.

03. Demonstrates robustness across diverse visual clustering tasks.

Abstract

Multiple clustering has gained significant attention in recent years due to its potential to reveal multiple hidden structures of data from different perspectives. The advent of deep multiple clustering techniques has notably advanced the performance by uncovering complex patterns and relationships within large datasets. However, a major challenge arises as users often do not need all the clusterings that algorithms generate, and figuring out the one needed requires a substantial understanding of each clustering result. Traditionally, aligning a user's brief keyword of interest with the corresponding vision components was challenging, but the emergence of multi-modal and large language models (LLMs) has begun to bridge this gap. In response, given unlabeled target visual data, we propose Multi-MaP, a novel method employing a multi-modal proxy learning process. It leverages CLIP encoders…
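The underlying idea of matching a user's keyword to images in a shared embedding space can be illustrated with a small sketch. This is not the Multi-MaP proxy learning procedure, which the truncated abstract does not fully describe; it is a plain nearest-concept assignment by cosine similarity. The toy 3-dimensional vectors stand in for image and text encoder outputs, and the function name `assign_to_concepts` and the concept names are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def assign_to_concepts(image_embeddings, concept_embeddings):
    """Assign each image to the concept whose embedding is most similar.

    image_embeddings: list of vectors (hypothetical image-encoder outputs).
    concept_embeddings: dict mapping a concept name to a vector
        (hypothetical text-encoder outputs for candidate concepts).
    """
    labels = []
    for img in image_embeddings:
        scores = {name: cosine(img, emb) for name, emb in concept_embeddings.items()}
        labels.append(max(scores, key=scores.get))
    return labels


# Toy data: for a keyword such as "color", the candidate concepts might be
# specific colors. All values here are made up for illustration.
concepts = {"red": [1.0, 0.0, 0.1], "green": [0.0, 1.0, 0.1]}
images = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.7, 0.3, 0.2]]

labels = assign_to_concepts(images, concepts)
print(labels)
```

Swapping in a different set of candidate concepts (for example, species instead of colors) yields a different grouping of the same images, which is the sense in which a keyword selects one clustering among several.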

Code & Models

Repositories

alexander-yao/multi-map (PyTorch, official)

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Clustering Algorithms Research · Video Analysis and Summarization

Methods: Attention Is All You Need · Dropout · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Dense Connections · Label Smoothing · Residual Connection