Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning

Jiawei Yao; Qi Qian; Juhua Hu

arXiv:2411.03978·cs.LG·November 7, 2024

TL;DR

This paper introduces Multi-Sub, an end-to-end multiple clustering method that uses multi-modal subspace proxy learning with CLIP and GPT-4 to group data according to user-specified preferences.

Contribution

The paper proposes a framework that combines CLIP and GPT-4 to generate proxy words automatically, allowing flexible, user-oriented multiple clustering.

Findings

01. Outperforms existing methods on various datasets

02. Effectively aligns textual prompts with visual data

03. Enables customizable data clustering based on user preferences

Abstract

Multiple clustering aims to discover various latent structures of data from different aspects. Deep multiple clustering methods have achieved remarkable performance by exploiting complex patterns and relationships in data. However, existing works struggle to flexibly adapt to diverse user-specific needs in data grouping, which may require manual understanding of each clustering. To address these limitations, we introduce Multi-Sub, a novel end-to-end multiple clustering approach that incorporates a multi-modal subspace proxy learning framework. Utilizing the synergistic capabilities of CLIP and GPT-4, Multi-Sub aligns textual prompts expressing user preferences with their corresponding visual representations. This is achieved by automatically generating proxy words from large language models that act as subspace bases, thus allowing for the customized representation of data…
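
To make the idea of proxy words acting as subspace bases more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes one simple reading of the abstract: an image embedding is compared with the embeddings of a few proxy words for a user-chosen aspect (here, "color"), and the softmax-weighted combination of those proxy embeddings serves as the image's representation in that aspect's subspace. The toy 4-dimensional vectors stand in for CLIP embeddings, and the names `subspace_representation`, `proxy_words`, and `temperature` are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores, temperature=0.1):
    """Numerically stable softmax; a lower temperature gives sharper weights."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def subspace_representation(image_emb, proxy_embs, temperature=0.1):
    """Express an image as a weighted combination of proxy-word embeddings.

    Assumption for illustration only: the weights are a softmax over cosine
    similarities. The paper's actual learning procedure may differ.
    """
    image_emb = normalize(image_emb)
    proxies = [normalize(p) for p in proxy_embs]
    weights = softmax([dot(image_emb, p) for p in proxies], temperature)
    combined = [
        sum(w * p[i] for w, p in zip(weights, proxies))
        for i in range(len(image_emb))
    ]
    return weights, combined


# Toy stand-ins for text embeddings of proxy words for the aspect "color".
proxy_words = ["red", "green", "blue"]
proxy_embs = [
    [1.0, 0.0, 0.0, 0.1],
    [0.0, 1.0, 0.0, 0.1],
    [0.0, 0.0, 1.0, 0.1],
]

# Toy stand-ins for image embeddings.
image_embs = {
    "img_a": [0.9, 0.1, 0.0, 0.3],
    "img_b": [0.1, 0.0, 0.8, 0.3],
    "img_c": [0.8, 0.0, 0.2, 0.5],
}

# Group each image under the proxy word that dominates its representation.
assignments = {}
for name, emb in image_embs.items():
    weights, _ = subspace_representation(emb, proxy_embs)
    assignments[name] = proxy_words[weights.index(max(weights))]

print(assignments)
```

Swapping in proxy words for a different aspect (for example, species instead of color) would regroup the same images under that aspect, which is the sense in which the clustering is customizable in this sketch.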

Code & Models

Repositories

alexander-yao/multi-sub (PyTorch, official)

Videos

Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning · SlidesLive

Taxonomy

Topics: Face and Expression Recognition

Methods: Linear Layer · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Attention Is All You Need · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Dropout · Absolute Position Encodings