Unsupervised Part Discovery via Dual Representation Alignment

Jiahao Xia; Wenjian Huang; Min Xu; Jianguo Zhang; Haimin Zhang; Ziyu Sheng; Dong Xu

arXiv:2408.08108 · cs.CV · August 16, 2024

TL;DR

This paper introduces an unsupervised method for learning part-specific attention in images. It uses a dual representation alignment approach built on a new module, PartFormer, to improve part discovery performance.

Contribution

It proposes a new paradigm and module for unsupervised part attention learning, aligning part representations with feature maps to enhance part discovery.

Findings

1. Achieves competitive performance on four datasets.
2. Demonstrates robustness due to part-specific attention.
3. Provides reliable pixel mask detectors for parts.

Abstract

Object parts serve as crucial intermediate representations in various downstream tasks, but part-level representation learning has received less attention than other vision tasks. Previous research has established that Vision Transformers can learn instance-level attention without labels, extracting high-quality instance-level representations that boost downstream tasks. In this paper, we achieve unsupervised part-specific attention learning using a novel paradigm and further employ the part representations to improve part discovery performance. Specifically, paired images are generated from the same image with different geometric transformations, and multiple part representations are extracted from these paired images using a novel module, named PartFormer. These part representations from the paired images are then exchanged to improve geometric transformation invariance.…
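
The pairing-and-exchange idea in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the abstract is truncated and does not describe PartFormer's architecture or the training loss. Everything here is an assumption made for illustration, including the attention-pooling stand-in `extract_parts`, the helper names, the horizontal flip as the only geometric transformation, and the idealized backbone whose feature map flips exactly with the image.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def part_attention(part_reprs, feat):
    """Per-part attention maps over pixels.

    part_reprs: (K, C) part vectors; feat: (H, W, C) feature map.
    Returns (K, H, W); each map sums to 1 over all pixels.
    """
    h, w, c = feat.shape
    logits = part_reprs @ feat.reshape(h * w, c).T  # (K, H*W)
    return softmax(logits, axis=1).reshape(-1, h, w)


def extract_parts(queries, feat):
    """Hypothetical stand-in for PartFormer: attention-pool the
    feature map into K part representations of shape (K, C)."""
    attn = part_attention(queries, feat)
    return np.einsum("khw,hwc->kc", attn, feat)


def hflip(feat):
    """Horizontal flip of an (H, W, C) feature map."""
    return feat[:, ::-1, :]


rng = np.random.default_rng(0)
feat_a = rng.normal(size=(8, 8, 16))   # toy features of view A
feat_b = hflip(feat_a)                 # view B: same image, flipped (assumed equivariant backbone)
queries = rng.normal(size=(4, 16))     # K = 4 hypothetical part queries

parts_a = extract_parts(queries, feat_a)
parts_b = extract_parts(queries, feat_b)

# Exchange: attend each view with the part representations of the other view.
attn_a_from_b = part_attention(parts_b, feat_a)
attn_b_from_a = part_attention(parts_a, feat_b)

# Undo the flip so both attention maps live in view A's coordinate frame.
attn_b_in_a_frame = attn_b_from_a[:, :, ::-1]
alignment_gap = np.abs(attn_a_from_b - attn_b_in_a_frame).mean()
```

In this toy setup the gap is close to zero by construction, because the feature map is flipped exactly. With a learned backbone that is not perfectly equivariant, a discrepancy of this kind would be nonzero and could plausibly act as a training signal, though the truncated abstract does not say which objective the paper uses.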

Code & Models

Repositories

jiahao-uts/unsupervisedpart (PyTorch, official)

Taxonomy

Topics: Image Processing and 3D Reconstruction · Image Retrieval and Classification Techniques · Handwritten Text Recognition Techniques

Methods: Linear Layer · Layer Normalization · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Vision Transformer