Unsupervised Part Discovery via Dual Representation Alignment
Jiahao Xia, Wenjian Huang, Min Xu, Jianguo Zhang, Haimin Zhang, Ziyu Sheng, Dong Xu

TL;DR
This paper introduces a novel unsupervised method for part-specific attention learning in images using a dual representation alignment approach with a new module called PartFormer, improving part discovery performance.
Contribution
It proposes a new paradigm and module for unsupervised part attention learning, aligning part representations with feature maps to enhance part discovery.
Findings
Achieves competitive performance on four datasets.
Demonstrates robustness due to part-specific attention.
Provides reliable pixel mask detectors for parts.
Abstract
Object parts serve as crucial intermediate representations in various downstream tasks, but part-level representation learning still has not received as much attention as other vision tasks. Previous research has established that Vision Transformer can learn instance-level attention without labels, extracting high-quality instance-level representations for boosting downstream tasks. In this paper, we achieve unsupervised part-specific attention learning using a novel paradigm and further employ the part representations to improve part discovery performance. Specifically, paired images are generated from the same image with different geometric transformations, and multiple part representations are extracted from these paired images using a novel module, named PartFormer. These part representations from the paired images are then exchanged to improve geometric transformation invariance.…
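The pairing-and-exchange idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract is truncated here and does not specify PartFormer's architecture, so `extract_part_representations` is a hypothetical stand-in that attention-pools one vector per part query, `align` is an assumed nearest-part assignment, and a horizontal flip is chosen only as one example of a geometric transformation. All feature values and queries are made up for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a flat list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hflip(grid):
    # Example geometric transformation (an assumption): mirror each row.
    return [list(reversed(row)) for row in grid]

def extract_part_representations(feature_map, part_queries):
    # Hypothetical stand-in for PartFormer: each part query attends over
    # all pixels and pools their features into one part vector.
    pixels = [f for row in feature_map for f in row]
    dim = len(pixels[0])
    parts = []
    for q in part_queries:
        attn = softmax([dot(q, f) for f in pixels])
        parts.append([sum(a * f[d] for a, f in zip(attn, pixels))
                      for d in range(dim)])
    return parts

def align(parts, feature_map):
    # Assumed alignment step: label each pixel with its most similar part.
    return [[max(range(len(parts)), key=lambda k: dot(parts[k], f))
             for f in row] for row in feature_map]

# Toy 2x4 feature map: left pixels resemble [1, 0], right pixels [0, 1].
feature_map = [
    [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]],
    [[1.0, 0.1], [0.8, 0.2], [0.2, 0.8], [0.0, 0.9]],
]
part_queries = [[4.0, 0.0], [0.0, 4.0]]  # illustrative, not learned

# Paired views of the same image under different transformations.
view_a = feature_map
view_b = hflip(feature_map)

parts_a = extract_part_representations(view_a, part_queries)
parts_b = extract_part_representations(view_b, part_queries)

# Exchange: each view is segmented with the other view's part vectors.
map_a = align(parts_b, view_a)
map_b = align(parts_a, view_b)

# With transformation-invariant part vectors, the two part maps differ
# only by the flip that relates the two views.
print(map_a)
print(map_b)
```

In this toy setting the pooled part vectors are unchanged by the flip, so swapping them between views yields part maps that are mirror images of each other. In the paper, that kind of invariance is what the exchange is meant to encourage during training rather than something that holds by construction.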
Taxonomy
Topics: Image Processing and 3D Reconstruction · Image Retrieval and Classification Techniques · Handwritten Text Recognition Techniques
Methods: Linear Layer · Layer Normalization · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Vision Transformer
