PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers

Ananthu Aniraj; Cassio F. Dantas; Dino Ienco; Diego Marcos

arXiv:2407.04538·cs.CV·July 23, 2024

TL;DR

This paper uses pre-trained transformer-based vision models to relax the geometric constraints usually imposed on part discovery, improving both part discovery quality and accuracy on fine-grained classification tasks.

Contribution

The paper replaces the usual assumption that discovered parts are small and compact with a total variation (TV) prior applied on top of pre-trained vision transformers. This approach outperforms previous methods in unsupervised part discovery.

Findings

01 Outperforms previous methods on CUB, PartImageNet and Oxford Flowers

02 Achieves better part discovery metrics and classification accuracy

03 Shows that transformer models require different geometric priors for unsupervised part discovery

Abstract

Computer vision methods that explicitly detect object parts and reason on them are a step towards inherently interpretable models. Existing approaches that perform part discovery driven by a fine-grained classification task make very restrictive assumptions on the geometric properties of the discovered parts; they should be small and compact. Although this prior is useful in some cases, in this paper we show that pre-trained transformer-based vision models, such as self-supervised DINOv2 ViT, enable the relaxation of these constraints. In particular, we find that a total variation (TV) prior, which allows for multiple connected components of any size, substantially outperforms previous work. We test our approach on three fine-grained classification benchmarks: CUB, PartImageNet and Oxford Flowers, and compare our results to previously published methods as well as a re-implementation of…
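To make the idea of a total variation prior concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the exact loss used in the paper is not given in this excerpt, and the function names `total_variation` and `tv_prior` as well as the toy 6x6 maps are hypothetical. The sketch assumes an anisotropic TV (the sum of absolute differences between vertically and horizontally adjacent cells), averaged over the map. It illustrates why such a prior tolerates several connected components of any size while still penalizing fragmented, noisy assignments.

```python
def total_variation(part_map):
    """Anisotropic total variation of one 2-D map (a list of rows of floats).

    Sums absolute differences between each cell and its lower and right
    neighbours, then divides by the number of cells.
    """
    height, width = len(part_map), len(part_map[0])
    tv = 0.0
    for i in range(height):
        for j in range(width):
            if i + 1 < height:  # vertical neighbour
                tv += abs(part_map[i + 1][j] - part_map[i][j])
            if j + 1 < width:  # horizontal neighbour
                tv += abs(part_map[i][j + 1] - part_map[i][j])
    return tv / (height * width)


def tv_prior(part_maps):
    """Average TV over a list of per-part maps (hypothetical aggregation)."""
    return sum(total_variation(m) for m in part_maps) / len(part_maps)


# One part made of two separate blobs of different sizes.
two_blobs = [
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
]

# A fragmented assignment covering the same number of cells or more.
checkerboard = [[float((i + j) % 2) for j in range(6)] for i in range(6)]

print("two blobs   :", total_variation(two_blobs))
print("checkerboard:", total_variation(checkerboard))
print("prior (both):", tv_prior([two_blobs, checkerboard]))
```

The two-blob map only pays a cost along the blob boundaries, so it scores far lower than the checkerboard even though it contains two disconnected components. A real training setup would presumably compute this on soft part-attention tensors with vectorized operations rather than Python loops.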

Code & Models

Repositories

ananthu-aniraj/pdiscoformer
PyTorch · Official
