TL;DR
This paper uses pre-trained transformer-based vision models, such as self-supervised DINOv2 ViT, to relax the geometric constraints usually imposed in part discovery, improving both part discovery quality and accuracy on fine-grained classification tasks.
Contribution
It introduces a part discovery approach that pairs pre-trained vision transformers with a total variation (TV) prior, and it outperforms previous methods in unsupervised part discovery.
Findings
Outperforms previous methods on CUB, PartImageNet and Oxford Flowers
Achieves better part discovery metrics and classification accuracy
Shows that pre-trained transformer models work better with a looser geometric prior (total variation) than with the small-and-compact part assumption used in earlier work
Abstract
Computer vision methods that explicitly detect object parts and reason on them are a step towards inherently interpretable models. Existing approaches that perform part discovery driven by a fine-grained classification task make very restrictive assumptions on the geometric properties of the discovered parts; they should be small and compact. Although this prior is useful in some cases, in this paper we show that pre-trained transformer-based vision models, such as self-supervised DINOv2 ViT, enable the relaxation of these constraints. In particular, we find that a total variation (TV) prior, which allows for multiple connected components of any size, substantially outperforms previous work. We test our approach on three fine-grained classification benchmarks: CUB, PartImageNet and Oxford Flowers, and compare our results to previously published methods as well as a re-implementation of…
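To make the idea of a total variation prior more concrete, here is a minimal sketch in plain Python. It assumes a generic anisotropic TV (sum of absolute differences between neighbouring cells) applied to a stack of part maps; the function name `total_variation`, the averaging, and the toy grids are illustrative assumptions and are not taken from the paper, whose exact formulation may differ.

```python
def total_variation(part_maps):
    """Anisotropic total variation of a list of H x W part maps.

    Illustrative assumption, not the paper's exact loss: sums absolute
    differences between vertically and horizontally adjacent cells,
    then divides by the total number of cells across all maps.
    """
    total = 0.0
    count = 0
    for grid in part_maps:
        height, width = len(grid), len(grid[0])
        for i in range(height):
            for j in range(width):
                if i + 1 < height:  # difference with the cell below
                    total += abs(grid[i + 1][j] - grid[i][j])
                if j + 1 < width:  # difference with the cell to the right
                    total += abs(grid[i][j + 1] - grid[i][j])
                count += 1
    return total / count


# A single smooth blob: few transitions between neighbouring cells.
one_blob = [[1, 1, 0, 0],
            [1, 1, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]

# A noisy, fragmented map: every neighbouring pair differs.
checkerboard = [[1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1]]

print(total_variation([one_blob]))      # smooth map, low penalty
print(total_variation([checkerboard]))  # fragmented map, high penalty
```

In this sketch the penalty grows with the amount of boundary in a map rather than with the area or number of regions, which is consistent with the abstract's description of a prior that allows multiple connected components of any size.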
Code & Models
- 🤗 ananthu-aniraj/pdiscoformer_cub_k_4 · model · 4 dl
- 🤗 ananthu-aniraj/pdiscoformer_cub_k_8 · model · 5 dl
- 🤗 ananthu-aniraj/pdiscoformer_cub_k_16 · model · 5 dl
- 🤗 ananthu-aniraj/pdiscoformer_part_imagenet_ood_k_8 · model · 4 dl
- 🤗 ananthu-aniraj/pdiscoformer_part_imagenet_ood_k_25 · model · 6 dl
- 🤗 ananthu-aniraj/pdiscoformer_part_imagenet_ood_k_50 · model · 4 dl
- 🤗 ananthu-aniraj/pdiscoformer_flowers_k_2 · model · 4 dl
- 🤗 ananthu-aniraj/pdiscoformer_flowers_k_4 · model · 4 dl
- 🤗 ananthu-aniraj/pdiscoformer_flowers_k_8 · model · 3 dl
- 🤗 ananthu-aniraj/pdiscoformer_pimagenet_seg_k_16 · model · 4 dl
