Co-Segmentation without any Pixel-level Supervision with Application to Large-Scale Sketch Classification
Nikolaos-Antonios Ypsilantis, Ondřej Chum

TL;DR
This paper introduces a co-segmentation method that needs no pixel-level supervision. It builds on two pre-trained Vision Transformers, achieves state-of-the-art results among methods trained with image-level labels only, and improves large-scale sketch classification by leveraging image-level labels and domain adaptation.
Contribution
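The approach combines two pre-trained ViT models, an ImageNet classification-trained ViT and a self-supervised DINO-ViT, to perform co-segmentation without pixel-level supervision, and applies the result to improve large-scale sketch classification using only image-level labels.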
Findings
State-of-the-art co-segmentation performance among methods trained with image-level labels, while remaining competitive with methods trained with pixel-level masks.
Significant improvement in sketch classification accuracy.
Effective domain adaptation from natural images to sketches.
Abstract
This work proposes a novel method for object co-segmentation, i.e. pixel-level localization of a common object in a set of images, that uses no pixel-level supervision for training. Two pre-trained Vision Transformer (ViT) models are exploited: ImageNet classification-trained ViT, whose features are used to estimate rough object localization through intra-class token relevance, and a self-supervised DINO-ViT for intra-image token relevance. On recent challenging benchmarks, the method achieves state-of-the-art performance among methods trained with the same level of supervision (image labels) while being competitive with methods trained with pixel-level supervision (binary masks). The benefits of the proposed co-segmentation method are further demonstrated in the task of large-scale sketch recognition, that is, the classification of sketches into a wide range of categories. The limited…
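The two kinds of token relevance named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the use of cosine similarity, the max-over-tokens scoring, the single affinity-smoothing step, and the 0.5 threshold are all assumptions made for illustration. The inputs stand in for patch-token features that would come from the classification-trained ViT (`cls_tokens`, `cls_tokens_others`) and from DINO-ViT (`dino_tokens`).

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def intra_class_relevance(tokens, other_tokens):
    """Rough per-token score (assumed form): best cosine similarity
    to any token taken from other images of the same class."""
    sim = l2_normalize(tokens) @ l2_normalize(other_tokens).T  # (N, M)
    return sim.max(axis=1)  # (N,)


def intra_image_refine(relevance, tokens, eps=1e-8):
    """Smooth the rough scores with token-to-token affinity inside one
    image (assumed form: one step of row-normalised averaging)."""
    t = l2_normalize(tokens)
    affinity = np.clip(t @ t.T, 0.0, None)  # keep non-negative similarities
    affinity = affinity / (affinity.sum(axis=1, keepdims=True) + eps)
    return affinity @ relevance  # (N,)


def co_segment(cls_tokens, cls_tokens_others, dino_tokens, grid_hw, threshold=0.5):
    """Hypothetical pipeline: rough localisation, refinement, then a
    min-max rescale and threshold to obtain a binary patch mask."""
    rough = intra_class_relevance(cls_tokens, cls_tokens_others)
    refined = intra_image_refine(rough, dino_tokens)
    lo, hi = refined.min(), refined.max()
    scaled = (refined - lo) / (hi - lo + 1e-8)
    return (scaled > threshold).reshape(grid_hw)
```

Calling `co_segment` with features of shape `(H*W, D)` and `grid_hw=(H, W)` returns a boolean mask over the patch grid. Separating the two steps mirrors the division of labour described in the abstract: features shared across a class give a rough localisation, and features compared within one image refine it.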
Taxonomy
Topics: Industrial Vision Systems and Defect Detection
Methods: Dropout · Layer Normalization · Adam · Attention Is All You Need · Dense Connections · Residual Connection · Position-Wise Feed-Forward Layer · Linear Layer · Byte Pair Encoding · Absolute Position Encodings
