Co-Segmentation without any Pixel-level Supervision with Application to Large-Scale Sketch Classification

Nikolaos-Antonios Ypsilantis; Ondřej Chum

arXiv:2410.13582·cs.CV·October 18, 2024

TL;DR

This paper introduces a co-segmentation method that requires no pixel-level supervision and is built on pre-trained Vision Transformers. It achieves state-of-the-art results among methods trained with image-level labels, and it improves large-scale sketch classification by combining those labels with domain adaptation.

Contribution

The approach combines two pre-trained ViT models for co-segmentation without pixel-level supervision, relying only on image-level labels, and applies the resulting masks to improve large-scale sketch classification.

Findings

01 State-of-the-art co-segmentation performance among methods trained with image-level supervision.

02 Significant improvement in sketch classification accuracy.

03 Effective domain adaptation from natural images to sketches.

Abstract

This work proposes a novel method for object co-segmentation, i.e. pixel-level localization of a common object in a set of images, that uses no pixel-level supervision for training. Two pre-trained Vision Transformer (ViT) models are exploited: ImageNet classification-trained ViT, whose features are used to estimate rough object localization through intra-class token relevance, and a self-supervised DINO-ViT for intra-image token relevance. On recent challenging benchmarks, the method achieves state-of-the-art performance among methods trained with the same level of supervision (image labels) while being competitive with methods trained with pixel-level supervision (binary masks). The benefits of the proposed co-segmentation method are further demonstrated in the task of large-scale sketch recognition, that is, the classification of sketches into a wide range of categories. The limited…
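The two notions of token relevance named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' procedure: the function names, the mean-feature class prototype, the softmax affinity with a `temperature`, and the fixed `threshold` are all assumptions made for illustration, and plain arrays stand in for the patch tokens that the classification ViT and DINO-ViT would produce.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def min_max(x, eps=1e-8):
    """Rescale scores to the [0, 1] range."""
    return (x - x.min()) / (x.max() - x.min() + eps)


def intra_class_relevance(tokens_per_image):
    """Rough localization: score each token against a shared class prototype.

    tokens_per_image: list of (N, D) arrays, one per image of the same class.
    The prototype (mean of all tokens in the set) is a hypothetical stand-in.
    """
    normed = [l2_normalize(t) for t in tokens_per_image]
    prototype = l2_normalize(np.concatenate(normed, axis=0).mean(axis=0))
    # Cosine similarity of every token to the prototype, rescaled per image.
    return [min_max(t @ prototype) for t in normed]


def intra_image_refine(tokens, relevance, temperature=0.1):
    """Spread rough relevance between similar tokens of a single image."""
    normed = l2_normalize(tokens)
    affinity = normed @ normed.T                      # (N, N) cosine affinities
    weights = np.exp(affinity / temperature)
    weights /= weights.sum(axis=1, keepdims=True)     # row-wise softmax
    return min_max(weights @ relevance)


def co_segment(cls_tokens, dino_tokens, grid_hw, threshold=0.5):
    """Combine both relevance maps into one binary mask per image."""
    rough = intra_class_relevance(cls_tokens)
    return [
        (intra_image_refine(d, r) > threshold).reshape(grid_hw)
        for d, r in zip(dino_tokens, rough)
    ]
```

Here `cls_tokens` and `dino_tokens` would each be a list with one `(N, D)` array per image, and `grid_hw` is the patch grid shape such as `(14, 14)`. Using the mean feature as the prototype only works when the common object dominates the token set, which is one reason this remains a toy illustration.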

Taxonomy

Topics: Industrial Vision Systems and Defect Detection

Methods: Dropout · Layer Normalization · Adam · Attention Is All You Need · Dense Connections · Residual Connection · Position-Wise Feed-Forward Layer · Linear Layer · Byte Pair Encoding · Absolute Position Encodings