OCFormer: One-Class Transformer Network for Image Classification

Prerana Mukherjee; Chandan Kumar Roy; Swalpa Kumar Roy

arXiv:2204.11449 · cs.CV · April 26, 2022 · 5 cites

TL;DR

OCFormer is a Vision Transformer-based framework for one-class classification. It uses zero-centered Gaussian noise as a pseudo-negative class and reports better results than CNN-based methods on multiple datasets.

Contribution

This work is the first to apply Vision Transformers to one-class classification, introducing a Gaussian noise pseudo-negative approach and an optimized loss function.

Findings

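01. Significant performance improvements over competing CNN-based one-class classifiers

02. Effective latent space representation using zero-centered Gaussian noise as a pseudo-negative class

03. Validated on the CIFAR-10, CIFAR-100, Fashion-MNIST, and CelebA eyeglasses datasets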

Abstract

We propose a novel deep learning framework based on Vision Transformers (ViT) for one-class classification. The core idea is to use zero-centered Gaussian noise as a pseudo-negative class for latent space representation and then train the network using an optimal loss function. Prior work has devoted considerable effort to learning good representations with a variety of loss functions that ensure both discriminative and compact properties. The proposed one-class Vision Transformer (OCFormer) is evaluated extensively on the CIFAR-10, CIFAR-100, Fashion-MNIST, and CelebA eyeglasses datasets. Our method shows significant improvements over competing CNN-based one-class classifiers.
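The pseudo-negative idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic `positives` stand in for ViT latent embeddings of the target class, the logistic-regression head replaces the transformer classifier, and binary cross-entropy is an assumed loss, since the abstract does not name the loss it uses. All function names here are hypothetical.

```python
import math
import random


def gaussian_pseudo_negatives(n, dim, sigma=1.0, seed=0):
    """Draw n zero-centered Gaussian vectors to act as the pseudo-negative class."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(n)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_head(positives, negatives, epochs=200, lr=0.1):
    """Fit a logistic-regression head with binary cross-entropy (an assumed loss)."""
    dim = len(positives[0])
    w, b = [0.0] * dim, 0.0
    data = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of BCE with respect to the logit
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        # Full-batch gradient descent step on the averaged gradient.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(x, w, b):
    """Estimated probability that x belongs to the target (positive) class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical stand-in for latent embeddings of the single known class.
rng = random.Random(1)
dim = 8
positives = [[rng.gauss(3.0, 0.5) for _ in range(dim)] for _ in range(64)]

# Zero-centered Gaussian noise plays the role of the unseen negative class.
negatives = gaussian_pseudo_negatives(64, dim, sigma=1.0)

w, b = train_head(positives, negatives)
```

At test time, `score(x, w, b)` would be thresholded to decide whether a sample belongs to the target class. In this toy setup the positives sit far from the origin, so the noise cloud is easy to separate; how well that holds for real embeddings depends on the learned representation.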

Taxonomy

Topics: Retinal Imaging and Analysis · Digital Imaging for Blood Diseases · Brain Tumor Detection and Classification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Label Smoothing · Vision Transformer · Dropout