OCFormer: One-Class Transformer Network for Image Classification
Prerana Mukherjee, Chandan Kumar Roy, Swalpa Kumar Roy

TL;DR
OCFormer is a Vision Transformer-based framework for one-class classification. It uses zero-centered Gaussian noise as a pseudo-negative class in the latent space and reports significant improvements over competing CNN-based one-class classifiers on CIFAR-10, CIFAR-100, Fashion-MNIST, and CelebA eyeglasses.
Contribution
This work is the first to apply Vision Transformers to one-class classification, introducing a Gaussian noise pseudo-negative approach and an optimized loss function.
Findings
Significant improvements over competing CNN-based one-class classifiers
Zero-centered Gaussian noise serves as a pseudo-negative class for the latent space representation
Evaluated on CIFAR-10, CIFAR-100, Fashion-MNIST, and CelebA eyeglasses
Abstract
We propose a novel deep learning framework based on Vision Transformers (ViT) for one-class classification. The core idea is to use zero-centered Gaussian noise as a pseudo-negative class for the latent space representation and then train the network with the optimal loss function. Prior work has devoted considerable effort to learning good representations with a variety of loss functions that ensure both discriminative and compact properties. The proposed one-class Vision Transformer (OCFormer) is evaluated extensively on the CIFAR-10, CIFAR-100, Fashion-MNIST, and CelebA eyeglasses datasets. Our method shows significant improvements over competing CNN-based one-class classifier approaches.
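The pseudo-negative idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the loss or the classifier head, so the binary cross-entropy loss, the linear head, the noise scale `sigma`, and the synthetic `positives` vectors (standing in for ViT latent features of the target class) are all assumptions made for illustration.

```python
import math
import random


def make_pseudo_negatives(n, dim, sigma, rng):
    """Sample n vectors from zero-centered Gaussian noise N(0, sigma^2 I)."""
    return [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(n)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_head(positives, negatives, epochs=300, lr=0.05):
    """Fit a linear head with binary cross-entropy (an assumed loss)."""
    dim = len(positives[0])
    w, b = [0.0] * dim, 0.0
    # Target class is labeled 1, Gaussian pseudo-negatives are labeled 0.
    data = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    n = len(data)
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the BCE loss w.r.t. the logit
            for i in range(dim):
                gw[i] += err * x[i]
            gb += err
        # Full-batch gradient descent step on the mean loss.
        w = [wi - lr * gi / n for wi, gi in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def score(x, w, b):
    """Probability that x belongs to the target (positive) class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


rng = random.Random(0)
dim = 8
# Hypothetical stand-in for ViT latent features of the target class:
# points clustered around a non-zero mean.
positives = [[rng.gauss(3.0, 0.5) for _ in range(dim)] for _ in range(64)]
negatives = make_pseudo_negatives(64, dim, sigma=1.0, rng=rng)
w, b = train_head(positives, negatives)
```

At test time, `score(x, w, b)` would be thresholded to decide whether a latent vector belongs to the target class. Only target-class samples are needed for training, because the negative class is synthesized entirely from noise.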
Taxonomy
Topics: Retinal Imaging and Analysis · Digital Imaging for Blood Diseases · Brain Tumor Detection and Classification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Label Smoothing · Vision Transformer · Dropout
