ViTNF: Leveraging Neural Fields to Boost Vision Transformers in Generalized Category Discovery

Jiayi Su; Dequan Jin

arXiv:2506.02367·cs.CV·June 4, 2025

TL;DR

This paper introduces ViTNF, an architecture that replaces the MLP head of a Vision Transformer with a neural field-based classifier, improving generalized category discovery accuracy while reducing training cost and difficulty.

Contribution

The paper proposes a neural field-based classifier for Vision Transformers, simplifying training and enhancing accuracy in generalized category discovery tasks.

Findings

01

Outperforms state-of-the-art methods on CIFAR-100, ImageNet-100, CUB-200, and Stanford Cars.

02

Achieves accuracy improvements of 19% on new classes and 16% on all classes.

03

Reduces training sample requirements and training difficulty.

Abstract

Generalized category discovery (GCD) is a highly popular task in open-world recognition, aiming to identify unknown class samples using known class data. By leveraging pre-training, meta-training, and fine-tuning, ViT achieves excellent few-shot learning capabilities. Its MLP head is a feedforward network, trained synchronously with the entire network in the same process, increasing the training cost and difficulty without fully leveraging the power of the feature extractor. This paper proposes a new architecture by replacing the MLP head with a neural field-based one. We first present a new static neural field function to describe the activity distribution of the neural field and then use two static neural field functions to build an efficient few-shot classifier. This neural field-based (NF) classifier consists of two coupled static neural fields. It stores the feature information of…

Taxonomy

Topics: Neural Networks and Applications · Digital Imaging for Blood Diseases · Domain Adaptation and Few-Shot Learning