SIEFormer: Spectral-Interpretable and -Enhanced Transformer for Generalized Category Discovery

Chunming Li, Shidong Wang, Tong Xin, and Haofeng Zhang

arXiv:2602.13067 · cs.CV · February 16, 2026

TL;DR

SIEFormer introduces a spectral-analysis-based transformer architecture that improves feature adaptability and interpretability for Generalized Category Discovery (GCD) tasks and reports state-of-the-art results.

Contribution

The paper proposes a novel spectral-interpretable transformer with two branches: an implicit branch that models local token dependencies with graph Laplacians and a Band-adaptive Filter (BaF) layer, and an explicit branch whose Maneuverable Filtering Layer (MFL) captures global token dependencies.
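To make the implicit-branch idea concrete, the sketch below treats tokens as nodes of a graph, builds its symmetric normalized Laplacian, and applies a spectral response that can switch between band-pass and band-reject. It is a minimal NumPy illustration under stated assumptions, not the paper's BaF layer: the similarity-based token graph (tokens used directly as queries and keys), the Gaussian band shape, and the names `token_graph`, `band_adaptive_filter`, `center`, `width`, and `alpha` are all hypothetical. In a trained model these parameters would presumably be learnable; here they are plain floats.

```python
import numpy as np

def normalized_laplacian(W):
    """Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2} of a weighted graph."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    return np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def token_graph(X):
    """Hypothetical token graph: softmax of X X^T / sqrt(d), then symmetrized.
    Assumption: tokens act as their own queries and keys (no projections)."""
    scores = X @ X.T / np.sqrt(X.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)            # row-stochastic, like attention
    return 0.5 * (A + A.T)                       # symmetric, so the spectrum is real

def band_adaptive_filter(X, center, width, alpha):
    """Hypothetical band filter on the token-graph spectrum.
    alpha = 1 gives band-pass around `center`; alpha = 0 gives band-reject."""
    L = normalized_laplacian(token_graph(X))
    lam, U = np.linalg.eigh(L)                     # graph frequencies lie in [0, 2]
    band = np.exp(-((lam - center) / width) ** 2)  # Gaussian bump around `center`
    response = alpha * band + (1.0 - alpha) * (1.0 - band)
    return U @ (response[:, None] * (U.T @ X))     # U h(Lambda) U^T X

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8))  # 16 tokens, 8 channels
Y_pass = band_adaptive_filter(X, center=0.5, width=0.3, alpha=1.0)
Y_reject = band_adaptive_filter(X, center=0.5, width=0.3, alpha=0.0)
print(np.allclose(Y_pass + Y_reject, X))  # True: the two responses sum to 1
```

Exact eigendecomposition costs O(N^3) in the number of tokens; spectral graph layers commonly approximate h(λ) with polynomials of L instead, and whether SIEFormer does so is not stated here.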

Findings

01

Achieves state-of-the-art performance on multiple image recognition datasets.

02

Demonstrates the effectiveness of spectral analysis in transformer interpretability.

03

Validates the approach through extensive ablation studies and visualizations.

Abstract

This paper presents a novel approach, the Spectral-Interpretable and -Enhanced Transformer (SIEFormer), which leverages spectral analysis to reinterpret the attention mechanism within the Vision Transformer (ViT) and enhance feature adaptability, with particular emphasis on challenging Generalized Category Discovery (GCD) tasks. SIEFormer is composed of two main branches, corresponding to implicit and explicit spectral perspectives of the ViT, which enables joint optimization. The implicit branch uses different types of graph Laplacians to model local structural correlations among tokens, along with a novel Band-adaptive Filter (BaF) layer that can flexibly perform both band-pass and band-reject filtering. The explicit branch, on the other hand, introduces a Maneuverable Filtering Layer (MFL) that learns global dependencies among tokens by applying the Fourier…
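The abstract is cut off before it fully specifies the MFL. As one plausible reading of a Fourier-based token-mixing layer, the sketch below uses a GFNet-style global filter: patch tokens are reshaped to their 2D grid, transformed with a real 2D FFT, multiplied elementwise by a complex filter, and transformed back. This is an assumption for illustration, not the paper's MFL; `fourier_global_filter` and `filt` are hypothetical names, and in a real layer `filt` would be a learnable parameter.

```python
import numpy as np

def fourier_global_filter(tokens, filt, h, w):
    """Hypothetical Fourier-domain token mixing (GFNet-style, not the paper's MFL).
    tokens: (h*w, c) patch tokens; filt: complex array of shape (h, w//2 + 1, c)."""
    n, c = tokens.shape
    grid = tokens.reshape(h, w, c)                     # restore the 2D patch layout
    spec = np.fft.rfft2(grid, axes=(0, 1))             # (h, w//2 + 1, c) spectrum
    spec = spec * filt                                 # elementwise product per frequency
    out = np.fft.irfft2(spec, s=(h, w), axes=(0, 1))   # back to the token grid
    return out.reshape(n, c)

rng = np.random.default_rng(0)
h, w, c = 4, 4, 3
tokens = rng.standard_normal((h * w, c))
identity = np.ones((h, w // 2 + 1, c), dtype=complex)  # all-pass filter
print(np.allclose(fourier_global_filter(tokens, identity, h, w), tokens))  # True
```

Because elementwise multiplication in the frequency domain equals circular convolution over the whole grid, every output token can depend on every input token, which is one way a Fourier-domain layer captures global dependencies at O(N log N) cost.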

Taxonomy

Topics: Face Recognition and Perception · Domain Adaptation and Few-Shot Learning · Face and Expression Recognition