A Hidden Stumbling Block in Generalized Category Discovery: Distracted Attention

Qiyu Xu; Zhanxuan Hu; Yu Duan; Ercheng Pei; Yonghang Tai

arXiv:2507.14315 · cs.CV · July 22, 2025

TL;DR

This paper identifies a distracted attention issue in Generalized Category Discovery, proposing an Attention Focusing mechanism with Token Importance Measurement and Token Adaptive Pruning to improve model focus and performance.

Contribution

The paper introduces a novel Attention Focusing (AF) module with two components, TIME and TAP, to address distracted attention in GCD, enhancing existing methods with minimal overhead.

Findings

01. Up to 15.4% performance improvement over the GCD baseline

02. The AF module is lightweight and easily integrated into existing methods

03. Sharper attention focus improves feature extraction

Abstract

Generalized Category Discovery (GCD) aims to classify unlabeled data from both known and unknown categories by leveraging knowledge from labeled known categories. While existing methods have made notable progress, they often overlook a hidden stumbling block in GCD: distracted attention. Specifically, when processing unlabeled data, models tend to focus not only on key objects in the image but also on task-irrelevant background regions, leading to suboptimal feature extraction. To remove this stumbling block, we propose Attention Focusing (AF), an adaptive mechanism designed to sharpen the model's focus by pruning non-informative tokens. AF consists of two simple yet effective components: Token Importance Measurement (TIME) and Token Adaptive Pruning (TAP), working in a cascade. TIME quantifies token importance across multiple scales, while TAP prunes non-informative tokens by utilizing…
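
The abstract describes TIME and TAP only at a high level, and it is cut off before saying what TAP relies on, so what follows is a minimal sketch under stated assumptions, not the authors' implementation. It assumes token importance is read from the [CLS]-to-patch attention of a Vision Transformer, treats each "scale" as one layer's head-averaged attention map, and uses a hypothetical cumulative-importance rule for adaptive pruning. The names `token_importance` and `adaptive_prune` and the `mass` parameter are illustrative only.

```python
from typing import List, Sequence


def token_importance(cls_attn_per_scale: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical TIME-style score for each patch token.

    Assumption: each inner sequence is the [CLS]-to-patch attention of one
    "scale" (taken here to be one transformer layer, averaged over heads).
    Each map is normalized to sum to 1, then the maps are averaged.
    """
    if not cls_attn_per_scale:
        raise ValueError("need attention from at least one scale")
    num_tokens = len(cls_attn_per_scale[0])
    num_scales = len(cls_attn_per_scale)
    scores = [0.0] * num_tokens
    for attn in cls_attn_per_scale:
        if len(attn) != num_tokens:
            raise ValueError("every scale must score the same tokens")
        total = sum(attn)
        if total <= 0:
            raise ValueError("attention map must have positive mass")
        for i, a in enumerate(attn):
            # Normalize within the scale, then average across scales.
            scores[i] += (a / total) / num_scales
    return scores


def adaptive_prune(scores: Sequence[float], mass: float = 0.9) -> List[int]:
    """Hypothetical TAP-style rule: return indices of tokens to keep.

    Keeps the highest-scoring tokens until their summed importance reaches
    `mass` of the total, so the number kept varies from image to image.
    """
    if not 0.0 < mass <= 1.0:
        raise ValueError("mass must be in (0, 1]")
    # Rank tokens by importance; sorted() is stable, so ties keep index order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total = sum(scores)
    kept: List[int] = []
    covered = 0.0
    for i in order:
        kept.append(i)
        covered += scores[i]
        if covered >= mass * total:
            break
    # Return kept tokens in their original spatial order.
    return sorted(kept)


if __name__ == "__main__":
    # Toy 5-token example: attention concentrated on tokens 0 and 1.
    focused = token_importance([
        [0.50, 0.30, 0.10, 0.05, 0.05],
        [0.40, 0.40, 0.10, 0.05, 0.05],
    ])
    # Toy example: attention spread evenly over all tokens.
    diffuse = token_importance([[0.2, 0.2, 0.2, 0.2, 0.2]])
    print(adaptive_prune(focused, mass=0.75))  # -> [0, 1]
    print(adaptive_prune(diffuse, mass=0.75))  # -> [0, 1, 2, 3]
```

Because this rule keeps tokens until a fixed fraction of total importance is covered, a sharply focused attention map retains few tokens while a diffuse one retains more. That per-image variability is what "adaptive" is meant to convey in this sketch; the paper's actual pruning criterion may differ.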
