Generalized Category Discovery under Domain Shifts: From Vision to Vision-Language Models

Hongjun Wang; Po Hu; Kai Han

arXiv:2605.00906 · cs.CV · May 5, 2026

TL;DR

This paper introduces three frameworks for generalized category discovery (GCD) under domain shifts. The frameworks adapt foundation models ranging from self-supervised vision models to vision-language models. Experiments on synthetic and real-world shifts show consistent improvements over strong baselines.

Contribution

It proposes three methods, HiLo, HLPrompt, and VLPrompt, that adapt foundation models for GCD under domain shifts, a setting less explored in prior work.

Findings

1. Consistent improvements over strong baselines on both synthetic and real-world domain shifts.
2. The methods effectively disentangle domain and semantic features.
3. Leveraging vision-language models enhances GCD performance.
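This page does not say how the improvements are measured. GCD work commonly reports clustering accuracy on the unlabelled data. Predicted cluster ids are first matched one-to-one to ground-truth classes. Accuracy is then split into All, Old (known-class) and New (unknown-class) instances.

The sketch below assumes that protocol; it is not taken from the paper's code. `best_mapping` and `gcd_accuracy` are hypothetical helper names. The matching is brute force, so it only suits a handful of clusters.

```python
from itertools import permutations


def best_mapping(y_true, y_pred):
    """Return the one-to-one cluster->class map with the most matches (brute force)."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    # Pad with None so surplus clusters can map to "no class" (always counted wrong).
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best, best_map = -1, {}
    for assignment in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, assignment))
        hits = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        if hits > best:
            best, best_map = hits, mapping
    return best_map


def gcd_accuracy(y_true, y_pred, known):
    """Accuracy on All / Old (known-class) / New instances under a single mapping."""
    mapping = best_mapping(y_true, y_pred)
    correct = [mapping[p] == t for t, p in zip(y_true, y_pred)]

    def mean(flags):
        return sum(flags) / len(flags) if flags else float("nan")

    old = [c for c, t in zip(correct, y_true) if t in known]
    new = [c for c, t in zip(correct, y_true) if t not in known]
    return mean(correct), mean(old), mean(new)
```

For example, `gcd_accuracy([0, 0, 1, 1, 2, 2], [5, 5, 5, 7, 7, 7], known={0, 1})` maps cluster 5 to class 0 and cluster 7 to class 2. It returns All = 4/6, Old = 0.5 and New = 1.0. The brute-force search grows factorially with the number of clusters. At realistic class counts, `scipy.optimize.linear_sum_assignment` on a cluster-by-class count matrix would replace `best_mapping`.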

Abstract

Generalized Category Discovery (GCD) aims to categorize unlabelled instances from both known and unknown classes by transferring knowledge from labelled data of known classes. Existing methods assume all data comes from a single domain, yet real-world unlabelled data often exhibits domain shifts alongside semantic shifts. We study GCD under domain shifts and propose three frameworks that adapt foundation models, ranging from self-supervised vision models to vision-language models. (i) HiLo disentangles domain and semantic features through multi-level feature extraction and mutual information minimization, combined with PatchMix augmentation and curriculum sampling. (ii) HLPrompt extends HiLo with semantic-aware spatial prompt tuning to suppress background and domain noise. (iii) VLPrompt leverages vision-language models via factorized textual prompts and cross-modal consistency…
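The abstract says HiLo minimizes mutual information between domain and semantic features, but it does not name the estimator. As a hedged illustration of what that quantity measures, the sketch below computes MI under a joint-Gaussian assumption. That assumption is ours, not the paper's. Under it, I(X;Y) = ½(log det Σ_X + log det Σ_Y − log det Σ_XY). The function name `gaussian_mi` and the `eps` ridge are hypothetical choices.

```python
import numpy as np


def gaussian_mi(domain_feats, semantic_feats, eps=1e-6):
    """MI in nats between two feature sets, assuming they are jointly Gaussian.

    domain_feats: (N, Dd) array; semantic_feats: (N, Ds) array (rows = samples).
    """
    joint = np.concatenate([domain_feats, semantic_feats], axis=1)
    # Sample covariance plus a small ridge so every block is positive definite.
    cov = np.cov(joint, rowvar=False) + eps * np.eye(joint.shape[1])
    d = domain_feats.shape[1]
    _, logdet_x = np.linalg.slogdet(cov[:d, :d])
    _, logdet_y = np.linalg.slogdet(cov[d:, d:])
    _, logdet_joint = np.linalg.slogdet(cov)
    # Fischer's inequality guarantees this is >= 0 for a positive-definite cov.
    return 0.5 * (logdet_x + logdet_y - logdet_joint)
```

With 5,000 samples, independent feature sets give a value near zero. Near-copies such as `x + 0.1 * noise` give several nats. The Gaussian form only captures linear (second-order) dependence, so it is best read as a diagnostic for how entangled two feature sets are. Written in an autodiff framework, it could serve as a crude penalty, but it should not be taken as the estimator HiLo uses.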

Code & Models

Repositories

https://visual-ai.github.io/hilo