VLM-NCD: Novel Class Discovery with Vision-Based Large Language Models

Yuetong Su; Baoguo Wei; Xinyu Wang; Xu Li; Lixin Li

arXiv:2512.10262·cs.CV·December 12, 2025

Open Access

TL;DR

This paper introduces LLM-NCD, a multimodal framework for novel class discovery that fuses visual and textual semantics, reporting up to 25.3% higher accuracy on CIFAR-100 and robustness to long-tail data distributions.

Contribution

The paper combines joint visual-textual modelling of known-class cluster centres and semantic prototypes with a dual-phase discovery mechanism to improve NCD performance.

Findings

01 Achieves up to 25.3% accuracy improvement on CIFAR-100.

02 Demonstrates resilience to long-tail data distributions.

03 Outperforms existing NCD methods significantly.

Abstract

Novel Class Discovery (NCD) aims to utilise prior knowledge of known classes to classify and discover unknown classes from unlabelled data. Existing NCD methods for images rely primarily on visual features, which suffer from limitations such as insufficient feature discriminability and the long-tail distribution of data. We propose LLM-NCD, a multimodal framework that breaks this bottleneck by fusing visual-textual semantics with prototype-guided clustering. Our key innovations are modelling the cluster centres and semantic prototypes of known classes by jointly optimising known-class image and text features, and a dual-phase discovery mechanism that dynamically separates known and novel samples via semantic affinity thresholds and adaptive clustering. Experiments on the CIFAR-100 dataset show that, compared to current methods, this method achieves up to 25.3% improvement in accuracy for…
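
The abstract does not spell out the thresholding rule or the clustering algorithm, so the following is only a minimal sketch of what a dual-phase discovery step could look like, not the authors' implementation. It assumes cosine similarity as the semantic affinity, a single fixed threshold, and plain k-means as a stand-in for the adaptive clustering; the names `discover`, `threshold`, and `n_novel` are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def kmeans(x, k, iters=20, seed=0):
    """Plain k-means (assumed stand-in for the paper's adaptive clustering)."""
    rng = np.random.default_rng(seed)
    # Fancy indexing copies the rows, so updating centres leaves x untouched.
    centres = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centre.
        dists = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centres; an empty cluster keeps its previous centre.
        for j in range(k):
            if np.any(labels == j):
                centres[j] = x[labels == j].mean(axis=0)
    return labels


def discover(features, prototypes, threshold, n_novel):
    """Phase 1: threshold affinity to known-class prototypes.
    Phase 2: cluster the remaining samples into n_novel groups."""
    feats = l2_normalize(features)
    protos = l2_normalize(prototypes)
    affinity = feats @ protos.T              # shape: (n_samples, n_known)
    is_known = affinity.max(axis=1) >= threshold

    labels = np.full(len(feats), -1)         # -1 marks "not yet assigned"
    labels[is_known] = affinity.argmax(axis=1)[is_known]

    novel_idx = np.flatnonzero(~is_known)
    if len(novel_idx) >= n_novel:
        # Novel cluster ids are offset so they follow the known-class ids.
        labels[novel_idx] = len(protos) + kmeans(feats[novel_idx], n_novel)
    return labels, is_known
```

In this sketch, samples whose best prototype affinity falls below `threshold` are treated as novel and receive labels starting at the number of known classes. In practice the threshold and the number of novel clusters would need to be set or estimated per dataset.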

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Face recognition and analysis · Multimodal Machine Learning Applications