Novel Categories Discovery Via Constraints on Empirical Prediction Statistics
Zahid Hasan, Abu Zaher Md Faridee, Masud Ahmed, Sanjay Purushotham, Heesung Kwon, Hyungtae Lee, Nirmalya Roy

TL;DR
This paper introduces a novel semantic clustering method for discovering new categories in data by constraining statistical properties of predicted class probabilities, outperforming existing approaches across multiple datasets.
Contribution
It proposes a new approach that aligns class probability distributions using statistical constraints and directional statistics, enabling effective semantic clustering without external clustering methods.
Findings
Achieves high classification accuracy on known classes (~90%)
Attains competitive clustering accuracy (~75%) on novel categories
Demonstrates effectiveness across image, video, and time-series data
Abstract
Novel Categories Discovery (NCD) aims to cluster novel data based on the class semantics of known classes, using an open-world dataset annotated over only part of the class space. As an alternative to traditional pseudo-labeling-based approaches, we leverage the connection between data sampling and the provided multinoulli (categorical) distribution of novel classes. We introduce constraints on individual and collective statistics of predicted novel class probabilities to implicitly achieve semantic-based clustering. More specifically, we align the class neuron activation distributions under Monte-Carlo sampling of novel classes in large batches by matching their empirical first-order (mean) and second-order (covariance) statistics with the multinoulli distribution of the labels, while applying instance information constraints and prediction consistency under label-preserving augmentations. We…
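The moment-matching idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the squared-error penalty, and the use of NumPy are assumptions made for illustration only. It relies on one standard fact: a one-hot label drawn from a multinoulli distribution with class prior p has mean p and covariance diag(p) - p pᵀ. The sketch compares those target moments with the empirical mean and covariance of a batch of predicted class probabilities.

```python
import numpy as np


def multinoulli_moments(prior):
    """Mean and covariance of a one-hot label drawn from Categorical(prior)."""
    prior = np.asarray(prior, dtype=float)
    mean = prior
    cov = np.diag(prior) - np.outer(prior, prior)
    return mean, cov


def moment_matching_penalty(probs, prior):
    """Hypothetical penalty: squared gap between batch and target moments.

    probs: (batch_size, num_classes) array of predicted class probabilities.
    prior: (num_classes,) assumed class prior of the novel classes.
    """
    probs = np.asarray(probs, dtype=float)
    target_mean, target_cov = multinoulli_moments(prior)

    # Empirical first-order statistic over the batch.
    batch_mean = probs.mean(axis=0)

    # Empirical second-order statistic (biased covariance estimate).
    centered = probs - batch_mean
    batch_cov = centered.T @ centered / probs.shape[0]

    mean_term = np.sum((batch_mean - target_mean) ** 2)
    cov_term = np.sum((batch_cov - target_cov) ** 2)
    return mean_term + cov_term


if __name__ == "__main__":
    uniform_prior = np.full(4, 0.25)

    # Confident, balanced predictions: 25 one-hot rows per class.
    confident = np.tile(np.eye(4), (25, 1))

    # Collapsed predictions: every sample gets the uniform distribution.
    collapsed = np.full((100, 4), 0.25)

    print("confident:", moment_matching_penalty(confident, uniform_prior))
    print("collapsed:", moment_matching_penalty(collapsed, uniform_prior))
```

In this toy setting, both batches match the prior in their mean, so a first-order constraint alone cannot tell them apart. The covariance term is what separates confident, balanced predictions from collapsed, uninformative ones.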
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Machine Learning in Healthcare · COVID-19 diagnosis using AI
Methods: ALIGN
