Unlocking ImageNet's Multi-Object Nature: Automated Large-Scale Multilabel Annotation
Junyu Chen, Md Yousuf Harun, Christopher Kanan

TL;DR
This paper introduces an automated, scalable method to generate high-quality multi-label annotations for ImageNet, improving model accuracy and transferability by better reflecting real-world multi-object scenes.
Contribution
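The authors develop an automated pipeline that uses self-supervised Vision Transformers for unsupervised object discovery to create multi-label annotations for ImageNet's training set without human annotation, giving models a richer learning signal than the original single labels.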
Findings
Models trained with multi-label annotations outperform single-label models in accuracy.
Multi-label supervision improves transferability to downstream tasks.
Generated annotations align well with human judgment.
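To make the difference between single-label and multi-label supervision concrete, here is a minimal sketch of how one image's target changes. The summary does not say which loss the authors use; per-class binary cross-entropy on a multi-hot target is assumed here because it is a common choice for multi-label training. The function names and the three-class toy logits are hypothetical.

```python
import math

def multi_hot(labels, num_classes):
    """Encode a set of class indices as a 0/1 vector of length num_classes."""
    target = [0.0] * num_classes
    for idx in labels:
        target[idx] = 1.0
    return target

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_cross_entropy(logits, target):
    """Mean per-class binary cross-entropy computed from raw logits."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, target):
        p = sigmoid(z)
        total += -(t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps))
    return total / len(logits)

# Toy image that shows classes 0 and 1; the model is confident about both.
logits = [3.0, 2.5, -4.0]
single = multi_hot({0}, 3)     # original single-label target (assumed)
multi = multi_hot({0, 1}, 3)   # multi-label target (assumed)

loss_single = binary_cross_entropy(logits, single)
loss_multi = binary_cross_entropy(logits, multi)
```

With the single-label target, the confident and correct prediction for class 1 is penalized as if it were an error, so `loss_single` exceeds `loss_multi`. This is the label-noise effect the abstract describes.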
Abstract
The original ImageNet benchmark enforces a single-label assumption, despite many images depicting multiple objects. This leads to label noise and limits the richness of the learning signal. Multi-label annotations more accurately reflect real-world visual scenes, where multiple objects co-occur and contribute to semantic understanding, enabling models to learn richer and more robust representations. While prior efforts (e.g., ReaL, ImageNetv2) have improved the validation set, there has not yet been a scalable, high-quality multi-label annotation for the training set. To this end, we present an automated pipeline to convert the ImageNet training set into a multi-label dataset, without human annotations. Using self-supervised Vision Transformers, we perform unsupervised object discovery, select regions aligned with original labels to train a lightweight classifier, and apply it to all…
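The pipeline steps named in the abstract (discover regions, keep those aligned with the original label, train a lightweight classifier, apply it) can be sketched roughly as follows. This is a toy illustration under heavy assumptions, not the authors' implementation: region features and alignment scores are taken as given, the `min_align` threshold is invented, and a nearest-centroid classifier stands in for the unspecified lightweight classifier. What the classifier is applied to is cut off in the text above, so the sketch simply applies it to whatever regions are passed in.

```python
def select_aligned(images, min_align=0.5):
    """Collect (feature, label) pairs from regions that align with the image's original label.

    `align` is a hypothetical score in [0, 1]; how alignment is measured is not
    stated in the summary.
    """
    return [
        (region["feat"], image["label"])
        for image in images
        for region in image["regions"]
        if region["align"] >= min_align
    ]

def train_centroids(examples):
    """Stand-in 'lightweight classifier': one mean feature vector per class."""
    sums, counts = {}, {}
    for feat, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], feat)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}

def predict(centroids, feat):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lab: sqdist(centroids[lab], feat))

def label_image(centroids, region_feats):
    """Multi-label annotation for one image: the set of classes predicted over its regions."""
    return {predict(centroids, feat) for feat in region_feats}

# Toy data: 2-D features and alignment scores are made up for illustration.
images = [
    {"label": "dog", "regions": [{"feat": [1.0, 0.0], "align": 0.9},
                                 {"feat": [0.0, 1.0], "align": 0.1}]},
    {"label": "ball", "regions": [{"feat": [0.1, 0.9], "align": 0.8}]},
]

centroids = train_centroids(select_aligned(images))
labels = label_image(centroids, [r["feat"] for r in images[0]["regions"]])
```

In this toy run the first image, originally labeled only "dog", ends up with both "dog" and "ball", because its second region sits closest to the "ball" centroid learned from the other image.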
Taxonomy
Topics: Text and Document Classification Technologies · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
