Image-Based Vehicle Classification by Synergizing Features from Supervised and Self-Supervised Learning Paradigms
Shihan Ma, Jidong J. Yang

TL;DR
This paper proposes a vehicle classification method that combines features from supervised and self-supervised learning, achieving high accuracy by integrating representations from DINO and data2vec with wheel positional data.
Contribution
The paper introduces a novel fusion of self-supervised and supervised features, including wheel positional information, to improve vehicle classification accuracy.
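One simple way to fuse the two kinds of features is to concatenate them before a classifier. The sketch below rests on three assumptions: the self-supervised backbone (DINO or data2vec) produces one embedding vector per image, the wheel positional information is already encoded as a fixed-length vector, and a single linear head maps the result to the 13 classes. The dimensions, the random weights, and the linear head are illustrative choices, not the authors' architecture.

```python
import numpy as np

# Hypothetical dimensions: the paper's exact sizes are not stated here.
EMBED_DIM = 768   # e.g. a ViT-B [CLS] embedding (assumed)
WHEEL_DIM = 8     # hypothetical fixed-length wheel positional vector
NUM_CLASSES = 13  # number of vehicle classes reported in the findings


def fuse_features(image_embedding: np.ndarray, wheel_features: np.ndarray) -> np.ndarray:
    """Concatenate a self-supervised image embedding with wheel positional features."""
    return np.concatenate([image_embedding, wheel_features], axis=-1)


def linear_head(fused: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Map fused features to class logits with a single linear layer."""
    return fused @ weights + bias


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax over class logits."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
batch = 4
image_emb = rng.standard_normal((batch, EMBED_DIM))         # stand-in for backbone output
wheel_feat = rng.uniform(0.0, 1.0, (batch, WHEEL_DIM))      # stand-in for wheel features
W = rng.standard_normal((EMBED_DIM + WHEEL_DIM, NUM_CLASSES)) * 0.01
b = np.zeros(NUM_CLASSES)

fused = fuse_features(image_emb, wheel_feat)
probs = softmax(linear_head(fused, W, b))
predicted_class = probs.argmax(axis=-1)  # Top-1 prediction per image
```

Concatenation keeps both feature sources intact and lets the classifier learn how much weight to give each one. In a real pipeline, `W` and `b` would be trained rather than sampled at random.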
Findings
data2vec representations outperform DINO representations on the vehicle classification task.
The combined approach achieves 97.2% Top-1 accuracy on 13 vehicle classes.
A random wheel masking strategy improves feature finetuning and classification performance.
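The summary does not spell out how the random wheel masking works. The sketch below implements one plausible reading: during finetuning, each detected wheel region in the image is independently blanked out with a given probability. The `(x1, y1, x2, y2)` pixel box format, `mask_prob`, and `fill_value` are hypothetical parameters, not taken from the paper.

```python
import numpy as np


def random_wheel_mask(image, wheel_boxes, mask_prob=0.5, fill_value=0.0, rng=None):
    """Return a copy of `image` in which each wheel box is masked with probability `mask_prob`.

    image: array of shape (H, W, C)
    wheel_boxes: iterable of (x1, y1, x2, y2) pixel boxes (hypothetical format)
    Returns (masked_image, applied), where applied[i] says whether box i was masked.
    """
    rng = np.random.default_rng() if rng is None else rng
    masked = image.copy()  # never modify the caller's image
    h, w = image.shape[:2]
    applied = []
    for x1, y1, x2, y2 in wheel_boxes:
        if rng.random() < mask_prob:
            # Clip the box to the image bounds before masking.
            xa, xb = max(0, int(x1)), min(w, int(x2))
            ya, yb = max(0, int(y1)), min(h, int(y2))
            masked[ya:yb, xa:xb, :] = fill_value
            applied.append(True)
        else:
            applied.append(False)
    return masked, applied


rng = np.random.default_rng(42)
image = np.ones((64, 128, 3), dtype=np.float32)
boxes = [(10, 40, 30, 60), (90, 40, 110, 60)]
masked_image, applied = random_wheel_mask(image, boxes, mask_prob=0.5, rng=rng)
```

Masking wheels independently means the model sometimes sees all, some, or none of the wheels. Under this reading, the image representation cannot rely solely on wheel pixels, while the separate wheel positional features still carry that information.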
Abstract
This paper introduces a novel approach that leverages features learned from both supervised and self-supervised paradigms to improve image classification, specifically vehicle classification. Two state-of-the-art self-supervised learning methods, DINO and data2vec, were evaluated and compared for their representation learning of vehicle images. The former contrasts local and global views, while the latter uses masked prediction on multi-layered representations. In parallel, supervised learning is employed to finetune a pretrained YOLOR object detector for detecting vehicle wheels, from which definitive wheel positional features are retrieved. The representations learned by these self-supervised methods were combined with the wheel positional features for the vehicle classification task. In particular, a random wheel masking strategy was utilized to finetune the…
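The abstract says wheel positional features are retrieved from a finetuned YOLOR wheel detector, but it does not give their encoding. The sketch below makes two assumptions: detections arrive as pixel boxes, and position is expressed as each wheel's horizontal center normalized by the vehicle's bounding-box width. `MAX_WHEELS`, the zero padding, and the trailing occupancy entry are hypothetical choices. No detector runs here; the boxes are hand-written stand-ins for detector output.

```python
import numpy as np

MAX_WHEELS = 6  # hypothetical cap on visible wheels in a side view


def wheel_positional_features(wheel_boxes, vehicle_box, max_wheels=MAX_WHEELS):
    """Encode detected wheel boxes as a fixed-length vector of normalized horizontal centers.

    wheel_boxes: list of (x1, y1, x2, y2) wheel detections in pixels
    vehicle_box: (x1, y1, x2, y2) bounding box of the whole vehicle in pixels
    Returns an array of length max_wheels + 1: normalized centers sorted left to right
    (zero-padded), followed by the fraction of wheel slots in use.
    """
    vx1, _, vx2, _ = vehicle_box
    width = max(vx2 - vx1, 1e-6)  # guard against degenerate boxes
    centers = sorted(((x1 + x2) / 2.0 - vx1) / width for x1, _, x2, _ in wheel_boxes)
    centers = [min(max(c, 0.0), 1.0) for c in centers[:max_wheels]]  # clip into [0, 1]
    feats = np.zeros(max_wheels + 1, dtype=np.float32)
    feats[: len(centers)] = centers
    feats[-1] = len(centers) / max_wheels
    return feats


vehicle = (100, 50, 500, 250)
wheels = [(120, 200, 180, 250), (400, 200, 460, 250), (330, 200, 390, 250)]
features = wheel_positional_features(wheels, vehicle)
```

Normalizing by vehicle width makes the encoding roughly invariant to image scale. Sorting left to right gives each slot a consistent meaning across images, so axle spacing patterns that distinguish vehicle classes appear in the same positions of the vector.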
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Dense Connections · Residual Connection · Vision Transformer
