Image-Based Vehicle Classification by Synergizing Features from Supervised and Self-Supervised Learning Paradigms

Shihan Ma; Jidong J. Yang

arXiv:2302.00648·cs.CV·February 2, 2023

TL;DR

This paper proposes a vehicle classification method that combines features from supervised and self-supervised learning, achieving high accuracy by integrating representations from DINO and data2vec with wheel positional data.

Contribution

It introduces a novel fusion of self-supervised and supervised features, including wheel positional information, to enhance vehicle classification accuracy.

Findings

01. data2vec representations outperform DINO representations in classification tasks.

02. The combined approach achieves 97.2% Top-1 accuracy on 13 vehicle classes.

03. A random wheel masking strategy improves feature finetuning and classification performance.

Abstract

This paper introduces a novel approach that leverages features learned from both supervised and self-supervised paradigms to improve image classification, specifically vehicle classification. Two state-of-the-art self-supervised learning methods, DINO and data2vec, were evaluated and compared for their representation learning of vehicle images. The former contrasts local and global views, while the latter uses masked prediction on multi-layered representations. On the supervised side, a pretrained YOLOR object detector is finetuned to detect vehicle wheels, from which definitive wheel positional features are retrieved. The representations learned by these self-supervised methods were combined with the wheel positional features for the vehicle classification task. Particularly, a random wheel masking strategy was utilized to finetune the…
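The fusion step described in the abstract can be pictured as concatenating an image embedding with a fixed-length vector of wheel positions. The sketch below is illustrative only and is not the authors' implementation: the names `wheel_features`, `fuse`, `MAX_WHEELS`, and `mask_prob` are hypothetical, the embedding is a plain list standing in for a DINO or data2vec representation, and wheel positions are assumed to be normalized (x, y) box centers from a detector. The abstract does not say how the random wheel masking is applied, so zeroing the whole wheel vector with some probability is an assumption.

```python
import random
from typing import List, Optional, Sequence, Tuple

MAX_WHEELS = 4  # hypothetical cap on the number of wheels encoded per image


def wheel_features(
    wheel_centers: Sequence[Tuple[float, float]],
    max_wheels: int = MAX_WHEELS,
) -> List[float]:
    """Flatten normalized (x, y) wheel centers into a zero-padded vector."""
    # Sort left to right so the layout does not depend on detection order.
    ordered = sorted(wheel_centers)[:max_wheels]
    flat = [coord for center in ordered for coord in center]
    return flat + [0.0] * (2 * max_wheels - len(flat))


def fuse(
    embedding: Sequence[float],
    wheel_centers: Sequence[Tuple[float, float]],
    mask_prob: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Concatenate an image embedding with wheel positional features.

    Assumption: masking zeroes the entire wheel vector with probability
    mask_prob; the source does not specify the masking scheme.
    """
    rng = rng or random.Random()
    wheels = wheel_features(wheel_centers)
    if rng.random() < mask_prob:
        wheels = [0.0] * len(wheels)
    return list(embedding) + wheels


if __name__ == "__main__":
    emb = [0.1, 0.2]  # stand-in for a self-supervised embedding
    detections = [(0.7, 0.9), (0.2, 0.9)]  # two wheels, in detection order
    print(fuse(emb, detections))
    print(fuse(emb, detections, mask_prob=1.0))
```

In this sketch a downstream classifier would consume the fused vector. Masking the wheel part during finetuning is one way to keep such a classifier from relying entirely on wheel detections, which may be missing at inference time.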

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Dense Connections · Residual Connection · Vision Transformer