Self-supervised cross-modality learning for uncertainty-aware object detection and recognition in applications which lack pre-labelled training data
Irum Mehboob, Li Sun, Alireza Rastegarpanah, Rustam Stolkin

TL;DR
This paper introduces a self-supervised, uncertainty-aware deep learning framework for object detection and recognition in RGB images without requiring extensive labeled datasets, using a teacher-student pipeline with 3D detection, weak supervision, and Gaussian Processes.
Contribution
It presents a novel self-supervised learning approach combining 3D detection, weakly supervised classification, and uncertainty estimation for industrial object recognition.
Findings
Student network outperforms directly trained YOLO on limited data.
Gaussian Process provides meaningful uncertainty estimates.
Method enables real-time detection in complex scenes.
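As a rough illustration of what a Gaussian Process uncertainty estimate looks like, the sketch below fits a toy 1D GP regressor with NumPy and shows that the predictive variance is small near training points and approaches the prior variance far from them. This is not the paper's formulation, which is not given in this summary: the RBF kernel, the `noise` and `length_scale` values, the 1D inputs, and the function names `rbf_kernel` and `gp_posterior` are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1D input arrays (assumed kernel)."""
    a = np.asarray(a, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return np.exp(-0.5 * ((a - b) / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise=1e-2, length_scale=1.0):
    """Return the GP posterior mean and variance of the latent function at x_test."""
    y_train = np.asarray(y_train, dtype=float)
    k_train = rbf_kernel(x_train, x_train, length_scale)
    k_train += noise * np.eye(len(y_train))  # observation noise on the diagonal
    k_cross = rbf_kernel(x_train, x_test, length_scale)

    # A Cholesky factorisation gives a numerically stable solve.
    chol = np.linalg.cholesky(k_train)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_cross.T @ alpha

    v = np.linalg.solve(chol, k_cross)
    prior_var = np.ones(k_cross.shape[1])  # k(x, x) = 1 for this kernel
    var = prior_var - np.sum(v ** 2, axis=0)
    return mean, var


# Toy data: three training inputs, one test input near them and one far away.
x_train = np.array([0.0, 1.0, 2.0])
y_train = np.sin(x_train)
mean, var = gp_posterior(x_train, y_train, np.array([1.0, 10.0]))
print("variance near training data:", var[0])
print("variance far from training data:", var[1])
```

The far-away test input receives a much larger variance than the nearby one, which is the behaviour a detector can use to flag inputs unlike anything seen in training.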
Abstract
This paper shows how an uncertainty-aware deep neural network can be trained to detect, recognise and localise objects in 2D RGB images, in applications lacking annotated training datasets. We propose a self-supervising teacher-student pipeline, in which a relatively simple teacher classifier, trained with only a few labelled 2D thumbnails, automatically processes a larger body of unlabelled RGB-D data to teach a student network based on a modified YOLOv3 architecture. Firstly, 3D object detection with back projection is used to automatically extract and teach 2D detection and localisation information to the student network. Secondly, a weakly supervised 2D thumbnail classifier, with minimal training on a small number of hand-labelled images, is used to teach object category recognition. Thirdly, we use a Gaussian Process (GP) to encode and teach a robust uncertainty estimation…
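The back-projection step mentioned in the abstract can be pictured with a minimal sketch: given a 3D bounding box found in the RGB-D data, project its corners through a pinhole camera model and take the enclosing rectangle as the 2D training label. The abstract does not specify the camera model or box representation, so the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the axis-aligned box in the camera frame, and the helper names `box_corners` and `project_box_to_2d` are assumptions for illustration only.

```python
import numpy as np


def box_corners(center, size):
    """Return the 8 corners of an axis-aligned 3D box (camera frame, metres)."""
    center = np.asarray(center, dtype=float)
    half = np.asarray(size, dtype=float) / 2.0
    signs = np.array(
        [[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
        dtype=float,
    )
    return center + signs * half


def project_box_to_2d(corners_xyz, fx, fy, cx, cy):
    """Project 3D corners with a pinhole model; return (u_min, v_min, u_max, v_max) in pixels."""
    corners = np.asarray(corners_xyz, dtype=float)
    if np.any(corners[:, 2] <= 0):
        raise ValueError("all corners must lie in front of the camera (z > 0)")
    u = fx * corners[:, 0] / corners[:, 2] + cx
    v = fy * corners[:, 1] / corners[:, 2] + cy
    return float(u.min()), float(v.min()), float(u.max()), float(v.max())


# Hypothetical example: a 1 m cube centred 2 m in front of a 640x480 camera.
corners = box_corners(center=(0.0, 0.0, 2.0), size=(1.0, 1.0, 1.0))
bbox_2d = project_box_to_2d(corners, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print("2D box (u_min, v_min, u_max, v_max):", bbox_2d)
```

A rectangle obtained this way could serve as an automatically generated localisation label for a 2D detector, so no hand-drawn boxes are needed for that part of the training signal.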
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning
Methods: Average Pooling · Batch Normalization · Global Average Pooling · Softmax · 1x1 Convolution · Convolution · Residual Connection · k-Means Clustering · Logistic Regression
