Robots Autonomously Detecting People: A Multimodal Deep Contrastive Learning Method Robust to Intraclass Variations
Angus Fung, Beno Benhabib, Goldie Nejat

TL;DR
This paper introduces a multimodal deep learning approach for robot-based person detection that is robust to occlusion, pose and clothing variation, and poor lighting. Training proceeds in two stages: contrastive pretraining with TimCLR, followed by a Multimodal Faster R-CNN (MFRCNN) detector.
Contribution
The paper presents a new multimodal person detection architecture with a unique pretraining method, TimCLR, that enhances invariance to intraclass variations and improves detection accuracy in challenging environments.
Findings
Outperforms existing methods in accuracy for occluded and deformed persons
Effective in diverse lighting and cluttered environments
Pretraining with TimCLR enhances cross-modal invariance
Abstract
Robotic detection of people in crowded and/or cluttered human-centered environments including hospitals, long-term care, stores and airports is challenging as people can become occluded by other people or objects, and deform due to variations in clothing or pose. There can also be loss of discriminative visual features due to poor lighting. In this paper, we present a novel multimodal person detection architecture to address the mobile robot problem of person detection under intraclass variations. We present a two-stage training approach using 1) a unique pretraining method we define as Temporal Invariant Multimodal Contrastive Learning (TimCLR), and 2) a Multimodal Faster R-CNN (MFRCNN) detector. TimCLR learns person representations that are invariant under intraclass variations through unsupervised learning. Our approach is unique in that it generates image pairs from natural…
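The abstract describes TimCLR as a contrastive pretraining method but the excerpt does not state its loss function. As a rough illustration of what contrastive learning over positive pairs looks like, here is a minimal NumPy sketch of the NT-Xent loss used by SimCLR. Treating TimCLR as SimCLR-style is an assumption, and the function name `nt_xent_loss`, the `temperature` default, and the array shapes are all hypothetical.

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss for a batch of positive pairs (z_a[i], z_b[i]).

    Assumption: a SimCLR-style objective, not the paper's confirmed loss.
    z_a, z_b: arrays of shape (n, d) holding one embedding per view.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)             # (2n, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalise rows
    sim = (z @ z.T) / temperature                      # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                     # never contrast a sample with itself

    # Row i's positive is its partner in the other view.
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-sum-exp over each row (the softmax denominator).
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))

    log_prob_pos = sim[np.arange(2 * n), pos_idx] - log_denom
    return float(-log_prob_pos.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchors = rng.normal(size=(8, 16))
    partners = anchors + 0.01 * rng.normal(size=(8, 16))  # near-identical views
    print(nt_xent_loss(anchors, partners))
```

Minimising this loss pulls the two embeddings of each pair together and pushes them away from every other embedding in the batch. How TimCLR builds its pairs and which modalities it contrasts is not recoverable from the truncated abstract, so the random embeddings in the usage example are placeholders only.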
Taxonomy
Topics: Video Surveillance and Tracking Methods · Multimodal Machine Learning Applications · Advanced Neural Network Applications
Methods: Contrastive Learning · Softmax · Region Proposal Network · RoIPool · Convolution · Faster R-CNN
