Deep Unsupervised Common Representation Learning for LiDAR and Camera Data using Double Siamese Networks
Andreas Bühler, Niclas Vödisch, Mathias Bürki, Lukas Schaupp

TL;DR
This paper introduces two unsupervised frameworks using Siamese networks and edge-guided training to learn a shared representation for LiDAR and camera data, addressing sensor modality domain gaps in autonomous robotics.
Contribution
It presents novel unsupervised training methods for cross-modal representation learning between LiDAR and camera data, leveraging Siamese networks and edge guidance.
Findings
Both frameworks learn a common representation of LiDAR and camera data
Unsupervised training removes the need for labels, leaving room for scalability
The frameworks are evaluated on common computer vision applications, and their limitations are outlined
Abstract
Domain gaps of sensor modalities pose a challenge for the design of autonomous robots. Taking a step towards closing this gap, we propose two unsupervised training frameworks for finding a common representation of LiDAR and camera data. The first method utilizes a double Siamese training structure to ensure consistency in the results. The second method uses a Canny edge image guiding the networks towards a desired representation. All networks are trained in an unsupervised manner, leaving room for scalability. The results are evaluated using common computer vision applications, and the limitations of the proposed approaches are outlined.
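The edge-guided idea in the abstract can be illustrated with a small sketch. The abstract does not specify the loss or the network architecture, so everything here is an assumption made for illustration: the function names (`gradient_edge_map`, `l1_loss`, `edge_guided_loss`) are hypothetical, the L1 objective is a guess at one plausible form, and the thresholded gradient-magnitude edge detector is a simplified stand-in for Canny (no smoothing, non-maximum suppression, or hysteresis). The lists `camera_repr` and `lidar_repr` stand in for network outputs.

```python
def gradient_edge_map(img, threshold):
    """Binary edge map from central-difference gradient magnitude.

    Simplified stand-in for a Canny edge image (assumption, not the paper's
    pipeline). Border pixels are left at 0.
    """
    h, w = len(img), len(img[0])
    edges = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1.0
    return edges


def l1_loss(a, b):
    """Mean absolute difference between two equally sized 2D grids."""
    total = sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))


def edge_guided_loss(camera_repr, lidar_repr, edge_target):
    """Hypothetical objective: pull both modality outputs towards one edge image."""
    return l1_loss(camera_repr, edge_target) + l1_loss(lidar_repr, edge_target)


# Toy 5x5 image with a vertical intensity step between columns 1 and 2.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
edges = gradient_edge_map(image, threshold=0.25)

# Placeholder "network outputs": one matches the edge target, one is all zeros.
camera_repr = [row[:] for row in edges]
lidar_repr = [[0.0] * 5 for _ in range(5)]
loss = edge_guided_loss(camera_repr, lidar_repr, edges)
```

In this toy setup the camera-side output already matches the edge target, so only the LiDAR-side term contributes to the loss. In an actual training loop the two outputs would come from the respective networks and the loss would be minimized by gradient descent.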
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
