DistillPose: Lightweight Camera Localization Using Auxiliary Learning
Yehya Abouelnaga, Mai Bui, Slobodan Ilic

TL;DR
DistillPose is a lightweight retrieval-based camera localization method, distilled from NN-Net with the help of auxiliary learning. It cuts model parameters by 98.87% and inference time by 89.18% without a significant decrease in localization accuracy.
Contribution
The paper presents a lightweight retrieval-based camera localization pipeline that distills a larger model (NN-Net) to drastically reduce parameters and computation without a significant decrease in localization accuracy.
Findings
Reduces model parameters by 98.87% relative to NN-Net
Reduces the retrieval feature vector size by 87.5%
Decreases inference time by 89.18%
Keeps localization accuracy without a significant decrease relative to NN-Net
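To put these percentages in more intuitive terms, a reduction of p% corresponds to a shrink factor of 100 / (100 - p). The short Python sketch below works this out for the reported parameter and inference-time figures. The function name `reduction_factor` is an illustrative helper, not something taken from the paper.

```python
def reduction_factor(percent_reduction: float) -> float:
    """Return how many times smaller a quantity is after a percent reduction."""
    if not 0.0 <= percent_reduction < 100.0:
        raise ValueError("percent_reduction must be in [0, 100)")
    return 100.0 / (100.0 - percent_reduction)


# Figures reported for DistillPose relative to NN-Net.
param_factor = reduction_factor(98.87)  # roughly 88x fewer parameters
time_factor = reduction_factor(89.18)   # roughly 9x shorter inference time

print(f"parameters: about {param_factor:.1f}x smaller")
print(f"inference time: about {time_factor:.1f}x shorter")
```

Put differently, the distilled model keeps only about 1.13% of the original parameter count.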
Abstract
We propose a lightweight retrieval-based pipeline to predict 6DOF camera poses from RGB images. Our pipeline uses a convolutional neural network (CNN) to encode a query image as a feature vector. A nearest neighbor lookup finds the pose-wise nearest database image. A siamese convolutional neural network regresses the relative pose from the nearest neighboring database image to the query image. The relative pose is then applied to the nearest neighboring absolute pose to obtain the query image's final absolute pose prediction. Our model is a distilled version of NN-Net that reduces its parameters by 98.87%, information retrieval feature vector size by 87.5%, and inference time by 89.18% without a significant decrease in localization accuracy.
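The retrieval and pose-composition steps described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: poses are represented as 4x4 camera-to-world matrices, the relative pose is defined as inv(T_db) @ T_query, features are compared by Euclidean distance, and all function names and toy values are hypothetical. The two CNNs are not modeled; the ground-truth relative pose stands in for the siamese network's output.

```python
import numpy as np


def nearest_neighbor(query_feature, database_features):
    """Index of the database feature closest to the query (Euclidean distance)."""
    distances = np.linalg.norm(database_features - query_feature, axis=1)
    return int(np.argmin(distances))


def make_pose(rotation, translation):
    """Build a 4x4 homogeneous pose from a 3x3 rotation and a 3-vector."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose


def rot_z(angle):
    """Rotation matrix for a rotation of `angle` radians about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def relative_pose(pose_db, pose_query):
    """Relative pose from database image to query (assumed convention)."""
    return np.linalg.inv(pose_db) @ pose_query


def apply_relative_pose(pose_db, pose_rel):
    """Compose the neighbor's absolute pose with the relative pose."""
    return pose_db @ pose_rel


# Toy database: one feature vector and one absolute pose per image.
database_features = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
database_poses = [
    make_pose(rot_z(0.0), [0.0, 0.0, 0.0]),
    make_pose(rot_z(0.5), [1.0, 0.0, 0.0]),
    make_pose(rot_z(1.0), [2.0, 1.0, 0.0]),
]

# Hypothetical encoder output and true pose for the query image.
query_feature = np.array([0.1, 0.9, 0.0, 0.0])
query_pose_true = make_pose(rot_z(0.6), [1.2, 0.1, 0.0])

# Step 1: retrieve the nearest database image.
nn_index = nearest_neighbor(query_feature, database_features)

# Step 2: stand-in for the siamese network's relative pose prediction.
predicted_relative = relative_pose(database_poses[nn_index], query_pose_true)

# Step 3: apply the relative pose to the neighbor's absolute pose.
predicted_query_pose = apply_relative_pose(database_poses[nn_index], predicted_relative)

print("nearest neighbor index:", nn_index)
print("pose recovered:", np.allclose(predicted_query_pose, query_pose_true))
```

With a perfect relative pose the composition recovers the query pose exactly. In a real system the final error would depend on both the retrieval quality and the regression error of the siamese network.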
Taxonomy
Topics: Robotics and Sensor-Based Localization · Image and Object Detection Techniques · Advanced Vision and Imaging
