Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses

Yongfan Liu; Hyoukjun Kwon

arXiv:2411.10013 · cs.CV · April 30, 2025

TL;DR

This paper presents a novel, low-latency stereo depth estimation method optimized for AR glasses, eliminating preprocessing and leveraging hardware-aware neural network designs to improve speed and accuracy.

Contribution

It introduces homography prediction with a rectification positional encoding (RPE) and a group-pointwise convolution-based replacement for the cost volume, enabling direct processing of unrectified images and a significant reduction in latency.
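This summary does not spell out how the predicted homography is consumed inside the network (for example, whether images are warped explicitly or only the mapped coordinates feed the rectification positional encoding). As a hedged illustration of the underlying geometry only, the minimal Python sketch below applies a 3x3 homography to pixel coordinates and uses an inverse homography to backward-warp a small grayscale image with nearest-neighbour sampling. The function names `apply_homography` and `warp_nearest` are hypothetical and are not taken from the UCI-ISA-Lab/MultiHeadDepth-HomoDepth repository.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography H (nested lists, row-major).

    Homogeneous coordinates: [u, v, w]^T = H @ [x, y, 1]^T, then divide by w.
    """
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def warp_nearest(image, H_inv):
    """Backward-warp a 2D grayscale image (list of rows).

    Each output pixel (x, y) samples the source image at H_inv(x, y),
    rounded to the nearest integer pixel; pixels mapping outside the
    source are filled with 0.0.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx, sy = apply_homography(H_inv, x, y)
            ix, iy = round(sx), round(sy)  # nearest-neighbour sampling
            if 0 <= ix < width and 0 <= iy < height:
                out[y][x] = image[iy][ix]
    return out
```

For example, if the right camera has slipped so that its content sits one row lower than in the left view, an inverse homography that adds 1 to y realigns it: `warp_nearest([[1, 2], [3, 4], [5, 6]], [[1, 0, 0], [0, 1, 1], [0, 0, 1]])` returns `[[3, 4], [5, 6], [0.0, 0.0]]`. Dividing by `w` is what makes the mapping projective rather than affine, so scaling all of `H` by a constant leaves the result unchanged.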

Findings

1. 11.8-30.3% accuracy improvement over state-of-the-art methods
2. 44.5% reduction in end-to-end latency
3. 10.0-24.3% error reduction with multi-task learning

Abstract

Stereo depth estimation is a fundamental component of augmented reality (AR), which requires low latency for real-time processing. However, preprocessing such as rectification and non-ML computations such as the cost volume incur a significant amount of latency, exceeding that of the ML model itself, which hinders the real-time processing required by AR. Therefore, we develop alternative approaches to rectification and the cost volume that account for ML acceleration (GPUs and NPUs) in recent hardware. For preprocessing, we eliminate it by introducing a homography matrix prediction network with a rectification positional encoding (RPE), which delivers both low latency and robustness to unrectified images. For the cost volume, we replace it with a group-pointwise convolution-based operator and an approximation of cosine similarity based on layer normalization and dot product. Based on our approaches, we develop…
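The abstract approximates cosine similarity with layer normalization followed by a dot product. One plausible reading, sketched below under that assumption, is: normalize each C-dimensional feature vector to zero mean and unit variance (LayerNorm without its learnable affine parameters), take the dot product, and divide by C. That quantity equals the Pearson correlation of the two vectors, which coincides with cosine similarity when the features are already zero-mean. The sketch also builds a plain correlation cost volume over disparities for a single image row using this similarity; it is not the paper's group-pointwise convolution operator, and all function names (`layer_norm`, `ln_dot_similarity`, `cost_volume_row`) are hypothetical.

```python
import math


def layer_norm(v, eps=1e-5):
    """Normalize a feature vector to zero mean and unit variance (no affine params)."""
    n = len(v)
    mean = sum(v) / n
    var = sum((x - mean) ** 2 for x in v) / n
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def cosine(a, b):
    """Exact cosine similarity, for comparison."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def ln_dot_similarity(a, b):
    """Assumed approximation: dot product of layer-normalized vectors divided by C."""
    c = len(a)
    return sum(x * y for x, y in zip(layer_norm(a), layer_norm(b))) / c


def cost_volume_row(left, right, max_disp):
    """Correlation cost volume for one image row.

    left, right: lists of per-pixel feature vectors (same channel count C).
    Returns cost[d][x] = similarity(left[x], right[x - d]), or 0.0 where
    x - d falls outside the row. Features are normalized once up front.
    """
    c = len(left[0])
    left_n = [layer_norm(f) for f in left]
    right_n = [layer_norm(f) for f in right]
    cost = []
    for d in range(max_disp):
        row = []
        for x in range(len(left)):
            if x - d < 0:
                row.append(0.0)
            else:
                row.append(sum(p * q for p, q in zip(left_n[x], right_n[x - d])) / c)
        cost.append(row)
    return cost
```

For zero-mean inputs such as `[1, -1, 2, -2]` and `[2, -2, 1, -1]`, `ln_dot_similarity` and `cosine` both give about 0.8; for `[1, 2, 3]` and `[3, 2, 1]` they diverge (about -1.0 versus about 0.71), because the layer-norm version removes the mean first. A likely motivation, consistent with the abstract's focus on GPU and NPU acceleration, is that layer normalization and dot products are standard operators on such hardware.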

Code & Models

Repositories

UCI-ISA-Lab/MultiHeadDepth-HomoDepth
PyTorch · Official

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Satellite Image Processing and Photogrammetry

Methods: Network On Network · ADaptive gradient method with the OPTimal convergence rate