Discriminately Treating Motion Components Evolves Joint Depth and Ego-Motion Learning

Mengtan Zhang, Zizhan Guo, Hongbo Zhao, Yi Feng, Zuyi Xiong, Yue Wang, Shaoyi Du, Hanli Wang, and Rui Fan

arXiv:2511.01502·cs.CV·November 4, 2025

TL;DR

This paper proposes an approach to joint depth and ego-motion learning that treats each motion component separately, using the geometric regularities of their rigid flows to improve robustness and accuracy under diverse conditions.

Contribution

It introduces a discriminative treatment of motion components that enhances geometric constraints and mutual derivation of depth and ego-motion, leading to state-of-the-art results.

Findings

1. Achieves state-of-the-art performance on multiple datasets.

2. Improves robustness under challenging real-world conditions.

3. Introduces a framework with geometric constraints on motion components.

Abstract

Unsupervised learning of depth and ego-motion, two fundamental 3D perception tasks, has made significant strides in recent years. However, most methods treat ego-motion as an auxiliary task, either mixing all motion types or excluding depth-independent rotational motions in supervision. Such designs limit the incorporation of strong geometric constraints, reducing reliability and robustness under diverse conditions. This study introduces a discriminative treatment of motion components, leveraging the geometric regularities of their respective rigid flows to benefit both depth and ego-motion estimation. Given consecutive video frames, network outputs first align the optical axes and imaging planes of the source and target cameras. Optical flows between frames are transformed through these alignments, and deviations are quantified to impose geometric constraints individually on each…
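The abstract's distinction between depth-independent rotational motion and the remaining motion components follows from the standard pinhole camera model. The Python sketch below is a textbook illustration of that fact, not the paper's method: it splits the rigid flow at one pixel into a rotational part and a translational part. The function names, the parameters (f, cx, cy), and the motion convention X_source = R · X_target + t are all assumptions made for this example.

```python
import math

def rot_y(angle):
    """Rotation matrix for a rotation of `angle` radians about the camera y-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matvec(M, v):
    """Multiply a 3x3 matrix (nested lists) by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def project(X, f, cx, cy):
    """Pinhole projection of a 3D point to pixel coordinates."""
    return (f * X[0] / X[2] + cx, f * X[1] / X[2] + cy)

def rigid_flow_components(u, v, depth, f, cx, cy, R, t):
    """Return (rotational_flow, translational_flow) at pixel (u, v).

    Assumed convention (hypothetical, for illustration only):
    X_source = R @ X_target + t, with (u, v) a pixel in the target frame.
    """
    ray = [(u - cx) / f, (v - cy) / f, 1.0]   # back-projected ray with z = 1
    X = [depth * r for r in ray]              # 3D point in the target frame
    RX = matvec(R, X)
    # Rotation only: the depth factor cancels in the perspective division.
    u_rot, v_rot = project(RX, f, cx, cy)
    # Full rigid motion: adding t makes the result depend on depth.
    u_full, v_full = project([RX[i] + t[i] for i in range(3)], f, cx, cy)
    return (u_rot - u, v_rot - v), (u_full - u_rot, v_full - v_rot)
```

Calling `rigid_flow_components` with the same rotation at two different depths returns the same rotational flow, while a pure forward translation produces a flow that vanishes at the principal point and changes with depth. This is why only the translational part carries information about depth.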

Taxonomy

Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Face recognition and analysis