LEGO: Learning Edge with Geometry all at Once by Watching Videos
Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia

TL;DR
This paper introduces LEGO, an unsupervised deep learning framework that jointly estimates edges and 3D geometry (depth and surface normals) from unlabeled videos. Its "3D as-smooth-as-possible" (3D-ASAP) prior asks points that are not separated by an edge to lie on a common planar surface, which improves accuracy on finely detailed structures.
Contribution
The paper presents an unsupervised method that learns edges, depth, and surface normals simultaneously under a 3D-ASAP prior, improving the accuracy of fine geometric detail in the recovered 3D scene.
Findings
Outperforms state-of-the-art on KITTI for depth and normal estimation.
Achieves superior edge detection accuracy on CityScapes.
Demonstrates consistent improvement across all evaluated tasks.
Abstract
Learning to estimate 3D geometry in a single image by watching unlabeled videos via a deep convolutional network is attracting significant attention. In this paper, we introduce a "3D as-smooth-as-possible (3D-ASAP)" prior inside the pipeline, which enables joint estimation of edges and 3D scene, yielding results with significant improvement in accuracy for finely detailed structures. Specifically, we define the 3D-ASAP prior by requiring that any two points recovered in 3D from an image should lie on an existing planar surface if no other cues are provided. We design an unsupervised framework that Learns Edges and Geometry (depth, normal) all at Once (LEGO). The predicted edges are embedded into depth and surface normal smoothness terms, where pixels without edges in-between are constrained to satisfy the prior. In our framework, the predicted depths, normals and edges are forced to be…
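The idea of an edge-gated planar smoothness term can be illustrated with a minimal sketch. It is not the paper's loss: it is a simplified stand-in that assumes a pinhole camera, under which a 3D plane projects to an inverse depth map that is affine in pixel coordinates, so its second-order differences vanish. The function name `edge_gated_smoothness` and the arguments `depth` and `edge` are hypothetical names chosen for illustration.

```python
import numpy as np

def edge_gated_smoothness(depth, edge):
    """Illustrative penalty, not the authors' formulation.

    depth: (H, W) array of positive depths.
    edge:  (H, W) array in [0, 1]; 1 means "an edge is predicted here".
    Returns the mean absolute second-order difference of inverse depth,
    with each term switched off where an edge is predicted at its center pixel.
    """
    inv = 1.0 / depth
    # Second-order finite differences; these are zero on a planar surface
    # because inverse depth is affine in pixel coordinates there.
    d2x = inv[:, :-2] - 2.0 * inv[:, 1:-1] + inv[:, 2:]
    d2y = inv[:-2, :] - 2.0 * inv[1:-1, :] + inv[2:, :]
    # Gate each term by (1 - edge) at the center pixel of its stencil.
    wx = 1.0 - edge[:, 1:-1]
    wy = 1.0 - edge[1:-1, :]
    return (wx * np.abs(d2x)).mean() + (wy * np.abs(d2y)).mean()
```

In this sketch a planar region costs nothing, a depth discontinuity is penalized unless the edge map marks it, and marking every pixel as an edge would trivially zero the penalty, so any training objective built this way would also need a term that discourages predicting edges everywhere.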
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
