The Third Monocular Depth Estimation Challenge

Jaime Spencer; Fabio Tosi; Matteo Poggi; Ripudaman Singh Arora; Chris Russell; Simon Hadfield; Richard Bowden; GuangYuan Zhou; ZhengXin Li; Qiang Rao; YiPing Bao; Xiao Liu; Dohyeong Kim; Jinseong Kim; Myunghyun Kim; Mykola Lavreniuk; Rui Li; Qing Mao; Jiang Wu; Yu Zhu; Jinqiu Sun; Yanning Zhang; Suraj Patni; Aradhye Agarwal; Chetan Arora; Pihai Sun; Kui Jiang; Gang Wu; Jian Liu; Xianming Liu; Junjun Jiang; Xidan Zhang; Jianing Wei; Fangjun Wang; Zhiming Tan; Jiabao Wang; Albert Luginov; Muhammad Shahzad; Seyed Hosseini; Aleksander Trajcevski; James H. Elder

arXiv:2404.16831·cs.CV·April 30, 2024

TL;DR

This paper reports on the third Monocular Depth Estimation Challenge, which targets zero-shot generalization to the complex natural and indoor scenes of the SYNS-Patches dataset. Most reported methods build on foundational models such as Depth Anything, and the winning entry raised the 3D F-Score from 17.51% to 23.72%.

Contribution

The paper presents the results of the third edition of the challenge, highlighting the widespread use of foundational models and substantial improvements in depth estimation performance.

Findings

01: 19 submissions outperformed the baseline

02: Use of foundational models like Depth Anything was common

03: Winners improved 3D F-Score from 17.51% to 23.72%

Abstract

This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e., supervised or self-supervised. The challenge received a total of 19 submissions outperforming the baseline on the test set; 10 of them submitted a report describing their approach, highlighting the widespread use of foundational models such as Depth Anything at the core of their methods. The challenge winners drastically improved 3D F-Score performance, from 17.51% to 23.72%.
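
The abstract reports results as a 3D F-Score but does not define it. As a rough illustration only, the sketch below assumes the common point-cloud definition: precision is the fraction of predicted points lying within a distance threshold of some ground-truth point, recall is the reverse, and the F-Score is their harmonic mean. The function name `f_score_3d`, the `threshold` value, and the toy points are all hypothetical and are not taken from the challenge's evaluation code.

```python
import math


def f_score_3d(pred_points, gt_points, threshold=0.1):
    """Harmonic mean of precision and recall between two 3D point sets.

    Assumed definition (not taken from the challenge code): a point counts
    as correct when its nearest neighbour in the other set is closer than
    `threshold`, measured in the same units as the coordinates.
    """
    if not pred_points or not gt_points:
        return 0.0

    def fraction_within(src, dst):
        # Brute-force nearest neighbour; fine for a toy example only.
        hits = sum(
            1 for p in src if min(math.dist(p, q) for q in dst) < threshold
        )
        return hits / len(src)

    precision = fraction_within(pred_points, gt_points)
    recall = fraction_within(gt_points, pred_points)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: two predicted points sit close to the ground truth, two are far off.
gt = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0), (1.0, 1.0, 1.0)]
pred = [(0.0, 0.0, 1.05), (1.0, 0.0, 1.05), (0.0, 1.0, 2.0), (1.0, 1.0, 2.0)]
score = f_score_3d(pred, gt)
```

Under this assumed definition, a prediction is rewarded only when the reconstructed 3D points land near the true surface, so errors at object boundaries lower both precision and recall.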

Taxonomy

Topics: Image Processing Techniques and Applications · Industrial Vision Systems and Defect Detection · Advanced Vision and Imaging