Instance-wise Occlusion and Depth Orders in Natural Scenes
Hyunmin Lee, Jaesik Park

TL;DR
This paper presents InstaOrder, a dataset of 2.9M occlusion-order and depth-order annotations over 101K natural scenes, and introduces two models trained on it: InstaOrderNet for geometric order prediction and InstaDepthNet for dense depth prediction.
Contribution
The paper contributes InstaOrder, which jointly annotates occlusion order and depth order for the same class-labeled instances, together with two networks that use these annotations: InstaOrderNet and InstaDepthNet.
Findings
InstaOrderNet outperforms existing state-of-the-art methods in geometric order prediction.
InstaDepthNet improves dense depth prediction by adding an auxiliary geometric order loss.
The dataset reveals that occlusion and depth orders are complementary in scene understanding.
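To make the idea of an auxiliary geometric order loss concrete, here is a minimal sketch of a generic pairwise ranking loss on instance depths. It is an illustration under stated assumptions, not the loss defined in the paper: the function names `mean_depth` and `order_loss`, the order labels `"a<b"`, `"a>b"`, `"a=b"`, and the choice of averaging depth over an instance mask are all hypothetical.

```python
import math


def mean_depth(depth_map, mask):
    """Average predicted depth over the pixels where mask is truthy.

    depth_map and mask are nested lists with the same shape (assumed layout).
    """
    values = [
        d
        for depth_row, mask_row in zip(depth_map, mask)
        for d, m in zip(depth_row, mask_row)
        if m
    ]
    if not values:
        raise ValueError("mask selects no pixels")
    return sum(values) / len(values)


def order_loss(depth_a, depth_b, order):
    """Generic pairwise ranking penalty (an assumption, not the paper's loss).

    order is 'a<b' (a is closer), 'a>b' (b is closer), or 'a=b' (same depth).
    Smaller depth means closer to the camera.
    """
    diff = depth_a - depth_b
    if order == "a<b":
        # Penalize predictions that place a farther than b.
        return math.log1p(math.exp(diff))
    if order == "a>b":
        # Penalize predictions that place b farther than a.
        return math.log1p(math.exp(-diff))
    if order == "a=b":
        # Penalize any gap between the two instance depths.
        return diff * diff
    raise ValueError(f"unknown order label: {order!r}")
```

In this sketch, a prediction that agrees with the annotated order yields a small penalty and one that contradicts it yields a large penalty, so the term can be added to a standard depth regression objective with a weighting factor.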
Abstract
In this paper, we introduce a new dataset, named InstaOrder, that can be used to understand the geometrical relationships of instances in an image. The dataset consists of 2.9M annotations of geometric orderings for class-labeled instances in 101K natural scenes. The scenes were annotated by 3,659 crowd-workers regarding (1) occlusion order that identifies occluder/occludee and (2) depth order that describes ordinal relations that consider relative distance from the camera. The dataset provides joint annotation of two kinds of orderings for the same instances, and we discover that the occlusion order and depth order are complementary. We also introduce a geometric order prediction network called InstaOrderNet, which is superior to state-of-the-art approaches. Moreover, we propose a dense depth prediction network called InstaDepthNet that uses auxiliary geometric order loss to boost the…
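The joint annotation described in the abstract can be pictured as one record per instance pair that carries both orderings. The sketch below is a hypothetical in-memory representation, not the dataset's actual file format: the class names, enum values, and the helper `depth_only_pairs` are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Occlusion(Enum):
    """Hypothetical occlusion labels for an instance pair (a, b)."""
    NONE = "none"
    A_OCCLUDES_B = "a_occludes_b"
    B_OCCLUDES_A = "b_occludes_a"


class Depth(Enum):
    """Hypothetical depth labels: which instance is closer to the camera."""
    A_CLOSER = "a_closer"
    B_CLOSER = "b_closer"
    EQUAL = "equal"


@dataclass(frozen=True)
class PairAnnotation:
    instance_a: int
    instance_b: int
    occlusion: Occlusion
    depth: Depth


def depth_only_pairs(annotations):
    """Return pairs with a depth order but no occlusion between them."""
    return [p for p in annotations if p.occlusion is Occlusion.NONE]


# Made-up example records, not taken from the dataset.
pairs = [
    PairAnnotation(0, 1, Occlusion.A_OCCLUDES_B, Depth.A_CLOSER),
    PairAnnotation(0, 2, Occlusion.NONE, Depth.B_CLOSER),
]
```

Pairs such as the second record, where the instances do not occlude each other but still have a depth order, are one way to read the claim that the two orderings are complementary: each carries information the other does not.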
Taxonomy
Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Video Analysis and Summarization
