Egocentric Scene Understanding via Multimodal Spatial Rectifier

Tien Do; Khiem Vuong; Hyun Soo Park

arXiv:2207.07077·cs.CV·July 15, 2022

TL;DR

This paper introduces a multimodal spatial rectifier and a new dataset, EDINA, to improve egocentric scene understanding, specifically depth and surface normal prediction, addressing challenges from non-canonical viewpoints and dynamic foreground objects.

Contribution

The paper proposes a multimodal spatial rectifier for egocentric images and introduces the EDINA dataset, enabling better learning of dynamic scene representations and significantly improving depth and normal estimation.

Findings

01. Outperforms baseline models on the EDINA, FPHA, and EPIC-KITCHENS datasets.

02. Effectively stabilizes egocentric images from non-canonical viewpoints.

03. Enhances depth and surface normal prediction accuracy.

Abstract

In this paper, we study the problem of egocentric scene understanding, i.e., predicting depths and surface normals from an egocentric image. Egocentric scene understanding poses unprecedented challenges: (1) due to large head movements, the images are taken from non-canonical viewpoints (i.e., tilted images) to which existing geometry prediction models do not apply; (2) dynamic foreground objects, including hands, constitute a large proportion of visual scenes. These challenges limit the performance of existing models learned from large indoor datasets, such as ScanNet and NYUv2, which comprise predominantly upright images of static scenes. We present a multimodal spatial rectifier that stabilizes the egocentric images to a set of reference directions, which allows learning a coherent visual representation. Unlike a unimodal spatial rectifier, which often produces excessive perspective warp…
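The idea of stabilizing a tilted image to one of several reference directions can be illustrated with a small geometric sketch. This is not the authors' implementation: it assumes a known gravity direction in the camera frame, a known pinhole intrinsic matrix `K`, and a hand-picked set of reference directions, and the function names `rotation_between` and `rectifying_homography` are hypothetical. It picks the reference direction closest to gravity, computes the smallest rotation that aligns the two, and converts that rotation into the image-space homography `K R K^-1`.

```python
import numpy as np

def rotation_between(a, b):
    """Smallest rotation R such that R @ a points along b (unit 3-vectors)."""
    a = np.asarray(a, dtype=float) / np.linalg.norm(a)
    b = np.asarray(b, dtype=float) / np.linalg.norm(b)
    v = np.cross(a, b)            # rotation axis scaled by sin(angle)
    c = float(np.dot(a, b))       # cos(angle)
    s = np.linalg.norm(v)         # sin(angle)
    if s < 1e-12:
        if c > 0:
            return np.eye(3)      # already aligned
        # Antiparallel: rotate by pi about any axis perpendicular to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-6:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis = axis / np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    # Rodrigues' formula written in terms of the unnormalized axis v.
    return np.eye(3) + vx + vx @ vx * ((1.0 - c) / s**2)

def rectifying_homography(gravity, references, K):
    """Pick the nearest reference direction and return (index, R, H).

    H = K @ R @ inv(K) maps pixels of the tilted image to pixels of the
    image a camera rotated by R would observe (pure rotation, no parallax).
    """
    g = np.asarray(gravity, dtype=float)
    g = g / np.linalg.norm(g)
    refs = np.asarray(references, dtype=float)
    refs = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    idx = int(np.argmax(refs @ g))          # largest cosine similarity
    R = rotation_between(g, refs[idx])
    H = K @ R @ np.linalg.inv(K)
    return idx, R, H

# Usage with illustrative (assumed) values.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
gravity = np.array([0.1, 0.9, 0.3])         # tilted camera
references = [[0.0, 1.0, 0.0],              # looking ahead
              [0.0, 0.0, 1.0]]              # looking down
idx, R, H = rectifying_homography(gravity, references, K)
```

With a single reference direction, a strongly tilted view needs a large rotation and therefore a heavily stretched warp. Choosing the nearest of several references keeps the rotation angle small, which is the intuition behind preferring a multimodal rectifier over a unimodal one.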

Code & Models

Repositories

tien-d/EgoDepthNormal
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Hand Gesture Recognition Systems

Methods: Gravity