EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries

Jinjie Mai; Abdullah Hamdi; Silvio Giancola; Chen Zhao; Bernard Ghanem

arXiv:2212.06969 · cs.CV · August 29, 2023 · 1 citation

TL;DR

EgoLoc introduces a new pipeline that improves 3D object localization from egocentric videos by better integrating multi-view geometry and 2D object retrieval, significantly boosting success rates in the VQ3D task.

Contribution

The paper presents EgoLoc, a pipeline that makes camera pose estimation more robust and improves how multi-view 3D displacements are combined, achieving state-of-the-art results in 3D object localization from egocentric videos.

Findings

01 Achieves a success rate of up to 87.12% on VQ3D

02 Improves the robustness of camera pose estimation

03 Provides a comprehensive analysis of VQ3D challenges

Abstract

With the recent advances in video and 3D understanding, novel 4D spatio-temporal methods fusing both concepts have emerged. Towards this direction, the Ego4D Episodic Memory Benchmark proposed a task for Visual Queries with 3D Localization (VQ3D). Given an egocentric video clip and an image crop depicting a query object, the goal is to localize the 3D position of the center of that query object with respect to the camera pose of a query frame. Current methods tackle the problem of VQ3D by unprojecting the 2D localization results of the sibling task Visual Queries with 2D Localization (VQ2D) into 3D predictions. Yet, we point out that the low number of camera poses caused by camera re-localization from previous VQ3D methods severely hinders their overall success rate. In this work, we formalize a pipeline (which we dub EgoLoc) that better entangles 3D multiview geometry with 2D object…
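The unprojection step described in the abstract can be illustrated with a minimal sketch. It is not the EgoLoc implementation: it assumes a pinhole camera with known intrinsics, a depth value at the detected object center, and camera-to-world poses given as 4x4 rigid transforms. All function and parameter names (`unproject_pixel`, `object_in_query_frame`, `T_world_from_det`, `T_world_from_query`) are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat4 = List[List[float]]  # 4x4 rigid transform, row-major


def unproject_pixel(u: float, v: float, depth: float,
                    fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame
    (pinhole model, assumed here for illustration)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def transform_point(T: Mat4, p: Vec3) -> Vec3:
    """Apply a 4x4 rigid transform to a 3D point."""
    x, y, z = p
    return tuple(T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3]
                 for i in range(3))


def invert_rigid(T: Mat4) -> Mat4:
    """Invert a rigid transform [R | t] as [R^T | -R^T t]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    new_t = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def object_in_query_frame(u: float, v: float, depth: float,
                          intrinsics: Tuple[float, float, float, float],
                          T_world_from_det: Mat4,
                          T_world_from_query: Mat4) -> Vec3:
    """Express a 2D detection center as a 3D point in the query frame."""
    p_cam = unproject_pixel(u, v, depth, *intrinsics)   # detection camera frame
    p_world = transform_point(T_world_from_det, p_cam)  # world frame
    return transform_point(invert_rigid(T_world_from_query), p_world)


# Hypothetical example: the detection camera sits at the world origin and the
# query camera is shifted by 1 unit along x, with no rotation.
identity = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
query_pose = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
print(object_in_query_frame(50.0, 50.0, 2.0, (100.0, 100.0, 50.0, 50.0),
                            identity, query_pose))
```

The sketch needs a valid pose for both the detection frame and the query frame, which is why a low number of recovered camera poses limits how many queries can be answered at all.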

Code & Models

Repositories

wayne-mai/egoloc
PyTorch · Official

Taxonomy

Topics: Robotics and Sensor-Based Localization · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Language-Image Pre-training