Estimating more camera poses for ego-centric videos is essential for VQ3D

Jinjie Mai, Chen Zhao, Abdullah Hamdi, Silvio Giancola, Bernard Ghanem

arXiv:2211.10284 · cs.CV · November 21, 2022 · 1 citation

TL;DR

This paper introduces an improved camera pose estimation pipeline for egocentric videos and an optimized VQ3D framework, raising the overall success rate on the VQ3D leaderboard from the baseline's 8.7% to 25.8%.

Contribution

The authors develop a new camera pose estimation pipeline and optimize the VQ3D framework, raising the overall success rate from the baseline's 8.7% to 25.8%, roughly a threefold improvement.

Findings

01

Top-1 overall success rate of 25.8% on the VQ3D leaderboard

02

Better camera pose estimation raises the query with pose (QwP) ratio and, with it, the overall success rate

03

Optimized framework nearly triples the baseline success rate (25.8% vs. 8.7%)
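The size of the improvement can be checked with a few lines of arithmetic. The Python sketch below computes the ratio of the two reported success rates and, under an assumption not stated on this page (that queries without an estimated camera pose simply count as failures), shows why a higher QwP ratio, i.e. the fraction of queries for which a camera pose is available, lifts the overall success rate. The function names and the example QwP and per-query values are hypothetical illustrations, not numbers from the paper.

```python
"""Sketch: relating the VQ3D headline numbers.

Assumption (not stated in the summary): queries for which no camera pose
could be estimated are counted as failures, so the overall success rate
factors as QwP ratio x success rate on the queries that do have a pose.
"""


def improvement_factor(new_rate: float, baseline_rate: float) -> float:
    """Ratio of two success rates (any common unit, e.g. percent)."""
    return new_rate / baseline_rate


def overall_success(qwp_ratio: float, success_given_pose: float) -> float:
    """Overall success under the assumption that pose-less queries fail.

    Both arguments are fractions in [0, 1].
    """
    if not (0.0 <= qwp_ratio <= 1.0 and 0.0 <= success_given_pose <= 1.0):
        raise ValueError("rates must be fractions in [0, 1]")
    return qwp_ratio * success_given_pose


if __name__ == "__main__":
    # Reported numbers: 25.8% (this work) vs. 8.7% (baseline).
    print(f"{improvement_factor(25.8, 8.7):.2f}x")  # about 2.97x
    # Hypothetical illustration: with the same per-query accuracy, raising
    # the QwP ratio from 0.3 to 0.9 triples the overall success rate.
    print(overall_success(0.3, 0.5), overall_success(0.9, 0.5))
```

The multiplicative form makes the abstract's argument concrete: no amount of per-query accuracy can compensate for queries that never receive a camera pose.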

Abstract

Visual queries 3D localization (VQ3D) is a task in the Ego4D Episodic Memory Benchmark. Given an egocentric video, the goal is to answer queries of the form "Where did I last see object X?", where the query object X is specified as a static image and the answer should be a 3D displacement vector pointing to object X. However, current techniques estimate the camera poses of video frames in naive ways, resulting in a low query with pose (QwP) ratio and thus a poor overall success rate. In this work, we design a new pipeline for the challenging problem of camera pose estimation in egocentric videos. Moreover, we revisit the current VQ3D framework and optimize it in terms of both performance and efficiency. As a result, we achieve the top-1 overall success rate of 25.8% on the VQ3D leaderboard, nearly three times the 8.7% reported by the baseline.
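The answer format described in the abstract, a 3D displacement vector pointing to the object, can be illustrated with standard rigid-body geometry. The Python/NumPy sketch below assumes that the query frame's estimated camera pose is given as a camera-to-world rotation `R_wc` and translation `t_wc`, and that the object's 3D position `p_world` is already known in the same world frame; it returns the object's offset from the camera center, expressed in camera coordinates. These conventions and the function name are illustrative assumptions, not the benchmark's or the authors' exact definitions. The sketch also shows why a missing pose is fatal for a query: without `(R_wc, t_wc)` the vector cannot be computed at all.

```python
import numpy as np


def displacement_in_camera_frame(
    R_wc: np.ndarray, t_wc: np.ndarray, p_world: np.ndarray
) -> np.ndarray:
    """Vector from the camera center to a 3D point, in camera coordinates.

    Assumed (hypothetical) convention: the pose maps camera coordinates to
    world coordinates, x_w = R_wc @ x_c + t_wc, so t_wc is the camera
    center expressed in the world frame.
    """
    R_wc = np.asarray(R_wc, dtype=float)
    t_wc = np.asarray(t_wc, dtype=float).reshape(3)
    p_world = np.asarray(p_world, dtype=float).reshape(3)
    # World-frame offset from the camera center to the object ...
    offset_world = p_world - t_wc
    # ... rotated into the camera frame (R_wc is orthonormal, so R^-1 = R^T).
    return R_wc.T @ offset_world


if __name__ == "__main__":
    # Camera at (1, 0, 0), rotated 90 degrees about the world z-axis.
    theta = np.pi / 2
    R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
    t = np.array([1.0, 0.0, 0.0])
    obj = np.array([1.0, 2.0, 0.0])  # hypothetical object position
    # The object lies 2 units along world y, which is the camera's x-axis.
    print(np.round(displacement_in_camera_frame(R, t, obj), 6))  # approximately [2, 0, 0]
```

Using the transpose instead of a general matrix inverse relies on `R_wc` being a proper rotation; an estimated pose that is not orthonormal would need to be re-orthogonalized first.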

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications