Motor Focus: Fast Ego-Motion Prediction for Assistive Visual Navigation

Hao Wang; Jiayou Qin; Xiwen Chen; Ashish Bastola; John Suchanek; Zihao Gong; Abolfazl Razi

arXiv:2404.17031·cs.CV·December 18, 2024

TL;DR

Motor Focus is a fast, lightweight visual framework that accurately predicts human ego-motion in complex scenes, enhancing assistive navigation for visually impaired users without requiring camera calibration.

Contribution

This paper introduces Motor Focus, a novel image-based ego-motion prediction method that filters camera motion without calibration, improving speed and robustness in assistive navigation.

Findings

01 Runs at over 40 FPS

02 Achieves a mean absolute error (MAE) of 60 pixels

03 Remains robust, with a signal-to-noise ratio (SNR) of 23 dB
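For readers unfamiliar with the two accuracy figures, the sketch below shows one common way such metrics are computed. The definitions here are assumptions, since this page does not state the paper's exact formulas: MAE is taken as the mean Euclidean distance in pixels between predicted and reference points, and SNR as the ratio of reference-signal power to error power, in decibels. The function names are hypothetical.

```python
import math

def mean_absolute_error(predicted, reference):
    """Mean Euclidean distance (in pixels) between paired 2D points.

    Assumed definition; the paper may measure the error differently.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for (px, py), (rx, ry) in zip(predicted, reference):
        total += math.hypot(px - rx, py - ry)
    return total / len(predicted)

def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB for two equal-length 1D sequences.

    Signal power comes from the reference; noise is estimate - reference.
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    signal_power = sum(r * r for r in reference) / len(reference)
    noise_power = sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)
    if noise_power == 0:
        return math.inf  # a perfect estimate has no noise
    return 10.0 * math.log10(signal_power / noise_power)

# Example: one prediction is 5 px off, the other is exact.
mae = mean_absolute_error([(3, 4), (0, 0)], [(0, 0), (0, 0)])
# Example: the signal amplitude is 10 and the error amplitude is 1.
snr = snr_db([10.0, -10.0], [11.0, -9.0])
```

Under these assumed definitions, a lower MAE means predictions land closer to the reference points, and a higher SNR means the predicted trajectory is less noisy relative to the reference.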

Abstract

Assistive visual navigation systems for visually impaired individuals have become increasingly popular thanks to the rise of mobile computing. Most of these devices work by translating visual information into voice commands. In complex scenarios where multiple objects are present, it is imperative to prioritize object detection and provide immediate notifications for key entities in specific directions. This creates the need to identify the observer's motion direction (ego-motion) from visual information alone, which is the key contribution of this paper. Specifically, we introduce Motor Focus, a lightweight image-based framework that predicts ego-motion, that is, the movement intentions of humans (and humanoid machines), from their visual feeds, while filtering out camera motion without any camera calibration. To this end, we implement an optical flow-based pixel-wise temporal…
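The abstract is truncated here, so the following does not reconstruct the Motor Focus method. As a general illustration of how a motion direction can be read from optical flow, this is a minimal sketch of the classical focus-of-expansion (FOE) estimate: under pure forward translation, every flow vector points away from a single image point, and that point can be recovered by least squares. The function name `estimate_foe` and the synthetic data are hypothetical.

```python
def estimate_foe(points, flows):
    """Least-squares focus of expansion from sparse optical flow.

    Each flow vector (vx, vy) at pixel (px, py) is assumed parallel to
    (p - foe), which gives one linear constraint per point:
        fx * vy - fy * vx = px * vy - py * vx
    The 2x2 normal equations are solved with Cramer's rule.
    """
    sxx = sxy = syy = bx = by = 0.0
    for (px, py), (vx, vy) in zip(points, flows):
        a0, a1 = vy, -vx            # row of the constraint matrix
        c = px * vy - py * vx       # right-hand side
        sxx += a0 * a0
        sxy += a0 * a1
        syy += a1 * a1
        bx += a0 * c
        by += a1 * c
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("flow field is degenerate (all vectors parallel or zero)")
    fx = (bx * syy - by * sxy) / det
    fy = (by * sxx - bx * sxy) / det
    return fx, fy

# Synthetic radial flow expanding from a known point (30, 20).
true_foe = (30.0, 20.0)
points = [(float(x), float(y)) for x in range(0, 50, 10) for y in range(0, 40, 10)]
flows = [(0.1 * (x - true_foe[0]), 0.1 * (y - true_foe[1])) for x, y in points]
foe = estimate_foe(points, flows)
```

In practice the flow vectors would come from an optical flow estimator, and rotation or camera shake would violate the pure-translation assumption, which is the kind of camera motion the abstract says Motor Focus filters out.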

Code & Models

Repositories

jiayouqin/h-splitter (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · Virtual Reality Applications and Impacts · Human Pose and Action Recognition

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Focus