A Flexible-Frame-Rate Vision-Aided Inertial Object Tracking System for Mobile Devices
Yo-Chung Lau, Kuan-Wei Tseng, I-Ju Hsieh, Hsiao-Ching Tseng, Yi-Ping, Hung

TL;DR
This paper introduces a real-time, flexible-frame-rate object tracking system for mobile devices that combines visual and inertial data, ensuring high accuracy and robustness at up to 120 FPS.
Contribution
It presents a novel client-server architecture with inertial pose propagation and RGB-based pose estimation, including a bias correction and failure detection mechanism.
Findings
Supports up to 120 FPS tracking
Achieves high accuracy and robustness
Operates efficiently on low-end devices
Abstract
Real-time object pose estimation and tracking is challenging but essential for emerging augmented reality (AR) applications. In general, state-of-the-art methods address this problem using deep neural networks which indeed yield satisfactory results. Nevertheless, the high computational cost of these methods makes them unsuitable for mobile devices where real-world applications usually take place. In addition, head-mounted displays such as AR glasses require at least 90~FPS to avoid motion sickness, which further complicates the problem. We propose a flexible-frame-rate object pose estimation and tracking system for mobile devices. It is a monocular visual-inertial-based system with a client-server architecture. Inertial measurement unit (IMU) pose propagation is performed on the client side for high speed tracking, and RGB image-based 3D pose estimation is performed on the server side…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRobotics and Sensor-Based Localization · Advanced Vision and Imaging · Augmented Reality Applications
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
