Mojito: LLM-Aided Motion Instructor with Jitter-Reduced Inertial Tokens
Ziwei Shan, Yaoyu He, Chengfeng Zhao, Jiashen Du, Jingyan Zhang, Qixuan Zhang, Jingyi Yu, Lan Xu

TL;DR
Mojito combines inertial measurement units (IMUs) with large language models (LLMs) to enable robust, real-time motion capture and analysis, overcoming challenges such as sensor noise and drift to improve the understanding of human movement.
Contribution
This paper introduces Mojito, a novel system integrating IMUs with LLMs for interactive, jitter-reduced motion capture and behavioral analysis, addressing key limitations of existing methods.
Findings
Enhanced motion capture accuracy with inertial sensors
Real-time behavioral analysis capabilities
Reduced jitter and drift in long-term motion tracking
Abstract
Human bodily movements convey critical insights into action intentions and cognitive processes, yet existing multimodal systems have primarily focused on understanding human motion via language, vision, and audio, modalities that struggle to capture the dynamic forces and torques inherent in 3D motion. Inertial measurement units (IMUs) present a promising alternative, offering lightweight, wearable, and privacy-conscious motion sensing. However, processing streaming IMU data faces challenges such as wireless transmission instability, sensor noise, and drift, which limit the utility of IMUs for long-term real-time motion capture (MoCap) and, more importantly, for online motion analysis. To address these challenges, we introduce Mojito, an intelligent motion agent that integrates inertial sensing with large language models (LLMs) for interactive motion capture and behavioral analysis.
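Why drift limits long-term IMU capture can be made concrete with a toy simulation. The sketch below is not Mojito's method. It assumes a single-axis gyroscope held perfectly still whose readings contain only a constant bias plus white noise (the values `bias=0.01` rad/s, `noise_std=0.05` rad/s, and 100 Hz are illustrative assumptions), integrates those readings into a heading angle, and applies a plain moving-average filter as a hypothetical stand-in for jitter reduction. The helper names `simulate_static_gyro`, `integrate_rate`, and `moving_average` are invented for this illustration.

```python
import numpy as np


def simulate_static_gyro(duration_s=60.0, rate_hz=100.0, bias=0.01,
                         noise_std=0.05, seed=0):
    """Hypothetical readings (rad/s) from a gyroscope held still.

    The true angular rate is 0, so every measured value is a constant
    bias plus zero-mean Gaussian noise. Returns (samples, dt).
    """
    rng = np.random.default_rng(seed)
    n = int(duration_s * rate_hz)
    return bias + rng.normal(0.0, noise_std, n), 1.0 / rate_hz


def integrate_rate(rates, dt):
    """Rectangle-rule integration of angular rate into angle (rad)."""
    return np.cumsum(rates) * dt


def moving_average(x, window):
    """Simple jitter filter: sliding-window mean ('valid' part only)."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")


if __name__ == "__main__":
    rates, dt = simulate_static_gyro()
    angle = integrate_rate(rates, dt)
    # The bias term accumulates linearly: roughly bias * 60 s = 0.6 rad.
    print(f"heading error after 60 s: {angle[-1]:.2f} rad")
    smoothed = moving_average(rates, window=25)
    # Smoothing shrinks the noise spread but leaves the bias untouched.
    print(f"raw jitter std: {rates.std():.3f}, smoothed: {smoothed.std():.3f}")
    print(f"smoothed mean (bias survives): {smoothed.mean():.4f}")
```

The design point the toy model shows: a low-pass filter reduces high-frequency jitter (here the spread drops by about a factor of five for a 25-sample window), but it cannot remove a constant bias, so the integrated error still grows in proportion to elapsed time. Under these assumptions, long sessions need some additional correction beyond smoothing.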
Taxonomy
Topics: Teleoperation and Haptic Systems · Soft Robotics and Applications
