Motion Question Answering via Modular Motion Programs
Mark Endo, Joy Hsu, Jiaman Li, Jiajun Wu

TL;DR
This paper introduces the HumanMotionQA task for evaluating complex reasoning over human motion sequences and proposes NSPose, a neuro-symbolic method that outperforms baselines in this challenging setting.
Contribution
The paper presents a new dataset for multi-step reasoning in human motion analysis and introduces NSPose, a modular neuro-symbolic approach for this task.
Findings
NSPose outperforms baseline methods on HumanMotionQA.
The dataset enables evaluation of complex spatio-temporal reasoning.
NSPose effectively grounds motion concepts and temporal relations.
Abstract
In order to build artificial intelligence systems that can perceive and reason with human behavior in the real world, we must first design models that conduct complex spatio-temporal reasoning over motion sequences. Moving towards this goal, we propose the HumanMotionQA task to evaluate complex, multi-step reasoning abilities of models on long-form human motion sequences. We generate a dataset of question-answer pairs that require detecting motor cues in small portions of motion sequences, reasoning temporally about when events occur, and querying specific motion attributes. In addition, we propose NSPose, a neuro-symbolic method for this task that uses symbolic reasoning and a modular design to ground motion through learning motion concepts, attribute neural operators, and temporal relations. We demonstrate the suitability of NSPose for the HumanMotionQA task, outperforming all…
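The modular design described in the abstract can be illustrated with a toy program executor. This is only a minimal sketch under stated assumptions, not the authors' NSPose implementation: the operator names (`filter`, `relate`, `query`), the per-segment concept scores, and the "immediately adjacent segment" reading of temporal relations are all hypothetical simplifications. In the paper the concepts and operators are learned, whereas here the scores are hard-coded.

```python
from typing import Dict, List

# Hypothetical toy input: each motion segment carries pre-computed concept
# scores in [0, 1], grouped by category. A learned model would produce these.
Segment = Dict[str, Dict[str, float]]


def filter_concept(segments: List[Segment], weights: List[float],
                   category: str, concept: str) -> List[float]:
    """Scale each segment's attention weight by its score for `concept`."""
    return [w * seg[category].get(concept, 0.0)
            for w, seg in zip(weights, segments)]


def relate(weights: List[float], relation: str) -> List[float]:
    """Move attention to the segment right after (or before) each attended one.

    Assumption: temporal relations are reduced to a one-segment shift.
    """
    if relation not in ("before", "after"):
        raise ValueError(f"unknown relation: {relation}")
    shift = 1 if relation == "after" else -1
    shifted = [0.0] * len(weights)
    for i, w in enumerate(weights):
        j = i + shift
        if 0 <= j < len(weights):
            shifted[j] = w
    return shifted


def query(segments: List[Segment], weights: List[float], category: str) -> str:
    """Return the concept in `category` with the highest attention-weighted score."""
    totals: Dict[str, float] = {}
    for w, seg in zip(weights, segments):
        for concept, score in seg[category].items():
            totals[concept] = totals.get(concept, 0.0) + w * score
    return max(totals, key=totals.get)


def execute(program: List[tuple], segments: List[Segment]) -> str:
    """Run a linear program of (operator, *args) steps that ends with a query."""
    weights = [1.0] * len(segments)  # start by attending to every segment
    for op, *args in program:
        if op == "filter":
            weights = filter_concept(segments, weights, *args)
        elif op == "relate":
            weights = relate(weights, *args)
        elif op == "query":
            return query(segments, weights, *args)
        else:
            raise ValueError(f"unknown operator: {op}")
    raise ValueError("program must end with a query step")


# Toy sequence: walk, then jump, then wave.
segments = [
    {"action": {"walk": 0.9, "jump": 0.1}, "body_part": {"leg": 0.8, "arm": 0.2}},
    {"action": {"walk": 0.1, "jump": 0.9}, "body_part": {"leg": 0.9, "arm": 0.1}},
    {"action": {"wave": 0.8, "jump": 0.05}, "body_part": {"leg": 0.1, "arm": 0.9}},
]

# "Which body part is used in the action after the person jumps?"
program = [("filter", "action", "jump"), ("relate", "after"), ("query", "body_part")]
answer = execute(program, segments)
print(answer)
```

Each question maps to a short chain of reusable operators, so the same `filter` and `relate` steps can be recombined for other questions, for example swapping the final step for `("query", "action")` to ask what the person does next.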
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Human Pose and Action Recognition
