IMoRe: Implicit Program-Guided Reasoning for Human Motion Q&A

Chen Li; Chinthani Sugandhika; Yeo Keat Ee; Eric Peh; Hao Zhang; Hong Yang; Deepu Rajan; Basura Fernando

arXiv:2508.01984 · cs.CV · August 5, 2025

TL;DR

IMoRe introduces an implicit, program-guided reasoning framework for human motion question answering that unifies multiple query types without manual module design, leveraging structured programs and multi-level motion representations.

Contribution

The paper proposes a novel implicit reasoning approach that conditions on structured program functions and uses a dynamic motion representation selection mechanism, improving scalability and adaptability.

Findings

01. Achieves state-of-the-art results on Babel-QA.

02. Generalizes well to a HuMMan-based motion Q&A dataset.

03. Effectively captures both high-level semantics and fine-grained motion cues.

Abstract

Existing human motion Q&A methods rely on explicit program execution, where the requirement for manually defined functional modules may limit scalability and adaptability. To overcome this, we propose an implicit program-guided motion reasoning (IMoRe) framework that unifies reasoning across multiple query types without manually designed modules. Unlike existing implicit reasoning approaches that infer reasoning operations from question words, our model directly conditions on structured program functions, ensuring more precise execution of reasoning steps. Additionally, we introduce a program-guided reading mechanism, which dynamically selects multi-level motion representations from a pretrained motion Vision Transformer (ViT), capturing both high-level semantics and fine-grained motion cues. The reasoning module iteratively refines memory representations, leveraging structured…
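The abstract names two mechanisms: a program-guided reading step that selects among multi-level (per-layer) motion ViT features, and a reasoning loop that refines a memory once per program function. The sketch below is a minimal, hypothetical illustration of how such a loop could fit together, not the authors' implementation. All names (`program_guided_read`, `reason`, `gate`) and every design choice (scaled dot-product attention over frame tokens, a softmax over layers, an additive query built from the function embedding and memory, a fixed-gate memory update) are assumptions; the abstract does not specify them.

```python
import math
from typing import List, Tuple

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def weighted_sum(vectors: List[Vector], weights: List[float]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]

def program_guided_read(layer_feats: List[List[Vector]], func_emb: Vector,
                        memory: Vector) -> Tuple[Vector, List[float]]:
    """Hypothetical reading step.

    layer_feats[l][t] is the feature of frame token t at ViT layer l (assumed layout).
    """
    scale = math.sqrt(len(func_emb))
    # Query combines the current program function with the running memory (assumption).
    query = [f + m for f, m in zip(func_emb, memory)]
    # 1) Within each layer, attend over time tokens with the query.
    layer_summaries = []
    for tokens in layer_feats:
        attn = softmax([dot(query, tok) / scale for tok in tokens])
        layer_summaries.append(weighted_sum(tokens, attn))
    # 2) Soft-select across layers, conditioned only on the program function.
    layer_weights = softmax([dot(func_emb, s) / scale for s in layer_summaries])
    return weighted_sum(layer_summaries, layer_weights), layer_weights

def reason(layer_feats: List[List[Vector]], program: List[Vector],
           gate: float = 0.5) -> Vector:
    """Hypothetical loop: one read and one memory update per program function."""
    memory = [0.0] * len(program[0])
    for func_emb in program:
        read, _ = program_guided_read(layer_feats, func_emb, memory)
        # Fixed-gate interpolation stands in for whatever update the paper uses.
        memory = [(1 - gate) * m + gate * r for m, r in zip(memory, read)]
    return memory
```

Here a "program" is simply a list of function embeddings, one per reasoning step. Because the layer weights are a softmax rather than a hard choice, each step can blend shallow (fine-grained) and deep (semantic) features, which is one plausible reading of "dynamically selects multi-level motion representations."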

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Motion and Animation · Human Pose and Action Recognition