Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning
Jing Bi, Jiebo Luo, Chenliang Xu

TL;DR
This paper introduces a novel approach for procedure planning in instructional videos by modeling human decision-making with Bayesian inference and model-based imitation learning, achieving state-of-the-art goal-reaching performance.
Contribution
It proposes a new formulation and algorithms for modeling goal-directed actions in videos, addressing limitations of previous world models by incorporating contextual information.
Findings
Achieves state-of-the-art goal-reaching accuracy
Effectively models human decision-making in instructional videos
Learned features facilitate planning in latent space
Abstract
Learning new skills by observing humans' behaviors is an essential capability of AI. In this work, we leverage instructional videos to study humans' decision-making processes, focusing on learning a model to plan goal-directed actions in real-life videos. In contrast to conventional action recognition, goal-directed actions are based on expectations of their outcomes requiring causal knowledge of potential consequences of actions. Thus, integrating the environment structure with goals is critical for solving this task. Previous works learn a single world model will fail to distinguish various tasks, resulting in an ambiguous latent space; planning through it will gradually neglect the desired outcomes since the global information of the future goal degrades quickly as the procedure evolves. We address these limitations with a new formulation of procedure planning and propose novel…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHuman Pose and Action Recognition · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics
