Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning

Jing Bi; Jiebo Luo; Chenliang Xu

arXiv:2110.01770 · cs.CV · October 12, 2021

TL;DR

This paper introduces a novel approach for procedure planning in instructional videos by modeling human decision-making with Bayesian inference and model-based imitation learning, achieving state-of-the-art goal-reaching performance.

Contribution

It proposes a new formulation and algorithms for modeling goal-directed actions in videos, addressing limitations of previous world models by incorporating contextual information.

Findings

1. Achieves state-of-the-art goal-reaching accuracy
2. Effectively models human decision-making in instructional videos
3. Learned features facilitate planning in latent space

Abstract

Learning new skills by observing humans' behaviors is an essential capability of AI. In this work, we leverage instructional videos to study humans' decision-making processes, focusing on learning a model to plan goal-directed actions in real-life videos. In contrast to conventional action recognition, goal-directed actions are based on expectations of their outcomes, requiring causal knowledge of the potential consequences of actions. Thus, integrating the environment structure with goals is critical for solving this task. Previous works that learn a single world model fail to distinguish various tasks, resulting in an ambiguous latent space; planning through it gradually neglects the desired outcomes, since the global information of the future goal degrades quickly as the procedure evolves. We address these limitations with a new formulation of procedure planning and propose novel…

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics