Skip-Plan: Procedure Planning in Instructional Videos via Condensed Action Space Learning

Zhiheng Li; Wenjia Geng; Muheng Li; Lei Chen; Yansong Tang; Jiwen Lu; Jie Zhou

arXiv:2310.00608 · cs.CV · October 3, 2023


TL;DR

Skip-Plan introduces a procedure planning method for instructional videos that decomposes long action sequences into shorter, more reliable sub-chains, improving performance by avoiding high-dimensional state supervision and error accumulation.
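To make the error-accumulation argument concrete, here is a minimal Python sketch under a strong simplifying assumption that is not taken from the paper: each action in a plan is predicted correctly with the same probability, independently of the others. Under that assumption, the chance of getting the whole plan right decays geometrically with its length. The function name `chain_success_prob` and the 90% step accuracy are hypothetical.

```python
def chain_success_prob(p_step: float, num_steps: int) -> float:
    """Probability that all `num_steps` predictions are correct.

    Illustrative assumption (not the paper's model): every step is correct
    with the same probability `p_step`, independently of the other steps.
    """
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be in [0, 1]")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    # Independent successes multiply, so accuracy decays as p_step ** num_steps.
    return p_step ** num_steps


# A hypothetical 90%-accurate step predictor on a 3-step vs. a 6-step plan.
print(round(chain_success_prob(0.9, 3), 3))  # 0.729
print(round(chain_success_prob(0.9, 6), 3))  # 0.531
```

Halving the chain length in this toy setting lifts the all-correct probability from about 0.53 to about 0.73, which is the intuition behind replacing one long chain with shorter sub-chains.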

Contribution

It proposes a chain model with a skipping strategy that condenses the action space, enabling more reliable and efficient procedure planning in instructional videos.
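The "condensed action space" idea can be sized with simple counting. Assuming any of |A| actions may appear at each of T steps, there are |A|^T candidate plans, so shortening what a model must predict jointly shrinks that space exponentially. The vocabulary size of 100 below is a hypothetical value, not a dataset statistic, and `sequence_space_size` is an illustrative helper rather than part of the authors' code.

```python
def sequence_space_size(num_actions: int, length: int) -> int:
    """Count distinct action sequences of `length` steps drawn from a
    vocabulary of `num_actions` actions, with repeats allowed."""
    if num_actions < 1:
        raise ValueError("num_actions must be at least 1")
    if length < 0:
        raise ValueError("length must be non-negative")
    return num_actions ** length


# Hypothetical vocabulary of 100 actions.
full_chain = sequence_space_size(100, 6)  # one 6-step chain
sub_chain = sequence_space_size(100, 3)   # one 3-step sub-chain
print(full_chain // sub_chain)  # 1000000
```

In this toy count, a 3-step sub-chain has a million times fewer candidate sequences than a 6-step chain; how much a real model benefits depends on details this sketch does not capture.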

Findings

01. Achieves state-of-the-art results on the CrossTask and COIN benchmarks.
02. Reduces error propagation in long action sequences.
03. Demonstrates robustness by skipping unreliable intermediate actions.

Abstract

In this paper, we propose Skip-Plan, a condensed action space learning method for procedure planning in instructional videos. Current procedure planning methods all stick to state-action pair prediction at every timestep and generate actions adjacently. Although this coincides with human intuition, such a methodology consistently struggles with high-dimensional state supervision and error accumulation on action sequences. In this work, we abstract the procedure planning problem as a mathematical chain model. By skipping uncertain nodes and edges in action chains, we transform long and complex sequence functions into short but reliable ones in two ways. First, we skip all the intermediate state supervision and only focus on action predictions. Second, we decompose relatively long chains into multiple short sub-chains by skipping unreliable intermediate actions. By this means, our model…
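The second skipping strategy in the abstract, decomposing a long chain into shorter sub-chains, can be illustrated with index bookkeeping. The sketch below assumes, purely for illustration, that every sub-chain keeps the first and last action of a T-step plan and retains a fixed number of intermediate actions while skipping the rest; the paper's actual sub-chain construction, and how its predictions are recombined, may differ. `subchain_indices`, `split_plan`, and the action names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def subchain_indices(horizon: int, sub_len: int) -> List[Tuple[int, ...]]:
    """Index tuples for sub-chains of length `sub_len` that keep the first and
    last step of a `horizon`-step plan and skip the remaining intermediate
    steps. Hypothetical decomposition, not the authors' exact scheme."""
    if horizon < 2:
        raise ValueError("horizon must be at least 2")
    if not 2 <= sub_len <= horizon:
        raise ValueError("sub_len must be between 2 and horizon")
    first, last = 0, horizon - 1
    middle = range(1, horizon - 1)
    # Choose which intermediate steps survive; all others are skipped.
    return [(first, *mid, last) for mid in combinations(middle, sub_len - 2)]


def split_plan(plan: Sequence[str], sub_len: int) -> List[List[str]]:
    """Materialize the sub-chains of a concrete action plan."""
    return [[plan[i] for i in idx] for idx in subchain_indices(len(plan), sub_len)]


plan = ["open jar", "scoop", "spread", "close jar"]  # hypothetical actions
print(subchain_indices(4, 3))  # [(0, 1, 3), (0, 2, 3)]
print(split_plan(plan, 3))
# [['open jar', 'scoop', 'close jar'], ['open jar', 'spread', 'close jar']]
```

Each sub-chain in this sketch still spans the task from start to goal but asks a model to predict fewer steps jointly; fixing the endpoints is a design choice of the sketch, not a detail confirmed by the abstract.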


Code & Models

Repositories

2024-MindSpore-1/Code9/tree/main/skip_-plan-mindspore-master (framework: MindSpore)


Taxonomy

Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Multimodal Machine Learning Applications

Methods: Focus