PDPP: Projected Diffusion for Procedure Planning in Instructional Videos

Hanlin Wang, Yilu Wu, Sheng Guo, Limin Wang

arXiv:2303.14676 · cs.CV · January 23, 2025 · 1 citation

TL;DR

This paper introduces PDPP, a diffusion-based framework for procedure planning in instructional videos. It directly models the distribution of entire action sequences, using only task labels as supervision, which reduces annotation costs and captures the uncertainty inherent in planning.

Contribution

It proposes a novel diffusion-based approach for procedure planning that eliminates the need for intermediate supervision and autoregressive modeling, with joint training for variable horizon lengths.
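To make concrete which supervision remains once intermediate annotations are dropped, the following NumPy sketch computes one training loss for a single plan. It assumes, following our reading of the paper, that a plan of horizon T is laid out as a (T, C + A + O) matrix holding the task one-hot in every row, the action one-hots, and visual features only in the start and goal rows; that the model predicts the clean matrix x0 rather than the noise; and a linear noise schedule. `Conditions`, `build_x0`, `project`, `training_loss`, and the stand-in `denoiser` are hypothetical names, not the API of the official mcg-nju/pdpp repository.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class Conditions:
    """Hypothetical container for the planning conditions."""
    task: np.ndarray       # (C,) one-hot task label, the only high-level supervision
    obs_start: np.ndarray  # (O,) feature vector of the current observation
    obs_goal: np.ndarray   # (O,) feature vector of the goal observation
    num_actions: int       # A, size of the action vocabulary


def build_x0(actions: np.ndarray, cond: Conditions) -> np.ndarray:
    """Stack a ground-truth plan (T, A one-hots) into a (T, C + A + O) matrix."""
    T = actions.shape[0]
    obs = np.zeros((T, cond.obs_start.shape[0]))
    obs[0], obs[-1] = cond.obs_start, cond.obs_goal  # no intermediate observations
    return np.concatenate([np.tile(cond.task, (T, 1)), actions, obs], axis=1)


def project(x: np.ndarray, cond: Conditions) -> np.ndarray:
    """Overwrite the condition columns with known values; action columns stay free."""
    x = x.copy()
    C, A = cond.task.shape[0], cond.num_actions
    x[:, :C] = cond.task
    x[:, C + A:] = 0.0
    x[0, C + A:] = cond.obs_start
    x[-1, C + A:] = cond.obs_goal
    return x


def training_loss(x0, cond, denoiser, alpha_bars, rng) -> float:
    """One Monte Carlo sample of an x0-prediction MSE loss with projection."""
    n = int(rng.integers(len(alpha_bars)))  # random diffusion step
    noise = rng.standard_normal(x0.shape)
    # Forward process q(x_n | x_0) of a standard DDPM.
    x_n = np.sqrt(alpha_bars[n]) * x0 + np.sqrt(1.0 - alpha_bars[n]) * noise
    x_n = project(x_n, cond)                  # conditions are never left noisy
    x0_hat = project(denoiser(x_n, n), cond)  # model predicts the clean plan
    # After projection the condition columns equal x0's, so only actions add loss.
    return float(np.mean((x0_hat - x0) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cond = Conditions(np.eye(3)[1], rng.standard_normal(4),
                      rng.standard_normal(4), num_actions=5)
    x0 = build_x0(np.eye(5)[[0, 3, 2]], cond)  # T = 3 steps, shape (3, 12)
    alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 200))
    untrained = lambda x, n: np.zeros_like(x)  # stand-in for a trained network
    # Three missed one-hot entries over 36 cells gives a loss of 1/12.
    print(training_loss(x0, cond, untrained, alpha_bars, rng))
```

Because all T actions are denoised together, there is no autoregressive step whose errors could accumulate. In practice a trained network (the official PyTorch repository lists a U-Net) would replace the zero-output stand-in.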

Findings

01. Achieves state-of-the-art performance on multiple datasets.
02. Effectively models uncertainty in procedure planning.
03. Demonstrates strong generalization across different tasks.

Abstract

In this paper, we study the problem of procedure planning in instructional videos, which aims to make a plan (i.e., a sequence of actions) given the current visual observation and the desired goal. Previous works cast this as a sequence modeling problem and leverage either intermediate visual observations or language instructions as supervision for autoregressive planning, resulting in complex learning schemes and expensive annotation costs. To avoid intermediate supervision annotation and the error accumulation caused by autoregressive planning, we propose a diffusion-based framework, coined PDPP, that directly models the distribution of the whole action sequence with the task label as supervision instead. Our core idea is to treat procedure planning as a distribution fitting problem under the given observations, thus transforming the planning problem into a sampling process from this distribution…
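The sampling view described in the abstract can be sketched as ordinary DDPM ancestral sampling over a plan matrix whose rows are time steps, with a projection after every step that overwrites the task and observation columns with their known values; this is our reading of "projected" in the title. The sketch below is a minimal NumPy illustration, assuming a (T, C + A + O) layout (task one-hot in every row, action one-hots, start and goal features in the first and last rows), a denoiser that predicts the clean matrix, and a caller-supplied noise schedule. `Conditions`, `project`, `sample_plan`, and `denoiser` are hypothetical names, not the official implementation.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class Conditions:
    task: np.ndarray       # (C,) one-hot task label
    obs_start: np.ndarray  # (O,) features of the current observation
    obs_goal: np.ndarray   # (O,) features of the goal observation
    num_actions: int       # A, size of the action vocabulary


def project(x: np.ndarray, cond: Conditions) -> np.ndarray:
    """Re-impose the known conditions; only the action columns are sampled."""
    x = x.copy()
    C, A = cond.task.shape[0], cond.num_actions
    x[:, :C] = cond.task
    x[:, C + A:] = 0.0
    x[0, C + A:] = cond.obs_start
    x[-1, C + A:] = cond.obs_goal
    return x


def sample_plan(cond, horizon, denoiser, betas, rng) -> np.ndarray:
    """DDPM ancestral sampling with projection after every step; returns action ids."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    C, A = cond.task.shape[0], cond.num_actions
    width = C + A + cond.obs_start.shape[0]
    x = project(rng.standard_normal((horizon, width)), cond)  # start from noise
    for n in reversed(range(len(betas))):
        x0_hat = project(denoiser(x, n), cond)  # predicted clean plan at step n
        if n > 0:
            ab, ab_prev = alpha_bars[n], alpha_bars[n - 1]
            # Mean and variance of the DDPM posterior q(x_{n-1} | x_n, x0_hat).
            mean = (np.sqrt(ab_prev) * betas[n] / (1.0 - ab)) * x0_hat + (
                np.sqrt(alphas[n]) * (1.0 - ab_prev) / (1.0 - ab)) * x
            var = betas[n] * (1.0 - ab_prev) / (1.0 - ab)
            x = project(mean + np.sqrt(var) * rng.standard_normal(x.shape), cond)
    # Decode one action per time step for the whole sequence at once.
    return np.argmax(x0_hat[:, C:C + A], axis=1)
```

Here `denoiser` stands in for a trained network and `horizon` is the plan length T. Because the output is a sample rather than a single deterministic decode, running the sampler with different seeds can yield different plausible plans, which is one way a distribution-based planner can express uncertainty.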

Code & Models

Repositories

mcg-nju/pdpp
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning

Methods: Convolution · Concatenated Skip Connection · Max Pooling · U-Net · Diffusion