ViterbiPlanNet: Injecting Procedural Knowledge via Differentiable Viterbi for Planning in Instructional Videos

Luigi Seminara; Davide Moltisanti; Antonino Furnari

arXiv:2603.04265 · cs.CV · March 5, 2026

TL;DR

ViterbiPlanNet introduces a novel framework that explicitly incorporates procedural knowledge into action sequence prediction for instructional videos, improving efficiency and robustness through a differentiable Viterbi layer.

Contribution

The paper presents ViterbiPlanNet, a framework that embeds a Procedural Knowledge Graph within a differentiable Viterbi layer, enabling explicit procedural knowledge integration and end-to-end training.
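This summary does not say how the PKG is constructed. In procedural planning it is common to treat action steps as nodes and to weight directed edges by how often one step follows another in training plans. The sketch below assumes that construction purely for illustration. It turns transition counts into a row-normalized log-transition matrix, which is the form a Viterbi decoder consumes. `build_pkg` and the smoothing constant `alpha` are hypothetical names, not the paper's.

```python
import math


def build_pkg(plans, num_actions, alpha=1.0):
    """Hypothetical PKG: log P(next action | current action) from training plans.

    plans: list of action-index sequences, e.g. [[0, 2, 1], [0, 1]].
    alpha: additive (Laplace) smoothing, must be > 0 so that unseen
           transitions keep nonzero probability and log() stays finite.
    Returns a num_actions x num_actions list of log-probabilities.
    """
    # Count directed edges a -> b between consecutive steps of each plan.
    counts = [[0.0] * num_actions for _ in range(num_actions)]
    for plan in plans:
        for a, b in zip(plan, plan[1:]):
            counts[a][b] += 1.0

    # Smooth and normalize each row, then take logs.
    log_trans = []
    for row in counts:
        total = sum(row) + alpha * num_actions
        log_trans.append([math.log((c + alpha) / total) for c in row])
    return log_trans


plans = [[0, 1, 2], [0, 1, 3], [0, 2, 3]]
pkg = build_pkg(plans, num_actions=4, alpha=0.5)
print([round(math.exp(p), 3) for p in pkg[0]])  # [0.1, 0.5, 0.3, 0.1]
```

Action 3 never has a successor in these plans, so its row comes out uniform (0.25 each). With smoothing, the graph discourages rare transitions without forbidding them outright.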

Findings

1. Achieves state-of-the-art performance on the CrossTask, COIN, and NIV datasets.
2. Uses significantly fewer parameters than diffusion- and LLM-based planners.
3. Demonstrates improved sample efficiency and robustness to shorter horizons.

Abstract

Procedural planning aims to predict a sequence of actions that transforms an initial visual state into a desired goal, a fundamental ability for intelligent agents operating in complex environments. Existing approaches typically rely on large-scale models that learn procedural structures implicitly, which results in limited sample efficiency and high computational cost. In this work we introduce ViterbiPlanNet, a principled framework that explicitly integrates procedural knowledge into the learning process through a Differentiable Viterbi Layer (DVL). The DVL embeds a Procedural Knowledge Graph (PKG) directly into the Viterbi decoding algorithm, replacing its non-differentiable operations with smooth relaxations that enable end-to-end optimization. This design allows the model to learn through graph-based decoding. Experiments on CrossTask, COIN, and NIV demonstrate that ViterbiPlanNet achieves…
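The abstract does not specify which smooth relaxation the DVL uses. One standard way to make Viterbi differentiable, assumed here for illustration and not taken from the paper, is to replace each max over predecessor states with a temperature-scaled log-sum-exp, `tau * logsumexp(x / tau)`. This function is smooth and approaches the hard max as `tau` goes to 0. Its gradient with respect to `x` is `softmax(x / tau)`, a soft version of the argmax that Viterbi backtracks through. The sketch runs both recursions over per-step emission scores (for example, a visual model's action logits) and PKG log-transition scores. `soft_viterbi`, `hard_viterbi`, `smooth_max`, and `tau` are hypothetical names.

```python
import math


def smooth_max(xs, tau):
    """Temperature-scaled log-sum-exp, a smooth upper bound on max(xs).

    Computed stably as m + tau * log(sum(exp((x - m) / tau))) with m = max(xs).
    The result lies in [max(xs), max(xs) + tau * log(len(xs))].
    """
    m = max(xs)
    return m + tau * math.log(sum(math.exp((x - m) / tau) for x in xs))


def soft_viterbi(emissions, log_trans, tau=1.0):
    """Smoothed Viterbi forward pass (illustrative relaxation, not the paper's).

    emissions[t][k]: score of action k at step t (T x K).
    log_trans[i][j]: PKG score for moving from action i to action j (K x K).
    Returns a smooth score that tends to the best-path score as tau -> 0.
    """
    K = len(log_trans)
    delta = list(emissions[0])
    for t in range(1, len(emissions)):
        delta = [
            emissions[t][j]
            + smooth_max([delta[i] + log_trans[i][j] for i in range(K)], tau)
            for j in range(K)
        ]
    return smooth_max(delta, tau)


def hard_viterbi(emissions, log_trans):
    """Standard Viterbi with max and backtracking, for comparison."""
    K = len(log_trans)
    delta = list(emissions[0])
    back = []  # back[t-1][j] = best predecessor of action j at step t
    for t in range(1, len(emissions)):
        ptr, new = [], []
        for j in range(K):
            best = max(range(K), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best)
            new.append(delta[best] + log_trans[best][j] + emissions[t][j])
        back.append(ptr)
        delta = new
    last = max(range(K), key=lambda k: delta[k])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return delta[last], path[::-1]


# Toy PKG that favors the cycle 0 -> 1 -> 2 -> 0.
log_trans = [[-5.0, 0.0, -5.0], [-5.0, -5.0, 0.0], [0.0, -5.0, -5.0]]
# At step 1 the emissions slightly prefer action 2, but the graph prefers 1.
emissions = [[2.0, 0.0, 0.0], [0.0, 0.5, 0.6], [0.0, 0.0, 1.0]]

score, path = hard_viterbi(emissions, log_trans)
soft = soft_viterbi(emissions, log_trans, tau=0.01)
print(path, score)  # [0, 1, 2] 3.5
print(round(soft, 3))
```

In this toy case the graph overrides the slightly higher emission for action 2 at the middle step. The smoothed score stays within a small margin above the hard score. A trainable version would use an autodiff framework, for example PyTorch's `torch.logsumexp`, so that gradients can flow through the decoding into the emission scores.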

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · AI-based Problem Solving and Planning