A Stitch in Time: Learning Procedural Workflow via Self-Supervised Plackett-Luce Ranking

Chengan Che; Chao Wang; Xinyue Chen; Sophia Tsoka; Luis C. Garcia-Peraza-Herrera

arXiv:2511.17805 · cs.CV · March 24, 2026

TL;DR

This paper introduces PL-Stitch, a self-supervised learning framework that leverages the temporal order of video frames using Plackett-Luce ranking to improve procedural activity recognition in videos.

Contribution

The paper proposes a self-supervised approach with two probabilistic objectives based on the Plackett-Luce model, designed to improve the understanding of procedural workflows in videos.

Findings

01

Improves surgical phase recognition accuracy by 11.4 percentage points.

02

Improves cooking action segmentation accuracy by 5.7 percentage points.

03

Outperforms existing methods across five surgical and cooking benchmarks.

Abstract

Procedural activities, ranging from routine cooking to complex surgical operations, are highly structured sequences of actions performed in a specific temporal order. Despite the success of current self-supervised learning (SSL) methods on static images and short clips, these models often overlook the underlying sequential structure of such activities. We expose this lack of procedural awareness with a motivating experiment: models pretrained on forward and time-reversed sequences produce highly similar features, confirming that their representations are blind to the underlying procedural order. To address this shortcoming, we propose PL-Stitch, a self-supervised framework that harnesses the inherent temporal order of video frames as a powerful supervisory signal. Our approach integrates two novel probabilistic objectives based on the Plackett-Luce (PL) model. The primary PL objective…
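
To make the ranking idea concrete, here is a minimal sketch of a generic Plackett-Luce negative log-likelihood over an ordering of frames. It is not the paper's exact objective, which the truncated abstract does not spell out; the function name `plackett_luce_nll` and the arguments `scores` and `order` are hypothetical, chosen for illustration only. Under the PL model, the probability of an ordering is the product, position by position, of a softmax over the items that have not yet been placed.

```python
import math


def plackett_luce_nll(scores, order):
    """Negative log-likelihood of `order` under a Plackett-Luce model.

    scores[i] is a real-valued score for item i (for example, a frame);
    order lists item indices from first to last in the observed ranking.
    Both names are illustrative assumptions, not the paper's API.
    """
    nll = 0.0
    for k, idx in enumerate(order):
        remaining = order[k:]  # items still competing for position k
        # Numerically stable log-sum-exp over the remaining scores.
        m = max(scores[j] for j in remaining)
        log_z = m + math.log(sum(math.exp(scores[j] - m) for j in remaining))
        # Log-probability that `idx` is picked next among `remaining`.
        nll -= scores[idx] - log_z
    return nll


# Toy example: three frames whose scores decrease with time.
scores = [2.0, 1.0, 0.0]
forward_nll = plackett_luce_nll(scores, [0, 1, 2])
reversed_nll = plackett_luce_nll(scores, [2, 1, 0])
```

In this toy case the forward ordering receives a lower loss than the time-reversed one, which is the kind of asymmetry an order-aware objective is meant to reward. In a training setup the scores would presumably come from a learned encoder applied to each frame, but that wiring is an assumption and is not shown here.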

Code & Models

Models

visurg/PL-Stitch (Hugging Face model)

Taxonomy

Topics: Surgical Simulation and Training · Human Pose and Action Recognition · Multimodal Machine Learning Applications