Anchor-Constrained Viterbi for Set-Supervised Action Segmentation

Jun Li; Sinisa Todorovic

arXiv:2104.02113·cs.CV·April 7, 2021

TL;DR

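This paper introduces an anchor-constrained Viterbi algorithm (ACV) for weakly supervised action segmentation. ACV generates frame-level pseudo-labels from set-level supervision, where only the set of actions present in each training video is known, and these pseudo-labels are used to train the segmentation model.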

Contribution

A new anchor-constrained Viterbi algorithm (ACV) generates the pseudo-ground truth used for training under set supervision. The anchors are salient action parts estimated for each action in a video's ground-truth set.

Findings

01

Outperforms prior methods on the Breakfast, MPII Cooking2, and Hollywood Extended datasets.

02

Achieves higher segmentation accuracy and alignment quality.

03

Demonstrates robustness in weakly supervised settings.

Abstract

This paper is about action segmentation under weak supervision in training, where the ground truth provides only a set of actions present, but neither their temporal ordering nor when they occur in a training video. We use a Hidden Markov Model (HMM) grounded on a multilayer perceptron (MLP) to label video frames, and thus generate a pseudo-ground truth for the subsequent pseudo-supervised training. In testing, a Monte Carlo sampling of action sets seen in training is used to generate candidate temporal sequences of actions, and select the maximum posterior sequence. Our key contribution is a new anchor-constrained Viterbi algorithm (ACV) for generating the pseudo-ground truth, where anchors are salient action parts estimated for each action from a given ground-truth set. Our evaluation on the tasks of action segmentation and alignment on the benchmark Breakfast, MPII Cooking2,…
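The decoding steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes per-frame log-scores are already available (for example from an MLP), treats an anchor as a hard constraint that pins one frame to one action, and samples candidate orderings as uniform random permutations of the action set. The names constrained_viterbi, best_sampled_sequence, log_probs, and anchors are hypothetical and chosen only for illustration.

```python
import math
import random

NEG_INF = -math.inf


def constrained_viterbi(log_probs, order, anchors=None):
    """Align T frames to a fixed ordering of actions (illustrative sketch).

    log_probs[t][a]: assumed per-frame log-score of action a at frame t.
    order: candidate temporal ordering of actions; each gets >= 1 frame.
    anchors: assumed simplification, {frame index: action that frame must take}.
    Returns (best log-score, per-frame labels), or (NEG_INF, []) if infeasible.
    """
    anchors = anchors or {}
    T, K = len(log_probs), len(order)
    if K == 0 or T < K:
        return NEG_INF, []

    def emit(t, k):
        action = order[k]
        # An anchored frame may only be explained by its anchor action.
        if t in anchors and anchors[t] != action:
            return NEG_INF
        return log_probs[t][action]

    dp = [[NEG_INF] * K for _ in range(T)]
    back = [[0] * K for _ in range(T)]
    dp[0][0] = emit(0, 0)  # the first frame belongs to the first segment
    for t in range(1, T):
        for k in range(K):
            stay = dp[t - 1][k]
            move = dp[t - 1][k - 1] if k > 0 else NEG_INF
            if move > stay:
                best, back[t][k] = move, k - 1
            else:
                best, back[t][k] = stay, k
            dp[t][k] = best + emit(t, k)

    score = dp[T - 1][K - 1]  # the last frame belongs to the last segment
    if score == NEG_INF:
        return NEG_INF, []

    # Backtrack from the last segment to recover the frame labels.
    labels = [0] * T
    k = K - 1
    for t in range(T - 1, -1, -1):
        labels[t] = order[k]
        k = back[t][k]
    return score, labels


def best_sampled_sequence(log_probs, action_set, num_samples=20, seed=0,
                          anchors=None):
    """Sample random orderings of an action set and keep the best-scoring one.

    Uniform permutation sampling is an assumption made for this sketch.
    Returns (score, order, labels).
    """
    rng = random.Random(seed)
    actions = sorted(action_set)
    best = (NEG_INF, [], [])
    for _ in range(num_samples):
        order = actions[:]
        rng.shuffle(order)
        score, labels = constrained_viterbi(log_probs, order, anchors)
        if score > best[0]:
            best = (score, order, labels)
    return best
```

In this sketch an anchor that conflicts with the candidate ordering makes the alignment infeasible, so the function returns a score of negative infinity and an empty labeling. A softer penalty would be an equally reasonable design choice.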

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications