TL;DR
This paper introduces a teacher-student framework with a segmentally smoothing loss for timestamp-supervised temporal action segmentation, improving training stability and accuracy while keeping annotation costs low.
Contribution
It proposes a teacher model that stabilizes pseudo-label generation and a segmentally smoothing loss that makes predictions within a segment more cohesive in timestamp-supervised segmentation.
Findings
Outperforms state-of-the-art methods on three datasets.
Achieves comparable results to fully-supervised methods with less annotation.
Improves stability and reduces noise in pseudo label training.
Abstract
Temporal action segmentation in videos has drawn much attention recently. Timestamp supervision is a cost-effective way for this task. To obtain more information to optimize the model, the existing method generated pseudo frame-wise labels iteratively based on the output of a segmentation model and the timestamp annotations. However, this practice may introduce noise and oscillation during the training, and lead to performance degeneration. To address this problem, we propose a new framework for timestamp supervised temporal action segmentation by introducing a teacher model parallel to the segmentation model to help stabilize the process of model optimization. The teacher model can be seen as an ensemble of the segmentation model, which helps to suppress the noise and to improve the stability of pseudo labels. We further introduce a segmentally smoothing loss, which is more focused and…
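The abstract describes the teacher as an ensemble of the segmentation model but, as quoted here, does not say how that ensemble is formed. One common way to build such a teacher is an exponential moving average (EMA) of the student's weights. The sketch below illustrates that idea only, under that assumption; the function name `ema_update`, the `decay` value, and the toy parameter dictionaries are hypothetical and not taken from the paper.

```python
from typing import Dict, List

# Hypothetical parameter container: name -> flat list of weights.
Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, decay: float = 0.99) -> None:
    """Update the teacher in place: teacher <- decay * teacher + (1 - decay) * student.

    Assumption: the teacher is an EMA of the student (a temporal ensemble).
    The paper's actual update rule may differ.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    for name, s_values in student.items():
        t_values = teacher[name]
        for i, s in enumerate(s_values):
            t_values[i] = decay * t_values[i] + (1.0 - decay) * s


# Toy demonstration: the student's weight oscillates between 2.0 and 0.0,
# mimicking a noisy training trajectory, while the teacher stays near the mean.
student: Params = {"w": [0.0]}
teacher: Params = {"w": [1.0]}
history: List[float] = []
for step in range(200):
    student["w"][0] = 2.0 if step % 2 == 0 else 0.0
    ema_update(teacher, student, decay=0.9)
    history.append(teacher["w"][0])
```

In this toy run the student swings across the full range from 0 to 2 while the averaged teacher stays close to 1, which is the kind of damping that would make pseudo labels generated from the teacher less prone to oscillation.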
