Combining Boundary Supervision and Segment-Level Regularization for Fine-Grained Action Segmentation
Hinako Mitsuoka, Kazuhiro Hotta

TL;DR
This paper introduces a lightweight dual-loss training framework for temporal action segmentation that improves boundary accuracy and segment coherence without relying on complex architectures.
Contribution
It proposes a simple, architecture-agnostic training loss combining boundary regression and segment regularization, improving segmentation quality across models and datasets.
Findings
Improved F1 and Edit scores across three benchmark datasets.
Enhanced boundary localization and segment consistency.
Maintained frame-wise accuracy alongside the improved segmentation quality.
Abstract
Recent progress in Temporal Action Segmentation (TAS) has increasingly relied on complex architectures, which can hinder practical deployment. We present a lightweight dual-loss training framework that improves fine-grained segmentation quality with only one additional output channel and two auxiliary loss terms, requiring minimal architectural modification. Our approach combines a boundary-regression loss that promotes accurate temporal localization via a single-channel boundary prediction and a CDF-based segment-level regularization loss that encourages coherent within-segment structure by matching cumulative distributions over predicted and ground-truth segments. The framework is architecture-agnostic and can be integrated into existing TAS models (e.g., MS-TCN, C2F-TCN, FACT) as a training-time loss function. Across three benchmark datasets, the proposed method improves…
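To make the two auxiliary terms described in the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the exact formulas, so the sketch assumes (a) boundary targets are 1 at frames where the ground-truth label changes and the boundary term is a mean squared error, and (b) the CDF term compares, per class, the cumulative sum over time of normalised predicted probabilities against the same quantity for the ground-truth labels. All function names and the weights `w_boundary` and `w_segment` are hypothetical.

```python
from itertools import accumulate


def boundary_targets(labels):
    """Assumed target: 1.0 where the label differs from the previous frame."""
    return [0.0] + [float(a != b) for a, b in zip(labels, labels[1:])]


def boundary_regression_loss(boundary_pred, labels):
    """MSE between a single-channel boundary score and the assumed targets."""
    target = boundary_targets(labels)
    return sum((p - t) ** 2 for p, t in zip(boundary_pred, target)) / len(target)


def temporal_cdf(values):
    """Normalise a non-negative sequence over time, then take its cumulative sum."""
    total = sum(values)
    if total == 0:
        return [0.0] * len(values)
    return [c / total for c in accumulate(values)]


def cdf_segment_loss(probs, labels, num_classes):
    """Mean absolute gap between predicted and ground-truth per-class CDFs.

    This is one possible reading of "matching cumulative distributions",
    chosen for illustration only.
    """
    num_frames = len(labels)
    loss = 0.0
    for c in range(num_classes):
        pred_cdf = temporal_cdf([frame[c] for frame in probs])
        true_cdf = temporal_cdf([float(y == c) for y in labels])
        loss += sum(abs(p - t) for p, t in zip(pred_cdf, true_cdf)) / num_frames
    return loss / num_classes


def dual_loss(probs, boundary_pred, labels, num_classes,
              w_boundary=1.0, w_segment=1.0):
    """Weighted sum of the two auxiliary terms (weights are hypothetical)."""
    return (w_boundary * boundary_regression_loss(boundary_pred, labels)
            + w_segment * cdf_segment_loss(probs, labels, num_classes))


# Toy example: 4 frames, 2 classes, one boundary at frame 2.
labels = [0, 0, 1, 1]
perfect_probs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
uniform_probs = [[0.5, 0.5]] * 4
```

In a training loop, a term like `dual_loss` would be added to the base model's existing frame-wise classification loss, with the boundary score coming from the one extra output channel mentioned in the abstract. Because both terms only consume model outputs and labels, nothing in the sketch depends on a particular backbone.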
