Boundary-Centric Active Learning for Temporal Action Segmentation
Halil Ismail Helvaci, Sen-ching Samson Cheung

TL;DR
This paper introduces B-ACT, a boundary-focused active learning framework for temporal action segmentation that improves label efficiency by targeting high-leverage boundary regions in videos.
Contribution
It proposes a novel boundary score and annotation protocol that selectively labels boundary frames, enhancing active learning for TAS tasks.
Findings
B-ACT outperforms existing active learning baselines on multiple datasets.
Boundary-centric supervision improves label efficiency significantly.
The approach achieves the best results under sparse annotation budgets.
Abstract
Temporal action segmentation (TAS) demands dense temporal supervision, yet most of the annotation cost in untrimmed videos is spent identifying and refining action transitions, where segmentation errors concentrate and small temporal shifts disproportionately degrade segmental metrics. We introduce B-ACT, a clip-budgeted active learning framework that explicitly allocates supervision to these high-leverage boundary regions. B-ACT operates in a hierarchical two-stage loop: (i) it ranks and queries unlabeled videos using predictive uncertainty, and (ii) within each selected video, it detects candidate transitions from the current model predictions and selects the top-K boundaries via a novel boundary score that fuses neighborhood uncertainty, class ambiguity, and temporal predictive dynamics. Importantly, our annotation protocol requests labels for only the boundary frames while still…
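The two-stage loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything concrete here is an assumption made for illustration: video uncertainty is taken as mean frame entropy, neighborhood uncertainty as normalized entropy averaged over a small window, class ambiguity as one minus the top-two probability margin, temporal dynamics as the total-variation distance between consecutive frame distributions, and the fusion as an equally weighted sum. The function names, `radius`, and `weights` are hypothetical and are not taken from the paper.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of one per-frame class distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def video_uncertainty(probs):
    """Stage (i), assumed form: mean frame entropy used to rank unlabeled videos."""
    return sum(entropy(p) for p in probs) / len(probs)


def candidate_boundaries(probs):
    """Frames whose predicted (argmax) class differs from the previous frame."""
    labels = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return [t for t in range(1, len(labels)) if labels[t] != labels[t - 1]]


def boundary_score(probs, t, radius=2, weights=(1.0, 1.0, 1.0)):
    """Stage (ii), assumed form: fuse three cues at candidate boundary t (t >= 1)."""
    lo, hi = max(0, t - radius), min(len(probs), t + radius + 1)
    n_classes = len(probs[t])  # assumes at least two classes

    # Neighborhood uncertainty: window-averaged entropy, scaled to [0, 1].
    neighborhood = sum(entropy(p) for p in probs[lo:hi]) / (
        (hi - lo) * math.log(n_classes)
    )

    # Class ambiguity: small margin between the two most likely classes.
    top1, top2 = sorted(probs[t], reverse=True)[:2]
    ambiguity = 1.0 - (top1 - top2)

    # Temporal dynamics: total-variation distance to the previous frame.
    dynamics = 0.5 * sum(abs(a - b) for a, b in zip(probs[t], probs[t - 1]))

    w_u, w_a, w_d = weights
    return w_u * neighborhood + w_a * ambiguity + w_d * dynamics


def select_boundaries(probs, k):
    """Return the k highest-scoring candidate boundaries, in temporal order."""
    ranked = sorted(
        candidate_boundaries(probs),
        key=lambda t: boundary_score(probs, t),
        reverse=True,
    )
    return sorted(ranked[:k])


# Toy softmax outputs for one 8-frame video with two classes.
toy_probs = [
    [0.9, 0.1], [0.9, 0.1], [0.6, 0.4], [0.4, 0.6],
    [0.1, 0.9], [0.1, 0.9], [0.95, 0.05], [0.95, 0.05],
]
queried = select_boundaries(toy_probs, k=1)
```

In this toy example the model predicts two transitions, at frames 3 and 6. The transition at frame 3 is gradual and low-margin, so under the assumed score it outranks the sharp, confident transition at frame 6 and would be the one sent for annotation when the budget allows a single boundary.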
