HOI-aware Adaptive Network for Weakly-supervised Action Segmentation
Runzhong Zhang, Suchen Wang, Yueqi Duan, Yansong Tang, Yue Zhang, Yap-Peng Tan

TL;DR
This paper introduces AdaAct, an HOI-aware adaptive network for weakly-supervised action segmentation that leverages human-object interactions to improve accuracy in distinguishing similar actions.
Contribution
The paper proposes combining a video HOI encoder with a HyperNetwork that adapts the temporal encoding to each video's HOI sequence.
Findings
Distinguishes similar actions, such as pouring juice and pouring coffee, more effectively.
Improves segmentation accuracy on the Breakfast and 50Salads datasets.
Demonstrates robustness across different evaluation metrics.
Abstract
In this paper, we propose an HOI-aware adaptive network named AdaAct for weakly-supervised action segmentation. Most existing methods learn a fixed network to predict the action of each frame with the neighboring frames. However, this would result in ambiguity when estimating similar actions, such as pouring juice and pouring coffee. To address this, we aim to exploit temporally global but spatially local human-object interactions (HOI) as video-level prior knowledge for action segmentation. The long-term HOI sequence provides crucial contextual information to distinguish ambiguous actions, where our network dynamically adapts to the given HOI sequence at test time. More specifically, we first design a video HOI encoder that extracts, selects, and integrates the most representative HOI throughout the video. Then, we propose a two-branch HyperNetwork to learn an adaptive temporal…
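The idea of a network that adapts to a video-level HOI summary can be illustrated with a toy hypernetwork: instead of learning one fixed temporal kernel, a small mapping generates the kernel from an HOI embedding, so two videos with different HOI context are processed by different weights. The sketch below is not the authors' implementation. It assumes a single linear hypernetwork (not the two-branch design the abstract mentions), random untrained weights, and a plain zero-padded 1D temporal convolution; all function names and dimensions are hypothetical.

```python
import random


def make_hypernet(hoi_dim, in_dim, out_dim, kernel, seed=0):
    """Hypothetical linear hypernetwork: one row per generated conv weight."""
    rng = random.Random(seed)
    n_weights = out_dim * in_dim * kernel
    return [[rng.uniform(-0.1, 0.1) for _ in range(hoi_dim)]
            for _ in range(n_weights)]


def generate_kernel(hypernet, hoi_embedding, in_dim, out_dim, kernel):
    """Map a video-level HOI embedding to conv weights [out_dim][in_dim][kernel]."""
    flat = [sum(w * h for w, h in zip(row, hoi_embedding)) for row in hypernet]
    return [[[flat[(o * in_dim + i) * kernel + k] for k in range(kernel)]
             for i in range(in_dim)]
            for o in range(out_dim)]


def temporal_conv(frames, weights):
    """Zero-padded 1D convolution over time; frames is T x in_dim."""
    num_frames = len(frames)
    out_dim, in_dim, kernel = len(weights), len(weights[0]), len(weights[0][0])
    pad = kernel // 2
    scores = []
    for t in range(num_frames):
        row = []
        for o in range(out_dim):
            acc = 0.0
            for k in range(kernel):
                s = t + k - pad  # neighboring frame index
                if 0 <= s < num_frames:
                    for i in range(in_dim):
                        acc += weights[o][i][k] * frames[s][i]
            row.append(acc)
        scores.append(row)
    return scores


def segment(frames, hoi_embedding, hypernet, in_dim, num_actions, kernel):
    """Per-frame action labels using weights generated for this video."""
    weights = generate_kernel(hypernet, hoi_embedding, in_dim, num_actions, kernel)
    scores = temporal_conv(frames, weights)
    return [max(range(num_actions), key=lambda a: s[a]) for s in scores]


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy frame features
    hypernet = make_hypernet(hoi_dim=2, in_dim=2, out_dim=3, kernel=3)
    # The same frames are scored with different weights per HOI embedding.
    print(segment(frames, [1.0, 0.0], hypernet, 2, 3, 3))
    print(segment(frames, [0.0, 1.0], hypernet, 2, 3, 3))
```

Because the weights are random, the printed labels carry no meaning. The point is only that the temporal filter is a function of the HOI embedding rather than a fixed parameter.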
