Global2Local: Efficient Structure Search for Video Action Segmentation
Shang-Hua Gao, Qi Han, Zhong-Yu Li, Pai Peng, Liang Wang, Ming-Ming Cheng

TL;DR
This paper introduces a global-to-local search method to automatically discover effective receptive field combinations for video action segmentation, outperforming hand-designed patterns and enhancing existing models.
Contribution
It proposes a novel global-to-local search scheme to optimize receptive field combinations, replacing manual design in action segmentation models.
Findings
Achieves state-of-the-art performance on action segmentation benchmarks.
Effectively finds diverse receptive field combinations beyond human-designed patterns.
Improves model accuracy by optimizing receptive fields through search.
Abstract
Temporal receptive fields of models play an important role in action segmentation. Large receptive fields help capture long-term relations among video clips, while small receptive fields help capture local details. Existing methods construct models with hand-designed receptive fields in each layer. Can we effectively search for receptive field combinations to replace hand-designed patterns? To answer this question, we propose to find better receptive field combinations through a global-to-local search scheme. Our search scheme uses a global search to find coarse combinations and a local search to refine them further. The global search finds possible coarse combinations other than human-designed patterns. On top of the global search, we propose an expectation guided iterative local search scheme to refine combinations effectively. Our…
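The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes receptive fields are set by per-layer dilation rates of stride-1 1D convolutions, replaces model training with a hypothetical `toy_score` function, uses a simple mutation-based search over power-of-two dilations as the global stage, and uses a softmax-weighted average of neighbouring dilations as a stand-in for the expectation-guided local stage. All function names and parameters below are illustrative assumptions.

```python
import math
import random


def receptive_field(dilations, kernel_size=3):
    """Temporal receptive field (in frames) of stacked stride-1 dilated 1D convs."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def toy_score(dilations, target=(1, 3, 6, 11)):
    """Hypothetical stand-in for validation accuracy (higher is better)."""
    return -sum(abs(d - t) for d, t in zip(dilations, target))


def global_search(score, num_layers, max_dilation=16, population=16,
                  generations=10, seed=0):
    """Coarse stage (assumed): mutate candidates drawn from power-of-two dilations."""
    rng = random.Random(seed)
    coarse = [2 ** k for k in range(max_dilation.bit_length())
              if 2 ** k <= max_dilation]
    pool = [[rng.choice(coarse) for _ in range(num_layers)]
            for _ in range(population)]
    for _ in range(generations):
        pool.sort(key=score, reverse=True)
        parents = pool[: population // 2]
        children = []
        for parent in parents:
            child = list(parent)
            child[rng.randrange(num_layers)] = rng.choice(coarse)
            children.append(child)
        pool = parents + children
    return max(pool, key=score)


def local_search(score, dilations, step=2, iterations=5, temperature=1.0):
    """Fine stage (assumed): move each layer to the expected value of nearby options."""
    best = list(dilations)
    for _ in range(iterations):
        proposal = []
        for i, d in enumerate(best):
            options = sorted({max(1, d - step), d, d + step})
            scores = [score(best[:i] + [o] + best[i + 1:]) for o in options]
            top = max(scores)
            # Softmax over scores gives a weight for each nearby dilation.
            weights = [math.exp((s - top) / temperature) for s in scores]
            expected = sum(w * o for w, o in zip(weights, options)) / sum(weights)
            proposal.append(max(1, round(expected)))
        if score(proposal) > score(best):
            best = proposal            # accept the refined combination
        else:
            step = max(1, step // 2)   # otherwise search a narrower neighbourhood
    return best


coarse = global_search(toy_score, num_layers=4)
fine = local_search(toy_score, coarse)
print("coarse:", coarse, "receptive field:", receptive_field(coarse))
print("fine:  ", fine, "receptive field:", receptive_field(fine))
```

In a real setting, `toy_score` would be replaced by training or evaluating a segmentation model with the given dilations, which is far more expensive; the sketch only shows how a coarse combination can seed a finer, neighbourhood-based refinement.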
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Video Analysis and Summarization
