Actor-Action Semantic Segmentation with Grouping Process Models
Chenliang Xu, Jason J. Corso

TL;DR
This paper introduces a dynamic grouping process model that combines local CRFs with a hierarchical supervoxel decomposition for actor-action semantic segmentation, reporting a 60% relative improvement over the state of the art on a large-scale video dataset.
Contribution
The paper presents a dynamic model that couples local labeling CRFs with a hierarchical supervoxel decomposition, so that supervoxels at multiple scales suggest adaptive, high-order groupings of CRF nodes for video segmentation.
Findings
Achieves a 60% relative improvement over state-of-the-art methods.
Shows that bidirectional information flow between the local CRFs and the supervoxel hierarchy during inference is effective.
Validated on a recent large-scale video dataset.
Abstract
Actor-action semantic segmentation marks an important step toward advanced video understanding problems: what action is happening, who is performing the action, and where the action is in space-time. Current models for this problem are local, based on layered CRFs, and are unable to capture long-range interactions between video parts. We propose a new model that combines these local labeling CRFs with a hierarchical supervoxel decomposition. The supervoxels provide cues for possible groupings of nodes, at various scales, in the CRFs to encourage adaptive, high-order groups for more effective labeling. Our model is dynamic and continuously exchanges information during inference: the local CRFs influence which supervoxels in the hierarchy are active, and these active nodes influence the connectivity in the CRF; we hence call it a grouping process model. The experimental results on a recent…
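The two-way exchange described in the abstract can be illustrated with a toy sketch. This is not the authors' energy function or inference procedure: the function name `grouping_process_toy`, the purity threshold, the weights `pair_w` and `group_w`, and the ICM-style update are all assumptions chosen only to show labels switching supervoxels on, and active supervoxels in turn pulling labels toward a group consensus.

```python
import numpy as np

def grouping_process_toy(unary, edges, supervoxels, purity=0.7,
                         pair_w=0.5, group_w=1.0, iters=10):
    """Toy alternating inference; NOT the paper's energy or optimizer.

    unary:       (n_nodes, n_labels) array of costs (lower is better).
    edges:       list of (i, j) local CRF neighbor pairs.
    supervoxels: list of index arrays, one per supervoxel at any scale.
    """
    n_nodes, n_labels = unary.shape
    labels = unary.argmin(axis=1)          # start from the unary-only labeling
    neighbors = [[] for _ in range(n_nodes)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    active = []
    for _ in range(iters):
        # Step 1: labels -> active supervoxels (label-pure groups switch on).
        active = []
        for sv in supervoxels:
            counts = np.bincount(labels[sv], minlength=n_labels)
            if counts.max() / len(sv) >= purity:
                active.append((sv, int(counts.argmax())))

        # Step 2: active supervoxels -> labels (sequential ICM-style sweep).
        new_labels = labels.copy()
        for i in range(n_nodes):
            cost = unary[i].astype(float).copy()
            for j in neighbors[i]:
                # Potts penalty for disagreeing with a local neighbor.
                cost += pair_w * (np.arange(n_labels) != new_labels[j])
            for sv, majority in active:
                if i in sv:
                    # High-order pull toward the active group's majority label.
                    cost += group_w * (np.arange(n_labels) != majority)
            new_labels[i] = cost.argmin()

        if np.array_equal(new_labels, labels):
            break                           # labeling is stable
        labels = new_labels
    return labels, active
```

In this sketch a supervoxel becomes active only when the current labeling already mostly agrees inside it, which is one simple way to make the high-order grouping adaptive rather than fixed in advance.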
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Advanced Vision and Imaging
