
TL;DR
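This paper introduces an action parsing algorithm that splits a video containing an unknown number of actions into action segments. It scores candidate segments with both local and context features, uses dynamic programming to search for the best segmentation, and reports improved segmentation accuracy on the Breakfast dataset.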
Contribution
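It presents an action parsing method that combines local features of each segment with context features drawn from other candidate action segments, and uses a dynamic programming search to find the temporal segmentation with the highest overall classification confidence score.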
Findings
Segmentation accuracy improves over existing state-of-the-art parsing techniques
Context features computed from other candidate action segments benefit action segmentation
Results are reported on the Breakfast activity dataset
Abstract
We propose an action parsing algorithm that parses a video sequence containing an unknown number of actions into its action segments. We argue that context information, particularly the temporal information about other actions in the video sequence, is valuable for action segmentation. The optimal temporal segmentation is found using a dynamic programming search algorithm that optimizes the overall classification confidence score. The classification score of each segment is determined using local features calculated from that segment as well as context features calculated from other candidate action segments of the sequence. Experimental results on the Breakfast activity dataset show improved segmentation accuracy compared to existing state-of-the-art parsing techniques.
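The dynamic programming search described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `parse_actions`, the `score_segment` callback, the `max_len` limit, and the toy scoring rule are all assumptions made for illustration. In particular, the sketch scores each segment independently, whereas the paper's score also uses context features from other candidate segments, and the abstract does not say how those are combined.

```python
from typing import Callable, List, Tuple


def parse_actions(
    num_frames: int,
    score_segment: Callable[[int, int], float],
    max_len: int,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Find the segmentation of frames [0, num_frames) with the highest total score.

    score_segment(start, end) is a hypothetical stand-in for a classifier's
    confidence on the half-open segment [start, end).
    """
    neg_inf = float("-inf")
    # best[t] is the best total score for segmenting frames [0, t).
    best = [neg_inf] * (num_frames + 1)
    # back[t] is the start of the last segment in that best segmentation.
    back = [0] * (num_frames + 1)
    best[0] = 0.0

    for end in range(1, num_frames + 1):
        for start in range(max(0, end - max_len), end):
            candidate = best[start] + score_segment(start, end)
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start

    # Walk the back-pointers from the last frame to recover the segments.
    segments: List[Tuple[int, int]] = []
    end = num_frames
    while end > 0:
        start = back[end]
        segments.append((start, end))
        end = start
    segments.reverse()
    return best[num_frames], segments


# Toy data (assumed): per-frame ground-truth labels standing in for a classifier.
labels = [0, 0, 0, 1, 1, 2, 2, 2]


def toy_score(start: int, end: int) -> float:
    """Reward segments covering a single label; charge 0.5 per segment."""
    length = end - start
    if len(set(labels[start:end])) == 1:
        return length - 0.5
    return -float(length)


total, segments = parse_actions(len(labels), toy_score, max_len=len(labels))
print(total, segments)
```

The per-segment charge in `toy_score` keeps the search from splitting a uniform run into many short pieces. The number of segments is not fixed in advance, which matches the setting of an unknown number of actions.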
