Unsupervised Human Action Detection by Action Matching
Basura Fernando, Sareh Shirazi, Stephen Gould

TL;DR
This paper introduces a new unsupervised task: detecting human actions by matching video segments across long videos, without category labels. The resulting alignment is meaningful and has potential applications in action discovery and video summarization.
Contribution
It proposes an unsupervised method that uses temporal encoding and consistency to detect matching human action segments across videos.
Findings
Achieved 21.6% precision and 11.7% recall on the MPII Cooking dataset.
Achieved 18.4% precision and 25.1% recall on the THUMOS15 dataset.
Demonstrated the method's effectiveness across multiple activity recognition benchmarks.
Abstract
We propose a new task of unsupervised action detection by action matching. Given two long videos, the objective is to temporally detect all pairs of matching video segments. A pair of video segments is matched if the two segments share the same human action. The task is category independent (it does not matter what action is being performed) and no supervision is used to discover such video segments. Unsupervised action detection by action matching allows us to align videos in a meaningful manner. As such, it can be used to discover new action categories or as an action proposal technique within, say, an action detection pipeline. Moreover, it is a useful pre-processing step for generating video highlights, e.g., from sports videos. We present an effective and efficient method for unsupervised action detection. We use an unsupervised temporal encoding method and exploit the temporal…
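To make the task statement concrete, here is a minimal sketch of what "detect all pairs of matching video segments" can look like in code. It is not the authors' method: the abstract is truncated before the method is described, so mean pooling stands in for the paper's temporal encoding, and the fixed-length sliding windows, the cosine-similarity test, the threshold value, and all function names are assumptions made purely for illustration.

```python
import math

def mean_pool(frames):
    """Average a list of equal-length frame feature vectors.
    Stand-in encoder (assumption); the paper uses a temporal encoding."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def segments(video, length, stride):
    """Yield (start, end, encoding) for each fixed-length sliding window."""
    for start in range(0, len(video) - length + 1, stride):
        window = video[start:start + length]
        yield start, start + length, mean_pool(window)

def match_segments(video_a, video_b, length=2, stride=2, threshold=0.9):
    """Return every pair of segments whose encodings are similar enough.
    No labels are used: a pair matches on similarity alone."""
    segs_b = list(segments(video_b, length, stride))
    matches = []
    for sa, ea, enc_a in segments(video_a, length, stride):
        for sb, eb, enc_b in segs_b:
            if cosine(enc_a, enc_b) >= threshold:
                matches.append(((sa, ea), (sb, eb)))
    return matches

# Toy per-frame features (hypothetical): two "actions" in opposite order.
video_a = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
video_b = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
matches = match_segments(video_a, video_b)
print(matches)  # [((0, 2), (2, 4)), ((2, 4), (0, 2))]
```

In the toy input, frames 0-2 of the first video pair with frames 2-4 of the second, and vice versa, without any category label being involved. A mean-pooled encoder discards frame order, which is exactly the information a temporal encoding is meant to keep, so this sketch illustrates the task setup rather than the paper's technique.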
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Analysis and Summarization
