Detecting the Moment of Completion: Temporal Models for Localising Action Completion
Farnoosh Heidarivincheh, Majid Mirmehdi, Dima Damen

TL;DR
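This paper evaluates Hidden Markov Models (HMM) and Long Short-Term Memory (LSTM) networks for localising the moment of completion in six object interactions: switch, plug, open, pull, pick and drink. Using fine-tuned CNN features on the Action-Completion-2016 dataset, both models detect completion within 10 frames of the annotation for ~75% of completed action sequences.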
Contribution
It compares two temporal models (HMM and LSTM) for action completion detection and shows that fine-tuned CNN features outperform hand-crafted features for localisation.
Findings
Both models localise completion within 10 frames of the annotated moment for ~75% of completed action sequences.
Fine-tuned CNN features outperform hand-crafted features.
Inclusion of incomplete sequences improves model robustness.
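The "within 10 frames" figure can be read as a tolerance-based accuracy. The sketch below shows one way such a figure could be computed; it is an illustration, not the authors' evaluation code. The function name `fraction_within_tolerance`, the inclusive boundary (`<=`), and the example frame indices are all assumptions made here.

```python
from typing import Sequence


def fraction_within_tolerance(
    predicted: Sequence[int],
    annotated: Sequence[int],
    tolerance: int = 10,
) -> float:
    """Fraction of sequences whose predicted completion frame lies within
    `tolerance` frames of the annotated completion frame.

    Assumption: the boundary is inclusive, so an error of exactly
    `tolerance` frames counts as a hit.
    """
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    if not predicted:
        raise ValueError("at least one sequence is required")
    hits = sum(
        1 for p, a in zip(predicted, annotated) if abs(p - a) <= tolerance
    )
    return hits / len(predicted)


# Hypothetical frame indices for four completed sequences.
example_predicted = [12, 40, 95, 30]
example_annotated = [10, 55, 90, 30]
example_score = fraction_within_tolerance(example_predicted, example_annotated)
```

With these made-up values the absolute errors are 2, 15, 5 and 0 frames, so three of the four sequences fall within the tolerance.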
Abstract
Action completion detection is the problem of modelling the action's progression towards localising the moment of completion, when the action's goal is confidently considered achieved. In this work, we assess the ability of two temporal models, namely Hidden Markov Models (HMM) and Long Short-Term Memory (LSTM), to localise completion for six object interactions: switch, plug, open, pull, pick and drink. We use a supervised approach, where annotations of pre-completion and post-completion frames are available per action, and fine-tuned CNN features are used to train temporal models. Tested on the Action-Completion-2016 dataset, we detect completion within 10 frames of annotations for ~75% of completed action sequences using both temporal models. Results show that fine-tuned CNN features outperform hand-crafted features for localisation, and that observing incomplete instances is…
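To make the pre-completion / post-completion framing concrete, here is a minimal sketch of localising a single completion moment from per-frame scores. It is not the paper's HMM or LSTM. It assumes a two-state, left-to-right model (pre, then post, never back) with uniform transition costs, and assumes that per-frame log-scores for each state are already available from some classifier. The name `localise_completion` and the example probabilities are hypothetical.

```python
import math
from itertools import accumulate
from typing import Sequence


def localise_completion(
    log_pre: Sequence[float], log_post: Sequence[float]
) -> int:
    """Return the index of the first post-completion frame.

    Every possible switch point t is scored as
        sum(log_pre[:t]) + sum(log_post[t:])
    and the best t is returned. A return value equal to the number of
    frames means no frame was labelled post-completion, i.e. the
    sequence is treated as incomplete.
    """
    if len(log_pre) != len(log_post):
        raise ValueError("log_pre and log_post must have the same length")
    n_frames = len(log_pre)
    # prefix[t] = total pre-completion score of frames 0 .. t-1
    prefix = [0.0] + list(accumulate(log_pre))
    # suffix[t] = total post-completion score of frames t .. n_frames-1
    suffix = list(accumulate(reversed(log_post)))[::-1] + [0.0]
    scores = [p + s for p, s in zip(prefix, suffix)]
    return max(range(n_frames + 1), key=scores.__getitem__)


# Hypothetical per-frame probabilities of being post-completion.
example_post_prob = [0.1, 0.2, 0.3, 0.8, 0.9]
example_log_post = [math.log(p) for p in example_post_prob]
example_log_pre = [math.log(1.0 - p) for p in example_post_prob]
example_moment = localise_completion(example_log_pre, example_log_post)
```

In this made-up example the best switch point is frame 3, the first frame whose post-completion probability exceeds its pre-completion probability. Because the model allows only one switch, isolated noisy frames cannot produce multiple completion moments, which is the main reason to prefer a temporal model over per-frame thresholding.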
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Analysis and Summarization
