Beyond Sequences: A Benchmark for Atomic Hand-Object Interaction Using a Static RNN Encoder
Yousef Azizi Movahed, Fatemeh Ziaeetabar

TL;DR
This paper introduces a static RNN encoder for fine-grained hand-object interaction classification. It reaches 97.60% accuracy on the MANIAC dataset and proposes a new benchmark built on structured features and lightweight models.
Contribution
The paper demonstrates that a static RNN encoder, a Bidirectional RNN run with sequence length one, can outperform traditional sequential models in hand-object interaction classification.
Findings
Achieved 97.60% accuracy on the MANIAC dataset.
Successfully classified the challenging 'grabbing' transition with a 0.90 F1-score.
Proposed a new benchmark for low-level hand-object interaction recognition.
Abstract
Reliably predicting human intent in hand-object interactions is an open challenge for computer vision. Our research concentrates on a fundamental sub-problem: the fine-grained classification of atomic interaction states, namely 'approaching', 'grabbing', and 'holding'. To this end, we introduce a structured data engineering process that converts raw videos from the MANIAC dataset into 27,476 statistical-kinematic feature vectors. Each vector encapsulates relational and dynamic properties from a short temporal window of motion. Our initial hypothesis posited that sequential modeling would be critical, leading us to compare static classifiers (MLPs) against temporal models (RNNs). Counter-intuitively, the key discovery occurred when we set the sequence length of a Bidirectional RNN to one (seq_length=1). This modification converted the network's function, compelling it to act as a…
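The seq_length=1 setting mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a vanilla (Elman) tanh cell with zero initial hidden state, and the feature size, hidden size, and all function names are hypothetical. Under those assumptions, a bidirectional RNN fed a single time step never uses its recurrent weights, so its output equals that of one feedforward tanh layer whose weights are the two directions' input weights stacked.

```python
import math
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(W_in, W_rec, b, x, h_prev):
    """One vanilla RNN step: tanh(W_in x + W_rec h_prev + b)."""
    a = matvec(W_in, x)
    r = matvec(W_rec, h_prev)
    return [math.tanh(ai + ri + bi) for ai, ri, bi in zip(a, r, b)]

def birnn(params_fwd, params_bwd, seq, hidden):
    """Bidirectional RNN: concatenate the final state of each direction."""
    h_f = [0.0] * hidden              # assumed zero initial state
    for x in seq:
        h_f = rnn_step(*params_fwd, x, h_f)
    h_b = [0.0] * hidden
    for x in reversed(seq):
        h_b = rnn_step(*params_bwd, x, h_b)
    return h_f + h_b

def dense_tanh(W, b, x):
    """Plain feedforward layer with tanh activation."""
    return [math.tanh(a + bi) for a, bi in zip(matvec(W, x), b)]

def random_params(rng, n_in, hidden):
    """Random (W_in, W_rec, b) for one direction; values are arbitrary."""
    W_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    W_rec = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(hidden)]
    b = [rng.uniform(-1, 1) for _ in range(hidden)]
    return W_in, W_rec, b

rng = random.Random(0)
n_in, hidden = 6, 4                   # hypothetical sizes, not from the paper
fwd = random_params(rng, n_in, hidden)
bwd = random_params(rng, n_in, hidden)
x = [rng.uniform(-1, 1) for _ in range(n_in)]  # stand-in feature vector

# Sequence of length one: the recurrent term is W_rec @ 0 = 0.
rnn_out = birnn(fwd, bwd, [x], hidden)

# Equivalent feedforward layer: stack the two input matrices and biases.
W_stack = fwd[0] + bwd[0]
b_stack = fwd[2] + bwd[2]
dense_out = dense_tanh(W_stack, b_stack, x)

max_diff = max(abs(a - b) for a, b in zip(rnn_out, dense_out))
```

Here `max_diff` should be zero up to floating-point precision. With gated cells such as LSTM or GRU the single step is still a function of the input alone, but the gates make it a different nonlinearity than a plain tanh layer, which may matter when comparing against an MLP.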
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Multimodal Machine Learning Applications
