Human-AI Divergence in Ego-centric Action Recognition under Spatial and Spatiotemporal Manipulations
Sadegh Rahmaniboldaji, Filip Rybansky, Quoc C. Vuong, Anya C. Hurlbert, Frank Guerin, Andrew Gilbert

TL;DR
This study compares human and AI performance in egocentric action recognition under spatial and temporal manipulations. It finds that humans rely on sparse cues while models depend on contextual features, which highlights key differences in robustness.
Contribution
The paper presents a large-scale comparative analysis of human and AI egocentric action recognition using MIRCs. It shows that humans and models rely on different visual cues and differ in their sensitivity to spatial and temporal disruptions.
Findings
Human recognition declines sharply with spatial reduction, reflecting reliance on hand-object cues.
Models degrade gradually, often relying on context and low-level features.
Humans are robust to temporal scrambling if spatial cues are preserved.
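To make "temporal scrambling" concrete, the sketch below permutes the frame order of a clip while leaving each frame's spatial content untouched. It assumes a simple uniform random shuffle of individual frames with a fixed seed; the study's exact scrambling procedure (for example, whether it shuffles frames or short segments) is not described here, and `scramble_frames` is a hypothetical helper.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def scramble_frames(frames: Sequence[T], seed: int = 0) -> List[T]:
    """Hypothetical helper: return the same frames in a random temporal order.

    Spatial content is preserved (each frame is copied as-is); only the order
    changes. A uniform shuffle may occasionally leave some frames in place.
    """
    rng = random.Random(seed)  # fixed seed so the same stimulus can be regenerated
    order = list(range(len(frames)))
    rng.shuffle(order)
    return [frames[i] for i in order]


# Usage: frame identifiers stand in for decoded video frames.
clip = ["f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7"]
scrambled = scramble_frames(clip, seed=42)
print(scrambled)
```

Because every frame survives the shuffle, any recognition that persists after scrambling must come from per-frame spatial cues rather than from motion or temporal order.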
Abstract
Humans consistently outperform state-of-the-art AI models in action recognition, particularly in challenging real-world conditions involving low resolution, occlusion, and visual clutter. Understanding the sources of this performance gap is essential for developing more robust and human-aligned models. In this paper, we present a large-scale human-AI comparative study of egocentric action recognition using Minimal Identifiable Recognition Crops (MIRCs), defined as the smallest spatial or spatiotemporal regions sufficient for reliable human recognition. We use Epic ReduAct, our previously introduced dataset derived from 36 EPIC KITCHENS videos, in which clips are systematically reduced spatially and scrambled temporally across multiple spatial reduction levels and temporal conditions. Recognition performance is evaluated using over 3,000 human participants and the Side4Video model. Our analysis…
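The MIRC definition above can be read as a search over a hierarchy of progressively reduced crops: a crop is minimal if it is still recognized reliably but none of its further-reduced children are. The sketch below encodes that reading. The `Crop` type, the 0.5 recognition threshold, and the pruning rule (children of unrecognized crops are not explored) are all illustrative assumptions, not the authors' procedure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Crop:
    """Hypothetical node in a spatial-reduction hierarchy."""
    name: str
    recognition_rate: float  # fraction of participants who named the action correctly
    children: List["Crop"] = field(default_factory=list)  # further-reduced crops


def is_mirc(crop: Crop, threshold: float = 0.5) -> bool:
    """Recognized reliably, while every reduced child falls below threshold.

    A recognized crop with no tested children counts as minimal in this sketch.
    """
    if crop.recognition_rate < threshold:
        return False
    return all(child.recognition_rate < threshold for child in crop.children)


def find_mircs(root: Crop, threshold: float = 0.5) -> List[str]:
    """Depth-first search that collects the names of all MIRCs under root."""
    found: List[str] = []
    stack = [root]
    while stack:
        crop = stack.pop()
        if crop.recognition_rate < threshold:
            continue  # assumption: sub-crops of an unrecognized crop are not explored
        if is_mirc(crop, threshold):
            found.append(crop.name)
        else:
            stack.extend(crop.children)
    return found


# Usage with made-up recognition rates (not data from the study).
root = Crop("full", 0.95, [
    Crop("left", 0.80, [Crop("left-a", 0.30), Crop("left-b", 0.20)]),
    Crop("right", 0.40),
])
print(find_mircs(root))
```

Here "left" is reported as the only MIRC: it is recognized by 80% of (hypothetical) participants, yet both of its reduced children drop below the 0.5 threshold. A sharp drop like this between parent and child is the kind of behavior the findings attribute to humans, in contrast to the gradual degradation reported for models.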
Taxonomy
Topics: Human Pose and Action Recognition · Emotion and Mood Recognition · Action Observation and Synchronization
