Deep set conditioned latent representations for action recognition
Akash Singh, Tom De Schepper, Kevin Mets, Peter Hellinckx, Jose Oramas, Steven Latre

TL;DR
This paper introduces deep set conditioned latent representations for improved multi-label video action recognition, leveraging relational reasoning and set-based latent features to better identify atomic and composite actions.
Contribution
It proposes SCI3D, a two-stream relational network utilizing latent set representations and reasoning to enhance action recognition accuracy.
Findings
Achieved 1.49% higher mAP in atomic action recognition.
Achieved 17.57% higher mAP in composite action recognition.
Demonstrated benefits of relational inductive biases and set-based latent representations.
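For readers unfamiliar with the metric behind these numbers, the sketch below shows one common way to compute mean average precision (mAP) for multi-label classification: average precision is computed per class from ranked scores, then averaged over classes. This summary does not state the paper's exact evaluation protocol, so the function names and the toy data are illustrative assumptions, not the authors' code.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP for one class: mean of precision@k taken at the rank k of each positive."""
    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_rows: Sequence[Sequence[float]],
                           label_rows: Sequence[Sequence[int]]) -> float:
    """mAP: average of per-class AP, skipping classes with no positive sample."""
    num_classes = len(score_rows[0])
    per_class: List[float] = []
    for c in range(num_classes):
        class_scores = [row[c] for row in score_rows]
        class_labels = [row[c] for row in label_rows]
        if any(class_labels):
            per_class.append(average_precision(class_scores, class_labels))
    return sum(per_class) / len(per_class) if per_class else 0.0


# Toy data (hypothetical): 3 clips, 2 action classes, multi-label targets.
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
toy_labels = [[1, 0], [0, 1], [1, 1]]
toy_map = mean_average_precision(toy_scores, toy_labels)
```

In the toy data, the first class has positives at ranks 1 and 3 and the second class has positives at ranks 1 and 2, so the two per-class AP values differ before being averaged.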
Abstract
In recent years, multi-label, multi-class video action recognition has gained significant popularity. While reasoning over temporally connected atomic actions is mundane for intelligent species, standard artificial neural networks (ANNs) still struggle to classify them. In the real world, atomic actions often temporally connect to form more complex composite actions. The challenge lies in recognising composite actions of varying durations while other distinct composite or atomic actions occur in the background. Drawing upon the success of relational networks, we propose methods that learn to reason over the semantic concept of objects and actions. We empirically show how ANNs benefit from pretraining, relational inductive biases and unordered set-based latent representations. In this paper we propose deep set conditioned I3D (SCI3D), a two-stream relational network that employs latent…
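The phrase "unordered set-based latent representations" refers to encodings that do not depend on the order of their inputs. A minimal sketch of the general deep-sets idea follows: encode each element, sum the encodings, then transform the pooled result. This is not the SCI3D architecture, whose details are not given in this summary; `phi`, `rho`, and `set_encode` are hypothetical stand-ins with fixed weights chosen only for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def phi(element: Sequence[float]) -> Vector:
    """Hypothetical per-element encoder: a fixed affine map followed by ReLU."""
    return [max(0.0, 2.0 * v - 1.0) for v in element]


def rho(pooled: Sequence[float]) -> Vector:
    """Hypothetical post-pooling transform: a fixed linear scaling."""
    return [0.5 * v for v in pooled]


def set_encode(elements: Sequence[Sequence[float]]) -> Vector:
    """Permutation-invariant encoding: rho(sum over elements of phi(element))."""
    dim = len(elements[0])
    pooled = [0.0] * dim
    for element in elements:
        for i, v in enumerate(phi(element)):
            pooled[i] += v  # sum pooling ignores element order
    return rho(pooled)


# Toy set of three 2-D feature vectors (made-up values).
features = [[1.0, 0.0], [2.0, 1.0], [0.5, 3.0]]
encoded = set_encode(features)
encoded_shuffled = set_encode([features[2], features[0], features[1]])
```

Because the only interaction between elements is a sum, `encoded` and `encoded_shuffled` are equal; in a learned model `phi` and `rho` would be trained networks rather than fixed functions.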
