Deep set conditioned latent representations for action recognition

Akash Singh; Tom De Schepper; Kevin Mets; Peter Hellinckx; Jose Oramas; Steven Latre

arXiv:2212.11030·cs.CV·December 22, 2022

TL;DR

This paper introduces deep set conditioned latent representations for improved multi-label video action recognition, leveraging relational reasoning and set-based latent features to better identify atomic and composite actions.

Contribution

The paper proposes deep set conditioned I3D (SCI3D), a two-stream relational network that uses latent set representations and relational reasoning to improve action recognition accuracy.

Findings

01. Achieved 1.49% higher mAP in atomic action recognition.

02. Achieved 17.57% higher mAP in composite action recognition.

03. Demonstrated benefits of relational inductive biases and set-based latent representations.
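
The findings above are reported in mean average precision (mAP). As a reminder of how that metric is commonly computed for multi-label classification, here is a minimal sketch. The evaluation protocol the paper actually used is not given on this page, so the function names, the toy scores, and the choice to score a class with no positives as 0.0 are all assumptions made for illustration.

```python
def average_precision(scores, labels):
    """Average precision for one class: mean of precision at each positive's rank."""
    # Rank clips from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    # Assumption: a class with no positive clips contributes 0.0.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """mAP over classes; rows are clips, columns are action classes."""
    per_class = [
        average_precision(class_scores, class_labels)
        for class_scores, class_labels in zip(zip(*score_matrix), zip(*label_matrix))
    ]
    return sum(per_class) / len(per_class)


# Hypothetical toy data: 4 clips, 2 action classes, multi-label ground truth.
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
map_value = mean_average_precision(scores, labels)
print(f"mAP = {map_value:.4f}")
```

Because each class is ranked and scored independently, a clip can count as a positive for several actions at once, which suits the multi-label setting.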

Abstract

In recent years, multi-label, multi-class video action recognition has gained significant popularity. While reasoning over temporally connected atomic actions is mundane for intelligent species, standard artificial neural networks (ANNs) still struggle to classify them. In the real world, atomic actions often temporally connect to form more complex composite actions. The challenge lies in recognising composite actions of varying durations while other distinct composite or atomic actions occur in the background. Drawing upon the success of relational networks, we propose methods that learn to reason over the semantic concept of objects and actions. We empirically show how ANNs benefit from pretraining, relational inductive biases and unordered set-based latent representations. In this paper we propose deep set conditioned I3D (SCI3D), a two-stream relational network that employs latent…
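
To make "unordered set-based latent representations" concrete, the following is a minimal sketch of a generic Deep Sets style encoder: each element is embedded independently, the embeddings are sum-pooled, and the pooled vector is transformed once more. This is not the SCI3D architecture, which the truncated abstract does not describe. The layer sizes, the random weights, and the names `encode_set`, `phi`, and `rho` are assumptions chosen for illustration.

```python
import math
import random


def relu(vector):
    return [max(0.0, x) for x in vector]


def linear(weights, bias, vector):
    """Affine map; `weights` is a list of rows, one per output unit."""
    return [
        math.fsum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def make_layer(rng, n_in, n_out):
    """Random (untrained) layer, used only to demonstrate the data flow."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, bias


def encode_set(elements, phi, rho):
    """Permutation-invariant encoding: rho(sum over elements of relu(phi(x)))."""
    embedded = [relu(linear(*phi, element)) for element in elements]
    # Sum pooling discards element order; fsum gives an order-independent result.
    pooled = [math.fsum(column) for column in zip(*embedded)]
    return linear(*rho, pooled)


rng = random.Random(0)
phi = make_layer(rng, n_in=3, n_out=4)  # per-element embedding
rho = make_layer(rng, n_in=4, n_out=2)  # post-pooling transform

# Hypothetical per-object feature vectors for one clip.
objects = [[0.1, 0.2, 0.3], [1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
shuffled = objects[::-1]

code_a = encode_set(objects, phi, rho)
code_b = encode_set(shuffled, phi, rho)
print(code_a)
print(code_b)  # same latent code regardless of element order
```

The sum pooling step is what makes the latent code independent of the order in which objects are presented. In a trained model, `phi` and `rho` would be learned networks rather than random layers.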
