TraMNet - Transition Matrix Network for Efficient Action Tube Proposals

Gurkirt Singh; Suman Saha; Fabio Cuzzolin

arXiv:1808.00297 · eess.IV · August 2, 2018

TL;DR

TraMNet introduces a transition matrix approach for efficient and accurate spatiotemporal action localization by modeling actor and camera movements, reducing computational complexity and improving detection robustness.

Contribution

The paper proposes a novel transition-matrix-based network that models movement between anchor proposals, enabling efficient and translation-invariant action tube proposals with sparse annotations.
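
The transition-matrix idea can be illustrated with a minimal Python sketch, under stated assumptions: each ground-truth box in two consecutive frames is assigned to its best-overlapping anchor by IoU, the resulting anchor-to-anchor counts are normalised per row into transition probabilities, and rare transitions are zeroed out to keep the matrix sparse. The function names (`iou`, `best_anchor`, `transition_matrix`), the toy anchors and boxes, and the fixed probability `threshold` are hypothetical. This is not the authors' implementation, and the paper's actual sparsity mechanism may differ.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def best_anchor(box, anchors):
    """Index of the anchor that overlaps a ground-truth box the most."""
    return max(range(len(anchors)), key=lambda k: iou(box, anchors[k]))


def transition_matrix(anchors, gt_pairs, threshold=0.1):
    """Estimate P(anchor j at frame t+1 | anchor i at frame t).

    gt_pairs: (box_t, box_t_plus_1) tuples for the same actor.
    threshold: hypothetical sparsity cut-off; probabilities below it are
    set to zero and the rows are NOT renormalised afterwards.
    """
    n = len(anchors)
    counts = [[0] * n for _ in range(n)]
    for box_t, box_t1 in gt_pairs:
        counts[best_anchor(box_t, anchors)][best_anchor(box_t1, anchors)] += 1
    probs = []
    for row in counts:
        total = sum(row)
        p = [c / total if total else 0.0 for c in row]  # row-normalise counts
        probs.append([x if x >= threshold else 0.0 for x in p])  # sparsify
    return probs


# Toy example (hypothetical data): three side-by-side anchors and
# actors that mostly drift one anchor to the right between frames.
anchors = [(0, 0, 10, 10), (10, 0, 20, 10), (20, 0, 30, 10)]
gt_pairs = [
    ((0, 0, 10, 10), (9, 0, 19, 10)),
    ((1, 0, 11, 10), (10, 0, 20, 10)),
    ((10, 0, 20, 10), (20, 0, 30, 10)),
    ((0, 0, 10, 10), (0, 0, 10, 10)),
]
P = transition_matrix(anchors, gt_pairs)
print(P[0])
```

In this toy data, row 0 of `P` is roughly [0.33, 0.67, 0.0]: actors starting on the leftmost anchor stay there one time in three and move one anchor to the right otherwise, and a jump straight to the rightmost anchor is never observed, so that transition never needs to be evaluated.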

Findings

1. Reduces the proposal search space from exponential ($O(n^{f})$ for $n$ anchors per frame over $f$ frames) to a manageable size.
2. Achieves effective action localization on multiple datasets.
3. Handles sparse annotations effectively.

Abstract

Current state-of-the-art methods solve spatiotemporal action localisation by extending 2D anchors to 3D-cuboid proposals on stacks of frames, to generate sets of temporally connected bounding boxes called action micro-tubes. However, they fail to consider that the underlying anchor proposal hypotheses should also move (transition) from frame to frame, as the actor or the camera does. Assuming we evaluate $n$ 2D anchors in each frame, then the number of possible transitions from each 2D anchor to the next, for a sequence of $f$ consecutive frames, is in the order of $O(n^{f})$, expensive even for small values of $f$. To avoid this problem, we introduce a Transition-Matrix-based Network (TraMNet) which relies on computing transition probabilities between anchor proposals while maximising their overlap with ground truth bounding boxes across frames, and enforcing sparsity via a…
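
The $O(n^{f})$ growth can be checked with a small Python sketch that counts anchor sequences (one anchor index per frame) by brute force and compares the result with a hypothetical restricted setting in which each anchor may only move to $k$ successor anchors, as a sparse transition matrix would allow. The function names and the fan-out `k` are assumptions made for illustration, not quantities taken from the paper.

```python
from itertools import product


def dense_tube_hypotheses(n, f):
    """Anchor sequences when any anchor may follow any other: n ** f."""
    return n ** f


def enumerate_tubes(n, f):
    """Brute-force count of every tuple holding one anchor index per frame."""
    return sum(1 for _ in product(range(n), repeat=f))


def sparse_tube_hypotheses(n, f, k):
    """Sequences if each anchor may only transition to k successors.

    k is a hypothetical fan-out (e.g. the non-zero entries per row of a
    sparsified transition matrix): n start anchors, then k choices for
    each of the remaining f - 1 frames.
    """
    return n * k ** (f - 1)


# Small case: brute force agrees with the closed form n ** f.
assert enumerate_tubes(4, 3) == dense_tube_hypotheses(4, 3)
print(dense_tube_hypotheses(1000, 4), sparse_tube_hypotheses(1000, 4, 5))
# prints 1000000000000 125000
```

With a hypothetical $n = 1000$ anchors and $f = 4$ frames, the dense count is $10^{12}$, while a fan-out of $k = 5$ leaves $1000 \cdot 5^{3} = 125{,}000$ hypotheses; the brute-force enumeration is only practical for tiny $n$ and $f$, which is exactly the point.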

Code & Models

Repositories

gurkirt/AMTNet (PyTorch)

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Vision and Imaging