TraMNet - Transition Matrix Network for Efficient Action Tube Proposals
Gurkirt Singh, Suman Saha, Fabio Cuzzolin

TL;DR
TraMNet introduces a transition matrix approach for efficient and accurate spatiotemporal action localization by modeling actor and camera movements, reducing computational complexity and improving detection robustness.
Contribution
The paper proposes a novel transition-matrix-based network that models movement between anchor proposals, enabling efficient and translation-invariant action tube proposals with sparse annotations.
Findings
Reduces the proposal search space from exponential in the number of frames to a sparse set of likely anchor-to-anchor transitions.
Achieves effective action localization on multiple datasets.
Handles sparse annotations effectively.
Abstract
Current state-of-the-art methods solve spatiotemporal action localisation by extending 2D anchors to 3D-cuboid proposals on stacks of frames, to generate sets of temporally connected bounding boxes called action micro-tubes. However, they fail to consider that the underlying anchor proposal hypotheses should also move (transition) from frame to frame, as the actor or the camera does. Assuming we evaluate $n$ 2D anchors in each frame, then the number of possible transitions from each 2D anchor to the next, for a sequence of $f$ consecutive frames, is in the order of $O(n^f)$, expensive even for small values of $f$. To avoid this problem, we introduce a Transition-Matrix-based Network (TraMNet) which relies on computing transition probabilities between anchor proposals while maximising their overlap with ground truth bounding boxes across frames, and enforcing sparsity via a…
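The idea of anchor-to-anchor transition probabilities can be illustrated with a small sketch. It is not the authors' implementation: the function names, the best-IoU anchor assignment, the toy boxes, and the `threshold` parameter are all assumptions made for illustration, since the abstract is cut off before it names the sparsity mechanism. The sketch assigns each ground-truth box in a consecutive-frame pair to its best-overlapping anchor, counts the resulting transitions, row-normalises the counts, and zeroes out rare transitions.

```python
import numpy as np

def iou(box, anchors):
    """IoU between one box and an (n, 4) array of anchors, all [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], anchors[:, 0])
    y1 = np.maximum(box[1], anchors[:, 1])
    x2 = np.minimum(box[2], anchors[:, 2])
    y2 = np.minimum(box[3], anchors[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_anchors = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    return inter / (area_box + area_anchors - inter)

def transition_matrix(anchors, gt_pairs, threshold=0.0):
    """Estimate anchor-to-anchor transition probabilities (illustrative only).

    gt_pairs holds (box_t, box_next) ground-truth boxes from consecutive frames.
    Entries at or below `threshold` are set to zero to keep the matrix sparse;
    this thresholding rule is an assumption of the sketch.
    """
    n = len(anchors)
    counts = np.zeros((n, n))
    for box_t, box_next in gt_pairs:
        i = int(np.argmax(iou(np.asarray(box_t), anchors)))     # best anchor in frame t
        j = int(np.argmax(iou(np.asarray(box_next), anchors)))  # best anchor in the next frame
        counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Row-normalise; rows with no observations stay all-zero.
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    probs[probs <= threshold] = 0.0
    return probs

# Toy setup: three side-by-side anchors and four ground-truth frame pairs.
anchors = np.array([[0, 0, 10, 10], [10, 0, 20, 10], [20, 0, 30, 10]], dtype=float)
gt_pairs = [
    ([1, 1, 9, 9], [1, 1, 9, 9]),      # anchor 0 -> anchor 0
    ([1, 1, 9, 9], [11, 1, 19, 9]),    # anchor 0 -> anchor 1
    ([11, 1, 19, 9], [21, 1, 29, 9]),  # anchor 1 -> anchor 2
    ([1, 1, 9, 9], [2, 1, 10, 9]),     # anchor 0 -> anchor 0
]
P = transition_matrix(anchors, gt_pairs, threshold=0.4)

n, f = len(anchors), 2
exhaustive = n ** f                    # all anchor sequences over f frames
kept = int(np.count_nonzero(P))        # transitions that survive thresholding
```

In this toy case the exhaustive search considers `n ** f` anchor sequences, while the thresholded matrix keeps only the transitions actually observed often enough in the ground truth. Real anchor sets are far larger, which is where the gap between the two counts matters.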
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Vision and Imaging
