SimOn: A Simple Framework for Online Temporal Action Localization

Tuan N. Tang; Jungin Park; Kwonyoung Kim; Kwanghoon Sohn

arXiv:2211.04905·cs.CV·November 10, 2022

TL;DR

SimOn is a simple Transformer-based framework for online temporal action localization (On-TAL). It predicts action instances from streaming video without access to future frames and outperforms previous methods on the THUMOS14 and ActivityNet1.3 benchmarks.

Contribution

The paper presents an end-to-end Transformer framework for On-TAL in which the current frame feature queries past visual context and a learnable context embedding, and it reports new state-of-the-art results.

Findings

01. Outperforms previous methods on the THUMOS14 and ActivityNet1.3 datasets.

02. Achieves new state-of-the-art performance in online temporal action localization.

03. Demonstrates robustness and effectiveness in online detection of action start.

Abstract

Online Temporal Action Localization (On-TAL) aims to provide action instances immediately from untrimmed streaming videos. The model may not use future frames or any post-processing technique that modifies past predictions, which makes On-TAL much more challenging. In this paper, we propose a simple yet effective framework, termed SimOn, that learns to predict action instances end to end using the popular Transformer architecture. Specifically, the model takes the current frame feature as the query and a set of past context information as the keys and values of the Transformer. Unlike prior work, which uses a set of model outputs as past context, we leverage the past visual context and a learnable context embedding for the current query. Experimental results on the THUMOS14 and ActivityNet1.3 datasets show that our model remarkably outperforms the previous…
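
The query/key/value arrangement described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single attention head with no learned projections, a fixed-size window of past frame features, and a randomly initialised vector standing in for the trained context embedding. The names `attend`, `stream`, and `memory_size` are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import deque


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, context):
    """Single-head scaled dot-product attention (no learned projections).

    query:   current frame feature, a list of d floats
    context: list of d-dimensional vectors used as both keys and values
    Returns the attended vector and the attention weights.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in context]
    weights = softmax(scores)
    out = [sum(w * vec[i] for w, vec in zip(weights, context))
           for i in range(d)]
    return out, weights


def stream(frames, context_embedding, memory_size=4):
    """Process frames one at a time without looking ahead.

    memory_size is an assumed window length; context_embedding stands in
    for a parameter that would be learned during training.
    """
    memory = deque(maxlen=memory_size)
    outputs = []
    for feature in frames:
        # Keys/values: past visual context plus the context embedding.
        context = list(memory) + [context_embedding]
        attended, weights = attend(feature, context)
        outputs.append((attended, weights))  # emitted once, never revised
        memory.append(feature)               # becomes past context next step
    return outputs


random.seed(0)
d = 8
frames = [[random.gauss(0, 1) for _ in range(d)] for _ in range(6)]
context_embedding = [random.gauss(0, 1) for _ in range(d)]
results = stream(frames, context_embedding)
```

Because each output depends only on the current frame and the stored past features, replacing later frames leaves earlier outputs unchanged, which mirrors the online constraint stated in the abstract. For the very first frame the only available key is the context embedding, so it receives all of the attention weight.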

Code & Models

Repositories

tuantng/simon (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Linear Layer · Adam · Absolute Position Encodings · Layer Normalization