TL;DR
TSA-Net introduces a novel tube self-attention module with a tracker for efficient, flexible, and high-performing action quality assessment in videos, surpassing existing methods on multiple datasets.
Contribution
The paper presents TSA-Net, a new action quality assessment model incorporating a tube self-attention module and a tracker, improving efficiency and performance over prior approaches.
Findings
Achieves state-of-the-art results on AQA-7 and MTL-AQA datasets.
Introduces a new dataset for figure skating action assessment.
Demonstrates high computational efficiency and flexibility.
Abstract
In recent years, assessing action quality from videos has attracted growing attention in the computer vision and human-computer interaction communities. Most existing approaches tackle this problem by directly migrating models from action recognition tasks, which ignores intrinsic differences within the feature map, such as foreground and background information. To address this issue, we propose a Tube Self-Attention Network (TSA-Net) for action quality assessment (AQA). Specifically, we introduce a single object tracker into AQA and propose the Tube Self-Attention Module (TSA), which can efficiently generate rich spatio-temporal contextual information by adopting sparse feature interactions. The TSA module is embedded in existing video networks to form TSA-Net. Overall, TSA-Net has the following merits: 1) High computational efficiency, 2) High flexibility, and 3) The…
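The "sparse feature interactions" idea in the abstract can be illustrated with a minimal sketch: self-attention is computed only among feature-map positions that fall inside the tracker's per-frame boxes (the "tube"), and everything outside is left untouched. This is not the authors' implementation. The function names, the box format `(x0, y0, x1, y1)` in feature-map coordinates, the residual update, and the use of identity query/key/value projections are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random


def tube_positions(boxes):
    """List the (t, y, x) feature-map positions inside each frame's box.

    boxes: one hypothetical (x0, y0, x1, y1) box per frame, end-exclusive.
    """
    return [(t, y, x)
            for t, (x0, y0, x1, y1) in enumerate(boxes)
            for y in range(y0, y1)
            for x in range(x0, x1)]


def tube_self_attention(features, boxes):
    """Apply self-attention only among positions inside the tube.

    features: nested lists indexed as features[t][y][x] -> list of C floats.
    Assumption: identity query/key/value projections and a residual update.
    """
    positions = tube_positions(boxes)
    tube = [features[t][y][x] for t, y, x in positions]
    scale = math.sqrt(len(tube[0]))
    # Copy the input so positions outside the tube pass through unchanged.
    out = [[[list(v) for v in row] for row in frame] for frame in features]
    for (t, y, x), query in zip(positions, tube):
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in tube]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        context = [sum(w * v[c] for w, v in zip(weights, tube)) / total
                   for c in range(len(query))]
        out[t][y][x] = [q + c for q, c in zip(query, context)]
    return out


# Toy input: 3 frames of a 6x6 feature map with 4 channels.
random.seed(0)
T, H, W, C = 3, 6, 6, 4
features = [[[[random.gauss(0, 1) for _ in range(C)] for _ in range(W)]
             for _ in range(H)] for _ in range(T)]
boxes = [(1, 1, 3, 4), (2, 1, 4, 4), (2, 2, 4, 5)]  # made-up tracker output
out = tube_self_attention(features, boxes)
print(len(tube_positions(boxes)), "of", T * H * W, "positions interact")
```

In this toy setup only the positions inside the boxes attend to one another, so the attention matrix is much smaller than one built over every position in the clip. That is the intuition behind the efficiency claim, under the assumptions stated above.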
