TL;DR
This paper introduces a weighted-averaging technique called Weight-Decider for better aggregation of clip features in action quality assessment, achieving state-of-the-art correlation scores on the MTL-AQA dataset.
Contribution
It proposes a learning-based weighted aggregation method (Weight-Decider) and evaluates ResNet architectures for improved action quality assessment.
Findings
Weighted-averaging improves performance over simple averaging.
ResNet-based features enhance action score prediction accuracy.
Achieves a new state-of-the-art Spearman's rank correlation of 0.9315 on the MTL-AQA dataset.
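For readers unfamiliar with the metric, here is a minimal sketch of how Spearman's rank correlation between predicted and ground-truth scores can be computed. It uses only the standard library; the function names and the sample scores are illustrative assumptions, not taken from the paper or its code.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(actual))


# Made-up scores for five hypothetical videos (not data from the paper).
predicted_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
judge_scores = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman(predicted_scores, judge_scores)
```

A value of 1.0 means the predicted ordering of videos matches the judges' ordering exactly, so a score of 0.9315 indicates a ranking that agrees closely with the ground truth.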
Abstract
Action quality assessment (AQA) aims at automatically judging human action based on a video of the said action and assigning a performance score to it. The majority of works in the existing literature on AQA divide RGB videos into short clips, transform these clips to higher-level representations using Convolutional 3D (C3D) networks, and aggregate them through averaging. These higher-level representations are used to perform AQA. We find that the current clip-level feature aggregation technique of averaging is insufficient to capture the relative importance of clip-level features. In this work, we propose a learning-based weighted-averaging technique. Using this technique, better performance can be obtained without sacrificing too many computational resources. We call this technique Weight-Decider (WD). We also experiment with ResNets for learning better representations for action…
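The difference between plain averaging and weighted averaging of clip features can be illustrated with a small sketch. The abstract does not say how Weight-Decider computes its weights, so everything specific here is an assumption: a hypothetical linear scorer (`scorer_weights`) assigns one score per clip, a softmax turns the scores into weights that sum to 1, and the feature values are made up.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_aggregate(clips: Sequence[Sequence[float]]) -> List[float]:
    """Baseline: every clip contributes equally to the video feature."""
    n = len(clips)
    return [sum(column) / n for column in zip(*clips)]


def weighted_aggregate(
    clips: Sequence[Sequence[float]],
    scorer_weights: Sequence[float],
) -> List[float]:
    """Weighted average with per-clip weights from a linear scorer.

    ASSUMPTION: the linear scorer plus softmax stands in for the learned
    weighting module; the paper's actual architecture may differ.
    """
    scores = [sum(w * f for w, f in zip(scorer_weights, clip)) for clip in clips]
    weights = softmax(scores)
    dim = len(clips[0])
    return [
        sum(wt * clip[d] for wt, clip in zip(weights, clips))
        for d in range(dim)
    ]


# Two hypothetical clip features of dimension 2 (illustrative values only).
clip_features = [[1.0, 0.0], [3.0, 2.0]]

uniform = mean_aggregate(clip_features)
# With an all-zero scorer every clip gets the same weight, which recovers
# plain averaging.
same_as_mean = weighted_aggregate(clip_features, [0.0, 0.0])
# A non-zero scorer shifts the aggregate toward the higher-scoring clip.
skewed = weighted_aggregate(clip_features, [1.0, 0.0])
```

In a trained system the scorer parameters would be learned jointly with the score regressor, so clips that matter more for the final score can receive larger weights.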
