TL;DR
This paper introduces SMART, a joint frame selection method that improves action recognition accuracy and reduces computational costs by selecting representative frames, outperforming existing strategies across multiple benchmarks.
Contribution
The paper proposes SMART, a joint frame selection approach that improves both the accuracy and the efficiency of action recognition and applies to trimmed as well as untrimmed videos.
Findings
SMART reduces computational cost by a factor of 4 to 10.
SMART consistently outperforms other frame selection strategies.
The method improves recognition accuracy across multiple benchmarks.
Abstract
Action recognition is computationally expensive. In this paper, we address the problem of frame selection to improve the accuracy of action recognition. In particular, we show that selecting good frames improves action recognition performance even in the trimmed video domain. Recent work has successfully leveraged frame selection for long, untrimmed videos, where much of the content is irrelevant and easy to discard. In this work, however, we focus on the more standard short, trimmed action recognition problem. We argue that good frame selection can not only reduce the computational cost of action recognition but also increase accuracy by discarding frames that are hard to classify. In contrast to previous work, we propose a method that, instead of selecting frames by considering them one at a time, considers them jointly. This results in a more efficient selection, where good…
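To make the distinction between one-at-a-time and joint selection concrete, here is a minimal sketch. It is not SMART itself: the excerpt above does not describe the model, so the greedy redundancy penalty, the function names, the `redundancy_weight` parameter, and the toy scores and similarity matrix are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def select_independent(scores: Sequence[float], k: int) -> List[int]:
    """Pick the k highest-scoring frames, judging each frame on its own."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def select_joint(
    scores: Sequence[float],
    similarity: Sequence[Sequence[float]],
    k: int,
    redundancy_weight: float = 1.0,  # hypothetical trade-off parameter
) -> List[int]:
    """Greedy joint selection (illustrative stand-in, not the paper's method).

    Each new frame is scored relative to the frames already chosen, so a
    frame that duplicates an earlier pick is penalised.
    """
    chosen: List[int] = []
    candidates = set(range(len(scores)))
    while candidates and len(chosen) < k:

        def gain(i: int) -> float:
            # Redundancy is the highest similarity to any frame chosen so far.
            redundancy = max((similarity[i][j] for j in chosen), default=0.0)
            return scores[i] - redundancy_weight * redundancy

        # Iterate in index order so ties resolve to the earliest frame.
        best = max(sorted(candidates), key=gain)
        chosen.append(best)
        candidates.remove(best)
    return sorted(chosen)


# Toy data (assumed): frames 0 and 1 are near-duplicates with high scores.
scores = [0.9, 0.88, 0.3, 0.7]
similarity = [
    [1.0, 0.95, 0.1, 0.1],
    [0.95, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]

independent = select_independent(scores, k=2)  # [0, 1]
joint = select_joint(scores, similarity, k=2)  # [0, 3]
```

In this toy case, independent selection spends both slots on two nearly identical frames, while the joint variant keeps one of them and adds a distinct frame. That is the kind of behaviour the abstract's "considers them jointly" points to, although the actual mechanism in the paper may differ.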