Localization-Aware Multi-Scale Representation Learning for Repetitive Action Counting

Sujia Wang; Xiangwei Shen; Yansong Tang; Xin Dong; Wenjia Geng; Lei Chen

arXiv:2501.07312 · cs.CV · January 14, 2025

TL;DR

This paper presents a localization-aware multi-scale learning framework that improves repetitive action counting in videos by reducing the impact of noise, such as action interruptions and inconsistencies, and by capturing flexible temporal correlations.

Contribution

It introduces a localization-aware multi-scale representation learning framework with scale-specific and localization modules for robust, noise-resistant action counting.

Findings

01. Outperforms existing methods on RepCountA and UCFRep datasets.

02. Effectively reduces noise impact in repetitive action counting.

03. Enhances temporal correlation modeling across various action frequencies.

Abstract

Repetitive action counting (RAC) aims to estimate the number of class-agnostic action occurrences in a video without exemplars. Most current RAC methods rely on a raw frame-to-frame similarity representation for period prediction. However, this approach can be significantly disrupted by common noise such as action interruptions and inconsistencies, leading to sub-optimal counting performance in realistic scenarios. In this paper, we introduce a foreground localization optimization objective into similarity representation learning to obtain more robust and efficient video features. We propose a Localization-Aware Multi-Scale Representation Learning (LMRL) framework. Specifically, we apply a Multi-Scale Period-Aware Representation (MPR) with a scale-specific design to accommodate various action frequencies and learn more flexible temporal correlations. Furthermore, we introduce the…
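The raw frame-to-frame similarity representation that the abstract describes as the common baseline can be illustrated with a small sketch. This is not the LMRL framework or any of its modules. It is a generic temporal self-similarity computation on toy data, and the helper names (`cosine`, `self_similarity`, `estimate_period`) and the two-dimensional "frame features" are assumptions made purely for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def self_similarity(frames):
    """T x T matrix whose (i, j) entry compares frame i with frame j."""
    return [[cosine(fi, fj) for fj in frames] for fi in frames]

def estimate_period(sim, min_lag=1):
    """Pick the lag whose off-diagonal of `sim` has the highest mean similarity.

    Hypothetical helper: a naive stand-in for a learned period predictor.
    Lags are limited to half the clip length so harmonics are not searched.
    """
    T = len(sim)
    best_lag, best_score = None, -math.inf
    for lag in range(min_lag, T // 2 + 1):
        score = sum(sim[i][i + lag] for i in range(T - lag)) / (T - lag)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Toy "video": 12 frames whose (made-up) features cycle with period 4.
cycle = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
frames = cycle * 3

sim = self_similarity(frames)
period = estimate_period(sim)
count = len(frames) // period
print(period, count)
```

On this clean toy sequence the strongest off-diagonal sits at a lag of 4 frames, which gives a count of 3. Inserting a few unrelated frames mid-sequence would weaken those diagonals, which is the kind of disruption from action interruptions that the abstract cites as motivation for a more robust representation.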
