ReL-SAR: Representation Learning for Skeleton Action Recognition with Convolutional Transformers and BYOL

Safwen Naimi; Wassim Bouachir; Guillaume-Alexandre Bilodeau

arXiv:2409.05749 · cs.CV · September 10, 2024

TL;DR

ReL-SAR introduces an unsupervised learning framework combining convolutional transformers and BYOL to improve skeleton action recognition, especially on limited data, by capturing spatial-temporal features efficiently.

Contribution

The paper proposes a lightweight convolutional transformer framework that jointly models spatial and temporal cues in skeleton sequences, applies a Selection-Permutation strategy to skeleton joints, and uses BYOL to learn representations from unlabeled skeleton data.

Findings

01. Achieved competitive results on multiple datasets.

02. Outperformed state-of-the-art methods in accuracy.

03. Demonstrated high computational efficiency.

Abstract

To extract robust and generalizable skeleton action recognition features, large amounts of well-curated data are typically required, which is a challenging task hindered by annotation and computation costs. Therefore, unsupervised representation learning is of prime importance to leverage unlabeled skeleton data. In this work, we investigate unsupervised representation learning for skeleton action recognition. For this purpose, we designed a lightweight convolutional transformer framework, named ReL-SAR, exploiting the complementarity of convolutional and attention layers for jointly modeling spatial and temporal cues in skeleton sequences. We also use a Selection-Permutation strategy for skeleton joints to ensure more informative descriptions from skeletal data. Finally, we capitalize on Bootstrap Your Own Latent (BYOL) to learn robust representations from unlabeled skeleton sequence…
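
To make the BYOL component mentioned in the abstract more concrete, here is a minimal sketch of the two pieces BYOL is generally known for: a regression loss between L2-normalized online predictions and target projections, and an exponential moving average (EMA) update of the target network's weights. This is an illustration under stated assumptions, not the authors' implementation: the function names (`l2_normalize`, `byol_loss`, `ema_update`) and the decay value `tau=0.99` are hypothetical, vectors are plain Python lists, and the convolutional transformer encoder and the Selection-Permutation step are not modeled.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def byol_loss(online_prediction, target_projection):
    """Squared distance between the two unit vectors.

    For unit vectors this equals 2 - 2 * cosine similarity, so it ranges
    from 0 (same direction) to 4 (opposite directions).
    """
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    cosine = sum(a * b for a, b in zip(p, z))
    return 2.0 - 2.0 * cosine


def ema_update(target_params, online_params, tau=0.99):
    """Move target weights slowly toward the online weights.

    tau is an assumed decay rate; the target network receives no gradient
    and changes only through this update.
    """
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]


if __name__ == "__main__":
    # Two toy embeddings standing in for two augmented views of one sequence.
    print(byol_loss([1.0, 0.0], [2.0, 0.0]))
    print(ema_update([1.0, 0.0], [0.0, 1.0], tau=0.9))
```

In the usual BYOL setup the loss is symmetrized by swapping which augmented view goes through the online network and which goes through the target network, then summing the two terms. Whether ReL-SAR follows that detail exactly is not stated in the text shown here.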

Code & Models

Repositories

safwennaimi/representation-learning-for-skeleton-action-recognition-with-convolutional-transformers-and-byol
PyTorch · Official

Taxonomy

Topics: Medical Imaging and Analysis · Human Pose and Action Recognition · Anomaly Detection Techniques and Applications

Methods: Softmax · Attention Is All You Need