Action Selection Learning for Multi-label Multi-view Action Recognition

Trung Thanh Nguyen; Yasutomo Kawanishi; Takahiro Komamizu; Ichiro Ide

arXiv:2410.03302 · cs.CV · October 21, 2024

TL;DR

This paper introduces MultiASL, a novel multi-view action recognition method that uses action selection learning with weak labels to improve view fusion and accurately recognize multiple actions in untrimmed videos.

Contribution

The study presents MultiASL, a new approach combining spatial-temporal transformers and pseudo ground-truth based action selection for weakly labeled multi-view action recognition.

Findings

01 Outperforms existing methods on the MM-Office dataset
02 Effectively identifies relevant frames for action recognition
03 Enhances view fusion accuracy in weakly labeled scenarios

Abstract

Multi-label multi-view action recognition aims to recognize multiple concurrent or sequential actions from untrimmed videos captured by multiple cameras. Existing work has focused on multi-view action recognition in a narrow area with strong labels available, where the onset and offset of each action are labeled at the frame-level. This study focuses on real-world scenarios where cameras are distributed to capture a wide-range area with only weak labels available at the video-level. We propose the method named Multi-view Action Selection Learning (MultiASL), which leverages action selection learning to enhance view fusion by selecting the most useful information from different viewpoints. The proposed method includes a Multi-view Spatial-Temporal Transformer video encoder to extract spatial and temporal features from multi-viewpoint videos. Action Selection Learning is employed at the…
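To make the weak-label setting concrete, here is a minimal sketch of a generic multiple-instance-style baseline. It is not MultiASL itself, whose details are cut off in the abstract above. It assumes per-view, per-frame action logits are already available, pools them over time with a top-k mean, keeps the most confident view per action, and scores the result against video-level labels only. The function names, the top-k pooling, and the max-over-views fusion are all illustrative assumptions.

```python
import math


def topk_mean(values, k):
    """Mean of the k largest values (k is clamped to the list length)."""
    k = max(1, min(k, len(values)))
    return sum(sorted(values, reverse=True)[:k]) / k


def video_level_probs(frame_logits, k=2):
    """Collapse frame-level logits into one probability per action.

    frame_logits[v][t][a] is the logit for action a at frame t of view v.
    Assumption: top-k temporal pooling and max-over-views fusion stand in
    for whatever selection and fusion the paper actually uses.
    """
    num_actions = len(frame_logits[0][0])
    probs = []
    for a in range(num_actions):
        # Pool over time within each view, then keep the most confident view.
        per_view = [
            topk_mean([frame[a] for frame in view], k) for view in frame_logits
        ]
        probs.append(1.0 / (1.0 + math.exp(-max(per_view))))
    return probs


def weak_label_bce(probs, labels, eps=1e-7):
    """Binary cross-entropy against video-level (weak) multi-hot labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Two views, three frames, two actions. Action 0 is visible only in view 0,
# action 1 only in view 1; no frame-level onset/offset labels are used.
frame_logits = [
    [[4.0, -4.0], [3.0, -5.0], [-2.0, -4.0]],
    [[-3.0, -3.0], [-3.0, 2.0], [-3.0, 4.0]],
]
probs = video_level_probs(frame_logits, k=2)
loss = weak_label_bce(probs, [1, 1])
```

Because the loss sees only the pooled video-level probabilities, the frames and views that contribute most to each action are never supervised directly. That is the gap that frame selection and view fusion methods in this setting aim to close.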

Code & Models

Repositories

thanhhff/MultiASL (PyTorch, official)

Taxonomy

Methods: Attention Is All You Need · Dense Connections · Adam · Linear Layer · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Dropout · Byte Pair Encoding · Absolute Position Encodings