Unified Framework with Consistency across Modalities for Human Activity Recognition

Tuyen Tran; Thao Minh Le; Hung Tran; Truyen Tran

arXiv:2409.02385 · cs.CV · September 5, 2024

Open Access · 1 Repo

TL;DR

This paper introduces a multimodal framework that pairs a compositional query machine with a consistency loss to improve human activity recognition in videos by drawing on multiple input modalities, such as RGB and skeletal data.

Contribution

The paper presents a new neural architecture called COMPUTER that models interactions across modalities and enforces prediction consistency, advancing multimodal human activity recognition.
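
The idea of enforcing prediction consistency across modalities can be illustrated with a small sketch. It does not reproduce COMPUTER's actual loss, which this summary does not specify. It assumes, purely for illustration, two per-modality classifiers (for example an RGB branch and a skeleton branch) that each emit class logits, and it penalizes their disagreement with a symmetric KL divergence. The function names `softmax`, `kl_divergence`, and `consistency_loss` are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(logits_a, logits_b):
    """Symmetric KL between the predictions of two modality branches.

    Assumption: this is one common way to express a consistency term,
    not necessarily the formulation used in the paper.
    """
    p = softmax(logits_a)
    q = softmax(logits_b)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


if __name__ == "__main__":
    rgb_logits = [2.0, 0.5, -1.0]       # hypothetical RGB-branch scores
    skeleton_logits = [1.5, 0.7, -0.8]  # hypothetical skeleton-branch scores
    print(consistency_loss(rgb_logits, skeleton_logits))
```

A term like this is zero when both branches predict the same distribution and grows as they diverge. In training it would typically be added to the supervised classification loss with a weighting coefficient, which is likewise an assumption here.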

Findings

01. Achieves superior performance on action localization tasks

02. Effectively leverages complementary information across modalities

03. Demonstrates robustness in group activity recognition

Abstract

Recognizing human activities in videos is challenging due to the spatio-temporal complexity and context-dependence of human interactions. Prior studies often rely on single input modalities, such as RGB or skeletal data, limiting their ability to exploit the complementary advantages across modalities. Recent studies focus on combining these two modalities using simple feature fusion techniques. However, due to the inherent disparities in representation between these input modalities, designing a unified neural network architecture to effectively leverage their complementary information remains a significant challenge. To address this, we propose a comprehensive multimodal framework for robust video-based human activity recognition. Our key contribution is the introduction of a novel compositional query machine, called COMPUTER (COMPositional hUman-cenTric…

Code & Models

Repositories

tranxuantuyen/computer
PyTorch · Official

Taxonomy

Topics: Context-Aware Activity Recognition Systems · Human Pose and Action Recognition

Methods: Focus