LS-HAR: Language Supervised Human Action Recognition with Salient Fusion, Construction Sites as a Use-Case

Mohammad Mahdavian; Mohammad Loni; Ted Samuelsson; Mo Chen

arXiv:2410.01962 · cs.CV · March 6, 2025

TL;DR

LS-HAR introduces a language-supervised approach for human action recognition that fuses skeleton and visual data using attention mechanisms, and provides a new dataset for construction site applications.

Contribution

The paper presents a novel language-guided feature extraction and salient fusion method for HAR, along with a new dataset for real-world construction site scenarios.

Findings

01 Achieves promising accuracy on multiple datasets

02 Demonstrates robustness across modalities

03 Provides a new dataset for construction site HAR

Abstract

Detecting human actions is a crucial task for autonomous robots and vehicles, often requiring the integration of various data modalities for improved accuracy. In this study, we introduce a novel approach to Human Action Recognition (HAR) using language supervision named LS-HAR based on skeleton and visual cues. Our method leverages a language model to guide the feature extraction process in the skeleton encoder. Specifically, we employ learnable prompts for the language model conditioned on the skeleton modality to optimize feature representation. Furthermore, we propose a fusion mechanism that combines dual-modality features using a salient fusion module, incorporating attention and transformer mechanisms to address the modalities' high dimensionality. This fusion process prioritizes informative video frames and body joints, enhancing the recognition accuracy of human actions.…
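The fusion idea described in the abstract (weighting informative body joints and video frames before combining the two modalities) can be illustrated with a minimal NumPy sketch. This is not the authors' salient fusion module: the function names `salient_pool` and `fuse`, the query vectors, and the tensor shapes are all assumptions chosen for illustration, and plain dot-product attention stands in for the paper's attention and transformer components.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def salient_pool(features, query):
    """Attention-pool features of shape (N, D) using a query of shape (D,).

    Returns the weighted sum (D,) and the saliency weights (N,).
    """
    scores = features @ query / np.sqrt(features.shape[-1])
    weights = softmax(scores)
    return weights @ features, weights


def fuse(skeleton, video, joint_query, frame_query):
    """Hypothetical two-stage fusion (illustrative only).

    skeleton: (T, J, D) per-frame, per-joint features
    video:    (T, D)    per-frame visual features
    joint_query: (D,)   assumed learnable query over joints
    frame_query: (2D,)  assumed learnable query over frames
    """
    num_frames = skeleton.shape[0]
    # Stage 1: pool the joints of each frame, weighted by joint saliency.
    per_frame = np.stack(
        [salient_pool(skeleton[t], joint_query)[0] for t in range(num_frames)]
    )  # (T, D)
    # Stage 2: concatenate both modalities, then pool frames by saliency.
    combined = np.concatenate([per_frame, video], axis=-1)  # (T, 2D)
    fused, frame_weights = salient_pool(combined, frame_query)
    return fused, frame_weights


rng = np.random.default_rng(0)
skeleton = rng.normal(size=(4, 5, 8))   # 4 frames, 5 joints, 8-dim features
video = rng.normal(size=(4, 8))         # 4 frames, 8-dim features
fused, frame_weights = fuse(
    skeleton, video, rng.normal(size=8), rng.normal(size=16)
)
```

Here `frame_weights` is a probability distribution over the four frames, so frames with larger weights contribute more to `fused`. In a trained model the queries would be learned rather than sampled at random.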

Taxonomy

Topics: Occupational Health and Safety Research

Methods: Softmax · Attention Is All You Need