Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition

Dustin Aganian; Erik Franze; Markus Eisenbach; Horst-Michael Gross

arXiv:2506.18721 · cs.CV · June 24, 2025

TL;DR

This paper proposes a novel skeleton-based action recognition method that incorporates semantic information via word embeddings, significantly improving classification accuracy and generalization in complex assembly tasks.

Contribution

It introduces a new approach that replaces one-hot encodings with semantic volumes using word embeddings to encode keypoint and object semantics.
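
The difference between one-hot channels and embedding-based channels can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors in `TOY_EMBEDDINGS`, the helper names, and the grid layout are all hypothetical, and a real system would presumably load pretrained word embeddings instead of hand-written values.

```python
import math

# Hypothetical toy "word embeddings" (assumption: hand-picked values for
# illustration only; they do not come from any pretrained model).
TOY_EMBEDDINGS = {
    "left_wrist":  [0.9, 0.1, 0.0],
    "right_wrist": [0.8, 0.2, 0.0],
    "left_ankle":  [0.1, 0.9, 0.0],
    "screwdriver": [0.5, 0.0, 0.8],
}
VOCABULARY = list(TOY_EMBEDDINGS)


def one_hot(name):
    """One channel per known label; every pair of distinct labels is orthogonal."""
    return [1.0 if name == label else 0.0 for label in VOCABULARY]


def embed(name):
    """Dense vector whose length is independent of the number of labels."""
    return TOY_EMBEDDINGS[name]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_volume(points, encode, channels, height, width):
    """Write each point's encoding into the channel axis of a grid.

    points: list of (label, row, col) tuples (hypothetical input format).
    Returns a nested list indexed as volume[channel][row][col].
    """
    volume = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for label, row, col in points:
        for ch, value in enumerate(encode(label)):
            volume[ch][row][col] += value
    return volume


points = [("left_wrist", 1, 2), ("screwdriver", 1, 3)]
one_hot_volume = build_volume(points, one_hot, len(VOCABULARY), 4, 4)
semantic_volume = build_volume(points, embed, 3, 4, 4)
```

In this sketch the one-hot volume needs one channel per label, so adding a new joint or object class changes the input shape, while the embedding-based volume keeps a fixed channel count. The embedding vectors also place related labels (here the two wrists) close together, which one-hot vectors cannot express.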

Findings

01 Significant performance improvement on multiple assembly datasets

02 Enhanced generalization across different skeleton types and object classes

03 Effective encoding of semantic relationships improves recognition accuracy

Abstract

Effective human action recognition is widely used for cobots in Industry 4.0 to assist in assembly tasks. However, conventional skeleton-based methods often lose keypoint semantics, limiting their effectiveness in complex interactions. In this work, we introduce a novel approach to skeleton-based action recognition that enriches input representations by leveraging word embeddings to encode semantic information. Our method replaces one-hot encodings with semantic volumes, enabling the model to capture meaningful relationships between joints and objects. Through extensive experiments on multiple assembly datasets, we demonstrate that our approach significantly improves classification performance and enhances generalization capabilities by simultaneously supporting different skeleton types and object classes. Our findings highlight the potential of incorporating semantic information to…
