Classification of assembly tasks combining multiple primitive actions using Transformers and xLSTMs

Miguel Neves; Pedro Neto

arXiv:2505.18012·cs.RO·May 26, 2025

TL;DR

This paper compares LSTM, Transformer, and xLSTM models for classifying long assembly tasks from hand landmarks, demonstrating that xLSTM generalizes better to new operators and slightly outperforms Transformers.

Contribution

The paper applies xLSTM to the classification of complex assembly tasks and compares its performance with LSTM and Transformer models in a human-robot collaboration context.

Findings

01

xLSTM achieves the highest accuracy of the three models, slightly ahead of the Transformer.

02

Transformers perform well on the operator seen in training but less well on new operators.

03

xLSTM shows better generalization to unseen operators.
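
The seen-versus-unseen operator comparison behind these findings can be illustrated with a small sketch. It is not the authors' evaluation code: the function name `accuracy_by_group`, the operator IDs, the task labels, and the predictions are all hypothetical and made up for illustration. It only shows one way to report accuracy separately for the training operator and for new operators.

```python
from collections import defaultdict


def accuracy_by_group(records, seen_operators):
    """Return accuracy for 'seen' and 'unseen' operators.

    records: iterable of (operator_id, true_label, predicted_label).
    seen_operators: set of operator IDs that appeared in the training data.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for operator_id, true_label, predicted_label in records:
        group = "seen" if operator_id in seen_operators else "unseen"
        total[group] += 1
        correct[group] += int(true_label == predicted_label)
    return {group: correct[group] / total[group] for group in total}


# Made-up predictions (hypothetical IDs and labels, not results from the paper).
records = [
    ("op_train", "task_A", "task_A"),
    ("op_train", "task_B", "task_B"),
    ("op_new_1", "task_A", "task_A"),
    ("op_new_1", "task_B", "task_A"),  # one misclassification
    ("op_new_2", "task_A", "task_A"),
    ("op_new_2", "task_B", "task_B"),
]
scores = accuracy_by_group(records, seen_operators={"op_train"})
print(scores)
```

Reporting the two groups separately keeps a model that only memorizes one operator's motion style from hiding behind a single pooled accuracy figure.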

Abstract

The classification of human-performed assembly tasks is essential in collaborative robotics to ensure safety, anticipate robot actions, and facilitate robot learning. However, achieving reliable classification is challenging when segmenting tasks into smaller primitive actions is unfeasible, requiring us to classify long assembly tasks that encompass multiple primitive actions. In this study, we propose classifying long assembly sequential tasks based on hand landmark coordinates and compare the performance of two well-established classifiers, LSTM and Transformer, as well as a recent model, xLSTM. We used the HRC scenario proposed in the CT benchmark, which includes long assembly tasks that combine actions such as insertions, screw fastenings, and snap fittings. Testing was conducted using sequences gathered from both the human operator who performed the training sequences and three…
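
As a rough illustration of the input described in the abstract, the sketch below turns a variable-length sequence of per-frame hand landmarks into a fixed-length feature sequence plus a padding mask, which is a common way to feed sequence classifiers. The landmark count, the (x, y, z) layout, the target length, and the helper names `frame_to_features` and `pad_sequence` are assumptions for illustration; the abstract does not specify the authors' preprocessing.

```python
def frame_to_features(landmarks):
    """Flatten one frame of (x, y, z) landmark tuples into a flat list."""
    return [coord for point in landmarks for coord in point]


def pad_sequence(frames, target_len, pad_value=0.0):
    """Truncate or pad a non-empty list of frames to target_len steps.

    Returns (features, mask), where mask is 1 for real frames and 0 for padding.
    """
    features = [frame_to_features(frame) for frame in frames]
    width = len(features[0])  # assumes every frame has the same landmark count
    features = features[:target_len]
    mask = [1] * len(features)
    while len(features) < target_len:
        features.append([pad_value] * width)
        mask.append(0)
    return features, mask


# Two frames with two landmarks each (hypothetical values; a real hand
# tracker would typically emit more landmarks per frame).
frames = [
    [(0.1, 0.2, 0.0), (0.3, 0.4, 0.1)],
    [(0.2, 0.2, 0.0), (0.4, 0.4, 0.1)],
]
padded, mask = pad_sequence(frames, target_len=4)
print(len(padded), len(padded[0]), mask)
```

The mask lets a downstream model ignore padded steps, which matters more for long tasks whose durations differ between operators.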
