Classification of assembly tasks combining multiple primitive actions using Transformers and xLSTMs
Miguel Neves, Pedro Neto

TL;DR
This paper compares LSTM, Transformer, and xLSTM models for classifying long assembly tasks from hand landmarks, demonstrating that xLSTM generalizes better to new operators and slightly outperforms Transformers.
Contribution
It introduces the application of xLSTM for classifying complex assembly tasks and compares its performance with LSTM and Transformer models in a human-robot collaboration context.
Findings
xLSTM outperforms LSTM and Transformer in accuracy.
Transformers perform well on operators seen during training but less well on new ones.
xLSTM shows better generalization to unseen operators.
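The seen-versus-unseen operator comparison behind these findings can be illustrated with a minimal sketch. It assumes each test prediction is stored as a tuple of operator id, true label, and predicted label. The function name `accuracy_by_group`, the operator ids, and the task labels are hypothetical, and the records are made-up values for illustration, not results from the paper.

```python
from collections import defaultdict


def accuracy_by_group(records, train_operator):
    """Return accuracy for the training operator ("seen") and all others ("unseen").

    records: iterable of (operator_id, true_label, predicted_label) tuples.
    train_operator: id of the operator whose sequences were used for training.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [correct, total]
    for operator, y_true, y_pred in records:
        group = "seen" if operator == train_operator else "unseen"
        counts[group][0] += int(y_true == y_pred)
        counts[group][1] += 1
    return {group: correct / total for group, (correct, total) in counts.items()}


# Hypothetical predictions, invented for this example only.
records = [
    ("op_A", "task_1", "task_1"),
    ("op_A", "task_2", "task_2"),
    ("op_B", "task_1", "task_1"),
    ("op_B", "task_2", "task_1"),
    ("op_C", "task_3", "task_3"),
    ("op_C", "task_1", "task_1"),
]

scores = accuracy_by_group(records, train_operator="op_A")
# A smaller gap between the two groups indicates better generalization.
gap = scores["seen"] - scores["unseen"]
print(scores, gap)
```

Running the same function on the predictions of each model (LSTM, Transformer, xLSTM) would give one seen/unseen pair per model, which is one way to compare how well each generalizes to new operators.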
Abstract
The classification of human-performed assembly tasks is essential in collaborative robotics to ensure safety, anticipate robot actions, and facilitate robot learning. However, achieving reliable classification is challenging when segmenting tasks into smaller primitive actions is unfeasible, requiring us to classify long assembly tasks that encompass multiple primitive actions. In this study, we propose classifying long sequential assembly tasks based on hand landmark coordinates and compare the performance of two well-established classifiers, LSTM and Transformer, as well as a recent model, xLSTM. We used the HRC scenario proposed in the CT benchmark, which includes long assembly tasks that combine actions such as insertions, screw fastenings, and snap fittings. Testing was conducted using sequences gathered from both the human operator who performed the training sequences and three…
