Challenges of the Creation of a Dataset for Vision Based Human Hand Action Recognition in Industrial Assembly
Fabian Sturm, Elke Hergenroether, Julian Reinhardt, Petar Smilevski Vojnovikj, Melanie Siegel

TL;DR
This paper introduces the Industrial Hand Action Dataset V1, an industrial assembly dataset covering 12 hand action classes, and demonstrates its usefulness by training an adapted Gated Transformer Network that reaches 86.25% test accuracy.
Contribution
A large, annotated industrial hand action dataset that combines occlusions, hand-object interaction, and fine-grained hand actions, together with an adapted state-of-the-art transformer model trained on it.
Findings
Dataset contains 459,180 images in 12 classes, and 2,295,900 images after spatial augmentation.
Gated Transformer Network achieved 86.25% test accuracy on the dataset before hyperparameter tuning.
Dataset meets technical and legal requirements for industrial assembly lines.
Abstract
This work presents the Industrial Hand Action Dataset V1, an industrial assembly dataset consisting of 12 classes with 459,180 images in the basic version and 2,295,900 images after spatial augmentation. Compared to other freely available datasets tested, it has an above-average duration and, in addition, meets the technical and legal requirements for industrial assembly lines. Furthermore, the dataset contains occlusions, hand-object interaction, and various fine-grained human hand actions for industrial assembly tasks that were not found in combination in the examined datasets. The recorded ground truth assembly classes were selected after extensive observation of real-world use cases. A Gated Transformer Network, a state-of-the-art model from the transformer domain, was adapted and proved, with a test accuracy of 86.25% before hyperparameter tuning and 18,269,959 trainable parameters, that…
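The two image counts in the abstract imply a fixed multiplier for the spatial augmentation. The short Python sketch below only checks that arithmetic. The constant and function names are illustrative, and the per-class figure is a plain average that assumes nothing about how images are actually distributed across the 12 classes.

```python
# Figures quoted in the abstract.
BASE_IMAGES = 459_180         # basic version of the dataset
AUGMENTED_IMAGES = 2_295_900  # after spatial augmentation
NUM_CLASSES = 12


def augmentation_factor(base: int, augmented: int) -> int:
    """Return the integer multiplier between the base and augmented counts.

    Raises ValueError if the augmented count is not a whole multiple
    of the base count.
    """
    factor, remainder = divmod(augmented, base)
    if remainder != 0:
        raise ValueError("augmented count is not a whole multiple of base count")
    return factor


factor = augmentation_factor(BASE_IMAGES, AUGMENTED_IMAGES)

# Average only: the real class distribution is not stated in the abstract.
mean_images_per_class = BASE_IMAGES / NUM_CLASSES

print(f"augmentation factor: {factor}")
print(f"mean base images per class: {mean_images_per_class:.0f}")
```

The augmented set is exactly five times the size of the basic version. Whether that corresponds to the original images plus four augmented variants, or to five augmented variants per image, is not stated in the abstract.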
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Occupational Health and Safety Research
Methods: Attention Is All You Need · Test · Linear Layer · Dropout · Layer Normalization · Residual Connection · Multi-Head Attention · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Softmax
