Optimizing Multitask Industrial Processes with Predictive Action Guidance
Naval Kishore Mehta, Arvind, Shyam Sunder Prasad, Sumeet Saurav, Sanjay Singh

TL;DR
This paper presents a multimodal transformer-based system for egocentric activity anticipation in industrial assembly, providing proactive guidance and deviation detection to improve productivity and compliance.
Contribution
It introduces the MMTF-RU network with multimodal fusion and integrated guidance strategies, advancing predictive accuracy and anomaly detection in complex industrial tasks.
Findings
Effective in predicting next actions in industrial assembly
Improves operator guidance and deviation detection
Validated on Meccano and EPIC-Kitchens-55 datasets
Abstract
Monitoring complex assembly processes is critical for maintaining productivity and ensuring compliance with assembly standards. However, variability in human actions and subjective task preferences complicate accurate task anticipation and guidance. To address these challenges, we introduce the Multi-Modal Transformer Fusion and Recurrent Units (MMTF-RU) Network for egocentric activity anticipation, utilizing multimodal fusion to improve prediction accuracy. Integrated with the Operator Action Monitoring Unit (OAMU), the system provides proactive operator guidance, preventing deviations in the assembly process. OAMU employs two strategies: (1) Top-5 MMTF-RU predictions, combined with a reference graph and an action dictionary, for next-step recommendations; and (2) Top-1 MMTF-RU predictions, integrated with a reference graph, for detecting sequence deviations and predicting anomaly…
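The two OAMU strategies named in the abstract can be illustrated with a small sketch. The abstract does not say how the reference graph or action dictionary are represented, so everything below is an assumption made for illustration: the graph is a mapping from an action to the set of actions allowed to follow it, the action dictionary maps labels to operator-facing instructions, and the action labels and the function names `recommend_next_steps` and `is_deviation` are hypothetical. This is not the authors' implementation.

```python
from typing import Dict, List, Set

# Hypothetical reference graph: action -> actions allowed to follow it.
REFERENCE_GRAPH: Dict[str, Set[str]] = {
    "take_screw": {"align_parts", "take_wrench"},
    "align_parts": {"tighten_screw"},
    "take_wrench": {"tighten_screw"},
    "tighten_screw": set(),
}

# Hypothetical action dictionary: action label -> instruction shown to the operator.
ACTION_DICTIONARY: Dict[str, str] = {
    "take_screw": "Pick up a screw",
    "align_parts": "Align the two parts",
    "take_wrench": "Pick up the wrench",
    "tighten_screw": "Tighten the screw",
}


def recommend_next_steps(current_action: str, top5: List[str]) -> List[str]:
    """Strategy 1 (sketch): keep the top-5 anticipated actions that the
    reference graph allows after the current action, in ranked order,
    and translate them through the action dictionary."""
    allowed = REFERENCE_GRAPH.get(current_action, set())
    return [ACTION_DICTIONARY.get(a, a) for a in top5 if a in allowed]


def is_deviation(current_action: str, top1: str) -> bool:
    """Strategy 2 (sketch): flag a deviation when the top-1 anticipated
    action is not a permitted successor of the current action."""
    return top1 not in REFERENCE_GRAPH.get(current_action, set())


if __name__ == "__main__":
    ranked = ["align_parts", "tighten_screw", "take_wrench", "take_screw", "plug_cable"]
    print(recommend_next_steps("take_screw", ranked))
    print(is_deviation("take_screw", "tighten_screw"))
```

In this sketch the anticipation model only supplies ranked labels, and the graph lookup decides what is recommended or flagged. How the published system combines these components may differ.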
Taxonomy
Topics: Manufacturing Process and Optimization · Digital Transformation in Industry
Methods: Absolute Position Encodings · Softmax · Linear Layer · Attention Is All You Need · Adam · Residual Connection · Dropout · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing
