Transformer-XL for Long Sequence Tasks in Robotic Learning from Demonstration
Gao Tianci

TL;DR
This paper introduces a Transformer-XL-based framework for robotic learning from demonstration (LfD) that handles long multi-modal sequences to improve perception, decision-making, and task success rates.
Contribution
It applies Transformer-XL to robotic learning from demonstration (LfD), integrating multi-modal sensory data to better model long-term dependencies.
Findings
Significant improvements in task success rates and accuracy over LSTM and CNN baselines.
Enhanced computational efficiency over LSTM and CNN methods.
Robust perception and decision-making in robotic tasks.
Abstract
This paper presents an innovative application of Transformer-XL for long sequence tasks in robotic learning from demonstrations (LfD). The proposed framework effectively integrates multi-modal sensor inputs, including RGB-D images, LiDAR, and tactile sensors, to construct a comprehensive feature vector. By leveraging the advanced capabilities of Transformer-XL, particularly its attention mechanism and position encoding, our approach can handle the inherent complexities and long-term dependencies of multi-modal sensory data. The results of an extensive empirical evaluation demonstrate significant improvements in task success rates, accuracy, and computational efficiency compared to conventional methods such as Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs). The findings indicate that the Transformer-XL-based framework not only enhances the robot's…
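The two ideas named in the abstract, fusing per-modality features into one vector and attending across long sequences, can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names, feature dimensions, and single-head, single-layer design are assumptions made for illustration, and Transformer-XL's relative positional encoding is omitted for brevity. It shows only the segment-level recurrence idea, where each segment attends to a cached memory of earlier inputs.

```python
import numpy as np

def fuse_modalities(rgbd, lidar, tactile):
    """Concatenate per-timestep features from each modality (hypothetical fusion)."""
    return np.concatenate([rgbd, lidar, tactile], axis=-1)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_with_memory(segment, memory, w_q, w_k, w_v):
    """Single-head causal attention where keys/values span [memory; segment]."""
    context = np.concatenate([memory, segment], axis=0)  # (M + T, D)
    q = segment @ w_q                                    # (T, H)
    k = context @ w_k                                    # (M + T, H)
    v = context @ w_v                                    # (M + T, H)
    scores = q @ k.T / np.sqrt(k.shape[-1])              # (T, M + T)
    t_len, m_len = segment.shape[0], memory.shape[0]
    # Query i may see all of memory and segment positions <= i.
    future = np.triu(np.ones((t_len, t_len), dtype=bool), k=1)
    mask = np.concatenate([np.zeros((t_len, m_len), dtype=bool), future], axis=1)
    scores = np.where(mask, -1e9, scores)
    return softmax(scores) @ v

def run_long_sequence(features, seg_len, mem_len, w_q, w_k, w_v):
    """Process a long sequence segment by segment, carrying a bounded memory."""
    memory = np.zeros((0, features.shape[1]))
    outputs = []
    for start in range(0, features.shape[0], seg_len):
        segment = features[start:start + seg_len]
        outputs.append(attend_with_memory(segment, memory, w_q, w_k, w_v))
        # Cache the most recent mem_len layer inputs for the next segment.
        memory = np.concatenate([memory, segment], axis=0)
        memory = memory[-mem_len:] if mem_len > 0 else memory[:0]
    return np.concatenate(outputs, axis=0)
```

Because the memory is bounded by `mem_len`, the cost per segment stays fixed no matter how long the demonstration is, while information from earlier segments remains reachable through the cached inputs.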
Taxonomy
Topics: Robotics and Sensor-Based Localization · Robotics and Automated Systems
