Memory-based gaze prediction in deep imitation learning for robot manipulation
Heecheol Kim, Yoshiyuki Ohmura, Yasuo Kuniyoshi

TL;DR
This paper introduces a Transformer-based gaze prediction method that enables robots to perform manipulation tasks requiring memory of past states, addressing limitations of reactive control in complex environments.
Contribution
It presents a novel memory-augmented gaze prediction approach using self-attention, enhancing deep imitation learning for complex robot manipulation tasks.
Findings
Effective gaze prediction in real robot tasks
Improved manipulation performance with memory integration
Transformer-based model outperforms reactive methods
Abstract
Deep imitation learning is a promising approach that does not require hard-coded control rules in autonomous robot manipulation. The current applications of deep imitation learning to robot manipulation have been limited to reactive control based on the states at the current time step. However, future robots will also be required to solve tasks utilizing their memory obtained by experience in complicated environments (e.g., when the robot is asked to find a previously used object on a shelf). In such a situation, simple deep imitation learning may fail because of distractions caused by complicated environments. We propose that gaze prediction from sequential visual input enables the robot to perform a manipulation task that requires memory. The proposed algorithm uses a Transformer-based self-attention architecture for the gaze estimation based on sequential data to implement memory.…
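The core idea in the abstract, predicting gaze from a sequence of visual inputs with self-attention so that earlier frames can inform the current prediction, can be illustrated with a minimal sketch. This is not the authors' implementation: the per-frame feature vectors, the single attention head, the random untrained weights, the sigmoid output that maps to a normalized (x, y) gaze position, and every function name below are assumptions made for illustration only.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(frames, Wq, Wk, Wv):
    """Single-head self-attention where step t attends only to steps <= t."""
    queries = [matvec(Wq, f) for f in frames]
    keys = [matvec(Wk, f) for f in frames]
    values = [matvec(Wv, f) for f in frames]
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for t, q in enumerate(queries):
        # Scaled dot-product scores against the current and all earlier frames.
        scores = [sum(a * b for a, b in zip(q, keys[s])) / scale
                  for s in range(t + 1)]
        w = softmax(scores)
        # Weighted sum of value vectors: the "memory" read-out for step t.
        out = [sum(w[s] * values[s][j] for s in range(t + 1))
               for j in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def predict_gaze(frames, params):
    """Return a hypothetical normalized (x, y) gaze for the latest frame."""
    attended, weights = causal_self_attention(
        frames, params["Wq"], params["Wk"], params["Wv"])
    raw = matvec(params["Wout"], attended[-1])
    # Assumption: a sigmoid squashes the output into (0, 1) image coordinates.
    gaze = tuple(1.0 / (1.0 + math.exp(-r)) for r in raw)
    return gaze, weights[-1]


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights; a real model would learn these."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
D = 4  # hypothetical per-frame feature size
params = {
    "Wq": random_matrix(D, D, rng),
    "Wk": random_matrix(D, D, rng),
    "Wv": random_matrix(D, D, rng),
    "Wout": random_matrix(2, D, rng),
}
# Five frames of stand-in visual features (e.g., image embeddings).
frames = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(5)]
gaze, attention = predict_gaze(frames, params)
```

Here `attention` holds one weight per frame in the history, so a large weight on an early frame would indicate that the prediction draws on remembered input rather than only the current observation. A purely reactive predictor corresponds to passing a single frame.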
Taxonomy
Topics: Gaze Tracking and Assistive Technology · Robotics and Automated Systems · Domain Adaptation and Few-Shot Learning
