Attention Augmented ConvLSTM for Environment Prediction
Bernard Lange, Masha Itkina, Mykel J. Kochenderfer

TL;DR
This paper introduces two attention-augmented ConvLSTM models, TAAConvLSTM and SAAConvLSTM, that reduce the blurring and vanishing of moving objects in predicted occupancy grids and outperform baseline architectures on the real-world KITTI and Waymo datasets.
Contribution
The paper proposes two extensions to the ConvLSTM that incorporate attention mechanisms for spatiotemporal occupancy prediction: the Temporal Attention Augmented ConvLSTM (TAAConvLSTM) and the Self-Attention Augmented ConvLSTM (SAAConvLSTM).
Findings
Improved performance over baseline architectures on the real-world KITTI and Waymo datasets.
Reduced blurring and better preservation of moving objects in predictions.
Addresses a limitation that hindered the use of ConvLSTM-based prediction in safety-critical robotic applications.
Abstract
Safe and proactive planning in robotic systems generally requires accurate predictions of the environment. Prior work on environment prediction applied video frame prediction techniques to bird's-eye view environment representations, such as occupancy grids. ConvLSTM-based frameworks used previously often result in significant blurring and vanishing of moving objects, thus hindering their applicability for use in safety-critical applications. In this work, we propose two extensions to the ConvLSTM to address these issues. We present the Temporal Attention Augmented ConvLSTM (TAAConvLSTM) and Self-Attention Augmented ConvLSTM (SAAConvLSTM) frameworks for spatiotemporal occupancy prediction, and demonstrate improved performance over baseline architectures on the real-world KITTI and Waymo datasets.
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods · Human Pose and Action Recognition
Methods: Sigmoid Activation · Convolution · Tanh Activation · ConvLSTM
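The methods listed above (convolution, sigmoid, tanh) combine in a ConvLSTM cell roughly as sketched below. This is a minimal illustration of one step of a plain, single-channel ConvLSTM without peephole connections, written in pure Python for readability. It is not the authors' implementation and does not include the attention mechanisms of TAAConvLSTM or SAAConvLSTM. All function names, the 3x3 kernel size, and the random initialization are assumptions made for this example.

```python
import math
import random

def conv2d_same(grid, kernel):
    """Single-channel 2D cross-correlation with zero padding (output keeps the input size)."""
    h, w = len(grid), len(grid[0])
    k = len(kernel)  # assumes a square kernel with odd side length
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - r, x + dx - r
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy][dx] * grid[yy][xx]
            out[y][x] = acc
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gate(x, h_prev, w_x, w_h, bias, act):
    """act(W_x * x + W_h * h_prev + bias), where * is the convolution above."""
    a = conv2d_same(x, w_x)
    b = conv2d_same(h_prev, w_h)
    rows, cols = len(x), len(x[0])
    return [[act(a[y][c] + b[y][c] + bias) for c in range(cols)] for y in range(rows)]

def convlstm_step(x, h_prev, c_prev, params):
    """One ConvLSTM step on an occupancy grid x; returns (hidden state, cell state)."""
    i = gate(x, h_prev, *params["i"], sigmoid)    # input gate
    f = gate(x, h_prev, *params["f"], sigmoid)    # forget gate
    o = gate(x, h_prev, *params["o"], sigmoid)    # output gate
    g = gate(x, h_prev, *params["g"], math.tanh)  # candidate cell update
    rows, cols = len(x), len(x[0])
    c = [[f[y][k] * c_prev[y][k] + i[y][k] * g[y][k] for k in range(cols)]
         for y in range(rows)]
    h = [[o[y][k] * math.tanh(c[y][k]) for k in range(cols)] for y in range(rows)]
    return h, c

def random_params(kernel_size=3, seed=0):
    """Hypothetical initializer: one (W_x, W_h, bias) triple per gate."""
    rng = random.Random(seed)
    def kern():
        return [[rng.uniform(-0.5, 0.5) for _ in range(kernel_size)]
                for _ in range(kernel_size)]
    return {name: (kern(), kern(), 0.0) for name in "ifog"}

# Usage: feed a 5x5 occupancy grid with one occupied cell through a single step.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 1.0
zeros = [[0.0] * 5 for _ in range(5)]
h, c = convlstm_step(grid, zeros, zeros, random_params())
```

Because the gates are computed by convolution rather than by dense matrix products, the hidden and cell states keep the spatial layout of the input grid. In practice one would use a tensor library with multi-channel kernels and learned weights.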
