Kinaema: a recurrent sequence model for memory and pose in motion
Mert Bulent Sariyildiz, Philippe Weinzaepfel, Guillaume Bono, Gianluca Monaci, Christian Wolf

TL;DR
Kinaema is a novel recurrent sequence model that maintains an implicit memory for spatial navigation, enabling robots to efficiently locate themselves in previously seen environments without explicit history storage.
Contribution
The paper introduces Kinaema, a transformer-based recurrent model that compresses sensor history into an implicit latent memory for improved spatial awareness in robotics.
Findings
Kinaema effectively predicts relative positions in large scenes.
The model enables goal-oriented navigation based on prior observations.
It outperforms classical transformer approaches in efficiency.
Abstract
One key aspect of spatially aware robots is the ability to "find their bearings", i.e., to correctly situate themselves in previously seen spaces. In this work, we focus on this particular scenario of continuous robotics operations, where information observed before an actual episode starts is exploited to optimize efficiency. We introduce a new model, Kinaema, and an agent capable of integrating a stream of visual observations while moving in a potentially large scene and, upon request, processing a query image and predicting the relative position of the shown space with respect to its current position. Our model does not explicitly store an observation history and therefore has no hard constraints on context length. It maintains an implicit latent memory, which is updated by a transformer in a recurrent way, compressing the history of sensor readings into a compact representation. We…
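The recurrent memory idea described in the abstract can be illustrated with a small sketch. This is not the authors' architecture: it is a toy, assuming a fixed number of memory slots, single-head attention with identity projections, and random untrained weights. All names (`attend`, `update_memory`, `read_pose`, `SLOTS`, `D`, the three-value pose output) are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, tokens):
    """Single-head scaled dot-product attention with identity projections
    (an assumption made to keep the sketch short)."""
    scale = 1.0 / math.sqrt(len(tokens[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, t) * scale for t in tokens])
        out.append([sum(w * t[i] for w, t in zip(weights, tokens))
                    for i in range(len(q))])
    return out


def update_memory(memory, obs_tokens):
    """One recurrent step: each memory slot attends over the current memory
    plus the new observation tokens, then is averaged with the result.
    The observation tokens are discarded afterwards, so no history is kept."""
    context = memory + obs_tokens
    attended = attend(memory, context)
    return [[0.5 * (m + a) for m, a in zip(slot, upd)]
            for slot, upd in zip(memory, attended)]


def read_pose(memory, query_token, head):
    """Query the memory with an image embedding and map the readout to a
    hypothetical (dx, dy, dtheta) triple through a linear head."""
    readout = attend([query_token], memory)[0]
    return [dot(row, readout) for row in head]


def rand_vec(dim):
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


random.seed(0)
D, SLOTS = 8, 4                                  # hypothetical sizes
memory = [rand_vec(D) for _ in range(SLOTS)]     # untrained, random init
head = [rand_vec(D) for _ in range(3)]           # untrained linear pose head

# Stream 50 observations of 2 tokens each; the memory never grows.
for _ in range(50):
    memory = update_memory(memory, [rand_vec(D) for _ in range(2)])

pose = read_pose(memory, rand_vec(D), head)
print(len(memory), len(memory[0]), len(pose))
```

However long the observation stream is, the memory stays at `SLOTS` vectors of size `D`, which is the property the abstract refers to when it says the model has no hard constraints on context length. A trained model would use learned projections and a learned pose head in place of the random values here.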
Taxonomy
Topics: Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications · Social Robot Interaction and HRI
