LiLMaps: Learnable Implicit Language Maps
Evgenii Kruzhkov, Sven Behnke

TL;DR
LiLMaps introduces a novel approach to implicit environment mapping by integrating vision-language features, improving scene understanding and interaction for autonomous robots using large language models.
Contribution
The paper presents a new decoder optimization technique for implicit language maps and addresses prediction inconsistencies across different viewpoints.
Findings
Enhanced mapping accuracy with vision-language integration
Improved performance in dynamic object scenarios
Effective handling of viewpoint prediction inconsistencies
Abstract
One of the current trends in robotics is to employ large language models (LLMs) to provide non-predefined command execution and natural human-robot interaction. It is useful to have an environment map together with its language representation, which can be further utilized by LLMs. Such a comprehensive scene representation enables numerous ways of interaction with the map for autonomously operating robots. In this work, we present an approach that enhances incremental implicit mapping through the integration of vision-language features. Specifically, we (i) propose a decoder optimization technique for implicit language maps which can be used when new objects appear on the scene, and (ii) address the problem of inconsistent vision-language predictions between different viewing positions. Our experiments demonstrate the effectiveness of LiLMaps and solid improvements in performance.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
