A Semantic Autonomy Framework for VLM-Integrated Indoor Mobile Robots: Hybrid Deterministic Reasoning and Cross-Robot Adaptive Memory
Bogdan Felician Abaza, Andrei-Alexandru Staicu, Cristian Vasile Doicin

TL;DR
This paper introduces a six-layer semantic autonomy framework for indoor robots that combines hybrid deterministic-VLM reasoning with cross-robot adaptive memory, so that robots running on edge hardware can interpret natural language instructions and transfer learned knowledge to one another.
Contribution
It presents a novel framework integrating deterministic and vision-language model reasoning with cross-robot memory transfer, reducing latency and improving semantic understanding in indoor navigation.
Findings
88% of instructions resolved without VLM inference, in under 0.1 ms
100% semantic transfer accuracy across robots
Multi-robot operation on Raspberry Pi 5 hardware
Abstract
Autonomous indoor mobile robots can navigate reliably to metric coordinates using established frameworks such as ROS 2 Navigation 2, yet they lack the ability to interpret natural language instructions that express intent rather than positions. Vision-Language Models offer the semantic reasoning required to bridge this gap, but their inference latency (2-9 seconds per decision on consumer hardware) and session-by-session amnesia limit practical deployment. This paper presents the Semantic Autonomy Stack, a six-layer reference framework for semantically autonomous indoor navigation, and validates a complete instance featuring hybrid deterministic-VLM reasoning and cross-robot adaptive memory on physical robots with off-the-shelf edge hardware. A seven-step parametric resolver handles 88% of instructions in under 0.1 milliseconds without invoking a language model, camera, or GPU; only…
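The hybrid idea in the abstract, resolving most instructions deterministically and reserving the VLM for the remainder, can be illustrated with a minimal sketch. This is not the paper's seven-step resolver, whose steps are not described in this summary. It shows only a single lookup step followed by a fallback. Every name here (`SEMANTIC_MAP`, `Goal`, `resolve`, `fake_vlm`) and every coordinate is a hypothetical placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Goal:
    x: float
    y: float
    source: str  # "deterministic" or "vlm"


# Hypothetical semantic map: place name -> metric coordinates in metres.
SEMANTIC_MAP = {
    "kitchen": (4.2, 1.5),
    "charging dock": (0.0, 0.0),
}


def resolve_deterministic(instruction: str) -> Optional[Goal]:
    """Cheap path: look for a known place name inside the instruction."""
    text = instruction.lower()
    for name, (x, y) in SEMANTIC_MAP.items():
        if name in text:
            return Goal(x, y, "deterministic")
    return None  # no match, the caller decides what to do next


def resolve(instruction: str, vlm_fallback: Callable[[str], Goal]) -> Goal:
    """Try the deterministic path first; call the VLM only on a miss."""
    goal = resolve_deterministic(instruction)
    if goal is not None:
        return goal
    return vlm_fallback(instruction)


def fake_vlm(instruction: str) -> Goal:
    # Stand-in for a slow VLM call (assumption); returns a fixed goal.
    return Goal(1.0, 1.0, "vlm")


if __name__ == "__main__":
    print(resolve("Go to the kitchen", fake_vlm).source)
    print(resolve("Find somewhere quiet to wait", fake_vlm).source)
```

In this sketch the first instruction matches a map entry and never reaches the fallback, while the second has no matching place name and is handed to the stand-in VLM. The design point is that the expensive model sits behind a callable, so the fast path needs no camera, GPU, or model at all.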
