A Semantic Autonomy Framework for VLM-Integrated Indoor Mobile Robots: Hybrid Deterministic Reasoning and Cross-Robot Adaptive Memory

Bogdan Felician Abaza; Andrei-Alexandru Staicu; Cristian Vasile Doicin

arXiv:2605.02525·cs.RO·May 5, 2026

TL;DR

This paper introduces a six-layer semantic autonomy framework for indoor mobile robots. It combines hybrid deterministic-VLM reasoning with cross-robot adaptive memory, so that robots on edge hardware can interpret natural language instructions and share learned semantic knowledge.

Contribution

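The paper presents a framework that integrates deterministic reasoning and vision-language model (VLM) reasoning with cross-robot memory transfer. Resolving most instructions deterministically avoids VLM inference latency, while the VLM supplies semantic understanding for the remaining instructions in indoor navigation.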
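The deterministic-first, VLM-fallback idea can be illustrated with a minimal sketch. It is not the paper's seven-step resolver, whose steps are not given on this page. The place table `KNOWN_PLACES`, the functions `resolve_deterministic` and `resolve`, and the `vlm_fallback` callable are all hypothetical names introduced here for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Goal = Tuple[float, float]

# Hypothetical semantic map: place name -> (x, y) goal in the map frame.
KNOWN_PLACES = {"kitchen": (4.2, 1.0), "lab": (0.5, 7.3)}


@dataclass
class Resolution:
    goal: Goal
    source: str  # "deterministic" or "vlm"


def resolve_deterministic(instruction: str) -> Optional[Goal]:
    """Return a goal if the instruction names a known place, else None."""
    words = set(re.findall(r"[a-z]+", instruction.lower()))
    for name, goal in KNOWN_PLACES.items():
        if name in words:
            return goal
    return None


def resolve(instruction: str, vlm_fallback: Callable[[str], Goal]) -> Resolution:
    """Try the cheap lookup first; call the (slow) VLM only when it fails."""
    goal = resolve_deterministic(instruction)
    if goal is not None:
        return Resolution(goal, "deterministic")
    return Resolution(vlm_fallback(instruction), "vlm")
```

In this sketch an instruction such as "go to the kitchen" is answered from the table alone, and only an instruction with no known place name reaches `vlm_fallback`, which stands in for whatever model call a real system would make.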

Findings

01. 88% instruction resolution without VLM inference in under 0.1 ms

02. 100% semantic transfer accuracy across robots

03. Achieved multi-robot operation on Raspberry Pi 5 hardware

Abstract

Autonomous indoor mobile robots can navigate reliably to metric coordinates using established frameworks such as ROS 2 Navigation 2, yet they lack the ability to interpret natural language instructions that express intent rather than positions. Vision-Language Models offer the semantic reasoning required to bridge this gap, but their inference latency (2-9 seconds per decision on consumer hardware) and session-by-session amnesia limit practical deployment. This paper presents the Semantic Autonomy Stack, a six-layer reference framework for semantically autonomous indoor navigation, and validates a complete instance featuring hybrid deterministic-VLM reasoning and cross-robot adaptive memory on physical robots with off-the-shelf edge hardware. A seven-step parametric resolver handles 88% of instructions in under 0.1 milliseconds without invoking a language model, camera, or GPU; only…
