From Language to Logic: A Theoretical Architecture for VLM-Grounded Safe Navigation

Kristy Sakano, Kalonji Harrington, and Mumu Xu

arXiv:2605.04327·cs.RO·May 7, 2026

TL;DR

This paper proposes an architecture for autonomous robot navigation in which human-provided safety rules and operator preferences, stated in natural language, are translated into Signal Temporal Logic (STL) specifications that guide planning and are monitored at runtime.

Contribution

The paper presents a framework that combines Vision-Language Models (VLMs) with formal logic, proposing VLMs for zero-shot scene understanding to support safe navigation in unstructured outdoor environments.

Findings

01. Translates natural-language rules into Signal Temporal Logic (STL) specifications.

02. Grounds persistent, environment-centric rules and terrain preferences into a 2D cost map.

03. Expresses temporally dynamic safety requirements as STL specifications to be monitored at runtime.
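
As a rough illustration of the cost-map grounding described above, the sketch below maps a grid of semantic labels to per-cell traversal costs. It is not the authors' implementation: the label names, the cost values, and the function ground_to_cost_map are all hypothetical, and the label grid simply stands in for whatever a VLM-based scene-understanding step might produce. Infinite cost models a persistent hard rule (a keep-out region), while finite costs model soft terrain preferences.

```python
import math

# Hypothetical per-label costs (assumed for illustration only).
# Finite values encode soft operator preferences; math.inf encodes a
# persistent hard rule such as "never enter water".
LABEL_COST = {
    "pavement": 1.0,
    "gravel": 2.0,
    "grass": 3.0,
    "water": math.inf,
}


def ground_to_cost_map(label_grid):
    """Convert a 2D grid of semantic labels into a 2D grid of costs."""
    return [[LABEL_COST[label] for label in row] for row in label_grid]


def path_cost(cost_map, path):
    """Sum cell costs along a path given as (row, col) pairs."""
    return sum(cost_map[r][c] for r, c in path)


# Hypothetical 3x3 scene, as a label grid.
label_grid = [
    ["pavement", "pavement", "grass"],
    ["gravel", "water", "grass"],
    ["gravel", "water", "water"],
]
cost_map = ground_to_cost_map(label_grid)

# A path through the water cell violates the hard rule (infinite cost);
# a path around it along the top row stays finite.
through_water = [(0, 0), (1, 1), (0, 2)]
around_water = [(0, 0), (0, 1), (0, 2)]
```

A planner that minimizes summed cell cost would never select a path crossing an infinite-cost cell, which is one simple way to keep hard rules and soft preferences in the same map.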

Abstract

We propose an architecture for integrating high-level, human-provided safety rules and operator-aligned semantic preferences into autonomous robot navigation in unstructured outdoor environments. In our approach, natural-language rules are translated into Signal Temporal Logic (STL) specifications that guide planning and navigation during runtime. Persistent, environment-centric rules and terrain preferences are grounded into a 2D cost map, while temporally dynamic requirements are expressed as STL specifications to be monitored during runtime. We hypothesize the use of Vision-Language Models (VLMs) for zero-shot scene understanding, enabling mapping between human instructions, semantic features, and environmental constraints. Within this framework, we construct an illustrative navigation model that is designed to satisfy a set of STL-encoded specifications and soft operator preferences…
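
To make the runtime-monitoring idea concrete, the following minimal sketch evaluates the standard quantitative (robustness) semantics of two simple STL formulas over a finite, discretely sampled trace: "always stay at least d away" and "eventually get within r of the goal". The function names, the traces, and the thresholds are assumptions chosen for illustration; the paper's actual specifications and monitor are not given in the abstract. A positive robustness value means the formula is satisfied with that margin, and a negative value means it is violated.

```python
def always_at_least(signal, threshold):
    """Robustness of G(signal >= threshold) over a finite trace.

    The 'always' operator takes the worst-case (minimum) margin.
    """
    return min(value - threshold for value in signal)


def eventually_at_most(signal, threshold):
    """Robustness of F(signal <= threshold) over a finite trace.

    The 'eventually' operator takes the best-case (maximum) margin.
    """
    return max(threshold - value for value in signal)


# Hypothetical sampled traces, in meters (assumed values).
dist_to_person = [4.0, 3.2, 2.5, 2.1, 2.8]
dist_to_goal = [9.0, 7.0, 4.5, 2.0, 0.4]

# "Always keep at least 2.0 m from people."
rho_safe = always_at_least(dist_to_person, 2.0)

# "Eventually come within 0.5 m of the goal."
rho_goal = eventually_at_most(dist_to_goal, 0.5)

# A conjunction of both requirements takes the minimum robustness.
rho_mission = min(rho_safe, rho_goal)

# A trace that dips to 1.5 m violates the 2.0 m separation requirement.
rho_violation = always_at_least([3.0, 1.5], 2.0)
```

At runtime the same quantities could be recomputed as each new sample arrives, so that a shrinking margin is visible before the requirement is actually violated.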
