MiniVLA-Nav v1: A Multi-Scene Simulation Dataset for Language-Conditioned Robot Navigation

Ali Al-Bustami; Jaerock Kwon

arXiv:2605.00397 · cs.RO · May 4, 2026

TL;DR

MiniVLA-Nav v1 is a simulation dataset for evaluating language-conditioned robot navigation in four photorealistic Isaac Sim environments, with synchronized RGB, depth, and segmentation annotations, expert action labels, and multiple evaluation splits.

Contribution

The paper introduces MiniVLA-Nav v1, a multi-scene simulation dataset for Language-Conditioned Object Approach (LCOA) navigation that pairs natural-language instructions with synchronized sensor data and expert action labels.

Findings

01

The dataset contains 1,174 episodes with synchronized RGB images, metric depth maps, and instance segmentation masks.

02

It covers four environments and 12 object categories, and includes paraphrase templates for robustness evaluation.

03

It supports in-distribution, template-paraphrase, and out-of-distribution (OOD) object-category benchmarking.
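One way to picture the three benchmarking regimes is as a held-out partition over episode metadata. The sketch below assumes each episode records its object category and the instruction template that produced its text; the `Episode` fields, the split names, and the precedence rule are hypothetical illustrations, not the dataset's published schema or split files.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    episode_id: str
    scene: str            # e.g. "Office" or "Hospital"
    object_category: str  # target category named in the instruction
    template_id: int      # hypothetical: which instruction template produced the text


def assign_split(ep: Episode, heldout_categories: set[str],
                 heldout_templates: set[int]) -> str:
    """Route an episode to one of three evaluation regimes (hypothetical rule).

    Unseen object categories take precedence, so an episode with both an
    unseen category and an unseen template is counted as OOD, not paraphrase.
    """
    if ep.object_category in heldout_categories:
        return "ood_category"
    if ep.template_id in heldout_templates:
        return "template_paraphrase"
    return "in_distribution"
```

Under this rule, the held-out category and template sets would be fixed before training so that neither ever appears in training episodes.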

Abstract

We present MiniVLA-Nav v1, a simulation dataset for Language-Conditioned Object Approach (LCOA) navigation: given a short natural-language instruction, an NVIDIA Nova Carter differential-drive robot must navigate to the named object and stop within 1 m of it, in one of four photorealistic Isaac Sim environments (Office, Hospital, Full Warehouse, and Warehouse with Multiple Shelves). Each of the 1,174 episodes pairs an instruction with synchronized 640×640 RGB images, metric depth maps (float32, metres), and instance segmentation masks, together with continuous (v, ω) and 7×7 tokenized expert action labels recorded at 60 Hz from a vision-based proportional controller. Trajectory diversity is ensured through three spawn-distance tiers (near: 1.5–3.5 m, mid: 3.5–7.0 m, far: global curated points; Pearson r = 0.94 between spawn distance and trajectory length), 12 object categories, 18 training…
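To make the action and success definitions in the abstract concrete, here is a minimal sketch. It assumes uniform 7-bin quantization per axis over hypothetical ranges (v in [0, 1] m/s, ω in [-1, 1] rad/s), a row-major token layout indexed by v first, success measured as planar Euclidean distance to the object's position, and a hypothetical proportional controller with made-up gains. The abstract specifies none of these details, so treat every constant and function name here as a placeholder rather than the dataset's actual encoding or expert policy.

```python
from __future__ import annotations

import math

# Hypothetical bounds: the abstract gives neither the action ranges nor
# the binning scheme behind the 7x7 tokenization.
V_MIN, V_MAX = 0.0, 1.0           # linear velocity v (m/s), assumed
OMEGA_MIN, OMEGA_MAX = -1.0, 1.0  # angular velocity omega (rad/s), assumed
N_BINS = 7                        # 7 bins per axis -> 49 tokens
SUCCESS_RADIUS_M = 1.0            # "stop within 1 m" from the abstract


def _clamp(x: float, lo: float, hi: float) -> float:
    return min(max(x, lo), hi)


def _bin(x: float, lo: float, hi: float) -> int:
    """Uniformly bin x over [lo, hi] into N_BINS bins, clamping out-of-range values."""
    idx = int((_clamp(x, lo, hi) - lo) / (hi - lo) * N_BINS)
    return min(idx, N_BINS - 1)  # x == hi lands in the last bin


def tokenize_action(v: float, omega: float) -> int:
    """Map a continuous (v, omega) pair to a token id in [0, 48], row-major by v."""
    return _bin(v, V_MIN, V_MAX) * N_BINS + _bin(omega, OMEGA_MIN, OMEGA_MAX)


def detokenize_action(token: int) -> tuple[float, float]:
    """Return the bin-centre (v, omega) represented by a token id."""
    v_bin, omega_bin = divmod(token, N_BINS)
    v_step = (V_MAX - V_MIN) / N_BINS
    omega_step = (OMEGA_MAX - OMEGA_MIN) / N_BINS
    return (V_MIN + (v_bin + 0.5) * v_step,
            OMEGA_MIN + (omega_bin + 0.5) * omega_step)


def p_controller(bearing_rad: float, distance_m: float,
                 k_v: float = 0.5, k_omega: float = 1.5) -> tuple[float, float]:
    """Hypothetical proportional law: turn toward the target, slow down near it.

    bearing_rad > 0 means the target is to the robot's left (counter-clockwise
    positive); distance_m would come from depth inside the target's mask.
    """
    v = k_v * max(distance_m - SUCCESS_RADIUS_M, 0.0)
    omega = k_omega * bearing_rad
    return _clamp(v, V_MIN, V_MAX), _clamp(omega, OMEGA_MIN, OMEGA_MAX)


def is_success(robot_xy: tuple[float, float], object_xy: tuple[float, float],
               stopped: bool) -> bool:
    """Success if the robot has stopped within SUCCESS_RADIUS_M of the object."""
    return stopped and math.dist(robot_xy, object_xy) <= SUCCESS_RADIUS_M
```

Under these assumptions, `p_controller(0.2, 4.0)` saturates v at 1.0 m/s and returns ω ≈ 0.3 rad/s, which `tokenize_action` maps to token 46. Decoding to bin centres keeps `tokenize_action(*detokenize_action(t)) == t` for every token, which is a convenient sanity check when moving between the continuous and tokenized label formats.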

Code & Models

Repositories

https://huggingface.co/datasets/alibustami/miniVLA-Nav

Datasets

alibustami/miniVLA-Nav
dataset · 4.1k downloads
