AURA: Multimodal Shared Autonomy for Real-World Urban Navigation

Yukai Ma, Honglin He, Selina Song, Wayne Wu, and Bolei Zhou

arXiv:2604.01659 · cs.RO · April 3, 2026

TL;DR

AURA is a multimodal shared autonomy framework for urban navigation. It reduces human effort and enhances safety by splitting navigation into high-level human instruction and low-level AI control, and by aligning those instructions with visual and spatial context.

Contribution

The paper presents a multi-modal shared autonomy framework built around a Spatial-Aware Instruction Encoder, together with MM-CoS, a large-scale dataset constructed to train shared autonomy for urban navigation.

Findings

01 AURA effectively follows human instructions in simulation and real-world tests.

02 It reduces manual operation effort and improves navigation stability.

03 Shared autonomy decreases takeover frequency by over 44%.

Abstract

Long-horizon navigation in complex urban environments relies heavily on continuous human operation, which leads to fatigue, reduced efficiency, and safety concerns. Shared autonomy, where a Vision-Language AI agent and a human operator collaborate on maneuvering the mobile machine, presents a promising solution to address these issues. However, existing shared autonomy methods often require humans and AI to operate within the same action space, leading to high cognitive overhead. We present Assistive Urban Robot Autonomy (AURA), a new multi-modal framework that decomposes urban navigation into high-level human instruction and low-level AI control. AURA incorporates a Spatial-Aware Instruction Encoder to align various human instructions with visual and spatial context. To facilitate training, we construct MM-CoS, a large-scale dataset comprising teleoperation and vision-language…
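The split between high-level human instruction and low-level AI control can be illustrated with a toy control loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `Control`, `toy_policy`, and `run_episode` are hypothetical names, the instruction strings and velocity values are invented for illustration, and the lookup table stands in for a learned vision-language policy that would also consume visual and spatial context.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Control:
    """Low-level command (hypothetical units: m/s and rad/s)."""
    linear: float
    angular: float


def toy_policy(instruction: str) -> Control:
    """Hypothetical stand-in for the learned low-level policy.

    A real policy would condition on camera input and spatial context;
    this table only maps an instruction string to a fixed command.
    """
    table = {
        "go straight": Control(1.0, 0.0),
        "turn left": Control(0.5, 0.5),
        "turn right": Control(0.5, -0.5),
        "stop": Control(0.0, 0.0),
    }
    # Unknown instructions fall back to stopping.
    return table.get(instruction, Control(0.0, 0.0))


# One step of operator input: (new instruction or None, manual override or None).
Event = Tuple[Optional[str], Optional[Control]]


def run_episode(
    events: List[Event], policy: Callable[[str], Control]
) -> Tuple[List[Control], int]:
    """Apply the policy at every step unless the human takes over.

    Returns the executed controls and the number of takeover steps.
    """
    current_instruction = "stop"
    executed: List[Control] = []
    takeovers = 0
    for instruction, override in events:
        if instruction is not None:
            # High-level channel: the human updates the goal only occasionally.
            current_instruction = instruction
        if override is not None:
            # Takeover: the human command replaces the AI command for this step.
            takeovers += 1
            executed.append(override)
        else:
            # Low-level channel: the AI turns the instruction into a control.
            executed.append(policy(current_instruction))
    return executed, takeovers


# Usage: two instructions and one manual takeover across four steps.
events: List[Event] = [
    ("go straight", None),
    (None, None),
    (None, Control(0.0, 0.0)),
    ("turn left", None),
]
controls, takeovers = run_episode(events, toy_policy)
```

In this sketch the operator never shares the continuous action space with the policy except during a takeover, which is the situation the takeover-frequency metric counts.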
