A Unified Perception-Language-Action Framework for Adaptive Autonomous Driving
Yi Zhang, Erik Leo Haß, Kuo-Yi Chao, Nenad Petrovic, Yinglei Song, Chengdong Wu, Alois Knoll

TL;DR
This paper introduces a unified perception-language-action framework for autonomous driving that combines multi-sensor data with large language model reasoning to improve adaptability, interpretability, and safety in complex environments.
Contribution
It presents a novel integrated architecture using GPT-4.1 to unify perception, language understanding, and action planning for autonomous vehicles.
Findings
Superior trajectory tracking and speed prediction in urban scenarios
Enhanced adaptive planning in complex environments
Improved safety and interpretability
Abstract
Autonomous driving systems face significant challenges in achieving human-like adaptability, robustness, and interpretability in complex, open-world environments. These challenges stem from fragmented architectures, limited generalization to novel scenarios, and insufficient semantic extraction from perception. To address these limitations, we propose a unified Perception-Language-Action (PLA) framework that integrates multi-sensor fusion (cameras, LiDAR, radar) with a large language model (LLM)-augmented Vision-Language-Action (VLA) architecture, specifically a GPT-4.1-powered reasoning core. This framework unifies low-level sensory processing with high-level contextual reasoning, tightly coupling perception with natural language-based semantic understanding and decision-making to enable context-aware, explainable, and safety-bounded autonomous driving. Evaluations on an urban…
Taxonomy
Topics: Robotic Path Planning Algorithms · Robotics and Automated Systems
