Tactical Decision for Multi-UGV Confrontation with a Vision-Language Model-Based Commander
Li Wang, Qizhen Wu, Lei Chen

TL;DR
This paper introduces a vision-language model-based commander for multi-UGV confrontations that makes interpretable, strategic decisions directly from perception and outperforms traditional rule-based and reinforcement learning methods in simulation.
Contribution
It presents a unified perception-decision framework using vision-language and large language models for autonomous tactical decisions in complex environments.
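As a rough illustration of the perception-to-decision flow described here, the following Python sketch chains a scene-understanding step into a strategic-reasoning step. It is not the authors' implementation: `describe_scene`, `decide`, `commander_step`, the `TacticalOrder` fields, the prompt format, and both stub models are hypothetical names chosen for this example, and the models are passed in as plain callables so that no real model API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TacticalOrder:
    unit_id: int
    command: str
    rationale: str  # kept so each decision stays human-readable


def describe_scene(image: object, vlm: Callable[[object], str]) -> str:
    """Perception step: the vision-language model turns an image into text."""
    return vlm(image)


def decide(scene: str, unit_ids: List[int], llm: Callable[[str], str]) -> List[TacticalOrder]:
    """Decision step: the language model reasons over the text description."""
    orders = []
    for uid in unit_ids:
        # Hypothetical prompt format, assumed for this sketch only.
        prompt = f"Scene: {scene}\nUnit: {uid}\nReply as '<command> | <rationale>'."
        reply = llm(prompt)
        command, _, rationale = reply.partition("|")
        orders.append(TacticalOrder(uid, command.strip(), rationale.strip()))
    return orders


def commander_step(image, unit_ids, vlm, llm) -> List[TacticalOrder]:
    """One perception-to-decision cycle sharing a textual (semantic) interface."""
    return decide(describe_scene(image, vlm), unit_ids, llm)


# Stub models standing in for real ones (assumptions, not real APIs).
def stub_vlm(image: object) -> str:
    return "two hostile UGVs to the north, cover to the east"


def stub_llm(prompt: str) -> str:
    return "flank east | cover is available to the east"


if __name__ == "__main__":
    for order in commander_step(None, [0, 1], stub_vlm, stub_llm):
        print(order)
```

Passing text between the two steps is what makes each order inspectable in this sketch: the rationale string travels with the command rather than being hidden in a policy network.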
Findings
Achieves a win rate of over 80% in simulations.
Provides interpretable and adaptable decision-making.
Establishes a cognitive-like process for autonomous agents.
Abstract
In confrontations among multiple unmanned ground vehicles, autonomously evolving multi-agent tactical decisions from situational awareness remains a significant challenge. Traditional handcrafted rule-based methods become vulnerable in complicated and transient battlefield environments, and current reinforcement learning methods mainly focus on action manipulation instead of strategic decisions due to a lack of interpretability. Here, we propose a vision-language model-based commander to address the issue of intelligent perception-to-decision reasoning in autonomous confrontations. Our method integrates a vision-language model for scene understanding and a lightweight large language model for strategic reasoning, achieving unified perception and decision within a shared semantic space, with strong adaptability and interpretability. Unlike rule-based search and reinforcement learning methods, the…
Taxonomy
Topics: Military Defense Systems Analysis · Military Strategy and Technology · UAV Applications and Optimization
