CoMaTrack: Competitive Multi-Agent Game-Theoretic Tracking with Vision-Language-Action Models

Youzhi Liu; Li Gao; Liu Liu; Mingyang Lv; Yang Cai

arXiv:2603.22846 · cs.AI · April 1, 2026

TL;DR

CoMaTrack introduces a competitive multi-agent reinforcement learning framework for embodied visual tracking that improves robustness and adaptability by training in competitive scenarios, and it provides a new benchmark for evaluation.

Contribution

The paper presents a novel game-theoretic multi-agent training framework and a comprehensive open-source benchmark for language-conditioned embodied visual tracking.

Findings

1. Achieved state-of-the-art results on standard benchmarks and on CoMaTrack-Bench.
2. A 3B VLM trained with CoMaTrack surpasses previous models on EVT-Bench.
3. The benchmark enables standardized robustness evaluation under adversarial interactions.

Abstract

Embodied Visual Tracking (EVT), a core dynamic task in embodied intelligence, requires an agent to precisely follow a language-specified target. Yet most existing methods rely on single-agent imitation learning, which requires costly expert data and generalizes poorly because the training environments are static. Inspired by competition-driven capability evolution, we propose CoMaTrack, a competitive game-theoretic multi-agent reinforcement learning framework that trains agents in a dynamic adversarial setting with competitive subtasks, yielding stronger adaptive planning and interference-resilient strategies. We further introduce CoMaTrack-Bench, the first open-source Habitat-based benchmark protocol and episode set for language-conditioned competitive EVT, featuring dynamic dueling game scenarios between a tracker and adaptive opponents across diverse environments and…
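
The abstract does not spell out the competitive objective. As a rough illustration only, one common way to formalize a two-player tracker-versus-opponent game is as zero-sum: whatever reward the tracker earns for keeping the target in view at a comfortable distance, the opponent loses. The sketch below assumes 2D poses, a desired following distance of 2 m and a 90° field of view; `Pose2D`, `tracking_reward`, and `zero_sum_rewards` are hypothetical names, and this is not CoMaTrack's actual reward.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    x: float
    y: float
    yaw: float  # heading in radians


def tracking_reward(tracker: Pose2D, target: Pose2D,
                    desired_dist: float = 2.0, fov_deg: float = 90.0) -> float:
    """Hypothetical tracker reward (assumption, not from the paper).

    Returns -1.0 if the target is outside the tracker's field of view;
    otherwise a value in [0, 1] that peaks when the target is centered
    in view at the desired following distance.
    """
    dx, dy = target.x - tracker.x, target.y - tracker.y
    dist = math.hypot(dx, dy)
    # Bearing of the target relative to the tracker's heading, wrapped to [-pi, pi].
    bearing = math.atan2(dy, dx) - tracker.yaw
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))
    half_fov = math.radians(fov_deg) / 2
    if abs(bearing) > half_fov:
        return -1.0  # target lost from view
    # Penalize deviation from the desired distance (clipped at 0).
    dist_term = max(0.0, 1.0 - abs(dist - desired_dist) / desired_dist)
    # Prefer the target near the center of the view.
    angle_term = 1.0 - abs(bearing) / half_fov
    return 0.5 * dist_term + 0.5 * angle_term


def zero_sum_rewards(tracker: Pose2D, target: Pose2D,
                     **kwargs: float) -> tuple[float, float]:
    """Return (tracker_reward, opponent_reward); the opponent gets the negation."""
    r = tracking_reward(tracker, target, **kwargs)
    return r, -r


if __name__ == "__main__":
    origin = Pose2D(0.0, 0.0, 0.0)
    print(zero_sum_rewards(origin, Pose2D(2.0, 0.0, 0.0)))   # target ahead at 2 m
    print(zero_sum_rewards(origin, Pose2D(-2.0, 0.0, 0.0)))  # target behind the tracker
```

With the tracker at the origin facing +x, a target 2 m straight ahead yields `(1.0, -1.0)`, while a target directly behind yields `(-1.0, 1.0)`, so the opponent profits exactly when it escapes the tracker's view. A zero-sum split is only the simplest competitive formulation; the paper's competitive subtasks may shape rewards differently.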

Code & Models

Repositories

wlqcode/CoMaTrack-Bench (GitHub)