TL;DR
NaviMaster introduces a unified reinforcement learning framework that combines GUI and embodied navigation tasks, using a shared data-collection pipeline and reward design to improve generalization and performance across benchmarks.
Contribution
NaviMaster is the first work to unify GUI and embodied navigation tasks in a single framework, using a common MDP formulation and a shared training pipeline.
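To make "a common MDP formulation" concrete, here is a minimal sketch of one record type that could hold steps from both GUI and embodied episodes, so both kinds can share one training pipeline. Everything in it is hypothetical and is not taken from the paper: the names `Step` and `Trajectory`, the `domain` labels, the `action_type` values, and the assumption that an action can point at a 2D image location.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Step:
    """Hypothetical shared (state, action) record for GUI and embodied steps."""
    domain: str        # assumed labels: "gui" or "embodied"
    instruction: str   # natural-language task description (part of the state)
    observation: str   # path or ID of the screenshot / egocentric frame (part of the state)
    action_type: str   # assumed examples: "click" for GUI, "move" for embodied
    target: Optional[Tuple[float, float]] = None  # assumed normalized image point the action refers to


@dataclass
class Trajectory:
    """An ordered list of steps; GUI and embodied episodes use the same type."""
    steps: List[Step] = field(default_factory=list)

    def add(self, step: Step) -> None:
        self.steps.append(step)

    def domains(self) -> Set[str]:
        # Which task domains appear in this trajectory (useful when mixing data).
        return {s.domain for s in self.steps}


if __name__ == "__main__":
    gui = Trajectory()
    gui.add(Step("gui", "open settings", "screen_0.png", "click", (0.2, 0.8)))
    nav = Trajectory()
    nav.add(Step("embodied", "go to the sofa", "frame_0.png", "move", (0.5, 0.6)))
    mixed = [gui, nav]  # one list fed to one training loop
    print([t.domains() for t in mixed])
```

The design choice here is to keep domain-specific details in `action_type` and `target` while the state (instruction plus visual observation) has the same shape in both domains.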
Findings
Outperforms state-of-the-art methods on both GUI navigation and embodied navigation tasks.
Effective data mixing and reward design improve learning efficiency.
Unified training strategy enhances generalization across benchmarks.
Abstract
Recent advances in Graphical User Interface (GUI) and embodied navigation have driven substantial progress, yet these domains have largely evolved in isolation, with disparate datasets and training paradigms. In this paper, we observe that both tasks can be formulated as Markov Decision Processes (MDPs), which suggests a foundational principle for unifying them. Hence, we present NaviMaster, the first unified agent capable of performing both GUI navigation and embodied navigation within a single framework. Specifically, NaviMaster (i) proposes a visual-target trajectory collection pipeline that generates trajectories for both GUI and embodied tasks using a single formulation; (ii) employs a unified reinforcement learning framework on the mixed data to improve generalization; and (iii) designs a novel distance-aware reward to ensure efficient learning from the trajectories. Through extensive experiments on…
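The abstract does not give the formula for the distance-aware reward. The sketch below shows one plausible shape, under stated assumptions, and should not be read as NaviMaster's actual definition. It assumes the policy predicts a target point in normalized image coordinates, and it gives partial credit that decays linearly with Euclidean distance from the ground-truth point. The function name `distance_aware_reward` and the `max_dist` cutoff are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def distance_aware_reward(pred: Point, target: Point, max_dist: float = 0.5) -> float:
    """Hypothetical dense reward: 1.0 at the target, falling linearly to 0.0 at max_dist.

    Coordinates are assumed to be normalized to [0, 1] in image space.
    """
    if max_dist <= 0:
        raise ValueError("max_dist must be positive")
    d = math.dist(pred, target)  # Euclidean distance (Python 3.8+)
    return max(0.0, 1.0 - d / max_dist)


def binary_reward(pred: Point, target: Point, radius: float = 0.05) -> float:
    """Sparse baseline for comparison: reward only if the prediction lands within radius."""
    return 1.0 if math.dist(pred, target) <= radius else 0.0


if __name__ == "__main__":
    gt = (0.5, 0.5)
    for p in [(0.5, 0.5), (0.5, 0.75), (0.0, 0.0)]:
        print(p, distance_aware_reward(p, gt), binary_reward(p, gt))
```

The contrast with `binary_reward` shows the intuition behind a distance-aware signal. A near miss such as `(0.5, 0.75)` earns 0.5 rather than 0, so the policy still gets a learning gradient from imperfect rollouts. This is presumably related to the "efficient learning" the abstract mentions, though the paper's exact shaping may differ.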