# Galaxea Open-World Dataset and G0 Dual-System VLA Model

**Authors:** Tao Jiang, Tianyuan Yuan, Yicheng Liu, Chenhao Lu, Jianning Cui, Xiao Liu, Shuiqi Cheng, Jiyang Gao, Huazhe Xu, Hang Zhao

arXiv: 2509.00576 · 2025-09-03

## TL;DR

This paper introduces the Galaxea Open-World Dataset, a large-scale collection of robot behaviors recorded in real human living and working environments, and G0, a dual-system framework that pairs a Vision-Language Model (VLM) for multimodal planning with a Vision-Language-Action (VLA) model for fine-grained execution, trained through a three-stage curriculum.

## Contribution

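The paper contributes a diverse robot dataset, collected on a single consistent embodiment with subtask-level language annotations, and G0, a dual-system model for robot manipulation trained in three stages: cross-embodiment pre-training, single-embodiment pre-training, and task-specific post-training.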

## Key findings

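- Single-embodiment pre-training on the Galaxea Open-World Dataset plays a critical role in achieving strong performance.
- The consistent embodiment and precise subtask-level language annotations make the dataset usable for both training and evaluation.
- G0 outperforms baseline models on a benchmark spanning tabletop manipulation, few-shot learning, and long-horizon mobile manipulation.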

## Abstract

We present Galaxea Open-World Dataset, a large-scale, diverse collection of robot behaviors recorded in authentic human living and working environments. All demonstrations are gathered using a consistent robotic embodiment, paired with precise subtask-level language annotations to facilitate both training and evaluation. Building on this dataset, we introduce G0, a dual-system framework that couples a Vision-Language Model (VLM) for multimodal planning with a Vision-Language-Action (VLA) model for fine-grained execution. G0 is trained using a three-stage curriculum: cross-embodiment pre-training, single-embodiment pre-training, and task-specific post-training. A comprehensive benchmark spanning tabletop manipulation, few-shot learning, and long-horizon mobile manipulation demonstrates the effectiveness of our approach. In particular, we find that the single-embodiment pre-training stage, together with the Galaxea Open-World Dataset, plays a critical role in achieving strong performance.
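The dual-system split and the curriculum order described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the stage names come from the abstract, while `DualSystem`, `toy_planner`, `toy_executor`, and the string-based observations and actions are hypothetical stand-ins for the VLM planner and VLA executor.

```python
from dataclasses import dataclass
from typing import Callable, List

# Stage names and their order follow the abstract; nothing else about
# the training procedure is modeled here.
CURRICULUM = (
    "cross-embodiment pre-training",
    "single-embodiment pre-training",
    "task-specific post-training",
)


@dataclass
class DualSystem:
    """Hypothetical wrapper: a planner proposes subtasks, an executor acts on them."""

    # Stand-in for the VLM: (instruction, observation) -> list of subtask strings.
    planner: Callable[[str, str], List[str]]
    # Stand-in for the VLA: (subtask, observation) -> list of action labels.
    executor: Callable[[str, str], List[str]]

    def run(self, instruction: str, observation: str) -> List[str]:
        actions: List[str] = []
        # Plan once, then execute each subtask in order.
        for subtask in self.planner(instruction, observation):
            actions.extend(self.executor(subtask, observation))
        return actions


def toy_planner(instruction: str, observation: str) -> List[str]:
    # Toy rule: split the instruction on " then " to get subtasks.
    return [part.strip() for part in instruction.split(" then ")]


def toy_executor(subtask: str, observation: str) -> List[str]:
    # Toy rule: emit one labeled action per subtask.
    return [f"act:{subtask}"]


if __name__ == "__main__":
    system = DualSystem(planner=toy_planner, executor=toy_executor)
    print(system.run("pick up cup then place on shelf", "camera_frame_0"))
```

In a real system the planner and executor would be learned models consuming images and producing continuous robot actions; the toy callables here only show how subtask-level language sits between the two components.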

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00576/full.md

## Figures

19 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00576/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/2509.00576/full.md

---
Source: https://tomesphere.com/paper/2509.00576