TL;DR
VLA Foundry is an open-source framework that unifies training for vision-language-action models, supporting from-scratch and pretrained backbones, and demonstrating competitive performance on manipulation tasks.
Contribution
It introduces a shared training stack for VLA models, enabling end-to-end training and evaluation, with released code and models for community use.
Findings
The model trained fully from scratch matches the performance of the authors' prior closed-source work.
Building on the pretrained Qwen3-VL backbone improves the multi-task manipulation policy.
The framework simplifies training and evaluating VLA models within a single codebase.
Abstract
We present VLA Foundry, an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. Most open-source VLA efforts specialize on the action training stage, often stitching together incompatible pretraining pipelines. VLA Foundry instead provides a shared training stack with end-to-end control, from language pretraining to action-expert fine-tuning. VLA Foundry supports both from-scratch training and pretrained backbones from Hugging Face. To demonstrate the utility of our framework, we train and release two types of models: the first trained fully from scratch through our LLM-->VLM-->VLA pipeline and the second built on the pretrained Qwen3-VL backbone. We evaluate closed-loop policy performance of both models on LBM Eval, an open-data, open-source simulator. We also contribute usability improvements to the simulator and the STEP analysis tools for easier public…
Code & Models
- 🤗 TRI-ML/Foundry-LLM-1.2B-1T (model): 26 downloads, 2 likes
- 🤗 TRI-ML/Foundry-LLM-1.2B-800B (model): 11 downloads, 1 like
- 🤗 TRI-ML/Foundry-VLM-1.3B-165M (model): 15 downloads, 1 like
- 🤗 TRI-ML/Foundry-VLM-1.3B-200M (model): 19 downloads
- 🤗 TRI-ML/Foundry-Qwen3VLA-2.1B (model): 91 downloads, 1 like
- 🤗 TRI-ML/Foundry-VLA-1.7B-full (model): 13 downloads
- 🤗 TRI-ML/Foundry-VLA-1.7B-sim (model): 37 downloads, 2 likes
- 🤗 TRI-ML/Foundry-VLA-1.7B-real (model): 24 downloads, 2 likes
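The checkpoints listed above are hosted on the Hugging Face Hub, so one plausible way to fetch them is the generic `snapshot_download` call from the `huggingface_hub` package. The sketch below is an illustration, not the framework's documented loading path: the helper names `hub_url` and `download_checkpoint` and the `checkpoints` target directory are hypothetical, and the repositories' file layout and any loading code in VLA Foundry itself are not described on this page.

```python
from pathlib import Path

# Repository IDs copied from the list above.
FOUNDRY_REPOS = [
    "TRI-ML/Foundry-LLM-1.2B-1T",
    "TRI-ML/Foundry-LLM-1.2B-800B",
    "TRI-ML/Foundry-VLM-1.3B-165M",
    "TRI-ML/Foundry-VLM-1.3B-200M",
    "TRI-ML/Foundry-Qwen3VLA-2.1B",
    "TRI-ML/Foundry-VLA-1.7B-full",
    "TRI-ML/Foundry-VLA-1.7B-sim",
    "TRI-ML/Foundry-VLA-1.7B-real",
]


def hub_url(repo_id: str) -> str:
    """Return the Hugging Face Hub page for a model repository (hypothetical helper)."""
    return f"https://huggingface.co/{repo_id}"


def download_checkpoint(repo_id: str, target_dir: str = "checkpoints") -> Path:
    """Download every file of one repository into target_dir/<model name>.

    Hypothetical helper: it only mirrors the repository locally and makes no
    assumption about how VLA Foundry loads the weights afterwards.
    """
    # Imported lazily so the rest of the module works without the package installed.
    from huggingface_hub import snapshot_download

    local_dir = Path(target_dir) / repo_id.split("/")[-1]
    return Path(snapshot_download(repo_id=repo_id, local_dir=str(local_dir)))


# Example (requires network access and `pip install huggingface_hub`):
# path = download_checkpoint("TRI-ML/Foundry-Qwen3VLA-2.1B")
```

Mirroring the whole repository keeps the sketch independent of file names inside each checkpoint, which this page does not specify.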
