VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Qianli Ma; Yaowei Zheng; Zhelun Shi; Zhongkai Zhao; Bin Jia; Ziyue Huang; Zhiqi Lin; Youjie Li; Jiacheng Yang; Yanghua Peng; Zhi Zhang; Xin Liu

arXiv:2508.02317 · cs.CL · August 8, 2025

TL;DR

VeOmni is a modular training framework that enables efficient, scalable training of omni-modal large language models by decoupling communication from computation and supporting flexible configurations.

Contribution

It introduces model-centric distributed recipes and a flexible configuration interface, improving scalability and making it easier to integrate new modalities into omni-modal LLM training.
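Configuration-driven integration of new modalities can be illustrated with a small registry pattern. This is a hypothetical sketch, not VeOmni's actual interface: `register_encoder`, `build_model`, `ENCODER_REGISTRY`, and the toy `Encoder` (which only truncates or zero-pads its input to the hidden size in place of a real projection) are invented for illustration. The idea it shows is that supporting another modality means registering one factory and listing its name in the config, without editing the model-assembly code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Encoder:
    """Toy stand-in for a modality encoder (hypothetical)."""
    modality: str
    hidden_size: int

    def encode(self, inputs: List[float]) -> List[float]:
        # Placeholder for a real projection: truncate or zero-pad to hidden_size.
        out = list(inputs[: self.hidden_size])
        return out + [0.0] * (self.hidden_size - len(out))


# Hypothetical registry mapping a modality name to an encoder factory.
ENCODER_REGISTRY: Dict[str, Callable[[int], Encoder]] = {}


def register_encoder(name: str):
    """Decorator that records an encoder factory under `name`."""
    def wrap(factory: Callable[[int], Encoder]) -> Callable[[int], Encoder]:
        ENCODER_REGISTRY[name] = factory
        return factory
    return wrap


@register_encoder("image")
def build_image_encoder(hidden_size: int) -> Encoder:
    return Encoder("image", hidden_size)


@register_encoder("audio")
def build_audio_encoder(hidden_size: int) -> Encoder:
    return Encoder("audio", hidden_size)


@dataclass
class OmniModel:
    hidden_size: int
    encoders: Dict[str, Encoder] = field(default_factory=dict)


def build_model(config: dict) -> OmniModel:
    """Assemble a model purely from config; new modalities need no changes here."""
    hidden = config["hidden_size"]
    encoders = {name: ENCODER_REGISTRY[name](hidden) for name in config["modalities"]}
    return OmniModel(hidden, encoders)


model = build_model({"hidden_size": 4, "modalities": ["image", "audio"]})
print(sorted(model.encoders))                      # ['audio', 'image']
print(model.encoders["image"].encode([1.0, 2.0]))  # [1.0, 2.0, 0.0, 0.0]
```

Adding, say, a video encoder in this sketch would take one more decorated factory and one more entry in `config["modalities"]`.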

Findings

1. Achieved 2,800 tokens/sec/GPU throughput when training a 30B-parameter model.
2. Scaled training to 160K-token context lengths using 3D parallelism on 128 GPUs.
3. Demonstrated superior efficiency and scalability in omni-modal LLM training.
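Running 3D parallelism on 128 GPUs means arranging the ranks into a three-dimensional device mesh and forming communication groups along each axis. The sketch below assumes the axes are data/FSDP sharding (`dp`), sequence parallelism (`sp`), and expert parallelism (`ep`) with degrees 8 × 4 × 4; these axis names and degrees are illustrative assumptions, not the configuration reported in the paper. It also assumes "160K" means 160 × 1024 tokens when computing how much of the sequence each sequence-parallel rank holds.

```python
from itertools import product
from typing import Dict, List, Tuple

Coord = Tuple[int, ...]


def build_mesh(world_size: int, degrees: Dict[str, int]) -> Dict[int, Coord]:
    """Map each global rank to its mesh coordinate (row-major: last axis varies fastest)."""
    total = 1
    for d in degrees.values():
        total *= d
    if total != world_size:
        raise ValueError(f"mesh product {total} != world size {world_size}")
    coords = product(*(range(d) for d in degrees.values()))
    return dict(enumerate(coords))


def axis_groups(mesh: Dict[int, Coord], axis: int) -> List[List[int]]:
    """Ranks that differ only along `axis` form one communication group."""
    groups: Dict[Coord, List[int]] = {}
    for rank, coord in mesh.items():
        key = coord[:axis] + coord[axis + 1:]
        groups.setdefault(key, []).append(rank)
    return list(groups.values())


# Assumed layout: dp=8, sp=4, ep=4 -> 128 GPUs (illustrative, not from the paper).
degrees = {"dp": 8, "sp": 4, "ep": 4}
mesh = build_mesh(128, degrees)
sp_groups = axis_groups(mesh, axis=1)

context_len = 160 * 1024                        # assumption: "160K" = 160 x 1024 tokens
tokens_per_rank = context_len // degrees["sp"]  # sequence slice held by each sp rank

print(mesh[5])          # (0, 1, 1)
print(sp_groups[0])     # [0, 4, 8, 12]
print(len(sp_groups))   # 32
print(tokens_per_rank)  # 40960
```

Each of the 32 sequence-parallel groups spans four ranks that share the same `dp` and `ep` coordinates, so sequence-level communication in this layout stays within those four ranks.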

Abstract

Recent advances in large language models (LLMs) have driven impressive progress in omni-modal understanding and generation. However, training omni-modal LLMs remains a significant challenge due to the heterogeneous model architectures required to process diverse modalities, necessitating sophisticated system design for efficient large-scale training. Existing frameworks typically entangle model definitions with parallel logic, which limits scalability and incurs substantial engineering overhead for end-to-end omni-modal training. We present VeOmni, a modular and efficient training framework to accelerate the development of omni-modal LLMs. VeOmni introduces model-centric distributed recipes that decouple communication from computation, enabling efficient 3D parallelism on omni-modal LLMs. VeOmni also features a flexible configuration interface supporting seamless integration of new…
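Keeping parallel logic out of the model definition can be sketched as follows. This is a hypothetical illustration, not VeOmni's code: the model is described only by parameter names and global shapes, and a separate plan (`PARALLEL_PLAN`, matched with shell-style patterns) states which dimension of each parameter to shard. It is similar in spirit to the module-name-to-style plans passed to PyTorch's `parallelize_module`, but it tracks only per-rank shapes and omits the collectives (all-gather, reduce-scatter, all-to-all) that a real framework would insert.

```python
from fnmatch import fnmatchcase
from typing import Dict, Optional, Tuple

Shape = Tuple[int, ...]


def define_model(hidden: int, vocab: int, n_experts: int) -> Dict[str, Shape]:
    """Model definition: parameter names and global shapes only, no parallel logic."""
    return {
        "embed.weight": (vocab, hidden),
        "layers.0.attn.qkv.weight": (3 * hidden, hidden),
        "layers.0.moe.experts.weight": (n_experts, hidden, 4 * hidden),
        "norm.weight": (hidden,),
    }


# Hypothetical parallel plan kept outside the model: pattern -> dim to shard.
# Parameters matching no pattern are replicated; the first matching pattern wins.
PARALLEL_PLAN: Dict[str, int] = {
    "*.experts.weight": 0,  # split the expert dimension across ranks
    "*.qkv.weight": 0,      # split output features of the QKV projection
    "embed.weight": 0,      # split the vocabulary dimension
}


def apply_plan(params: Dict[str, Shape], plan: Dict[str, int], degree: int) -> Dict[str, Shape]:
    """Return the per-rank (local) shape of every parameter under `plan`."""
    local: Dict[str, Shape] = {}
    for name, shape in params.items():
        dim: Optional[int] = next(
            (d for pattern, d in plan.items() if fnmatchcase(name, pattern)), None
        )
        if dim is None:
            local[name] = shape  # replicated on every rank
            continue
        if shape[dim] % degree != 0:
            raise ValueError(f"{name}: dim {dim} of size {shape[dim]} not divisible by {degree}")
        local[name] = shape[:dim] + (shape[dim] // degree,) + shape[dim + 1:]
    return local


params = define_model(hidden=8, vocab=32, n_experts=4)
local = apply_plan(params, PARALLEL_PLAN, degree=4)
for name, shape in local.items():
    print(name, params[name], "->", shape)
```

Changing the sharding in this sketch only means editing `PARALLEL_PLAN`; `define_model` stays untouched, which is the separation the abstract describes.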
