TL;DR
VeOmni is a modular training framework that enables efficient, scalable training of omni-modal large language models by decoupling communication from computation and supporting flexible configurations.
Contribution
It introduces model-centric distributed recipes and a flexible configuration interface, which improve scalability and make it easier to integrate new modalities into omni-modal LLM training.
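To make the idea of a model-centric distributed recipe concrete, here is a minimal Python sketch under stated assumptions. The model is represented only by its parameter names, with no device or communication logic. A separate, ordered recipe maps name patterns to a parallel strategy. The names `RECIPE`, `plan` and `build_model_params`, the parameter names, and the strategy labels `"fsdp"` and `"ep"` are hypothetical illustrations, not VeOmni's actual API.

```python
from fnmatch import fnmatchcase

def build_model_params() -> list[str]:
    """Hypothetical stand-in for a model definition: parameter names only,
    with no devices, process groups, or collectives mentioned."""
    return [
        "vision_encoder.blocks.0.attn.qkv.weight",
        "audio_encoder.blocks.0.mlp.fc1.weight",
        "llm.layers.0.attn.q_proj.weight",
        "llm.layers.0.moe.experts.3.w1.weight",
    ]

# Hypothetical recipe, kept outside the model code. Order matters:
# the first matching pattern wins, so specific rules come before the catch-all.
RECIPE: list[tuple[str, str]] = [
    ("llm.layers.*.moe.experts.*", "ep"),  # illustrative expert-parallel rule
    ("*", "fsdp"),                         # illustrative default sharding rule
]

def plan(params: list[str], recipe: list[tuple[str, str]]) -> dict[str, str]:
    """Assign each parameter the strategy of the first pattern it matches."""
    assignment: dict[str, str] = {}
    for name in params:
        for pattern, strategy in recipe:
            if fnmatchcase(name, pattern):
                assignment[name] = strategy
                break
    return assignment

if __name__ == "__main__":
    for name, strategy in plan(build_model_params(), RECIPE).items():
        print(f"{strategy:5s} {name}")
```

The catch-all `"*"` rule covers any parameter not matched earlier. Adding a new modality encoder (for example, a hypothetical `video_encoder`) therefore changes only the model definition and leaves the recipe untouched. The contribution describes this kind of separation, but VeOmni's real mechanism may differ.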
Findings
Achieved 2,800 tokens/sec/GPU throughput when training a 30B-parameter model.
Scaled training to a 160K context length using 3D parallelism on 128 GPUs.
Demonstrated superior efficiency and scalability in omni-modal LLM training.
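As a quick sanity check on the findings, the per-GPU figure can be converted to cluster-level throughput by multiplication. This sketch assumes that the 2,800 tokens/sec/GPU rate holds across all 128 GPUs, which the findings do not state explicitly. The 1-billion-token budget is a hypothetical example.

```python
# Figures from the findings; treating them as a single 128-GPU run is an assumption.
TOKENS_PER_SEC_PER_GPU = 2_800
NUM_GPUS = 128

def aggregate_throughput(per_gpu: float, num_gpus: int) -> float:
    """Cluster-wide tokens/sec, assuming every GPU sustains the same per-GPU rate."""
    return per_gpu * num_gpus

def hours_for_tokens(total_tokens: float, tokens_per_sec: float) -> float:
    """Wall-clock hours to process `total_tokens` at a constant rate."""
    return total_tokens / tokens_per_sec / 3600

if __name__ == "__main__":
    cluster = aggregate_throughput(TOKENS_PER_SEC_PER_GPU, NUM_GPUS)
    print(cluster)  # 358400
    print(round(hours_for_tokens(1e9, cluster), 2))  # 0.78 (hypothetical 1B-token budget)
```

Under that assumption the cluster processes 358,400 tokens/sec, so a hypothetical 1B-token budget would take about 0.78 hours.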
Abstract
Recent advances in large language models (LLMs) have driven impressive progress in omni-modal understanding and generation. However, training omni-modal LLMs remains a significant challenge: the heterogeneous model architectures required to process diverse modalities call for sophisticated system design to train efficiently at scale. Existing frameworks typically entangle model definition with parallel logic, which limits scalability and incurs substantial engineering overhead for end-to-end omni-modal training. We present VeOmni, a modular and efficient training framework to accelerate the development of omni-modal LLMs. VeOmni introduces model-centric distributed recipes that decouple communication from computation, enabling efficient 3D parallelism on omni-modal LLMs. VeOmni also features a flexible configuration interface supporting seamless integration of new…
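3D parallelism arranges the GPUs in a three-dimensional mesh. For each axis, a rank's communication group is the set of ranks that differ from it only along that axis. The sketch below shows this bookkeeping for 128 GPUs. The mesh shape (8, 4, 4) and the reading of the axes as data, sequence, and expert parallelism are assumptions for illustration, not VeOmni's documented configuration. The helper names are hypothetical.

```python
# Hypothetical mesh: 8 x 4 x 4 = 128 GPUs.
# Axis 0 ~ data parallel, axis 1 ~ sequence parallel, axis 2 ~ expert parallel (illustrative labels).
MESH_SHAPE = (8, 4, 4)

def mesh_coords(rank: int, shape: tuple[int, int, int]) -> tuple[int, int, int]:
    """Row-major coordinates of `rank` (the last axis varies fastest)."""
    d0, d1, d2 = shape
    if not 0 <= rank < d0 * d1 * d2:
        raise ValueError(f"rank {rank} is outside a mesh of {d0 * d1 * d2} devices")
    return rank // (d1 * d2), (rank // d2) % d1, rank % d2

def rank_of(coords, shape: tuple[int, int, int]) -> int:
    """Inverse of mesh_coords: flatten 3D coordinates back to a global rank."""
    _, d1, d2 = shape
    c0, c1, c2 = coords
    return c0 * d1 * d2 + c1 * d2 + c2

def group_along(axis: int, rank: int, shape: tuple[int, int, int]) -> list[int]:
    """Ranks that differ from `rank` only along `axis`: its communication group on that axis."""
    coords = list(mesh_coords(rank, shape))
    members = []
    for i in range(shape[axis]):
        coords[axis] = i
        members.append(rank_of(coords, shape))
    return members

if __name__ == "__main__":
    r = 37
    print(mesh_coords(r, MESH_SHAPE))     # (2, 1, 1)
    print(group_along(1, r, MESH_SHAPE))  # [33, 37, 41, 45]
    print(group_along(2, r, MESH_SHAPE))  # [36, 37, 38, 39]
```

Rank 37 sits at coordinates (2, 1, 1). It exchanges sequence shards with ranks 33, 41 and 45, and it exchanges expert traffic with ranks 36, 38 and 39. In PyTorch, `torch.distributed.device_mesh.init_device_mesh` performs this bookkeeping for real process groups.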
Code & Models
- 🤗 robbyant/lingbot-vla-4b · model · 251 downloads · ♡ 28
- 🤗 robbyant/lingbot-vla-4b-depth · model · 63 downloads · ♡ 18
- 🤗 robbyant/lingbot-vla-4b-posttrain-robotwin · model · 37 downloads · ♡ 2
- 🤗 robbyant/lingbot-vla-4b-depth-posttrain-robotwin · model · 20 downloads · ♡ 2
- 🤗 RLinf/RLinf-lingbot-vla-4b · model · 11 downloads
- 🤗 bazaar-research/lingbot-vla · model
