MinT: Managed Infrastructure for Training and Serving Millions of LLMs

Mind Lab: Song Cao; Vic Cao; Andrew Chen; Kaijie Chen; Cleon Cheng; Steven Chiang; Kaixuan Fan; Hera Feng; Huan Feng; Arthur Fu; Jun Gao; Hongquan Gu; Aaron Guan; Nolan Ho; Mutian Hong; Hailee Hou; Peixuan Hua; Charles Huang; Miles Jiang; Nora Jiang; Yuyi Jiang; Qiuyu Jin; Fancy Kong; Andrew Lei; Kyrie Lei; Alexy Li; Lucian Li; Ray Li; Theo Li; Zhihui Li; Jiayi Lin; Kairus Liu; Kieran Liu; Logan Liu; Xiang Liu; Irvine Lu; Maeve Luo; Runze Lv; Pony Ma; Verity Niu; Anson Qiu; Vincent Wang; Rio Yang; Maxwell Yao; Carrie Ye; Regis Ye; Wenlin Ye; Josh Ying; Danney Zeng; Yuhan Zhan; Anya Zhang; Di Zhang; Ruijia Zhang; Sueky Zhang; Ya Zhang; Wei Zhao; Ada Zhou; Changhai Zhou; Yuhua Zhou; Xinyue Zhu; Murphy Zhuang

arXiv:2605.13779 · cs.LG · May 14, 2026

TL;DR

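MinT is a managed infrastructure system that trains, updates, and serves LoRA-adapted policies at million scale over a small number of large base-model deployments. It keeps the base model resident and moves only the exported LoRA adapters, rather than materializing each policy as a merged full checkpoint.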

Contribution

The paper introduces MinT, a system that manages LoRA policies at scale along several axes, enabling efficient training, updating, and serving of many adapter-based policies over shared base models, including base models beyond 1T total parameters.

Findings

01. Validated training and serving on base models beyond 1 trillion total parameters.

02. Achieved an 18.3x reduction in step time for 4B dense models using adapter-only handoff.

03. Supported million-scale policy catalogs with efficient live engine loading.

Abstract

We present MindLab Toolkit (MinT), a managed infrastructure system for Low-Rank Adaptation (LoRA) post-training and online serving. MinT targets a setting where many trained policies are produced over a small number of expensive base-model deployments. Instead of materializing each policy as a merged full checkpoint, MinT keeps the base model resident and moves exported LoRA adapter revisions through rollout, update, export, evaluation, serving, and rollback, hiding distributed training, serving, scheduling, and data movement behind a service interface. MinT scales this path along three axes. Scale Up extends LoRA RL to frontier-scale dense and MoE architectures, including MLA and DSA attention paths, with training and serving validated beyond 1T total parameters. Scale Down moves only the exported LoRA adapter, which can be under 1% of base-model size in rank-1 settings; adapter-only…
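The size claim for rank-1 adapters can be checked with simple arithmetic. The sketch below is not taken from MinT; it assumes the standard LoRA factorization, in which a frozen weight of shape d_out x d_in gets a low-rank update B·A with A of shape r x d_in and B of shape d_out x r. The function names and the 4096 x 4096 layer size are hypothetical, chosen only for illustration.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in one LoRA pair: A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


def adapter_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Adapter parameters as a fraction of the frozen d_out x d_in weight matrix."""
    return lora_param_count(d_in, d_out, rank) / (d_in * d_out)


# Hypothetical layer size, not a figure from the paper.
frac = adapter_fraction(d_in=4096, d_out=4096, rank=1)
print(f"rank-1 adapter is {frac:.4%} of the layer")  # 8192 / 16777216
```

For a single square layer of this size the rank-1 adapter is roughly 0.05% of the layer, which is consistent with the abstract's "under 1%" figure. The ratio for a whole model depends on which layers carry adapters, so this is only an order-of-magnitude check.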

Code & Models

Repositories

mindlab-research/mindlab-toolkit (GitHub)
