Chain-of-Models Pre-Training: Rethinking Training Acceleration of Vision Foundation Models

Jiawei Fan; Shigeng Wang; Chao Li; Xiaolong Liu; Anbang Yao

arXiv:2604.12391·cs.CV·April 15, 2026

TL;DR

CoM-PT is a model-family-level pre-training method that accelerates vision foundation model training through sequential inverse knowledge transfer from smaller models to larger ones, improving training efficiency without performance loss.

Contribution

The paper proposes a training acceleration approach for vision models that scales efficiently as the model family grows, unlike conventional methods that pre-train each model individually.

Findings

01. Achieves up to 72% reduction in computational complexity.

02. Demonstrates acceleration ratios up to 7.09X across model families.

03. Validates effectiveness across 45 datasets for zero-shot and fine-tuning tasks.

Abstract

In this paper, we present Chain-of-Models Pre-Training (CoM-PT), a novel performance-lossless training acceleration method for vision foundation models (VFMs). This approach fundamentally differs from existing acceleration methods in its core motivation: rather than optimizing each model individually, CoM-PT is designed to accelerate the training pipeline at the model family level, scaling efficiently as the model family expands. Specifically, CoM-PT establishes a pre-training sequence for the model family, arranged in ascending order of model size, called a model chain. In this chain, only the smallest model undergoes standard individual pre-training, while the other models are efficiently trained through sequential inverse knowledge transfer from their smaller predecessors by jointly reusing the knowledge in the parameter space and the feature space. As a result, CoM-PT enables all…
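The chain structure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how parameters or features are reused, so `expand_weights` (copying a smaller weight matrix into the corner of a larger one) and `feature_alignment_loss` (mean squared error over the shared leading dimensions) are simple stand-ins chosen for illustration. All names, including `ModelSpec`, `TrainingStep`, and the model sizes, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelSpec:
    name: str
    num_params: int  # illustrative size, used only to order the chain


@dataclass
class TrainingStep:
    model: str
    mode: str               # "standard" or "inverse_transfer"
    teacher: Optional[str]  # the smaller predecessor, if any


def build_model_chain(family: List[ModelSpec]) -> List[ModelSpec]:
    """Arrange the model family in ascending order of model size."""
    return sorted(family, key=lambda m: m.num_params)


def plan_chain_pretraining(family: List[ModelSpec]) -> List[TrainingStep]:
    """Only the smallest model is pre-trained in the standard way;
    every other model learns from its immediate smaller predecessor."""
    chain = build_model_chain(family)
    steps = []
    for i, model in enumerate(chain):
        if i == 0:
            steps.append(TrainingStep(model.name, "standard", None))
        else:
            steps.append(
                TrainingStep(model.name, "inverse_transfer", chain[i - 1].name)
            )
    return steps


def expand_weights(small, rows, cols, fill=0.0):
    """Assumed stand-in for parameter-space reuse: copy the smaller
    matrix into the top-left corner of a larger, filled matrix."""
    if rows < len(small) or any(cols < len(r) for r in small):
        raise ValueError("target shape must be at least as large as the source")
    big = [[fill] * cols for _ in range(rows)]
    for i, row in enumerate(small):
        for j, value in enumerate(row):
            big[i][j] = value
    return big


def feature_alignment_loss(student_feats, teacher_feats):
    """Assumed stand-in for feature-space reuse: mean squared error
    over the dimensions the two feature vectors share."""
    shared = min(len(student_feats), len(teacher_feats))
    if shared == 0:
        raise ValueError("feature vectors must be non-empty")
    return sum(
        (student_feats[k] - teacher_feats[k]) ** 2 for k in range(shared)
    ) / shared


if __name__ == "__main__":
    family = [ModelSpec("base", 86), ModelSpec("tiny", 5), ModelSpec("small", 22)]
    for step in plan_chain_pretraining(family):
        print(step)
```

The point of the sketch is the scheduling: the cost of standard pre-training is paid once, for the smallest model, and each later model starts from what its predecessor already learned. How much this saves in practice depends on the transfer mechanisms and training budgets used in the paper, which the truncated abstract does not detail.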
