Constraint-based Pre-training: From Structured Constraints to Scalable Model Initialization

Fu Feng; Yucheng Xie; Ruixiao Shi; Jing Wang; Xin Geng

arXiv:2604.14769 · cs.LG · April 17, 2026

TL;DR

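This paper introduces WeiT, a constraint-based pre-training method that uses Kronecker-based constraints to separate reusable, size-agnostic weight templates from lightweight, size-specific weight scalers. This lets models of different sizes and architectures be initialized flexibly, with faster convergence and better performance.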

Contribution

It proposes a novel structured constraint paradigm that disentangles size-agnostic knowledge from size-specific adaptation, facilitating multi-scale model initialization.

Findings

01  WeiT achieves state-of-the-art results on diverse perception and embodied tasks.

02  The method enables faster convergence and better performance across various model scales.

03  It generalizes well to both Transformer and convolutional architectures.

Abstract

The pre-training and fine-tuning paradigm has become the dominant approach for model adaptation. However, conventional pre-training typically yields models at a fixed scale, whereas practical deployment often requires models of varying sizes, exposing its limitations when target model scales differ from those used during pre-training. To address this, we propose an innovative constraint-based pre-training paradigm that imposes structured constraints during pre-training to disentangle size-agnostic knowledge into reusable weight templates, while assigning size-specific adaptation to lightweight weight scalers, thereby reformulating variable-sized model initialization as a multi-task adaptation problem. Within this paradigm, we further introduce WeiT, which employs Kronecker-based constraints to regularize the pre-training process. Specifically, model parameters are represented as…
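
The abstract is cut off before it states how WeiT represents model parameters, so the following is only a rough sketch of the general idea behind a Kronecker-based factorization, not the paper's formulation. It assumes one shared "template" matrix and small "scaler" matrices of different shapes, and shows how their Kronecker product yields weight matrices of different sizes. The function `kron`, the variable names, the numeric values, and the choice of which factor goes on the left are all hypothetical.

```python
# Illustrative only: every name and value here is hypothetical and is
# not taken from the paper.

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    rows_a, cols_a = len(a), len(a[0])
    rows_b, cols_b = len(b), len(b[0])
    out = [[0.0] * (cols_a * cols_b) for _ in range(rows_a * rows_b)]
    for i in range(rows_a):
        for j in range(cols_a):
            for k in range(rows_b):
                for l in range(cols_b):
                    # Block (i, j) of the result is a[i][j] * b.
                    out[i * rows_b + k][j * cols_b + l] = a[i][j] * b[k][l]
    return out

# One shared 2x2 template (assumed to hold size-agnostic structure).
template = [[1.0, 2.0],
            [3.0, 4.0]]

# Two small scalers of different shapes (assumed to be size-specific).
scaler_small = [[1.0, 0.5]]          # 1x2
scaler_large = [[1.0, 0.0],
                [0.0, 1.0],
                [0.5, 0.5]]          # 3x2

# The same template expands to weight matrices of different sizes.
w_small = kron(scaler_small, template)   # 2 rows, 4 columns
w_large = kron(scaler_large, template)   # 6 rows, 4 columns
```

In this toy setup only the scalers change between target sizes while the template is reused, which mirrors the split between reusable templates and lightweight scalers that the abstract describes.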
