One-for-All Model Initialization with Frequency-Domain Knowledge
Jianlu Shen, Fu Feng, Yucheng Xie, Jiaqi Lv, Xin Geng

TL;DR
This paper introduces FRONT, a frequency-domain method that extracts and transfers a model's foundational knowledge from low-frequency weight components, enabling flexible, training-free initialization of models across various scales with improved efficiency.
Contribution
The paper reveals that a model's core knowledge is encoded in low-frequency weights and proposes FRONT, a DCT-based framework for efficient, scalable, and training-free model initialization.
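A minimal NumPy sketch of the extraction-and-transfer idea for a single weight matrix follows. It assumes an orthonormal 2D DCT-II, a hypothetical per-axis ratio `r` that selects the top-left (lowest-frequency) coefficients, and zero-padding or truncation of that block to the target shape. The helper names and the amplitude rescaling are illustrative assumptions, not the paper's implementation; FRONT itself may operate on stacked layers rather than on one matrix at a time.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k holds the k-th cosine frequency."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # position along the weight axis
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # the DC row uses the smaller normalization
    return c


def dct2(w: np.ndarray) -> np.ndarray:
    """2D DCT-II of a weight matrix (should match scipy.fft.dctn(w, norm="ortho"))."""
    m, n = w.shape
    return dct_matrix(m) @ w @ dct_matrix(n).T


def idct2(x: np.ndarray) -> np.ndarray:
    """Inverse 2D DCT; the basis is orthogonal, so its inverse is its transpose."""
    m, n = x.shape
    return dct_matrix(m).T @ x @ dct_matrix(n)


def extract_low_freq(w: np.ndarray, r: float) -> np.ndarray:
    """Hypothetical extraction: keep the top-left block covering a fraction r of each axis."""
    m, n = w.shape
    km, kn = max(1, int(np.ceil(r * m))), max(1, int(np.ceil(r * n)))
    return dct2(w)[:km, :kn]


def init_from_block(block: np.ndarray, src_shape, tgt_shape) -> np.ndarray:
    """Zero-pad or truncate the block to tgt_shape, then invert the DCT.

    The sqrt(size ratio) factor keeps a constant source matrix at the same
    value in the target; it is an illustrative choice, not taken from FRONT.
    """
    out = np.zeros(tgt_shape)
    km = min(block.shape[0], tgt_shape[0])
    kn = min(block.shape[1], tgt_shape[1])
    out[:km, :kn] = block[:km, :kn]  # frequencies outside the block stay zero
    out *= np.sqrt(np.prod(tgt_shape) / np.prod(src_shape))
    return idct2(out)


# Usage: one source matrix initializes a smaller and a larger target.
rng = np.random.default_rng(0)
w_src = rng.normal(size=(64, 32))
block = extract_low_freq(w_src, r=0.25)  # shape (16, 8)
w_small = init_from_block(block, w_src.shape, (48, 24))
w_large = init_from_block(block, w_src.shape, (96, 64))
```

Zero-padding adds no new high-frequency content, so an enlarged matrix is a cosine interpolation of the retained low-frequency structure, while truncation discards the finest retained frequencies.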
Findings
FRONT achieves state-of-the-art transfer performance.
Accelerates convergence by up to 15 times in vision tasks.
Reduces training FLOPs by 40.5% in language tasks.
Abstract
Transferring knowledge by fine-tuning large-scale pre-trained networks has become a standard paradigm for downstream tasks, yet the knowledge of a pre-trained model is tightly coupled with its monolithic architecture, which restricts flexible reuse across models of varying scales. In response to this challenge, recent approaches typically resort to either parameter selection, which fails to capture the interdependent structure of this knowledge, or parameter prediction using generative models that depend on impractical access to large network collections. In this paper, we empirically demonstrate that a model's foundational, task-agnostic knowledge, its "learngene", is encoded within the low-frequency components of its weights and can be efficiently inherited by downstream models. Based on this insight, we propose FRONT (FRequency dOmain kNowledge Transfer), a novel framework that uses the…
Peer Reviews
Decision: Submitted to ICLR 2026
1. The motivation of the paper is clear, and the writing is generally well structured. 2. The paper provides evidence that task-agnostic knowledge resides in a model's low-frequency components, an intuitively plausible and insightful finding. It also instantiates the learngene concept as low-frequency representations that can be readily extracted from the model. 3. The experiments are generally thorough and demonstrate the effectiveness of the proposed method.
Please refer to the Questions section below.
- The proposed method extracts a low-frequency learngene and uses padding or truncation to initialize a variety of ViT and CNN models. It generalizes well across different depths and widths with minimal computation.
- The proposed method speeds up convergence and reduces compute compared with training from scratch and with learned-transform baselines.
- The motivation behind the design is unclear. Why stack weights across layers and then apply a 3D DCT? What if this process were applied to each 2D weight matrix, followed by some selection procedure to obtain the learngene?
- The presentation of the experimental results is not very clear, and the experimental settings raise concerns. For instance, in Table 1 it is unclear which base model is used for initialization in each block. In addition, reporting results as 10-epoch accuracy is not ideal.
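The first weakness above asks why weights are stacked across layers before a 3D DCT. One way to picture such a pipeline is the NumPy sketch below. It assumes all stacked layers share one shape, a separable orthonormal DCT-II along depth, rows, and columns, and an illustrative pad/truncate-and-rescale step for resizing every axis; the function names and the `keep` cube size are hypothetical, not the authors' code.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k holds the k-th cosine frequency."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def dct3(t: np.ndarray) -> np.ndarray:
    """Separable 3D DCT-II of a (depth, rows, cols) weight stack."""
    d, m, n = t.shape
    x = np.einsum("ai,ijk->ajk", dct_matrix(d), t)  # transform along depth
    x = np.einsum("bj,ajk->abk", dct_matrix(m), x)  # along rows
    return np.einsum("ck,abk->abc", dct_matrix(n), x)  # along columns


def idct3(x: np.ndarray) -> np.ndarray:
    """Inverse 3D DCT: apply each transposed (orthogonal) basis in turn."""
    d, m, n = x.shape
    t = np.einsum("ai,abc->ibc", dct_matrix(d), x)
    t = np.einsum("bj,ibc->ijc", dct_matrix(m), t)
    return np.einsum("ck,ijc->ijk", dct_matrix(n), t)


def extract_3d(layers, keep):
    """Stack same-shaped layer weights along depth and keep the low-frequency cube."""
    coeffs = dct3(np.stack(layers))  # (L, m, n) -> spectrum of the same shape
    return coeffs[: keep[0], : keep[1], : keep[2]]


def expand_3d(cube, src_shape, tgt_shape):
    """Zero-pad or truncate the cube on every axis and return a (L', m', n') stack."""
    out = np.zeros(tgt_shape)
    s = tuple(slice(0, min(a, b)) for a, b in zip(cube.shape, tgt_shape))
    out[s] = cube[s]
    out *= np.sqrt(np.prod(tgt_shape) / np.prod(src_shape))  # illustrative rescale
    return idct3(out)


# Usage: 12 same-shaped layers -> a deeper/wider and a shallower/narrower stack.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 64)) for _ in range(12)]
cube = extract_3d(layers, keep=(4, 16, 16))  # hypothetical learngene size
deeper_wider = expand_3d(cube, (12, 64, 64), (16, 96, 96))
shallower = expand_3d(cube, (12, 64, 64), (6, 48, 48))
```

Because depth is transformed jointly with the weight axes, layers that vary smoothly with depth concentrate their energy in the first few depth frequencies. In the extreme case of identical layers, only depth-frequency 0 is nonzero, and under this sketch's rescaling, expanding to a deeper stack simply replicates that layer.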
1. The concrete instantiation of learngene as low-frequency components is intuitive and creative, with convincing evidence in Figure 1 demonstrating stability of low-frequency components across models and tasks. 2. FRONT's zero-cost extraction and flexible padding/truncation mechanism make it substantially more practical than training-based methods like GHN-3 and WAVE. 3. The evaluation spans ViT/ResNet/MLP/CNN architectures, multiple datasets, both vision and language domains, and systematic
1. The frequency ratio $r$ varies with model size (2.2M/3.2M/13.0M for Ti/S/B in Table 1) without principled justification, suggesting that $r$ is model-size dependent. This systematic dependence is not explored, and hyperparameters such as the decay rates $\gamma_d$ in Eq. 6 lack principled selection guidelines. 2. When comparing with training-based methods (WAVE/TLEG), note that FRONT+ also requires 150 epochs of training, so these comparisons should be evaluated separately from FRONT's direct extraction. 3. In Table 3, FRONT occasionally
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
