Less is More: Efficient Weight Farcasting with 1-Layer Neural Network

Xiao Shou, Debarun Bhattacharjya, Yanna Ding, Chen Zhao, Rui Li, and Jianxi Gao

arXiv:2505.02714 · cs.LG · May 6, 2025

Open Access

TL;DR

This paper presents a novel, efficient weight forecasting framework for deep neural networks that uses only initial and final weights, improving accuracy and reducing computational costs.

Contribution

The study introduces a new long-term time series forecasting approach for weights, along with a tailored regularizer, diverging from traditional training efficiency techniques.

Findings

01. Outperforms existing methods in forecasting accuracy
02. Reduces computational overhead during training
03. Effective on synthetic weight sequences and on real-world models such as DistilBERT

Abstract

Addressing the computational challenges inherent in training large-scale deep neural networks remains a critical endeavor in contemporary machine learning research. While previous efforts have focused on enhancing training efficiency through techniques such as gradient descent with momentum, learning rate scheduling, and weight regularization, the demand for further innovation continues to burgeon as model sizes keep expanding. In this study, we introduce a novel framework which diverges from conventional approaches by leveraging long-term time series forecasting techniques. Our method capitalizes solely on initial and final weight values, offering a streamlined alternative for complex model architectures. We also introduce a novel regularizer that is tailored to enhance the forecasting performance of our approach. Empirical evaluations conducted on synthetic weight sequences and…
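The core idea in the abstract, predicting final weights from initial weights alone, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic weight sequences come from gradient descent on a random quadratic loss, the function names `final_weights` and `fit_one_layer` are hypothetical, and a plain L2 (ridge) penalty stands in for the paper's tailored regularizer, which this excerpt does not describe.

```python
import numpy as np

def final_weights(w0, H, w_star, lr, steps):
    """Run plain gradient descent on a quadratic loss; return the last iterate."""
    w = w0.copy()
    for _ in range(steps):
        w = w - lr * (H @ (w - w_star))  # gradient of 0.5*(w-w*)^T H (w-w*)
    return w

def fit_one_layer(X, Y, l2=1e-3):
    """Closed-form ridge fit of Y ~ X @ W + b, i.e. a single linear layer.

    The L2 penalty is a stand-in, not the regularizer proposed in the paper.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    reg = l2 * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0                              # leave the bias unpenalized
    params = np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ Y)
    return params[:-1], params[-1]

# Synthetic setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
d, n_train, n_test, steps = 5, 200, 50, 20
M = rng.normal(size=(d, d))
H = M @ M.T / d + 0.1 * np.eye(d)        # symmetric positive-definite Hessian
w_star = rng.normal(size=d)              # minimizer of the quadratic loss
lr = 1.0 / np.linalg.eigvalsh(H).max()   # step size that keeps descent stable

# Each row is one training run: initial weights and the weights after `steps`.
W0 = rng.normal(size=(n_train + n_test, d))
WT = np.stack([final_weights(w, H, w_star, lr, steps) for w in W0])

# Fit on (initial, final) pairs only; intermediate iterates are never used.
W, b = fit_one_layer(W0[:n_train], WT[:n_train])
pred = W0[n_train:] @ W + b

forecast_mse = float(np.mean((pred - WT[n_train:]) ** 2))
baseline_mse = float(np.mean((W0[n_train:] - WT[n_train:]) ** 2))
print(f"one-layer forecast MSE:   {forecast_mse:.3e}")
print(f"persistence baseline MSE: {baseline_mse:.3e}")
```

In this toy setting the final weights are an exact affine function of the initial weights, so a single linear layer is sufficient by construction. Real training trajectories are not affine in general, which is where the choice of forecaster and regularizer would matter.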

Taxonomy

Topics: Vehicle License Plate Recognition

Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Dropout · Layer Normalization · Attention Dropout · Softmax · Residual Connection · WordPiece · Linear Layer