Sign Lock-In: Randomly Initialized Weight Signs Persist and Bottleneck Sub-Bit Model Compression
Akira Sakai, Yuma Ichikawa

TL;DR
This paper investigates how sign bits in sub-bit model compression are largely inherited from initialization and introduces methods to reduce sign flips, improving compression efficiency with minimal impact on model performance.
Contribution
It formalizes sign lock-in behavior with a theoretical analysis and proposes a new initialization and regularizer to reduce sign flips in compressed models.
Findings
Most weights retain their initial signs during training.
Sign flips are rare and primarily occur near zero boundary crossings.
Proposed methods reduce flip rate to about 10^{-3} with minimal perplexity increase.
Abstract
Sub-bit model compression seeks storage below one bit per weight; as magnitudes are aggressively compressed, the sign bit becomes a fixed-cost bottleneck. Across Transformers, CNNs, and MLPs, learned sign matrices resist low-rank approximation and are spectrally indistinguishable from an i.i.d. Rademacher baseline. Despite this apparent randomness, most weights retain their initialization signs; flips primarily occur via rare near-zero boundary crossings, suggesting that sign-pattern randomness is largely inherited from initialization. We formalize this behavior with sign lock-in theory, a stopping-time analysis of sign flips under SGD noise. Under bounded updates and a rare re-entry condition into a small neighborhood around zero, the number of effective sign flips exhibits a geometric tail. Building on this mechanism, we introduce a gap-based initialization and a lightweight…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsStochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks
