Hybrid Gated Flow (HGF): Stabilizing 1.58-bit LLMs via Selective Low-Rank Correction
David Alejandro Trejo Pizzo

TL;DR
This paper introduces Hybrid Gated Flow (HGF), an architecture that pairs a 1.58-bit ternary backbone with a gated, low-rank FP16 correction path. On TinyStories it recovers roughly 55% of the validation-loss gap between pure ternary quantization and an FP16 baseline, with minimal memory overhead.
Contribution
HGF is a dual-stream architecture that stabilizes 1.58-bit LLMs and recovers about 55% of the quality lost to ternary quantization with minimal additional memory. The authors also report that the approach scales to large models.
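To make the "minimal additional memory" claim concrete, the sketch below compares the storage of one ternary weight matrix against a rank-r FP16 correction. The layer size (1024 x 1024) and rank (8) are hypothetical values chosen for illustration, not figures from the paper, and the ternary cost uses the idealized log2(3) ≈ 1.58 bits per weight that gives "1.58-bit" models their name.

```python
import math

def hgf_layer_bits(d_in: int, d_out: int, rank: int) -> tuple[float, int]:
    """Return (ternary backbone bits, FP16 low-rank correction bits) for one layer."""
    # Idealized cost: each ternary weight carries log2(3) bits of information.
    # Practical packings use slightly more (e.g. 5 weights per byte = 1.6 bits).
    ternary_bits = d_in * d_out * math.log2(3)
    # A rank-r correction stores two factors, (rank x d_in) and (d_out x rank), in FP16.
    correction_bits = rank * (d_in + d_out) * 16
    return ternary_bits, correction_bits

# Hypothetical layer: 1024 x 1024 weights with a rank-8 correction.
ternary, correction = hgf_layer_bits(1024, 1024, 8)
fp16_bits = 1024 * 1024 * 16
overhead = correction / ternary

print(f"correction overhead relative to ternary backbone: {overhead:.1%}")
print(f"hybrid size relative to dense FP16: {(ternary + correction) / fp16_bits:.1%}")
```

Because the correction grows with rank times (d_in + d_out) rather than d_in times d_out, its relative cost shrinks as layers get wider at a fixed rank.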
Findings
On TinyStories, HGF achieves a validation loss of 0.9306, compared with 1.0294 for BitNet and 0.8490 for the FP16 baseline.
HGF recovers approximately 55% of the quality gap between ternary and FP16 models, measured in validation loss.
Quantization acts as a form of structural regularization, enhancing training stability.
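The 55% figure can be checked from the reported losses, using the FP16 baseline loss of 0.8490 given in the abstract. The short calculation below assumes the gap is measured as a simple difference in validation loss; the function name is illustrative.

```python
# Validation losses quoted on this page (TinyStories).
FP16_LOSS = 0.8490    # full-precision baseline
BITNET_LOSS = 1.0294  # pure ternary model
HGF_LOSS = 0.9306     # hybrid model

def gap_recovered(ternary: float, hybrid: float, fp16: float) -> float:
    """Fraction of the ternary-to-FP16 loss gap closed by the hybrid model."""
    return (ternary - hybrid) / (ternary - fp16)

recovered = gap_recovered(BITNET_LOSS, HGF_LOSS, FP16_LOSS)
print(f"gap recovered: {recovered:.1%}")  # 0.0988 / 0.1804, about 54.8%
```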
Abstract
The deployment of Large Language Models (LLMs) on edge devices is fundamentally constrained by the "Memory Wall" -- a hardware limitation where memory bandwidth, not compute, becomes the bottleneck. Recent 1.58-bit quantization techniques (e.g., BitNet b1.58) dramatically reduce memory footprint but typically incur a perplexity degradation of 20-25% compared to FP16 baselines. In this work, we introduce Hybrid Gated Flow (HGF), a dual-stream architecture that couples a 1.58-bit ternary backbone with a learnable, low-rank FP16 correction path controlled by adaptive gates. Through extensive experiments on the TinyStories dataset across two training regimes (2500 and 3500 steps), we demonstrate that HGF 5.4 achieves a validation loss of 0.9306 compared to BitNet's 1.0294, recovering approximately 55% of the quality gap between pure ternary quantization and the FP16 baseline (0.8490).…
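The dual-stream idea in the abstract can be sketched as a single linear layer. This is a minimal illustration under stated assumptions, not the paper's implementation: the absmean ternary quantizer follows the BitNet b1.58 recipe, while the scalar sigmoid gate and the names `ternarize`, `hgf_linear`, `down`, `up`, and `gate_logit` are hypothetical. Plain Python lists stand in for tensors, so the FP16 storage of the correction factors is not modeled.

```python
import math

def ternarize(weights):
    """Absmean quantization: map each weight to {-1, 0, +1} with one shared scale."""
    flat = [abs(w) for row in weights for w in row]
    scale = sum(flat) / len(flat) + 1e-8  # mean absolute value; epsilon avoids 0/0
    quantized = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return quantized, scale

def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def hgf_linear(x, weights, down, up, gate_logit):
    """Ternary backbone output plus a gated rank-r correction (assumed form)."""
    quantized, scale = ternarize(weights)
    backbone = [scale * y for y in matvec(quantized, x)]
    correction = matvec(up, matvec(down, x))     # d_in -> r -> d_out
    gate = 1.0 / (1.0 + math.exp(-gate_logit))   # sigmoid keeps the gate in (0, 1)
    return [b + gate * c for b, c in zip(backbone, correction)]

# Toy sizes: d_in = 4, d_out = 3, rank r = 1.
W = [[0.9, -0.1, 0.4, -0.8],
     [0.05, 0.7, -0.6, 0.2],
     [-0.3, 0.0, 0.5, -0.9]]
down = [[0.1, 0.2, -0.1, 0.0]]   # r x d_in
up = [[0.5], [-0.2], [0.3]]      # d_out x r
x = [1.0, 2.0, -1.0, 0.5]

y_closed = hgf_linear(x, W, down, up, gate_logit=-20.0)  # gate near 0: backbone only
y_open = hgf_linear(x, W, down, up, gate_logit=0.0)      # gate = 0.5
print(y_closed)
print(y_open)
```

With the gate nearly closed the layer behaves like the pure ternary backbone; opening the gate mixes in the low-rank path, which is the mechanism the abstract credits for narrowing the gap to FP16.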
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Natural Language Processing Techniques
