Hybrid Gated Flow (HGF): Stabilizing 1.58-bit LLMs via Selective Low-Rank Correction

David Alejandro Trejo Pizzo

arXiv:2602.05269 · cs.LG · February 6, 2026

TL;DR

This paper introduces Hybrid Gated Flow (HGF), an architecture that pairs a 1.58-bit ternary backbone with a gated, low-rank FP16 correction path. On TinyStories, HGF recovers about 55% of the validation-loss gap between a pure ternary model (BitNet) and an FP16 baseline, at a small memory overhead.
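To get a feel for what the memory overhead of a low-rank FP16 path might look like, here is a rough back-of-the-envelope sketch. It assumes a single linear layer whose ternary weights cost log2(3) ≈ 1.585 bits each and whose correction path consists of two FP16 factors of rank `rank`. The function name, the layer width, and the rank are all hypothetical and not taken from the paper, and gate parameters are ignored.

```python
import math

BITS_PER_TERNARY_WEIGHT = math.log2(3)  # about 1.585 bits, the "1.58-bit" figure
BITS_PER_FP16_WEIGHT = 16


def lowrank_overhead(d_in, d_out, rank):
    """Ratio of low-rank FP16 storage to ternary backbone storage for one layer.

    Assumes factors of shape (rank x d_in) and (d_out x rank); these shapes are
    an illustration, not the paper's stated configuration.
    """
    backbone_bits = d_in * d_out * BITS_PER_TERNARY_WEIGHT
    correction_bits = rank * (d_in + d_out) * BITS_PER_FP16_WEIGHT
    return correction_bits / backbone_bits


# Hypothetical layer: width 4096, rank 8.
ratio = lowrank_overhead(4096, 4096, 8)
print(f"correction path adds about {ratio:.1%} on top of the ternary weights")
```

Under these assumptions the overhead grows linearly with the rank and shrinks as the layer gets wider, so the same rank is proportionally cheaper in a larger model.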

Contribution

HGF is a dual-stream architecture that stabilizes training of 1.58-bit LLMs and recovers roughly half of the accuracy lost to ternary quantization with minimal additional memory. The paper also reports that the approach scales to larger models.
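The dual-stream idea can be illustrated with a minimal sketch in plain Python. This is not the author's implementation: the absmean ternarization follows the recipe described for BitNet b1.58, the scalar sigmoid gate is an assumed gate form, and all names (`ternarize`, `hgf_linear`, `a_down`, `b_up`, `gate_logit`) are hypothetical. Python floats stand in for FP16 here.

```python
import math


def ternarize(weights):
    """Quantize a weight matrix to {-1, 0, +1} with one absmean scale.

    Assumption: absmean scaling as described for BitNet b1.58; the HGF
    backbone may differ in detail.
    """
    magnitudes = [abs(w) for row in weights for w in row]
    scale = sum(magnitudes) / len(magnitudes) + 1e-8  # avoid division by zero
    ternary = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return ternary, scale


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def hgf_linear(x, weights, a_down, b_up, gate_logit):
    """Hypothetical dual-stream layer: ternary backbone plus gated low-rank path.

    x          : input vector, length d_in
    weights    : latent full-precision matrix, d_out x d_in (ternarized on the fly)
    a_down     : low-rank down-projection, r x d_in
    b_up       : low-rank up-projection, d_out x r
    gate_logit : scalar; sigmoid(gate_logit) is an assumed gate form
    """
    ternary, scale = ternarize(weights)
    backbone = [scale * y for y in matvec(ternary, x)]
    correction = matvec(b_up, matvec(a_down, x))
    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    return [yb + gate * yc for yb, yc in zip(backbone, correction)]


# Toy usage with d_in = 3, d_out = 2, rank r = 1 (all values made up).
w = [[0.9, -0.05, -1.2], [0.02, 0.6, -0.7]]
a = [[0.1, 0.0, 0.0]]
b = [[1.0], [1.0]]
y = hgf_linear([1.0, 2.0, 3.0], w, a, b, gate_logit=0.0)
```

In this sketch a strongly negative `gate_logit` drives the gate toward zero, so the layer falls back to the pure ternary backbone, while a larger gate lets more of the low-rank correction through.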

Findings

01. HGF achieves a validation loss of 0.9306, outperforming BitNet's 1.0294.

02. HGF recovers approximately 55% of the quality gap between ternary and FP16 models.

03. Quantization acts as a form of structural regularization, enhancing training stability.

Abstract

The deployment of Large Language Models (LLMs) on edge devices is fundamentally constrained by the "Memory Wall" -- a hardware limitation where memory bandwidth, not compute, becomes the bottleneck. Recent 1.58-bit quantization techniques (e.g., BitNet b1.58) dramatically reduce memory footprint but typically incur a perplexity degradation of 20-25% compared to FP16 baselines. In this work, we introduce Hybrid Gated Flow (HGF), a dual-stream architecture that couples a 1.58-bit ternary backbone with a learnable, low-rank FP16 correction path controlled by adaptive gates. Through extensive experiments on the TinyStories dataset across two training regimes (2500 and 3500 steps), we demonstrate that HGF 5.4 achieves a validation loss of 0.9306 compared to BitNet's 1.0294, recovering approximately 55% of the quality gap between pure ternary quantization and the FP16 baseline (0.8490).…
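The "approximately 55%" figure can be checked directly from the three validation losses quoted in the abstract. The short sketch below does that arithmetic. It also converts loss differences into perplexity ratios, under the assumption (not stated in this excerpt) that the reported loss is a mean per-token cross-entropy in nats. The function name is hypothetical.

```python
import math

LOSS_FP16 = 0.8490    # FP16 baseline, from the abstract
LOSS_BITNET = 1.0294  # pure ternary (BitNet), from the abstract
LOSS_HGF = 0.9306     # HGF 5.4, from the abstract


def gap_recovered(loss_quantized, loss_hybrid, loss_full_precision):
    """Fraction of the quantized-to-full-precision loss gap closed by the hybrid."""
    full_gap = loss_quantized - loss_full_precision
    return (loss_quantized - loss_hybrid) / full_gap


fraction = gap_recovered(LOSS_BITNET, LOSS_HGF, LOSS_FP16)
print(f"gap recovered: {fraction:.1%}")  # 54.8%

# Assumption: loss is cross-entropy in nats, so perplexity = exp(loss).
ppl_increase_bitnet = math.exp(LOSS_BITNET - LOSS_FP16) - 1.0
ppl_increase_hgf = math.exp(LOSS_HGF - LOSS_FP16) - 1.0
```

Under that assumption, the BitNet loss corresponds to a perplexity roughly 20% above the FP16 baseline, in line with the 20-25% range quoted in the abstract, and the HGF loss to roughly 8.5% above it.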

Code & Models

Models

Hugging Face: OpenCoresAI/HGF-60M-TinyStories

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Natural Language Processing Techniques