NativeTernary: A Self-Delimiting Binary Encoding with Unary Run-Length Hierarchy Markers for Ternary Neural Network Weights, Structured Data, and General Computing Infrastructure
Maharshi Savdhariya

TL;DR
NativeTernary introduces a compact binary encoding for ternary neural network weights that stores each weight in exactly 2 bits, cuts framing overhead well below that of GGUF, and encodes and decodes at tens of MB/s on commodity hardware.
Contribution
It presents a novel self-delimiting binary encoding with run-length markers specifically designed for ternary weights, enabling smaller and more resilient model storage.
Findings
Encodes ternary weights at exactly 2 bits per weight, 1.31x smaller than GGUF Q2_K.
Reduces boundary and framing overhead by 460x compared to GGUF (91 bytes vs ~42KB of GGUF tensor headers).
Achieves encoding throughput of 47-69 MB/s and decoding of 35-45 MB/s.
Abstract
BitNet b1.58 (Ma et al., 2024) demonstrates that large language models can operate entirely on ternary weights {-1, 0, +1}, yet no native binary wire format exists for such models. NativeTernary closes this gap. Benchmarked against GGUF on the real BitNet b1.58 2B4T architecture (24 layers, ~170 tensors, 2B parameters), NativeTernary encodes ternary weights at exactly 2.000 bits per weight, which is 1.31x smaller than GGUF Q2_K and 4.0x smaller than GGUF int8, while reducing boundary and framing overhead by 460x (91 bytes vs ~42KB of GGUF tensor headers). Encode throughput: 47-69 MB/s. Decode throughput: 35-45 MB/s on commodity hardware. The decoder is a 10-line stateless state machine resilient to bitstream corruption.
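To make the 2.000 bits-per-weight figure concrete, here is a minimal Python sketch of plain 2-bit ternary packing. It is not the NativeTernary format: this summary does not give the paper's code assignment or marker scheme, so the mapping in `CODE`, the helper names `pack_ternary` and `unpack_ternary`, and the idea that the unused pattern `0b11` could be kept free for markers are all assumptions made for illustration.

```python
# Hypothetical 2-bit codes; the paper's real assignment is not given in this summary.
# Pattern 0b11 is deliberately unused (assumed to be free for boundary markers).
CODE = {-1: 0b00, 0: 0b01, 1: 0b10}
DECODE = {bits: weight for weight, bits in CODE.items()}


def pack_ternary(weights):
    """Pack ternary weights four per byte, first weight in the high bits."""
    out = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        shift = 6 - 2 * (i % 4)
        out[i // 4] |= CODE[w] << shift
    return bytes(out)


def unpack_ternary(data, count):
    """Recover `count` weights from bytes produced by pack_ternary."""
    return [
        DECODE[(data[i // 4] >> (6 - 2 * (i % 4))) & 0b11]
        for i in range(count)
    ]


weights = [-1, 0, 1, 1, 0, 0, -1, 1]
packed = pack_ternary(weights)
restored = unpack_ternary(packed, len(weights))

# Payload cost when the weight count is a multiple of four.
bits_per_weight = 8 * len(packed) / len(weights)
# An int8 representation spends 8 bits per weight, hence the 4.0x ratio.
int8_ratio = 8 / bits_per_weight
```

Under these assumptions, eight weights occupy two bytes, giving 2.0 bits per weight, and the 4.0x figure against int8 follows directly from 8 / 2. The 1.31x figure against Q2_K depends on GGUF's block layout and is not reproduced here.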
