TL;DR
This paper introduces a novel multi-Boolean architecture for large language models that allows direct finetuning in the Boolean domain, significantly reducing complexity and outperforming existing low-bit methods.
Contribution
It proposes a new framework for LLMs using multi-kernel Boolean parameters, enabling direct Boolean finetuning without latent weights and improving both efficiency and representational capacity.
Findings
Outperforms recent ultra low-bit quantization techniques
Enables direct finetuning in the Boolean domain
Reduces complexity during finetuning and inference
Abstract
Weight binarization has emerged as a promising strategy to reduce the complexity of large language models (LLMs). Existing approaches fall into post-training binarization, which is simple but causes severe performance loss, and training-aware methods, which depend on full-precision latent weights, adding complexity and limiting efficiency. We propose a novel framework that represents LLMs with multi-kernel Boolean parameters and, for the first time, enables direct finetuning of LLMs in the Boolean domain, eliminating the need for latent weights. This enhances representational capacity and dramatically reduces complexity during both finetuning and inference. Extensive experiments across diverse LLMs show our method outperforms recent ultra low-bit quantization and binarization techniques.
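The abstract does not say how the multi-kernel Boolean parameters are built or trained, so the following is only an illustrative sketch of why several Boolean kernels can carry more information than one. It uses greedy residual binarization, a generic technique swapped in for illustration and not the paper's method: a real-valued weight vector is approximated as a sum of scaled {-1, +1} kernels, with each kernel fitted to what the previous ones left unexplained. The names `residual_binarize`, `reconstruct`, and `squared_error`, and the toy weights, are hypothetical.

```python
def residual_binarize(weights, num_kernels):
    """Greedily approximate `weights` as sum_k scale_k * kernel_k, kernel_k in {-1, +1}.

    Assumption: this is plain residual binarization, used here only to
    illustrate the multi-kernel idea; it is not the paper's procedure.
    """
    residual = list(weights)
    scales, kernels = [], []
    for _ in range(num_kernels):
        # The sign pattern of the residual is the +/-1 kernel that best aligns with it.
        kernel = [1 if r >= 0 else -1 for r in residual]
        # For that kernel, the mean absolute residual is the least-squares scale.
        scale = sum(abs(r) for r in residual) / len(residual)
        scales.append(scale)
        kernels.append(kernel)
        # Subtract the part now explained; the next kernel fits the remainder.
        residual = [r - scale * b for r, b in zip(residual, kernel)]
    return scales, kernels


def reconstruct(scales, kernels):
    """Sum the scaled Boolean kernels back into a real-valued vector."""
    size = len(kernels[0])
    return [sum(s * k[i] for s, k in zip(scales, kernels)) for i in range(size)]


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical toy weights, chosen only for demonstration.
weights = [0.9, -0.4, 0.1, -1.2, 0.6, -0.05]
errors = []
for num_kernels in (1, 2, 3):
    scales, kernels = residual_binarize(weights, num_kernels)
    errors.append(squared_error(weights, reconstruct(scales, kernels)))
    print(num_kernels, "kernel(s): squared error =", round(errors[-1], 4))
```

In this sketch the approximation error cannot increase as kernels are added, which is one way to read the abstract's claim about enhanced representational capacity. Note that the sketch starts from real-valued weights; it says nothing about how finetuning proceeds directly in the Boolean domain without latent weights.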