FlexiBit: Fully Flexible Precision Bit-parallel Accelerator Architecture for Arbitrary Mixed Precision AI
Faraz Tahmasebi, Yian Wang, Benji Y.H. Huang, Hyoukjun Kwon

TL;DR
FlexiBit is an accelerator architecture that supports arbitrary mixed precisions and number formats for AI workloads, improving hardware utilization and performance per area on large language models such as GPT-3.
Contribution
The paper introduces FlexiBit, a fully flexible, bit-parallel accelerator architecture that supports arbitrary precisions and formats, removing the restriction of existing hardware to a limited set of power-of-two precisions and standard formats.
Findings
Achieves 1.66x higher performance per area on GPT-3 than a Tensor Core-like architecture.
Achieves 1.62x higher performance per area than BitFusion.
Achieves 3.9x higher performance per area than bit-serial architectures.
Abstract
Recent research has shown that large language models (LLMs) can utilize low-precision floating point (FP) quantization to deliver high efficiency while maintaining original model accuracy. In particular, recent works have shown the effectiveness of non-power-of-two precisions, such as FP6 and FP5, and diverse sensitivity to low-precision arithmetic of LLM layers, which motivates mixed precision arithmetic including non-power-of-two precisions in LLMs. Although low-precision algorithmically leads to low computational overheads, such benefits cannot be fully exploited due to hardware constraints that support a limited set of power-of-two precisions (e.g., FP8, 16, 32, and 64 in NVIDIA H100 Tensor Core). In addition, the hardware compute units are designed to support standard formats (e.g., E4M3 and E5M2 for FP8). Such practices require re-designing the hardware whenever new precision and…
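To make the notion of arbitrary precisions and formats concrete, the following is a minimal Python sketch of decoding a bit pattern under a configurable sign/exponent/mantissa split (for example E3M2 or E2M3 for FP6). It is not FlexiBit's datapath or the authors' code. The function name `decode_fp`, the IEEE-style bias of 2^(E-1) - 1, the subnormal handling, and the absence of Inf/NaN encodings are all assumptions made for illustration.

```python
def decode_fp(bits: int, exp_bits: int, man_bits: int) -> float:
    """Decode a (1 + exp_bits + man_bits)-bit pattern as a float.

    Assumptions (illustrative, not taken from the paper):
      - layout is sign | exponent | mantissa, most significant bit first
      - exponent bias is 2**(exp_bits - 1) - 1 (IEEE-style)
      - exponent field 0 encodes subnormals
      - no bit patterns are reserved for Inf or NaN
    """
    if exp_bits < 1 or man_bits < 0:
        raise ValueError("need exp_bits >= 1 and man_bits >= 0")
    total_bits = 1 + exp_bits + man_bits
    if not 0 <= bits < (1 << total_bits):
        raise ValueError(f"bit pattern does not fit in {total_bits} bits")

    # Split the pattern into its three fields.
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)

    bias = (1 << (exp_bits - 1)) - 1
    fraction = mantissa / (1 << man_bits)

    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of (1 - bias).
        return sign * fraction * 2.0 ** (1 - bias)
    # Normal: implicit leading 1.
    return sign * (1.0 + fraction) * 2.0 ** (exponent - bias)


if __name__ == "__main__":
    # The same 6-bit pattern means different values under different formats.
    pattern = 0b011111
    print("E3M2:", decode_fp(pattern, exp_bits=3, man_bits=2))  # 28.0
    print("E2M3:", decode_fp(pattern, exp_bits=2, man_bits=3))  # 7.5
```

The same 6-bit pattern decodes to 28.0 as E3M2 and 7.5 as E2M3, which illustrates why compute units built around one fixed field split cannot directly serve another format of the same width.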
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Image Processing Techniques and Applications · Parallel Computing and Optimization Techniques
