DHFP-PE: Dual-Precision Hybrid Floating Point Processing Element for AI Acceleration
Shubham Kumar, Vijay Pratap Sharma, Vaibhav Neema, and Santosh Kumar Vishvakarma

TL;DR
This paper introduces a dual-precision floating-point MAC unit optimized for AI workloads. A bit-partitioning technique lets one multiplier serve both FP8 and FP4 modes, which yields an area- and power-efficient design for low-power edge computing.
Contribution
A novel dual-precision floating-point MAC architecture supporting FP8 and FP4 formats with a bit-partitioning technique for high hardware utilization.
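For reference, the FP8 variants (E4M3, E5M2) and FP4 variants (E2M1, E1M2) named in the abstract can be decoded with a short Python sketch. The exponent biases and special-value rules below are assumptions borrowed from the OCP 8-bit floating-point and Microscaling (MX) conventions for E4M3, E5M2 and E2M1; the paper's own encodings may differ. E1M2 has no widely used standard, so its bias of 1 is purely illustrative. MiniFloat, decode and unpack_fp4_pair are hypothetical names, and the low-nibble-first packing of two FP4 values into one byte (the 2 × E2M1 and 2 × E1M2 modes) is likewise assumed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniFloat:
    """Hypothetical descriptor for a sign/exponent/mantissa format."""
    name: str
    exp_bits: int
    man_bits: int
    bias: int
    specials: str  # "ieee": max exponent encodes Inf/NaN; "fn": only S.1111.111 is NaN; "none"


# E4M3, E5M2 and E2M1 follow OCP FP8 / MX conventions (assumed, not from the paper).
E4M3 = MiniFloat("E4M3", 4, 3, 7, "fn")
E5M2 = MiniFloat("E5M2", 5, 2, 15, "ieee")
E2M1 = MiniFloat("E2M1", 2, 1, 1, "none")
# E1M2 has no common standard; a bias of 1 is an assumption.
E1M2 = MiniFloat("E1M2", 1, 2, 1, "none")


def decode(bits: int, fmt: MiniFloat) -> float:
    """Decode one bit pattern of `fmt` into a Python float."""
    width = 1 + fmt.exp_bits + fmt.man_bits
    if not 0 <= bits < (1 << width):
        raise ValueError(f"{bits:#x} does not fit in {fmt.name}")
    sign = -1.0 if (bits >> (width - 1)) & 1 else 1.0
    exp = (bits >> fmt.man_bits) & ((1 << fmt.exp_bits) - 1)
    man = bits & ((1 << fmt.man_bits) - 1)
    exp_max = (1 << fmt.exp_bits) - 1
    man_max = (1 << fmt.man_bits) - 1
    if fmt.specials == "ieee" and exp == exp_max:
        return sign * math.inf if man == 0 else math.nan
    if fmt.specials == "fn" and exp == exp_max and man == man_max:
        return math.nan
    frac = man / (1 << fmt.man_bits)
    if exp == 0:
        # Subnormal: no hidden 1, exponent pinned at 1 - bias.
        return sign * frac * 2.0 ** (1 - fmt.bias)
    # Normal: hidden leading 1.
    return sign * (1.0 + frac) * 2.0 ** (exp - fmt.bias)


def unpack_fp4_pair(byte: int, fmt: MiniFloat) -> tuple[float, float]:
    """Split one byte into two FP4 values (low nibble first, an assumed ordering)."""
    if 1 + fmt.exp_bits + fmt.man_bits != 4:
        raise ValueError("unpack_fp4_pair expects a 4-bit format")
    return decode(byte & 0xF, fmt), decode((byte >> 4) & 0xF, fmt)


print([decode(b, E2M1) for b in range(8)])
# [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

With these assumed biases, E2M1 covers ±{0, 0.5, 1, 1.5, 2, 3, 4, 6}, while E1M2 covers a uniform grid from 0 to 1.75 in steps of 0.25, trading dynamic range for finer resolution.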
Findings
Achieves an operating frequency of 1.94 GHz in 28 nm technology.
Reduces area by up to 60.4% and power by 86.6% compared to existing designs.
Supports energy-efficient AI inference with high throughput.
Abstract
The rapid adoption of low-precision arithmetic in artificial intelligence and edge computing has created a strong demand for energy-efficient and flexible floating-point multiply-accumulate (MAC) units. This paper presents a dual-precision floating-point MAC processing element supporting FP8 (E4M3, E5M2) and FP4 (2 × E2M1, 2 × E1M2) formats, specifically optimized for low-power and high-throughput AI workloads. The proposed architecture employs a novel bit-partitioning technique that enables a single 4-bit unit multiplier to operate either as a standard 4 × 4 multiplier for FP8 or as two parallel 2 × 2 multipliers for 2-bit operands, achieving maximum hardware utilization without duplicating logic. Implemented in 28 nm technology, the proposed PE achieves an operating frequency of 1.94 GHz with an area of 0.00396 mm² and power consumption of 2.13 mW, resulting in up to 60.4% area…
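The split-mode behavior described in the abstract can be illustrated with a bit-level Python model of a 4 × 4 AND-array multiplier. In E4M3 the significand including the hidden bit (1.mmm) is 4 bits wide, and in E2M1 it (1.m) is 2 bits wide, which matches the 4 × 4 and 2 × 2 operand widths. The partitioning scheme here is one common approach and is assumed rather than taken from the paper: partial products that pair a low-half bit of one operand with a high-half bit of the other are gated off, so the two diagonal 2 × 2 blocks compute independent products in disjoint bit fields. The names partitioned_mul4 and mul2x2_pair are hypothetical, and the model covers only unsigned significand multiplication, not exponent addition, normalization, or accumulation.

```python
def partitioned_mul4(a: int, b: int, split: bool) -> int:
    """Bit-level model of a 4x4 AND-array multiplier with an optional split mode.

    split=False: all 16 partial products are summed, giving the full 8-bit a*b.
    split=True:  partial products that mix the low 2-bit half of one operand with
                 the high 2-bit half of the other are gated off, so the array
                 computes a[3:2]*b[3:2] in bits 7..4 and a[1:0]*b[1:0] in bits 3..0.
    """
    if not (0 <= a < 16 and 0 <= b < 16):
        raise ValueError("operands must be 4-bit unsigned")
    acc = 0
    for i in range(4):
        for j in range(4):
            if split and (i < 2) != (j < 2):
                continue  # cross-lane partial product: gated off in split mode
            pp = (a >> i) & (b >> j) & 1  # one AND gate: bit i of a, bit j of b
            acc += pp << (i + j)  # weight 2^(i+j), as in the adder tree
    return acc


def mul2x2_pair(a_hi: int, a_lo: int, b_hi: int, b_lo: int) -> tuple[int, int]:
    """Two independent 2-bit x 2-bit products on the shared 4x4 array."""
    for v in (a_hi, a_lo, b_hi, b_lo):
        if not 0 <= v < 4:
            raise ValueError("lane operands must be 2-bit unsigned")
    p = partitioned_mul4((a_hi << 2) | a_lo, (b_hi << 2) | b_lo, split=True)
    # Each 2x2 product is at most 3*3 = 9 < 16, so the two lanes cannot overlap.
    return p >> 4, p & 0xF


print(partitioned_mul4(13, 11, split=False))  # 143
print(mul2x2_pair(3, 2, 3, 3))  # (9, 6)
```

In this sketch the 16 AND gates and the adder tree are reused in both modes; the only extra logic is a mask on the 8 cross-lane partial products, and no carry can propagate from the low lane into the high lane because the low product never exceeds 4 bits.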
