DHFP-PE: Dual-Precision Hybrid Floating Point Processing Element for AI Acceleration

Shubham Kumar, Vijay Pratap Sharma, Vaibhav Neema, and Santosh Kumar Vishvakarma

arXiv:2604.04507 · cs.AR · April 10, 2026

TL;DR

This paper introduces a dual-precision (FP8/FP4) floating-point MAC unit for AI workloads. A bit-partitioned multiplier lets the same hardware serve both precisions, which improves area and power efficiency for low-power edge computing.

Contribution

A novel dual-precision floating-point MAC architecture supporting FP8 and FP4 formats with a bit-partitioning technique for high hardware utilization.
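To make the format names concrete, here is a minimal sketch of how the four encodings named in the abstract (E4M3, E5M2, E2M1, E1M2) divide a word into sign, exponent, and mantissa fields. It relies only on the usual naming convention, in which ExMy means one sign bit, x exponent bits, and y mantissa bits. The helper names `FORMATS` and `split_fields` are hypothetical, and the sketch deliberately does not decode numeric values, because the bias and special-value rules the paper uses are not stated here.

```python
# Field widths implied by the format names: (exponent bits, mantissa bits).
# One sign bit is always assumed in front, per the usual ExMy convention.
FORMATS = {
    "E4M3": (4, 3),  # FP8
    "E5M2": (5, 2),  # FP8
    "E2M1": (2, 1),  # FP4
    "E1M2": (1, 2),  # FP4
}


def split_fields(bits: int, fmt: str) -> tuple[int, int, int]:
    """Split a raw encoding into (sign, exponent, mantissa) bit fields."""
    exp_bits, man_bits = FORMATS[fmt]
    width = 1 + exp_bits + man_bits
    if not 0 <= bits < (1 << width):
        raise ValueError(f"{fmt} encodings are {width} bits wide")
    sign = bits >> (width - 1)
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)
    return sign, exponent, mantissa


if __name__ == "__main__":
    print(split_fields(0b10111010, "E4M3"))  # 8-bit word
    print(split_fields(0b0111, "E2M1"))      # 4-bit word
```

Both FP8 formats occupy 8 bits and both FP4 formats occupy 4 bits, so two FP4 operands fit in the space of one FP8 operand.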

Findings

01. Achieves a 1.94 GHz operating frequency in 28 nm technology.

02. Reduces area by up to 60.4% and power by 86.6% compared to existing designs.

03. Supports energy-efficient AI inference with high throughput.

Abstract

The rapid adoption of low-precision arithmetic in artificial intelligence and edge computing has created a strong demand for energy-efficient and flexible floating-point multiply-accumulate (MAC) units. This paper presents a dual-precision floating-point MAC processing element supporting FP8 (E4M3, E5M2) and FP4 (2 x E2M1, 2 x E1M2) formats, specifically optimized for low-power and high-throughput AI workloads. The proposed architecture employs a novel bit-partitioning technique that enables a single 4-bit unit multiplier to operate either as a standard 4 x 4 multiplier for FP8 or as two parallel 2 x 2 multipliers for 2-bit operands, achieving maximum hardware utilization without duplicating logic. Implemented in 28 nm technology, the proposed PE achieves an operating frequency of 1.94 GHz with an area of 0.00396 mm^2 and power consumption of 2.13 mW, resulting in up to 60.4% area…
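The dual-mode multiplier described above can be illustrated with a small behavioral model. This is a sketch under assumptions, not the paper's circuit: it uses the textbook decomposition of a 4 x 4 product into four 2 x 2 partial products, and it assumes that dual mode keeps the two "diagonal" partial products and drops the cross terms. The function name `partitioned_mul` and the lane packing (low lane in bits 1..0, high lane in bits 3..2) are hypothetical.

```python
def partitioned_mul(a: int, b: int, dual: bool):
    """Behavioral model of a 4-bit multiplier with an assumed dual 2x2 mode.

    a, b: 4-bit unsigned operands.
    dual=False: return the full 8-bit product a * b.
    dual=True:  treat a and b as two packed 2-bit lanes and return
                (low_lane_product, high_lane_product).
    """
    if not (0 <= a < 16 and 0 <= b < 16):
        raise ValueError("operands must be 4-bit unsigned values")

    # Split each operand into 2-bit halves (assumed packing order).
    a_lo, a_hi = a & 0b11, a >> 2
    b_lo, b_hi = b & 0b11, b >> 2

    # These two 2x2 products are needed in both modes.
    p_ll = a_lo * b_lo
    p_hh = a_hi * b_hi

    if dual:
        # Two independent 2x2 results; cross terms are not used.
        return p_ll, p_hh

    # Single 4x4 mode: add the cross terms with their binary weights.
    p_lh = a_lo * b_hi
    p_hl = a_hi * b_lo
    return p_ll + ((p_lh + p_hl) << 2) + (p_hh << 4)


if __name__ == "__main__":
    print(partitioned_mul(13, 11, dual=False))          # full 4x4 product
    print(partitioned_mul(0b1110, 0b0111, dual=True))   # two 2x2 products
```

The point of the sketch is that the 2 x 2 products for the low and high halves are shared between the two modes, so switching precision changes only how the partial products are combined rather than requiring a second multiplier.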
