H-FA: A Hybrid Floating-Point and Logarithmic Approach to Hardware Accelerated FlashAttention

Kosmas Alexandridis; Giorgos Dimitrakopoulos

arXiv:2511.00295·cs.AR·February 10, 2026

TL;DR

H-FA is a hybrid floating-point and logarithmic approach to hardware-accelerated FlashAttention. It reduces hardware area by 26.5% and power consumption by 23.4% while keeping performance comparable to existing architectures for transformer attention.

Contribution

The paper proposes a hybrid computation method that combines floating-point and fixed-point logarithmic representations for an efficient hardware implementation of FlashAttention.

Findings

01 Achieves a 26.5% reduction in hardware area

02 Reduces power consumption by 23.4%

03 Maintains performance comparable to existing architectures

Abstract

Transformers have significantly advanced AI and machine learning through their powerful attention mechanism. However, computing attention on long sequences can become a computational bottleneck. FlashAttention mitigates this by fusing the softmax and matrix operations into a tiled computation pattern that decouples performance from sequence length. Though designed for GPUs, its simplicity also makes it well suited for direct hardware acceleration. To improve hardware implementation, we compute FlashAttention using a mixture of floating-point and fixed-point logarithm domain representations. Floating-point is used to compute attention scores from query and key matrices, while logarithmic computation simplifies the fused computation of softmax normalization and the multiplication with the value matrix. This transformation, called H-FA, replaces vector-wide floating-point multiplication…
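The fused, tiled computation the abstract describes rests on the online-softmax recurrence used by FlashAttention. The following is a minimal software sketch of that recurrence for a single query vector, checked against a plain softmax-then-multiply reference. It is not the H-FA hardware datapath: the function names (`reference_attention`, `online_attention`, `log_domain_product`) are hypothetical, the tile size is assumed to be one key/value pair, and the last helper only illustrates the general log-domain idea that a multiplication becomes an addition of logarithms.

```python
import math


def reference_attention(q, keys, values):
    """Plain attention for one query: softmax(q.K^T) applied to V."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def online_attention(q, keys, values):
    """Online-softmax attention: one pass over keys/values, no score buffer.

    Assumption: each step consumes a single key/value pair (tile size 1).
    """
    m = -math.inf                    # running maximum of the scores
    denom = 0.0                      # running softmax denominator
    acc = [0.0] * len(values[0])     # running unnormalized output
    for k, v in zip(keys, values):
        s = sum(qi * ki for qi, ki in zip(q, k))
        m_new = max(m, s)
        scale = math.exp(m - m_new)  # rescales old state; 0.0 on first step
        p = math.exp(s - m_new)      # weight of the current key
        denom = denom * scale + p
        acc = [a * scale + p * vj for a, vj in zip(acc, v)]
        m = m_new
    return [a / denom for a in acc]


def log_domain_product(weight_log2, value):
    """Illustrative only: multiply 2**weight_log2 by value via log addition."""
    if value == 0.0:
        return 0.0
    sign = -1.0 if value < 0 else 1.0
    return sign * 2.0 ** (weight_log2 + math.log2(abs(value)))


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]
values = [[1.0, 2.0], [-3.0, 0.5], [0.25, 4.0]]
ref = reference_attention(q, keys, values)
out = online_attention(q, keys, values)
```

Because the running maximum, denominator, and accumulator are rescaled at every step, the online version never stores the full score row, which is why the per-step work does not grow with sequence length.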

Taxonomy

Topics: Numerical Methods and Algorithms · Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies