Refining Datapath for Microscaling ViTs
Can Xiao, Jianyi Cheng, Aaron Zhao

TL;DR
This paper introduces an FPGA-based ViT accelerator built on the MXInt data format. Mapping every operation onto hardware yields at least a 93× speedup over Float16 with accuracy loss within 1%.
Contribution
The paper presents the first ViT accelerator that maps all operations onto an FPGA using the MXInt format, optimizing for both accuracy and hardware performance.
Findings
Achieves at least a 93× speedup over Float16 implementations.
Keeps accuracy loss within 1%.
Demonstrates high area efficiency with MXInt quantization.
Abstract
Vision Transformers (ViTs) leverage the transformer architecture to effectively capture global context, demonstrating strong performance in computer vision tasks. A major challenge in ViT hardware acceleration is that the model family contains complex arithmetic operations to which model accuracy is sensitive, such as the Softmax and LayerNorm operations, and these cannot be mapped onto efficient hardware with low precision. Existing methods exploit parallelism only in the matrix multiplication operations of the model on hardware and keep these complex operations on the CPU. This results in suboptimal performance due to the communication overhead between the CPU and accelerator. Can new data formats solve this problem? In this work, we present the first ViT accelerator that maps all operations of the ViT models onto FPGAs. We exploit a new arithmetic format named Microscaling Integer…
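To make the data format mentioned in the abstract more concrete, here is a minimal Python sketch of block quantization in the general microscaling-integer style: each block of values shares one power-of-two exponent, and each element keeps a small signed integer mantissa. This is an illustration under stated assumptions, not the paper's implementation. The function names (`mxint_quantize`, `mxint_dequantize`, `mxint_dot`), the 8-bit mantissa width, the round-to-nearest choice, and the way the shared exponent is selected are all hypothetical and do not come from the paper.

```python
import math


def mxint_quantize(block, mantissa_bits=8):
    """Quantize a block of floats to (shared_exponent, integer mantissas).

    Assumption: one power-of-two scale per block, signed mantissas of
    `mantissa_bits` bits. The paper's actual bit widths may differ.
    """
    qmax = 2 ** (mantissa_bits - 1) - 1   # largest positive mantissa
    qmin = -(2 ** (mantissa_bits - 1))    # most negative mantissa
    max_abs = max((abs(x) for x in block), default=0.0)
    if max_abs == 0.0:
        return 0, [0 for _ in block]
    # Smallest power-of-two scale at which the largest magnitude still fits.
    shared_exp = math.ceil(math.log2(max_abs / qmax))
    scale = 2.0 ** shared_exp
    # Round to the nearest integer, then clamp as a guard against overflow.
    mantissas = [min(qmax, max(qmin, round(x / scale))) for x in block]
    return shared_exp, mantissas


def mxint_dequantize(shared_exp, mantissas):
    """Map integer mantissas back to floats using the shared exponent."""
    scale = 2.0 ** shared_exp
    return [m * scale for m in mantissas]


def mxint_dot(exp_a, mant_a, exp_b, mant_b):
    """Dot product of two blocks: integer multiply-accumulate, then a
    single scaling by the sum of the two shared exponents."""
    acc = sum(a * b for a, b in zip(mant_a, mant_b))
    return acc * 2.0 ** (exp_a + exp_b)
```

For example, `mxint_quantize([0.5, -1.0, 0.25, 0.0])` returns a shared exponent plus four integer mantissas, and `mxint_dequantize` maps them back to floats. The point of the sketch is that the inner loop of `mxint_dot` touches only integers, with the exponents handled once per block; this is the general property that makes shared-exponent formats attractive for hardware.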
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
