UbiMoE: A Ubiquitous Mixture-of-Experts Vision Transformer Accelerator With Hybrid Computation Pattern on FPGA
Jiale Dong, Wenqi Lou, Zhendong Zheng, Yunji Qin, Lei Gong, Chao Wang, Xuehai Zhou

TL;DR
UbiMoE is an FPGA-based accelerator designed specifically for Mixture-of-Experts Vision Transformers. It balances performance and resource use through tailored kernels and a heuristic hardware-tuning algorithm, and it outperforms existing designs in both throughput and energy efficiency.
Contribution
The paper introduces UbiMoE, a novel FPGA accelerator for MoE-ViT that employs specialized kernels and a heuristic search for hardware tuning, achieving superior throughput and energy efficiency.
Findings
Achieves 1.34x and 3.35x throughput improvements on two FPGA platforms.
Improves energy efficiency by 1.75x and 1.54x over state-of-the-art designs.
Develops a latency-optimized streaming attention kernel and a resource-efficient linear kernel.
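The page does not say how the streaming attention kernel works internally. As background only, one widely used way to compute attention in a single streaming pass over keys and values, without storing the full row of scores, is the online-softmax recurrence: keep a running maximum score, a running softmax denominator, and a running weighted sum of values, and rescale them whenever a larger score arrives. The scalar sketch below illustrates that general technique under this assumption; it is not UbiMoE's kernel, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def streaming_attention(q: Sequence[float],
                        keys: Sequence[Sequence[float]],
                        values: Sequence[Sequence[float]]) -> List[float]:
    """Single-query attention in one pass over (key, value) pairs (online softmax)."""
    scale = 1.0 / math.sqrt(len(q))
    m = -math.inf                  # running maximum score seen so far
    l = 0.0                        # running sum of exp(score - m)
    acc = [0.0] * len(values[0])   # running sum of exp(score - m) * value
    for k, v in zip(keys, values):
        s = scale * sum(qi * ki for qi, ki in zip(q, k))
        m_new = max(m, s)
        correction = math.exp(m - m_new)  # rescales old state; 0.0 on the first step
        p = math.exp(s - m_new)
        l = l * correction + p
        acc = [a * correction + p * vi for a, vi in zip(acc, v)]
        m = m_new
    return [a / l for a in acc]


def reference_attention(q: Sequence[float],
                        keys: Sequence[Sequence[float]],
                        values: Sequence[Sequence[float]]) -> List[float]:
    """Two-pass reference: materialize all scores, apply softmax, then weight values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(streaming_attention(q, keys, values))
print(reference_attention(q, keys, values))
```

Both functions print the same vector up to floating-point rounding. The streaming version needs only O(d) state per query regardless of sequence length, which is the property that makes this style of recurrence attractive for pipelined hardware.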
Abstract
Compared to traditional Vision Transformers (ViT), Mixture-of-Experts Vision Transformers (MoE-ViT) scale model size without a proportional increase in computational complexity, which has made them a new research focus. Given the high performance and reconfigurability of FPGAs, FPGA-based accelerators for MoE-ViT have emerged, delivering substantial gains over general-purpose processors. However, existing accelerators often fall short of fully exploring the design space, leading to suboptimal trade-offs between resource utilization and performance. To overcome this problem, we introduce UbiMoE, a novel end-to-end FPGA accelerator tailored for MoE-ViT. Leveraging the unique computational and memory access patterns of MoE-ViTs, we develop a latency-optimized streaming attention kernel and a resource-efficient reusable linear kernel, effectively balancing performance and resource consumption.…
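The claim that MoE-ViTs grow model size without a proportional increase in compute rests on sparse routing: each token is sent to only a few experts. The sketch below illustrates generic top-k gating under that assumption. The page does not state which router UbiMoE's target models use, and the toy experts, gate weights, and function names are hypothetical stand-ins.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    total = sum(e)
    return [x / total for x in e]


def moe_layer(token: Vector, gate_weights: Sequence[Vector],
              experts: Sequence[Expert], k: int = 2) -> Tuple[Vector, int]:
    """Route one token to its top-k experts and mix their outputs.

    Returns the mixed output and how many experts were actually evaluated.
    """
    # Linear router: one logit per expert.
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    # Keep only the k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize gate values over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(token)
    calls = 0
    for g, i in zip(gates, top):
        y = experts[i](token)  # only the selected experts run
        calls += 1
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, calls


def make_toy_layer(num_experts: int):
    """Hypothetical toy experts: expert i scales its input by (i + 1)."""
    experts = [lambda x, s=i + 1: [s * xi for xi in x] for i in range(num_experts)]
    gate_weights = [[float(i), -float(i)] for i in range(num_experts)]
    return gate_weights, experts


token = [1.0, 0.5]
for n in (8, 64):
    gw, ex = make_toy_layer(n)
    out, calls = moe_layer(token, gw, ex, k=2)
    print(f"{n} experts -> {calls} expert calls, output {out}")
```

Going from 8 to 64 experts multiplies the number of expert parameters by eight, but each token still triggers exactly k expert evaluations, so per-token compute stays roughly flat apart from the router.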
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Image Processing Techniques and Applications · Advanced Memory and Neural Computing
