UniFormer: Unified and Efficient Transformer for Reasoning Across General and Custom Computing
Zhuoheng Ran, Chong Wu, Renjie Xu, Maolin Che, and Hong Yan

TL;DR
UniFormer is a novel Transformer architecture designed to perform efficiently across both general-purpose GPUs and customised hardware like FPGAs, achieving high accuracy and low latency while simplifying deployment.
Contribution
The paper introduces what the authors describe as the first unified Transformer architecture optimised for both general-purpose and customised computing platforms, improving transferability and efficiency.
Findings
Achieves SOTA accuracy and latency on GPUs.
Demonstrates strong adaptability on FPGAs.
Enables higher parallelism and compute-storage fusion.
Abstract
The success of neural networks such as convolutional neural networks (CNNs) has been largely attributed to their effective and widespread deployment on customised computing platforms, including field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). In the current era, Transformer-based architectures underpin the majority of state-of-the-art (SOTA) larger models that are also increasingly deployed on customised computing hardware for low-power and real-time applications. However, the fundamentally different parallel computation paradigms between general-purpose and customised computing often lead to compromises in model transfer and deployability, which typically come at the cost of complexity, efficiency or accuracy. Moreover, many cross-platform optimisation principles have also remained underexplored in existing studies. This paper introduces…
Taxonomy
Topics: Advanced Neural Network Applications · Embedded Systems Design Techniques · Advanced Memory and Neural Computing
