TL;DR
FEATHER is a reconfigurable ML accelerator that supports seamless dataflow switching with minimal overhead, significantly improving latency and energy efficiency on FPGA and edge devices.
Contribution
FEATHER introduces a novel spatial array (Nest) and a multi-stage reduction network (BIRRD) that together enable flexible dataflow reconfiguration with negligible overhead.
Findings
FEATHER achieves up to 2.89x latency speedup over state-of-the-art accelerators.
It improves energy efficiency by up to 6.43x on benchmark models.
On FPGA, FEATHER outperforms Xilinx DPU and Gemmini in throughput.
Abstract
The inference of ML models composed of diverse structures, types, and sizes boils down to the execution of different dataflows (i.e., different tiling, ordering, parallelism, and shapes). Using the optimal dataflow for every layer of a workload can reduce latency by up to two orders of magnitude over a suboptimal dataflow. Unfortunately, reconfiguring hardware for different dataflows involves on-chip data layout reordering and datapath reconfigurations, leading to non-trivial overhead that hinders ML accelerators from exploiting different dataflows and results in suboptimal performance. To address this challenge, we propose FEATHER, an innovative accelerator that leverages a novel spatial array termed Nest and a novel multi-stage reduction network called BIRRD for performing flexible data reduction with layout reordering under the hood, enabling seamless switching between optimal dataflows…
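To make the term "dataflow" concrete, the sketch below computes the same matrix product under two different loop orderings, one of them tiled. It is a software illustration only, not FEATHER's hardware or the paper's method; the function names, the `tile` parameter, and the "output-stationary" and "weight-stationary" labels are assumptions chosen for the example.

```python
def matmul_output_stationary(A, B, tile=2):
    """Loop order (i, j, k) with i/j tiling: each output C[i][j] is
    fully accumulated in a local register before moving to the next."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    for i0 in range(0, M, tile):          # tile over output rows
        for j0 in range(0, N, tile):      # tile over output columns
            for i in range(i0, min(i0 + tile, M)):
                for j in range(j0, min(j0 + tile, N)):
                    acc = 0               # accumulator stays put across k
                    for k in range(K):
                        acc += A[i][k] * B[k][j]
                    C[i][j] = acc
    return C


def matmul_weight_stationary(A, B):
    """Loop order (k, j, i): each weight B[k][j] is fetched once and
    reused across every row of A before the next weight is loaded."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    for k in range(K):
        for j in range(N):
            w = B[k][j]                   # weight held fixed across i
            for i in range(M):
                C[i][j] += A[i][k] * w
    return C


# Hypothetical toy operands for illustration.
A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C_os = matmul_output_stationary(A, B)
C_ws = matmul_weight_stationary(A, B)
```

Both functions return the same matrix. What differs is which operand is reused in place and in what order memory is touched, and that is the property a hardware dataflow fixes. The abstract's point is that the best choice varies per layer, so switching between such orderings cheaply matters.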
