A Multi-level Compiler Backend for Accelerated Micro-kernels Targeting RISC-V ISA Extensions
Alexandre Lopoukhine, Federico Ficarelli, Christos Vasiladiotis, Anton Lydike, Josse Van Delm, Alban Dutilleul, Luca Benini, Marian Verhelst, Tobias Grosser

TL;DR
This paper presents a multi-level compiler backend for RISC-V accelerators that exploits hardware features at several abstraction levels, achieving high FPU utilization on DNN micro-kernels and demonstrating the benefits of structured IRs.
Contribution
It introduces a multi-level backend design that uses structured IRs and hardware knowledge to guide code generation for RISC-V accelerators, addressing limitations of traditional general-purpose compiler backends.
Findings
Achieved up to 90% FPU utilization in DNN kernels
Demonstrated effective use of hardware loops and streaming registers
Showed benefits of breaking the traditional compiler backend hourglass model
Abstract
High-performance micro-kernels must fully exploit today's diverse and specialized hardware to deliver peak performance to DNNs. While higher-level optimizations for DNNs are offered by numerous compilers (e.g., MLIR, TVM, OpenXLA), performance-critical micro-kernels are left to specialized code generators or handwritten assembly. Even though widely-adopted compilers (e.g., LLVM, GCC) offer tuned backends, their CPU-focused input abstraction, unstructured IR, and general-purpose best-effort design inhibit tailored code generation for innovative hardware. We think it is time to widen the classical hourglass backend and embrace progressive lowering across a diverse set of structured abstractions to bring domain-specific code generation to compiler backends. We demonstrate this concept by implementing a custom backend for a RISC-V-based accelerator with hardware loops and streaming…
