A Multi-level Compiler Backend for Accelerated Micro-kernels Targeting RISC-V ISA Extensions

Alexandre Lopoukhine; Federico Ficarelli; Christos Vasiladiotis; Anton Lydike; Josse Van Delm; Alban Dutilleul; Luca Benini; Marian Verhelst; Tobias Grosser

arXiv:2502.04063·cs.PL·February 7, 2025

TL;DR

This paper presents a multi-level compiler backend for RISC-V accelerators that exploits hardware features at several abstraction levels, reaching high FPU utilization on DNN micro-kernels and demonstrating the benefits of structured IRs.

Contribution

The paper introduces a multi-level backend design that uses structured IRs and hardware knowledge to generate code for RISC-V accelerators, addressing the CPU-focused input abstraction, unstructured IR, and general-purpose best-effort design of conventional backends.

Findings

01. Achieved up to 90% FPU utilization in DNN kernels

02. Demonstrated effective use of hardware loops and streaming registers

03. Showed benefits of breaking the traditional compiler backend hourglass model
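To make the link between the first two findings concrete, here is a minimal sketch of how FPU utilization responds when per-iteration loop and memory overhead is removed. It is an illustrative model rather than the paper's measurement method. The function name `fpu_utilization`, the single-issue one-instruction-per-cycle assumption, and every cycle count below are hypothetical.

```python
def fpu_utilization(iterations, fpu_per_iter, overhead_per_iter, setup_cycles=0):
    """Fraction of cycles spent on FPU instructions.

    Assumes (hypothetically) a single-issue in-order core that retires
    one instruction per cycle, so instruction counts equal cycle counts.
    """
    fpu_cycles = iterations * fpu_per_iter
    total_cycles = setup_cycles + iterations * (fpu_per_iter + overhead_per_iter)
    return fpu_cycles / total_cycles


# Hypothetical dot-product loop body: 1 fused multiply-add per iteration.
# Baseline overhead per iteration (assumed): 2 loads, 2 pointer
# increments, 1 branch = 5 non-FPU instructions.
baseline = fpu_utilization(iterations=256, fpu_per_iter=1, overhead_per_iter=5)

# With streaming registers feeding operands and a hardware loop replacing
# the branch, the per-iteration overhead is modeled as zero, at the price
# of an assumed one-time setup cost of 20 cycles.
accelerated = fpu_utilization(
    iterations=256, fpu_per_iter=1, overhead_per_iter=0, setup_cycles=20
)

print(f"baseline:    {baseline:.1%}")
print(f"accelerated: {accelerated:.1%}")
```

In this model the remaining gap to full utilization comes only from the setup cost, so longer loops amortize it better. Real kernels have additional effects (pipeline latencies, register pressure) that the sketch ignores.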

Abstract

High-performance micro-kernels must fully exploit today's diverse and specialized hardware to deliver peak performance to DNNs. While higher-level optimizations for DNNs are offered by numerous compilers (e.g., MLIR, TVM, OpenXLA), performance-critical micro-kernels are left to specialized code generators or handwritten assembly. Even though widely-adopted compilers (e.g., LLVM, GCC) offer tuned backends, their CPU-focused input abstraction, unstructured IR, and general-purpose best-effort design inhibit tailored code generation for innovative hardware. We think it is time to widen the classical hourglass backend and embrace progressive lowering across a diverse set of structured abstractions to bring domain-specific code generation to compiler backends. We demonstrate this concept by implementing a custom backend for a RISC-V-based accelerator with hardware loops and streaming…
