Occamy: A 432-Core 28.1 DP-GFLOP/s/W 83% FPU Utilization Dual-Chiplet, Dual-HBM2E RISC-V-based Accelerator for Stencil and Sparse Linear Algebra Computations with 8-to-64-bit Floating-Point Support in 12nm FinFET

Gianna Paulin; Paul Scheffler; Thomas Benz; Matheus Cavalcante; Tim Fischer; Manuel Eggimann; Yichao Zhang; Nils Wistoff; Luca Bertaccini; Luca Colagrande; Gianmarco Ottavi; Frank K. Gürkaynak; Davide Rossi; Luca Benini

arXiv:2406.15068·cs.AR·June 24, 2024

TL;DR

Occamy is a 432-core RISC-V accelerator for sparse linear algebra and stencil computations, built in 12nm FinFET. It reaches 83% FPU utilization on stencils and 28.1 DP-GFLOP/s/W, with 8-to-64-bit floating-point support.

Contribution

This paper introduces Occamy, a 432-core RISC-V-based accelerator built as a dual-chiplet 2.5D system with 32 GiB of HBM2E. It targets sparse linear algebra and stencil workloads, with high FPU utilization and 8-to-64-bit floating-point support.

Findings

01

Achieves 83% FPU utilization on stencil computations

02

Delivers 42% and 49% utilization on sparse-dense and sparse-sparse matrix multiply, respectively

03

Reaches 28.1 DP-GFLOP/s/W energy efficiency

Abstract

We present Occamy, a 432-core RISC-V dual-chiplet 2.5D system for efficient sparse linear algebra and stencil computations on FP64 and narrow (32-, 16-, 8-bit) SIMD FP data. Occamy features 48 clusters of RISC-V cores with custom extensions, two 64-bit host cores, and a latency-tolerant multi-chiplet interconnect and memory system with 32 GiB of HBM2E. It achieves leading-edge utilization on stencils (83 %), sparse-dense (42 %), and sparse-sparse (49 %) matrix multiply.
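
The headline figures can be cross-checked with a little arithmetic. The sketch below is illustrative only and rests on assumptions the abstract does not state: that the 432 cores are spread evenly over the 48 clusters (the two host cores not counted), that clusters and HBM2E capacity are split evenly between the two chiplets, and that utilization means achieved FPU throughput divided by peak. The function names are hypothetical.

```python
# Figures quoted in the title and abstract.
TOTAL_CORES = 432          # cores named in the title
CLUSTERS = 48              # "48 clusters of RISC-V cores"
CHIPLETS = 2               # dual-chiplet system
HBM2E_GIB = 32             # "32 GiB of HBM2E"
DP_GFLOPS_PER_WATT = 28.1  # energy efficiency from the title


def split_evenly(total: int, parts: int) -> int:
    """Return total / parts, raising if the split is not exact.

    An even split is an assumption, not something the abstract states.
    """
    quotient, remainder = divmod(total, parts)
    if remainder:
        raise ValueError(f"{total} does not divide evenly into {parts} parts")
    return quotient


def fpu_utilization(achieved: float, peak: float) -> float:
    """Fraction of peak FPU throughput achieved (assumed definition)."""
    if peak <= 0:
        raise ValueError("peak throughput must be positive")
    return achieved / peak


def watts_at(dp_gflops: float, gflops_per_watt: float = DP_GFLOPS_PER_WATT) -> float:
    """Power implied by a double-precision throughput at a given efficiency."""
    if gflops_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    return dp_gflops / gflops_per_watt


if __name__ == "__main__":
    print("cores per cluster:", split_evenly(TOTAL_CORES, CLUSTERS))
    print("clusters per chiplet:", split_evenly(CLUSTERS, CHIPLETS))
    print("HBM2E GiB per chiplet:", split_evenly(HBM2E_GIB, CHIPLETS))
```

Under these assumptions the numbers work out to 9 cores per cluster, 24 clusters per chiplet, and 16 GiB of HBM2E per chiplet. `watts_at` only converts a throughput into implied power at the quoted efficiency; it says nothing about the operating point at which the paper measured 28.1 DP-GFLOP/s/W.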
