Occamy: A 432-Core Dual-Chiplet Dual-HBM2E 768-DP-GFLOP/s RISC-V System for 8-to-64-bit Dense and Sparse Computing in 12nm FinFET
Paul Scheffler, Thomas Benz, Viviane Potocnik, Tim Fischer, Luca Colagrande, Nils Wistoff, Yichao Zhang, Luca Bertaccini, Gianmarco Ottavi, Manuel Eggimann, Matheus Cavalcante, Gianna Paulin, Frank K. Gürkaynak, Davide Rossi, Luca Benini

TL;DR
Occamy is a high-performance, energy-efficient 432-core RISC-V system built for dense and sparse ML and HPC workloads; it outperforms state-of-the-art processors on stencil codes and sparse-dense linear algebra.
Contribution
This paper introduces Occamy, a novel 432-core RISC-V system with dual-chiplet design and hierarchical interconnect, optimized for heterogeneous dense and sparse computations in ML and HPC.
Findings
Achieves 89% FPU utilization on dense linear algebra
Leads state-of-the-art processors on stencil codes by 1.7x in FPU utilization (83%) and by 1.2x in technology-node-normalized compute density (11.1 DP-GFLOP/s/mm²)
Outperforms in sparse-dense linear algebra with 5.2x higher compute density
Abstract
ML and HPC applications increasingly combine dense and sparse memory access computations to maximize storage efficiency. However, existing CPUs and GPUs struggle to flexibly handle these heterogeneous workloads with consistently high compute efficiency. We present Occamy, a 432-core, 768-DP-GFLOP/s, dual-HBM2E, dual-chiplet RISC-V system with a latency-tolerant hierarchical interconnect and in-core streaming units (SUs) designed to accelerate dense and sparse FP8-to-FP64 ML and HPC workloads. We implement Occamy's compute chiplets in 12 nm FinFET, and its passive interposer, Hedwig, in a 65 nm node. On dense linear algebra (LA), Occamy achieves a competitive FPU utilization of 89%. On stencil codes, Occamy reaches an FPU utilization of 83% and a technology-node-normalized compute density of 11.1 DP-GFLOP/s/mm², leading state-of-the-art (SoA) processors by 1.7x and 1.2x, respectively. On…
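To put the utilization figures in the abstract into perspective, the following is a minimal sketch that converts them into an approximate sustained throughput. It assumes that sustained throughput is simply the quoted 768 DP-GFLOP/s peak multiplied by the reported FPU utilization, and that both were measured at the same precision and operating point. The abstract does not state this, so the results are rough estimates, not figures from the paper. The function name `sustained_gflops` is hypothetical.

```python
# Figures quoted in the title and abstract.
PEAK_DP_GFLOPS = 768.0  # peak double-precision throughput of the full system

# FPU utilization per workload, as reported in the abstract.
FPU_UTILIZATION = {
    "dense linear algebra": 0.89,
    "stencil codes": 0.83,
}


def sustained_gflops(peak_gflops: float, utilization: float) -> float:
    """Estimate sustained throughput.

    Assumption (not from the paper): sustained = peak * FPU utilization.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return peak_gflops * utilization


if __name__ == "__main__":
    for workload, util in FPU_UTILIZATION.items():
        estimate = sustained_gflops(PEAK_DP_GFLOPS, util)
        print(f"{workload}: ~{estimate:.1f} DP-GFLOP/s (estimate)")
```

Under these assumptions, a utilization of 89% corresponds to roughly 680 DP-GFLOP/s, and 83% to roughly 640 DP-GFLOP/s.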
