Manticore: A 4096-core RISC-V Chiplet Architecture for Ultra-efficient Floating-point Computing
Florian Zaruba, Fabian Schuiki, Luca Benini

TL;DR
Manticore is a 4096-core RISC-V chiplet architecture for ultra-efficient floating-point computing. A prototype of its computational core shows more than 5x better energy efficiency than CPUs and GPUs on FP-intensive, data-parallel workloads.
Contribution
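The paper introduces a chiplet-based architecture built from Snitch clusters, each containing eight small integer cores that each control a large FPU, together with two custom ISA extensions (SSR and FREP) that raise floating-point performance and energy efficiency.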
Findings
More than 5x better energy efficiency than CPUs and GPUs on FP-intensive workloads
FPU utilization exceeds 90% with custom ISA extensions
More than 40% of core area dedicated to FPU
Abstract
Data-parallel problems demand ever growing floating-point (FP) operations per second under tight area- and energy-efficiency constraints. In this work, we present Manticore, a general-purpose, ultra-efficient chiplet-based architecture for data-parallel FP workloads. We have manufactured a prototype of the chiplet's computational core in Globalfoundries 22FDX process and demonstrate more than 5x improvement in energy efficiency on FP intensive workloads compared to CPUs and GPUs. The compute capability at high energy and area efficiency is provided by Snitch clusters containing eight small integer cores, each controlling a large FPU. The core supports two custom ISA extensions: The SSR extension elides explicit load and store instructions by encoding them as register reads and writes. The FREP extension decouples the integer core from the FPU allowing floating-point instructions to be…
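The effect of the two extensions on FPU utilization can be illustrated with a rough cycle-count model of a dot-product loop. This is a minimal sketch, not the authors' evaluation method: the single-issue assumption, the per-iteration instruction mix, the setup cycle counts, and the names `KernelCost`, `BASELINE`, `SSR_ONLY`, and `SSR_FREP` are all hypothetical and chosen only for illustration. The sketch also assumes that FREP lets the FP instruction repeat without the integer core re-issuing it or executing loop overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelCost:
    """Cycle cost of a dot-product loop on a single-issue core (toy model)."""
    setup_cycles: int     # one-time cycles before the loop starts
    cycles_per_iter: int  # cycles spent per loop iteration
    fp_ops_per_iter: int  # cycles per iteration in which the FPU does useful work

    def fpu_utilization(self, n_iters: int) -> float:
        """Fraction of all cycles in which the FPU executes an FP instruction."""
        total = self.setup_cycles + self.cycles_per_iter * n_iters
        return self.fp_ops_per_iter * n_iters / total


# Baseline (assumed mix): 2 loads + 1 fused multiply-add
# + 2 pointer increments + 1 branch per iteration.
BASELINE = KernelCost(setup_cycles=0, cycles_per_iter=6, fp_ops_per_iter=1)

# SSR only (assumed): operands arrive as register reads, so the loads and
# pointer increments disappear; a counter update and a branch remain.
# The 8 setup cycles for configuring the streams are a made-up figure.
SSR_ONLY = KernelCost(setup_cycles=8, cycles_per_iter=3, fp_ops_per_iter=1)

# SSR + FREP (assumed): the FP instruction repeats without loop overhead
# on the integer core; one extra made-up setup cycle for the repeat itself.
SSR_FREP = KernelCost(setup_cycles=9, cycles_per_iter=1, fp_ops_per_iter=1)

N = 1000  # vector length, chosen arbitrarily
for name, cost in [("baseline", BASELINE), ("ssr", SSR_ONLY), ("ssr+frep", SSR_FREP)]:
    print(f"{name:10s} FPU utilization = {cost.fpu_utilization(N):.3f}")
```

The numbers this model prints are not measurements from the paper. They only show the mechanism: every load, address update, and branch removed from the loop body leaves a larger share of cycles for FP instructions.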
