Mixed-Precision Training and Compilation for RRAM-based Computing-in-Memory Accelerators

Rebecca Pelke; Joel Klein; Jose Cubero-Cascante; Nils Bosbach; Jan Moritz Joseph; Rainer Leupers

arXiv:2601.21737 · cs.LG · March 20, 2026

TL;DR

This paper presents a mixed-precision training and compilation framework for RRAM-based CIM accelerators. It uses reinforcement learning to choose quantization configurations, reporting speedups of up to 2.48x and an accuracy loss of only 0.086%.

Contribution

The paper introduces a reinforcement learning-based method for optimizing mixed-precision quantization on CIM accelerators, addressing the challenge of a very large search space.
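To give a feel for why the search space is a problem, the following minimal sketch counts the per-layer bit-width assignments for a network. The layer count and the candidate bit widths are hypothetical values chosen for illustration; they are not taken from the paper, and the paper's actual search space may be defined differently.

```python
def search_space_size(num_layers, weight_bit_choices, input_bit_choices):
    """Count the distinct mixed-precision configurations.

    Assumption (not from the paper): every layer independently picks one
    weight bit width and one input bit width from the given candidates.
    """
    choices_per_layer = len(weight_bit_choices) * len(input_bit_choices)
    return choices_per_layer ** num_layers


# Hypothetical example: 20 layers, candidate bit widths 2, 4, and 8.
size = search_space_size(20, [2, 4, 8], [2, 4, 8])
print(f"configurations: {size:.3e}")
```

Even with only three candidate widths per operand, the count grows exponentially with the number of layers, which is why exhaustive evaluation is impractical and a learned search strategy becomes attractive.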

Findings

1. Achieves up to a 2.48x speedup over state-of-the-art methods.

2. Maintains an accuracy loss of only 0.086%.

3. Balances latency and accuracy through learned quantization configurations.

Abstract

Computing-in-Memory (CIM) accelerators are a promising solution for accelerating Machine Learning (ML) workloads, as they perform Matrix-Vector Multiplications (MVMs) on crossbar arrays directly in memory. Although the bit widths of the crossbar inputs and cells are very limited, most CIM compilers do not support quantization below 8 bits. As a result, a single MVM requires many compute cycles, and weights cannot be efficiently stored in a single crossbar cell. To address this problem, we propose a mixed-precision training and compilation framework for CIM architectures. The biggest challenge is the massive search space, which makes it difficult to find good quantization parameters. We therefore introduce a reinforcement learning-based strategy to find suitable quantization configurations that balance latency and accuracy. In the best case, our approach achieves up to a 2.48x speedup…
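The abstract's point about compute cycles and cell storage can be illustrated with a simple bit-slicing cost model. This is a rough sketch under stated assumptions, not the paper's latency model: it assumes weights are split across cells of `cell_bits` each and inputs are fed `dac_bits` at a time, and the hardware values used below (2-bit cells, 1-bit inputs) are hypothetical.

```python
import math


def bit_slicing_cost(weight_bits, input_bits, cell_bits, dac_bits):
    """Return (cells_per_weight, cycles_per_mvm) for one MVM.

    Assumed model (not from the paper): a weight occupies
    ceil(weight_bits / cell_bits) crossbar cells, and an input vector is
    applied in ceil(input_bits / dac_bits) sequential cycles.
    """
    cells_per_weight = math.ceil(weight_bits / cell_bits)
    cycles_per_mvm = math.ceil(input_bits / dac_bits)
    return cells_per_weight, cycles_per_mvm


# Hypothetical hardware: 2-bit cells, 1-bit input slices.
for w_bits, in_bits in [(8, 8), (4, 4), (2, 4)]:
    cells, cycles = bit_slicing_cost(w_bits, in_bits, cell_bits=2, dac_bits=1)
    print(f"W{w_bits}/A{in_bits}: {cells} cell(s) per weight, {cycles} cycle(s) per MVM")
```

Under these assumptions, lowering the weight width to the cell width lets each weight fit in a single cell, and lowering the input width cuts the cycle count proportionally, which is the kind of latency gain that sub-8-bit quantization is meant to unlock.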

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Embedded Systems Design Techniques