# Instant Quantization of Neural Networks using Monte Carlo Methods

**Authors:** Gonçalo Mordido, Matthijs Van Keirsbilck, Alexander Keller

arXiv: 1905.12253 · 2020-01-08

## TL;DR

This paper introduces Monte Carlo Quantization (MCQ), which uses importance sampling to convert the weights and activations of pre-trained neural networks to low bit-width integers without retraining. The procedure is linear in time and space, and the quantized networks show minimal accuracy loss relative to the full-precision originals.

## Contribution

The paper presents a novel Monte Carlo-based approach for quantizing neural networks without retraining, offering configurable precision and sparsity with minimal accuracy loss.

## Key findings

- Minimal accuracy loss compared to full-precision networks
- Outperforms or is competitive with previous quantization methods that require additional training, across multiple benchmarks
- Linear time and space complexity for the quantization process

## Abstract

Low bit-width integer weights and activations are very important for efficient inference, especially with respect to lower power consumption. We propose Monte Carlo methods to quantize the weights and activations of pre-trained neural networks without any re-training. By performing importance sampling we obtain quantized low bit-width integer values from full-precision weights and activations. The precision, sparsity, and complexity are easily configurable by the amount of sampling performed. Our approach, called Monte Carlo Quantization (MCQ), is linear in both time and space, with the resulting quantized, sparse networks showing minimal accuracy loss when compared to the original full-precision networks. Our method either outperforms or achieves competitive results on multiple benchmarks compared to previous quantization methods that do require additional training.
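To make the idea in the abstract concrete, here is a minimal sketch of quantizing a weight vector by importance sampling. This summary does not spell out the algorithm, so the details are assumptions: weight magnitudes are treated as an unnormalized probability distribution, samples are drawn with one jittered point per equal-width stratum, and each weight's signed hit count becomes its integer value. The names `mc_quantize` and `samples_per_weight` are hypothetical and are not taken from the paper.

```python
import itertools
import random


def mc_quantize(weights, samples_per_weight=1.0, seed=0):
    """Illustrative sketch: quantize floats to signed integer hit counts.

    Returns (counts, scale) so that counts[i] * scale approximates weights[i].
    Assumption: sampling probability of weight i is |w_i| / sum_j |w_j|.
    """
    rng = random.Random(seed)
    magnitudes = [abs(w) for w in weights]
    total = sum(magnitudes)
    counts = [0] * len(weights)
    if total == 0.0:
        # Empty or all-zero input: nothing to sample from.
        return counts, 0.0

    n_samples = max(1, round(samples_per_weight * len(weights)))
    # Running sum of magnitudes, i.e. an unnormalized CDF.
    cdf = list(itertools.accumulate(magnitudes))
    # Never step past the last non-zero weight (guards against rounding).
    last = max(i for i, m in enumerate(magnitudes) if m > 0.0)

    idx = 0
    for k in range(n_samples):
        # One jittered sample inside the k-th stratum of [0, total).
        u = (k + rng.random()) / n_samples * total
        # Samples are increasing in k, so a single forward pass over the
        # CDF suffices; this keeps the loop linear in weights plus samples.
        while idx < last and cdf[idx] <= u:
            idx += 1
        counts[idx] += 1 if weights[idx] >= 0 else -1

    # Each hit stands for an equal share of the total magnitude.
    scale = total / n_samples
    return counts, scale


if __name__ == "__main__":
    w = [0.5, -0.25, 0.0, 0.125, -0.125]
    q, s = mc_quantize(w, samples_per_weight=8)
    print("integer weights:", q)
    print("dequantized:", [c * s for c in q])
```

In this sketch, `samples_per_weight` plays the role of the "amount of sampling" mentioned in the abstract: fewer samples leave more weights with zero hits (a sparser result) and smaller integer ranges, while more samples give a finer approximation at the cost of larger integers.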

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.12253/full.md

## Figures

59 figures with captions in the complete paper: https://tomesphere.com/paper/1905.12253/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/1905.12253/full.md

---
Source: https://tomesphere.com/paper/1905.12253