True 4-Bit Quantized Convolutional Neural Network Training on CPU: Achieving Full-Precision Parity

Shivnath Tathe

arXiv:2603.13931 · cs.LG · March 17, 2026

TL;DR

This paper introduces a practical method for training 4-bit quantized convolutional neural networks on standard CPUs, achieving accuracy comparable to full-precision models while significantly reducing memory usage.

Contribution

The author presents a 4-bit training technique that uses only standard CPU operations and matches full-precision accuracy without specialized hardware or post-training quantization.

Findings

01. Achieves 92.34% test accuracy on CIFAR-10 with 4-bit training on a CPU
02. Matches full-precision accuracy without a GPU or specialized kernels
03. Demonstrates generalization to CIFAR-100 and rapid convergence on mobile devices

Abstract

Low-precision neural network training has emerged as a promising direction for reducing computational costs and democratizing access to deep learning research. However, existing 4-bit quantization methods either rely on expensive GPU infrastructure or suffer from significant accuracy degradation. In this work, we present a practical method for training convolutional neural networks at true 4-bit precision using standard PyTorch operations on commodity CPUs. We introduce a novel tanh-based soft weight clipping technique that, combined with symmetric quantization, dynamic per-layer scaling, and straight-through estimators, achieves stable convergence and competitive accuracy. Training a VGG-style architecture with 3.25 million parameters from scratch on CIFAR-10, our method achieves 92.34% test accuracy on Google Colab's free CPU tier -- matching full-precision baseline performance…
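The abstract names four ingredients: tanh-based soft clipping, symmetric quantization, dynamic per-layer scaling, and a straight-through estimator (STE). The sketch below shows one way these pieces could fit together for a single layer's weights. It is not the paper's implementation: the paper uses PyTorch operations, whereas this version uses only the Python standard library, and the clip value of 1.0, the integer grid -7..7, the max-magnitude scale rule, and the function names `fake_quantize_4bit` and `ste_weight_grad` are all assumptions made for illustration.

```python
import math

QMAX = 7  # assumed symmetric signed 4-bit grid: integer codes -7..7


def fake_quantize_4bit(weights, clip=1.0):
    """Soft-clip one layer's weights with tanh, then quantize-dequantize them.

    Returns (codes, dequantized, scale). `clip` is a hypothetical hyperparameter.
    """
    # Soft clipping: tanh squashes every weight smoothly into (-clip, clip).
    soft = [clip * math.tanh(w / clip) for w in weights]

    # Dynamic per-layer scale: recomputed from the current largest magnitude.
    max_abs = max(abs(s) for s in soft)
    scale = max_abs / QMAX if max_abs > 0 else 1.0

    # Symmetric quantization: round to the nearest code and clamp to the grid.
    codes = [max(-QMAX, min(QMAX, round(s / scale))) for s in soft]
    dequantized = [c * scale for c in codes]
    return codes, dequantized, scale


def ste_weight_grad(weights, upstream_grad, clip=1.0):
    """Straight-through backward pass for the function above.

    round() is treated as the identity and the scale as a constant, so only
    the tanh derivative, 1 - tanh(w / clip)**2, remains.
    """
    return [
        g * (1.0 - math.tanh(w / clip) ** 2)
        for w, g in zip(weights, upstream_grad)
    ]


if __name__ == "__main__":
    layer = [-2.0, -0.3, 0.0, 0.4, 1.5]
    codes, dequantized, scale = fake_quantize_4bit(layer)
    print("codes:", codes)
    print("scale:", scale)
    print("grads:", ste_weight_grad(layer, [1.0] * len(layer)))
```

In this sketch the forward pass only ever sees weights that lie on the 15-level grid, while the gradient still reaches the underlying real-valued weights. Unlike a hard clamp, the tanh derivative never becomes exactly zero, so large weights keep receiving a small gradient.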

Code & Models

Models

shivnathtathe/T4NT-0.5B
Hugging Face model · 206 downloads · 2 likes

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Network Packet Processing and Optimization