Phoenix: A Low-Precision Floating-Point Quantization Oriented Architecture for Convolutional Neural Networks
Chen Wu, Mingyu Wang, Xiayu Li, Jicheng Lu, Kun Wang, and Lei He

TL;DR
Phoenix is a processor built around low-precision (8-bit) floating-point quantization for CNNs. It targets two problems: the accuracy loss of quantizing deep CNNs without calibration, fine-tuning, or re-training, and the hardware inefficiency of floating-point quantization.
Contribution
The paper presents a normalization-oriented 8-bit floating-point quantization method and a processor, Phoenix, designed around that method.
Findings
8-bit floating-point quantization incurs less error than 8-bit fixed-point quantization
Normalization before quantization further reduces accuracy loss without any calibration, fine-tuning, or re-training
Phoenix achieves 3.32x to 7.45x performance improvements over state-of-the-art accelerators
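As a rough illustration of the normalization finding, the sketch below quantizes small-magnitude synthetic weights to a toy 8-bit floating-point format, with and without scaling by the maximum absolute value first. The 1-4-3 sign/exponent/mantissa split, the max-abs scaling used as "normalization", the function names, and the Gaussian test data are all assumptions made for this example; the paper's actual format and normalization scheme may differ.

```python
import math
import random


def quantize_fp8(x, exp_bits=4, man_bits=3):
    """Round x to the nearest value of a toy sign/exponent/mantissa format.

    Assumed layout (not taken from the paper): 1 sign bit, exp_bits exponent
    bits, man_bits mantissa bits, subnormals supported, and no exponent codes
    reserved for inf/NaN. Out-of-range values saturate.
    """
    if x == 0.0:
        return 0.0
    bias = 2 ** (exp_bits - 1) - 1
    e_min = 1 - bias                       # smallest normal exponent
    e_max = (2 ** exp_bits - 1) - bias     # largest exponent
    max_val = (2 - 2.0 ** -man_bits) * 2.0 ** e_max
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), max_val)               # saturate large magnitudes
    # Values below 2**e_min are subnormal and share the e_min step size.
    e = max(math.floor(math.log2(a)), e_min)
    step = 2.0 ** (e - man_bits)
    return sign * min(round(a / step) * step, max_val)


def quantize_normalized(values):
    """Scale values into [-1, 1] by max-abs, quantize, then scale back."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    return [max_abs * quantize_fp8(v / max_abs) for v in values]


def mse(xs, ys):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)


# Synthetic, small-magnitude "weights" (hypothetical data, fixed seed).
random.seed(0)
weights = [random.gauss(0.0, 0.002) for _ in range(10000)]

err_raw = mse(weights, [quantize_fp8(w) for w in weights])
err_norm = mse(weights, quantize_normalized(weights))

print(f"FP8 without normalization, MSE: {err_raw:.3e}")
print(f"FP8 with max-abs normalization, MSE: {err_norm:.3e}")
```

On this synthetic data most unnormalized weights fall into the toy format's subnormal range, where the step size is fixed, so scaling them toward 1 first gives a lower quantization error. The example only measures per-value error on random numbers and says nothing about accuracy on a real network.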
Abstract
Convolutional neural networks (CNNs) achieve state-of-the-art performance at the cost of becoming deeper and larger. Although quantization (both fixed-point and floating-point) has proven effective for reducing storage and memory access, two challenges -- 1) accuracy loss caused by quantization without calibration, fine-tuning or re-training for deep CNNs and 2) hardware inefficiency caused by floating-point quantization -- prevent processors from completely leveraging the benefits. In this paper, we propose a low-precision floating-point quantization oriented processor, named Phoenix, to address the above challenges. We primarily have three key observations: 1) 8-bit floating-point quantization incurs less error than 8-bit fixed-point quantization; 2) without using any calibration, fine-tuning or re-training techniques, normalization before quantization further reduces accuracy…
Taxonomy
Topics: Image Processing Techniques and Applications · Advanced Vision and Imaging · Advanced Image Processing Techniques
