Quantizing YOLOv7: A Comprehensive Study
Mohammadamin Baghbanbashi, Mohsen Raji, Behnam Ghavami

TL;DR
This paper investigates the impact of various quantization schemes on YOLOv7, demonstrating significant memory savings with minimal accuracy loss, thus aiding deployment on resource-constrained devices.
Contribution
The paper provides an in-depth analysis of how quantization affects YOLOv7, a recent state-of-the-art object detection model whose behavior under quantization had not been thoroughly studied before.
Findings
4-bit quantization achieves ~3.9x memory reduction.
Accuracy loss under quantization stays small, at 1-2.5%.
Combining different quantization granularities improves compression.
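To make the findings on bit-width and granularity concrete, here is a minimal sketch of uniform min/max quantization applied at two granularities (one range for the whole tensor versus one range per output channel), plus a simple memory-ratio estimate. It is an illustration only, not the paper's implementation: the function names, the toy weight matrix, the 16-bit baseline, and the 32 bits of per-group metadata are all assumptions made for this example.

```python
def quantize_uniform(values, bits):
    """Map floats to integer codes on a uniform grid spanning [min, max]."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1                      # 15 steps for 4 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate floats from integer codes."""
    return [lo + c * scale for c in codes]


def squared_error(original, restored):
    return sum((x - y) ** 2 for x, y in zip(original, restored))


def per_tensor_mse(weight, bits):
    """One (scale, offset) pair shared by every value in the tensor."""
    flat = [v for row in weight for v in row]
    codes, scale, lo = quantize_uniform(flat, bits)
    return squared_error(flat, dequantize(codes, scale, lo)) / len(flat)


def per_channel_mse(weight, bits):
    """One (scale, offset) pair per row (output channel)."""
    total, count = 0.0, 0
    for row in weight:
        codes, scale, lo = quantize_uniform(row, bits)
        total += squared_error(row, dequantize(codes, scale, lo))
        count += len(row)
    return total / count


def compression_ratio(num_weights, full_bits, quant_bits,
                      num_groups=0, overhead_bits_per_group=0):
    """Full-precision size over quantized size, counting per-group metadata."""
    full = num_weights * full_bits
    quant = num_weights * quant_bits + num_groups * overhead_bits_per_group
    return full / quant


# Toy weights (hypothetical): two channels with very different value ranges.
W = [
    [-1.0, 1.0, 0.3, -0.7],
    [-0.01, 0.01, 0.003, -0.007],
]

err_tensor = per_tensor_mse(W, bits=4)
err_channel = per_channel_mse(W, bits=4)

# Assumed 16-bit baseline; 32 bits of scale/offset metadata per group.
ratio_no_overhead = compression_ratio(1000, 16, 4)
ratio_with_overhead = compression_ratio(1000, 16, 4, num_groups=10,
                                        overhead_bits_per_group=32)
```

In this toy case the per-channel error is lower because the small-range channel gets its own finer grid, while the per-group metadata pulls the memory ratio slightly below the ideal 16/4 = 4x. That is the kind of trade-off between granularity and compression that the findings refer to.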
Abstract
YOLO is a deep neural network (DNN) model for robust real-time object detection that follows the one-stage inference approach. It outperforms other real-time object detectors in terms of speed and accuracy by a wide margin. Nevertheless, since YOLO is built on a DNN backbone with numerous parameters, it imposes an excessive memory load, so deploying it on memory-constrained devices is a severe challenge in practice. To overcome this limitation, model compression techniques, such as quantizing parameters to lower-precision values, can be adopted. As the most recent version of YOLO, YOLOv7 achieves state-of-the-art speed and accuracy in the range of 5 FPS to 160 FPS, surpassing all former versions of YOLO and other existing models in this regard. So far, the robustness of several quantization schemes has been evaluated on older versions of YOLO.…
