Sensitivity-Aware Mixed-Precision Quantization and Width Optimization of Deep Neural Networks Through Cluster-Based Tree-Structured Parzen Estimation

Seyedarmin Azizi; Mahdi Nazemi; Arash Fayyazi; Massoud Pedram

arXiv:2308.06422 · cs.LG · August 12, 2024 · 2 cites

Open Access

TL;DR

This paper presents a method for choosing per-layer bit-width and layer width with a cluster-based tree-structured Parzen estimator, reporting a 12x faster search than existing methods and a 20% smaller model with no loss in accuracy.

Contribution

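It introduces a search mechanism that first shrinks the search space with Hessian-based pruning, then fits surrogate models of favorable and unfavorable configurations with a cluster-based tree-structured Parzen estimator to locate strong designs quickly.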
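
As a rough illustration of the Hessian-based step, the sketch below scores layers by an estimate of their Hessian trace (Hutchinson's estimator) and gives less sensitive layers a smaller set of candidate bit-widths. This summary does not say how the paper computes sensitivity or prunes the search space, so the toy Hessians, the pruning rule, and the bit-width sets here are all assumptions made for illustration.

```python
import random

def hutchinson_trace(hvp, dim, num_samples=200, seed=0):
    """Estimate trace(H) as the mean of v^T (H v) over random +/-1 vectors v."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hv = hvp(v)
        total += sum(vi * hvi for vi, hvi in zip(v, hv))
    return total / num_samples

def make_hvp(matrix):
    """Hessian-vector product for a fixed symmetric matrix (toy stand-in for autodiff)."""
    def hvp(v):
        return [sum(h * x for h, x in zip(row, v)) for row in matrix]
    return hvp

# Hypothetical per-layer Hessians; a real network would supply hvp via autodiff.
toy_hessians = {
    "conv1": [[4.0, 0.5], [0.5, 6.0]],
    "conv2": [[0.2, 0.0], [0.0, 0.1]],
    "fc":    [[1.0, 0.3], [0.3, 2.0]],
}

# Sensitivity score: estimated trace divided by the number of parameters.
sensitivity = {
    name: hutchinson_trace(make_hvp(h), dim=len(h)) / len(h)
    for name, h in toy_hessians.items()
}

# Layers ordered from least to most sensitive.
ranked = sorted(sensitivity, key=sensitivity.get)

# Assumed pruning rule: the less sensitive layers only consider low bit-widths.
search_space = {}
for position, name in enumerate(ranked):
    search_space[name] = [2, 3, 4] if position < len(ranked) / 2 else [4, 6, 8]

print(ranked)
print(search_space)
```

The point of the sketch is only that a per-layer sensitivity score can remove options before any search starts, so the later surrogate-based search explores fewer configurations.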

Findings

1. 20% reduction in model size without accuracy loss

2. 12x faster search compared to existing methods

3. Effective optimization on standard datasets

Abstract

As the complexity and computational demands of deep learning models rise, the need for effective optimization methods for neural network designs becomes paramount. This work introduces an innovative search mechanism for automatically selecting the best bit-width and layer-width for individual neural network layers. This leads to a marked enhancement in deep neural network efficiency. The search domain is strategically reduced by leveraging Hessian-based pruning, ensuring the removal of non-crucial parameters. Subsequently, we detail the development of surrogate models for favorable and unfavorable outcomes by employing a cluster-based tree-structured Parzen estimator. This strategy allows for a streamlined exploration of architectural possibilities and swift pinpointing of top-performing designs. Through rigorous testing on well-known datasets, our method proves its distinct advantage…
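
The surrogate-model idea in the abstract can be sketched with a small tree-structured Parzen estimator over per-layer bit-widths. The code below is a minimal sketch, not the authors' implementation: the objective `toy_loss`, the sensitivity values, the bit-width choices, and the use of a one-dimensional two-means clustering to separate favorable from unfavorable trials are all assumptions, since the abstract does not give those details. Each step clusters past trials by loss, fits one categorical density per layer for each cluster, samples candidates from the favorable density, and evaluates the candidate with the highest favorable-to-unfavorable density ratio.

```python
import math
import random

BITS = [2, 4, 8]                    # hypothetical per-layer bit-width choices
NUM_LAYERS = 4
SENSITIVITY = [5.0, 0.2, 1.5, 0.1]  # hypothetical per-layer sensitivity scores

def toy_loss(config):
    """Hypothetical objective (lower is better): error proxy plus size penalty."""
    error = sum(s / (2 ** b) for s, b in zip(SENSITIVITY, config))
    size = sum(config) / (8 * NUM_LAYERS)
    return error + 0.5 * size

def split_by_two_means(losses, iters=20):
    """Cluster scalar losses into two groups; return (good, bad) index lists."""
    lo, hi = min(losses), max(losses)
    good, bad = list(range(len(losses))), []
    for _ in range(iters):
        new_good = [i for i, y in enumerate(losses) if abs(y - lo) <= abs(y - hi)]
        members = set(new_good)
        new_bad = [i for i in range(len(losses)) if i not in members]
        if not new_bad or (new_good == good and new_bad == bad):
            good, bad = new_good, new_bad
            break
        good, bad = new_good, new_bad
        lo = sum(losses[i] for i in good) / len(good)
        hi = sum(losses[i] for i in bad) / len(bad)
    return good, bad

def categorical_density(configs, layer, smoothing=1.0):
    """Smoothed frequency of each bit-width at one layer."""
    counts = {b: smoothing for b in BITS}
    for c in configs:
        counts[c[layer]] += 1
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()}

def suggest(history, rng, num_candidates=24):
    """Sample from the 'good' density and return the best good/bad ratio."""
    configs = [c for c, _ in history]
    good_idx, bad_idx = split_by_two_means([y for _, y in history])
    good = [configs[i] for i in good_idx]
    bad = [configs[i] for i in bad_idx]
    l = [categorical_density(good, k) for k in range(NUM_LAYERS)]
    g = [categorical_density(bad, k) for k in range(NUM_LAYERS)]

    def score(c):
        return sum(math.log(l[k][b] / g[k][b]) for k, b in enumerate(c))

    candidates = [
        tuple(rng.choices(BITS, weights=[l[k][b] for b in BITS])[0]
              for k in range(NUM_LAYERS))
        for _ in range(num_candidates)
    ]
    return max(candidates, key=score)

def search(num_init=10, num_iters=30, seed=0):
    rng = random.Random(seed)
    history = []
    for _ in range(num_init):  # random warm-up trials
        c = tuple(rng.choice(BITS) for _ in range(NUM_LAYERS))
        history.append((c, toy_loss(c)))
    for _ in range(num_iters):  # surrogate-guided trials
        c = suggest(history, rng)
        history.append((c, toy_loss(c)))
    return min(history, key=lambda item: item[1]), history

best, history = search()
print(best)
```

A standard tree-structured Parzen estimator splits trials at a fixed quantile of the objective; clustering instead lets the size of the favorable group follow the data, which is the design choice this sketch is meant to show.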

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning