Constructing Energy-efficient Mixed-precision Neural Networks through Principal Component Analysis for Edge Intelligence

Indranil Chakraborty, Deboleena Roy, Isha Garg, Aayush Ankit and Kaushik Roy

arXiv:1906.01493 · cs.LG · February 18, 2020



TL;DR

This paper introduces a PCA-driven method to design mixed-precision neural networks that significantly improve accuracy over binary networks while maintaining high energy efficiency for edge computing applications.

Contribution

It presents a novel PCA-based approach for identifying the important layers of a binary network and assigning them higher precision, producing mixed-precision networks that improve the performance of compressed neural networks on edge devices.

Findings

01

Over 10% accuracy improvement over binary networks like XNOR-Net.

02

Achieves up to 94% of the energy efficiency of XNOR-Nets.

03

Effective on ResNet and VGG architectures for CIFAR-100 and ImageNet.

Abstract

The 'Internet of Things' has brought increased demand for AI-based edge computing in applications ranging from healthcare monitoring systems to autonomous vehicles. Quantization is a powerful tool for addressing the growing computational cost of such applications, and it yields significant compression over full-precision networks. However, quantization can result in a substantial loss of performance on complex image classification tasks. To address this, we propose a Principal Component Analysis (PCA)-driven methodology to identify the important layers of a binary network and design mixed-precision networks. The proposed Hybrid-Net achieves more than a 10% improvement in classification accuracy over binary networks such as XNOR-Net for ResNet and VGG architectures on the CIFAR-100 and ImageNet datasets, while still achieving up to 94% of the energy efficiency of XNOR-Nets. This work furthers the…

Figures (23)

Tables (8)

Table 1: Network architectures for the ImageNet classification task

ResNet-18
7 $\times$ 7 conv 64 stride 2
3 $\times$ 3 maxpool stride 2
3 $\times$ 3 conv 64 stride 1 ( $\times$ 4)
3 $\times$ 3 conv 128 stride 2
3 $\times$ 3 conv 128 stride 1 ( $\times$ 3)
3 $\times$ 3 conv 256 stride 2
3 $\times$ 3 conv 256 stride 1 ( $\times$ 3)
3 $\times$ 3 conv 512 stride 2
3 $\times$ 3 conv 512 stride 1 ( $\times$ 3)
Linear 1000
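As a reading aid, the ResNet-18 rows above can be encoded as a Python list so that layers can be referred to by index. The tuple layout `(kernel, in_channels, out_channels, stride)`, the 1-based numbering that starts at the 7×7 conv, the RGB input, and treating the final linear layer as a 1×1 layer are all assumptions made for illustration. Maxpool is skipped because it has no weights, and shortcut projections are omitted because the table does not list them.

```python
def resnet18_layers():
    """Hypothetical encoding of Table 1: one (kernel, in_ch, out_ch, stride) tuple per weight layer."""
    layers = [(7, 3, 64, 2)]                    # 7x7 conv 64, stride 2 (3 input channels assumed)
    layers += [(3, 64, 64, 1)] * 4              # 3x3 conv 64, stride 1 (x4)
    in_ch = 64
    for out_ch in (128, 256, 512):
        layers.append((3, in_ch, out_ch, 2))    # 3x3 conv, stride 2 (first layer of the stage)
        layers += [(3, out_ch, out_ch, 1)] * 3  # 3x3 conv, stride 1 (x3)
        in_ch = out_ch
    layers.append((1, 512, 1000, 1))            # Linear 1000, treated as a 1x1 layer
    return layers


layers = resnet18_layers()
stride2 = [i for i, (_, _, _, s) in enumerate(layers, start=1) if s == 2]
print(len(layers), stride2)  # 18 [1, 6, 10, 14]
```

Under this assumed numbering, the stride-2 convolutions at layers 6, 10 and 14 coincide with layers listed as significant for ResNet-18 in Table 2; whether the paper indexes layers exactly this way is not confirmed by this excerpt.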

Table 2: Significant layers identified by PCA

CIFAR-100
Network Arch	Significant layers
ResNet-20 ( $Δ = 1$ )	8, 9, 10, 14, 15, 16, 18
ResNet-20 ( $Δ = 2$ )	8, 9, 14, 15
ResNet-32 ( $Δ = 4$ )	12, 13, 22, 23, 24
VGG-15 ( $Δ = 3$ )	3, 5, 8, 11, 12
VGG-15 ( $Δ = 10$ )	3, 5, 8
ImageNet
Network Arch	Significant layers
ResNet-18 ( $Δ = 30$ )	6, 10, 14, 15
ResNet-18 ( $Δ = 20$ )	6, 10, 11, 14, 15, 16
ResNet-18 ( $Δ = 10$ )	6, 7, 10, 11, 14, 15, 16
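This table does not state how Δ turns per-layer PCA results into a set of significant layers. A minimal sketch, assuming a layer is flagged when its number of significant principal components grows by at least Δ relative to the preceding layer; the `>=` comparison, the function name and the example counts are assumptions, not taken from the paper:

```python
def significant_layers(num_components, delta):
    """Return 1-based indices of layers whose significant-component count
    rises by at least `delta` over the previous layer (assumed selection rule)."""
    return [
        i + 1
        for i in range(1, len(num_components))
        if num_components[i] - num_components[i - 1] >= delta
    ]


# Hypothetical per-layer counts of principal components needed to reach the threshold T.
counts = [10, 12, 12, 20, 21, 30]
print(significant_layers(counts, delta=2))  # [2, 4, 6]
print(significant_layers(counts, delta=5))  # [4, 6]
```

Under a rule of this form, raising Δ can only shrink the selected set, which is consistent with the nesting visible above (for example, the ResNet-20 layers for Δ = 2 are a subset of those for Δ = 1).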

Table 3: Comparison of different networks on CIFAR-100

ResNet-20
FP Accuracy - 69.49%
$E.E(XNOR)$ - 16.35, $M.C(XNOR)$ - 17.26
Network Type	Best Accuracy (%)	Mean $\pm$ SD Accuracy (%)	E.E_Norm	M.C_Norm
XNOR	50.50	50.23 $\pm$ 0.21	1	1
Binary-Shortcut 1	54.16	53.92 $\pm$ 0.21	0.99	1
Hybrid-Net (2,2) ( $Δ = 1$ )	62.84	62.5 $\pm$ 0.24	0.87	0.77
Hybrid-Net (2,2) ( $Δ = 2$ )	60.93	60.53 $\pm$ 0.39	0.93	0.88
Hybrid-Net (4,4) ( $Δ = 1$ )	63.88	63.38 $\pm$ 0.49	0.7	0.53
Hybrid-Net (4,4) ( $Δ = 2$ )	61.62	61.53 $\pm$ 0.1	0.82	0.7
Quantize (2,2)	65.81	65.19 $\pm$ 0.49	0.73	0.65
Hybrid-Comp A (2,2) (k=6)	62.36	61.79 $\pm$ 0.34	0.88	0.71
XNOR2x	63.03	62.81 $\pm$ 0.14	0.39	0.33

ResNet-32
FP Accuracy - 70.62%
$E.E(XNOR)$ - 18.42, $M.C(XNOR)$ - 20.44
Network Type	Best Accuracy (%)	Mean $\pm$ SD Accuracy (%)	E.E_Norm	M.C_Norm
XNOR	53.89	53.48 $\pm$ 0.27	1	1
Binary-Shortcut 1	58.98	58.23 $\pm$ 0.61	0.99	1
Hybrid-Net (2,2) ( $Δ = 4$ )	64.34	63.75 $\pm$ 0.39	0.94	0.87
Hybrid-Net (4,4) ( $Δ = 4$ )	64.45	64.28 $\pm$ 0.18	0.84	0.69
Quantize (2,2)	68.04	67.73 $\pm$ 0.21	0.7	0.61
Hybrid-Comp A (2,2)	62.41	62.15 $\pm$ 0.2	0.91	0.76
XNOR2x	65.20	65.11 $\pm$ 0.07	0.38	0.31

VGG-15
FP Accuracy - 68.31%
$E.E(XNOR)$ - 21.77, $M.C(XNOR)$ - 26.24
Network Type	Best Accuracy (%)	Mean $\pm$ SD Accuracy (%)	E.E_Norm	M.C_Norm
XNOR	54.30	54.23 $\pm$ 0.1	1	1
Hybrid-Net (2,2) ( $Δ = 3$ )	61.81	61.67 $\pm$ 0.08	0.84	0.75
Hybrid-Net (2,2) ( $Δ = 10$ )	60.13	59.87 $\pm$ 0.25	0.93	0.92
Hybrid-Net (4,4) ( $Δ = 3$ )	63.38	63.12 $\pm$ 0.15	0.64	0.5
Hybrid-Net (4,4) ( $Δ = 10$ )	60.37	60.06 $\pm$ 0.18	0.81	0.8
Quantize (2,2)	68.90	68.63 $\pm$ 0.28	0.65	0.55
Hybrid-Comp A (2,2) (k=3)	58.01	57.46 $\pm$ 0.38	0.85	0.72
XNOR2x	58.24	57.35 $\pm$ 0.54	0.29	0.3

Table 4: Comparison of different networks on ImageNet

ResNet-18
FP Accuracy - 69.15%
$E.E(XNOR)$ - 8.57, $M.C(XNOR)$ - 13.35
Network Type		Best Accuracy (%)	Mean $\pm$ SD Accuracy (%)	E.E_Norm	M.C_Norm
XNOR		50.33	–	1	1
Binary-Shortcut 1		54.36	54.15 $\pm$ 0.15	1	1
Bi-Real Net [20]		56.9	–	1	1
Hybrid-Net (2,2) ( $Δ = 30$ )		60.38	59.75 $\pm$ 0.44	0.96	0.87
Hybrid-Net (2,2) ( $Δ = 20$ )		61.95	61.89 $\pm$ 0.04	0.93	0.8
Hybrid-Net (2,2) ( $Δ = 10$ )		62.73	–	0.92	0.8
Hybrid-Net (4,4) ( $Δ = 30$ )		61.70	60.54 $\pm$ 0.84	0.89	0.7
Quantize (2,2)	XNOR-kbit	64.51	–	0.84	0.71
	DoReFa-Net [21]	62.6	–
	PACT [25]	67	–
	LQ-Nets [23]	64.9	–
Hybrid [27]		54.9	–	0.68	1
Hybrid-Comp A (2,2) (k=4)		59.47	–	0.94	0.77

Table 5: Effect of random initialization on the optimal Hybrid-Net architecture (ResNet-20 on CIFAR-100)

$Δ$	Initialization	Significant layers	Best Accuracy (%)	Mean $\pm$ SD Accuracy (%)	$E.E_{Norm}$	$M.C_{Norm}$
1	1	8,9,10,14,15,16,18	62.84	62.46 $\pm$ 0.23	0.88	0.77
	2	5,8,9,14,15,16	62.14	61.59 $\pm$ 0.29	0.89	0.82
	3	7,8,9,14,15,16	62.17	61.85 $\pm$ 0.19	0.89	0.82
	4	2,8,9,10,14,15,16,18	63.4	63.15 $\pm$ 0.13	0.86	0.77
2	1	8,9,14,15	61.49	60.72 $\pm$ 0.52	0.93	0.88
	2	8,9,14,15,16	61.29	61.48 $\pm$ 0.29	0.91	0.83
	3	6,8,9,14,15,16	61.70	61.82 $\pm$ 0.53	0.89	0.82
	4	8,9,14,15,16	61.87	61.48 $\pm$ 0.29	0.91	0.83

Table 6: Network configurations with randomly chosen layers at $k_b$-bit precision

Network index	Network Configurations	Best Accuracy (%)	Mean $\pm$ SD Accuracy (%)	Energy
N1	Hybrid-Net (2,2) ( $Δ = 4$ )	64.34	63.75 $\pm$ 0.39	0.94
N2	Hybrid-Net (25, 26, 27, 28, 29, 30, 31)	62.70	62.62 $\pm$ 0.09	0.90
N3	Hybrid-Net (2, 11, 12, 20, 21, 23, 29)	63.43	63.00 $\pm$ 0.37	0.91
N4	Hybrid-Net (12, 17, 18, 20, 24, 25, 26)	63.81	63.46 $\pm$ 0.21	0.91
N5	Hybrid-Net (2, 5, 6, 7, 20, 28)	61.26	61.04 $\pm$ 0.24	0.92
N6	Hybrid-Net (2, 17, 22, 25, 28, 30)	63.57	63.43 $\pm$ 0.12	0.93

Table 7: Effect of quantizing the first and last layers of Hybrid-Net (2,2) ( $Δ = 4$ ) on ResNet-32 for CIFAR-100

Last Layer Configuration	Best Accuracy (%)	Mean $\pm$ SD Accuracy (%)	Energy Consumption	Memory
Full-Precision	64.34	63.75 $\pm$ 0.39	1	1
Binary weights and activations	56.93	56.83 $\pm$ 0.09	0.98	0.75
Binary weights and full-precision activations	61.07	60.36 $\pm$ 0.44	0.98	0.75
2-bit weights and activations	62.41	62.35 $\pm$ 0.05	0.98	0.76
First Layer Configuration	Best Accuracy (%)	Mean $\pm$ SD Accuracy (%)	Energy Consumption	Memory
Binary weights and activations	44.79	44.16 $\pm$ 0.74	0.85	0.98
Binary weights and full-precision activations	59.94	59.29 $\pm$ 0.57	0.93	0.98
2-bit weights and activations	60.87	60.46 $\pm$ 0.25	0.85	0.98

Table 8: Number of operations in a $k_b$-bit layer

Operations in neural networks
Operation	Number of Operations
Input Read	$N^{2} \times I$
Weight Read	$k^{2} \times I \times O$
Computations (MAC)	$M^{2} \times I \times k^{2} \times O$
Memory Write	$M^{2} \times O$
Number of operations of $k_{b}$ -bit layer
Operation	Term	Number of Operations
k-bit Memory Access	$N_{A - k}$	$N^{2} \times I$ + $k^{2} \times I \times O$
k-bit Computations (MAC)	$N_{C - k}$	$M^{2} \times I \times k^{2} \times O$
FP Memory Access	$N_{A - F}$	$O$
FP Computations	$N_{C - F}$	$M^{2} \times O$
Energy Consumption Chart
Operation	Term	Energy (pJ)
k-b Memory Access	$E_{A - k}$	2.5 $k$
32-b MULT FP	$E_{M - F}$	3.7
32-b MULT INT	$E_{M - I}$	3.1
32-b ADD FP	$E_{A D - F}$	0.9
32-b ADD INT	$E_{A D - I}$	0.1
k-bit MAC INT	$E_{C - k I}$	$(3.1k/32) + 0.1$
k-bit MAC FP	$E_{C - k F}$	4.6

Equations (20)

$\mathrm{Var} = \sum_{i=1}^{O} \sigma_{ii}^{2} = \mathrm{Tr}(Y_{L}^{T} Y_{L})$

$\frac{\sum_{i=1}^{k} \lambda_{i}^{2}}{\sum_{i=1}^{O} \lambda_{i}^{2}} = T$
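The threshold equation picks the smallest k whose leading principal components explain a fraction T of a layer's output variance. A NumPy sketch, assuming the λ_i are the singular values of the mean-centred output matrix Y_L (so that Σλ_i² matches Tr(Y_L^T Y_L)) and using T = 0.999 purely as an illustrative default; the function name is hypothetical:

```python
import numpy as np


def num_significant_components(Y, T=0.999):
    """Smallest k with sum_{i<=k} lambda_i^2 / sum_{i<=O} lambda_i^2 >= T.

    Y: (samples, O) matrix of a layer's outputs, one column per filter.
    lambda_i: singular values of the mean-centred Y (assumed interpretation).
    """
    Yc = Y - Y.mean(axis=0, keepdims=True)        # centre each filter's output
    lam = np.linalg.svd(Yc, compute_uv=False)     # singular values, descending
    ratio = np.cumsum(lam**2) / np.sum(lam**2)    # explained-variance fraction for k = 1..O
    idx = np.searchsorted(ratio, T)               # first index where ratio >= T
    return int(min(idx, len(ratio) - 1) + 1)      # guard against round-off when T is near 1


rng = np.random.default_rng(0)
Z = rng.standard_normal((1000, 2))
B = np.array([[1, 0] * 4, [0, 1] * 4], dtype=float)
print(num_significant_components(Z @ B))          # rank-2 outputs spread over 8 filters
```

In the example, eight filters carry only two independent signals, so two components suffice; applying the function to each layer of a trained binary network yields the per-layer counts that a Δ-based selection would then compare.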

$q_{k_b}(x) = 2\left(\frac{\lfloor (2^{k_{b}} - 1)(x + 1)/2 \rfloor}{2^{k_{b}} - 1} - \frac{1}{2}\right)$
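A direct NumPy transcription of the quantizer q_{k_b}(x) follows. It assumes inputs are first clipped to [-1, 1]; that clipping step is an assumption, since the formula only maps that interval onto itself. The function name is hypothetical.

```python
import numpy as np


def quantize_kb(x, k_b):
    """q_{k_b}(x) = 2 * (floor((2^k_b - 1)(x + 1) / 2) / (2^k_b - 1) - 1/2)."""
    x = np.clip(x, -1.0, 1.0)          # assumed pre-processing, not part of the formula
    levels = 2**k_b - 1                # number of steps between -1 and 1
    return 2.0 * (np.floor(levels * (x + 1.0) / 2.0) / levels - 0.5)


x = np.linspace(-1.0, 1.0, 1001)
print(np.unique(quantize_kb(x, 2)))    # four levels: -1, -1/3, 1/3, 1
```

Because the formula uses ⌊·⌋ rather than rounding, inputs just below 1 land on the second-highest level and the top level +1 is reached only at x = 1 exactly.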

X * W

k_{b}

$\mathrm{Penalty} = \frac{\sum_{i \notin \mathrm{Sig}_{Layer}} B_{i} + \sum_{i \in \mathrm{Sig}_{Layer}} B_{i} \times p}{\sum_{i=1}^{N} B_{i}}$
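Read literally, the penalty is a normalised sum in which each significant layer's B_i is scaled by p while every other layer counts once, so it equals 1 when p = 1. The symbols B_i and p are not defined in this excerpt, so the sketch below treats them as plain numbers; the function name and example values are hypothetical.

```python
def penalty(B, sig_layers, p):
    """Penalty = (sum_{i not in Sig} B_i + p * sum_{i in Sig} B_i) / sum_{i=1}^{N} B_i.

    B: per-layer values B_1..B_N (list index 0 holds layer 1).
    sig_layers: 1-based indices of the significant layers.
    p: multiplier applied to significant layers.
    """
    sig = set(sig_layers)
    total = sum(B)
    in_sig = sum(b for i, b in enumerate(B, start=1) if i in sig)
    return (total - in_sig + p * in_sig) / total


# Hypothetical example: four equal layers, layers 2 and 3 significant, p = 2.
print(penalty([1.0, 1.0, 1.0, 1.0], [2, 3], p=2))  # 1.5
```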

Energy Efficiency (E.E)

Memory Compression (M.C)

$\begin{aligned} E.E(A) &= \frac{\sum_{i} E_{i}(FP)}{\sum_{i} E_{i}(A)}, & E.E_{Norm}(A) &= \frac{E.E(A)}{E.E(XNOR)} \\ M.C(A) &= \frac{\sum_{i} M_{i}(FP)}{\sum_{i} M_{i}(A)}, & M.C_{Norm}(A) &= \frac{M.C(A)}{M.C(XNOR)} \end{aligned}$
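These definitions reduce to two ratios: a full-precision total over a model total, and a model metric over the XNOR metric. A minimal sketch with hypothetical function names, followed by a worked example that recovers an absolute energy efficiency from the ResNet-20 numbers in Table 3:

```python
def efficiency(fp_costs, model_costs):
    """E.E(A) or M.C(A): total full-precision cost divided by the total cost of model A."""
    return sum(fp_costs) / sum(model_costs)


def normalized(metric_a, metric_xnor):
    """E.E_Norm(A) or M.C_Norm(A): the metric of A relative to the same metric of XNOR."""
    return metric_a / metric_xnor


# Table 3, ResNet-20: E.E(XNOR) = 16.35 and Hybrid-Net (2,2) (Delta = 1) has E.E_Norm = 0.87,
# so its energy efficiency relative to the full-precision network is 0.87 * 16.35.
ee_hybrid = 0.87 * 16.35
print(round(ee_hybrid, 2))  # 14.22
```

In other words, that Hybrid-Net configuration is still roughly 14× more energy-efficient than the full-precision ResNet-20 under the paper's energy model.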

$E_{i} = N_{A-F} E_{A-32} + N_{A-k} E_{A-k} + N_{C-F} E_{C-kF} + N_{C-k} E_{C-kI}$
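Table 8 and the E_i expression combine into a per-layer energy estimate. The sketch below is an illustrative reading, not the authors' code. It takes E_{A-32} to be the k-bit access cost of Table 8 evaluated at k = 32 (80 pJ), and it names the kernel size `kernel` because Table 8 uses k both for the kernel size (in k²) and for the bit-width (in 2.5k pJ). The helper names and the layer dimensions in the example are hypothetical.

```python
E_MAC_FP = 4.6  # E_{C-kF} in pJ: full-precision MAC (3.7 MULT + 0.9 ADD), Table 8


def access_energy(bits):
    """E_{A-k} in pJ: k-bit memory access, 2.5 * k."""
    return 2.5 * bits


def int_mac_energy(bits):
    """E_{C-kI} in pJ: k-bit integer MAC, (3.1 * k / 32) + 0.1."""
    return 3.1 * bits / 32 + 0.1


def layer_energy(N, M, I, O, kernel, bits):
    """E_i (pJ) of one k_b-bit conv layer.

    N, M: input / output feature-map side length; I, O: input / output channels;
    kernel: kernel side length; bits: precision k_b of weights and activations.
    """
    n_access_k = N**2 * I + kernel**2 * I * O   # N_{A-k}: input read + weight read
    n_mac_k = M**2 * I * kernel**2 * O          # N_{C-k}: k-bit MACs
    n_access_fp = O                             # N_{A-F}
    n_comp_fp = M**2 * O                        # N_{C-F}
    return (n_access_fp * access_energy(32)     # E_{A-32}, assumed to be 2.5 * 32 pJ
            + n_access_k * access_energy(bits)
            + n_comp_fp * E_MAC_FP
            + n_mac_k * int_mac_energy(bits))


print(layer_energy(N=8, M=8, I=16, O=16, kernel=3, bits=1))  # about 43340.8 pJ
```

Summing `layer_energy` over all layers of a model, and over the full-precision baseline, gives the two totals that enter E.E(A).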

$N_{w-i} = I_{i} \times O_{i} \times k^{2}$
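To see how the weight count N_{w-i} feeds into memory compression, the sketch below makes several simplifying assumptions: the memory of layer i is M_i = N_{w-i} × (bits per weight); only the conv and linear weights of Table 1 are counted (no shortcut projections, scaling factors or batch-norm parameters); and XNOR-Net is modelled as binary everywhere except a full-precision first and last layer. The function names are hypothetical.

```python
def resnet18_weight_counts():
    """N_{w-i} = I_i * O_i * k^2 for the weight layers listed in Table 1."""
    shapes = [(3, 64, 7)] + [(64, 64, 3)] * 4            # (I, O, kernel)
    in_ch = 64
    for out_ch in (128, 256, 512):
        shapes += [(in_ch, out_ch, 3)] + [(out_ch, out_ch, 3)] * 3
        in_ch = out_ch
    shapes.append((512, 1000, 1))                         # Linear 1000 as a 1x1 layer
    return [I * O * k**2 for I, O, k in shapes]


def memory_compression(counts, bits):
    """M.C(A) = sum_i M_i(FP) / sum_i M_i(A), with M_i = N_{w-i} * bits_i (assumed)."""
    fp = sum(n * 32 for n in counts)
    model = sum(n * b for n, b in zip(counts, bits))
    return fp / model


counts = resnet18_weight_counts()
xnor_bits = [32] + [1] * (len(counts) - 2) + [32]         # full-precision first and last layers
print(round(memory_compression(counts, xnor_bits), 2))    # 13.31
```

The result lands close to the M.C(XNOR) of 13.35 listed for ResNet-18 in Table 4; an exact match is not expected given the simplifications above.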


Code & Models

Repositories

ichakra2/pca-hybrid
PyTorch · Official


Taxonomy

Methods: Average Pooling · Dropout · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Dense Connections · Kaiming Initialization

Full text

Constructing Energy-efficient Mixed-precision Neural Networks through Principal Component Analysis for Edge Intelligence

Indranil Chakraborty