CC-Net: Image Complexity Guided Network Compression for Biomedical Image Segmentation

Suraj Mishra; Peixian Liang; Adam Czajka; Danny Z. Chen; X. Sharon Hu

arXiv:1901.01578·cs.CV·September 10, 2019

TL;DR

CC-Net is a novel image complexity-guided approach for efficiently compressing CNNs in biomedical image segmentation, predicting accuracy for different sizes to optimize network compression while maintaining high accuracy.

Contribution

It introduces a method that predicts network accuracy based on image complexity, enabling rapid compression of CNNs without extensive retraining.

Findings

1. Retains up to ≈95% of the base network's segmentation accuracy.
2. Uses only ≈0.1% of the trainable parameters of the full-sized network in the best case.
3. Is effective for generating compressed biomedical segmentation networks.

Tables

Table 1: Datasets and properties. J is the average JPEG complexity and B is the blob density.

Dataset	Size	Type	J	B	Source
Glands (GL)	165	RGB	0.2401	0.5711	[13]
Lymph Nodes (LN)	74	Ultrasound	0.2445	0.0715	in-house
Melanoma (ME)	2750	RGB	0.1505	0.3055	[14]
C2DH-HeLa (CH)	20	Gray	0.1403	0.4607	[15]
Wing Discs (WD)	996	Gray	0.0925	0.1348	in-house
C2DH-U373 (CU)	34	Gray	0.1473	0.0699	[15]
C2DL-PSC (CP)	4	Gray	0.2296	0.3066	[15]

Table 2: Segmentation accuracy and network parameters on the C2DH-U373 and C2DL-PSC datasets.

	Method	Dataset	U-Net [2] F1	U-Net [2] IU	U-Net [2] log(#P)	CUMedVision [3] F1	CUMedVision [3] IU	CUMedVision [3] log(#P)	FCN [1] F1	FCN [1] IU	FCN [1] log(#P)
	Base Network	C2DH-U373	0.896	0.900	7.492	0.891	0.895	6.887	0.891	0.894	7.552
	Base Network	C2DL-PSC	0.801	0.820	7.492	0.793	0.814	6.887	0.755	0.788	7.552
Compressed Networks	Base Network + Squeeze [6]	C2DH-U373	0.819	0.854	7.049	0.832	0.863	6.669	0.844	0.875	7.369
	Base Network + Squeeze [6]	C2DL-PSC	0.752	0.781	7.049	0.751	0.781	6.669	0.697	0.753	7.369
	Base Network + Prune [9]	C2DH-U373	0.858	0.867	7.491	0.848	0.861	6.886	0.809	0.837	7.551
	Base Network + Prune [9]	C2DL-PSC	0.749	0.785	7.491	0.744	0.768	6.886	0.691	0.738	7.552
	CC-Net-case1	C2DH-U373	0.863	0.890	5.436	0.868	0.866	5.378	0.880	0.885	5.939
	CC-Net-case1	C2DL-PSC	0.775	0.818	6.640	0.763	0.794	6.341	0.720	0.766	6.949
	CC-Net-case1 + Squeeze	C2DH-U373	0.806	0.840	5.243	0.820	0.853	5.245	0.824	0.860	5.915
	CC-Net-case1 + Squeeze	C2DL-PSC	0.681	0.735	6.197	0.629	0.705	6.176	0.663	0.728	6.786
	CC-Net-case1 + Prune	C2DH-U373	0.834	0.847	5.435	0.834	0.847	5.377	0.830	0.843	5.938
	CC-Net-case1 + Prune	C2DL-PSC	0.772	0.800	6.639	0.750	0.786	6.341	0.678	0.730	6.949
	CC-Net-case1- $ϵ$	C2DH-U373	0.841	0.872	5.277	0.816	0.849	5.297	0.817	0.844	5.847
	CC-Net-case1- $ϵ$	C2DL-PSC	0.751	0.781	6.603	0.759	0.785	6.315	0.713	0.742	6.922
	CC-Net-case2	C2DH-U373	0.832	0.863	5.097	0.807	0.837	5.097	0.803	0.834	5.097
	CC-Net-case2	C2DL-PSC	0.698	0.745	5.097	0.711	0.743	5.097	0.644	0.719	5.097

Table 3: Training time consideration (test case 1).

Approach	Dataset	Pre-training	Training (time per epoch)	Post-training (pruning epochs)
U-Net+[9]	C2DH-U373	-	10781 ms	160
U-Net+[9]	C2DL-PSC	-	2348 ms	30
Ours (new)	C2DH-U373	O	4786 ms	-
Ours (new)	C2DL-PSC	O	1282 ms	-
Ours (existing)	C2DH-U373	Negligible	4786 ms	-
Ours (existing)	C2DL-PSC	Negligible	1282 ms	-

Abstract

Convolutional neural networks (CNNs) for biomedical image analysis are often of very large size, resulting in high memory requirement and high latency of operations. Searching for an acceptable compressed representation of the base CNN for a specific imaging application typically involves a series of time-consuming training/validation experiments to achieve a good compromise between network size and accuracy. To address this challenge, we propose CC-Net, a new image complexity-guided CNN compression scheme for biomedical image segmentation. Given a CNN model, CC-Net predicts the final accuracy of networks of different sizes based on the average image complexity computed from the training data. It then selects a multiplicative factor for producing a desired network with acceptable network accuracy and size. Experiments show that CC-Net is effective for generating compressed segmentation networks, retaining up to $\approx 95\%$ of the base network segmentation accuracy and utilizing only $\approx 0.1\%$ of trainable parameters of the full-sized networks in the best case.

**Index Terms:** Biomedical image segmentation, Deep neural networks, Network compression, Image complexity

1 Introduction

CNNs are often of very large size, resulting in high memory requirement and high latency of operations, and thus not suitable for resource-constrained applications (e.g., edge computing). To find a good compromise between network size and performance, a series of time-consuming training/validation experiments is often used for a specific imaging application. To address this challenge, we propose a new network compression scheme targeting biomedical image segmentation in resource-constrained application settings (e.g., low cost and easy-to-carry imaging devices for disaster/emergency response and military rescue).

Since the inception of FCNs [1], various improved segmentation networks [2, 3, 4, 5] were developed. To compress CNNs, various pre-training [6, 7] and post-training compression [8, 9] schemes were suggested. In these techniques, compression thresholds often need to be set manually in multiple pruning iterations.

In contrast with natural scene images, images in biomedical or healthcare application settings are often acquired for a specific type of disease/injury and captured by specific imaging devices; hence, their objects and settings are quite “stable”, which makes the image characteristics and complexity more specific and easier to analyze. In this paper, we leverage this observation to introduce CC-Net.

Based on the image complexity measure, target CNN, and user constraints (e.g., desired accuracy or available memory), CC-Net determines for the given dataset the most suitable multiplicative factor to compress the original CNN. The resulting compressed network is then trained, with much less effort and memory compared to the original network. Experiments using 5 public and 2 in-house datasets and 3 commonly-used CNN segmentation models as representative networks show that CC-Net is effective for compressing segmentation networks, retaining up to $\approx 95\%$ of the base network segmentation accuracy and utilizing only $\approx 0.1\%$ of trainable parameters of the full-sized networks in the best case.

2 Methodology

Feature-map (filter output) energy is a good indicator of a filter’s feature extraction capability. We have conducted a large set of experiments to study the relationship between feature-map energy and training datasets. Fig. 1 depicts 3 example energy distributions for the first convolution layer of U-Net [2]. One can observe that (i) a significant number of filter outputs have very low energy, and (ii) less “complex” (to be defined more precisely later) datasets have more low-energy filter outputs. These observations suggest that U-Net [2] may be unnecessarily large for some biomedical datasets, and in these cases, filters can be pruned without significantly deteriorating the accuracy.

Based on the above observations, we develop CC-Net, depicted in Fig. 2. Inputs and internal operations of CC-Net are shown in parallelograms and rectangles, respectively. Existing architectures are the 3 CNNs studied and parameterized in our work. Colored boxes highlight the key contributions of this paper. We elaborate on the major components of CC-Net below.

2.1 Image Complexity Computation

We seek an image complexity metric that can (i) indicate the trends of segmentation accuracy and (ii) be easily computed. Our work examined the following candidate metrics: (i) signal energy, (ii) edge information (Sobel and Scharr filters along with image pyramid), (iii) local key-point detection using SURF [10], (iv) visual clutter information [11], (v) JPEG complexity [12] and (vi) blob density. To obtain a single complexity value for an entire dataset, we take the average of complexity values over all the images in the dataset.

Of the 7 datasets listed in Table 1, 5 (the train-set, top 5 rows) are used to formulate the methodology, while the remaining 2 (the test-set) are used for blind evaluation. Fig. 4 plots the average complexities (normalized to the range [0,1]) against the train-set datasets, ordered by their F1 and IU score degradation (F1 and IU being the two most popular segmentation accuracy metrics). Among these complexity measures, the JPEG complexity follows the trend of F1 score degradation most closely (i.e., higher complexity leads to lower F1). Since IU is related to both feature variety and quantity, we represent it by linearly combining the JPEG complexity $J$ and the blob density $B$ ( $B=\sum_{i}fg\_pixel/\sum_{i}img\_pixel$ , see Table 1) as $JB=\omega J+(1-\omega)B$ , where $\omega$ is a value in $[0,1]$ . The value of $\omega$ is determined by inspecting the optimal regression fit on the training datasets in our experiments. We use $J$ and $JB$ for multiplier determination, as explained next.
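As an illustration of the two quantities defined above, the following minimal sketch computes the blob density $B$ from binary masks and combines it with $J$ into $JB$. The toy masks and the value $\omega=0.5$ are assumptions chosen only for the example; the function names are hypothetical, and the computation of $J$ itself (JPEG complexity [12]) is not reproduced here.

```python
def blob_density(masks):
    """B = (foreground pixels summed over all masks) / (all pixels summed over all masks)."""
    fg = sum(1 for mask in masks for row in mask for px in row if px)
    total = sum(len(row) for mask in masks for row in mask)
    return fg / total


def combined_complexity(j, b, omega):
    """JB = omega * J + (1 - omega) * B, with omega in [0, 1]."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return omega * j + (1.0 - omega) * b


# Two toy 4x4 binary masks (hypothetical data): 4 + 8 foreground pixels out of 32.
mask_a = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
mask_b = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
toy_b = blob_density([mask_a, mask_b])

# J and B for the Glands (GL) dataset are taken from Table 1; omega = 0.5 is assumed.
jb_glands = combined_complexity(0.2401, 0.5711, omega=0.5)
```

With $\omega=1$ the combined measure reduces to $J$ alone, and with $\omega=0$ it reduces to $B$.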

2.2 Multiplier Determination and Network Compression

Keeping all other variables unchanged, we can express the relationship between the segmentation accuracy ( $A$ ) and data complexity ( $C$ ) as $A=f(\theta,C)$ , where $\theta$ is the number of trainable parameters in a CNN. For general networks, the function $f(\theta,C)$ can be rather complicated. In general, however, segmentation accuracy is monotonically non-decreasing with respect to $\theta$ and $C$ , i.e., $\frac{\partial f}{\partial\theta}\geq 0$ and $\frac{\partial f}{\partial C}\geq 0$ .

For CNNs (see Fig. 3), we observe (as discussed in Section 3) that $\frac{\partial f}{\partial\log\theta}$ can be approximated by a linear function of $C$ . That is, $\frac{\partial f}{\partial\log\theta}\approx\lambda C+\delta$ for a constant $\lambda$ that reflects the degree of degradation. Given the linear dependency, if $\lambda$ and $\delta$ are known, then it is straightforward to compute the change in accuracy or in the number of parameters, when the other is provided. The value of $\lambda$ is network-dependent, and can be obtained by performing systematic analysis on network compression and tracking the change in accuracy.
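The two-stage fit implied by this linear model can be sketched as follows: first regress accuracy on $\log\theta$ for each dataset to obtain a slope, then regress those slopes on the dataset complexities to obtain $\lambda$ and $\delta$. All numbers below are synthetic placeholders, not measurements from the paper, and the helper `fit_line` is a hypothetical name for a plain least-squares fit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data: log10 of parameter counts and, per dataset, its complexity C
# together with the accuracy observed at each network size.
log_theta = [7.5, 6.9, 6.3, 5.7]
measurements = {
    "dataset_a": (0.10, [0.90, 0.89, 0.88, 0.87]),
    "dataset_b": (0.20, [0.90, 0.88, 0.86, 0.84]),
    "dataset_c": (0.30, [0.90, 0.87, 0.84, 0.81]),
}

# Stage 1: slope dA / d(log theta) for each dataset.
complexities, slopes = [], []
for complexity, accuracies in measurements.values():
    slope, _ = fit_line(log_theta, accuracies)
    complexities.append(complexity)
    slopes.append(slope)

# Stage 2: slope ~ lambda * C + delta across datasets.
lam, delta = fit_line(complexities, slopes)
```

In this synthetic example the more complex datasets lose accuracy faster as $\log\theta$ shrinks, so the fitted $\lambda$ is positive.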

A simple way of compressing a network is to uniformly scale down the number of feature maps in every convolution layer using a single multiplier ( $\alpha\in(0,1]$ ). Existing work has shown that this approach performs very well [7, 16]. The number of trainable parameters after scaling becomes $\theta^{*}=\alpha FM_{i}\times F_{i}^{X}\times F_{i}^{Y}\times\alpha FM_{i+1}$ , where $FM_{i}$ and $FM_{i+1}$ are the numbers of input and output feature maps, and $F_{i}^{X}$ and $F_{i}^{Y}$ are filter dimensions. However, finding a good $\alpha$ is challenging. We employ complexity measures to determine $\alpha$ .
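A short sketch of the per-layer parameter count under uniform scaling is given below. The layer dimensions are hypothetical, biases are ignored, and the function names are introduced only for illustration.

```python
def conv_params(fm_in, fm_out, fx, fy):
    """Number of weights in one convolution layer (biases ignored)."""
    return fm_in * fx * fy * fm_out


def scaled_conv_params(fm_in, fm_out, fx, fy, alpha):
    """Weights after scaling both input and output feature maps by alpha."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return conv_params(round(alpha * fm_in), round(alpha * fm_out), fx, fy)


# Hypothetical 3x3 layer with 64 input and 128 output feature maps.
base = conv_params(64, 128, 3, 3)
compressed = scaled_conv_params(64, 128, 3, 3, alpha=0.25)
ratio = compressed / base  # equals alpha ** 2 when alpha * FM is an integer
```

Because both the input and the output feature maps shrink by $\alpha$, the weight count of an interior layer shrinks by roughly $\alpha^{2}$. A first layer whose input is the raw image keeps its input channels, so it would shrink only by $\alpha$.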

When producing compressed networks, we consider two practical scenarios: (1) memory-constrained best possible accuracy, and (2) accuracy-guided least memory usage. Scenario (1) has two sub-cases: (1.a) a disk space budget and (1.b) a main memory budget. For case (1.a), given a disk space budget in MB, we first determine $\theta^{*}$ based on the number of bits for each parameter. Then $\alpha$ can be computed as $\alpha=\sqrt{\frac{\theta^{*}}{\theta}}$ . For case (1.b), the sizes of feature maps are considered along with the number of bits for $\theta^{*}$ , and the value of $\alpha$ can be determined as $\alpha=\frac{\theta^{*}}{\theta}$ . For (2), given the lowest acceptable accuracy $A_{min}$ and the original base network accuracy $A_{org}$ , the linear model gives $A_{org}-A_{min}=(\lambda C+\delta)(\log\theta-\log\theta^{*})$ , from which $\theta^{*}$ , and hence $\alpha$ , can be readily computed. Using $\alpha$ , a compressed network is produced, which can then be trained.
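The sketch below works through cases (1.a) and (2) under stated assumptions: 1 MB is taken as $10^{6}$ bytes, the logarithm is taken as base 10, and case (2) reuses the $\alpha=\sqrt{\theta^{*}/\theta}$ relation from case (1.a). The values of $\lambda$ , $\delta$ , $C$ , and the bit width are placeholders, and the function names are hypothetical.

```python
import math


def alpha_from_disk_budget(budget_bytes, theta, bits_per_param):
    """Case (1.a): theta* from the disk budget, then alpha = sqrt(theta* / theta)."""
    theta_star = budget_bytes * 8 / bits_per_param
    return min(1.0, math.sqrt(theta_star / theta))


def alpha_from_accuracy(a_org, a_min, lam, delta, complexity):
    """Case (2): solve A_org - A_min = (lam * C + delta) * (log theta - log theta*)."""
    slope = lam * complexity + delta
    if slope <= 0:
        raise ValueError("the linear model needs a positive slope")
    delta_log_theta = (a_org - a_min) / slope     # log10(theta) - log10(theta*)
    theta_ratio = 10.0 ** (-delta_log_theta)      # theta* / theta
    return min(1.0, math.sqrt(theta_ratio))       # assumed sqrt relation, as in (1.a)


# Case (1.a): 1 MB budget, 1e6 base parameters, 32-bit parameters (all assumed).
alpha_disk = alpha_from_disk_budget(1_000_000, theta=1_000_000, bits_per_param=32)

# Case (2): keep 95% of the relative accuracy with placeholder lam, delta, and C.
alpha_acc = alpha_from_accuracy(a_org=1.0, a_min=0.95, lam=0.1, delta=0.005, complexity=0.2)
```

With these placeholder inputs the disk budget allows 250,000 parameters, giving $\alpha=0.5$ , while the accuracy constraint permits a two-decade reduction in $\theta$ , giving $\alpha\approx 0.1$ .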

3 Experimental Evaluation

The 5 train-set datasets (Glands, Lymph Nodes, Melanoma, C2DH-HeLa, Wing Discs) are used to determine $\frac{\partial A}{\partial\log\theta}$ for the 3 CNN models (Fig. 3), which is then mapped to $J$ and $JB$ to determine $\lambda$ . To keep the calculations simple while maintaining integer filter counts, we consider $\alpha\in\{1,0.75,0.5,0.25,0.1875,0.125,0.0625,0.03125\}$ (Fig. 6 & Fig. 7 (a), (c) X-axis). The 2 test-set datasets (C2DH-U373, C2DL-PSC) are used to validate our method. We train with standard back-propagation using the Adam optimizer (learning rate = 0.00005), cross entropy as the loss function, and data augmentation. Experiments are performed on NVIDIA-TITAN and Tesla P100 GPUs, using the Torch framework.
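The integer-filter requirement on $\alpha$ can be checked with a few lines of code. The sketch assumes that the narrowest convolution layer has 64 filters (true of the first layer of the original U-Net, and an assumption for the other networks), which is also consistent with the $\frac{1}{64}$ step used in test case 1. The function names are hypothetical.

```python
import math

ALPHA_GRID = [1, 0.75, 0.5, 0.25, 0.1875, 0.125, 0.0625, 0.03125]
BASE_FILTERS = 64  # assumed width of the narrowest layer


def filters_after_scaling(alpha, base_filters=BASE_FILTERS):
    """Scaled filter count, or None if alpha * base_filters is not an integer."""
    scaled = alpha * base_filters
    return int(scaled) if scaled == int(scaled) else None


def ceil_alpha(alpha, base_filters=BASE_FILTERS):
    """Round alpha up to the nearest multiple of 1 / base_filters, capped at 1."""
    return min(1.0, math.ceil(alpha * base_filters) / base_filters)


grid_filters = [filters_after_scaling(a) for a in ALPHA_GRID]
snapped = ceil_alpha(0.09)  # hypothetical raw estimate rounded up to 6/64
```

Every value in the grid maps to a whole number of filters for a 64-filter layer, and any estimated $\alpha$ can be rounded up to the next multiple of $\frac{1}{64}$ without violating that constraint.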

Fig. 5 shows sample segmentation outputs. Fig. 6 and 7 show the calculated degree of degradation ( $\lambda$ ) for FCN [1], U-Net [2], and CUMedVision [3] networks. In these figures, (a) and (c) give the degradation in the relative F1 and IU accuracy (i.e., $\frac{Acc_{\alpha}}{Acc_{\alpha=1}}$ ) with respect to changes in the number of parameters expressed in logarithmic values. The slopes of regression lines for each dataset in (a) and (c) are plotted against the respective complexities in (b) and (d).

Test case 1 (accuracy-guided least memory usage). We consider an example constraint of $F1_{compressed}\geq 95\%F1_{base}$ . The value of $\Delta\log\theta$ is estimated using $\lambda$ , $\delta$ , and the dataset complexity (Table 1). Using the ceiling $\alpha$ values, compressed networks are trained and analyzed. As shown in Table 2, significant compression is achieved (at best 113x for C2DH-U373 on U-Net and at least 3.5x for C2DL-PSC on CUMedVision) with much better accuracy than compression using only [6] or [9]. To validate the effectiveness of the $\alpha$ estimate, we introduce a small reduction in the $\alpha$ value ( $\epsilon=\frac{1}{64}$ , the smallest possible step that keeps integer filter counts); the accuracy then degrades below $95\%$ (Table 2, row CC-Net-case1- $\epsilon$ ). CC-Net compression does not show much improvement when pruned further, indicating that few ineffective filters remain.
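The compression factors quoted above can be recovered from the log(#P) columns of Table 2, assuming the logarithm is base 10 (an assumption that reproduces the 113x and 3.5x figures). The same table values also show the F1 retention for U-Net on C2DH-U373 with and without the $\epsilon$ reduction.

```python
def compression_ratio(log_p_base, log_p_compressed):
    """Parameter reduction factor from base-10 log parameter counts."""
    return 10.0 ** (log_p_base - log_p_compressed)


# log(#P) values from Table 2, CC-Net-case1 rows.
unet_cu = compression_ratio(7.492, 5.436)    # U-Net on C2DH-U373, about 113.8
cumed_cp = compression_ratio(6.887, 6.341)   # CUMedVision on C2DL-PSC, about 3.5

# F1 retention for U-Net on C2DH-U373 (base F1 = 0.896 in Table 2).
retained_case1 = 0.863 / 0.896      # CC-Net-case1
retained_case1_eps = 0.841 / 0.896  # CC-Net-case1-epsilon
```

The first retention value stays above the 95% constraint while the second falls below it, matching the behaviour described for the $\epsilon$ experiment.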

Test case 2 (memory-constrained best possible accuracy). We consider a disk space budget of 1 MB. Using ceiling of $\alpha=\sqrt{\theta^{*}/\theta}$ , compressed networks are produced as shown in Table 2, whose accuracy satisfies the accuracy prediction made by our method (Fig. 8).

The overall reduction (R = $\frac{base}{compressed}$ ) in trainable parameters (PR) and evaluation latency (LR) for all 7 datasets (for test case 1) is plotted in Fig. 9. Larger complexity results in less compression, indicating a higher requirement in trainable parameters for extracting features. Across the datasets, CC-Net achieves parameter reductions ranging from $2\times$ to $1000\times$ and latency reductions ranging from $1.5\times$ to $17\times$ .

Table 3 shows the training time for [9] and CC-Net on U-Net for test case 1 (on a P100 GPU). The per-epoch training time (in ms) is provided along with the number of pruning epochs (column Post-training). We have used fewer fine-tuning iterations per pruning epoch; however, pruning is expensive and can exceed the original network training time by a factor of 3 [8, 9]. The one-time $\lambda$ determination (‘O’ in Table 3) for any CNN is a bottleneck for CC-Net. Yet, after this process, a significant reduction in training time can be achieved for any dataset trained on the same network. We estimate that ‘O’ can be computed in under 2x the training time of the base architecture, with a sufficient degree of accuracy, using 2 datasets with two $\alpha$ points ( $\alpha\in\{0.25,0.03125\}$ ).

4 Conclusions

In this paper, we presented a new image complexity-guided network compression scheme, CC-Net, for biomedical image segmentation. Instead of compressing CNNs after training, we focused on pre-training network size reduction, exploiting image complexity of the training data. Our method is effective in quickly generating compressed networks with target accuracy, outperforming state-of-the-art network compression methods. Our scheme accommodates practical applied design constraints for compressing CNNs for biomedical image segmentation.

5 Acknowledgement

This work was supported in part by the National Science Foundation under Grants CNS-1629914, CCF-1640081, and CCF-1617735, and by the Nanoelectronics Research Corporation, a wholly-owned subsidiary of the Semiconductor Research Corporation, through Extremely Energy Efficient Collective Electronics, an SRC-NRI Nanoelectronics Research Initiative under Research Task ID 2698.004 and 2698.005.

Bibliography

[1] Jonathan Long, Evan Shelhamer, and Trevor Darrell, “Fully convolutional networks for semantic segmentation,” CoRR, vol. abs/1411.4038, 2014.
[2] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” arXiv e-prints, May 2015.
[3] Hao Chen, Xiaojuan Qi, Jie-Zhi Cheng, and Pheng-Ann Heng, “Deep contextual networks for neuronal structure segmentation,” in AAAI, 2016, pp. 1167–1173.
[4] L. Yang, Y. Zhang, J. Chen, S. Zhang, and D. Z. Chen, “Suggestive annotation: A deep active learning framework for biomedical image segmentation,” in 20th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2017, vol. III, pp. 399–407.
[5] L. Wu, Y. Xin, S. Li, T. Wang, P. A. Heng, and D. Ni, “Cascaded fully convolutional networks for automatic prenatal ultrasound image segmentation,” in 14th IEEE International Symposium on Biomedical Imaging (ISBI), April 2017, pp. 663–666.
[6] F. N. Iandola, S. Han, et al., “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size,” arXiv e-prints, Feb. 2016.
[7] A. G. Howard, M. Zhu, et al., “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv e-prints, Apr. 2017.
[8] S. Han, H. Mao, et al., “Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding,” arXiv e-prints, Oct. 2015.