Efficient Neural Architecture Search on Low-Dimensional Data for OCT Image Segmentation

Nils Gessert; Alexander Schlaefer

arXiv:1905.02590·eess.IV·July 29, 2019

TL;DR

This paper introduces an efficient neural architecture search method for medical image segmentation, using low-dimensional data to reduce search time significantly while maintaining high performance on high-dimensional OCT images.

Contribution

It proposes a novel NAS approach that searches on low-dimensional data and transfers architectures to high-dimensional data, saving time in medical imaging tasks.

Findings

1. Search on 1D data reduces search time by 87.5%.
2. Final models on 2D data achieve similar performance to those searched directly on 2D.
3. Method is effective for OCT layer segmentation.

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Softmax · Long Short-Term Memory

Full text

Accepted, 2019. Extended Abstract, MIDL 2019 submission.

Nils Gessert (1), Alexander Schlaefer (1)

(1) Institute of Medical Technology, Hamburg University of Technology, Germany

Efficient Neural Architecture Search on Low-Dimensional Data for OCT Image Segmentation

Abstract

Typically, deep learning architectures are handcrafted for their respective learning problem. As an alternative, neural architecture search (NAS) has been proposed where the architecture’s structure is learned in an additional optimization step. For the medical imaging domain, this approach is very promising as there are diverse problems and imaging modalities that require architecture design. However, NAS is very time-consuming and medical learning problems often involve high-dimensional data with high computational requirements. We propose an efficient approach for NAS in the context of medical, image-based deep learning problems by searching for architectures on low-dimensional data which are subsequently transferred to high-dimensional data. For OCT-based layer segmentation, we demonstrate that a search on 1D data reduces search time by $87.5\,\%$ compared to a search on 2D data while the final 2D models achieve similar performance.

Keywords: Neural Architecture Search, Deep Learning, Segmentation, OCT

Accepted at MIDL 2019

1 Introduction

Over the last years, manual feature engineering has been replaced by deep learning approaches such as convolutional neural networks (CNNs) for numerous medical, image-based learning problems [Litjens et al., 2017]. CNNs themselves are often difficult to design, and it is unclear what kind of architecture is suitable for which learning problem. Therefore, neural architecture search (NAS) has been proposed. Typical NAS approaches include grid search, genetic algorithms, Bayesian optimization or random search [Kandasamy et al., 2018]. Recently, reinforcement learning (RL) methods have been proposed where a recurrent controller is trained to predict an architecture’s structure by maximizing the architecture’s expected validation performance as a reward [Zoph and Le, 2016]. This approach has been successful for 2D image classification problems [Liu et al., 2018; Zoph et al., 2018].

The concept of NAS is also very promising for the medical image domain as there is a vast number of imaging modalities and learning problems that require architecture design. However, NAS can be very time-consuming, which is even more problematic for medical image data, which is often 3D or 4D in nature [Li et al., 2008]. Some approaches have used lower-dimensional data representations such as 2D slices instead of full 3D volumes in order to reduce computational effort [Litjens et al., 2017]. However, many approaches have shown that considering higher-dimensional context can improve performance [Kamnitsas et al., 2017; Gessert et al., 2018a; Gessert et al., 2018b].

We propose an efficient NAS approach for segmentation with multidimensional medical image data. To overcome long architecture search times, we perform the search on lower-dimensional data, which shortens the search. Then, we transfer the learned architecture to the higher, target dimension. We show the concept for the example task of retinal layer segmentation with optical coherence tomography (OCT) data, as the problem can be addressed in 1D (A-Scan segmentation) and 2D (B-Scan segmentation). Adopting the efficient neural architecture search (ENAS) framework [Pham et al., 2018], we learn submodules for a U-Net-like [Ronneberger et al., 2015] architecture. We demonstrate that our learned architecture outperforms a ResNet-inspired [He et al., 2016] baseline and that an architecture learned on 1D data transfers well to 2D data.

2 Methods

Dataset. We use a publicly available OCT dataset with images from patients with mild age-related macular degeneration (AMD) and normal subjects [Farsiu et al., 2014]. Experts provided layer boundaries for the inner limiting membrane (ILM), retinal pigment epithelium drusen complex (RPEDC) and Bruch’s membrane (BM). We generate pixel-wise annotations by assigning classes to tissue layers in between boundaries, i.e., ILM to RPEDC is class 1, RPEDC to BM is class 2 and BM to the end is class 3. The image space above the ILM is treated as background. Note that directly learning the boundaries can be beneficial for this problem [Roy et al., 2017]. We chose a pixel-wise encoding to have a representative medical segmentation task that can be addressed with a standard U-Net.
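The pixel-wise encoding described above can be illustrated with a minimal sketch. It assumes that each boundary is given as an integer pixel index along the depth axis and that a boundary pixel belongs to the layer below it; both conventions, as well as the function names `label_a_scan` and `label_b_scan`, are hypothetical and not taken from the paper.

```python
def label_a_scan(depth, ilm, rpedc, bm):
    """Return per-pixel class labels for one A-scan (a 1D depth profile).

    depth: number of pixels along the A-scan.
    ilm, rpedc, bm: boundary positions as pixel indices (assumed encoding),
    ordered from top to bottom.
    """
    if not 0 <= ilm <= rpedc <= bm <= depth:
        raise ValueError("expected 0 <= ilm <= rpedc <= bm <= depth")
    labels = []
    for z in range(depth):
        if z < ilm:
            labels.append(0)  # background above the ILM
        elif z < rpedc:
            labels.append(1)  # ILM to RPEDC
        elif z < bm:
            labels.append(2)  # RPEDC to BM
        else:
            labels.append(3)  # BM to the end of the scan
    return labels


def label_b_scan(depth, boundaries):
    """Label a B-scan column by column.

    boundaries: one (ilm, rpedc, bm) tuple per A-scan (column).
    Returns a nested list indexed as [depth][column].
    """
    columns = [label_a_scan(depth, *b) for b in boundaries]
    # Transpose from [column][depth] to [depth][column].
    return [list(row) for row in zip(*columns)]
```

For example, `label_a_scan(8, 2, 5, 7)` yields two background pixels, three pixels of class 1, two of class 2 and one of class 3.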

Baseline Model. As a baseline we use a U-Net-like model. The model takes a 1D A-Scan or a 2D B-Scan as its input and predicts a segmentation map with the same size as the input. For the long-range connections we use summation, following [Yu et al., 2017]. We use ResNet blocks in the network. Convolutions use a kernel size of $3$, and extensions from 1D to 2D are performed by extending all kernels isotropically by an additional dimension.
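To make the isotropic kernel extension concrete, the following sketch derives the kernel shape and the number of convolution weights for a given spatial dimension. The helper names and the channel count of 16 in the usage note are illustrative assumptions; the paper does not state channel widths here.

```python
def extend_kernel(kernel_size, ndim):
    """Extend a scalar kernel size isotropically to `ndim` spatial dimensions."""
    return (kernel_size,) * ndim


def conv_weight_count(kernel_size, ndim, in_channels, out_channels):
    """Number of weights (bias excluded) of a convolution with an isotropic kernel."""
    count = in_channels * out_channels
    for k in extend_kernel(kernel_size, ndim):
        count *= k
    return count
```

With a kernel size of 3 and a hypothetical 16 input and 16 output channels, the 1D layer has 768 weights and its 2D extension has 2304, so the same architecture description carries over while each convolution grows by a factor equal to the kernel size.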

ENAS U-Net. Next, we adapt the ENAS framework [Pham et al., 2018] from image classification to image segmentation with a U-Net. To simplify the architecture search space, we keep the general U-Net structure fixed and only learn new module blocks, similar to the micro search space in ENAS. The input/output and downsampling/upsampling layers also stay fixed. For the module search space, we let the controller learn the properties of $2$ cells, each containing $2$ subcells. Each cell’s output is the sum of its subcells’ outputs. For each subcell, the controller defines its input (the module input or another cell’s output) and its operation. Similar to ENAS, we allow five basic operations for the controller to choose from: convolutions with kernel size $3$ or $5$, average- and max-pooling with kernel size $3$ and the identity transform.
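The module search space can be sketched as a small configuration sampler. Two simplifications are assumed here and are not from the paper: a subcell may only read the module input or the output of an earlier cell, and configurations are drawn uniformly at random in place of the trained recurrent controller. The operation names are placeholders.

```python
import random

# The five candidate operations named in the text (identifiers are placeholders).
OPERATIONS = ("conv3", "conv5", "avg_pool3", "max_pool3", "identity")
NUM_CELLS = 2
SUBCELLS_PER_CELL = 2


def allowed_inputs(cell_index):
    """Inputs a subcell may choose: the module input or an earlier cell's output.

    Restricting to earlier cells is an assumption that keeps the graph acyclic.
    """
    return ["module_input"] + [f"cell_{j}" for j in range(cell_index)]


def sample_module(rng):
    """Draw one module configuration uniformly at random (stand-in for the controller)."""
    module = []
    for c in range(NUM_CELLS):
        cell = [
            {"input": rng.choice(allowed_inputs(c)), "op": rng.choice(OPERATIONS)}
            for _ in range(SUBCELLS_PER_CELL)
        ]
        module.append(cell)
    return module


def search_space_size():
    """Count all module configurations under the assumptions above."""
    size = 1
    for c in range(NUM_CELLS):
        choices_per_subcell = len(allowed_inputs(c)) * len(OPERATIONS)
        size *= choices_per_subcell ** SUBCELLS_PER_CELL
    return size
```

Calling `sample_module(random.Random(0))` returns a list of two cells with two subcells each. Under the stated input restriction the space contains 2500 module configurations; a different input rule would change this count.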

Training and Evaluation. We consider a training set of $150$ volumes (model training), a reward set of $56$ volumes (controller training), a validation set of $2$ volumes and a test set of $60$ volumes. We follow ENAS with interleaved training of the model (Dice loss) and the controller (Dice score reward). After training for $200$ epochs, we sample $20$ architecture configurations from the controller and evaluate them on the validation set. Then, we select the best-performing configuration and retrain the model from scratch on the training set. Finally, we evaluate the model’s performance on the test set. For the baseline model, we train on the training set for $200$ epochs and evaluate on the test set afterwards.
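The evaluation and selection step can be sketched as follows. The sketch computes a Dice score on hard labels and picks the best of a set of sampled configurations. Averaging over the three tissue classes only, and scoring a class that is absent from both prediction and target as 1.0, are assumptions made here; the paper does not specify these details, and its training loss is a Dice loss rather than this hard-label score.

```python
def dice_per_class(pred, target, cls):
    """Dice score of one class for two flat label sequences of equal length."""
    p = [x == cls for x in pred]
    t = [x == cls for x in target]
    intersection = sum(a and b for a, b in zip(p, t))
    denominator = sum(p) + sum(t)
    # Assumed convention: a class absent from both sequences counts as a perfect match.
    return 1.0 if denominator == 0 else 2.0 * intersection / denominator


def mean_dice(pred, target, classes=(1, 2, 3)):
    """Mean Dice over the tissue classes (background excluded by assumption)."""
    return sum(dice_per_class(pred, target, c) for c in classes) / len(classes)


def select_best(configs, evaluate):
    """Return the configuration with the highest validation score.

    evaluate: callable mapping a configuration to its validation Dice score.
    """
    return max(configs, key=evaluate)
```

In the procedure described above, `configs` would hold the 20 sampled configurations and `evaluate` would run the shared-weight model on the validation set; the selected configuration is then retrained from scratch.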

3 Results and Discussion

The architecture and the learned modules are shown in \figureref{fig:model}. The results are shown in \tableref{tab:results}. Both the 1D and 2D architectures learned with ENAS on 1D data outperform the ResNet baseline. Notably, the increase is achieved without altering fundamental and potentially more impactful U-Net properties such as the encoder-decoder structure or the long-range connections. As a next step, these properties could be included in the search space, which was successful for segmentation in the natural image domain with DeepLab-based architectures [Liu et al., 2019].

Performing a search on 1D data substantially decreases the search time by $87.5\,\%$ compared to a search on 2D data while performance differences are marginal. This is particularly interesting as the OCT data is not isotropic and the spatial dimensions are quite different. This indicates that learning on low-dimensional, less resource-demanding data representations is a viable approach for NAS. Thus, extension to other problems such as brain segmentation might be feasible, e.g., by performing NAS on axial slices before applying the discovered architectures to 3D volume data.
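As a quick check of what the reported reduction implies, let $t_{1\mathrm{D}}$ and $t_{2\mathrm{D}}$ denote the search times on 1D and 2D data (symbols introduced here for illustration), with the reduction measured relative to the 2D search:

$$
1 - \frac{t_{1\mathrm{D}}}{t_{2\mathrm{D}}} = 0.875
\quad\Longrightarrow\quad
\frac{t_{2\mathrm{D}}}{t_{1\mathrm{D}}} = \frac{1}{1 - 0.875} = 8 .
$$

In other words, the 2D search takes about eight times as long as the 1D search.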

In summary, we propose an efficient approach for NAS in the context of multidimensional medical image data. We demonstrate that searching for an architecture on low-dimensional data transfers well to high-dimensional data. An architecture discovered on 1D data performs similarly to one discovered on 2D data while substantially reducing search time. Our approach could enable efficient NAS for a variety of medical learning problems.

Acknowledgments

This work was partially funded by the TUHH $I^{3}$-Labs initiative.

Bibliography

1. [Farsiu et al., 2014] Sina Farsiu, Stephanie J Chiu, Rachelle V O’Connell, Francisco A Folgar, Eric Yuan, Joseph A Izatt, Cynthia A Toth, Age-Related Eye Disease Study 2 Ancillary Spectral Domain Optical Coherence Tomography Study Group, et al. Quantitative classification of eyes with and without intermediate age-related macular degeneration using optical coherence tomography. Ophthalmology, 121(1):162–172, 2014.
2. [Gessert et al., 2018a] Nils Gessert, Jens Beringhoff, Christoph Otte, and Alexander Schlaefer. Force estimation from OCT volumes using 3D CNNs. International Journal of Computer Assisted Radiology and Surgery, 13(7):1073–1082, 2018.
3. [Gessert et al., 2018b] Nils Gessert, Matthias Schlüter, and Alexander Schlaefer. A deep learning approach for pose estimation from volumetric OCT data. Medical Image Analysis, 46:162–179, 2018.
4. [He et al., 2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
5. [Kamnitsas et al., 2017] Konstantinos Kamnitsas, Christian Ledig, Virginia FJ Newcombe, Joanna P Simpson, Andrew D Kane, David K Menon, Daniel Rueckert, and Ben Glocker. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical Image Analysis, 36:61–78, 2017.
6. [Kandasamy et al., 2018] Kirthevasan Kandasamy, Willie Neiswanger, Jeff Schneider, Barnabas Poczos, and Eric P Xing. Neural architecture search with Bayesian optimisation and optimal transport. In Advances in Neural Information Processing Systems, pages 2016–2025, 2018.
7. [Li et al., 2008] Guang Li, Deborah Citrin, Kevin Camphausen, Boris Mueller, Chandra Burman, Borys Mychalczak, Robert W Miller, and Yulin Song. Advances in 4D medical imaging and 4D radiation therapy. Technology in Cancer Research & Treatment, 7(1):67–81, 2008.
8. [Litjens et al., 2017] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen AWM Van Der Laak, Bram Van Ginneken, and Clara I Sánchez. A survey on deep learning in medical image analysis. Medical Image Analysis, 42:60–88, 2017.