Solving Electrical Impedance Tomography with Deep Learning

Yuwei Fan; Lexing Ying

arXiv:1906.03944·physics.comp-ph·January 29, 2020

TL;DR

This paper presents a novel deep learning approach to efficiently solve the high-dimensional, nonlinear inverse problem of electrical impedance tomography, improving reconstruction accuracy and computational speed.

Contribution

It introduces compact neural network architectures for both forward and inverse EIT maps, leveraging low-rank properties for 2D and 3D problems.

Findings

1. Neural networks accurately reconstruct conductivity from DtN maps.

2. Proposed methods are computationally efficient.

3. Effective for both 2D and 3D EIT problems.

Abstract

This paper introduces a new approach for solving electrical impedance tomography (EIT) problems using deep neural networks. The mathematical problem of EIT is to invert the electrical conductivity from the Dirichlet-to-Neumann (DtN) map. Both the forward map from the electrical conductivity to the DtN map and the inverse map are high-dimensional and nonlinear. Motivated by the linear perturbative analysis of the forward map and based on a numerically low-rank property, we propose compact neural network architectures for the forward and inverse maps for both 2D and 3D problems. Numerical results demonstrate the efficiency of the proposed neural networks.

Equations

$$-\operatorname{div}(\gamma(x)\nabla\phi(x))=0 \ \text{ in } \Omega, \qquad \phi(x)=\psi(x) \ \text{ on } \partial\Omega, \tag{1.1}$$

$$\Lambda_{\gamma}: H^{\frac{1}{2}}(\partial\Omega)\to H^{-\frac{1}{2}}(\partial\Omega), \qquad \psi(x)\mid_{\partial\Omega}\ \mapsto\ \gamma(x)\frac{\partial\phi(x)}{\partial n(x)}\Big|_{\partial\Omega},$$

$$(-\Delta+\eta(x))u(x)=0 \ \text{ in } \Omega, \qquad u(x)=f(x) \ \text{ on } \partial\Omega. \tag{1.2}$$

$$\Lambda_{\eta}: H^{\frac{1}{2}}(\partial\Omega)\to H^{-\frac{1}{2}}(\partial\Omega), \qquad f(x)\mid_{\partial\Omega}\ \mapsto\ \frac{\partial u(x)}{\partial n(x)}\Big|_{\partial\Omega}.$$

$$(\Lambda_{\eta}f)(r)=\frac{\partial u}{\partial n}(r)=\int_{\partial\Omega}\lambda_{\eta}(r,s)f(s)\,\mathrm{d}S(s).$$

$$0=\int_{\partial\Omega}\frac{\partial u}{\partial n(y)}(y)G(x,y)\,\mathrm{d}S(y)=\int_{\Omega}\operatorname{div}_{y}\bigl(\nabla_{y}u(y)\,G(x,y)\bigr)\,\mathrm{d}y=\int_{\Omega}\bigl(\Delta_{y}u\cdot G+\nabla_{y}G\cdot\nabla_{y}u\bigr)\,\mathrm{d}y. \tag{2.1}$$

$$\begin{aligned}\int_{\partial\Omega}\frac{\partial G}{\partial n(y)}(x,y)f(y)\,\mathrm{d}S(y)&=\int_{\Omega}\bigl(\Delta_{y}G(x,y)\cdot u(y)+\nabla_{y}G(x,y)\cdot\nabla_{y}u(y)\bigr)\,\mathrm{d}y\\&=\int_{\Omega}\bigl(\Delta_{y}G(x,y)\cdot u(y)-\Delta_{y}u(y)\cdot G(x,y)\bigr)\,\mathrm{d}y\\&=\int_{\Omega}\bigl(-(-\Delta_{y}+\eta(y))G(x,y)\cdot u(y)+(-\Delta_{y}+\eta(y))u(y)\cdot G(x,y)\bigr)\,\mathrm{d}y\\&=-u(x).\end{aligned} \tag{2.2}$$

$$\frac{\partial u}{\partial n}(x)=-\int_{\partial\Omega}\frac{\partial^{2}G}{\partial n(x)\partial n(y)}(x,y)f(y)\,\mathrm{d}S(y),\quad x\in\partial\Omega, \tag{2.3}$$

$$\lambda_{\eta}(r,s)=-\frac{\partial^{2}G}{\partial n(r)\partial n(s)}(r,s),\quad r,s\in\partial\Omega. \tag{2.4}$$

$$\mathcal{G}=(\mathcal{L}_{0}-\mathcal{E})^{-1}=\mathcal{G}_{0}+\mathcal{G}_{0}\mathcal{E}\mathcal{G}_{0}+\mathcal{G}_{0}\mathcal{E}\mathcal{G}_{0}\mathcal{E}\mathcal{G}_{0}+\cdots. \tag{2.5}$$

$$\mu(r,s):=(\lambda_{\eta}-\lambda_{0})(r,s)=-\frac{\partial^{2}(G-G_{0})}{\partial n(r)\partial n(s)}(r,s)\approx\int_{\Omega}\Bigl(\frac{\partial G_{0}}{\partial n(r)}(r,p)\frac{\partial G_{0}}{\partial n(s)}(p,s)\Bigr)\eta(p)\,\mathrm{d}p, \tag{2.6}$$

$$\mu((r_{1},Z),(s_{1},Z))\approx\int_{\Omega}\frac{\partial G_{0}}{\partial n(r)}((r_{1},Z),(x,z))\frac{\partial G_{0}}{\partial n(s)}((s_{1},Z),(x,z))\,\eta(x,z)\,\mathrm{d}x\,\mathrm{d}z. \tag{3.1}$$

$$\mu(m,h):=\mu((m+h,Z),(m-h,Z))\approx\int_{\Omega}K(m,h,x,z)\,\eta(x,z)\,\mathrm{d}x\,\mathrm{d}z, \tag{3.2}$$

$$K(m,h,x,z):=\frac{\partial G_{0}}{\partial n}((m+h,Z),(x,z))\frac{\partial G_{0}}{\partial n}((m-h,Z),(x,z)). \tag{3.3}$$

$$G_{0}(p,q)=\sum_{\ell\in\mathbb{Z}^{2}}\bigl(\Gamma(p-q+(\ell_{1},2\ell_{2}Z))-\Gamma(p-q^{\ast}+(\ell_{1},2\ell_{2}Z))\bigr),$$

$$\frac{\partial G_{0}}{\partial n}((m\pm h,Z),(x,z))=\frac{\partial G_{0}}{\partial n}((\pm h,Z),(x-m,z)).$$

$$\frac{\partial G_{0,\pm h,z}}{\partial n}(x-m):=\frac{\partial G_{0}}{\partial n}((\pm h,Z),(x-m,z)),$$

$$k_{h,z}(m):=\frac{\partial G_{0,+h,z}}{\partial n}(m)\frac{\partial G_{0,-h,z}}{\partial n}(m),$$

$$\mu_{h}(m)\approx\int_{-Z}^{Z}(k_{h,z}*\eta_{z})(m)\,\mathrm{d}z=\int_{-(Z-\delta)}^{Z-\delta}(k_{h,z}*\eta_{z})(m)\,\mathrm{d}z, \tag{3.6}$$

$$k_{h,z}(m)\approx\sum_{\hat{h}}\sum_{\hat{z}}R_{h,\hat{h}}\,k_{\hat{h},\hat{z}}(m)\,R_{z,\hat{z}}, \tag{3.7}$$

$$\mu_{h}(m)\approx\sum_{\hat{h}}R_{h,\hat{h}}\Bigl(\sum_{\hat{z}}k_{\hat{h},\hat{z}}*\Bigl(\int_{-(Z-\delta)}^{Z-\delta}R_{z,\hat{z}}\eta_{z}\,\mathrm{d}z\Bigr)\Bigr)(m). \tag{3.8}$$

$$\tilde{\eta}_{\hat{z}}(x):=\int_{-(Z-\delta)}^{Z-\delta}R_{z,\hat{z}}\eta_{z}(x)\,\mathrm{d}z;$$

$$\tilde{\mu}_{\hat{h}}(m):=\Bigl(\sum_{\hat{z}}k_{\hat{h},\hat{z}}*\tilde{\eta}_{\hat{z}}\Bigr)(m);$$

$$\mu_{h}(m)=\sum_{\hat{h}}R_{h,\hat{h}}\tilde{\mu}_{\hat{h}}(m).$$

$$\mu_{h}(m)\approx\sum_{\hat{h}}R_{h,\hat{h}}\Bigl(\sum_{\hat{z}}k_{\hat{h},\hat{z}}*\Bigl(\sum_{z}R_{z,\hat{z}}\eta_{z}\Bigr)\Bigr)(m). \tag{3.9}$$

$$\mu\approx K\eta,$$

$$\eta\approx(K^{\mathsf{T}}K+\varepsilon I)^{-1}K^{\mathsf{T}}\mu,$$

$$(K^{\mathsf{T}}\mu)_{z}(x)\approx\sum_{\hat{z}}R_{z,\hat{z}}\Bigl(\sum_{\hat{h}}k_{\hat{h},\hat{z}}*\Bigl(\sum_{h}R_{h,\hat{h}}\mu_{h}\Bigr)\Bigr)(x).$$

Full text

Solving Electrical Impedance Tomography with Deep Learning

Yuwei Fan (Department of Mathematics, Stanford University, Stanford, CA 94305. Email: [email protected]), Lexing Ying (Department of Mathematics and ICME, Stanford University, Stanford, CA 94305. Email: [email protected])

Keywords: Dirichlet-to-Neumann map; Electrical impedance tomography; Forward problem; Inverse problem; Neural networks; BCR-Net; Convolutional neural network.

1 Introduction

Electrical impedance tomography (EIT) is the problem of determining the electrical conductivity distribution of an unknown medium by making voltage and current measurements at the boundary of the object. As a radiation-free imaging technique, EIT allows repeated, non-invasive measurements of regional changes in the object. It has therefore been used as a monitoring tool in a variety of applications in critical care medicine, for instance, monitoring of ventilation distribution [52], assessment of lung overdistension [41], and detection of pneumothorax [18], as well as in many industrial applications [53].

Background.

At the center of the mathematical formulations of EIT is the Dirichlet-to-Neumann (DtN) map, a critical object in the analysis of elliptic partial differential equations that plays a significant role in the classical Calderón problem [14, 50, 8].

The governing equation of EIT, or equivalently the inverse conductivity problem, is

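$$-\operatorname{div}(\gamma(x)\nabla\phi(x))=0 \ \text{ in } \Omega, \qquad \phi(x)=\psi(x) \ \text{ on } \partial\Omega, \tag{1.1}$$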

where $\Omega$ is a bounded Lipschitz domain, $\phi(x)$ is the voltage, $\gamma(x)>0$ is the conductivity distribution, and $\psi(x)$ is the voltage applied on the boundary. The corresponding DtN map is defined by

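$$\Lambda_{\gamma}: H^{\frac{1}{2}}(\partial\Omega)\to H^{-\frac{1}{2}}(\partial\Omega), \qquad \psi(x)\mid_{\partial\Omega}\ \mapsto\ \gamma(x)\frac{\partial\phi(x)}{\partial n(x)}\Big|_{\partial\Omega},$$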

where $n(x)$ is the outer normal vector. Here $H^{\frac{1}{2}}(\partial\Omega)$ is the space of $L^{2}(\partial\Omega)$ functions that are traces of functions in $H^{1}(\Omega)$ and $H^{-\frac{1}{2}}(\partial\Omega)$ is its dual. We refer the readers to [48] for more details of the DtN map.

A closely related inverse conductivity problem involves the DtN map of the Schrödinger equation at zero energy [48], which takes the following form

$$(-\Delta+\eta(x))u(x)=0 \ \text{ in } \Omega, \qquad u(x)=f(x) \ \text{ on } \partial\Omega. \tag{1.2}$$

The DtN map for the Schrödinger equation is then defined by

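$$\Lambda_{\eta}: H^{\frac{1}{2}}(\partial\Omega)\to H^{-\frac{1}{2}}(\partial\Omega), \qquad f(x)\mid_{\partial\Omega}\ \mapsto\ \frac{\partial u(x)}{\partial n(x)}\Big|_{\partial\Omega}.$$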

These two DtN maps $\Lambda_{\eta}$ and $\Lambda_{\gamma}$ are closely related. If $\phi$ is the solution of 1.1, then $u=\sqrt{\gamma}\phi$ is the solution of 1.2 with $\eta=\frac{\Delta\sqrt{\gamma}}{\sqrt{\gamma}}$ and $f=\sqrt{\gamma}\psi$ . Moreover, $\Lambda_{\eta}=\gamma^{-1/2}\Lambda_{\gamma}\gamma^{-1/2}+\dfrac{1}{2\gamma}\dfrac{\partial{\gamma}}{\partial{n}}$ . In fact, the two maps $\Lambda_{\eta}$ and $\Lambda_{\gamma}$ carry the same information and each can be determined from the other [48]. This paper focuses on the DtN map $\Lambda_{\eta}$ for the Schrödinger equation. All the results can be extended to the DtN map $\Lambda_{\gamma}$ without much difficulty.
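The stated relation between the two problems can be checked in a few lines. The derivation below is a sketch that assumes $\gamma$ is smooth and strictly positive up to the boundary; it uses only the substitutions $\phi=\gamma^{-1/2}u$ and $\psi=\gamma^{-1/2}f$ given above.

$$\begin{aligned}
\gamma\nabla\phi &= \gamma\nabla\bigl(\gamma^{-1/2}u\bigr)=\sqrt{\gamma}\,\nabla u-u\,\nabla\sqrt{\gamma},\\
\operatorname{div}(\gamma\nabla\phi) &= \sqrt{\gamma}\,\Delta u-u\,\Delta\sqrt{\gamma}=-\sqrt{\gamma}\,\Bigl(-\Delta+\frac{\Delta\sqrt{\gamma}}{\sqrt{\gamma}}\Bigr)u,\\
\Lambda_{\eta}f &= \frac{\partial(\sqrt{\gamma}\phi)}{\partial n}=\gamma^{-1/2}\Bigl(\gamma\frac{\partial\phi}{\partial n}\Bigr)+\frac{1}{2\gamma}\frac{\partial\gamma}{\partial n}\,\sqrt{\gamma}\phi=\gamma^{-1/2}\Lambda_{\gamma}\bigl(\gamma^{-1/2}f\bigr)+\frac{1}{2\gamma}\frac{\partial\gamma}{\partial n}\,f.
\end{aligned}$$

The second line shows that $\operatorname{div}(\gamma\nabla\phi)=0$ is equivalent to $(-\Delta+\eta)u=0$ with $\eta=\Delta\sqrt{\gamma}/\sqrt{\gamma}$, and the third line reproduces the formula for $\Lambda_{\eta}$ when all quantities are evaluated on $\partial\Omega$.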

Since the DtN map $\Lambda_{\eta}$ is linear [8] for a fixed $\eta$ , there exists a distribution kernel $\lambda_{\eta}(r,s)$ for $r,s\in\partial\Omega$ such that

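$$(\Lambda_{\eta}f)(r)=\frac{\partial u}{\partial n}(r)=\int_{\partial\Omega}\lambda_{\eta}(r,s)f(s)\,\mathrm{d}S(s).$$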

The forward problem for the DtN map is, given $\eta(x)$ , to compute the kernel $\lambda_{\eta}(r,s)$ , i.e., to evaluate the map $\eta\to\lambda_{\eta}$ .

The task of the inverse problem is to recover $\eta(x)$ in $\Omega$ based on the observation data, which is typically a collection of pairs $(f,\Lambda_{\eta}f)$ of the Dirichlet boundary condition $f$ and the corresponding Neumann data $\Lambda_{\eta}f$ . Under the assumption that the Dirichlet boundary condition is sufficiently sampled, it is possible to assume that the kernel $\lambda_{\eta}$ is known and, therefore, the inverse problem is to recover $\eta$ from $\lambda_{\eta}$ , i.e., $\lambda_{\eta}\to\eta$ . Since $\lambda_{\eta}(r,s)\mid_{r,s\in\partial\Omega}$ is a function of $2(d-1)$ variables while $\eta(x)$ is a function of $d$ variables, the inverse problem is not solvable if $d=1$ due to a simple dimension counting. For $d\geq 2$ , in principle, the solution of the inverse problem exists and is unique under certain conditions [51]. However, due to the elliptic nature of EIT, the inverse problem is severely ill-conditioned [2, 3, 4, 12] even for $d\geq 2$ .

Numerical solution of the forward and inverse problems can be challenging. The forward problem is a map from a $d$ dimensional function to a $2(d-1)$ dimensional function. For 3D problems, computing and representing the whole DtN map $\Lambda_{\eta}$ for a fixed $\eta$ can be quite expensive. For the inverse problem, the inverse map $\Lambda_{\eta}\to\eta$ is numerically unstable [2, 3, 4, 12] due to ill-conditioning. To avoid this instability, an application-dependent regularization term is often required, see, for instance, [30, 16, 33]. Algorithmically, the inverse problem is usually solved with iterative methods [30, 27, 11, 12], which often require a significant number of iterations.

In the last few years, deep neural networks (DNNs) have achieved great successes in computer vision, image processing, speech recognition, and many other artificial intelligence applications [31, 37, 26, 43, 39, 47, 38, 46]. More recently, methods based on DNNs have also been applied to solving PDEs [34, 9, 28, 23, 22, 6, 44, 21, 36]. These attempts can be classified into two categories. The first category [45, 15, 28, 35, 20] aims to represent the solutions of high-dimensional PDEs with DNNs (rather than the classical methods such as finite element and finite difference methods). The second category [42, 29, 34, 23, 22, 21, 36, 40, 7] works with parameterized PDE problems and uses the DNNs to represent the map from the high-dimensional parameters of the PDE to the solution of the PDE.

Contributions.

Deep neural networks have several advantages when applied to solve the forward and inverse problems. For the forward problem, since applying a neural network to input data can be carried out rapidly due to novel software and hardware architectures, the forward problem can be significantly accelerated when the forward map is represented with a DNN. For the inverse problem, the choices of the solution algorithm and the regularization term are two critical issues. Fortunately, deep neural networks can help in both aspects. First, concerning the solution algorithm, due to its flexibility in representing high-dimensional functions, a DNN can potentially be used to approximate the full inverse map, thus avoiding the iterative solution process. Second, concerning the regularization term, recent work in machine learning shows that DNNs can often extract features from the data automatically and offer a data-driven regularization prior.

This paper applies the deep learning approach to the EIT problem by representing the inverse map from $\Lambda_{\eta}$ to $\eta$ using a novel neural network architecture. The motivation of the new architecture comes from a perturbative analysis of the linear approximation of both the forward and inverse maps of the EIT problem. The analysis shows that the maps between $\eta$ and $\Lambda_{\eta}$ are locally numerically low-rank after a reparameterization of the DtN map $\Lambda_{\eta}$ . This observation allows us to reduce the map between the $d$ -dimensional $\eta$ and $2(d-1)$ -dimensional $\Lambda_{\eta}$ to a map between two (quasi) $(d-1)$ -dimensional functions. Being translation-invariant and global, this new map is represented with the recently proposed BCR-Net [10], which is a multiscale neural network based on the nonstandard form of the wavelet decomposition. This neural network architecture is used to approximate both the forward and inverse maps. For the test problems being considered, the resulting neural networks have only $10^{4}\sim 10^{5}$ parameters for the 2D case and $10^{5}\sim 10^{6}$ parameters for the 3D case, thanks to the dimension reduction and the compact structure of the BCR-Net. The rather small number of parameters allows for training on rather limited data sets, which is often the case for EIT problems.

Organization.

The rest of the paper is organized as follows. The mathematical background on the DtN map is studied in Section 2. The design and architecture of the DNNs of the forward and inverse maps for the 2D case are discussed in Section 3, along with numerical tests. The result is extended to the 3D case in Section 4.

2 Mathematical analysis of the DtN map

This section summarizes the necessary mathematical background of the DtN map. Let us denote $\mathcal{L}=-\Delta+\eta$ and $\mathcal{G}=\mathcal{L}^{-1}$ with $\mathcal{G}f(x)=\int_{\Omega}G(x,y)f(y)\,\mathrm{d}y$ , where $G$ is the Green function of the operator $\mathcal{L}$ with the Dirichlet boundary condition. An application of the divergence theorem shows that

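$$0=\int_{\partial\Omega}\frac{\partial u}{\partial n(y)}(y)G(x,y)\,\mathrm{d}S(y)=\int_{\Omega}\operatorname{div}_{y}\bigl(\nabla_{y}u(y)\,G(x,y)\bigr)\,\mathrm{d}y=\int_{\Omega}\bigl(\Delta_{y}u\cdot G+\nabla_{y}G\cdot\nabla_{y}u\bigr)\,\mathrm{d}y. \tag{2.1}$$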

Analogously, a second application of the divergence theorem to the above result leads to

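$$\begin{aligned}\int_{\partial\Omega}\frac{\partial G}{\partial n(y)}(x,y)f(y)\,\mathrm{d}S(y)&=\int_{\Omega}\bigl(\Delta_{y}G(x,y)\cdot u(y)+\nabla_{y}G(x,y)\cdot\nabla_{y}u(y)\bigr)\,\mathrm{d}y\\&=\int_{\Omega}\bigl(\Delta_{y}G(x,y)\cdot u(y)-\Delta_{y}u(y)\cdot G(x,y)\bigr)\,\mathrm{d}y\\&=\int_{\Omega}\bigl(-(-\Delta_{y}+\eta(y))G(x,y)\cdot u(y)+(-\Delta_{y}+\eta(y))u(y)\cdot G(x,y)\bigr)\,\mathrm{d}y\\&=-u(x).\end{aligned} \tag{2.2}$$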

Here the last equality uses the fact that $G$ is the Green function of $\mathcal{L}=-\Delta+\eta$ and $u$ is the solution of 1.2. Taking the normal derivative of both sides of 2.2 with respect to $n(x)$ for $x\in\partial\Omega$ gives rise to

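$$\frac{\partial u}{\partial n}(x)=-\int_{\partial\Omega}\frac{\partial^{2}G}{\partial n(x)\partial n(y)}(x,y)f(y)\,\mathrm{d}S(y),\quad x\in\partial\Omega, \tag{2.3}$$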

which describes the kernel of the DtN map $\Lambda_{\eta}$ in terms of the Green function $G$ :

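$$\lambda_{\eta}(r,s)=-\frac{\partial^{2}G}{\partial n(r)\partial n(s)}(r,s),\quad r,s\in\partial\Omega. \tag{2.4}$$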

In order to avoid confusion, we use $r,s$ to represent the points on the boundary and $p,q$ for the points in the domain hereafter.

In order to understand how the DtN map depends on the potential $\eta$ , we conduct a perturbative analysis of the map from $\eta$ to $\lambda_{\eta}$ for $\eta>0$ close to a fixed $\eta_{0}$ . For simplicity, assume $\eta_{0}=0$ . Let us introduce $\mathcal{E}=-\eta\mathcal{I}$ with $\mathcal{I}$ the identity operator, $\mathcal{L}_{0}=-\Delta$ , and $\mathcal{G}_{0}=\mathcal{L}_{0}^{-1}$ (with kernel denoted by $G_{0}$ ) as the Green function of $\mathcal{L}_{0}$ with the Dirichlet boundary condition. When $\eta>0$ is sufficiently small, $\mathcal{G}$ can be expanded via a Neumann series

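$$\mathcal{G}=(\mathcal{L}_{0}-\mathcal{E})^{-1}=\mathcal{G}_{0}+\mathcal{G}_{0}\mathcal{E}\mathcal{G}_{0}+\mathcal{G}_{0}\mathcal{E}\mathcal{G}_{0}\mathcal{E}\mathcal{G}_{0}+\cdots. \tag{2.5}$$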

By introducing $\lambda_{0}(r,s)=\lambda_{\eta}(r,s)\mid_{\eta=\eta_{0}}$ , which can be calculated by the knowledge of the background case $\eta=\eta_{0}$ , it is equivalent to focus on the difference $\lambda_{\eta}-\lambda_{0}$ (often called difference imaging, see [13] for details), which is also the kernel of $\mathcal{G}-\mathcal{G}_{0}$ . For a sufficiently small $\eta$ , the operator $\mathcal{G}-\mathcal{G}_{0}$ can be approximated by its first term $\mathcal{G}_{0}\mathcal{E}\mathcal{G}_{0}$ , which is linear in $\mathcal{E}$ . Using the fact that $\mathcal{E}=-\eta\mathcal{I}$ leads to the following approximation for the difference DtN map $\mu$ ,

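$$\mu(r,s):=(\lambda_{\eta}-\lambda_{0})(r,s)=-\frac{\partial^{2}(G-G_{0})}{\partial n(r)\partial n(s)}(r,s)\approx\int_{\Omega}\Bigl(\frac{\partial G_{0}}{\partial n(r)}(r,p)\frac{\partial G_{0}}{\partial n(s)}(p,s)\Bigr)\eta(p)\,\mathrm{d}p, \tag{2.6}$$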

which serves as the motivation of the design of the NN architectures.
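The quality of the first-order truncation of the Neumann series can be probed numerically. The sketch below is only an illustration under stated assumptions: it replaces the 2D operator by a 1D finite-difference Dirichlet Laplacian on $(0,1)$, and the helper names `dirichlet_laplacian_1d` and `neumann_first_order_error` are hypothetical. It compares $\mathcal{G}_{0}+\mathcal{G}_{0}\mathcal{E}\mathcal{G}_{0}$ with the exact discrete $\mathcal{G}$ for a small and a larger $\eta$.

```python
import numpy as np

def dirichlet_laplacian_1d(n):
    """Negative 1D Laplacian on n interior points of (0, 1), zero Dirichlet ends."""
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def neumann_first_order_error(eta):
    """Error of G0 + G0 E G0 against G = (L0 - E)^{-1}, relative to ||G - G0||."""
    L0 = dirichlet_laplacian_1d(eta.size)
    E = -np.diag(eta)                  # E = -eta * I, as in the text
    G0 = np.linalg.inv(L0)
    G = np.linalg.inv(L0 - E)          # exact discrete Green matrix of -Laplace + eta
    G1 = G0 + G0 @ E @ G0              # Neumann series truncated after the linear term
    return np.linalg.norm(G - G1) / np.linalg.norm(G - G0)

rng = np.random.default_rng(0)
profile = rng.uniform(0.5, 1.0, size=50)   # arbitrary positive test potential
err_small = neumann_first_order_error(0.1 * profile)
err_large = neumann_first_order_error(1.0 * profile)
print(err_small, err_large)
```

Because $\mathcal{G}-\mathcal{G}_{0}-\mathcal{G}_{0}\mathcal{E}\mathcal{G}_{0}=\mathcal{G}_{0}\mathcal{E}(\mathcal{G}-\mathcal{G}_{0})$, the reported ratio is bounded by $\|\mathcal{G}_{0}\mathcal{E}\|$, so it shrinks roughly in proportion to the size of $\eta$.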

3 Neural network for the 2D case

Consider the domain $\Omega=[0,1]\times[-Z,Z]$ where $Z$ is a fixed constant. The periodic boundary condition is specified at the left and right boundaries for simplicity. As illustrated in Fig. 1, the electrodes are allowed to be placed on either only the top boundary (one-sided detection) or both the top and bottom boundaries (two-sided detection). For the one-sided detection, the zero Dirichlet boundary condition is assumed at the bottom for simplicity, though other boundary conditions are also relevant. In what follows, we shall first consider the forward and inverse maps for the one-sided detection. The architecture is then extended to the two-sided detection case.

In most of the EIT problems, the electrical conductivity is known near the domain boundary. This implies that there exists a constant $\delta>0$ such that $\eta(p)$ is supported in $[0,1]\times[-(Z-\delta),Z-\delta]$ .

3.1 Forward map for the one-sided detection

For the one-sided detection, the DtN map is restricted to the top boundary. Let $r=(r_{1},Z)$ , $s=(s_{1},Z)$ and $p=(x,z)$ , where $x$ is the horizontal coordinate and $z$ is the depth coordinate. The map 2.6 can be rewritten as

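$$\mu((r_{1},Z),(s_{1},Z))\approx\int_{\Omega}\frac{\partial G_{0}}{\partial n(r)}((r_{1},Z),(x,z))\frac{\partial G_{0}}{\partial n(s)}((s_{1},Z),(x,z))\,\eta(x,z)\,\mathrm{d}x\,\mathrm{d}z. \tag{3.1}$$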

Note that $p$ and $q$ are used for the points in the domain $\Omega$ , $r$ and $s$ for the points on the boundary, and $x$ and $z$ for the horizontal and depth coordinates.

A key step for both analysis and architecture design is to introduce new horizontal variables $m$ and $h$ such that $r_{1}=m+h$ and $s_{1}=m-h$ . Reparameterizing the difference DtN map $\mu$ with the new variable yields

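$$\mu(m,h):=\mu((m+h,Z),(m-h,Z))\approx\int_{\Omega}K(m,h,x,z)\,\eta(x,z)\,\mathrm{d}x\,\mathrm{d}z, \tag{3.2}$$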

with the kernel $K$ given by

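$$K(m,h,x,z):=\frac{\partial G_{0}}{\partial n}((m+h,Z),(x,z))\frac{\partial G_{0}}{\partial n}((m-h,Z),(x,z)). \tag{3.3}$$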

Here $n=(0,1)$ and $\dfrac{\partial{G_{0}}}{\partial{n}}(\cdot,\cdot)$ is the directional derivative of $G_{0}$ in the first variable. Noticing that $G_{0}$ is the Green function of the operator $-\Delta$ on the domain $\Omega$ with the periodic boundary condition on left and right and the Dirichlet boundary condition on top and bottom, one can write down $G_{0}$ explicitly as [25]

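$$G_{0}(p,q)=\sum_{\ell\in\mathbb{Z}^{2}}\bigl(\Gamma(p-q+(\ell_{1},2\ell_{2}Z))-\Gamma(p-q^{\ast}+(\ell_{1},2\ell_{2}Z))\bigr),$$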

where $q^{\ast}=(q_{1},2Z+q_{2})$ and $\Gamma$ is the Green function of the operator $-\Delta$ on the whole space $\mathbb{R}^{2}$ . Since $G_{0}$ as the Green function for the case $\eta=\eta_{0}$ is translation-invariant in the horizontal direction,

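$$\frac{\partial G_{0}}{\partial n}((m\pm h,Z),(x,z))=\frac{\partial G_{0}}{\partial n}((\pm h,Z),(x-m,z)).$$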

For the rest of the discussion, it is convenient to treat $h$ and $z$ as parameters and introduce

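$$\begin{aligned}\frac{\partial G_{0,\pm h,z}}{\partial n}(x-m)&:=\frac{\partial G_{0}}{\partial n}((\pm h,Z),(x-m,z)),\\k_{h,z}(m)&:=\frac{\partial G_{0,+h,z}}{\partial n}(m)\frac{\partial G_{0,-h,z}}{\partial n}(m),\\\eta_{z}(x)&:=\eta(x,z).\end{aligned}$$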

With the new notations, 3.2 can be reformulated as

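$$\mu_{h}(m)\approx\int_{-Z}^{Z}(k_{h,z}*\eta_{z})(m)\,\mathrm{d}z=\int_{-(Z-\delta)}^{Z-\delta}(k_{h,z}*\eta_{z})(m)\,\mathrm{d}z, \tag{3.6}$$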

where the convolution is in $m$ . The last equality holds due to the consideration that $\eta$ is supported between $-(Z-\delta)$ and $Z-\delta$ in the depth direction.
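To make the change of variables $r_{1}=m+h$, $s_{1}=m-h$ concrete, the following sketch re-indexes a periodic kernel sampled on a grid. It is an illustration under assumptions, not the discretization used in the paper: indices are taken modulo $N$, $h$ runs over integer grid offsets, and the function name `reparameterize` is hypothetical. The test kernel depends only on $r_{1}-s_{1}$, so after re-indexing it no longer depends on $m$, which is the structure that translation invariance in the horizontal direction provides.

```python
import numpy as np

def reparameterize(mu_rs):
    """Re-index a periodic kernel mu[r, s] as mu_mh[m, h] with r = m + h, s = m - h (mod N)."""
    n = mu_rs.shape[0]
    m = np.arange(n)[:, None]
    h = np.arange(n)[None, :]
    return mu_rs[(m + h) % n, (m - h) % n]

n = 16
r = np.arange(n)[:, None]
s = np.arange(n)[None, :]
# Hypothetical translation-invariant kernel: a function of r - s only.
mu_rs = np.cos(2 * np.pi * (r - s) / n)
mu_mh = reparameterize(mu_rs)

# Every row (fixed m) of mu_mh is the same function of h.
print(np.allclose(mu_mh, mu_mh[0:1, :]))
```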

Low-rank approximation and dimension reduction.

A key observation is that the kernel $k_{h,z}(m)$ is smooth in $h$ for $h\in[0,1]$ and $z\in(-(Z-\delta),Z-\delta)$ .

An inspection of the definition of $K$ in 3.3 shows that $k_{h,z}(m)$ is only singular when $z=Z$ . Therefore, the kernel $k_{h,z}(m)$ is uniformly smooth for $h\in[0,1]$ , $m\in[0,1]$ , and $z\in(-(Z-\delta),Z-\delta)$ . The smoothness in the $h$ and $z$ variables indicates that $k_{h,z}(m)$ can be well-approximated in $h$ and $z$ by an approximation scheme with a small number of terms. To simplify the discussion, assume without loss of generality that a stable interpolation scheme (such as Chebyshev interpolation) is adopted. By denoting the sets of interpolation points in the $h$ variable and $z$ variable as $\{\hat{h}\}$ and $\{\hat{z}\}$ , such an interpolation reads

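$$k_{h,z}(m)\approx\sum_{\hat{h}}\sum_{\hat{z}}R_{h,\hat{h}}\,k_{\hat{h},\hat{z}}(m)\,R_{z,\hat{z}}, \tag{3.7}$$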

where $R_{h,\hat{h}}$ and $R_{z,\hat{z}}$ are the interpolation operators in the $h$ and $z$ variables, respectively.
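The effect of such an interpolation can be illustrated on a toy kernel. The sketch below interpolates only in the $h$ variable with $c=12$ Chebyshev nodes and measures the error on a fine grid. The function `kernel` is a smooth stand-in chosen for illustration (an assumption, not the EIT kernel $k_{h,z}(m)$), and `chebyshev_nodes` and `lagrange_matrix` are hypothetical helper names; the matrix `R` plays the role of the interpolation operator $R_{h,\hat{h}}$.

```python
import numpy as np

def chebyshev_nodes(c, a=0.0, b=1.0):
    """c Chebyshev points of the first kind mapped to [a, b]."""
    j = np.arange(c)
    return 0.5 * (a + b) + 0.5 * (b - a) * np.cos((2 * j + 1) * np.pi / (2 * c))

def lagrange_matrix(x, nodes):
    """R[i, j] = value at x[i] of the Lagrange basis polynomial attached to nodes[j]."""
    R = np.ones((x.size, nodes.size))
    for j in range(nodes.size):
        for i in range(nodes.size):
            if i != j:
                R[:, j] *= (x - nodes[i]) / (nodes[j] - nodes[i])
    return R

def kernel(h, m):
    """Smooth stand-in kernel (an assumption, not the EIT kernel)."""
    return 1.0 / (1.0 + (m - h) ** 2)

c = 12
h = np.linspace(0.0, 1.0, 64)
m = np.linspace(0.0, 1.0, 64)
h_hat = chebyshev_nodes(c)
R = lagrange_matrix(h, h_hat)                        # shape (64, c)
k_full = kernel(h[:, None], m[None, :])              # shape (64, 64), indexed [h, m]
k_interp = R @ kernel(h_hat[:, None], m[None, :])    # sum over h_hat of R[h, h_hat] k[h_hat, m]
max_err = np.abs(k_full - k_interp).max()
print(max_err)
```

For this stand-in kernel the classical interpolation error bound gives an error of at most $2^{1-2c}$, so a dozen nodes already reproduce all 64 sampled values of $h$ to about seven digits.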

This approximation for $k_{h,z}$ naturally implies an approximation for 3.6

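$$\mu_{h}(m)\approx\sum_{\hat{h}}R_{h,\hat{h}}\Bigl(\sum_{\hat{z}}k_{\hat{h},\hat{z}}*\Bigl(\int_{-(Z-\delta)}^{Z-\delta}R_{z,\hat{z}}\eta_{z}\,\mathrm{d}z\Bigr)\Bigr)(m). \tag{3.8}$$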

Algorithmically, this approximation allows one to factorize the forward map into three steps:

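1. Compress the two-dimensional function $\eta_{z}=\eta(x,z)$ to a set of one-dimensional functions

$$\tilde{\eta}_{\hat{z}}(x):=\int_{-(Z-\delta)}^{Z-\delta}R_{z,\hat{z}}\eta_{z}(x)\,\mathrm{d}z;$$

2. Convolve with $k_{\hat{h},\hat{z}}$ in the one-dimensional space to obtain

$$\tilde{\mu}_{\hat{h}}(m):=\Bigl(\sum_{\hat{z}}k_{\hat{h},\hat{z}}*\tilde{\eta}_{\hat{z}}\Bigr)(m);$$

3. Interpolate the set of one-dimensional functions $\tilde{\mu}_{\hat{h}}(m)$ to a two-dimensional function

$$\mu_{h}(m)=\sum_{\hat{h}}R_{h,\hat{h}}\tilde{\mu}_{\hat{h}}(m).$$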

This effectively reduces the forward map to a number of 1D convolutions. This dimension reduction in 3.8 is fundamental in the construction of the neural network.
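The three steps above can be sketched in NumPy and checked against a direct contraction with the assembled kernel. This is a minimal illustration under assumptions: the grid is periodic in $x$, the weights `R_z`, `R_h` and the kernels `k_hat` are random stand-ins rather than quantities derived from the Green function, and the function names are hypothetical.

```python
import numpy as np

def circular_conv(k, v):
    """Periodic convolution (k * v)[m] = sum_x k[(m - x) mod N] v[x], computed with the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(k) * np.fft.fft(v)))

def forward_three_steps(eta, R_z, k_hat, R_h):
    """eta: (Nx, Nz); R_z: (Nz, c); k_hat: (c, c, Nx) indexed [h_hat, z_hat, m]; R_h: (Nh, c)."""
    eta_tilde = eta @ R_z                              # step 1: compress in z -> (Nx, c)
    c = k_hat.shape[0]
    mu_tilde = np.stack([                              # step 2: 1D periodic convolutions
        sum(circular_conv(k_hat[a, b], eta_tilde[:, b]) for b in range(c))
        for a in range(c)
    ], axis=1)                                         # (Nx, c), indexed [m, h_hat]
    return mu_tilde @ R_h.T                            # step 3: interpolate in h -> (Nx, Nh)

rng = np.random.default_rng(0)
Nx, Nz, Nh, c = 32, 20, 24, 4
eta = rng.standard_normal((Nx, Nz))
R_z = rng.standard_normal((Nz, c))       # random stand-ins for the interpolation weights
R_h = rng.standard_normal((Nh, c))
k_hat = rng.standard_normal((c, c, Nx))
mu = forward_three_steps(eta, R_z, k_hat, R_h)

# Reference: assemble the dense kernel K[m, h, x, z] and contract it with eta.
idx = (np.arange(Nx)[:, None] - np.arange(Nx)[None, :]) % Nx   # (m - x) mod Nx
K = np.einsum('ha,abmx,zb->mhxz', R_h, k_hat[:, :, idx], R_z)
mu_dense = np.einsum('mhxz,xz->mh', K, eta)
print(np.allclose(mu, mu_dense))
```

For these sizes the dense kernel `K` holds 491,520 entries, while the three factors together hold 688, which is the kind of saving the dimension reduction is meant to deliver.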

Remark 1.

The assumption that $\eta(p)$ is supported in $[0,1]\times[-(Z-\delta),Z-\delta]$ can be removed. Actually, we can split $[-Z,Z]$ into three intervals $[-Z,-(Z-\delta)]$ , $[-(Z-\delta),Z-\delta]$ and $[Z-\delta,Z]$ with $\delta\ll Z$ , and then study the property of the kernel $k_{h,z}(m)$ restricted to each interval one by one. Since $\delta\ll Z$ , the low-rank approximation 3.8 is still valid.

Discretization.

The analysis so far is in the continuous setting. A simple numerical treatment discretizes the domain $\Omega$ by a uniform Cartesian grid, with the Laplacian approximated by a $5$ -point central difference scheme and the directional derivative on the boundary replaced by the one-sided first-order difference. The numerical Green function is defined to be the inverse of the discrete Laplacian operator with zero boundary condition. Let $N_{r}$ be the number of electrodes. The DtN map is evaluated by solving 1.2 $N_{r}$ times with $f(x)$ set as a delta function at one electrode each time. With a slight abuse of notation, the same letters are used to denote the continuous kernels and their discretizations. The discrete version of 3.8 reads

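$$\mu_{h}(m)\approx\sum_{\hat{h}}R_{h,\hat{h}}\Bigl(\sum_{\hat{z}}k_{\hat{h},\hat{z}}*\Bigl(\sum_{z}R_{z,\hat{z}}\eta_{z}\Bigr)\Bigr)(m). \tag{3.9}$$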
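The discretization described in this paragraph can be sketched as follows for the one-sided setup. The code is an illustrative assumption-laden version rather than the authors' implementation: it uses dense matrices (suitable only for small grids), places one electrode at every top grid point so that $N_{r}=N_{x}$, and the function name `dtn_top` and the grid conventions are hypothetical.

```python
import numpy as np

def dtn_top(eta, Z=0.5):
    """Discrete top-boundary DtN matrix for (-Laplace + eta) u = 0 on [0, 1] x [-Z, Z].

    eta has shape (Nx, Nz): Nx periodic points in x and Nz interior points in z.
    Column i of the result is the Neumann data produced by a unit Dirichlet value
    at top grid point i; the bottom boundary is held at zero.
    """
    Nx, Nz = eta.shape
    hx, hz = 1.0 / Nx, 2.0 * Z / (Nz + 1)
    Ix, Iz = np.eye(Nx), np.eye(Nz)
    Lx = (2 * Ix - np.roll(Ix, 1, axis=1) - np.roll(Ix, -1, axis=1)) / hx**2  # periodic in x
    Lz = (2 * Iz - np.eye(Nz, k=1) - np.eye(Nz, k=-1)) / hz**2                # Dirichlet in z
    A = np.kron(Lx, Iz) + np.kron(Ix, Lz) + np.diag(eta.ravel())              # 5-point scheme
    top = np.arange(Nx) * Nz + (Nz - 1)       # flat indices of the grid row next to the top
    B = np.zeros((Nx * Nz, Nx))
    B[top, np.arange(Nx)] = 1.0 / hz**2       # boundary delta moved to the right-hand side
    U = np.linalg.solve(A, B)                 # one solve per electrode (one column each)
    return (Ix - U[top, :]) / hz              # one-sided first-order normal derivative

Nx, Nz, Z = 16, 15, 0.5
lam0 = dtn_top(np.zeros((Nx, Nz)), Z)         # background case eta = 0
eta = np.zeros((Nx, Nz))
eta[4:8, 5:10] = 1.0                          # a hypothetical rectangular inclusion
mu = dtn_top(eta, Z) - lam0                   # difference DtN data
print(mu.shape)
```

A simple sanity check: for $\eta=0$ and constant boundary data $f\equiv 1$, the solution is linear in $z$, so every entry of `lam0 @ np.ones(Nx)` equals $1/(2Z)$.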

Neural network architecture.

The perturbative analysis shows that, if $\eta>0$ is sufficiently small, the forward map $\eta\to\mu$ can be approximated by 3.9. As detailed below, the three steps of computing 3.9 can be naturally formulated as a neural network with three modules:

• an encoding module that compresses the two-dimensional data $\eta$ to a set of one-dimensional data $\tilde{\eta}_{\hat{z}}$ ;

• an intermediate module that convolves $k_{\hat{h},\hat{z}}$ with the one-dimensional data $\tilde{\eta}_{\hat{z}}$ to obtain $\tilde{\mu}_{\hat{h}}$ ;

• a decoding module that extends the set of one-dimensional data $\tilde{\mu}_{\hat{h}}$ to two-dimensional data $\mu$ .

When $\eta$ fails to be sufficiently small, the linear approximation for the forward map $\eta\to\mu$ is not accurate. In order to extend the neural network of 3.9 to the nonlinear case, a straightforward solution is to include nonlinear activation functions and increase the number of layers, as done, for instance, in [23, 21]. For simplicity, we assume that the size $N_{\hat{z}}$ of $\{\hat{z}\}$ and the size $N_{\hat{h}}$ of $\{\hat{h}\}$ are both equal to a constant parameter $c$ .

The resulting neural network architecture for the forward map for the one-sided detection is summarized in Algorithm 1 and illustrated in Fig. 2. Let us explain these three components of the neural network one by one.

• Encoding module. $\tilde{\eta}={{\sf{Encoding}}}[c](\eta)$ compresses the data $\eta\in\mathbb{R}^{N_{x}\times N_{z}}$ to $\tilde{\eta}\in\mathbb{R}^{N_{x}\times c}$ by compressing only in the $z$ -dimension. It can be implemented with a one-dimensional convolutional layer Conv1d with window size $1$ and channel number $c$ by taking the second dimension of $\eta$ as channels. The linear activation function is sufficient for the Conv1d layer used here.

• Intermediate module. Since the kernel $k_{\hat{h},\hat{z}}$ for the linear case in 3.9 is a convolution, it can be implemented by a one-dimensional convolutional layer Conv1d with window size $N_{x}$ , channel number $c$ and linear activation function. For the nonlinear case, a natural extension is to use multiple convolution layers and to add a nonlinear activation function such as a rectified-linear unit (ReLU) function after each layer.

For problems with fine discretizations, a convolution layer with window size $N_{x}$ might have many parameters. Recently, several multiscale NNs with fewer parameters have been proposed as an efficient alternative to full-width convolution layers. Examples include the ones based on hierarchical matrices in [23, 22] and the BCR-Net [21]. Here, the BCR-Net is used to represent the intermediate module. BCR-Net is motivated by the data-sparse nonstandard wavelet representation of the pseudo-differential operators [10]. It processes the information at different scales separately, and each scale can be understood as a local convolutional neural network. The one-dimensional $\tilde{\mu}={\text{{\sf{BCR-Net}}}}{{\sf{1d}}}[c,n_{\mathrm{cnn}}](\tilde{\eta})$ maps $\tilde{\eta}\in\mathbb{R}^{N_{x}\times c}$ to $\tilde{\mu}\in\mathbb{R}^{N_{x}\times c}$ , where the number of channels and layers in the local convolutional neural network in each scale are $c$ and $n_{\mathrm{cnn}}$ , respectively. The readers are referred to [21] for more details on the BCR-Net.

• Decoding module. $\mu={{\sf{Decoding}}}[N_{h}](\tilde{\mu})$ decodes the set of one-dimensional data $\tilde{\mu}\in\mathbb{R}^{N_{m}\times c}$ to the two-dimensional data $\mu\in\mathbb{R}^{N_{m}\times N_{h}}$ . In the implementation, this decoding module is a one-dimensional convolutional layer Conv1d with window size $1$ , channel number $N_{h}$ , and linear activation function.
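The data flow through the three modules can be sketched in plain NumPy to make the array shapes explicit. This is a rough illustration under clearly stated assumptions: the intermediate module below is a short stack of small-window periodic convolutions with ReLU, used as a placeholder and not the BCR-Net of [21]; weights are random and untrained; and `conv1d_periodic`, `forward_net`, and `init_params` are hypothetical names.

```python
import numpy as np

def conv1d_periodic(x, W, b):
    """x: (N, c_in); W: (w, c_in, c_out) with odd window w; periodic padding along axis 0."""
    w = W.shape[0]
    out = np.zeros((x.shape[0], W.shape[2])) + b
    for t in range(w):
        out += np.roll(x, shift=w // 2 - t, axis=0) @ W[t]
    return out

def forward_net(eta, params):
    """Encoding -> intermediate -> decoding, mirroring the three modules described above."""
    x = conv1d_periodic(eta, *params["enc"])            # window 1, linear: (Nx, Nz) -> (Nx, c)
    for W, b in params["mid"][:-1]:
        x = np.maximum(conv1d_periodic(x, W, b), 0.0)   # ReLU between intermediate layers
    x = conv1d_periodic(x, *params["mid"][-1])          # last intermediate layer is linear
    return conv1d_periodic(x, *params["dec"])           # window 1, linear: (Nx, c) -> (Nx, Nh)

def init_params(rng, Nz, Nh, c, w=3, n_mid=3):
    def layer(width, c_in, c_out):
        return rng.standard_normal((width, c_in, c_out)) / np.sqrt(width * c_in), np.zeros(c_out)
    return {"enc": layer(1, Nz, c),
            "mid": [layer(w, c, c) for _ in range(n_mid)],
            "dec": layer(1, c, Nh)}

rng = np.random.default_rng(0)
Nx, Nz, Nh, c = 64, 32, 40, 6
params = init_params(rng, Nz, Nh, c)
eta = rng.standard_normal((Nx, Nz))
mu = forward_net(eta, params)
print(mu.shape)
```

Since every layer is a periodic convolution along the horizontal axis, shifting $\eta$ in $x$ shifts the output in $m$ by the same amount, which matches the translation invariance used in the analysis.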

3.2 Inverse map for the one-sided detection

The perturbative analysis shows that if $\eta$ is sufficiently small, the forward map can be well-approximated by

$$\mu\approx K\eta,$$

which is the operator notation of the discretization 3.2. Here, $\eta$ is a vector indexed by $(x,z)$ , $\mu$ is indexed by $(m,h)$ , and $K$ is a matrix with rows indexed by $(m,h)$ and columns indexed by $(x,z)$ . The usual filtered back-projection algorithm [32] takes the form

$$\eta\approx(K^{\mathsf{T}}K+\varepsilon I)^{-1}K^{\mathsf{T}}\mu,$$

where $\varepsilon$ is a regularization parameter.
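The role of $\varepsilon$ can be seen on a small synthetic problem. The sketch below applies the regularized formula $(K^{\mathsf{T}}K+\varepsilon I)^{-1}K^{\mathsf{T}}\mu$ to noisy data and compares it with an unregularized least-squares solve. The matrix `K` is a hypothetical ill-conditioned stand-in with geometrically decaying singular values, not the EIT kernel, and the helper name `tikhonov_solve` is an assumption.

```python
import numpy as np

def tikhonov_solve(K, mu, eps):
    """Return (K^T K + eps I)^{-1} K^T mu without forming an explicit inverse."""
    n = K.shape[1]
    return np.linalg.solve(K.T @ K + eps * np.eye(n), K.T @ mu)

rng = np.random.default_rng(0)
# Hypothetical ill-conditioned forward matrix: singular values from 1 down to 1e-8.
U, _ = np.linalg.qr(rng.standard_normal((60, 40)))
V, _ = np.linalg.qr(rng.standard_normal((40, 40)))
sigma = 10.0 ** -np.linspace(0, 8, 40)
K = (U * sigma) @ V.T
eta_true = V[:, :5] @ rng.standard_normal(5)         # lies in the well-determined subspace
mu = K @ eta_true + 1e-6 * rng.standard_normal(60)   # data with a small amount of noise

eta_reg = tikhonov_solve(K, mu, eps=1e-8)
eta_naive = np.linalg.lstsq(K, mu, rcond=None)[0]
err_reg = np.linalg.norm(eta_reg - eta_true)
err_naive = np.linalg.norm(eta_naive - eta_true)
print(err_reg, err_naive)
```

Without regularization the noise is amplified by the reciprocals of the smallest singular values, whereas the $\varepsilon I$ term caps that amplification at roughly $1/(2\sqrt{\varepsilon})$.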

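Following the above discussion, the dimension reduction approximation applied to $K$ is also valid for $K^{\mathsf{T}}$ :

$$(K^{\mathsf{T}}\mu)_{z}(x)\approx\sum_{\hat{z}}R_{z,\hat{z}}\Bigl(\sum_{\hat{h}}k_{\hat{h},\hat{z}}*\Bigl(\sum_{h}R_{h,\hat{h}}\mu_{h}\Bigr)\Bigr)(x).$$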

As a result, one obtains a similar three-step algorithm for applying $K^{\mathsf{T}}$ to $\mu$ and this algorithm can also be formulated as a neural network with three modules:

• Encode from $\mu$ to $\tilde{\mu}_{\hat{h}}=\sum_{h}R_{h,\hat{h}}\mu_{h}$ .

• Convolve to form $\tilde{\eta}_{\hat{z}}=\sum_{\hat{h}}k_{\hat{h},\hat{z}}*\tilde{\mu}_{\hat{h}}$ .

• Decode from $\tilde{\eta}_{\hat{z}}$ to $(K^{\mathsf{T}}\mu)_{z}=\sum_{\hat{z}}R_{z,\hat{z}}\tilde{\eta}_{\hat{z}}$ .

The part $(K^{\mathsf{T}}K+\varepsilon I)^{-1}$ can be viewed as a post-processing of $K^{\mathsf{T}}\mu$ . The definition of $K$ in 3.3 implies that the operator $(K^{\mathsf{T}}K+\varepsilon I)$ is a convolution operator. As a deconvolution operator, $(K^{\mathsf{T}}K+\varepsilon I)^{-1}$ can also be implemented with a convolutional neural network.
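Why the inverse of a regularized convolution operator is again a convolution can be seen in a 1D periodic toy setting, where the Fourier transform diagonalizes it. The sketch below is an illustration under assumptions: `C` is periodic convolution with a hypothetical Gaussian kernel standing in for $K$, and the names `circulant` and `regularized_deconvolve` are invented for this example. It applies $(C^{\mathsf{T}}C+\varepsilon I)^{-1}$ with the FFT and compares against a dense solve.

```python
import numpy as np

def circulant(k):
    """Matrix C with C[m, x] = k[(m - x) mod N], i.e. periodic convolution with k."""
    n = k.size
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return k[idx]

def regularized_deconvolve(k, g, eps):
    """Apply (C^T C + eps I)^{-1} to g, where C is periodic convolution with the real kernel k."""
    k_hat = np.fft.fft(k)
    return np.real(np.fft.ifft(np.fft.fft(g) / (np.abs(k_hat) ** 2 + eps)))

rng = np.random.default_rng(0)
n, eps = 64, 1e-3
dist = np.minimum(np.arange(n), n - np.arange(n))   # periodic distance to index 0
k = np.exp(-0.5 * (dist / 2.0) ** 2)                # hypothetical periodic Gaussian kernel
g = rng.standard_normal(n)

fast = regularized_deconvolve(k, g, eps)
C = circulant(k)
dense = np.linalg.solve(C.T @ C + eps * np.eye(n), g)
print(np.allclose(fast, dense))
```

The Fourier multiplier $1/(|\hat{k}|^{2}+\varepsilon)$ is itself the symbol of a convolution, which is the property that motivates using convolutional layers for the post-processing step.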

Combining these two components suggests that for the inverse map a suitable architecture is the NN architecture of the forward map followed by a 2d convolutional neural network. The resulting neural network architecture for the inverse map is outlined in Algorithm 2 and illustrated in Fig. 3. The layers in Algorithm 2 share the same definitions as those in Algorithm 1 except the ${{\sf{CNN2d}}}$ layer, which is defined as follows.

• Post-processing module. $\eta={{\sf{CNN2d}}}[w,n_{\mathrm{cnn}2}](\bar{\eta})$ , which maps $\bar{\eta}\in\mathbb{R}^{N_{x}\times N_{z}}$ to $\eta\in\mathbb{R}^{N_{x}\times N_{z}}$ , is a two-dimensional convolutional neural network with $n_{\mathrm{cnn}2}$ convolutional layers and $w$ as the window size. ReLU is used as the activation function for all intermediate layers. However, as $\eta$ can take any real number, the last layer uses a linear activation function.

3.3 Inverse map for the two-sided detection

For the two-sided detection, the electrodes are placed on both the top and bottom boundaries. The DtN map hence contains four parts: top-to-top (T2T), top-to-bottom (T2B), bottom-to-top (B2T), and bottom-to-bottom (B2B). Since the top boundary corresponds to $z=Z$ and the bottom corresponds to $z=-Z$ , the superscripts $+$ and $-$ are used to identify the top and bottom boundaries, respectively. Following the derivation for the one-sided detection, when $\eta$ is sufficiently small one can approximate the linearized map from $\eta$ to $\mu^{\pm\pm}$ as

[TABLE]

where the first and second $\pm$ in $\mu^{\pm\pm}$ correspond to the first and second $\pm$ on the right-hand side, respectively. After the discretization, the vector form reads

[TABLE]

Following the discussion in Section 3.1, one can factorize each of the four components $K^{\pm\pm}$ using dimension reduction into three steps. Hence, the forward map $\eta\to\mu^{\pm\pm}$ can be split into four independent forward problems for the one-sided detection, and we shall not repeat the study here.

3.3.1 Architecture for the inverse map

When $\eta$ is small, the filtered back-projection algorithm for the inverse problem from $\mu^{\pm\pm}$ to $\eta$ takes the form

[TABLE]

Following the discussion in Section 3.2, an NN architecture for the inverse map of the two-sided detection would be to repeat the main part of Algorithm 2, except the post-processing module, for each of $\mu^{++},\mu^{+-},\mu^{-+},\mu^{--}$, then to sum the results together, and finally to apply the post-processing.

Due to the nonlinearity of the inverse problem, a slightly different approach gives better performance. Instead of summing the decoded results, one combines, before the decoding step, the results of $\mu^{++},\mu^{+-},\mu^{-+},\mu^{--}$ into a single array of size $N_{x}\times 4c$ and then performs a single joint decoding step. The resulting neural network architecture is detailed in Algorithm 3 and also illustrated in Fig. 4. The modules ${{\sf{Encoding}}}$, ${\text{{\sf{BCR-Net}}}}{{\sf{1d}}}$, ${{\sf{Decoding}}}$ and ${{\sf{CNN2d}}}$ are the same as those in Algorithm 2. The only new layers are the ${{\sf{Concatenate}}}$ layer, $\eta\leftarrow{{\sf{Concatenate}}}(\eta_{1},\eta_{2},\eta_{3},\eta_{4})$, which concatenates the matrices $\eta_{i}\in\mathbb{R}^{N_{x}\times c}$, $i=1,2,3,4$, along the column direction into a matrix $\eta\in\mathbb{R}^{N_{x}\times 4c}$, and the ${{\sf{Conv1d}}}[c,w]$ layer, a one-dimensional convolutional layer with channel number $c$ and window size $w$.
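The shape bookkeeping of the two new layers can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the four branch outputs are random stand-ins, the helper `conv1d_periodic` is a hypothetical name, and periodic padding in the spatial direction with a linear activation is assumed here rather than taken from Algorithm 3.

```python
import numpy as np

def conv1d_periodic(x, weight, bias):
    """One Conv1d layer with linear activation.

    x: (N, c_in); weight: (w, c_in, c_out); bias: (c_out,).
    Periodic padding along the first axis is an assumption of this sketch.
    """
    w = weight.shape[0]
    half = w // 2
    out = np.zeros((x.shape[0], weight.shape[2]))
    for k in range(w):
        # np.roll(x, half - k, axis=0)[i] equals x[(i + k - half) % N]
        out += np.roll(x, half - k, axis=0) @ weight[k]
    return out + bias

N_x, c, w = 160, 10, 3
rng = np.random.default_rng(0)

# Hypothetical stand-ins for the four branch outputs (T2T, T2B, B2T, B2B).
branches = [rng.standard_normal((N_x, c)) for _ in range(4)]

# Concatenate along the column (channel) direction: (N_x, c) x 4 -> (N_x, 4c).
stacked = np.concatenate(branches, axis=1)

# Conv1d[c, w]: mixes the 4c channels back down to c channels.
weight = 0.1 * rng.standard_normal((w, 4 * c, c))
bias = np.zeros(c)
mixed = conv1d_periodic(stacked, weight, bias)
```

Concatenating before decoding lets the subsequent convolution mix information from all four measurement blocks at each horizontal position, instead of summing four independently decoded images.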

Let us denote $\tilde{\eta}^{\pm,\pm}$ by $\tilde{\eta}_{i}$, $i=1,2,3,4$, in Algorithm 3. Due to the symmetry of the domain $\Omega$, the map $\mu^{+,+}\to\tilde{\eta}^{+,+}$ is mirror-symmetric to the map $\mu^{-,-}\to\tilde{\eta}^{-,-}$. Hence, the B2B part should share its weights with the T2T part. In the implementation, one can use the same layers for the two maps and then flip the output for the B2B part to achieve the mirror symmetry. Analogously, the T2B part is mirror-symmetric to the B2T part and is handled in the same way.

3.4 Numerical results

Below we report some numerical results of the neural network proposed above for the 2D EIT problem. The NN is implemented with Keras [17] (running on top of TensorFlow [1]). Nadam is chosen as the optimizer [19] with a step size $10^{-3}$ and the mean squared error is used as the loss function. The parameters of the network are initialized randomly from the normal distribution, and the batch size is set as two percent of the size of the training set. The number of layers in the BCR-Net is set as $n_{\mathrm{cnn}}=6$ . For the ${{\sf{CNN2d}}}$ in Algorithms 2 and 3, the number of convolutional layers is set as $n_{\mathrm{cnn}2}=6$ with window size $w=3$ . For the one-dimensional convolutional part in Algorithm 3, the number of convolutional layers is set as $n_{\mathrm{cnn}3}=3$ with window size equal to $w_{2}=3$ . The right value for the channel number $c$ will be studied for each test.

In this section, the half width of the domain $Z$ is set to be $1/4$ and the domain $\Omega$ is discretized by a $160\times 80$ Cartesian grid. Thus, $N_{x}=N_{m}=160$, $N_{h}=40$ and $N_{z}=80$. Both the training data and test data are generated by numerically solving (1.2). In each test, $10$K pairs of $(\eta,\mu)$ are used to train the neural network and another $10$K pairs are used as the test data.

For each sample of the training and test data, $\eta$ is randomly sampled and $\mu$ (or $\mu^{\pm\pm}$) denotes the exact difference DtN kernel obtained by numerical discretization of (1.2). The predictions of the NNs for the forward and inverse maps are denoted by $\mu_{NN}$ and $\eta_{NN}$, respectively. The accuracy is measured by the relative error in the $\ell^{2}$ norm:

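$$\frac{\|\mu-\mu_{NN}\|_{\ell^{2}}}{\|\mu\|_{\ell^{2}}}\quad\text{or}\quad\frac{\|\eta-\eta_{NN}\|_{\ell^{2}}}{\|\eta\|_{\ell^{2}}}.\qquad(3.13)$$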

For each experiment, the test error is then obtained by averaging (3.13) over a given set of test samples. The numerical results presented below are obtained by repeating the training process five times, using different random seeds.
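A minimal NumPy sketch of this error measure is given below. The function names `relative_l2_error` and `mean_test_error` are illustrative choices, not taken from the original implementation; for array-valued samples the $\ell^{2}$ norm is taken over all entries.

```python
import numpy as np

def relative_l2_error(exact, predicted):
    """Relative l2 error ||exact - predicted|| / ||exact|| over all entries."""
    exact = np.asarray(exact, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # For 2D arrays np.linalg.norm defaults to the Frobenius norm,
    # i.e. the l2 norm of the flattened array.
    return np.linalg.norm(exact - predicted) / np.linalg.norm(exact)

def mean_test_error(exact_batch, predicted_batch):
    """Average the per-sample relative error over a set of test samples."""
    errors = [relative_l2_error(e, p) for e, p in zip(exact_batch, predicted_batch)]
    return float(np.mean(errors))
```

For instance, `relative_l2_error([3.0, 4.0], [0.0, 4.0])` evaluates to $3/5$, since the error vector has norm $3$ and the exact vector has norm $5$.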

3.4.1 Smooth potential case.

We first study the smooth potential case, where the potential $\eta(x)$ is assumed to take the form

[TABLE]

with $\rho=1000$ . Each matrix $\Theta^{(i)}\in\mathbb{R}^{2\times 2}$ is generated with the eigenvalues uniformly sampled in $[0.0125,0.05]$ and the eigenvectors uniformly sampled in the unit circle $\mathbb{S}^{1}$ . Two types of data sets are generated to test the neural networks.

• Shallow inclusions. The centers of Gaussians are sampled as $c^{(i)}\in\mathcal{U}([0,1]\times[0.05,0.2])$. This data is used to test the forward and inverse problem for the one-sided detection.

• Deep inclusions. The centers of Gaussians are sampled as $c^{(i)}\in\mathcal{U}([0,1]\times[-0.2,0.2])$. This data can be used to show the instability of the inverse map: the one-sided detection would fail to resolve the inverse problem well, while the two-sided detection works.
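The random ingredients of the smooth potentials can be sampled along the following lines. This is a sketch under assumptions: the first eigenvector is drawn uniformly on the unit circle and the second is taken orthogonal to it, and the helper names `sample_theta_2d` and `sample_centers` are hypothetical. The evaluation of $\eta(x)$ from these parameters is left out.

```python
import numpy as np

def sample_theta_2d(rng, lam_min=0.0125, lam_max=0.05):
    """Symmetric 2x2 matrix with eigenvalues ~ U[lam_min, lam_max] and a random eigenbasis."""
    lam = rng.uniform(lam_min, lam_max, size=2)
    angle = rng.uniform(0.0, 2.0 * np.pi)
    v = np.array([np.cos(angle), np.sin(angle)])   # uniform on the unit circle
    v_perp = np.array([-v[1], v[0]])               # orthogonal complement
    Q = np.column_stack([v, v_perp])
    return Q @ np.diag(lam) @ Q.T

def sample_centers(rng, n_g, z_range):
    """Centers c^(i) ~ U([0, 1] x z_range), returned as an (n_g, 2) array."""
    x = rng.uniform(0.0, 1.0, size=n_g)
    z = rng.uniform(z_range[0], z_range[1], size=n_g)
    return np.column_stack([x, z])

rng = np.random.default_rng(0)
theta = sample_theta_2d(rng)
shallow_centers = sample_centers(rng, n_g=4, z_range=(0.05, 0.2))
deep_centers = sample_centers(rng, n_g=4, z_range=(-0.2, 0.2))
```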

One-sided detection for shallow inclusions.

Figure 5 gives the test error and the number of parameters for both the forward and inverse maps with different values of $c$ (the channel number) and $n_{g}$ (the number of Gaussians). The NN predictions $\mu_{NN}$ and $\eta_{NN}$ along with the exact solutions are illustrated in Fig. 6. For the forward problem, the test error is relatively small even for $c=6$ , where only $28$ K parameters are used in the neural network (compared with the size $12,800=160\times 80$ of $\eta$ ). As the number of channels $c$ increases, the test error decays first and then stagnates. The choice of $c=8$ is a balance between accuracy and efficiency for this forward problem.

For the inverse problem, the test error is relatively small when $c=10$, where $78$K parameters are used in the neural network. Judging from the plots, the neural network produces accurate results in terms of the location, the shape, and the magnitude of the Gaussians. These results demonstrate that the neural network architectures proposed in this section are capable of representing the forward and inverse maps for shallow inclusions.

One-sided detection for deep inclusions.

Fig. 7 plots the test error and number of parameters for different values of $c$ and $n_{g}$. The predicted $\mu_{NN}$ and $\eta_{NN}$ and the reference solution of one specific test sample are presented in Fig. 8. For the forward map, the test error is comparable with the case of shallow inclusions. However, the neural network for the inverse map fails to produce a good prediction. In Fig. 8, the prediction is accurate near the top boundary but shows significant error near the bottom boundary. This result agrees with the conclusion in [4] that the resolution near the boundary is much better than deep in the interior, which is caused by the instability of the inverse problem [3, 4, 50]. To avoid it, more information on the object must be provided, for instance by adding electrodes on the bottom boundary, i.e., the two-sided detection.

Two-sided detection for deep inclusions.

As we have seen, due to the instability of the inverse problem, the one-sided detection fails to resolve the problem with deep inclusions. Here we test the neural network for the two-sided detection for deep inclusions.

Figure 9 shows the test error and number of parameters for different values of $c$ and $n_{g}$. The NN predictions $\mu_{NN}$ and $\eta_{NN}$, along with the reference solution for the same sample as in Fig. 8, are summarized in Fig. 10. The test error is significantly decreased and is even slightly less than that in Fig. 5 of the one-sided problem for the shallow inclusions. Notice that the test error is relatively small even for the case $c=10$ with $177$K parameters in the neural network.

3.4.2 Shape reconstruction

Section 3.4.1 studied the behavior of the proposed NN architectures for smooth potentials and showed that they are able to give a good prediction for both the forward and inverse problems. Here the focus is on shape reconstruction with noisy measurements. The numerical results in Figs. 5 and 9 show that the choice of channel number $c=10$ is a good balance between accuracy and efficiency for the inverse problem. To simplify the discussion, we set $c=10$ in all the tests in this subsection.

The potential $\eta(x)$ is assumed to be piecewise constant. Four shapes are placed in $\Omega$, each of which can be a regular triangle, square, pentagon or hexagon. $\eta(x)$ is set to $1000$ in the shapes and is $0$ otherwise. For each shape, the circumcircle radius is uniformly sampled in $[0.05,0.1]$ and the direction is uniformly sampled in the unit circle $\mathbb{S}^{1}$. For the shallow inclusion case, the center of the shape is uniformly sampled in $[0,1]\times[0.05,0.2]$, while it is uniformly sampled in $[0,1]\times[-0.2,0.2]$ for the deep inclusion case.
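One possible way to rasterize such a potential on the $160\times 80$ grid is sketched below. Several choices are assumptions of this sketch rather than details from the text: a cell-centered grid, a background value of zero, overlapping shapes being set (not summed) to $1000$, and no periodic wrapping of shapes across $x=0$ and $x=1$. The function names are hypothetical.

```python
import numpy as np

def regular_polygon_mask(X, Z, center, radius, n_sides, angle):
    """Boolean mask of grid points inside a regular polygon given by its circumcircle."""
    phis = angle + 2.0 * np.pi * np.arange(n_sides) / n_sides
    vx = center[0] + radius * np.cos(phis)   # vertices in counterclockwise order
    vz = center[1] + radius * np.sin(phis)
    inside = np.ones(X.shape, dtype=bool)
    for k in range(n_sides):
        k2 = (k + 1) % n_sides
        ex, ez = vx[k2] - vx[k], vz[k2] - vz[k]
        # A point is inside a convex CCW polygon iff it lies to the left of every edge.
        cross = ex * (Z - vz[k]) - ez * (X - vx[k])
        inside &= cross >= 0.0
    return inside

def sample_shape_potential(rng, X, Z, z_range, n_shapes=4, value=1000.0):
    """Piecewise-constant potential with n_shapes random regular polygons."""
    eta = np.zeros(X.shape)                  # assumption: zero background
    for _ in range(n_shapes):
        n_sides = int(rng.choice([3, 4, 5, 6]))
        radius = rng.uniform(0.05, 0.1)
        angle = rng.uniform(0.0, 2.0 * np.pi)
        center = (rng.uniform(0.0, 1.0), rng.uniform(z_range[0], z_range[1]))
        eta[regular_polygon_mask(X, Z, center, radius, n_sides, angle)] = value
    return eta

# Cell-centered 160 x 80 grid on [0, 1] x [-1/4, 1/4] (grid layout is an assumption).
N_x, N_z, Z_half = 160, 80, 0.25
x = (np.arange(N_x) + 0.5) / N_x
z = -Z_half + (np.arange(N_z) + 0.5) * (2.0 * Z_half / N_z)
X, Zg = np.meshgrid(x, z, indexing="ij")

rng = np.random.default_rng(0)
eta_shallow = sample_shape_potential(rng, X, Zg, z_range=(0.05, 0.2))
eta_deep = sample_shape_potential(rng, X, Zg, z_range=(-0.2, 0.2))
```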

To model the uncertainty in the measurement data, noise is added to the DtN map in the data set by setting $\lambda_{\eta,i}^{\delta}\equiv(1+Z_{i}\delta)\lambda_{\eta,i}$, where $Z_{i}$ is a Gaussian random variable with zero mean and unit variance and $\delta$ controls the signal-to-noise ratio. In the following tests, the noise level is chosen as $\delta=0$, $0.5\%$ and $1\%$. In the numerical experiments, an independent NN is trained and tested for each noise level with the noisy data set $\{\mu_{i}^{\delta},\eta\}$, where $\mu_{i}^{\delta}\equiv\lambda_{\eta,i}^{\delta}-\lambda_{0}$. It is worth noting that the mean of $\frac{\|\lambda_{\eta}-\lambda_{0}\|}{\|\lambda_{\eta}\|}$ over all the samples is about $10\%$ for both the shallow and the deep inclusion cases, and hence the noise level relative to the difference $\mu$ is about $10\cdot\delta$.
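The amplification of the noise when passing from $\lambda_{\eta}$ to the difference $\mu$ can be checked numerically with synthetic data. In the sketch below, `lam_0` and `mu` are hypothetical random arrays constructed so that $\|\mu\|/\|\lambda_{\eta}\|\approx 10\%$; they are not actual DtN kernels, and the index $i$ is assumed to run over the entries of the discretized kernel.

```python
import numpy as np

def add_multiplicative_noise(lam, delta, rng):
    """Return (1 + Z * delta) * lam with independent Z ~ N(0, 1) for each entry."""
    Z = rng.standard_normal(lam.shape)
    return (1.0 + delta * Z) * lam

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a background kernel and a difference of about 10% relative size.
lam_0 = rng.uniform(1.0, 2.0, size=(160, 40))
mu = 0.1 * lam_0 * rng.choice([-1.0, 1.0], size=lam_0.shape)
lam_eta = lam_0 + mu

signal_ratio = np.linalg.norm(mu) / np.linalg.norm(lam_eta)   # about 0.1

delta = 0.01
lam_noisy = add_multiplicative_noise(lam_eta, delta, rng)
mu_noisy = lam_noisy - lam_0

# Relative perturbation of the difference data; roughly delta / signal_ratio.
rel_noise = np.linalg.norm(mu_noisy - mu) / np.linalg.norm(mu)
```

Since $\mu^{\delta}-\mu=\delta Z\lambda_{\eta}$, the relative perturbation of $\mu$ is about $\delta\|\lambda_{\eta}\|/\|\mu\|$, i.e. roughly ten times $\delta$ in this synthetic setting.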

One-sided detection for shallow inclusions

Figure 11 shows a sample in the test data for different noise levels $\delta$. When there is no noise in the measurement data, the NN provides a good prediction of the potential $\eta$ in terms of both shape and position. As noted in Section 3.4.1, since the inverse problem is ill-posed, the resolution near the boundary is accurate while the shape boundaries deep in the interior are blurry. For the same reason, when there is noise in the measurement data, the shape boundaries become more blurry as the noise level grows. However, the positions and number of shapes are correctly predicted.

Two-sided detection for deep inclusions

Figure 12 presents a sample from the test data for different noise levels $\delta$ for the deep inclusion. The conclusions for the one-sided detection for shallow inclusions still hold for this case. Comparing the results in Figs. 11 and 12, one finds that the NN for the two-sided detection gives a better prediction for the shapes in the middle (i.e. close to $z=0$ ). This agrees with the fact that the two-sided detection utilizes more information (not only $\mu^{++}$ and $\mu^{--}$ , but also $\mu^{+-}$ and $\mu^{-+}$ ).

All the numerical tests show that the NNs in Algorithms 1, 2 and 3 are capable of learning the forward and inverse problem of EIT for various setups.

4 Neural network for the 3D case

For the 3D case, the domain is assumed to be $\Omega=[0,1]\times[0,1]\times[-Z,Z]$ . The periodic boundary condition is applied on the left, right, front, and back for simplicity. Similar to the 2D case, the electrodes are allowed to be placed on either the top (one-sided detection) or both top and bottom (two-sided detection), as shown in Fig. 13. For the one-sided detection, the zero Dirichlet boundary condition is applied on the bottom boundary.

4.1 Analysis and NN architecture

Most of the 2D analysis for the maps $\eta\to\mu$ and $\mu\to\eta$ can be extended to the 3D case. The main difference is that the data $\mu$ is a four-dimensional function while the potential $\eta$ is a three-dimensional function. Below we will first study the extension for the one-sided detection and then briefly discuss the two-sided detection.

One-sided detection.

The DtN map for the one-sided detection is restricted to the top boundary. Let $r=(r_{1},r_{2},Z)$, $s=(s_{1},s_{2},Z)$, and $p=(x,y,z)$, where $x,y$ are for the horizontal directions and $z$ is for the depth direction. The map (2.6) for the 3D case can be written as

[TABLE]

Introducing the new variables $m=(m_{1},m_{2})$ and $h=(h_{1},h_{2})$ with $r_{1}=m_{1}+h_{1}$ , $r_{2}=m_{2}+h_{2}$ , $s_{1}=m_{1}-h_{1}$ and $s_{2}=m_{2}-h_{2}$ yields

[TABLE]

where

[TABLE]

Applying the same argument as for the 2D case shows that the factorization of $K$ in (3.7) can be extended to the 3D case. More precisely, the Green function $G_{0}$ can be directly obtained as

[TABLE]

where $q^{\ast}=(q_{1},q_{2},2Z+q_{3})$ and $\Gamma$ is the Green function of the operator $-\Delta$ on the whole space $\mathbb{R}^{3}$. By taking $h=(h_{1},h_{2})$ and $z$ as parameters and using notation similar to the 2D case, one can reformulate (4.2) as

[TABLE]

where $*$ stands for the two-dimensional convolution with respect to the variables $m_{1}$ and $m_{2}$. The same argument used in the 2D case shows that one can factorize $k_{h,z}$ as

[TABLE]

by choosing proper interpolating sets $\{\hat{z}\}$ and $\{\hat{h}\}$ , where $R_{h,\hat{h}}$ and $R_{z,\hat{z}}$ are the interpolation operators in the $h$ and $z$ variables, respectively.

With the help of this approximation, (4.2) can be simplified as

[TABLE]

This results in the same three-step procedure for effectively approximating the forward map, with the minor differences that the interpolation over the $h$ variable is now two-dimensional and the convolution is two-dimensional as well.

Following the same reasoning as for the 2D case, the NN architecture for the inverse map for the one-sided detection is summarized in Algorithm 4. Below we briefly comment on the layers used, focusing on the differences with the 2D case.

• Encoding module. $\tilde{\mu}={{\sf{Encoding}}}{{\sf{3d}}}[c](\mu)$ compresses the data $\mu\in\mathbb{R}^{N_{1}\times N_{2}\times N_{3}}$ to $\tilde{\mu}\in\mathbb{R}^{N_{1}\times N_{2}\times c}$ locally with respect to the first and second dimensions. As in the 2D case, this layer can be implemented by a two-dimensional convolution ${{\sf{Conv2d}}}$ with window size $1\times 1$ and channel number $c$ by taking the third dimension of $\mu$ as the channel direction. Since the compression layer is essentially a restriction operator, we use only one Conv2d layer with a linear activation function.

• ${\text{{\sf{BCR-Net}}}}{{\sf{2d}}}$. In Section 3.1, the network ${\text{{\sf{BCR-Net}}}}{{\sf{1d}}}[c,n_{\mathrm{cnn}}]$ is treated as a compact form of the full-width convolutional layers with $n_{\mathrm{cnn}}$ layers and channel number $c$. Here ${\text{{\sf{BCR-Net}}}}{{\sf{2d}}}[c,n_{\mathrm{cnn}}]$ is the corresponding compact form of the two-dimensional full-width convolutional layers with $n_{\mathrm{cnn}}$ layers and channel number $c$. We refer readers to [10] for more details.

• Decoding module. $\eta={{\sf{Decoding}}}{{\sf{3d}}}[N_{z}](\tilde{\eta})$ extends the set of two-dimensional data $\tilde{\eta}\in\mathbb{R}^{N_{1}\times N_{2}\times c}$ to the three-dimensional data $\eta\in\mathbb{R}^{N_{1}\times N_{2}\times N_{3}}$. This module can be implemented by a single two-dimensional convolutional layer Conv2d with window size $1\times 1$, channel number $N_{z}$, and a linear activation function.

• ${{\sf{CNN3d}}}$. In Section 3.2, $\eta={{\sf{CNN2d}}}[w,n_{\mathrm{cnn}}](\bar{\eta})$ is a post-processing module on the output of the decoding. $\eta={{\sf{CNN3d}}}[w,n_{\mathrm{cnn}}](\bar{\eta})$, which maps the data $\bar{\eta}\in\mathbb{R}^{N_{x}\times N_{y}\times N_{z}}$ to $\eta\in\mathbb{R}^{N_{x}\times N_{y}\times N_{z}}$, is the three-dimensional analog of ${{\sf{CNN2d}}}$. It is a three-dimensional convolutional neural network with $n_{\mathrm{cnn}}$ convolutional layers and window size $w$. ReLU is used as the activation function for all intermediate layers and no activation function is applied after the last layer.
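The encoding and decoding modules above are $1\times 1$ convolutions, i.e. the same linear map applied independently at every horizontal grid point along the channel direction. A NumPy sketch of this view follows; the helper name `conv2d_1x1`, the random weights, and the size `N3_mu` of the third dimension of $\mu$ are hypothetical, and the ${\text{{\sf{BCR-Net}}}}{{\sf{2d}}}$ step between encoding and decoding is skipped.

```python
import numpy as np

def conv2d_1x1(x, weight, bias):
    """1x1 Conv2d with linear activation: x (N1, N2, c_in) -> (N1, N2, c_out)."""
    return np.einsum("ijk,kl->ijl", x, weight) + bias

N1, N2, N_z = 40, 40, 20   # 40 x 40 x 20 grid of Section 4.2
N3_mu = 64                 # hypothetical size of the third dimension of mu
c = 10                     # channel number

rng = np.random.default_rng(0)
mu = rng.standard_normal((N1, N2, N3_mu))

# Encoding3d[c]: compress the third dimension of mu to c channels, pointwise in (m1, m2).
W_enc, b_enc = rng.standard_normal((N3_mu, c)), np.zeros(c)
mu_tilde = conv2d_1x1(mu, W_enc, b_enc)

# Stand-in for the BCR-Net2d output (the actual network is omitted in this sketch).
eta_tilde = mu_tilde

# Decoding3d[N_z]: expand c channels to N_z depth values, pointwise in (x, y).
W_dec, b_dec = rng.standard_normal((c, N_z)), np.zeros(N_z)
eta_bar = conv2d_1x1(eta_tilde, W_dec, b_dec)
```

Because the window size is $1\times 1$, a change of the input at one horizontal location only affects the output at that location; all coupling between horizontal positions is left to the convolutional layers in between.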

Two-sided detection.

Following the 2D case, the two-sided detection can be treated as a combination of four one-sided detections and a post-process to combine the four parts. The NN architecture is summarized in Algorithm 5. Below we briefly comment on the two new layers used.

• Concatenate layer. $\eta\leftarrow{{\sf{Concatenate}}}{{\sf{2d}}}(\eta_{1},\eta_{2},\eta_{3},\eta_{4})$ concatenates the 3-tensors $\eta_{i}\in\mathbb{R}^{N_{1}\times N_{2}\times c}$, $i=1,2,3,4$, along the third dimension into a 3-tensor $\eta\in\mathbb{R}^{N_{1}\times N_{2}\times 4c}$.

• Convolutional layer. $\eta\leftarrow{{\sf{Conv2d}}}[c,w](\eta)$ is the two-dimensional convolutional layer with channel number $c$ and window size $w$.

4.2 Numerical results

The setup of the neural network is almost the same as that for the 2D case in Section 3.4. The only difference is that the window size of the convolutional layers in Algorithms 4 and 5 is set as $w=w_{2}=(3,3)$ rather than $w=w_{2}=3$ for the 2D case. For each sample of the training and test data sets, $\eta(x)$ is of the form

[TABLE]

where $\rho=1000$. The matrix $\Theta^{(i)}\in\mathbb{R}^{3\times 3}$ is generated with the eigenvalues uniformly distributed in $[0.0125,0.05]$ and the eigenvectors uniformly sampled from the unit sphere $\mathbb{S}^{2}$. As in the 2D case, $Z=1/4$. Two types of data are generated to test the performance of the proposed NNs:

• Shallow inclusions. The location of Gaussians is $c^{(i)}\in\mathcal{U}([0,1]\times[0,1]\times[0.05,0.2])$, which is used for the test of the one-sided detection.

• Deep inclusions. The location of Gaussians is $c^{(i)}\in\mathcal{U}([0,1]\times[0,1]\times[-0.2,0.2])$, used for the test of the two-sided detection.

The domain $\Omega$ is discretized by a $40\times 40\times 20$ Cartesian grid. Both the training data and test data are generated by numerically solving (1.2).

One-sided detection for shallow inclusions.

The NN for the inverse map in Algorithm 4 is tested with shallow inclusions. $10$ K pairs of $(\eta,\mu)$ are used to train the NN parameters and another $5$ K pairs are reserved as the test data. The data set is smaller compared to the 2D case due to the memory limitation of the current GPUs. Figure 14 plots the test error and the number of parameters for different choices of $c$ . The test error is comparable with that of the 2D case. As $c$ increases, the error decays first and then stagnates around $c=10$ . Fig. 15 illustrates the NN prediction and the reference solution of a specific sample from the test data set. The plots indicate that the NN produces accurate results in terms of the location, the shape, and the magnitude of the inclusions.

Two-sided detection for deep inclusions.

The neural network in Algorithm 5 for two-sided detection is tested with deep inclusions. $8$K pairs of $(\eta,\mu)$ are used to train the neural network and another $2$K pairs are reserved for testing. Fig. 16 summarizes the test error and the number of parameters for different values of $c$. The test accuracy is comparable with that of the 2D two-sided detection case. Figure 17 plots the predicted solution and the reference solution of a specific sample in the test data.

5 Conclusions

This paper proposes novel neural network architectures for EIT problems. Mathematically, these NNs approximate the forward and inverse maps between the electrical conductivity and the resulting DtN map. A perturbative analysis for the weak-inclusion regime suggests a dimension-reduction approximation, which further inspires the NN architecture design. Numerical results demonstrate that the proposed NNs approximate the forward or inverse maps with reasonable accuracy.

Using neural networks for approximating the forward and inverse maps has several clear advantages. First, once the neural networks are well trained, they can produce rather accurate predictions efficiently. Second, the correct regularization for the inverse map can be automatically captured by the neural network from the training set. Third, the neural networks proposed in this paper are compact and easy to train, and are thus applicable to applications with rather limited data.

The discussion here focuses on the rectangle/cuboid domains. For the spherical domains, one can turn them into the rectangle/cuboid configuration by resorting to the polar/spherical transformations. For arbitrary convex bounded Lipschitz domains, it is possible to extend the current NN architectures by carefully reparametrizing the domain, though it could require technical efforts.

Recently in medical imaging applications, adversarial attacks [49, 5, 24] have become an important issue for deep learning. As our NN architecture is quite compact, the resulting NN does not overfit the training data. It is expected that our NN could be less vulnerable to adversarial attacks. A detailed study in this direction would be part of future work.

Acknowledgments

The work of Y.F. and L.Y. is partially supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program. The work of L.Y. is also partially supported by the National Science Foundation under award DMS-1818449. This work is also supported by the GCP Research Credits Program from Google and AWS Cloud Credits for Research program from Amazon.

Bibliography


[1] M. Abadi et al. TensorFlow: A system for large-scale machine learning. In OSDI, volume 16, pages 265–283, 2016.
[2] G. Alessandrini. Stable determination of conductivity by boundary measurements. Applicable Analysis, 27(1-3):153–172, 1988.
[3] G. Alessandrini. Examples of instability in inverse boundary-value problems. Inverse Problems, 13(4):887, 1997.
[4] A. Allers and F. Santosa. Stability and resolution analysis of a linearized problem in electrical impedance tomography. Inverse Problems, 7(4):515, 1991.
[5] V. Antun, F. Renna, C. Poon, B. Adcock, and A. C. Hansen. On instabilities of deep learning in image reconstruction–does AI come at a cost? arXiv preprint arXiv:1902.05300, 2019.
[6] M. Araya-Polo, J. Jennings, A. Adler, and T. Dahlke. Deep-learning tomography. The Leading Edge, 37(1):58–66, 2018.
[7] L. Bar and N. Sochen. Unsupervised deep learning algorithm for PDE-based forward and inverse problems. arXiv preprint arXiv:1904.05417, 2019.
[8] J. Behrndt and A. ter Elst. Dirichlet-to-Neumann maps on bounded Lipschitz domains. Journal of Differential Equations, 259(11):5903–5926, 2015.
