Simultaneous reconstruction of the initial pressure and sound speed in photoacoustic tomography using a deep-learning approach
Hongming Shan, Christopher Wiedeman, Ge Wang, Yang Yang

TL;DR
This paper introduces a deep learning-based method for simultaneously reconstructing initial pressure and sound speed in photoacoustic tomography, overcoming the challenge of unknown sound speed distribution and improving image quality.
Contribution
A novel data-driven approach combining deep neural networks with model-based iteration for joint reconstruction in photoacoustic tomography.
Findings
Significant improvement in initial pressure image quality in simulations.
Effective simultaneous reconstruction of pressure and sound speed.
Potential to enhance practical photoacoustic imaging applications.
Abstract
Photoacoustic tomography seeks to reconstruct an acoustic initial pressure distribution from the measurement of the ultrasound waveforms. Conventional methods assume a-prior knowledge of the sound speed distribution, which practically is unknown. One way to circumvent the issue is to simultaneously reconstruct both the acoustic initial pressure and speed. In this article, we develop a novel data-driven method that integrates an advanced deep neural network through model-based iteration. The image of the initial pressure is significantly improved in our numerical simulation.
| Layer | Feature Extraction | Feature Fusion | Reconstruction |
|---|---|---|---|
| 1 | 3 3 conv, 32, stride 1 | Batch Normalization | 3 3 conv, 16, stride 1 |
| 2 | 3 3 conv, 32, stride 1 | 3 3 conv, 64, stride 1 | Batch Normalization |
| 3 | 3 3 conv, 32, stride 1 | 3 3 conv, 1, stride 1 |
| Index | Initial Pressure | Sound Speed [m/s] |
| 1 | 0.0 | 1480 |
| 2 | 0.2 | 1800 |
| 3 | 0.4 | 1530 |
| 4 | 0.6 | 1520 |
| 5 | 0.8 | 2600 |
| 6 | 1.0 | 3198 |
| Mean Absolute Error | Iter 1 | Iter 2 | Iter 3 | Iter 4 |
|---|---|---|---|---|
| (1.430.36) | (0.860.33) | (0.850.33) | (0.850.33) | |
| [m/s] | 3.130.64 | 1.330.52 | 1.250.52 | 1.240.52 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
Simultaneous reconstruction of the initial pressure and sound speed in photoacoustic tomography using a deep-learning approach
Hongming Shan
Contact: [email protected], [email protected], [email protected], [email protected]
Christopher Wiedeman
Contact: [email protected], [email protected], [email protected], [email protected]
Ge Wang
Contact: [email protected], [email protected], [email protected], [email protected]
Yang Yang
Contact: [email protected], [email protected], [email protected], [email protected]
Abstract \vskip7.22743pt
Photoacoustic tomography seeks to reconstruct an acoustic initial pressure distribution from the measurement of the ultrasound waveforms. Conventional methods assume a-prior knowledge of the sound speed distribution, which practically is unknown. One way to circumvent the issue is to simultaneously reconstruct both the acoustic initial pressure and speed. In this article, we develop a novel data-driven method that integrates an advanced deep neural network through model-based iteration. The image of the initial pressure is significantly improved in our numerical simulation.
1 Introduction
Photoacoustic tomography (PAT) is an emerging non-invasive modality that has manifested an enormous prospect for clinical practices [1]. In PAT, the tissue is illuminated with near-infrared light of wavelength 650-900nm. The absorbed optical energy is transformed into acoustic energy through the photoacoustic effect, and the generated ultrasound is measured by transducer arrays around the tissue in order to retrieve the optical internal properties. The coupling mechanism of the optical and ultrasound waves gives multiple advantages over conventional standalone modalities. For instance, the acoustic wave experiences less scattering in tissue compared to optical wave, allowing PAT to break the optical diffusion limit and generate high-resolution images [2] while preserving intrinsic optical contrast [3].
Typical photo-acoustic signal generation comprises three steps: (1) the tissue absorbs light; (2) the absorbed optical energy heats the tissue and raises the temperature; (3) thermo-elastic expansion occurs and generates ultrasound. The image formation in PAT is to recover the distribution of the deposited energy, known as the local optical fluence, from the ultrasound signals that are recorded by the sensors deployed around the tissue. As the initial ultrasound pressure is approximately proportional to the optical fluence, it is sufficient to reconstruct the initial pressure from the recorded ultrasound signals. Conventional reconstructive schemes, such as back-projection based methods [4, 5, 6, 7, 8, 9, 10, 11] and time-reversal based methods [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24], typically assume the sound speed is precisely known. This assumption however is not always fulfilled, and the precise sound speed distribution is often unknown. It is therefore of practical significance to study simultaneous reconstruction of both the sound speed distribution and the acoustic initial pressure.
The purpose of this article is to address this issue using a data-driven approach. Data-driven approaches such as machine learning and deep learning have gained increasing attention recently in medical image reconstruction [25, 26]. The powerful computational capacity of modern super computers has enabled these methods to enhance medical image reconstruction by extracting the hidden patterns in the data that would otherwise be lost. In PAT, data driven methods have achieved state-of-the-art results in PAT image reconstruction in the scenarios of sparse data [27, 28, 29], limited view [30, 31, 32], artifacts removal [33, 34, 35], as well as other applications [36, 37, 38]. In addition to PAT, data-driven approaches have also found broad application in the image reconstruction of many other imaging modalities such as Computed Tomography (CT), Magnetic Resonance Imaging (MRI), as well as other related fields like radiomics and radiotheraphy, see [39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50] for some examples and applications along these directions.
2 Preliminary
In this section, we formulate the mathematical model of the simultaneous reconstruction problem in PAT and review the recent advances based on model-driven approaches.
Propagation of ultrasound in PAT is typically modeled as an initial value problem for the wave equation. This can be written as
[TABLE]
where is the dimension, is the acoustic velocity, is the acoustic pressure, is the stoppage time of the measurement, is the mass density, is the sound speed, and is the initial pressure distribution. We remark that another popular model in PAT is to deal with the second order acoustic equation rather than the above first order system. These models are more or less equivalent under suitable assumptions.
Let be a bounded domain in representing the tissue, with the boundary of denoted by . The measurement in PAT is the detected ultrasound signals, that is the temporal boundary data . We further introduce a measurement map which is defined as
[TABLE]
where is the solution of (1). Assuming the mass density is known, then the simultaneous reconstruction problem in PAT can be recast as the inverse problem to recover the pair from ; in other words, to invert the operator . Note that is linear in but non-linear in .
We recall some known results regarding the simultaneous reconstruction in PAT. Linearization of the map (2) at sufficiently smooth pairs is studied in [51] and the linearized inversion is shown to be unstable in any scale of Sobolev spaces. Invertibility of is established in [52] for radially symmetric sound speeds in any odd dimension higher than three; in [53, 54] for sound speeds that satisfy certain orthogonality relation with harmonic functions. Numerical simulations have been implemented in [55] using the finite element model, and in [56, 57, 58] for parametrized sound speeds.
Our data-driven approach is inspired by the work of Matthews et al. [56], where the authors proposed a direct optimization approach towards the simultaneous reconstruction, with the sound speed being confined in a parametrized lower-dimensional space. We denote the measurement by for simplicity. With slight abuse of notations, the discretized versions of and are still denoted by themselves, which are two matrices. For a fixed sound speed , the operator is linear, hence its discretization is another matrix, say . The idea in Ref [56] is to minimize the objective function
[TABLE]
where the data fidelity term is defined as and denotes the regularization term for the initial pressure. This objective function can be solved using the proximal optimization method that allows constraints and nonsmooth regularization function for the initial pressure distribution. The initial pressure distribution and sound speed distribution are updated iteratively using gradient descent. The algorithm in Ref [56] is summarized in Algorithm 1 below.
3 Simultaneous Reconstruction via Deep Learning
We describe our data-driven method in this section. The motivation stems from some limitations of the iterative algorithm 1. For instance, repeated selection of step sizes to update the initial pressure and sound speed is time-consuming; selection of the parameter and the regularization term is highly empirical; tuning can be tedious. To overcome these limitations, we propose a simultaneous reconstruction network (SR-Net) to update the initial pressure and the sound speed for each iteration.
The proposed SR-Net is shown in Fig. 1. It contains three steps - feature extraction, feature fusion, and reconstruction.
Feature extraction: We used two convolutional layers to extract features from the initial pressure distribution, the negative gradient of data fidelity with respect to initial pressure, and current sound speed distribution. 2. 2.
Feature fusion: The feature maps extracted from above three branches are concatenated along the channel dimension, which are then fed into the batch normalization (BN) layer. Since the feature maps from the initial pressure and the sound speed are in different scale, this batch normalization layer can normalize feature maps into the same scale for subsequent feature fusion. The feature fusion part also contains two convolutional layers followed by ReLU. 3. 3.
Reconstruction: After the feature fusion, we use two convolutional layers to reconstruct the updated initial pressure and sound speed, respectively. A batch normalization layer is used after first convolutional layer to scale the fused feature back to the original scale. A skip connection after the second convolutional layer enables the network to learn the residual just like the traditional iterative algorithm (see step 5-6 in Algorithm 1). The last ReLU activation function guarantees the updated initial pressure and speed distribution are non-negative.
A detailed network structure is shown in Table 1. Note that zero-padding is used such that all feature maps have the same as the inputs. For each iteration, we trained SR-Net using the simulated dataset. At the -th iteration, the loss function is chosen as follows:
[TABLE]
where represents the parameters of the SR-Net. and are the ground-truth labels. We employ the Adam algorithm [59] to update the parameters in . The gradients of the parameters are computed using a back-propagation algorithm. The iterative reconstruction algorithm is summarized in Algorithm 2.
Note that our proposed network is different from the model in Ref [30] because 1) our network aims to reconstruct the initial pressure distribution and sound speed distribution simultaneously, while the model in Ref [30] assumes the sound speed is precisely known; 2) our network is a 2D network, while the model in Ref [30] is a 3D network; and 3) the proposed network used the batch normalization layer to scale the feature maps, while the model in Ref [30] used an explicit coefficient to represent the step length.
4 RESULTS
4.1 Data generation
In this study, we used a simplified Shepp-logan phantom to train and test our network. Fig. 2 shows the phantom along with the initial pressure and the sound speed distribution. We randomly changed the sizes and positions of the region 1, 3, 4, and 5 in order to increase the diversity of dataset. The corresponding initial pressure and sound speed are shown in Table 2. We put the phantom into a image, which is surrounded by 252 detectors in the rectangle form. For the training purpose, we randomly generated 5,120 images with acoustic signal of the size . The mass density is assumed to be constant 1 throughout. Then we selected 1,024 images randomly to test the trained model.
4.2 Experimental results
Throughout the experiments, the initial pressure was initialized uniformly as zero and the sound speed was initialized uniformly as 1,500 m/s. The maximum number of iteration was set to 4. The gradient of the data fidelity with respect to the initial pressure is computed by the adjoint state method using the -wave package [60]. The proposed SR-Net was implemented by Pytorch and trained on four NVIDIA 1080Ti GPUs.
We show two testing images in Figs. 3 and 4, which were not used in training stage. It is observed that the proposed method can reconstruct the initial pressure and sound speed simultaneously with very high accuracy. Mean absolute error for sound speed and initial pressure were calculated for each reconstructed image at each iteration. The average value and standard deviations of these errors are contained in Table 3.
4.3 Generalization ability of the trained SR-Net on a new phantom
All Shepp-Logan phantoms used to train the model featured all six regions listed in Table 2. To test the generalizability of the model, we reconstructed four phantoms with region 5 removed. Figures 5 and 6 illustrate two cases of this test. From these cases, it can be observed that the model still accurately reconstructs phantoms that contain less regions than the phantoms used to train the model.
5 Discussions and Conclusion
The main weakness of this study is the numerical nature, which may not fully reflect all the physical factors in real clinical/pre-clinical applications. We plan to perform physical phantom experiments as a next step. Also, we are interested in animal studies in vivo. Nevertheless, the numerical results and the network trained with simulated data may serve a baseline for further improvements. Hopefully, this imaging modality may find practical use in some important tasks such as breast imaging.
In conclusion, we have developed a simultaneous reconstruction network (SR-Net) to jointly recover the sound speed and initial pressure in photoacoustic tomography. The proposed SR-Net has several merits: automatically learning the step size, regularization term, and the trade-off between data fidelity and regularization in a data-driven manner, and not requiring the gradient of the data fidelity with respect to sound speed. In the future, we will further improve the structure of the network and also validate it on more complicated datasets.
6 Acknowledgment
The research of Y. Yang is partly supported by the NSF grant DMS-1715178, the AMS-Simons travel grant, and the startup fund from Michigan State University. The authors would like to thank NVIDIA Corporation for the donation of GPU used for this research.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Jun Xia, Junjie Yao, and Lihong V Wang. Photoacoustic tomography: principles and advances. Electromagnetic waves (Cambridge, Mass.) , 147:1, 2014.
- 2[2] Lihong V Wang and Gene K Beare. Breaking the optical diffusion limit: Photoacoustic tomography. In Frontiers in Optics , page FWY 2. Optical Society of America, 2010.
- 3[3] Junjie Yao and Lihong V Wang. Photoacoustic tomography: fundamentals, advances and prospects. Contrast media & molecular imaging , 6(5):332–345, 2011.
- 4[4] David Finch, Rakesh, and Leonid Kunyansky. Recovering a function from its spherical mean values in two and three dimensions. Photoacoustic Imaging and Spectroscopy , 2009.
- 5[5] David Finch, Markus Haltmeier, and Rakesh. Inversion of spherical means and the wave equation in even dimensions. SIAM Journal on Applied Mathematics , 68(2):392–412, 2007.
- 6[6] Markus Haltmeier, Thomas Schuster, and Otmar Scherzer. Filtered backprojection for thermoacoustic computed tomography in spherical geometry. Mathematical methods in the applied sciences , 28(16):1919–1937, 2005.
- 7[7] Robert A Kruger, Daniel R Reinecke, and Gabe A Kruger. Thermoacoustic computed tomography–technical considerations. Medical physics , 26(9):1832–1837, 1999.
- 8[8] Robert A Kruger, William L Kiser Jr, Daniel R Reinecke, and Gabe A Kruger. Thermoacoustic computed tomography using a conventional linear transducer array. Medical physics , 30(5):856–860, 2003.
