TL;DR
This paper introduces an autoencoder-based approach for faster, more efficient parameter estimation in terahertz image reconstruction, significantly reducing computation time compared to traditional optimization methods.
Contribution
It presents a novel model-based autoencoder that predicts parameters directly from data, enabling unsupervised training and faster convergence in THz imaging.
Findings
Network is over 140 times faster than classical optimization.
Predictions serve as effective initializations for local optimization.
Achieves near-optimal solutions with reduced computational effort.
| Dataset (Region) | Measurement | TRA | AE | AE+TRA |
|---|---|---|---|---|
| MetalPCB (All) | Average Loss | 693.9 | 886.3 | 442.2 |
| MetalPCB (PCB) | Average Loss | 589.0 | 872.6 | 589.0 |
| MetalPCB (Metal) | Average Loss | 519.6 | 446.1 | 115.7 |
| StepChart (All) | Average Loss | 3815.1 | 5148.3 | 3675.3 |
| StepChart (Edges) | Average Loss | 4860.4 | 6309.1 | 2015.7 |
| StepChart (Steps) | Average Loss | 1152.5 | 2015.7 | 1150.3 |
| MetalPCB | Training time (sec.) | none | 9312.8 | 9312.8 |
| MetalPCB | Run time (sec.) | 10391.2 | †73.5 | ∗4854.7 |
| StepChart | Run time (sec.) | 3463.9 | †22.8 | ∗1712.4 |

† Inference time.

∗ Run time is the sum of AE inference and TRA optimization time.
Training Auto-Encoder-Based Optimizers for Terahertz Image Reconstruction
Tak Ming Wong
Center for Sensor Systems (ZESS), University of Siegen, 57076 Siegen, Germany
Computer Graphics and Multimedia Systems Group, University of Siegen, 57076 Siegen, Germany
Matthias Kahl
Center for Sensor Systems (ZESS), University of Siegen, 57076 Siegen, Germany
Institute for High Frequency and Quantum Electronics (HQE), University of Siegen, 57068 Siegen, Germany
Peter Haring Bolívar
Center for Sensor Systems (ZESS), University of Siegen, 57076 Siegen, Germany
Institute for High Frequency and Quantum Electronics (HQE), University of Siegen, 57068 Siegen, Germany
Andreas Kolb
Center for Sensor Systems (ZESS), University of Siegen, 57076 Siegen, Germany
Computer Graphics and Multimedia Systems Group, University of Siegen, 57076 Siegen, Germany
Michael Möller
Center for Sensor Systems (ZESS), University of Siegen, 57076 Siegen, Germany
Computer Vision Group, University of Siegen, 57076 Siegen, Germany
(July 2, 2019)
Abstract
Terahertz (THz) sensing is a promising imaging technology for a wide variety of different applications. Extracting the interpretable and physically meaningful parameters for such applications, however, requires solving an inverse problem in which a model function determined by these parameters needs to be fitted to the measured data. Since the underlying optimization problem is nonconvex and very costly to solve, we propose learning the prediction of suitable parameters from the measured data directly. More precisely, we develop a model-based autoencoder in which the encoder network predicts suitable parameters and the decoder is fixed to a physically meaningful model function, such that we can train the encoding network in an unsupervised way. We illustrate numerically that the resulting network is more than 140 times faster than classical optimization techniques while making predictions with only slightly higher objective values. Using such predictions as starting points of local optimization techniques allows us to converge to better local minima about twice as fast as optimizing without the network-based initialization.
1 Introduction
Terahertz (THz) imaging is an emerging sensing technology with great potential for hidden object imaging, contact-free analysis, non-destructive testing and stand-off detection in various application fields, including the semiconductor industry, biological and medical analysis, material and quality control, and safety and security [1, 2, 3]. The physically interpretable quantities relevant to these applications, however, cannot always be measured directly. Instead, in THz imaging systems, each pixel contains implicit information about such quantities, making the inverse problem of inferring these physical quantities a challenging problem with high practical relevance.
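As we will discuss in Sec. 2, at each pixel location $(x, y)$ the relation between the desired (unknown) parameters $p = (\hat{e}, \sigma, \mu, \phi) \in \mathbb{R}^4$, i.e., the electric field amplitude $\hat{e}$, the position of the surface $\mu$, the width of the reflected pulse $\sigma$, and the phase $\phi$, and the actual measurements $g(x, y)$ can be modelled via the equation $g(x, y) \approx f_{\hat{e},\sigma,\mu,\phi}(z)$, where

$$
\big(f_{\hat{e},\sigma,\mu,\phi}(z)\big)_i = \hat{e}\,\operatorname{sinc}\big(\sigma (z_i - \mu)\big)\,\exp\big(-i(\omega z_i - \phi)\big), \tag{1}
$$

$$
\operatorname{sinc}(t) = \begin{cases} \dfrac{\sin(\pi t)}{\pi t} & \text{if } t \neq 0, \\ 1 & \text{if } t = 0, \end{cases} \tag{2}
$$

and $(z)_i = z_i$ is a device-dependent sampling grid $z \in \mathbb{R}^{n_z}$. More details of the THz model are described in [4]. Thus, the crucial step in THz imaging is the solution of an optimization problem of the form

$$
\min_{\hat{e},\sigma,\mu,\phi} \; \mathrm{Loss}\big(f_{\hat{e},\sigma,\mu,\phi}(z),\, g(x, y)\big) \tag{3}
$$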
at each pixel $(x, y)$, possibly along with additional regularizers on the unknown parameters. Even with simple choices of the loss function such as an $\ell^2$-squared loss, the resulting fitting problem is highly nonconvex and global solutions become rather expensive. Considering that the number of pixels, i.e., the number of instances of optimization problem (3) to be solved, is typically on the order of hundreds of thousands to millions, even local first-order or quasi-Newton methods become quite costly: for example, running the built-in Trust-Region solver of MATLAB® to reconstruct a THz image takes over 170 minutes.
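To make the per-pixel fitting problem concrete, the following minimal Python sketch evaluates a sinc-envelope reflection model and its squared $\ell^2$ loss for a single pixel. It assumes the model has the form $\hat{e}\,\operatorname{sinc}(\sigma(z-\mu))\exp(-i(\omega z-\phi))$ with the normalized sinc. The carrier constant `omega`, the sampling grid, the parameter values and all function names are illustrative assumptions, not values taken from the paper.

```python
import cmath
import math


def sinc(t):
    """Normalized sinc: sin(pi t) / (pi t), with sinc(0) = 1."""
    if t == 0.0:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)


def thz_model(params, z_grid, omega=0.5):
    """Evaluate the assumed per-pixel model on a depth grid.

    params = (e_hat, sigma, mu, phi); omega is a hypothetical device constant.
    """
    e_hat, sigma, mu, phi = params
    return [
        e_hat * sinc(sigma * (z - mu)) * cmath.exp(-1j * (omega * z - phi))
        for z in z_grid
    ]


def l2_squared_loss(params, z_grid, measured, omega=0.5):
    """Squared l2 distance between the model output and a measured complex signal."""
    predicted = thz_model(params, z_grid, omega)
    return sum(abs(p - g) ** 2 for p, g in zip(predicted, measured))


z_grid = [float(i) for i in range(91)]
true_params = (2.0, 0.1, 45.0, 0.3)
measured = thz_model(true_params, z_grid)  # noise-free synthetic "measurement"

loss_at_truth = l2_squared_loss(true_params, z_grid, measured)
loss_shifted = l2_squared_loss((2.0, 0.1, 50.0, 0.3), z_grid, measured)
```

The loss vanishes at the generating parameters and is positive once the surface position is shifted. A classical solver has to search this nonconvex landscape separately for every pixel, which is where the cost comes from.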
In this paper, we propose to train a neural network to solve the per-pixel optimization problem (3) directly. We formulate the training of the network as a model-based autoencoder (AE), which allows us to train the corresponding network with real data in an unsupervised way, i.e., without ground truth. We demonstrate that the resulting optimization network yields parameters that result in only slightly higher losses than actually running an optimization algorithm, despite the advantage of being more than 140 times faster. Moreover, we demonstrate that our network can serve as an excellent initialization scheme for classical optimizers. By using the network’s prediction as a starting point for a gradient-based optimizer, we obtain lower losses and converge more than 2x faster than classical optimization approaches, while benefiting from all theoretical guarantees of the respective minimization algorithm.
This paper is organized as follows: Sec. 2 gives more details on how THz imaging systems work. Sec. 3 summarizes the related work on learning optimizers, machine learning for THz imaging techniques, and model-based autoencoders. Sec. 4 describes model-based AEs in contrast to classical supervised learning approaches in detail, before Sec. 5 summarizes our implementation. Sec. 6 compares the proposed approaches to classical (optimization-based) reconstruction techniques in terms of speed and accuracy before Sec. 7 draws conclusions.
2 THz Imaging Systems
There are several approaches to realizing THz imaging, e.g. femtosecond-laser-based scanning systems [5, 6], synthetic aperture systems [7, 8], and hybrid systems [9]. A typical approach to THz imaging is based on the Frequency Modulated Continuous Wave (FMCW) concept [8], which uses active frequency-modulated THz signals to sense reflected signals from the object. The reflected energy and the phase shifts due to the signal path length make 3D THz imaging possible.
In Figure 1, the setup of our electronic FMCW-THz 3D imaging system is shown. More details on the THz imaging system are described in [8].
In this paper, we denote by $g_t(x, y, t)$ the measured demodulated time domain signal of the reflected electric field amplitude of the FMCW system at lateral position $(x, y)$. In FMCW radar signal processing, this continuous wave temporal signal is converted into the frequency domain by a Fourier transform [10, 11]. Since the linear frequency sweep has a unique frequency at each spatial position in $z$-direction, the converted frequency domain signal directly relates to the spatial azimuth ($z$-direction) domain signal

$$
g_c(x, y, z) = \mathcal{F}\{g_t(x, y, t)\}. \tag{4}
$$
The resulting 3D image $g_c \in \mathbb{C}^{n_x \times n_y \times n_z}$ is complex data in the spatial domain, representing the per-pixel complex reflectivity of THz energy. The quantities $n_x$, $n_y$, $n_z$ represent the discretization in vertical, horizontal and depth direction, respectively. Equivalently, we may represent $g_c$ by considering the real and imaginary parts as two separate channels, resulting in a 4D real data tensor $g \in \mathbb{R}^{n_x \times n_y \times n_z \times 2}$.
Since the system is calibrated by amplitude normalization with respect to an ideal metallic reflector, a rectangular frequency signal response is ensured for the FMCW frequency dependence [8]. After the FFT in (4), the $z$-direction signal envelope is an ideal $\operatorname{sinc}$ function as continuous spatial signal amplitude, giving rise to the physical model given in (1) in the introduction.
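The link between a rectangular response and a sinc-shaped envelope can be checked numerically. The sketch below is a generic discrete illustration rather than the device's processing chain: it applies a naive DFT to a rectangular sequence and inspects the magnitude of the result, which has a main lobe at index 0 and a first null where the lobe ends. The sequence length and band width are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


n_samples = 64
band = 8  # hypothetical width of the flat (rectangular) response
rect = [1.0 if t < band else 0.0 for t in range(n_samples)]

# Magnitude of the transformed rectangle: a sampled sinc-like (Dirichlet) envelope.
envelope = [abs(v) for v in dft(rect)]

peak_index = max(range(n_samples), key=lambda k: envelope[k])
first_null = n_samples // band  # index at which the main lobe first reaches zero
```

In the discrete setting the envelope is a Dirichlet kernel, which approaches the continuous sinc as the number of samples grows.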
In (1), the electric field amplitude $\hat{e}$ is the reflection coefficient for the material, which is dependent on the complex dielectric constant of the material and helps to identify and classify materials. The depth position $\mu$ is the position at which maximum reflection occurs, i.e., the position of the surface reflecting the THz energy. The parameter $\sigma$ is the width of the reflected pulse, which includes information on the dispersion characteristics of the material. The phase $\phi$ of the reflected wave depends on the ratio of real to imaginary parts of the dielectric properties of the material. Thus, the parameters $\hat{e}$, $\mu$, $\sigma$ and $\phi$ contain important information about the geometry as well as the material of the imaged object, which is of interest in a wide variety of applications.
3 Related Work
Due to the revolutionary success (convolutional) neural networks have had on computer vision problems over the last decade, researchers have extended the fields of applications of neural networks significantly. A particularly interesting concept is to learn the solution of complex, possibly nonconvex, optimization problems. Different lines of research have considered directly learning the optimizer itself, e.g. modelled as a recurrent neural network [12], or rolling out optimization algorithms and learning the incremental steps, e.g. in the form of parameterized proximal operators in [13]. Further hybrid approaches include optimization problems in the networks’ architecture, e.g. [14], or combining optimizers with networks that have been trained individually [15, 16]. The recent work of Moeller et al. [17] trains a network to predict descent directions to a given energy in order to give provable convergence results on the learned optimizer.
Objectives similar to the one arising in the training of our model-based AEs are considered, for instance, for solving inverse problems with deep image priors [18] or deep decoders [19]. These works, however, consider the input to the networks to be fixed random noise and have to solve an optimization problem for the network's weights for each inverse problem, such that they are regularization-by-parametrization approaches rather than learned optimizers.
The most closely related prior work is the 3D face reconstruction network of Tewari et al. [20]. They aimed at finding a semantic code vector from a given facial image such that feeding this code vector into a rendering engine yields an image similar to the input image itself. While this problem had been addressed using optimization algorithms a long time ago [21] (also known as analysis-by-synthesis approaches), the approach by Tewari et al. [20] replaced the optimizer with a neural network and kept the original cost function to train the network in an unsupervised way. The resulting structure resembles an AE in which the decoder is fixed to the forward model and was therefore coined model-based AE. As we will discuss in the next section, the idea of model-based AEs generalizes far beyond 3D face reconstruction and can be used to boost the THz parameter identification problem significantly.
Finally, recent work [22] has exploited deep learning techniques in terahertz imaging, but the considered application of super-resolving the THz amplitude image by training a convolutional neural network on synthetically blurred images is not directly related to our proposed approach.
4 A Model-Based Autoencoder for THz Image Reconstruction
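Let us denote the THz input data by $g \in \mathbb{R}^{n_x \times n_y \times n_z \times 2}$, and consider our four unknown parameters $\hat{e}$, $\sigma$, $\mu$, $\phi$ to be $\mathbb{R}^{n_x \times n_y}$ matrices, allowing each parameter to change at each pixel. Under slight abuse of notation we can interpret all operations in (1) to be pointwise and again identify complex values with two real values in order to have $f_{\hat{e},\sigma,\mu,\phi}(z) \in \mathbb{R}^{n_x \times n_y \times n_z \times 2}$, where $z \in \mathbb{R}^{n_z}$ denotes the depth sampling grid. Concatenating all four matrix-valued parameters into a single parameter tensor $P \in \mathbb{R}^{n_x \times n_y \times 4}$, our goal can be formalized as finding $P$ such that $f_P(z) \approx g$.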
A classical supervised machine learning approach to problems with a known forward operator is illustrated in Figure 2 for the example of THz image reconstruction: the explicit forward model $f$ is used to simulate a large set of images $g$ from known parameters $P$, which can subsequently be used as training data for predicting $P$ via a neural network $G(g; \theta)$ depending on weights $\theta$. Such supervised approaches with simulated training data are frequently used in other image reconstruction areas, e.g. super resolution [23, 24], or image deblurring [25, 26]. The accuracy of networks trained on simulated data, however, crucially relies on precise knowledge of the forward model and the simulated noise. Slight deviations thereof can significantly degrade a network's performance, as demonstrated in [27], where deep denoising networks trained on Gaussian noise were outperformed by BM3D when applied to realistic sensor noise.
Instead of pursuing the supervised learning approach described above, we replace $P$ in the optimization approach (3) by a suitable network $G(g; \theta)$ that depends on the raw input data $g$ and learnable parameters $\theta$, and that can be trained in an unsupervised way on real data. Assuming we have multiple examples $g_i$ of THz data, and choosing the loss function in (3) as an $\ell^2$-squared loss, gives rise to the unsupervised training problem

$$
\min_{\theta} \sum_{i} \big\| f_{G(g_i; \theta)}(z) - g_i \big\|_2^2. \tag{5}
$$

As we have illustrated in Figure 3, this training resembles an AE architecture: the input to the network is data $g$, which gets mapped to parameters $P$ that, when fed into the model function $f$, ought to reproduce $g$ again.
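The structure of this objective can be sketched in a few lines of Python. The decoder below is a fixed physical model (assumed here to be a sinc envelope with a hypothetical carrier constant `omega`), while the encoder is any callable that maps a signal to four parameters. A hand-written peak picker with a known width stands in for the trained network purely for illustration; it is not the encoder used in the paper.

```python
import cmath
import math


def decoder(params, z_grid, omega=0.5):
    """Fixed physical decoder (assumed sinc-envelope model, hypothetical omega)."""
    e_hat, sigma, mu, phi = params
    out = []
    for z in z_grid:
        t = sigma * (z - mu)
        env = 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)
        out.append(e_hat * env * cmath.exp(-1j * (omega * z - phi)))
    return out


def peak_encoder(signal, z_grid, sigma=0.1, omega=0.5):
    """Stand-in encoder: reads amplitude, position and phase off the largest sample."""
    i = max(range(len(signal)), key=lambda k: abs(signal[k]))
    e_hat, mu = abs(signal[i]), z_grid[i]
    phi = cmath.phase(signal[i]) + omega * mu  # undo the carrier phase at the peak
    return (e_hat, sigma, mu, phi)


def autoencoder_loss(encoder, signals, z_grid):
    """Unsupervised objective: sum over signals of ||decoder(encoder(g)) - g||^2."""
    total = 0.0
    for g in signals:
        reconstruction = decoder(encoder(g, z_grid), z_grid)
        total += sum(abs(r - v) ** 2 for r, v in zip(reconstruction, g))
    return total


z_grid = [float(i) for i in range(91)]
signals = [
    decoder((1.5, 0.1, 30.0, 0.2), z_grid),
    decoder((0.7, 0.1, 60.0, -1.0), z_grid),
]
total_loss = autoencoder_loss(peak_encoder, signals, z_grid)
```

No ground-truth parameters enter the loss: only the measured signals and the fixed decoder are needed. In an actual training setup the encoder would be a network with weights $\theta$, and the same quantity would be minimized with a gradient-based optimizer.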
In contrast to the straightforward supervised learning approach, the proposed approach (5) has two significant advantages:
- It allows us to train the network in an unsupervised way, i.e., on real data, and therefore learn to deal with measurement-specific distortions.
- The cost function in (5) implicitly handles the scaling of different parameters, and therefore circumvents the problem of defining meaningful cost functions on the parameter space: simple parameter discrepancies such as $\|P_1 - P_2\|_2^2$ for two different parameter sets $P_1$ and $P_2$ largely depend on the scaling of the individual parameters and might even be meaningless, e.g. for cyclic parameters such as the phase offset $\phi$.
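The point about cyclic parameters can be illustrated with a small numerical check. The sketch assumes the phase enters the model only through a factor $\exp(-i(\omega z - \phi))$, with a hypothetical constant `omega`. Two phase values that differ by a full period are far apart in parameter space but produce the same signal.

```python
import cmath
import math


def carrier(phi, z_grid, omega=0.5):
    """Phase-dependent factor exp(-i(omega z - phi)) of the assumed model."""
    return [cmath.exp(-1j * (omega * z - phi)) for z in z_grid]


z_grid = [float(i) for i in range(16)]
phi_a = 0.1
phi_b = 0.1 + 2.0 * math.pi  # same angle, shifted by one full period

# Discrepancy measured on the parameters themselves.
parameter_gap = (phi_a - phi_b) ** 2

# Discrepancy measured on the signals the parameters generate.
signal_gap = sum(
    abs(a - b) ** 2 for a, b in zip(carrier(phi_a, z_grid), carrier(phi_b, z_grid))
)
```

A parameter-space loss would penalize `phi_b` heavily although it explains the data exactly as well as `phi_a`, whereas the signal-space loss used in (5) treats both as equivalent.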
5 Encoder Network Architecture and Training
5.1 Data Preprocessing
As illustrated in the plot of the magnitude of an exemplary measured THz signal shown in Figure 4, the THz energy is mainly focused in the main lobe and first side-lobes of the $\operatorname{sinc}$ function. Because the physical model remains valid in close proximity of the main lobe only, we preprocess the data to reduce the large range of measurements per pixel: we crop out 91 measurements per pixel centered around the main lobe, whose position is related to the object distance and to the parameter $\mu$. Details of the cropping window are described in [4]. We represent the cropped THz data in a 4D real tensor $g \in \mathbb{R}^{n_x \times n_y \times n_z \times 2}$, where $n_z$ is the size of the cropping window, i.e. $n_z = 91$ in our case.
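A naive version of this cropping step might look as follows. The actual window selection is described in [4]; the sketch below simply assumes the main lobe is located at the sample of largest magnitude, and shifts the window inward when the peak lies close to either end of the signal. The function name and the synthetic signal length are hypothetical.

```python
def crop_around_peak(signal, window=91):
    """Return `window` consecutive samples centred on the magnitude peak.

    Naive stand-in for the cropping described in [4]: the window is shifted
    inward when the peak lies too close to either end of the signal.
    """
    if len(signal) < window:
        raise ValueError("signal is shorter than the cropping window")
    peak = max(range(len(signal)), key=lambda i: abs(signal[i]))
    start = peak - window // 2
    start = max(0, min(start, len(signal) - window))
    return signal[start:start + window], start


# Synthetic per-pixel signal with a single strong sample at index 400.
raw = [0j] * 1000
raw[400] = 3 + 4j
cropped, offset = crop_around_peak(raw)

# A peak near the border still yields a full-length window.
edge = [0j] * 1000
edge[10] = 1 + 0j
edge_cropped, edge_offset = crop_around_peak(edge)
```

Returning the offset alongside the cropped samples keeps the link between window-relative positions and positions on the full depth grid.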
5.2 Encoder Architecture and Training
For the encoder network we pick a spatially decoupled architecture using $1 \times 1$ convolutions on $g$ only, leading to a signal-by-signal reconstruction mechanism that allows a high level of parallelism and therefore maximizes the reconstruction speed on a GPU. The specific architecture (illustrated in Figure 5) applies a first set of convolutional filters on the real and imaginary parts separately, before concatenating the activations and applying three further convolutional filters on the concatenated structure. We apply batch normalization (BN) [28] after each convolution and use leaky rectified linear units (LeReLU) [29] as activations. Finally, a fully connected layer reduces the dimension to the desired size of four output parameters per pixel. To ensure that the amplitude is physically meaningful, i.e., non-negative, we apply an absolute value function to the first component. Interestingly, this choice compared favorably to a plain rectified linear unit when the network is trained.
We train our model by optimizing (5) using the Adam optimizer [30] on a subset of the pixels from a real (measured) THz image for 1200 epochs. The remaining pixels serve as a validation set. Training uses mini-batches of fixed size, and the learning rate is reduced from its initial value by a factor of 0.99 every 20 epochs. Figure 6 illustrates the decay of the training and validation losses over 1200 epochs. As we can see, the validation loss closely follows the training loss, with almost no generalization gap.
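The learning rate schedule described above is a step decay, comparable in spirit to what a step scheduler such as PyTorch's `StepLR` provides. The following sketch computes it in plain Python. The initial learning rate used here is a hypothetical placeholder, and 0-based epoch counting is an assumption.

```python
def step_decay_lr(initial_lr, epoch, factor=0.99, step=20):
    """Learning rate at a 0-based epoch when multiplied by `factor` every `step` epochs."""
    return initial_lr * factor ** (epoch // step)


initial_lr = 1e-3  # hypothetical value, not taken from the paper
schedule = [step_decay_lr(initial_lr, epoch) for epoch in range(1200)]

# Fraction of the initial rate that remains in the final epoch.
remaining_fraction = schedule[-1] / initial_lr
```

With these settings the rate decays gently: it is still more than half of its initial value during the last of the 1200 epochs.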
6 Numerical Experiments
We evaluate the proposed model-based AE on two datasets, which are acquired using the setup described in Sec. 2, namely the MetalPCB dataset and the StepChart dataset. The MetalPCB dataset is measured by a nearly planar copper target etched on a circuit board (Figure 7a), which includes metal and PCB material regions, in the standard size scale of USAF target MIL-STD-150A [31]. After the preprocessing described in Sec. 5.1, the MetalPCB dataset has sample points. The StepChart dataset is based on an aluminum object (Figure 7b) with sharp edges to evaluate the distance measurement accuracy using a 3D object. The StepChart dataset has sample points after preprocessing.
In order to evaluate the optimization quality on different materials and structures, the MetalPCB dataset is evaluated in 3 regions: the PCB region is a local region that contains PCB material only, the Metal region is a local region that contains copper material only, and the All region is the entire image area. Similarly, the StepChart dataset is evaluated in 3 regions: the Edges region is the region that contains physical edges, the Steps region is the center planar region of each step, and the All region is the entire image area. This segmentation is done because the THz measurements of the highly specular aluminum target result in strong multi-path interference artifacts at the edges that should be investigated separately.
The proposed model-based AE is trained on the MetalPCB dataset only, while the parameter inference is made for both the MetalPCB and StepChart datasets. This cross-referencing between two datasets can verify whether the proposed AE method is modelling the physical behavior of the system without overfitting to a specific dataset or recorded material.
To compare with classical optimization methods, the parameters are estimated using the Trust-Region Algorithm (TRA) [32], which is implemented in MATLAB®. The TRA optimization requires a proper definition of the parameter ranges. Furthermore, it is very sensitive with respect to the initial parameter set. We, therefore, carefully select the initial parameters by sequentially estimating them from the source data (see [4] for more details). Still, the optimization may result in a parameter set with significant loss values; see Sec. 6.2.
The trained encoder network is independent of any initialization scheme as it tries to directly predict optimal parameters from the input data. While the network alone gives remarkably good results with significantly lower runtimes than the optimization method, there is no guarantee that the network's predictions are critical points of the energy to be minimized. This motivates the use of the encoder network as an initialization scheme for the TRA, specifically because the TRA guarantees a monotonic decrease of the objective function, such that using the TRA on top of the network can only improve the results. We abbreviate this approach as AE+TRA for the rest of this paper.
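The argument that a monotone local optimizer started from the network's prediction can only improve the result is easy to reproduce in miniature. In the sketch below, a simple coordinate search that accepts a trial step only when it lowers the loss stands in for the Trust-Region Algorithm; it is not the TRA itself. The model form, the constant `omega`, and the "predicted" starting point are all hypothetical.

```python
import cmath
import math


def model(params, z_grid, omega=0.5):
    """Assumed sinc-envelope model with a hypothetical carrier constant omega."""
    e_hat, sigma, mu, phi = params
    out = []
    for z in z_grid:
        t = sigma * (z - mu)
        env = 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)
        out.append(e_hat * env * cmath.exp(-1j * (omega * z - phi)))
    return out


def loss(params, z_grid, measured):
    """Squared l2 misfit between the model output and the measured signal."""
    return sum(abs(p - g) ** 2 for p, g in zip(model(params, z_grid), measured))


def refine(init, z_grid, measured, step=0.5, iterations=60):
    """Monotone coordinate search: a trial step is kept only if it lowers the loss."""
    best, best_loss = list(init), loss(init, z_grid, measured)
    for _ in range(iterations):
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                trial_loss = loss(trial, z_grid, measured)
                if trial_loss < best_loss:
                    best, best_loss, improved = trial, trial_loss, True
        if not improved:
            step *= 0.5  # shrink the step when no coordinate move helps
    return tuple(best), best_loss


z_grid = [float(i) for i in range(91)]
measured = model((2.0, 0.1, 45.0, 0.3), z_grid)
predicted = (1.8, 0.11, 44.0, 0.5)  # hypothetical encoder output, slightly off
start_loss = loss(predicted, z_grid, measured)
refined, refined_loss = refine(predicted, z_grid, measured)
```

Because every accepted step lowers the loss, `refined_loss` can never exceed `start_loss`, whatever the quality of the initial prediction. The same reasoning applies to any optimizer with a monotone decrease guarantee.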
To fairly compare all three approaches, the optimization time of the TRA and the inference time of the AE are both measured on an Intel® i7-8700K CPU, while the AE is trained on an NVIDIA® GTX 1080 GPU. The PyTorch source code is available at https://github.com/tak-wong/THz-AutoEncoder.
6.1 Loss and timing
In Table 1, the average loss in (5) and the timing are shown for the Trust-Region Algorithm (TRA), the Autoencoder (AE) and the joint AE+TRA approaches, respectively. While the proposed encoder network achieves a lower average loss than the TRA method in the metal region of the MetalPCB dataset, it yields higher average losses than the TRA over the full image area of both datasets. It is encouraging to see that although the AE was trained on the MetalPCB dataset, the relative performance in comparison to the TRA does not decay too significantly when changing to an entirely unseen dataset with a different material, with the AE loss being 27.7% and 34.9% higher than the TRA loss on the MetalPCB and StepChart datasets, respectively. If such a sacrifice in accuracy is acceptable, the speed-up in runtime is tremendous, with the AE being over 140 times faster than the TRA (with both methods evaluated on a CPU). Note that even the sum of training and inference time is smaller for the proposed AE than the runtime of the TRA on the MetalPCB dataset.
Interestingly, the combined AE+TRA approach of initializing the TRA with the encoder network’s prediction leads to better losses than the TRA alone in all regions. Additionally, the AE-initialized TRA converged more than 2 times faster due to the stopping criterion being reached earlier.
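The speed-up and loss figures quoted above follow directly from the entries of Table 1. The short script below recomputes them; only the numbers are taken from the table, while the variable names and layout are illustrative.

```python
# Run times in seconds, copied from Table 1.
run_time = {
    "MetalPCB": {"TRA": 10391.2, "AE": 73.5, "AE+TRA": 4854.7},
    "StepChart": {"TRA": 3463.9, "AE": 22.8, "AE+TRA": 1712.4},
}
# Average loss over the whole image ("All" region), copied from Table 1.
avg_loss = {
    "MetalPCB": {"TRA": 693.9, "AE": 886.3},
    "StepChart": {"TRA": 3815.1, "AE": 5148.3},
}
training_time = 9312.8  # AE training on MetalPCB, in seconds

summary = {}
for name, t in run_time.items():
    summary[name] = {
        "ae_speedup": t["TRA"] / t["AE"],
        "combined_speedup": t["TRA"] / t["AE+TRA"],
        "ae_loss_increase_percent": 100.0
        * (avg_loss[name]["AE"] / avg_loss[name]["TRA"] - 1.0),
    }
    print(name, {key: round(value, 2) for key, value in summary[name].items()})

# Training plus inference for the AE versus a single TRA run on MetalPCB.
train_plus_inference = training_time + run_time["MetalPCB"]["AE"]
```

Both datasets give an AE speed-up above 140 and an AE+TRA speed-up slightly above 2, and the combined training and inference time of the AE stays below the TRA run time on MetalPCB.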
We note that the losses of all approaches are significantly higher for the StepChart dataset than they are for the MetalPCB dataset. This is because the aluminum StepChart object (Figure 7b) has a more complex physical structure than the MetalPCB object, which results in a mixture of scattered THz pulses caused by multi-path interference effects in all object regions. Incorporating such effects in the reflection model of (1) could therefore be an interesting aspect of future research for improving how well the physical model explains the measured data.
6.2 Quality Assessment of THz Images
In THz imaging, the intensity image $I$ that is equal to the squared amplitude, i.e. $I = \hat{e}^2$, is the most important criterion for quality assessment. Note that the intensity could be inferred directly from the data by considering that (1) yields

$$
\hat{e}^2 = \big| f_{\hat{e},\sigma,\mu,\phi}(\mu) \big|^2 = f_{\hat{e},\sigma,\mu,\phi}(\mu)\, f^{*}_{\hat{e},\sigma,\mu,\phi}(\mu), \tag{6}
$$

where $f^{*}$ is the complex conjugate of $f$. As we illustrate in Figure 8, the model-based approach is not only capable of extracting all relevant parameters, i.e., $\hat{e}$, $\mu$, $\sigma$ and $\phi$, but, compared to values directly extracted from the source data, the resulting intensity is more homogeneous in homogeneous material regions. The inhomogeneity of the directly extracted intensity results from the very low depth of field of THz imaging systems in general, combined with the slight non-planarity of the MetalPCB target. As depicted in Figure 8c, the intensity variations along the selected line in the homogeneous copper region are reduced using the three model-based methods, i.e. TRA, AE, and AE+TRA. However, due to the crucial selection of the initial parameters (see discussion at the beginning of Sec. 6), the TRA optimization results exhibit significant amplitude fluctuations and loss values (Figure 8d) in two horizontal sub-regions. The proposed AE and AE+TRA methods, however, deliver superior results with respect to the main quality measures applied in THz imaging, i.e. the intensity homogeneity and the loss in model fitting. Still, the AE approach shows a few extreme loss values, while the AE+TRA method's loss values are consistently low along the selected line in the homogeneous copper region.
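One plausible way to read an intensity off the raw data, sketched below, is to take the largest value of the product of a sample with its complex conjugate along the depth axis. This assumes the envelope peak, where the sinc factor equals 1, coincides with the largest sample, which holds only approximately for measured data. The function name and sample values are hypothetical.

```python
def direct_intensity(signal):
    """Largest value of g(z) * conj(g(z)) over the depth samples of one pixel."""
    return max((g * g.conjugate()).real for g in signal)


# Three complex depth samples of a single pixel; the middle one has magnitude 5.
pixel_signal = [1 + 1j, 3 - 4j, 0.5j]
intensity = direct_intensity(pixel_signal)
```

Such a direct estimate uses a single sample per pixel, whereas the model-based estimates are fitted to all samples in the cropping window.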
7 Conclusions and Future Work
In this paper, we propose a model-based autoencoder for THz image reconstruction. Compared to a classical Trust-Region optimizer, the proposed autoencoder gets within a margin of roughly 35% of the objective value of the optimizer (Table 1), while being more than 140 times faster. Using the network's prediction as an initialization to a gradient-based optimization scheme improves the result over a plain optimization scheme in terms of objective values while still being two times faster. We believe that these are very promising results for training optimizers and initialization schemes for parameter identification problems in general by exploiting the idea of model-based autoencoders for unsupervised learning.
Future research will include exploiting spatial information during the reconstruction as well as considering joint parameter identification and reconstruction problems such as denoising, sharpening, and super-resolving parameter images such as the amplitude images shown in Figure 8b.
Acknowledgement
This is a pre-print of a conference proceeding article published in German Conference on Pattern Recognition. The final authenticated version is available online at: https://doi.org/10.1007/978-3-030-33676-9_7
- [1] Wai Lam Chan, Jason Deibel, and Daniel M Mittleman. Imaging with terahertz radiation. Reports on Progress in Physics, 70(8):1325, 2007.
- [2] Christian Jansen, Steffen Wietzke, Ole Peters, Maik Scheller, Nico Vieweg, Mohammed Salhi, Norman Krumbholz, Christian Jördens, Thomas Hochrein, and Martin Koch. Terahertz imaging: applications and perspectives. Appl. Opt., 49(19):E48–E57, 2010.
- [3] Peter H Siegel. Terahertz technology. IEEE Transactions on Microwave Theory and Techniques, 50(3):910–928, 2002.
- [4] Tak Ming Wong, Matthias Kahl, Peter Haring Bolívar, and Andreas Kolb. Computational image enhancement for frequency modulated continuous wave (FMCW) THz image. Journal of Infrared, Millimeter, and Terahertz Waves, 40(7):775–800, 2019.
- [5] Ken B Cooper, Robert J Dengler, Nuria Llombart, Bertrand Thomas, Goutam Chattopadhyay, and Peter H Siegel. THz imaging radar for standoff personnel screening. IEEE Transactions on Terahertz Science and Technology, 1(1):169–182, 2011.
- [6] Binbin B Hu and Martin C Nuss. Imaging with terahertz waves. Optics Letters, 20(16):1716–1718, 1995.
- [7] K McClatchey, MT Reiten, and RA Cheville. Time resolved synthetic aperture terahertz impulse imaging. Applied Physics Letters, 79(27):4485–4487, 2001.
- [8] Jinshan Ding, Matthias Kahl, Otmar Loffeld, and Peter Haring Bolívar. THz 3-D image formation using SAR techniques: simulation, processing and experimental results. IEEE Transactions on Terahertz Science and Technology, 3(5):606–616, 2013.
