Preconditioning the discrete dipole approximation
Samuel P. Groth, Athanasios G. Polimeridis, Jacob K. White

TL;DR
This paper introduces a preconditioning method for the discrete dipole approximation that significantly accelerates convergence in scattering simulations of atmospheric ice crystals, enabling faster computations.
Contribution
It proposes a multi-level circulant preconditioner for the DDA system matrix, improving iterative solution convergence for large and complex scattering problems.
Findings
Reduces simulation times by orders of magnitude
Effective for scattering by hexagonal ice prisms
Available MATLAB implementation online
Abstract
The discrete dipole approximation (DDA) is a popular numerical method for calculating the scattering properties of atmospheric ice crystals. The standard DDA formulation involves the uniform discretization of the underlying volume integral equation, leading to a linear system with a block-Toeplitz Toeplitz-block matrix. This structure permits a matrix-vector product to be performed with complexity via the fast-Fourier transform (FFT). Thus, in principle, the system can be solved rapidly using an iterative method. However, it is well known that the convergence of iterative methods becomes increasing slow as the optical size and refractive index of the scattering obstacle are increased. In this paper, we present a preconditioning strategy based on the multi-level circulant preconditioner of Chan and Olkin and assess its performance for improving this rate ofā¦
| level () | level () | level () | |
| + | + | + | |
| ā | ā | + | |
| ā | + | ā | |
| + | + | + | |
| + | ā | + | |
| + | + | + |
| Preconditioner | Storage cost (memory) | |
|---|---|---|
| Setup cost (time) | ||
| MVP (time) | ||
| Operator | Storage cost (memory) | |
| Setup cost (time) | ||
| MVP (time) |
| #Total | Unpreconditioned | Preconditioned | |||||||||
| voxels | GMRES | BiCG-Stab | GMRES | BiCG-Stab | |||||||
| Its. | Solve(s) | Its. | Solve(s) | Build(s) | Its. | Solve(s) | Its. | Solve(s) | |||
| 1.2 | 10 | 8 | 0.38 | 8 | 0.10 | 0.29 | 6 | 0.39 | 7 | 0.08 | |
| 20 | 14 | 1.9 | 15 | 0.36 | 0.99 | 11 | 2.0 | 12 | 0.45 | ||
| 30 | 28 | 7.3 | 33 | 1.9 | 2.4 | 27 | 8.0 | 34 | 3.1 | ||
| 40 | 58 | 28 | 64 | 9.7 | 4.9 | 31 | 21 | 34 | 8.6 | ||
| 60 | 189 | 72 | 247 | 120 | 17 | 32 | 29 | 36 | 31 | ||
| 80 | 416 | 3,800 | 556 | 620 | 40 | 37 | 184 | 41 | 86 | ||
| 100 | 795 | 36,000 | 1062 | 2400 | 93 | 42 | 387 | 47 | 200 | ||
| 1.4 | 10 | 10 | 0.45 | 11 | 0.071 | 0.63 | 7 | 0.47 | 8 | 0.072 | |
| 20 | 41 | 4.6 | 47 | 1.4 | 1.6 | 27 | 4.2 | 39 | 1.9 | ||
| 30 | 158 | 61 | 233 | 20 | 3.3 | 35 | 15 | 43 | 6.1 | ||
| 40 | 400 | 670 | 554 | 120 | 7.2 | 42 | 39 | 52 | 19 | ||
| 60 | 1533 | 27,000 | 2957 | 2100 | 24 | 64 | 180 | 98 | 120 | ||
| 80 | 81 | 107 | 750 | 234 | 750 | ||||||
| 100 | 150 | 116 | 1800 | 217 | 1500 | ||||||
| 1.6 | 10 | 16 | 0.75 | 17 | 0.13 | 0.43 | 11 | 0.76 | 12 | 0.15 | |
| 20 | 85 | 11 | 111 | 4.1 | 1.6 | 29 | 5.6 | 33 | 2.2 | ||
| 30 | 573 | 850 | 1124 | 140 | 4.6 | 52 | 30 | 105 | 23 | ||
| 40 | 1339 | 9500 | 3460 | 1100 | 10 | 73 | 92 | 132 | 71 | ||
| 60 | 38 | 111 | 470 | 570 | 1100 | ||||||
| 80 | 110 | 195 | 2,700 | ||||||||
| 100 | 290 | 247 | 12,000 | ||||||||
| 1.8 | 10 | 18 | 0.50 | 20 | 0.18 | 0.52 | 12 | 0.51 | 13 | 0.21 | |
| 20 | 313 | 120 | 601 | 32 | 2.3 | 36 | 6.5 | 49 | 4.5 | ||
| 30 | 1,224 | 5,100 | 3,471 | 730 | 6.4 | 72 | 52 | 425 | 160 | ||
| 40 | 16 | 119 | 230 | ||||||||
| 60 | 60 | 207 | 1,800 | ||||||||
| 80 | 180 | 367 | 13,000 | ||||||||
| 100 | 413 | ||||||||||
| 2 | 10 | 20 | 0.56 | 24 | 0.24 | 0.61 | 13 | 0.57 | 13 | 0.25 | |
| 20 | 502 | 350 | 1,313 | 91 | 2.7 | 46 | 11 | 139 | 19 | ||
| 30 | 8.5 | 114 | 120 | 706 | 320 | ||||||
| 40 | 38 | 191 | 630 | ||||||||
| 60 | 97 | 407 | 7,500 | ||||||||
| 80 | 280 | ||||||||||
| 100 | 600 | ||||||||||
| #Total | Unpreconditioned | Preconditioned | |||||||||
| voxels | GMRES | BiCG-Stab | GMRES | BiCG-Stab | |||||||
| Its. | Solve | Its. | Solve | Build | Its. | Solve | Its. | Solve | |||
| 1.2 | 10 | 11 | 0.49 | 11 | 0.08 | 0.37 | 9 | 0.29 | 9 | 0.10 | |
| 20 | 27 | 2.5 | 29 | 1.1 | 1.3 | 23 | 2.80 | 32 | 1.93 | ||
| 30 | 65 | 19 | 76 | 9.0 | 3.8 | 29 | 10.7 | 36 | 6.66 | ||
| 40 | 120 | 100 | 149 | 37 | 8.4 | 35 | 30.3 | 38 | 16.8 | ||
| 60 | 308 | 1700 | 391 | 370 | 40 | 41 | 136 | 46 | 91.8 | ||
| 80 | 707 | 20,000 | 906 | 2,100 | 130 | 55 | 610 | 72 | 630 | ||
| 100 | 290 | 61 | 2,500 | 79 | 2,800 | ||||||
| 1.4 | 10 | 22 | 0.48 | 25 | 0.20 | 0.60 | 17 | 0.58 | 21 | 0.30 | |
| 20 | 116 | 20 | 156 | 7.6 | 1.9 | 31 | 5.1 | 46 | 3.4 | ||
| 30 | 393 | 510 | 563 | 100 | 5.8 | 45 | 27 | 63 | 19 | ||
| 40 | 852 | 5,400 | 1,692 | 760 | 17 | 60 | 89 | 96 | 82 | ||
| 60 | 74 | 95 | 680 | 159 | 670 | ||||||
| 80 | 230 | 149 | 6,100 | 388 | 8,500 | ||||||
| 100 | 520 | 956 | 19,000 | ||||||||
| 1.6 | 10 | 34 | 0.80 | 41 | 0.40 | 0.50 | 20 | 0.76 | 26 | 0.43 | |
| 20 | 384 | 230 | 641 | 44 | 2.5 | 42 | 9.7 | 58 | 6.8 | ||
| 30 | 1,148 | 6,000 | 2,730 | 670 | 9.4 | 81 | 74 | 236 | 110 | ||
| 40 | 26 | 114 | 300 | 330 | 411 | ||||||
| 60 | 130 | 200 | 4,400 | 3,054 | 23,000 | ||||||
| 80 | 362 | ||||||||||
| 1.8 | 10 | 79 | 3.2 | 131 | 1.8 | 0.93 | 25 | 1.6 | 33 | 0.92 | |
| 20 | 802 | 1,400 | 2,333 | 230 | 3.9 | 61 | 23 | 178 | 31 | ||
| 30 | 14 | 127 | 205 | ||||||||
| 40 | 41 | 222 | 1,300 | ||||||||
| 60 | 190 | 461 | 20,000 | ||||||||
| 2 | 10 | 133 | 9.1 | 256 | 4.2 | 0.78 | 30 | 1.8 | 42 | 1.2 | |
| 20 | 1,268 | 4,400 | 5.3 | 111 | 75 | ||||||
| 30 | 23 | 216 | 620 | ||||||||
| 40 | 65 | 443 | 5,740 | ||||||||
| 60 | 280 | ||||||||||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsElectromagnetic Scattering and Analysis Ā· Soil Moisture and Remote Sensing Ā· Electromagnetic Simulation and Numerical Methods
Preconditioning the discrete dipole approximation
Samuel P. Groth11footnotemark: 1111Present address: Department of Engineering, University of Cambridge, United Kingdom.Ā 22footnotemark: 2222Corresponding author. Email address: [email protected], Athanasios G. Polimeridis, and Jacob K. White
Department of Electrical Engineering and Computer Science,
Massachusetts Institute of Technology, USA.
Q Bio, Redwood, CA 94063, USA
Abstract
The discrete dipole approximation (DDA) is a popular numerical method for calculating the scattering properties of atmospheric ice crystals. The standard DDA formulation involves the uniform discretization of the underlying volume integral equation, leading to a linear system with a block-Toeplitz Toeplitz-block matrix. This structure permits a matrix-vector product to be performed with complexity via the fast-Fourier transform (FFT). Thus, in principle, the system can be solved rapidly using an iterative method. However, it is well known that the convergence of iterative methods becomes increasing slow as the optical size and refractive index of the scattering obstacle are increased. In this paper, we present a preconditioning strategy based on the multi-level circulant preconditioner of Chan and Olkin [Numer.Ā Algorithms 6, 89 (1994)] and assess its performance for improving this rate of convergence. In particular, we approximate the system matrix by a block-circulant circulant-block matrix which can be inverted rapidly using the FFT. We present numerical results for scattering by hexagonal ice prisms demonstrating that this serves as an effective preconditioning strategy, reducing simulation times by orders of magnitude in many cases. A Matlab implementation of this work is freely available online.
**Keywords: ** Volume integral equation, discrete dipole approximation, preconditioning, circulant, electromagnetic scattering
1 Introduction
Since the publication of Purcell and Pennypackerās seminal paper in 1973Ā [1] and the subsequent work of, amongst others, Draine and FlatauĀ [2], Goedecke and OāBrienĀ [3], and Yurkin and HoekstraĀ [4], the discrete dipole approximation (DDA) has proven popular for electromagnetic scattering simulations. Application areas include dust particlesĀ [1, 5], biological tissuesĀ [6, 7], optical tweezersĀ [8], and atmospheric ice crystalsĀ [9].
The success of DDA is in no small part due to the fact that, when the underlying volume integral equation is discretized over a uniform (āvoxelizedā) grid, the system matrix obtains a block-Toeplitz Toeplitz-block (BTTB) structure. This permits a matrix-vector product to be performed in operations via the fast-Fourier transform (FFT), where is the number of voxels in the gridĀ [10, 11]. Therefore, the cost of solving the linear system via an iterative method is , where is the number of iterations required for convergence.
In principle, this modest growth of computational cost as increases should enable simulations for extremely large scattering obstacles to be efficiently performed. However, it is well known that, as the optical size and refractive index of the scatterer grow, becomes prohibitively large (see, e.g., [4]). In fact, as we shall observe later, , where is the size parameter of the particle, with the wavenumber and the radius of the obstacleās smallest circumscribing sphere. In ice crystal simulations, for example, the scatterer may be up to a hundred wavelengths across for which is so large as to make DDA infeasible. Therefore, an effective preconditioning strategy is required to temper the growth of .
This paper revisits the well-established circulant preconditioning techniques of Chan and OlkinĀ [12, 13], in which the underlying Toeplitz matrix is approximated by a circulant, and applies it within the DDA context for the first time. This builds on work by the present authors where a similar approach was applied within a silicon photonics context in which the structures of interest typically have extreme length in one of the three physical dimensionsĀ [14, 15, 16].
To clarify briefly this approach, consider the structure of the DDA linear system. It is of the form
[TABLE]
where D is a diagonal matrix containing the polarizabilities of the dipoles and T is a BTTB matrix with three levels of Toeplitz structure, obtained from discretizing the dyadic Greenās function over the three-dimensional voxelized grid. For optically large scatterers, solving this system iteratively is expensive due to a large , hence we seek an appropriate preconditioner P such that the modified system
[TABLE]
is more efficiently solved, i.e., is drastically reduced.
In order for P to be effective, it must also be reasonably cheap to construct and invert. In our Toeplitz setting, a natural candidate for P is a circulant matrix. Circulant matrices constitute a special class of Toeplitz matrix with the additional desirable property that they are diagonalized by the discrete Fourier transform, and hence can be inverted in . Circulant preconditioners for Toeplitz systems have proven to be successful in many application areas (see [17] and the references therein), however, to the present authorsā knowledge, they have yet to be applied to DDA for practical EM scattering problems.
The shrewd reader would have observed that the matrix D does not, in general, have a constant diagonal (only when the scatter is a homogeneous cuboid is this constant). Hence, does not inherit the Toeplitz property of T for general scatterers. To circumvent this issue, we employ a simple averaging strategy to create a preconditioner of the form , where has constant diagonal and is constructed by averaging the diagonal of D, and C is a circulant approximation to T Ć la Chan and OlkinĀ [12]. Since has constant diagonal, then inherits the circulant nature of C, and hence the preconditioner P can be cheaply inverted.
The example scatterers considered in this article are homogeneous hexagonal ice prisms of various size parameters and refractive indices. The discretized domain is the smallest box enclosing the scatterer (this is required for the FFT acceleration) so that the values along the diagonal of D correspond to those of ice and air voxels. Here we have chosen to construct as , where I is the identity matrix, i.e., the constant diagonal of is the modal average of the diagonal entries in D. For the hexagonal prisms considered here, this means that the preconditioner corresponds to the total bounding box āfilled inā with ice. As we see in SectionĀ 4, this is an effective strategy for the ice crystal examples considered in this article. Improvements may potentially be made by considering either a different averaging technique or a more sophisticated āToeplitz-plus-diagonalā preconditioner, as in [18], however these ideas are not explored here.
In terms of previous work on preconditioning for DDA, there appears to have been little development. Some brief experiments were presented in [19] where the simple point-Jacobi and Neumann polynomial preconditioners were used. However, large size parameters were not investigated and the improvement for small size parameters is extremely modest, if anything at all. More general preconditioning strategies exist, such as incomplete-LU, block-JacobiĀ [20], and the inverse fast multipole methodĀ [21], but these are potentially expensive, are not effective for high frequency problems, or are complicated to implement. The distinct advantages of circulant preconditioning are that it is well suited to the Toeplitz structure of DDA, is comparatively straightforward to implement, and it is inexpensive. Furthermore, as we present in SectionĀ 4, circulant preconditioning performs extremely well for the ice crystal scattering simulations considered here, providing speed-up factors of ten or more, and in many cases it enables previously inaccessible simulations to be performed on a desktop PC. Therefore, we believe that this paper presents the first viable preconditioning approach for an important class of DDA scattering simulations.
The layout of the paper is as follows. In SectionĀ 2, we provide details of the standard DDA formulation of [2] employed here. Also details of the Toeplitz structure of the system matrix are provided to facilitate the description of the circulant approximation in the following section. In SectionĀ 3, we review circulant preconditioning applied to Toeplitz matrices and its extension to Toeplitz-block matrices. In particular, we describe how the general approach of [12] is applied to our particular BTTB DDA matrix. Details of the algorithmic costings of assembling and applying the preconditioner are provided. We also present some pseudocode to help readers incorporate this preconditioning strategy into their own DDA codes. In SectionĀ 4, we consider the scattering of a polarized plane wave by hexagonal prisms of refractive indices and of size parameters . We present the CPU times and iteration counts for the unpreconditioned and preconditioned iterative solves of the arising linear systems, using both GMRES and BiCG-Stab. For smaller size parameters, little gain is achieved, but for large size parameters, we observe acceleration factors of ten or more. In fact, for the largest size parameters, where unpreconditioned DDA fails to converge, we achieve convergence with the preconditioned DDA within an acceptable number of iterations. In SectionĀ 5, we provide some concluding remarks and ideas for the further development of the preconditioning strategy.
A Matlab implementation is openly available online (https://github.com/samuelpgroth/VoxScatter) which we hope will be useful to students and those wishing to develop this work further.
2 Integral equation formulation
The discrete dipole approximation, in its many forms, begins with the following integral equation representation for the time-harmonic () electric field in the presence of a non-magnetic dielectric body :
[TABLE]
where is the incident field and is the electric susceptibility, with the relative permittivity. The dyadic Greenās function, , is defined as
[TABLE]
where , , and is the identity matrixĀ [2, 4]. Reordering (3) to obtain an integral equation for the unknown field gives
[TABLE]
where is the integral operator
[TABLE]
In the DDA of Purcell and PennypackerĀ [1], and Draine and FlatauĀ [2], equation (5) is phrased in terms of the unknown polarization rather than the electric field. The polarization is defined as
[TABLE]
and so the integral equation becomes
[TABLE]
for (of course where so we can neglect the contributions from such voxels). The formulation (8) is seen as desirable since there exists an accurate approximation for the self term via the Clausius-Mossotti relation. This enables the complicated evaluation of the singular portion of the integral to be sidestepped.
In this paper, we solve (8) via the āclassicalā DDA approach as expounded in, for example, [1, 2]. A more rigorous approach would be to solve (8) via Galerkinās method and evaluate the resulting double integrals with sophisticated numerical quadrature, as is done in [6] where it is used for magnetic resonance applications. Here we choose to present results for the simpler DDA approach, but point out that the preconditioning strategy presented can be applied to any volume integral equation scheme (e.g., DDA, Galerkin, collocation). The only requirement is that a cuboidal discretization grid is used, so that the resulting linear system has Toeplitz structure.
2.1 Discrete Dipole Approximation
DDA can be viewed as a collocation approach for solving equation (5) in which the singular self-term integrals are evaluated using semi-analytical means (namely, the Claussius-Mossotti relation) and the non-singular integrals are evaluated using the midpoint quadrature rule. We briefly summarize this approach.
Begin by writing the unknown polarization as
[TABLE]
where each basis function is a three-dimensional unit pulse function supported on voxel alone, i.e.,
[TABLE]
are the unknown coefficient vectors, and represents the Hadamard product. Upon substitution of the piecewise constant representation (9) into the integral equation (8), and then forcing this to be exact at the voxel centers, we obtain the linear system of equations
[TABLE]
for . This is the collocation approach for the solution of (8). We observe that when (the self term), the integral in (10) is singular. In DDA schemes, this self term is given explicitly via the Clausius-Mossotti relation:
[TABLE]
with the polarizability of a dipole at location . Here we follow [2] and take , namely the lattice dispersion relation (LDR) correction to the Clausius-Mossotti polarizabilities . The definitions are given as
[TABLE]
where is the relative permittivity of the material occupying the th voxel, and and are unit vectors defining the direction and polarization of the incident field. Note that we use slightly different definitions of and to that used in [2], namely we omit the scaling by the voxel volume. But this difference is compensated for in our modified definition of the polarization (7), which we choose to fall in line with the more standard definition.
The non-singular integrals () are evaluated using the midpoint quadrature rule:
[TABLE]
where is the voxel dimension (see SectionĀ 2.2). Such a crude quadrature scheme is accurate only for well-separated voxels. For nearby voxels, where the integral is close to singular, the midpoint rule is inaccurate. Schemes such as the digitized Greenās functionĀ [3], coupled dipole methodĀ [22], and the Galerkin implementationĀ [6] use rigorous quadrature techniques to evaluate these integrals more accurately, and so are more accurate in general, and particularly for large permittivities. However, here we choose the more classical approach for simplicity. The important point is that both approaches are based on voxel discretizations, so can employ the preconditioning strategy proposed in this article.
2.2 Voxel discretization and Toeplitz structure
Although tetrahedral discretizations (e.g., [23]) can provide a more accurate geometrical representation, voxel discretizations have proven popular owing to the fact that they lead to a discrete system of convolution form, and hence permit a fast matrix-vector product via the FFT.
We begin the discretization by choosing an appropriate voxel dimension . Typically is chosen so that in order to ensure an accurate approximation, where is the wavelength of the incident field. In this article, we take to enable rapid simulations with meaningful results. Then a box bounding the scatterer is constructed, of dimension so that the voxel grid consists of voxels.
Discretizing the linear system of equations (10) over the voxel grid using (11) and (12), and using the ordering described in AlgorithmĀ 1 leads to a discrete system of the form
[TABLE]
The blocks are diagonal and each of the blocks has BTTB structure on three levels, corresponding to the three physical dimensions of the problem. Note the symmetry in these blocks: only six of them are unique. Further, each of these blocks is either symmetric or anti-symmetric. This, combined with their BTTB structure, allows them to be each defined by a single row. Hence the storage cost for the G matrix is .
Further note that if the matrix has a constant diagonal, i.e., the structure is homogeneous, then the matrix inherits the BTTB structure of G. This is the particular case in which circulant preconditioners prove most effective, as we discuss in the following section.
3 Circulant preconditioning
The circulant preconditioners employed here are based on those proposed in [12] for Toeplitz-block matrices, which are an extension of the optimal point-circulant preconditioners of [13]. We review here the salient features of multi-level circulant preconditioners and refer the reader to [12] for further details.
A Toeplitz matrix is Toeplitz if , i.e., the diagonals are constant. Circulant matrices are also Toeplitz but with the additional property that every row of the matrix is a right cyclic shift of the row above, i.e, . Written out, these matrices have the respective forms
[TABLE]
Note that circulant matrices have the desirable property that they are diagonalized by the discrete Fourier matrix , such that , where is a diagonal matrix with the defining column of . Therefore, is inverted via the FFT in operations. For a Toeplitz matrix, T.Ā Chan [13] proposed the optimal point-circulant preconditioner whose entries are given by
[TABLE]
This approximation is optimal in the sense that it is the closest circulant matrix to in the Frobenius norm. There exist other circulant preconditioners (see, for example, the review [24]) and we anticipate the results presented in this paper would be similar if, for example, the Strang circulant preconditionerĀ [25] were instead employed. We choose to employ T.Ā Chanās preconditioner since it is explicitly defined by the simple formula (13) and has been shown to be effective for many Toeplitz problems.
T.Ā Chanās preconditioner was extended to Toeplitz-block matrices in [12]. In our setting, the DDA matrix, , has Toeplitz blocks, each of size . Let us denote such a matrix (although we should keep in mind our DDA matrix). Then its circulant-block approximation, , is obtained by calculating the circulant approximation to each Toeplitz block via (13). These matrices are written as
[TABLE]
and
[TABLE]
where denotes the Chan circulant approximation, defined by (13), to T. Having constructed , we then proceed to calculate its inverse via applications of the FFT. Each circulant block of has the representation . Defining , we then have that
[TABLE]
The matrix is an diagonal-block matrix, where the diagonal blocks have size . As described in [12], this matrix is easily collapsed to a block-diagonal matrix D after multiplication by a permutation matrix P, where
[TABLE]
Therefore, the inverse of is given by
[TABLE]
We term the 1-level circulant preconditioner and illustrate its construction in FigureĀ 1. The cost of the inversion of is dominated by the inversion of the dense blocks , each of size . Therefore, the cost is .
If and are small, can be a cheap preconditioner. If they are not small, one may resort to a second level of circulant approximation, applied this time to each of the dense blocks . In our BTTB case, the blocks are themselves Toeplitz-block, thus allowing the above procedure to be repeated for each , leading to a 2-level circulant preconditioner which we denote by . The matrix can be written as
[TABLE]
where denotes the 1-level circulant approximation
[TABLE]
where are new blocks, of size . The lines above and are to highlight that they are of the dimension appropriate for . An illustration of the 2-level circulant approximation is shown in FigureĀ 2. The resulting block-diagonal matrix has blocks, each of size , which must inverted to obtain for use as our preconditioner:
[TABLE]
Again, the inversion cost is dominated by the inversion of the blocks and so is now .
3.1 Algorithms
We present a few details as to the practical construction and inversion of the 2-level circulant preconditioner . First we remind the reader that matrix we are wishing to approximate is of the form
[TABLE]
where is a diagonal matrix with each entry being the polarizability of the appropriate voxel and G is our BTTB DDA matrix. In general, does not have a constant diagonal unless we are dealing with a homogeneous cuboid. In the examples considered in SectionĀ 4, we deal with homogeneous hexagonal prisms so that the polarizabilites take one of two values, that of the āiceā voxels or of the āairā voxels.
The endeavour is to create the 2-level circulant approximation with the hope that so that it acts as a good preconditioner. To do this, we first construct the 2-level circulant approximation to the BTTB matrix G and denote this . Then we must construct a constant diagonal matrix that approximates , denoting this . Now we have that the matrix
[TABLE]
is also circulant and hence appropriate as our circulant preconditioner. We stress the importance of the construction of the diagonal matrix since, when is not a constant diagonal (i.e., when the scatterer is not a homogeneous cuboid), the matrix does not inherit the circulant properties of and hence is not cheaply inverted. The choice made here is to take as the value of the āiceā voxels, however it may be the case that some derived from averaging over the leads to superior performance (as was seen in [16] for the 1-level circulant preconditoner). But we do not explore this question in this article. The final step is the inversion of , which is performed in a parallel loop over its diagonal blocks, each of dimension .
Now we state a few details about the efficient construction of the circulant approximation to G. This construction can be performed efficiently by exploiting the symmetries within the constituent blocks of . Each of the six unique blocks of has a three-level Toeplitz structure and on each of these levels the elements/blocks are arranged either symmetrically or anti-symmetrically - these symmetries are provided in TableĀ 1.
As an example, let us take the block . First we wish to calculate the 1-level circulant approximation to this block in the -direction. We know that this block is anti-symmetric on the first level, so itās circulant approximation is given as shown in AlgorithmĀ 2; note the minus sign in this version of the circulant approximation (13).
After having performed this circulant approximation for each of the six unique portions of G (taking into account the symmetry or anti-symmetry of each), we generate the defining tensor of the 1-level circulant preconditioner. From this, the full 1-level circulant, as shown in FigureĀ 1, can be constructed.
Here, however, we focus on the 2-level preconditioner so perform one further level of circulant approximation. Considering again the block , we must perform the circulant approximation in the -direction to the tensor , constructed in AlgorithmĀ 2. This is shown in AlgorithmĀ 3. Observe how we now loop over the blocks generated from the first level of circulant approximation. From the tensor we may now generate the full 2-level circulant approximation as shown in FigureĀ 2. This generation of the full preconditioner requires some familiarity with the symmetries of the matrix G as described in TableĀ 1 but is not too complicated.
AlgorithmĀ 2 and AlgorithmĀ 3 serve to illustrate that the generation of the 1- and 2-level circulant approximations are fairly straightforward and can be performed directly from the defining tensor given in AlgorithmĀ 1, which is constructed within all FFT-accelerated DDA implementations.
3.2 Costings
Costings were provided in Chan and OlkinĀ [12] but for a general Toeplitz-block matrix. Here we provide the relevant costings for our symmetric system. FollowingĀ [12] we consider a floating-point operation (flop) as one multiplication plus one addition. We also denote the cost of applying the FFT to a vector of length as fft(), which is typically flops for standard FFT algorithms such as FFTWĀ [26].
First we consider the setup (including the inversion) and per-iteration application costs of the 1-level circulant preconditioner.
1-level: setup
Point-circulant approximation via (13) of the unique blocks of lengthĀ : flops 2. 2.
FFTs of length to generate : 3. 3.
Generate diagonal blocks from using knowledge of BTTB structure and symmetries in TableĀ 1: free. 4. 4.
Inversion of the dense diagonal blocks of size : .
[TABLE]
1-level: application (per iteration)
FFTs, each of length : 2. 2.
Multiplication with the diagonal blocks of size : . 3. 3.
inverse FFTs, each of length :
[TABLE]
Next we consider the setup and per-iteration application costs of the 2-level circulant preconditioner.
2-level setup
Steps 1 and 2 from the 1-level setup to generate the diagonal blocks: 2. 2.
Point-circulant approximation via (13) of the blocks of length : . 3. 3.
FFTs of length to generate : 4. 4.
Generate diagonal blocks from using knowledge of BTTB structure and symmetries in TableĀ 1: free. 5. 5.
Inversion of each of the dense diagonal blocks of size : .
[TABLE]
2-level application (per iteration)
FFTs of length : 2. 2.
For each of the blocks:
- (i)
FFTs of length : 2. (ii)
Multiplication with the blocks of size : 3. (iii)
inverse FFTs of length : 3. 3.
inverse FFTs of length : .
[TABLE]
The setup cost of the 1-level preconditioner is dominated by the block inversion and so the complexity is . For problems in which , this cost is low and hence this preconditioner is feasible - this was seen for silicon photonics geometries in [16]. However, for ice crystal applications the geometries of interest are typically optically large in all three dimensions. For example, a cube with ten wavelengths across (a size parameter of ) discretized at a resolution of 10 voxels per wavelength would require the storage and inversion of 100 dense matrix blocks of dimension , a cost that is extremely demanding of most computers. Switching instead to the 2-level preconditioner, for this example, requires the storage and inversion of dense matrix blocks of dimension , a much more manageable task. Furthermore, this task can be performed in parallel and hence very rapidly, as we shall see in SectionĀ 4. For this reason of cost, we shall be applying only the 2-level preconditioner in this paper. For details on the performance of the 1-level, the reader is referred to [16]. We also note that a 3-level circulant approximation is possible, however it was found in a preliminary study to yield poorer results so we do not consider it here.
In terms of size parameter , since , we summarize the costings in TableĀ 2. For reference, we also provide the costings for assembling the defining tensor of G and performing an MVP with it. We observe from the table that the cost of the preconditioner is greater than that of the integral operator, however not substantially so. Furthermore, the constants are hidden. In SectionĀ 4, we provide timings and memory consumption figures in order to observe these costs in practice.
4 Numerical results
In order to test the performance of the circulant preconditioner for a realistic application, we consider the scattering of a plane wave by hexagonal plates with a variety of refractive indices and size parameters. In particular, we consider the scattering setup shown in FigureĀ 3.
The incident wave is polarized in the -direction and travels in the positive -direction, i.e., it has the form . We consider two different values for the aspect ratioĀ , where is the height of the plate and is the radius of the smallest circumscribing circle of the hexagonal face. The refractive indices considered are and the size parameters are , where . These parameter values are chosen to allow for a soft comparison to [4] where iteration counts are given for DDA, albeit there for scattering by spheres.
We present performance results for the iterative solves of the linear system using both the generalized minimum residual method (GMRES) and the biconjugate gradient stabilized method (BiCG-Stab) on an Intel (R) Xeon (R) CPU E5-2680 v4 @ 2.40GHz machine. BiCG-Stab is a popular iterative solver for DDA since it is fast, however its convergence is not guaranteed. GMRES on the other hand is slower and more memory intensive owing to the storage and use of the Krylov vectors, but its convergence is guaranteed if the entire Krylov subspace is kept, which we do here. We note that using restarted GMRES may lead to superior performance but we do not explore that in this article. As a stopping tolerance for the iterative solvers, we use , following [4].
TableĀ 3 shows the iteration counts and timings for the hexagonal plate of aspect ratio . We are employing GMRES and BiCG-Stab as the iterative solvers and choose to cease the solves if convergence is not achieved within 2000 and 4000 iterations, respectively. One can observe that, with no preconditioning, the iteration count grows approximately as with both GMRES and BiCG-Stab, so that for large values of , DDA simulations are infeasible, which motivates the use of a good preconditioner. We note further that BiCG-Stab is indeed faster than GMRES for many of the unpreconditioned simulations.
With the preconditioner, the iteration count grows much more slowly as the size parameter increases. For , the iteration count growth is (as shown in FigureĀ 4) whereas for larger , the growth is closer to , thereby permitting much larger size parameter simulations compared to without preconditioning. It is worthwhile to observe that BiCG-Stab yields faster preconditioned simulations than GMRES for but for higher values of the refractive index , GMRES proves more reliable. However, for we see that the memory of the machine is exceeded by the Krylov subspace generated by GMRES at the largest size parameters. This suggests that exploring the use of GMRES with restarts would be worthwhile from a performance perspective. In any case, we observe that the preconditioner is providing an excellent improvement in the performance of iterative solvers. To illustrate this further, in FigureĀ 5 we present the convergence of the relative residual of GMRES and BiCG-Stab for .
In terms of timings, for the small size parameters, where little gain is achieved using the preconditioner, the simulation times are comparable between preconditioned and unpreconditioned solves, and the overhead of building the preconditioner is less than a second in all cases. For the larger size parameters, we observe the huge advantage gained by employing the preconditioner. For example, for , the solve with BiCG-Stab without a preconditioner takes 35 minutes, whereas with the preconditioner the solve takes 2.5 minutes (including the 24s preconditioner build time) ā a factor of 14 speed up. So the small overhead time required to build the preconditioner is certainly worth it.
In FigureĀ 6 we present more details of the overhead required to use the preconditioner. FigureĀ 6(a) compares the assembly times of the preconditioner and the integral operator G for growing size parameter . In terms of the assembly, the time for G grows as as can be seen from AlgorithmĀ 1. The cost of assembling the preconditioner is slightly higher and appears to increase as also, which is contrary to our prediction of in SectionĀ 3.2. This is likely due to the fact that the assembly is parallelized and Matlabās matrix inversion routines are extremely efficient and so it takes a very large before the asymptotic range of is reached. In this range of , the preconditioner takes approximately 2.5 times as long to assemble as the operator G but we note that the total simulation time is dominated by the iterative solve, so this increased setup time is worth it as long as the iteration count is reduced sufficiently.
Also in Fig.Ā 6(a) are shown the times required to perform matrix-vector products with the preconditioner as well as with the operator . We observe that they are comparable and agree well with the costings provided in SectionĀ 3.2. Since they are comparable, this suggests that the break even point in using the preconditioner is to reduce the iteration count to approximately half. That is, if the iteration count for the preconditioned solve is smaller than half that of the unpreconditioned solve, then employing the preconditioner is worthwhile. Indeed, we see in TableĀ 3 that this is indeed the case for the majority of the parameter combinations.
Finally, in Fig.Ā 6(b) we compare the memory required to store the preconditioner and integral operator. The memory required to store the preconditioner grows as compared to the required for the operator, with the preconditioner being more expensive for . This increased memory consumption is not problematic for the problems looked at here, however for higher resolution simulations and/or larger scatterers, this may prove a bottleneck. It is likely that some compression of the preconditioner is possible, as was seen in [16] for the 1-level circulant preconditioner.
The final results presented are for the same scattering setup but now with an aspect ratio of , in TableĀ 4. This scatterer is twice as large as the previous one so we would expect that the iteration counts for the unpreconditioned solves are even higher. Indeed we find this is the case - unpreconditioned solves require roughly twice the number of iterations compared to the example. The preconditioned solves also require more iterations but the increase is slightly less severe. The most noteworthy aspect is that now the timings for the preconditioned solves with GMRES are more comparable to those with BiCG-Stab, and certainly more reliable. For , BiCG-Stab struggles to converge except for at the lowest values of . Again we see that for the most challenging problems, the memory consumption of GMRES without restarts is prohibitive, again motivating future numerical experimentation with restarted GMRES.
5 Conclusions and future work
In this paper, we have presented the first application of a multi-level circulant preconditioner to electromagnetic scattering simulations with the discrete dipole approximation. Indeed, we believe that this is the first presentation of a viable preconditioner of any kind for ice crystal simulations within the DDA literature.
In particular, we applied the so-called āoptimalā multi-level preconditioner of Chan and OlkinĀ [12] in the simulation of scattering by homogeneous hexagonal plates of various size parameters and refractive indices. Via a consideration of the symmetrical block-Toeplitz Toeplitz-block structure of the voxel-discretized dyadic Greenās function G, we provided costings for the assembly and application of the 2-level circulant preconditioner. These costings suggest that the number of flops required for the assembly of the preconditioner scales with the size parameter as , compared to the standard DDA cost of . However, it was seen in the numerical experiments that the cost scaling of the preconditioner is milder than this, close to for the size parameters considered. So that the preconditioner is approximately 2.5 times more expensive than the assembly of G, which is of the order of seconds or minutes.
Further, we observed that the cost of applying the preconditioner is almost the same as the cost of an MVP with G, suggesting that a reduction in iteration count by a factor of two is the break-even point. For the vast majority of the parameter combinations considered, a reduction factor far greater than two was achieved. In fact, the iteration count appears to grow as (or even more mildy) compared to the growth seen for unpreconditioned solves. Hence, for larger size parameters, the reduction in solve time achieved by using the preconditioner is greatest. For some parameter combinations, the preconditioned solves were up to 15 times faster. More remarkably, for large size parameter and large refractive index scatterers, the preconditioner enables previously infeasible problems to become tractable, thereby enabling a wider applicability of DDA.
This work therefore has shown that circulant preconditioners for DDA simulations can be extremely effective. The scatterers considered here, although limited in their variety, are already of importance in atmospheric physics applications. For more general scattering setups, further experimentation and, potentially, development is required, however it seems probable that a circulant preconditioning strategy will prove helpful. We conclude by discussing some directions for future development of this work.
Recall that in SectionĀ 3.1 a choice was made in the construction of the 2-level preconditioner. In particular, we created a constant diagonal matrix to replace the matrix . This was done by simply using the value of for āiceā for the air voxels also. Such an approximation step, however, may not be necessary. Instead, investigation of āCirculant-plus-diagonalā preconditioners, such as in Ā [18], may prove fruitful. Or potentially a more sophisticated point-circulant approximation such as was considered in [27] for one-dimensional problems.
Another choice was made in this article. Namely, we chose to perform the circulant approximation in the - and -directions since, for the hexagonal plates considered here, these are the largest dimensions. However, for hexagonal prisms of larger aspect ratio, superior results may be gained by choosing the longitudinal () axis as one of the circulant approximation directions.
Finally, we remark upon the memory consumption of the preconditioner and the GMRES Krylov subspace. The memory consumption of the preconditioner scales as which was not problematic for the simulations performed here. However, for higher resolution and/or larger size parameter simulations, it may be desirable to compress the preconditioner in some way. This was seen to be achievable for the 1-level preconditioner in [16] so it likely that some compression can also be obtained in the 2-level case, which may in turn lead to faster assembly times.
The iterative solves with GMRES were seen to be more reliable than those with BiCG-Stab, however much more memory intensive due the storage of the entire Krylov subspace. The entire Krylov subspace was retained here since it guarantees convergence of solver, but it is not necessary. Experimentation with GMRES with (deflated) restarts (e.g., [28]) would be useful to determine a reliable, fast, and memory efficient strategy for performing the preconditioned solves with GMRES.
Funding
This work was supported by a grant from Skoltech as part of the Skoltech- MIT Next Generation Program, and the Design for Manufacturability (DFM) Methods, PDK Extensions, and Tools for Photonic Systems project sponsored by AIM Photonics.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] E. M. Purcell and C. R. Pennypacker, āScattering and absorption of light by nonspherical dielectric grains,ā The Astrophysical Journal , vol. 186, pp. 705ā714, 1973.
- 2[2] B. T. Draine and P. J. Flatau, āDiscrete-dipole approximation for scattering calculations,ā JOSA A , vol. 11, no. 4, pp. 1491ā1499, 1994.
- 3[3] G. H. Goedecke and S. G. OāBrien, āScattering by irregular inhomogeneous particles via the digitized Greenās function algorithm,ā Applied Optics , vol. 27, no. 12, pp. 2431ā2438, 1988.
- 4[4] M. A. Yurkin, V. P. Maltsev, and A. G. Hoekstra, āThe discrete dipole approximation for simulation of light scattering by particles much larger than the wavelength,ā Journal of Quantitative Spectroscopy and Radiative Transfer , vol. 106, no. 1-3, pp. 546ā557, 2007.
- 5[5] T. Nousiainen, E. Zubko, J. V. Niemi, K. Kupiainen, M. Lehtinen, K. Muinonen, and G. Videen, āSingle-scattering modeling of thin, birefringent mineral-dust flakes using the discrete-dipole approximation,ā Journal of Geophysical Research: Atmospheres , vol. 114, no. D 7, 2009.
- 6[6] A. Polimeridis, J. F. Villena, L. Daniel, and J. White, āStable FFT-JVIE solvers for fast analysis of highly inhomogeneous dielectric objects,ā Journal of Computational Physics , vol. 269, pp. 280ā296, 2014.
- 7[7] M. A. Yurkin, K. A. Semyanov, P. A. Tarasov, A. V. Chernyshev, A. G. Hoekstra, and V. P. Maltsev, āExperimental and theoretical study of light scattering by individual mature red blood cells by use of scanning flow cytometry and a discrete dipole approximation,ā Applied Optics , vol. 44, no. 25, pp. 5249ā5256, 2005.
- 8[8] T. A. Nieminen, V. L. Loke, A. B. Stilgoe, G. Knƶner, A. M. BraÅczyk, N. R. Heckenberg, and H. Rubinsztein-Dunlop, āOptical tweezers computational toolbox,ā Journal of Optics A: Pure and Applied Optics , vol. 9, no. 8, p. S 196, 2007.
