h-multigrid agglomeration based solution strategies for discontinuous Galerkin discretizations of incompressible flow problems
Lorenzo Botti, Alessandro Colombo, Francesco Bassi

TL;DR
This paper develops h-multigrid preconditioners using agglomeration for discontinuous Galerkin methods to efficiently solve complex incompressible flow problems, demonstrating significant time savings in 2D and 3D cases.
Contribution
It introduces an agglomeration-based h-multigrid approach with efficient coarse grid operator inheritance for DG discretizations of flow equations.
Findings
Achieves uniform convergence with proper stabilization on coarse levels.
Demonstrates significant execution time reductions in 2D and 3D simulations.
Effectively handles complex unstructured meshes and real-life problems.
Abstract
In this work we exploit agglomeration based -multigrid preconditioners to speed-up the iterative solution of discontinuous Galerkin discretizations of the Stokes and Navier-Stokes equations. As a distinctive feature -coarsened mesh sequences are generated by recursive agglomeration of a fine grid, admitting arbitrarily unstructured grids of complex domains, and agglomeration based discontinuous Galerkin discretizations are employed to deal with agglomerated elements of coarse levels. Both the expense of building coarse grid operators and the performance of the resulting multigrid iteration are investigated. For the sake of efficiency coarse grid operators are inherited through element-by-element projections, avoiding the cost of numerical integration over agglomerated elements. Specific care is devoted to the projection of viscous terms discretized by means of the BR2 dG…
| -coarsened quadrilateral mesh sequences | |||||
|---|---|---|---|---|---|
| 1208 | 365 | 109 | 34 | 11 | |
| 4824 | 1447 | 437 | 136 | 41 | |
| 19218 | 5791 | 1754 | 535 | 161 | |
| 76880 | 23087 | 6976 | 2116 | 643 | |
| -coarsened triangular mesh sequences | |||||
| 541 | 157 | 46 | 13 | 4 | |
| 2290 | 660 | 194 | 57 | 17 | |
| 9287 | 2683 | 780 | 231 | 66 | |
| 37551 | 10956 | 3214 | 935 | 274 | |
| 3D -coarsened mesh sequences, grid partition size | ||||||
|---|---|---|---|---|---|---|
| processes () | ||||||
| 16 | 131078 | 19378 | 2895 | 434 | 64 | |
| 131066 | 19258 | 2846 | 416 | 60 | ||
| 32 | 65541 | 9763 | 1468 | 219 | 34 | |
| 65531 | 9666 | 1426 | 207 | 28 | ||
| 64 | 32770 | 4878 | 734 | 111 | 17 | |
| 32766 | 4809 | 708 | 104 | 14 | ||
| 128 | 16392 | 2454 | 372 | 57 | 9 | |
| 16380 | 2407 | 353 | 50 | 7 | ||
| Linear solver iterations, 2D Poisson problem, quadrilateral mesh sequences | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| operators | inherited | non-inherited | rescaled-inherited | ||||||
| grid () | |||||||||
| FGMRES = 2 | 12 | 12 | 13 | 10 | 10 | 10 | 10 | 10 | 10 |
| FGMRES = 3 | 15 | 15 | 15 | 10 | 10 | 10 | 10 | 11 | 11 |
| FGMRES = 4 | 19 | 19 | 20 | 10 | 10 | 10 | 10 | 11 | 11 |
| FGMRES = 5 | 24 | 24 | 25 | 10 | 10 | 10 | 11 | 11 | 11 |
| CG ILU(0) | 206 | 398 | 743 | ||||||
| GMRES(120) ILU(0) | 229 | 566 | 1255 | ||||||
| FGMRES = 2 | 9 | 9 | 9 | 8 | 8 | 8 | 8 | 8 | 8 |
| FGMRES = 3 | 16 | 15 | 15 | 8 | 8 | 8 | 8 | 8 | 8 |
| FGMRES = 4 | 27 | 27 | 26 | 9 | 8 | 8 | 8 | 8 | 8 |
| FGMRES = 5 | 60 | 45 | 49 | 9 | 9 | 8 | 9 | 9 | 8 |
| CG ILU(0) | 230 | 446 | 827 | ||||||
| GMRES(120) ILU(0) | 312 | 811 | 1925 | ||||||
| FGMRES = 2 | 8 | 7 | 7 | 7 | 7 | 6 | 7 | 6 | 6 |
| FGMRES = 3 | 12 | 12 | 13 | 7 | 7 | 6 | 7 | 7 | 6 |
| FGMRES = 4 | 20 | 21 | 23 | 8 | 7 | 7 | 8 | 7 | 7 |
| FGMRES = 5 | 37 | 39 | 42 | 8 | 8 | 7 | 8 | 8 | 7 |
| CG ILU(0) | 250 | 480 | 899 | ||||||
| GMRES(120) ILU(0) | 310 | 1015 | 2000∗ | ||||||
| Linear solver iterations, 2D Poisson problem, triangular mesh sequences | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| operators | inherited | non-inherited | rescaled-inherited | ||||||
| grid () | |||||||||
| FGMRES = 2 | 19 | 22 | 28 | 24 | 29 | 37 | 19 | 23 | 30 |
| FGMRES = 3 | 20 | 23 | 28 | 24 | 29 | 38 | 19 | 24 | 30 |
| FGMRES = 4 | 22 | 24 | 29 | 24 | 29 | 38 | 19 | 23 | 30 |
| FGMRES = 5 | 26 | 29 | 32 | 24 | 29 | 38 | 19 | 23 | 30 |
| CG ILU(0) | 250 | 509 | 1011 | ||||||
| GMRES(120) ILU(0) | 298 | 677 | 1520 | ||||||
| FGMRES = 2 | 17 | 19 | 24 | 21 | 26 | 37 | 17 | 20 | 27 |
| FGMRES = 3 | 20 | 22 | 26 | 21 | 26 | 37 | 17 | 21 | 27 |
| FGMRES = 4 | 26 | 32 | 37 | 21 | 26 | 38 | 17 | 21 | 27 |
| FGMRES = 5 | 44 | 61 | 77 | 21 | 27 | 38 | 17 | 21 | 27 |
| CG ILU(0) | 297 | 599 | 1197 | ||||||
| GMRES(120) ILU(0) | 330 | 866 | 2000∗ | ||||||
| FGMRES = 2 | 16 | 17 | 20 | 17 | 20 | 28 | 15 | 18 | 21 |
| FGMRES = 3 | 18 | 19 | 23 | 17 | 21 | 29 | 16 | 18 | 21 |
| FGMRES = 4 | 25 | 25 | 31 | 18 | 21 | 29 | 16 | 18 | 22 |
| FGMRES = 5 | 71 | 43 | 67 | 18 | 22 | 29 | 17 | 19 | 22 |
| CG ILU(0) | 334 | 664 | 1324 | ||||||
| GMRES(120) ILU(0) | 396 | 911 | 2000∗ | ||||||
| Total CPU time (s), 2D Poisson problem, quadrilateral mesh sequences | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| operators | inherited | non-inherited | rescaled-inherited | ||||||
| grid () | |||||||||
| FGMRES = 2 | 1.58 | 5.95 | 27.4 | 2.03 | 7.23 | 31.7 | 1.49 | 5.70 | 24.5 |
| FGMRES = 3 | 1.72 | 6.70 | 28.1 | 2.10 | 7.89 | 32.3 | 1.69 | 6.11 | 25.9 |
| FGMRES = 4 | 1.92 | 7.78 | 31.2 | 2.27 | 8.61 | 34.6 | 1.60 | 6.29 | 25.1 |
| FGMRES = 5 | 2.14 | 8.32 | 36.0 | 2.67 | 8.95 | 36.9 | 1.57 | 6.69 | 26.3 |
| CG ILU(0) | 1.45 | 8.01 | 48.7 | ||||||
| GMRES(120) ILU(0) | 3.01 | 28.1 | 230 | ||||||
| FGMRES = 2 | 2.75 | 11.3 | 54.4 | 3.30 | 13.6 | 64.4 | 2.43 | 10.3 | 51.1 |
| FGMRES = 3 | 3.49 | 12.0 | 54.2 | 3.34 | 14.1 | 58.3 | 2.43 | 9.4 | 42.1 |
| FGMRES = 4 | 4.12 | 16.7 | 73.9 | 3.36 | 13.8 | 57.9 | 2.43 | 9.3 | 41.4 |
| FGMRES = 5 | 7.18 | 24.3 | 115 | 3.74 | 15.6 | 60.3 | 2.43 | 9.7 | 41.7 |
| CG ILU(0) | 3.64 | 22.1 | 143 | ||||||
| GMRES(120) ILU(0) | 7.90 | 86.7 | 811 | ||||||
| FGMRES = 2 | 6.62 | 30.3 | 195 | 8.40 | 37.5 | 226 | 6.65 | 29.4 | 191 |
| FGMRES = 3 | 6.03 | 26.1 | 126 | 7.46 | 30.0 | 133 | 5.07 | 22.0 | 96.2 |
| FGMRES = 4 | 7.91 | 34.6 | 160 | 7.59 | 32.1 | 129 | 5.30 | 20.9 | 88.9 |
| FGMRES = 5 | 11.9 | 52.5 | 242 | 8.17 | 33.4 | 134 | 5.38 | 21.8 | 87.7 |
| CG ILU(0) | 11.0 | 68.4 | 475 | ||||||
| GMRES(120) ILU(0) | 19.3 | 240 | 1851∗ | ||||||
| vs CG | Total CPU time speedup | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| degree | |||||||||
| quad grid | |||||||||
| CG/MG step time | 0.92 | 1.2 | 1.8 | 1.5 | 2.3 | 3.4 | 2.0 | 3.1 | 5.4 |
| tri grid | |||||||||
| CG/MG step time | 0.95 | 1.6 | 2.4 | 1.4 | 2.3 | 3.5 | 1.7 | 3.1 | 5.0 |
| CG() vs | Total CPU time ratio | |||||
|---|---|---|---|---|---|---|
| solver(degree) | CG(1)/(2) | CG(2)/(3) | CG(1)/(3) | |||
| finest grid | quad | tri | quads | tri | quads | tri |
| CG/MG step time | 0.95 | 1.0 | 1.6 | 1.3 | 0.55 | 0.37 |
| Preprocessing CPU time (s), quadrilateral elements mesh sequence | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| task | grid topology | orthogonalization and intergrid operators | ||||||||||
| k=1 | k=2 | k=3 | ||||||||||
| grid | ||||||||||||
| = 2 | 0.72 | 3.28 | 22.1 | 0.17 | 0.64 | 2.79 | 0.34 | 1.37 | 5.36 | 0.72 | 2.63 | 10.5 |
| = 3 | 0.75 | 3.84 | 25.1 | 0.21 | 0.80 | 4.01 | 0.45 | 1.68 | 6.82 | 0.97 | 3.72 | 15.1 |
| = 4 | 0.82 | 4.22 | 26.8 | 0.25 | 1.00 | 4.68 | 0.58 | 2.11 | 8.24 | 1.26 | 4.76 | 19.3 |
| = 5 | 0.88 | 4.65 | 28.1 | 0.29 | 1.19 | 5.46 | 0.67 | 2.53 | 10.4 | 1.57 | 5.95 | 24.2 |
| = 0 | 0.09 | 0.29 | 1.13 | 0.03 | 0.10 | 0.38 | 0.07 | 0.22 | 0.83 | 0.13 | 0.42 | 1.77 |
| Linear solver iterations, 3D Poisson problem in parallel | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| k | 1 | 2 | 3 | ||||||
| processes | 8 | 16 | 32 | 16 | 32 | 64 | 32 | 64 | 128 |
| FGMRES = 2 | 11 | 11 | 11 | 10 | 10 | 10 | 9 | 9 | 10 |
| FGMRES = 3 | 12 | 12 | 12 | 11 | 11 | 11 | 10 | 10 | 11 |
| FGMRES = 4 | 12 | 12 | 12 | 11 | 11 | 11 | 10 | 11 | 11 |
| CG BJACOBI(ILU) | 488 | 509 | 530 | 680 | 708 | 687 | 567 | 856 | 898 |
| GMRES ASM(1,ILU) | 432 | 467 | 455 | 675 | 699 | 614 | 868 | 794 | 745 |
| CPU time (s) | solution | assembly | total | ||||||
|---|---|---|---|---|---|---|---|---|---|
| processes | 8 | 16 | 32 | 8 | 16 | 32 | 8 | 16 | 32 |
| FGMRES = 2 | 12.2 | 5.94 | 5.11 | 28.3 | 13.2 | 6.26 | 40.5 | 19.2 | 11.4 |
| FGMRES = 3 | 11.6 | 5.85 | 3.12 | 28.8 | 12.8 | 6.72 | 40.4 | 18.7 | 9.84 |
| FGMRES = 4 | 11.6 | 5.82 | 3.26 | 28.8 | 12.9 | 6.27 | 37.4 | 18.7 | 9.53 |
| CG BJACOBI(ILU) | 55.7 | 29.3 | 15.1 | 25.6 | 12.8 | 6.71 | 88.4 | 42.2 | 21.8 |
| GMRES(120) ASM(1,ILU) | 144 | 80.8 | 41.3 | 25.9 | 12.9 | 6.74 | 170 | 93.8 | 48.0 |
| processes | 16 | 32 | 64 | 16 | 32 | 64 | 16 | 32 | 64 |
| FGMRES = 2 | 32.7 | 26.1 | 17.9 | 45.6 | 23.3 | 11.8 | 78.3 | 49.3 | 29.8 |
| FGMRES = 3 | 26.4 | 14.3 | 7.95 | 47.2 | 24.1 | 11.8 | 73.6 | 38.4 | 19.7 |
| FGMRES = 4 | 25.8 | 14.2 | 7.92 | 48.2 | 24.4 | 12.5 | 74.0 | 38.6 | 20.4 |
| CG BJACOBI(ILU) | 224 | 109 | 54.6 | 45.4 | 22.9 | 11.4 | 269 | 132 | 66.1 |
| GMRES(120) ASM(1,ILU) | 411 | 237 | 100 | 45.4 | 23.2 | 11.4 | 457 | 261 | 111 |
| processes | 32 | 64 | 128 | 32 | 64 | 128 | 32 | 64 | 128 |
| FGMRES = 2 | 111 | 54.2 | 36.9 | 98.4 | 49.6 | 25.1 | 209 | 104 | 62.0 |
| FGMRES = 3 | 60.9 | 24.1 | 15.1 | 101 | 50.2 | 26.1 | 162 | 74.2 | 41.2 |
| FGMRES = 4 | 53.8 | 30.2 | 14.7 | 103 | 51.9 | 26.4 | 157 | 81.4 | 41.1 |
| CG BJACOBI(ILU) | 472 | 305 | 172 | 94.5 | 46.5 | 23.5 | 567 | 352 | 195 |
| GMRES(120) ASM(1,ILU) | 729 | 386 | 236 | 94.6 | 47.3 | 24.1 | 824 | 433 | 262 |
| CPU time (s) | grid topology computation | ||||
|---|---|---|---|---|---|
| processes | 8 | 16 | 32 | 64 | 128 |
| = 2 | 31.3 | 17.1 | 7.15 | 2.88 | 1.77 |
| = 3 | 35.2 | 19.4 | 8.94 | 3.65 | 2.13 |
| = 4 | 41.7 | 22.5 | 10.2 | 4.90 | 2.52 |
| = 0 | 6.05 | 4.91 | 1.71 | 0.75 | 0.45 |
| CPU time (s) | orthonormalization and intergrid operators | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| k=1 | k=2 | k=3 | |||||||
| processes | 8 | 16 | 32 | 16 | 32 | 64 | 32 | 64 | 128 |
| = 2 | 5.58 | 2.78 | 1.51 | 9.65 | 4.95 | 2.88 | 24.4 | 12.3 | 6.69 |
| = 3 | 7.75 | 3.88 | 2.02 | 13.9 | 7.73 | 3.65 | 37.6 | 19.7 | 10.7 |
| = 4 | 11.5 | 5.81 | 2.95 | 22.8 | 9.85 | 5.58 | 61.9 | 31.1 | 18.4 |
| = 0 | 1.03 | 0.52 | 0.26 | 1.88 | 0.96 | 0.48 | 4.06 | 2.04 | 1.11 |
| vs CG | Total CPU time speedup, 2M hex elems grid | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| degree | |||||||||
| processes | 8 | 16 | 32 | 16 | 32 | 64 | 32 | 64 | 128 |
| CG/MG step time | 2.4 | 2.3 | 2.3 | 3.6 | 3.4 | 3.2 | 3.6 | 4.3 | 4.7 |
| CG() vs | Total CPU time ratio | ||
|---|---|---|---|
| solver(degree) | CG(1)/(2) | CG(2)/(3) | CG(1)/(3) |
| CG/MG step time | 1.1 | 1.6 | 0.53 |
| Linear solver iterations, 2D Stokes problem, quadrilateral mesh sequence | |||||||||
| solver | FGMRES | SchurCompl | GMRES ILU | ||||||
| 2 | 3 | 4 | 5 | 2 | 3 | 4 | 5 | 0 | |
| grid | |||||||||
| 25 | 26 | 27 | 27 | 11 | 11 | 11 | 11 | 1013 | |
| 25 | 27 | 27 | 27 | 11 | 12 | 11 | 11 | 2000∗ | |
| 26 | 27 | 28 | 28 | 11 | 12 | 12 | 12 | 2000∗ | |
| 27 | 27 | 28 | 28 | 11 | 12 | 12 | 12 | 2000∗ | |
| 16 | 16 | 17 | 17 | 10 | 10 | 10 | 10 | 531 | |
| 16 | 16 | 17 | 17 | 10 | 10 | 10 | 10 | 2000∗ | |
| 16 | 16 | 17 | 17 | 10 | 10 | 10 | 10 | 2000∗ | |
| 16 | 16 | 17 | 17 | 10 | 10 | 10 | 10 | 2000∗ | |
| 13 | 14 | 14 | 14 | 10 | 10 | 10 | 10 | 597 | |
| 13 | 14 | 14 | 14 | 10 | 10 | 10 | 10 | 2000∗ | |
| 13 | 14 | 14 | 14 | 10 | 10 | 10 | 10 | 2000∗ | |
| 18 | 14 | 14 | 14 | 10 | 10 | 10 | 10 | 2000∗ | |
| Linear solver iterations, 2D Stokes problem, triangular mesh sequence | |||||||||||||
| solver | FGMRES | SchurCompl | GMRES | ||||||||||
| it | 1 | 2 | 2 | ||||||||||
| 2 | 3 | 4 | 5 | 2 | 3 | 4 | 5 | 2 | 3 | 4 | 5 | 0 | |
| grid | |||||||||||||
| 35 | 35 | 35 | 35 | 20 | 20 | 20 | 20 | 28 | 29 | 29 | 30 | 464 | |
| 39 | 39 | 39 | 39 | 23 | 23 | 23 | 23 | 44 | 45 | 45 | 46 | 1433 | |
| 46 | 47 | 47 | 47 | 26 | 26 | 26 | 26 | 70 | 70 | 70 | 71 | 2000∗ | |
| 51 | 51 | 51 | 51 | 29 | 29 | 29 | 29 | 86 | 86 | 86 | 86 | 2000∗ | |
| 29 | 29 | 29 | 29 | 16 | 16 | 16 | 16 | 29 | 29 | 29 | 29 | 374 | |
| 36 | 36 | 36 | 36 | 22 | 22 | 22 | 22 | 45 | 45 | 45 | 45 | 1526 | |
| 55 | 55 | 55 | 55 | 28 | 28 | 28 | 28 | 81 | 80 | 81 | 81 | 2000∗ | |
| 73 | 70 | 75 | 76 | 41 | 41 | 41 | 41 | 64 | 63 | 66 | 65 | 2000∗ | |
| 27 | 27 | 27 | 27 | 14 | 14 | 14 | 14 | 39 | 39 | 39 | 39 | 437 | |
| 36 | 36 | 36 | 36 | 19 | 19 | 19 | 19 | 57 | 57 | 57 | 57 | 1863 | |
| 44 | 44 | 44 | 44 | 26 | 26 | 26 | 26 | 72 | 72 | 72 | 72 | 2000∗ | |
| 63 | 63 | 63 | 63 | 40 | 40 | 40 | 40 | 119 | 119 | 119 | 120 | 2000∗ | |
| Convergence factor , 2D Stokes problem | ||||||||
|---|---|---|---|---|---|---|---|---|
| grid | quadrilateral meshes | (triangular meshes) | ||||||
| solver | FGMRES | SchurCompl | FGMRES | SchurCompl | ||||
| it | 1 | 1 | 1 | 2 | ||||
| 2 | 5 | 2 | 5 | 2 | 5 | 2 | 5 | |
| grid size | ||||||||
| .324 | .349 | .070 | .075 | .456 | .456 | .372 | .387 | |
| .328 | .354 | .078 | .079 | .497 | .498 | .536 | .544 | |
| .337 | .364 | .078 | .089 | .554 | .556 | .676 | .679 | |
| .348 | .370 | .079 | .087 | .583 | .584 | .728 | .727 | |
| .162 | .186 | .053 | .055 | .382 | .383 | .383 | .383 | |
| .165 | .193 | .052 | .053 | .470 | .471 | .544 | .543 | |
| .165 | .196 | .055 | .056 | .608 | .606 | .711 | .712 | |
| .167 | .196 | .051 | .054 | .687 | .695 | .651 | .658 | |
| .116 | .124 | .060 | .061 | .349 | .349 | .495 | .494 | |
| .117 | .127 | .051 | .051 | .464 | .465 | .616 | .617 | |
| .118 | .129 | .053 | .051 | .539 | .540 | .682 | .682 | |
| .205 | .130 | .051 | .051 | .648 | .648 | .792 | .794 | |
| Total CPU time (s), 2D Stokes problem, quadrilateral mesh sequence | ||||||||||
| solver | FGMRES | SchurCompl | GMRES | LU | ||||||
| 2 | 3 | 4 | 5 | 2 | 3 | 4 | 5 | 0 | 0 | |
| grid | ||||||||||
| 1.03 | 0.91 | 0.91 | 0.89 | 1.61 | 2.0 | 1.86 | 1.92 | 11.1 | 1.33 | |
| 5.07 | 3.87 | 3.83 | 3.89 | 7.57 | 8.42 | 7.62 | 7.58 | 90.4* | 10.7 | |
| 42.2 | 17.6 | 16.8 | 16.8 | 31.0 | 36.3 | 35.4 | 37.0 | 349* | 84.0 | |
| 480 | 99.4 | 75.9 | 75.7 | 137 | 151 | 153 | 149 | 1278* | 691 | |
| 2.81 | 1.84 | 1.83 | 1.89 | 5.86 | 5.97 | 6.1 | 6.22 | 11.2 | 6.06 | |
| 13.76 | 7.86 | 7.78 | 7.83 | 25.7 | 27.1 | 28.6 | 28.0 | 168* | 49.9 | |
| 113 | 39.9 | 33.1 | 32.7 | 114 | 110 | 109 | 112 | 713* | 409 | |
| 977 | 288 | 172 | 159 | 497 | 494 | 485 | 547 | 2690* | ||
| 6.19 | 4.47 | 4.44 | 4.51 | 18.0 | 18.1 | 18.5 | 18.6 | 30.2 | 17.2 | |
| 37.2 | 20.8 | 18.6 | 18.2 | 79.8 | 82.4 | 85.6 | 81.8 | 365* | 146 | |
| 301 | 103 | 77.8 | 74.8 | 326 | 325 | 325 | 324 | 1533* | ||
| 3012 | 734 | 381 | 347 | 1419 | 1360 | 1355 | 1350 | 5973* | ||
| Total CPU time (s), 2D Stokes problem, triangular mesh sequence | ||||||||||
| solver | FGMRES | SchurCompl | GMRES | LU | ||||||
| 2 | 3 | 4 | 5 | 2 | 3 | 4 | 5 | 0 | 0 | |
| grid | ||||||||||
| 0.42 | 0.42 | 0.42 | 0.42 | 7.63 | 8.33 | 8.75 | 9.33 | 1.70 | 0.26 | |
| 2.08 | 1.93 | 1.90 | 1.94 | 51.0 | 55.1 | 56.1 | 57.7 | 21.8 | 2.22 | |
| 11.1 | 9.04 | 9.10 | 8.74 | 329 | 346 | 356 | 357 | 121* | 18.9 | |
| 80.2 | 44.5 | 40.3 | 40.6 | 1798 | 1870 | 1895 | 1879 | 522* | 158 | |
| 1.04 | 0.99 | 0.99 | 1.10 | 19.5 | 20.6 | 21.6 | 22.1 | 3.30 | 1.65 | |
| 6.49 | 5.53 | 5.57 | 5.57 | 136 | 144 | 138 | 137 | 58.6 | 14.9 | |
| 39.8 | 29.5 | 28.0 | 26.5 | 1000 | 1082 | 1123 | 1098 | 315* | 129 | |
| 290 | 181 | 164 | 156 | 3433 | 3315 | 3458 | 3397 | 1305* | 1087 | |
| 2.73 | 2.74 | 2.57 | 2.64 | 70.6 | 72.7 | 73.8 | 74.4 | 9.45 | 4.16 | |
| 17.6 | 14.4 | 14.1 | 13.9 | 457 | 472 | 480 | 510 | 160 | 39.5 | |
| 116.8 | 77.5 | 72.1 | 71.0 | 2473 | 2444 | 2459 | 2472 | 689* | 343 | |
| 973 | 484 | 419 | 434 | 12112 | 12360 | 12355 | 12350 | 2782* | ||
| vs LU | Total CPU time speedup | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| degree | |||||||||||
| quad grid | |||||||||||
| LU/MG step time | 1.5 | 2.7 | 5 | 9.1 | 3.2 | 6.4 | 12.5 | 3.8 | 8 | ||
| tri grid | |||||||||||
| LU/MG step time | 0.6 | 1.1 | 2.2 | 3.9 | 1.5 | 2.7 | 4.9 | 7 | 1.6 | 2.8 | 4.8 |
| Average linear solver iterations | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| k | 1 | 2 | 3 | ||||||
| grid | |||||||||
| FGMRES = 2 | 7 | 8 | 9 | 8 | 11 | 16 | 8 | 9 | 12 |
| FGMRES = 3 | 8 | 9 | 10 | 10 | 12 | 17 | 9 | 11 | 13 |
| FGMRES = 4 | 9 | 10 | 12 | 11 | 14 | 18 | 11 | 13 | 16 |
| FGMRES = 5 | 10 | 11 | 13 | 12 | 15 | 20 | 12 | 14 | 18 |
| FGMRES = 2 | 5 | 6 | 8 | 7 | 9 | 13 | 6 | 8 | 11 |
| FGMRES = 3 | 5 | 6 | 8 | 7 | 9 | 13 | 6 | 8 | 11 |
| FGMRES = 4 | 5 | 6 | 8 | 7 | 9 | 13 | 6 | 8 | 11 |
| FGMRES = 5 | 5 | 6 | 8 | 7 | 9 | 13 | 6 | 8 | 11 |
| GMRES(200) ILU(0) | 464 | 1589∗ | 449 | 1518 | 579 | 1669∗ | |||
| Average convergence factor, | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| k | 1 | 2 | 3 | ||||||
| grid | |||||||||
| FGMRES = 2 | .23 | .26 | .34 | .31 | .40 | .53 | .28 | .34 | .43 |
| FGMRES = 3 | .29 | .33 | .39 | .36 | .44 | .55 | .35 | .41 | .47 |
| FGMRES = 4 | .33 | .37 | .43 | .40 | .49 | .58 | .40 | .46 | .52 |
| FGMRES = 5 | .36 | .41 | .47 | .43 | .52 | .61 | .42 | .49 | .57 |
| FGMRES = 2 | .14 | .20 | .28 | .22 | .33 | .45 | .20 | .28 | .40 |
| FGMRES = 3 | .14 | .20 | .28 | .23 | .33 | .45 | .20 | .28 | .40 |
| FGMRES = 4 | .14 | .20 | .28 | .23 | .33 | .45 | .21 | .28 | .40 |
| FGMRES = 5 | .14 | .20 | .28 | .23 | .33 | .45 | .21 | .28 | .40 |
| Total CPU time (s), 2D Kovasznay problem, quadrilateral mesh sequence | |||||||||
| solver | FGMRES | FGMRES | GMRES | ||||||
| 2 | 3 | 4 | 5 | 2 | 3 | 4 | 5 | 0 | |
| grid | |||||||||
| 3.18 | 2.90 | 3.08 | 3.24 | 2.86 | 2.47 | 2.48 | 2.56 | 16.5 | |
| 25.6 | 13.4 | 13.3 | 14.0 | 26.2 | 12.4 | 11.0 | 11.1 | 229 | |
| 229 | 79.1 | 59.8 | 62.3 | 294 | 108 | 58.9 | 51.8 | ||
| 11.1 | 8.14 | 8.34 | 8.58 | 13.4 | 8.46 | 8.15 | 8.27 | 46.0 | |
| 109 | 42.9 | 37.9 | 39.2 | 138 | 58.1 | 42.5 | 40.8 | 586 | |
| 1090 | 313 | 176 | 179 | 1589 | 627 | 269 | 215 | ||
| 29.4 | 19.6 | 19.8 | 20.8 | 35.1 | 22.6 | 19.8 | 19.8 | 107 | |
| 283 | 108 | 94.1 | 97.6 | 361 | 152 | 106 | 94.8 | 1640 | |
| 2218 | 829 | 481 | 471 | 3399 | 1563 | 754 | 550 | ||
| CPU time (s) | solution | assembly | total | ||||||
|---|---|---|---|---|---|---|---|---|---|
| grid | |||||||||
| FGMRES MG = 2 | 1.64 | 19.3 | 204 | 1.54 | 6.25 | 24.8 | 3.19 | 25.6 | 229 |
| FGMRES MG = 3 | 1.14 | 6.34 | 50.8 | 1.75 | 7.01 | 28.2 | 2.90 | 13.4 | 79.1 |
| FGMRES MG = 4 | 1.18 | 5.68 | 29.3 | 1.89 | 7.59 | 30.5 | 3.08 | 13.3 | 59.8 |
| FGMRES MG = 5 | 1.25 | 5.97 | 30.1 | 1.99 | 8.01 | 32.1 | 3.24 | 14.0 | 62.3 |
| GMRES(200) ILU(0) | 15.8 | 226 | 0.70 | 2.78 | 16.5 | 229 | |||
| FGMRES MG = 2 | 8.18 | 97.4 | 1048 | 2.90 | 11.7 | 42.5 | 11.1 | 109 | 1090 |
| FGMRES MG = 3 | 4.89 | 30.0 | 271 | 3.24 | 12.9 | 41.4 | 8.14 | 42.9 | 313 |
| FGMRES MG = 4 | 4.88 | 24.5 | 131 | 3.07 | 13.5 | 44.3 | 8.34 | 37.9 | 176 |
| FGMRES MG = 5 | 4.96 | 24.9 | 131 | 3.62 | 14.2 | 47.0 | 8.58 | 39.2 | 179 |
| GMRES(200) ILU(0) | 44.4 | 580 | 1.58 | 5.90 | 46.0 | 586 | |||
| FGMRES MG = 2 | 23.6 | 260 | 2121 | 5.81 | 23.5 | 96.7 | 29.5 | 283 | 2218 |
| FGMRES MG = 3 | 13.2 | 81.8 | 720 | 6.40 | 25.9 | 109 | 19.6 | 108 | 829 |
| FGMRES MG = 4 | 12.9 | 66.3 | 364 | 6.89 | 27.7 | 115 | 19.8 | 94.1 | 480 |
| FGMRES MG = 5 | 13.5 | 68.1 | 349 | 7.27 | 29.6 | 121 | 20.8 | 97.6 | 471 |
| GMRES(200) ILU(0) | 104 | 1626 | 3.01 | 13.4 | 107 | 1640 | |||
| iterations | convergence factor max(avg) | |||
|---|---|---|---|---|
| Re | 1000 | 5000 | 1000 | 5000 |
| FGMRES MG = 3 | 24 | 45 | .556 (.395) | .815 (.537) |
| FGMRES MG = 4 | .646 (.451) | .883 (.567) | ||
| FGMRES MG = 3 | 23 | 53 | .321 (.228) | .638 (.349) |
| FGMRES MG = 4 | .463 (.306) | .753 (.394) | ||
| FGMRES MG = 3 | 23 | 79 | .171 (.126) | .483 (.228) |
| FGMRES MG = 4 | .313 (.199) | .682 (.284) | ||
| FGMRES MG = 3 | 23 | 67 | .126 (.090) | .469 (.225) |
| FGMRES MG = 4 | .201 (.132) | .654 (.289) | ||
| Re 1000 | avg | it | max(avg) | |
|---|---|---|---|---|
| FGMRES MG = 2 | 32 | 16000/2350/350 | 33 | .806 (.509) |
| FGMRES MG = 3 | 16000/2350/350/53 | .942 (.589) | ||
| FGMRES MG = 2 | 64 | 8000/1175/170 | 33 | .535 (.356) |
| FGMRES MG = 3 | 8000/1175/170/25 | .822 (.487) | ||
| FGMRES MG = 2 | 128 | 4000/585/90 | 34 | .392 (.259) |
| FGMRES MG = 3 | 4000/585/90/13 | .630 (.390) |
| avg | time steps | max(avg) | ||
|---|---|---|---|---|
| FGMRES MG = 2 | 16 | 10600/1560/230 | 150 | .644 (.547) |
| FGMRES MG = 2 | 32 | 5300/770/114 | 150 | .678 (.540) |
| FGMRES MG = 2 | 64 | 2650/390/57 | 150 | .631 (.477) |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
-multigrid agglomeration based solution strategies
for discontinuous Galerkin discretizations of incompressible flow problems
L. Botti
A. Colombo
F. Bassi
Dipartimento di Ingegneria e Scienze Applicate, Università di Bergamo
via Marconi 4, 24044 Dalmine (BG), Italy
Abstract
In this work we exploit agglomeration based -multigrid preconditioners to speed-up the iterative solution of discontinuous Galerkin discretizations of the Stokes and Navier-Stokes equations. As a distinctive feature -coarsened mesh sequences are generated by recursive agglomeration of a fine grid, admitting arbitrarily unstructured grids of complex domains, and agglomeration based discontinuous Galerkin discretizations are employed to deal with agglomerated elements of coarse levels. Both the expense of building coarse grid operators and the performance of the resulting multigrid iteration are investigated. For the sake of efficiency coarse grid operators are inherited through element-by-element projections, avoiding the cost of numerical integration over agglomerated elements. Specific care is devoted to the projection of viscous terms discretized by means of the BR2 dG method. We demonstrate that enforcing the correct amount of stabilization on coarse grids levels is mandatory for achieving uniform convergence with respect to the number of levels. The numerical solution of steady and unsteady, linear and non-linear problems is considered tackling challenging 2D test cases and 3D real life computations on parallel architectures. Significant execution time gains are documented.
keywords:
Multigrid, Agglomeration, Discontinuous Galerkin, Incompressible flow problems, Polyhedral elements
MSC:
[2010] 65N30, 65N55
††journal: Journal Of Computational Physics
1 Introduction
Discontinuous Galerkin (dG) methods have proved to be effective in the CFD field allowing to simulate complex physics in complex domains while guaranteeing accuracy and robustness. Although very popular for compressible fluid flow simulations, their adoption by incompressible fluid flow practitioners is still limited due to difficulties involved in the numerical solution of the Incompressible Navier-Stokes (INS) equations. On the one hand explicit and decoupled time integration strategies (e.g. Pressure Poisson Equation segregated methods) complicate the achievement of high-order pressure accuracy reducing the appeal of high-order accurate spatial discretizations. On the other hand fully implicit fully coupled velocity-pressure spatial discretisations result in systems of Differential Algebraic Equations (DAEs) that are very expensive to solve due to the indefiniteness of the resulting system matrices, their poor spectral properties, and the saddle point nature of the problem [1].
In this work, in order to speed up the numerical solutions of coupled variables dG discretizations of incompressible flow problems, we consider -multigrid solution strategies on -coarsened mesh sequences generated by recursive agglomeration of a fine grid. -multigrid is very attractive from the efficiency viewpoint in the sense that the number of arithmetic operations needed to solve a discrete problem is proportional to the number of degrees of freedom. Convergence factors, that is the average residual decrease at each multigrid iteration, can be made -independent and small.
In the context of dG discretizations -multigrid has been fruitfully applied in practical applications see e.g. [2, 3, 4, 5], while the theoretical and practical investigation of - and -multigrid is more recent. In 2003 Gopalakrishnan and Kanschat [6] analyzed a V-cycle preconditioner for diffusion and advection-diffusion problems. Multigrid algorithms for dG discretizations of elliptic problems were considered by Brenner et al. [7], who proved uniform convergence with respect to the number of levels for F-,V- and W-cycle on graded meshes, and Antonietti et al. [8], who provided similar results for W-cycle -,- and -multigrid. While the previous works employed -refined mesh sequences, Prill et al. [9] considered smoothed aggregation to build coarse problems for -multigrid dG solvers. The issue of developing optimal solvers for Composite discontinuous Galerkin Methods, first developed and analyzed Antonietti et al. [10] was considered by Antonietti et al. [11, 12]. More recently Antonietti et al. [13] analysed multigrid strategies for Interior Penalty dG discretizations over agglomerated elements meshes, while Wallraff and Leicht [14] and Wallraff et al. [15] applied an agglomeration based -multigrid solver to dG discretizations of the compressible Reynolds Averaged Navier-Stokes (RANS) equations.
-coarsening by agglomeration leads to unprecedented flexibility in the definition of the coarse meshes. Starting from a fine grid, a coarse mesh can be generated on the fly clustering together a number of mesh elements. The process can be repeated at will in a recursive manner resulting in a nested mesh sequence. Note that the generation of a sequence of nested grids by recursive refinement of a coarse mesh, e.g. by means of element subdivision techniques, might require to improve the rough approximation of the computational domain provided by the coarse mesh. While coarsening by agglomeration is flexible enough to account for complex 3D domains, physical frame dG discretizations allows to handle polyhedral elements of very general shape [16, 17, 18, 19, 20]. Nevertheless, as a consequence of the lack of efficient quadrature rules for agglomerated elements, numerical integration of bilinear and trilinear forms might lead to excessive matrix assembly costs, see Bassi et al. [17].
The present investigation focuses on efficiency of building coarse grid operators for dG discretizations of incompressible flow problems and effectiveness of the multigrid V-cycle iteration. In particular, we introduce a strategy for inheriting the BR2 dG formulation of [21], which provides optimal convergence properties and does not require numerical integration during assembly of coarse grid operators. Besides the BR2 formulation, here employed for the discretization of the viscous terms, inherited multigrid can be fruitfully employed for the discrete divergence and the discrete gradient operators, and also for the discretization of the non-linear convective flux terms appearing in the Navier-Stokes equations.
The material is organized as follows. In Section 2 we introduce agglomeration based dG discretization over -coarsened mesh sequences. Section 3 is dedicated to presenting dG discretizations of incompressible flow problems:
i) the incompressible Navier-Stokes equations spatial and temporal discretization in Section 3.1 and 3.2, respectively;
ii) the discretization of the steady Stokes problem in Section 3.3;
iii) the BR2 dG formulation in Section 3.4.
The ingredients of the -multigrid iteration are described in Section 4:
i) the V-cycle in Section 4.1;
ii) intergrid transfer operators in Section 4.2;
iii) inherited coarse grid operators in Section 4.3.
Section 5 briefly comments on the use of the -multigrid V-cycle iteration as a preconditioner for iterative solvers and introduces block preconditioners for the Stokes problem. Performance gain assessment as compared to state-of-the-art iterative and direct solvers is conducted in Section 6. We consider
i) elliptic problems in Section 6.1;
ii) linear Stokes problems in Section 6.2;
iii) non-linear incompressible flow problems in Section 6.3.
2 Agglomeration based dG discretizations
2.1 Coarsening by agglomeration
Let be a bounded connected open domain. Consider a (possibly non conforming) mesh of composed of (possibly curved) elements such that
(i) for any , there exists a reference polygon and a polynomial mapping such that .
(ii) quadrature rules of arbitrary order are available on the reference polygon .
The set of reference polygons includes but is not limited to triangular and quadrilateral reference elements in 2D, tetrahedral, hexahedral, pyramidal and prismatic reference elements in 3D.
Starting from we can define a sequence of coarsened meshes by agglomeration, see Figure 1. For the sake of notation we denote by any element whose diameter is , and we denote the mesh size of , by . Agglomeration generates a hierarchic sequence of nested grids, in particular for any , we suppose that
- •
is a disjoint partition of obtained clustering together the elements of ;
- •
every is an open bounded connected subset of and there exists such that
[TABLE]
The cells clustered into the agglomerated element are referred to as sub-elements.
- •
for every there exists such that
[TABLE]
This can be obtained by applying (1) recursively and formalizes the fact that agglomerated elements on any mesh level can be expressed as a composition of elements belonging to the finer mesh .
Clearly the coarsening steepness is influenced by the number of sub-elements composing aggregate elements as well as by the aspect ratio of agglomerated elements, see Figure 1. In this work all the mesh sequences are generated setting in two and three space dimensions, respectively, where the agglomeration rate is a strict upper bound for the number of sub-elements, so that . The sequence of coarse meshes are generated by means of the library MGridGen [22], which allows to fix and relies on optimization algorithms in order to ensure overall good quality of the agglomerated elements. Note that, while the typical coarsening steepness can be obtained on regular Cartesian grids by regrouping 4 quadrilateral elements (2D case, 8 hexahedral elements in 3D) sharing a node, on general unstructured grids leaving unbounded from below gives room for more aggressive aspect ratio optimizations.
To complete the definition of agglomerated grids we introduce inter-element boundaries where to define trace operators and fluxes of the dG discretization.
- •
Faces of an element are defined as a portion of such that there exists a (hyperplanar) face of the corresponding reference element such that is the image of through the mapping .
- •
Faces of an agglomerated element , are defined as a portion of such that either or there exists , , such that .
Mesh faces are collected in the sets . As mesh elements are composed by sub-elements, every face is composed by sub-faces, also called facets, which belong to the set . Moreover, for every face we introduce the set collecting the facets partitioning , i.e.,
[TABLE]
and applying the definition recursively we get
[TABLE]
where .
We introduce the set of boundary mesh faces such that and let denote the set of internal faces. Moreover, for any mesh element , the set
[TABLE]
collects the mesh faces composing the boundary of . The maximum number of mesh faces composing the boundary of mesh elements is denoted by
[TABLE]
For any mesh face we define the set
[TABLE]
regroups the two mesh elements sharing if while it consists of a single mesh element if .
2.2 Physical frame dG discretizations
For each mesh level we consider the following broken polynomial spaces
[TABLE]
where is the restriction to a mesh element of the polynomials functions of variables and total degree at most , such that . Since in this work and no confusion is possible, we drop the subscript and simply use the notation in place of . Due to the nestedness of mesh elements we have .
It is interesting to remark that physical frame discretizations are defined so to inherently span the space and provide optimal approximation properties on regular -refined mesh sequences , see e.g. Botti [23]. Accordingly, for all and for each polynomial degree , the -orthogonal projection operator is such that for all , there holds
[TABLE]
where is independent of and . The optimal approximation estimate (6) holds true over mesh sequences composed of agglomerated elements of very general shape, in particular agglomerated elements meshes built on top of a curved elements mesh are eligible to provide optimal approximation properties, see e.g. [24]. While the mesh regularity assumption implies star-shapedness of agglomerated elements, see [25] or [26] for additional details, the numerical convergence rates assessed in [16] and [17] allow to claim that optimal approximation properties are achieved over mesh sequences obtained by means of the MGridGen library (for instance using in reversed order as a -refined mesh sequence).
Sharp approximation properties estimates valid in the general framework of -discontinuous Galerkin discretizations have been obtained introducing the concept of shape regular -simplexes coverings of polygonal/polyhedral meshes, see [18].
2.3 Basis functions choice
For a given , let denote a basis for . A basis for the space is given by
[TABLE]
where each basis functions is extended to by simply setting on .
From a practical viewpoint, in order to find a numerically satisfactory physical frame basis function we rely on the procedure proposed by Bassi, Botti, Colombo, Di Pietro and Tesini [16]. Starting from a monomial basis for each elementary space defined according to a reference frame whose axes are aligned with the principal axes of inertia of , an -orthonormal basis is inferred by means of the Modified Gram-Schmidt (MGS) orthogonalization procedure. The resulting basis functions are hierarchical, orthogonal with respect to the inner product and provide well conditioned local matrices at high polynomial degrees. In particular the elementary mass matrices are unit diagonal, for any element shape.
The sole requirement to apply the orthogonalization strategy is the capability to compute the integrals of polynomial functions on each element . In the case of agglomerated elements this is achieved by exploiting the partition into standard-shaped sub-elements. The integral of any is computed as follows
[TABLE]
where and are physical and reference space coordinates, respectively, and is the Jacobian of the mapping function . The order of exactness required for exact integration over each sub-element rapidly increases when considering high order polynomials on curved elements. Moreover, the use of Gaussian quadrature rules defined on the reference frame polygon might lead to an excessive growth of the number of quadrature points if the agglomerated elements are composed of many sub-elements.
2.4 Average, jump and lifting operators
For all and all we introduce the jump and average operators defined as follows:
[TABLE]
Whenever no confusion can arise we drop the subscript . On boundary faces, we conventionally set . When is vector-valued, the weighted average operator acts componentwise on the function .
For all , denotes the unit outward normal to , whereas, for all such that , is defined as the unit normal pointing out of (the order of the elements sharing is arbitrary but fixed). For all we define the (local) lifting operator , such that, for all ,
[TABLE]
Note that the support of consists of one and two mesh elements if and , respectively, that is
[TABLE]
For any function , we also introduce the global lifting
[TABLE]
which collects the local lifting contributions, note that .
3 Incompressible flow problems
3.1 Incompressible Navier-Stokes equations dG discretization
We consider the unsteady INS equations with Dirichlet boundary conditions,
[TABLE]
where is the velocity vector, is the pressure, denotes the (constant) viscosity, is the boundary datum, is the initial condition, and denotes the average value over . The density has been assumed to be uniform and equal to one.
Letting and be the viscous and convective flux functions, Eqs. (11a)-(11b) can be written in conservation form as
[TABLE]
where . For we get
[TABLE]
The dG discretization of the Navier-Stokes equations we rely upon consists in seeking such that
[TABLE]
for all .
According to the BR2 scheme, proposed in [21] and theoretically analyzed in [27] and [28], the viscous numerical fluxes read
[TABLE]
is the consistent discrete gradient while is the consistent diffusive flux ensuring symmetry and stability of the scheme. In particular coercivity holds provided that is greater than the maximum number of faces of the elements sharing . The inviscid physical and numerical fluxes of the dG discretization reads
[TABLE]
respectively. The inviscid numerical fluxes result from the exact solution of local Riemann problems based on an artificial compressibility perturbation of the Euler equations, as proposed in [29].
Boundary conditions are enforced weakly by properly defining for each a boundary state having support on the interface of a ghost neighboring elements . The ghost boundary state is defined based on the method of characteristics exploiting the hyperbolic nature of the artificial compressibility perturbation of the Euler equation. Accordingly the ghost state depend on the Dirichlet datum but also on the internal state . Once ghost states are computed, handling of internal and boundary faces is similar: for each two neighboring elements concur to the computation of numerical fluxes and lifting operators.
3.2 Navier-Stokes equations temporal discretization
For the sake of notation we collect the vector velocity and the pressure polynomial expansions in the vector and identify the unknown vector at time with , that is . For all we introduce the following bilinear and trilinear forms
[TABLE]
In the above definitions we dropped the mesh sequence subscript for notation convenience. Note that, by abuse of notation, (18) is a bilinear (resp. trilinear) when is a linear (resp. non-linear) function of .
Given the initial condition we define the sequence iteratively by means of the backward Euler method:
Note that the continuation condition at line 4 can be replaced by checking that a proper norm of the right hand side of Equation (21) is too large. Equation (22) is needed since the average value of the pressure increment is left undefined in Equation .
To recast Problem (21) in operator form we let and introduce the linear operators such that,
[TABLE]
Moveover we introduce the residuals of momentum and continuity equations
[TABLE]
Problem (21) amounts at solving a linear system in the form:
[TABLE]
3.3 Stokes equations dG discretization
The steady Stokes equation problem can be obtained dropping the time derivative and the convective term, that is the first two terms in equation (11a). In case of a steady Stokes flow the inviscid interface fluxes and can be explicitly computed as the solution of a linear hyperbolic system, see [29] for details. The resulting dG discretization reads: find such that
[TABLE]
where
[TABLE]
and the terms on right hand side defined below accounts for the week imposition of Dirichlet boundary conditions
[TABLE]
According to (28) the discretization of the viscous term can be obtained applying the BR2 method to each velocity component, cf. definitions (19) and (36). The dG discretization in (24) was analysed by Di Pietro in [30], see also [26, Chapter 6].
Problem (24) has a block structure which we can take advantage for devising effective preconditioners. To this end we define the operators , and
[TABLE]
Note that according to (26) we are able to infer .
Problem (24) amounts at solving a linear system in the form:
[TABLE]
3.4 Viscous terms dG discretization
The BR2 dG formulation is employed for the discretization of the viscous terms of Equation (11a) being an important building block of both the Stokes and the Navier-Stokes dG discretizations. In this work we focus on the performance of solving the BR2 dG discretization of the following model Poisson problem
[TABLE]
Assessing and improving the performance of -multigrid for elliptic problems has been a critical step for achieving satisfactory performances on incompressible flow problems.
The BR2 method can be written into a consistent and symmetric bilinear form plus stabilization term as follows: For all ,
[TABLE]
where
[TABLE]
Using the definition of the BR2 bilinear form in (36) the discretization of (35) reads:
[TABLE]
Well-posedness of problem (39) was proved by Brezzi et al. [27].
The BR2 method in (39) can be reformulated as follows: Given ,
[TABLE]
where is the -orthogonal projection operator into and is the fine grid operator, such that
[TABLE]
Solving (40) amounts to solving a linear system in the form
[TABLE]
4 -multigrid V-cycle
The ability to define -coarsened mesh sequences by agglomeration and to perform high-order accurate dG discretizations on general polygonal grids allows to exploit h-multigrid solvers to improve the efficiency of the solution strategy. In this work we consider -multigrid preconditioners for the dG discretization of the Stokes problem (24) and the linearized Navier-Stokes problem (21). Besides incompressible flow problems we will also focus on the performance of -multigrid applied to purely elliptic scalar problems. The interest is twofold: firstly the discretization of the viscous terms relies on a BR2 dG discretization and secondly block preconditioners for the Stokes problem require effective preconditioners for the discrete Laplace operator, see Section 5.
The linear (or linearized) systems arising from dG discretizations of the Navier-Stokes, Stokes and Laplace equation, see Eqs. (23), (34) and (42), are in the form
[TABLE]
Solving (43) allows to compute the degrees of freedom of , where in case of incompressible flow problems. As opposite we deal with a scalar function in case of the Laplace equation, i.e. . In order to accelerate convergence towards the multigrid iteration relies on several coarse grid problems in the form
[TABLE]
Coarse grids are explicitly built and coarse grid solutions belong to piecewise polynomial spaces defined over them, . The purpose of this section is to provide some insight on how coarse grid operators are built and how coarse grid solutions can be effectively employed to speed up the achievement of the fine grid solution . A comprehensive review can be found in Refs. [31, 32, 33], while the analysis of multigrid as a preconditioner for Krylov solvers, is analyzed in detail by Smith, Bjørstad and Gropp [32].
4.1 Multigrid V-cycle iteration
In this work we consider the multigrid V-cycle iteration, that is the simplest way of traversing the mesh sequence generated by agglomeration coarsening of the fine grid, see Section 2.1. The recursive multigrid V-cycle for the problem on level reads:
As a result of invoking the grid sequence is traversed moving towards coarser levels, one grid at a time, until the coarsest level is reached. Note that the coarsest level can be thought to be located at bottom of the V-shaped cycle. The descending phase is followed by ascension towards finer levels, until a new approximation of the exact solution over the fine grid is available. This marks the completion of one V-cycle iteration.
On each level except the coarsest one, three distinct phases take place: pre-smoothing, coarse grid correction and post-smoothing, see Algorithm 2. In the pre-smoothing phase a few iterations (one or two, in this work) of a standard preconditioned iterative solver are performed in order to damp high-frequency modes of the error . Since the convergence of iterative solvers deteriorates when trying to damp low-frequency modes resulting in an inefficient solution process, the error equations are solved on a coarser grid (level ), where low-frequency modes appears more oscillatory. Once the error is computed it is transferred back to level and used to correct the solution: . Before doing so post-smoothing ensures that only smooth components of the error have survived. Note that correction takes place only after the residual equations have been accurately solved on the coarsest level, usually with a direct solver.
It is interesting to remark that, due to the linearity of the original problem, solving with an arbitrary initial guess is equivalent to solving the residual equations with a zero initial guess. As a consequence the coarse grid correction requires to compute the residual but the computation of an initial guess is not needed. The residual is approximated by projecting its fine counterpart to level which requires the definition of the so called restriction operator : . Similarly the error needs to be prolongated to the coarse mesh by means of the prolongation operator : . The implementation of the intergrid transfer operators and is described in detail in the next section.
4.2 Intergrid transfer operators
In any geometric multigrid strategy, transfer operators are required to map functions between two subsequent spaces in the set . Since nested grids are generated by recursive coarsening of a fine grid, also the polynomial spaces are nested, that is . Accordingly, prolongation is the natural injection such that
[TABLE]
while restriction is the projection such that
[TABLE]
The matrix counterpart , of the restriction operator is a sparse block matrix composed of blocks , each defined as
[TABLE]
where
[TABLE]
In particular each row of the matrix is associated to an element and consist of blocks. Similarly the prolongation matrix consists of blocks , each defined as
[TABLE]
Thanks to the use of orthonormal basis functions, the elemental mass matrices reduce to the identity matrix, that is , reducing the computational cost of computing transfer operators. Moreover, since , storing transfer operators requires to store only blocks of size .
Interestingly the same holds true when considering intergrid transfer operators for a vector function . Restriction and prolongation can be efficiently performed componentwise, that is
[TABLE]
without explicitly building the matrix associated to the restriction operator . Matrix-free restriction and prolongation algorithms are implemented as follows.
Note that and are the degrees of freedom associated with the i-th component of the function and of the function , respectively.
4.3 Coarse grid operators
Two possibilities are available for building coarse grid problems in the form of (44), the so called non-inherited multigrid, where discrete operators are assembled on each grid of the mesh sequence, and inherited multigrid, where coarse operators are recursively built by restricting the fine grid operators.
Recently, evidence emerged that non-inherited multigrid might be preferable from the convergence rates viewpoint, in particular Antonietti et al. [8] have analyzed -multigrid Interion Penalty dG discretization of the Laplace equation demonstrating that only non-inherited multigrid provides uniform convergence with respect to the number of levels. Nevertheless, inherited coarse grid operators are significantly cheaper to compute since evaluation of numerical fluxes and assembly of bilinear forms over agglomerated elements grids is avoided. In particular, from the implementation viewpoint
i) numerical integration and basis function orthogonalization over agglomerated elements meshes are required only for the computation of intergrid transfer operators;
ii) the parallel implementation is simpler since flux computation on partition boundaries requires to access data from ghost agglomerated elements (note that ghost agglomerated elements are composed by many layers of fine ghost cells).
Accordingly choosing between inherited and non-inherited version of -multigrid might involve a trade-off between efficiency of the solver strategy and computational cost of assembling coarse grid operators. In order to avoid such an uncomfortable situation, in what follows we propose to heal the convergence degradation of inherited multigrid using a rescaled Galerkin projection of the stabilization terms of the BR2 dG discretization. The possibility to suitably rescale the stabilization terms of dG discretizations to improve the performance of coarse grid solvers was first proposed by Antonietti et al. [34] in the context of two level Schwarz methods for overpenalized Interior Penalty formulations.
4.3.1 BR2 dG discretization
Consider the BR2 bilinear form defined in (36) and the corresponding fine grid operator , see definition (41). The coarse grid operators , read
[TABLE]
where and are the prolongation operators introduced in Section 4.2. Continuity and coercivity bounds for the bilinear form over agglomerated elements meshes were proven by Bassi et al. [16], in particular on level stability holds provided that . Accordingly the coarse grid problems , , arising in the non-inherited version of the multigrid V-cycle iteration, see Section 4.1, are well-posed.
In what follows we demonstrate that, given the BR2 bilinear form in (36), for all
[TABLE]
Accordingly the difference between and hinges on the stabilization term.
Using the local and global lifting operator definitions (9) and (10) the consistency and symmetry BR2 bilinear form in (37) can be rewritten as
[TABLE]
where
[TABLE]
Since , we get
[TABLE]
Since if , we get
[TABLE]
The above result together with (57) prove (53).
Using the local lifting operator definitions (9) the stabilization term in (38) can be rewritten as
[TABLE]
The inherited stabilization term reads
[TABLE]
while its non-inherited counterpart is simply
[TABLE]
The inherited stabilization term (60) introduces an excessive amount of stabilization as compared to (61) having a detrimental effect on the spectral properties of inherited coarse grid operators, see Antonietti et al. [8].
In order to recover the correct amount of stabilization we propose to rescale it introducing the scaling term
[TABLE]
and defining the rescaled stabilization term
[TABLE]
such that
[TABLE]
To prove (64) we recall the following bounds on the local lifting operator: let , for all
[TABLE]
where , see e.g. [27, Lemma 2], [35, Lemma 7.2] or [26, Lemma 4.33 and Lemma 5.18] for a proof. The constant depends on , and the shape regularity of the elements sharing and is inherited from the discrete trace inequality: for all ,
[TABLE]
While trace inequalities in the form of (67) are commonly available in the context of simplicial and quadrilateral/hexahedral meshes we refer to [26, Lemma 1.46] for a version valid in the context of matching simplicial submeshes and to [18, 19] for an optimal version derived in the context of polygonal/polyhedral element meshes.
Using (66) we get the following bounds
[TABLE]
[TABLE]
which prove (64).
In view of (65), using (66), we now infer
[TABLE]
and summing over mesh faces on level we get the desired result. As remarked by Antonietti et al. [13], is influenced by the aspect ratio of the agglomerated element as well as by the ratio between the agglomerated element and the agglomerated face measure. Interestingly enough MGridGen algorithms are designed to optimize the aspect ratio of agglomerates and minimize the number of graph neighbors, which should also limit the occurrence of small degenerate faces (note that according to the definitions given in Section 2.1 the number of faces is equivalent to the number of element neighbors).
Consider now the inherited coarse grid operators
[TABLE]
such that ,
[TABLE]
Since
[TABLE]
stability holds provided that . In practice, motivated by the observation that the stabilization parameter choice suggested by theory is abundant, see e.g. [17], we deliberately neglected the dependence on and in definition (62). Note that a strategy for estimating over agglomerated element meshes has been proposed by [18].
As we already pointed out the main advantage of inherited multigrid is the possibility to build coarse grid operator by means of intergrid transfer operators, avoiding numerical integration over agglomerated elements. The matrix restriction algorithm is described in A and exploit the possibility to recursively inherit operators according to the following identities
[TABLE]
[TABLE]
where and are the restriction and prolongation operators described in Section 4.2.
4.3.2 Stokes dG discretization
Consider the Stokes operator defined in (34), the inherited coarse grid operators employed in this work read
[TABLE]
Consider the bilinear form defined in (25), and the corresponding fine grid operator , see definition (33). The coarse grid operators , read
[TABLE]
where the restriction of a vector function is performed componentwise
[TABLE]
Proceeding as in Section 4.3.1, see in particular (58), it is straightforward to show that .
According to definition (28) the operator , read
[TABLE]
see (69) for the definition of .
To conclude, the coarse operators , read
[TABLE]
see (27) for the definition of the bilinear form . Even if the inherited bilinear form introduces a different (read smaller) amount of stabilization as compared to its non-inherited counterpart the numerical test case corroborate the choice not to modify the scaling of .
Coarse grid operators are built by means of intergrid transfer operators, exploiting the possibility to recursively inherit operators. For example the operators are such that, for all
[TABLE]
where and and . Similarly to vector restriction and prolongation, matrix restriction can be performed matrix-free without requiring to assemble the matrices and . This practice yields large memory savings when the operators act on vector functions.
4.3.3 Navier-Stokes dG discretization
Consider the Navier-Stokes operator defined in (23), the inherited coarse grid operators employed in this work read
[TABLE]
To inherit the viscous operators we follow the same path of the Stokes case. Accordingly we get , , see Definition (74) and note that, according to Definition (28), .
Regarding inviscid operators we consider the trilinear form
[TABLE]
see Definition (18), such that
[TABLE]
We remark that operators and can be restricted in a similar fashion. The coarse grid operators , read
[TABLE]
. The non-inherited version of coarse grid operators is not employed in this work but is included for the sake of comparison.
In practice, given the fine grid operator , the inherited coarse grid operators are defined recursively by means of the Galerkin projection. The operators are such that, for all
[TABLE]
Accordingly
[TABLE]
where and are the restriction and prolongation operators described in Section 4.2 and (78) is performed matrix-free.
5 Multigrid and Block preconditioners
In this work we consider multigrid preconditioners for the Navier-Stokes equations and block preconditioners for the dG discretization of the Stokes problem (34). Both the Stokes and the Navier-Stokes problem have a block structure that can be exploited to devise preconditioners based on Schur complement decompositions, nevertheless pressure Schur complement preconditioners are less trivial in the Navier-Stokes case than in the Stokes case [1]. Comparison between block and -multigrid preconditioners will be performed on a Stokes model problem, while in the Navier-Stokes case we will focus on -multigrid as a preconditioner of a FGMRES backward Euler iteration.
Incompressible flow problem dG discretizations in the form (43) can be solved by preconditioned Krylov iterative methods, say , where the preconditioner is a suitable approximation of (such that the application of to a vector is cheap to compute). For example an Incomplete Lower Upper (ILU) decomposition of the system matrix is a common preconditioner choice, i.e. . Interestingly a Krylov iterative method, say , can serve as a preconditioner by triggering convergence of the iteration on loose tolerances, i.e. . Note that in this case Flexible Generalized Minimal RESidual (FGMRES) is usually employed as a solver as the preconditioner varies at each outer Krylov iteration.
Similarly, the multigrid V-cycle iteration of Section 4.1 can be employed as preconditioner, thus the solver strategy reads: . Building the coarse grid operators as described in Section 4.3, the multigrid V-cycle can be applied as a preconditioner of the Stokes and Navier-Stokes operators , .
Besides multigrid preconditioners, block preconditioners for the Stokes problem (34) are derived by noticing that admits the following LDU factorization
[TABLE]
where
[TABLE]
is the pressure Schur complement matrix. Since
[TABLE]
can be obtained by replacing and with preconditioned Krylov solvers, say and . Whereas computing explicitly is not viable, an approximate solver can be employed with
[TABLE]
Note that applying to a vector involves a nested Krylov iteration. Clearly the performance of the outer solver is strongly influenced by the availability of good preconditioners for the Laplace and the pressure Schur complement operators, read and .
As suggested by Shahbazi et al. [36] , can be constructed by a dG discretization of the Laplace operator with homogeneous Neumann boundary conditions on Dirichlet boundaries. Accordingly the operator is such that
[TABLE]
where
[TABLE]
Note that (83) can be obtained from the BR2 bilinear form in (36) using the local and global lifting operator definitions (9) and (10) and dropping the boundary face terms. As a preconditioner for we employ the -multigrid V-Cycle iteration described in Section 4 using the rescaled-inherited version of coarse grid operators defined in (74).
The solver and preconditioners options are summarized in what follows. Richardson iteration serves as the outer loop, i.e. . The application of the block preconditioner reads:
[TABLE]
see PETSc User manual [37], where
[TABLE]
and .
6 Numerical results
6.1 BR2 dG discretization
In this section we apply the -multigrid V-cycle iteration of Algorithm 2 for solving a Poisson problem discretized by means of the BR2 dG formulation. For the sake of comparison we consider the three strategies for defining coarse grid operators introduced in Section 4.3.1, that is:
i) non-inherited operators defined assembling bilinear forms on each mesh level
ii) inherited operators defined by means of a Galerkin projection
iii) the newly introduced inherited operators with stability rescaling.
We compare these approaches on the basis of convergence rate and computation time and we assess the benefits of using -multigrid as compared to state-of-the-art single grid solvers like the preconditioned Conjugate-Gradient (CG) method and the preconditioned Generalized Minimal RESidual (GMRES) method.
We consider the Poisson problem in (35) on the bi-unit square and cube, with and , respectively, where the forcing term is imposed according to the following smooth analytical solution
[TABLE]
and homogeneous boundary conditions are imposed on .
In order to investigate the growth of computational costs while increasing the mesh size, 2D solutions are computed on three uniform quadrilateral elements meshes of size and three distorted and graded triangular meshes of size , see Figure 2. As for 3D solutions we consider a grid, counting of more than two million hexahedral elements, and we investigate parallel performance of the multigrid algorithm running on up to 128 processes. In both two and three space dimensions we check the influence of raising the polynomial degree on the convergence rate and the computational expense considering , that is first, second and third polynomial degree dG discretizations. The error norm is on the order of for the fourth-order accurate dG discretization on the quadrilateral grid. We do not consider a further raise of the polynomial degree since for higher-order discretizations -multigrid or -multigrid solution strategies might be best suited. Indeed -multigrid is to be applied in the context of large scale computations where the mesh size is constrained by the need to accurately discretize a complex computational domain, note that the design of coarse high-order meshes suited for higher-order discretizations is an open field of research, see e.g. [38]. We remark that the agglomeration strategy does not take advantage of the triviality of the geometry here considered. In Section 6.3.3 the multigrid strategy will be applied without any modification to unstructured, possibly hybrid, meshes of complex computational domains.
To investigate the influence of the number of coarse levels on the convergence rate we consider and for and , respectively, that is we consider a stack of 3 to 6 grids in two space dimensions and a stack of 3 to 5 grids in three space dimensions, see Figure 1 and Figure 3. The number of mesh elements at each level and the maximum and the minimum number of elements among the distributed grid partitions at level is reported in Table (1) and Table (2), respectively. It is interesting to remark that in three (resp. two) space dimensions a 6.7 (resp. 3.3) fold decrease of the number of elements is obtained at each agglomeration step, whereas an 8 (resp. 4) fold decrease would be required in order to halve the mesh step size in uniform hexahedral (resp. quadrilateral) elements mesh sequences. Nevertheless, since agglomerated elements have very general shapes, the element size is not uniform and, as demonstrated in [16], the maximum and average mesh step size are usually bigger as compared to quadrilateral elements meshes of the same cardinality. The (user provided) upper bound for the number of sub-elements to be clustered together by means of the MGridGen library, see Section 2.1 for details, should guarantee that for each .
We complete the definition of the V-cycle preconditioned FGMRES iteration, i.e. specifying the relevant solver and preconditioner options. We employ a single iteration of multigrid V-cycle as a preconditioner for a Flexible GMRES solver (restarted FGMRES with 60 Krylov spaces) [39]. High-order modes of the error are smoothed with a single iteration of a right preconditioned GMRES solver. In 2D serial runs we employ an Incomplete Lower-Upper (ILU) preconditioner while in 3D parallel runs we opt for an Additive Schwarz domain decomposition Method (ASM) with one level of overlap between sub-domains and an ILU decomposition for each sub-domain matrix. On the coarsest level linear systems are solved with a direct solver in 2D. For parallel 3D runs we rely on the solver employed in smoothing steps but, instead of a single iteration, we impose a four order of magnitude decrease of the relative residual norm, that is . The FGMRES solver is forced to reach tight relative residual tolerance, in particular the linear system solution converges in iterations if at the -th iterate .
The numerical results reported in the next section have been computed by exploiting the PCMG multigrid preconditioner framework available in the PETSc library [40, 37, 41]. The MOAB [42] library is employed for storing distributed mesh data at all mesh levels and METIS [43] library is employed to partition the mesh in case of parallel computations.
6.1.1 2D Poisson problem
Tables 3 and 4 report the number of iterations required to solve the Poisson problem (35)-(6.2) discretized with the BR2 method. Single grid solver options mimic multigrid ones: we impose a relative residual decrease of and employ ILU (right) preconditioned Conjugate Gradient (CG) and GMRES solvers setting the number of Krylov spaces to 120 for GMRES.
As expected only non-inherited and inherited multigrid with stabilization term rescaling (rescaled-inherited) are able to provide uniform convergence with respect to the number of levels, note that this is the case even on bad quality triangular meshes. Comparison of the number of iterations on quadrilateral elements meshes highlights that the performance of single grid solvers worsen on finer meshes while the rescaled-inherited multigrid iteration is almost grid independent. Moving towards finer distorted triangular meshes also the number of multigrid iterations increases, but far less dramatically than with single grid solvers. Interestingly the number of iterations of single grid solvers is also affected by raising the polynomial degree while the convergence rates of -multigrid improve increasing the polynomial degree when non-inherited and rescaled-inherited multigrid are employed. Note that for and the average residual decrease exceeds one order of magnitude at each V-cycle iteration on quadrilateral mesh sequences.
Total CPU times (total means the sum of solution and assembly CPU times) reported in Table 5 demonstrate that the newly introduced rescaled-inherited multigrid strategy is the best performing. From the solution time viewpoint inherited and rescaled-inherited multigrid are almost indistinguishable while inherited multigrid is largely affected by the performance degradation increasing the number of grid levels. From the assembly time viewpoint inherited and rescaled-inherited multigrid avoids the burden of numerically integrating bilinear forms over agglomerated elements meshes. Non-inherited multigrid assembly times strongly increase with the number of levels (note that assembly can be twice as expensive than solution) negatively impacting total CPU times. We remark that, if quadrature formulas are defined over sub-elements as described in (8), the number of quadrature points is the same at each mesh level, irrespectively of the mesh density.
Rescaled-inherited total CPU times are almost independent of the number of levels provided that the coarsest grid on level is coarse enough. This is a very important result in view of applying multigrid in real-world computations since it basically removes the burden of choosing of the number of grid levels. Most importantly both non-inherited and rescaled-inherited multigrid are close to the optimal multigrid efficiency. They lead to a four-fold increase of the total computation time with a four-fold increase of the mesh size at all the polynomials degrees, provided that is chosen large enough.
Since CPU times for the 2D Poisson problem are measured running on a 2010 laptop, we recommend not to consider absolute values but rather relative gains. In this regard, the gains with respect to the best performing single grid solver (an ILU preconditioned Conjugate Gradient iteration) are significant, especially at the highest polynomial degrees on fine meshes, see Table 6. As a result, if we consider the finest quadrilateral and triangular meshes, solving a second degree polynomial degree BR2 dG discretization with -multigrid is comparable to solving a first degree dG discretization with CG, see Table 7. Interestingly the time required for solving a third polynomial degree BR2 dG discretization with -multigrid is twice the time required for solving a first degree dG discretization with Conjugate Gradient, which is quite impressive considering the accuracy gap.
To conclude we also report preprocessing CPU times including generation of -coarsened mesh sequence and orthogonalization of shape functions together with computation and storage of intergrid transfer operators. In Table 8 it is possible to appreciate that both these operations are time consuming as compared to setup times of single grid computations. Nevertheless for dG discretizations, even considering preprocessing times, multigrid outperforms single grid solvers.
6.1.2 3D Poisson problem
The Poisson problem in three space dimensions is here considered to assess the performance of the -multigrid solution strategy in parallel computations. The hexahedral mesh is first partitioned and distributed across the processes, thus each process build an -coarsened mesh sequence of its own partition. This practice is optimal from the distribution of computational load viewpoint but requires ad hoc strategies to deal with agglomerated elements whose faces are shared between partitions. While in single grid dG solvers the replication of a single layer of cells (the so called ghost cells) across partition boundaries ensures that the stencil of the discretization is fully accessible, the definition of ghost agglomerated cells is more tricky. Depending on the number of grid levels and the shape of agglomerated elements, many layers of cells of the fine grid might be involved in the process.
Nevertheless, it should be remarked that only non-inherited multigrid requires to numerically integrate bilinear forms over internal faces located on partition boundaries and, contextually, access the discretization stencil at all the mesh levels. As opposite, whenever coarse grid operators are defined restricting the fine grid operator, like in inherited and rescaled-inherited multigrid, only intergrid operators associated to ghost cells are required. In this regard the design decision of storing intergrid transfer operators in preprocessing is very handy, see Section 4.2. Indeed, intergrid transfer operators associated to ghost cells can be communicated across partitions without needing to actually build ghost cells, only adjacencies informations are required. As a consequence the implementation of inherited and rescaled-inherited multigrid is simpler in parallel.
We consider the rescaled-inherited strategy for defining coarse grid operators which has demonstrated to provide uniform convergence with respect to the number of levels and affordable assembly times, see Section 6.1.1. Beside the baseline computations performed with 8, 16 and 32 processes at first, second and third polynomial degree, respectively, we double the number of processes two times for a total of three runs at each polynomial degree. Thanks to the use of an ASM preconditioner for the GMRES smoother the number of FGMRES iterations is independent from the number of processes, see Table 9. The single grid GMRES solver uses the same kind of ASM preconditioner employed by the GMRES smoothers, that is an ASM preconditioner with one level of overlap between the sub-domains and an ILU decomposition in each sub-domain matrix. The single grid CG solver employs a block-Jacobi preconditioner with an ILU decomposition in each sub-domain matrix (but no overlap between sub-domains).
The CPU times reported in Table 10 demonstrate that the -multigrid efficiency does not deteriorate increasing the number of processes. The gains with respect to single grid solvers are comparable to those observed in serial computations in two space dimensions, see Section 6.1.1. Similarly, a sufficient number of grid levels must be employed to ensure that the grid on level is coarse enough, note the poor performance for . Even if a scalability analysis would require to further increase the number of processes we observe a strong linear scaling, that is the computation times halves doubling the number of processes.
To demonstrate that also the preprocessing phase is scalable, in Table 11 and Table 12 we report CPU times for the generation of -coarsened mesh sequence and orthogonalization of shape functions together with computation and storage of intergrid transfer operators, respectively. Remarkably, since the problem size has increased as compared to 2D computations, multigrid outperforms single grid solvers in terms of overall computation time (that is considering assembly, solution and preprocessing CPU times) at all polynomials degrees.
The gains with respect to the best performing single grid solver (the preconditioned Conjugate Gradient iteration) are on pair with those observed in 2D computations and do not deteriorate increasing the number of processes, see Table 13. Similarly to the 2D case solving a second degree polynomial degree BR2 dG discretization with -multigrid is comparable to solving a first degree dG discretization with CG and the time required for solving a third polynomial degree BR2 dG discretization with -multigrid is twice the time required for solving a first degree dG discretization with Conjugate Gradient, see Table 7.
6.2 Stokes dG discretization
In this section we tackle the solution of a model Stokes problem discretized by means of the dG formulation in (24). The computational domain is the bi-unit square, , and we impose Dirichlet boundary conditions on according to the following smooth analytical solution
[TABLE]
In order to investigate the growth of computational costs while increasing the mesh size, solutions are computed on four uniform quadrilateral elements meshes of size and four distorted and graded triangular meshes of size , see Figure 2. We check the influence of raising the polynomial degree and the number of coarse levels considering and . The number of mesh elements at each level of the stack of grids is reported in Table (1).
For the sake of comparison we consider the -multigrid V-cycle preconditioned FGMRES iteration and the Pressure Schur Complement preconditioned Richardson iteration of Section 5. The relevant solvers options are summarized in what follows.
preconditioned FGMRES solver
One iteration of cycle is used as a preconditioner for the FMGRES(60) iteration. On quadrilateral meshes high-order modes of the error are smoothed with a single iteration of a right ILU preconditioned GMRES solver while on distorted and graded triangular meshes we consider one and two smoothing iterations. On the coarsest level we employ the same solver but, instead of fixing the iteration number, we impose a four order of magnitude decrease of the relative residual norm, that is . The FGMRES iteration is forced to reach tight relative residual tolerance, in particular the linear system solution converges in iterations if at the -th iterate .
Pressure Schur Complement preconditioned Richardson solver
To approximatively invert the discrete vector Laplace operator and the pressure Schur complement appearing in the block factorization of , see Section 5, we employ a FGMRES(60) and a GMRES(60) solver, respectively. We set a two order of magnitude decrease of the relative residual norm and limit the maximum number of iterations to 2 and 40, respectively. The pressure Schur Complement GMRES solver is preconditioned with an ILU decomposition of the operator in (82). The FMGRES solver acting on the discrete Laplace operator is preconditioned with one and two iteration of multigrid V-cycle on quadrilateral and triangular grids, respectively. Smoothing options are the same of Section 6.1. The Richardson iteration is forced to reach a relative residual tolerance of .
We compare the solvers on the basis of convergence rate and computation time and we compare execution times with a Lower Upper (LU) decomposition direct solver and the ILU preconditioned GMRES(200) solver. All the numerical results have been computed by exploiting the PCMG multigrid preconditioner and the PCFIELDSPLIT block preconditioner frameworks available in the PETSc library [40, 37]. Provided that the iterative solver’s convergence criterion is satisfied and the LU factorisation is computed without running out of memory, the same error with respect to the exact solution is measured. For the dG discretization on the quadrilateral grid the error norm is on the order of and for velocity and pressure, respectively.
The number of iteration reported in Table 15 and Table 16 confirm uniform converge with respect to the number of levels for multigrid preconditioned solvers, both on quadrilateral and triangular mesh sequences. Nevertheless, while on uniform quadrilateral elements grids the convergence is grid independent, on distorted and graded triangular meshes the number of iterations increases on finer grids. Moreover, only on quadrilateral elements meshes increasing the polynomial degree entails less iterations. This can be better appreciated by inspecting the average residual decrease or convergence factor
[TABLE]
reported in Table 17. Convergence failure after 2000 ILU preconditioned GMRES iterations reads in Tables 15-16.
Looking at the wall clock times (solution times plus assembly times) reported in Table 18 it is clear that both multigrid preconditioned iterative solvers yield significant execution times gains with respect to direct solver on the quadrilateral mesh sequence. Since we get a four-to-five-fold increase of the total computation time with a four-fold increase of the mesh size at all the polynomials degrees, optimal multigrid efficiency is approached. Note that in case of the -multigrid preconditioned FGMRES solver the number of levels must be chosen large enough because of the poor performance of the coarse grid GMRES solver, even on relatively coarse meshes. In this regard uniform convergence with respect to the number of levels is highly beneficial.
The wall clock times measured on the triangular mesh sequence and reported in Table 19 demonstrate that only the preconditioned FGMRES solver allows to bit direct solvers. The performance of the Schur Complement block preconditioner is hit by the poor performance of the Schur complement subsolver which fails to lower the residual by two orders of magnitude in 40 iterations (we verified that increasing the maximum number of iteration beyond 40 is not beneficial in terms of execution times). As opposite the multigrid preconditioned FGMRES solver employed for the Laplace operator performs fairly well on distorted triangular meshes, see Table 4. Also the performance of -multigrid FGMRES degrade as compare to quadrilateral elements meshes: we observe a six-fold increase of the total computation time with a four-fold increase of the mesh size. Worsening of convergence factors is awaited given that the number of stretched triangles increases and their aspect ratios worsen when the mesh is refined, see Figure 2.
The comparison between direct solver and -multigrid FGMRES CPU times proposed in Table 20 confirms that strong gains can be attained by means of multilevel preconditioners, even on unstructured meshes composed of stretched and skewed elements. On fine enough uniform quadrilateral mesh solving a first degree dG discretization with an LU solver is comparable to solving a third degree dG discretization with a multigrid preconditioned FGMRES solver. Similarly, on a fine enough distorted triangular grids, and degree dG discretizations are comparable in terms of exacution time if solved with multigrid preconditioned FGMRES and direct LU solver, respectively.
Since the multigrid V-cycle preconditioner is the best performing and is reliable on low quality grids, in the next section the strategy will be applied for solving non-linear incompressible flow problems.
6.3 Navier-Stokes dG discretization
In this section we assess the performance of the multigrid preconditioned FGMRES solver applied to repeatedly solve the linearized system of equation in (21), as required for advancing in time the dG discretization of the incompressible Navier-Sokes equation in (14) by means of the Backward Euler method.
We consider the 2D Kovasznay and 2D-3D Lid-driven cavity problems, admitting a steady state solution at the Reynolds numbers considered in this work. Thus we tackle a real-life transient hemodynamic application: we consider the possibility to simulate the blood flow behavior all along the cardiac cycle in a 3D cerebral aneurysm geometry reconstructed from medical images.
We remark that the Backward Euler method can be modified to implement a pseudo-transient continuation strategy, see e.g. [44], that can be employed to seek steady state solutions of the incompressible Navier-Stokes equations. Roughly speaking it is sufficient to omit the while loop in Algorithm 1 (which is done for efficiency purposes since accuracy of the time integration is unnecessary) and introduce a time step adaptation strategy, e.g. the Successive Evolution Relaxation Strategy [45], which allows to progressively enlarge the pseudo time step (starting from a sufficiently small initial guess) when the steady state solution is approached. is a globalization of Newton method that guarantees convergence even when the tentative solution is far from the sought steady state solution. Moreover the favourable convergence rates of the Newton method can be exploited when the pseudo time step is large enough.
6.3.1 Kovasznay test case
To assess convergence with respect to the number of levels we consider the 2D Kovasznay problem [46] at Reynolds 40. Dirichlet boundary conditions are imposed according to the exact solution and we seek for the steady state solution starting from fluid at rest. Since the flow regime is diffusion dominated we set the initial pseudo-time step of the continuation strategy to a very large value () and fall back to pure Newton for the steady Navier-Stokes equations. We impose a four order of magnitude decrease of the relative residual norm at each Newton iteration: i.e. at the -th Newton iterate the -th iterate of the multigrid preconditioned FGMRES solver has congerged if . Convergence is achieved is six Newton iterations, the steady state solution is such that .
The smoothing and solver option for the multigrid preconditioner are the same that in the Stokes case. One iteration of cycle is used as a preconditioner for the FMGRES(60) iteration. Smoothing is performed with a single iteration of a right ILU preconditioned GMRES solver while for the ILU preconditioned GMRES solver on level we impose a four order of magnitude residual decrease. Besides the multilevel V-cycle iteration, for the Kovasznay test case we include the results obtained with the W-cycle iteration, see e.g. [33]. The V- and W-cycle iterations differ in terms of the coarse grid correction of Algorithm 2, as outlined below
[TABLE]
In order to investigate the growth of computational costs while increasing the mesh size, 2D solutions are computed on three uniform quadrilateral elements meshes of size of the bi-unit square domain . We check the influence of raising the polynomial degree on the convergence rate and the computational expense considering . To investigate the influence of the number of coarse levels on the convergence rate we consider .
Since we are considering the performance of a linear multigrid iteration applied to each of six Newton method steps required to reach the steady state solution all the numerical results presented in what follows are averaged over the six steps. The number of linear iterations reported in Table 21 and the convergence factors reported in Table 22 show that only the W-cycle iteration yields uniform convergence with respect to the number of levels. The influence of the number of levels on the V-cycle iteration is not dramatic but clearly noticeable. The number of iterations is not grid independent but moving from a to a quadrilateral elements mesh (a sixteen-fold increase of the number of elements) the iterations increase is less than two-fold. Also the polynomial degree dependence is mild: similarly to the Stokes case a slight worsening of the convergence rates is observed for .
The wall clock times comparison of Table 23 confirms that strong gains can be obtained as compared to the ILU preconditioned GMRES(200) iteration. Interestingly, even if the W-cycle iteration is the best performing in terms of convergence rates, the increased computational cost as compared to the V-cycle penalizes execution times. Similarly to the Stokes case we get a four-to-five fold increase of the computational cost with a four fold increase of the number of levels.
As observed for the Stokes problem, the assembly and solution wall clock times of Table 24 confirms that it is important to choose a sufficiently high number of levels not to get penalized by the poor performance of the ILU preconditioned GMRES coarse grid solver.
6.3.2 Lid-Driven cavity test case
To investigate the influence of the Reynolds number on the convergence rates and the performance in three space dimensions we consider the lid-driven cavity problem. We rely on a uniform quadrilateral and a uniform hexahedral grid of the unit square and the unit cube, respectively. We check the influence of raising the polynomial degree on the convergence rate considering in 2D but omitting in 3D. We consider two Reynolds numbers, and , in 2D and in 3D.
Since the Reynolds number is higher than in the Kovasznay case and we approach convection dominated flow regimes, we seek for a steady state solution starting from fluid at rest by means of of the pseudo-transient continuation strategy with SER time stepping. Besides adapting the time step, it is convenient to adapt the forcing terms, that is the relative relative tolerance triggering convergence of the linear system at each continuation step, we adopt the strategy proposed in [47]. The goal is to avoid oversolving of the linear system when the linearization of the residual is not sufficiently accurate to pay off in terms of convergence towards the steady state.
The smoothing and solver option for the multigrid preconditioner are the same that in the Stokes and Kovasznay case. One iteration of cycle is used as a preconditioner for the FMGRES(60) iteration. High-order modes of the error are smoothed with a single iteration of a right ILU preconditioned GMRES solver while we require a four order of magnitude residual decrease for the ILU preconditioned GMRES solver on level . In parallel computations ILU preconditioners are replaced with ASM preconditioners with one level of overlap, as we did for solving elliptic problems in parallel in Section 6.1.
The average and maximum convergence factors measured over the iterations are reported in Table 25. While the maximum convergence factors, usually observed in the terminal phase of the convergence (that is when the time step is large), are significantly affected by raising the Reynolds number, the average convergence factors are satisfactorily small at Reynold 5000. Interestingly increasing the polynomial degree is beneficial from the convergence rates viewpoint, very good performances are observed for .
Parallel 3D computations demonstrate that the convergence rates do not degrade, even if the number of mesh elements in each grid partition is remarkably small on the coarsest level, see Table 26. The trend observed in 2D is confirmed, raising the polynomial degree is advantageous from the convergence rate viewpoint. We remark that the third polynomial degree dG discretization on the hexahedral elements grid tops at approximatively 10M unknowns.
6.3.3 Cerebral aneurysm hemodynamics
In this section we apply the Backward Euler time integration strategy of Algorithm 1 to approximate the blood flow field in a pathological Internal Carotid Artery (ICA) reconstructed from medical images, see Figure 4.
In order to take into account the pulsatile flow behaviour Dirichlet boundary conditions are imposed at the circular inflow section relying on the Womersley analytical solution [48] and considering a physiological flow rate all along the cardiac cycle [49]. The average Reynolds number . Stress-free boundary conditions are imposed at the outflow section and no-slip boundary conditions are imposed at the vessel walls. We apply polynomial degree dG discretizations over the 270K hybrid grid generated with the open-source Vascular Modeling Toolkit (VMTK) [50]. Simulations are performed running in parallel on 16, 32 and 64 processes for first, second and third degree dG discretizations, respectively. The fixed time steps is chosen such that 150 numerical solution are computed in each cardiac cycle. The time integration strategy is initialized with fluid at rest and conducted for three cardiac cycles.
For solving the linearized systems of the BE method (21) one iteration of cycle is used as a preconditioner for the FMGRES(60) solver. High-order modes of the error are smoothed with a single iteration of an ASM preconditioned GMRES solver while we require a four order of magnitude residual decrease for the ASM preconditioned GMRES solver on level . ASM preconditioners employ one level of overlap between sub-domains and an ILU decompositions for each sub-domain matrix.
Table 27 reports the maximum and average convergence rates measured over the third cardiac cycle. Since the time step is fixed, the gap between maximum and average converge factors is narrower than in the pseudo-transient continuation strategy, cf. Table 26, and reflects the influence of varying the Reynolds number. In particular the maximum convergence factor is recorded during systole where convection is more pronounced as compared to diastole.
Even if the average convergence rates reported in Table 27 are less satisfactory that in the lid-driven cavity case, the fact that the linear system residual halves at each FGMRES iteration is a significant achievement. Hemodynamic computations are considered very challenging from the numerical solution viewpoint, to the point that even segregated Pressure Corrections strategies might require ad-hoc preconditioners [51].
7 Conclusions
This work demonstrates the feasibility and effectiveness of -multigrid preconditioners applied to high-order accurate dG discretizations of incompressible flow problems. In view of efficiency agglomeration based -multigrid strategies with inherited coarse grid operators are attractive because the expensive process of numerically integrating over agglomerated elements can be avoided in all but the preprocessing phase. Indeed, intergrid transfer operators can be computed once prior to the non-linear iteration, and stored for later use. In this work we introduced an effective strategy for improving performance of inherited coarse grid operators which exploits a rescaled Galerkin projection of the BR2 dG discretization stabilization term. Using a single iteration of preconditioned GMRES as smoothing strategy, the multigrid convergence is uniform with respect to the number of levels and the typical multigrid efficiency is closely approached on model problems. The ability to beat direct solvers on arbitrarily unstructured low quality grids and the appealing performance obtained on parallel real-life computations might revert the common belief that discontinuous Galerkin discretizations are more expensive to solve as compared to standard finite element and finite volume formulations.
Appendix A Implementation details: restriction of BR2 operators
We provide implementations details about the matrix-free implementation of the restriction of coarse grid operators. For the sake of brevity we consider inheritance of the BR2 bilinear forms, the Stokes and Navier-Stokes coarse grid operators can be obtained in a similar fashion. Note that BR2 coarse grid operators involve Galerkin projections for consistency terms, see Equation (71) and the rescaled Galerkin projection for the stabilization term, see Equation (72).
The matrices counterparts of the operators are sparse block matrices of size (the block size is ) composed of diagonal blocks and off-diagonal blocks . Off-diagonal blocks are responsible of the coupling between neighboring elements sharing a face . The coarse operators are obtained matrix-free as described in Algorithms 5 and 6.
Matrix restriction is performed contextually to fine matrix assembly so that stability term contributions , and consistency-symmetry terms contributions and , see Section 4.3.1, are restricted separately, before being collected into diagonal and off-diagonal blocks of the fine matrix.
It is interesting to remark that only a subset of the internal faces contributions on level is restricted on level . In particular we remark that all diagonal and off-diagonal contributions associated to facets that do not belong to the boundary of agglomerated elements on level are ignored in Algorithm 6. This optimization is permitted thanks to the local conservation properties of dG formulations.
Acknowledgements
We acknowledge the CINECA HPC facility for the availability of high performance computing resources and support within the agreement “Convenzione di Ateneo Università degli Studi di Bergamo”.
References
- [1]
M. Benzi, G. H. Golub, J. Liesen, Numerical solution of saddle point problems, ACTA NUMERICA 14 (2005) 1–137.
- [2]
K. J. Fidkowski, T. A. Oliver, J. Lu, D. L. Darmofal, p-multigrid solution of high-order discontinuous Galerkin discretizations of the compressible Navier-Stokes equations, J. Comput. Phys. 207 (1) (2005) 92–113.
doi:10.1016/j.jcp.2005.01.005.
- [3]
C. R. Nastase, D. J. Mavriplis, High-order discontinuous Galerkin methods using an hp-multigrid approach, Journal of Computational Physics 213 (1) (2006) 330 – 357.
doi:10.1016/j.jcp.2005.08.022.
- [4]
F. Bassi, A. Ghidoni, S. Rebay, P. Tesini, High-order accurate p-multigrid discontinuous Galerkin solution of the Euler equations, International Journal for Numerical Methods in Fluids 60 (8) (2009) 847–865.
- [5]
K. Shahbazi, D. J. Mavriplis, N. K. Burgess, Multigrid algorithms for high-order discontinuous Galerkin discretizations of the compressible Navier–Stokes equations, Journal of Computational Physics 228 (21) (2009) 7917–7940.
doi:10.1016/j.jcp.2009.07.013.
- [6]
J. Gopalakrishnan, G. Kanschat, A multilevel discontinuous Galerkin method, Numerische Mathematik 95 (3) (2003) 527–550.
- [7]
S. Brenner, J. Cui, T. Gudi, L.-Y. Sung, Multigrid algorithms for symmetric discontinuous Galerkin methods on graded meshes, Numerische Mathematik 119 (1) (2011) 21–47.
doi:10.1007/s00211-011-0379-y.
- [8]
P. F. Antonietti, M. Sarti, M. Verani, Multigrid algorithms for hp-discontinuous Galerkin discretizations of elliptic problems, SIAM Journal on Numerical Analysis 53 (1) (2015) 598–618.
- [9]
F. Prill, M. Lukáčová-Medvidová, R. Hartmann, Smoothed aggregation multigrid for the Discontinuous Galerkin method, SIAM Journal on Scientific Computing 31 (5) (2009) 3503–3528.
- [10]
P. F. Antonietti, S. Giani, P. Houston, -version composite discontinuous Galerkin methods for elliptic problems on complicated domains, SIAM Journal on Scientific Computing 35 (3) (2013) A1417–A1439.
- [11]
P. F. Antonietti, S. Giani, P. Houston, Domain decomposition preconditioners for discontinuous Galerkin methods for elliptic problems on complicated domains, Journal of Scientific Computing 60 (1) (2014) 203–227.
doi:10.1007/s10915-013-9792-y.
- [12]
P. F. Antonietti, P. Houston, I. Smears, A note on optimal spectral bounds for nonoverlapping domain decomposition preconditioners for hp-version discontinuous Galerkin methods, International Journal of Numerical Analysis and Modeling 13 (4) (2016) 513–524.
- [13]
P. F. Antonietti, P. Houston, X. Hu, M. Sarti, M. Verani, Multigrid algorithms for hp-version Interior Penalty Discontinuous Galerkin methods on polygonal and polyhedral meshes, eprint arXiv:1412.0913, submitted for publication.
- [14]
M. Wallraff, T. Leicht, Higher order multigrid algorithms for a discontinuous Galerkin RANS solver, in: 52nd Aerospace Sciences Meeting, no. 936 in AIAA SciTech, American Institute of Aeronautics and Astronautics, 2014, pp. 1055–1072.
- [15]
M. Wallraff, R. Hartmann, T. Leicht, Multigrid solver algorithms for DG methods and applications to aerodynamic flows, in: N. Kroll, C. Hirsch, F. Bassi, C. Johnston, K. Hillewaert (Eds.), IDIHOM: Industrialization of High-Order Methods - A Top-Down Approach, Vol. 128 of Notes on Numerical Fluid Mechanics and Multidisciplinary Design, Springer International Publishing, 2015, pp. 153–178.
doi:10.1007/978-3-319-12886-3_9.
- [16]
F. Bassi, L. Botti, A. Colombo, D. A. Di Pietro, P. Tesini, On the flexibility of agglomeration based physical space discontinuous Galerkin discretizations, Journal of Computational Physics 231 (1) (2012) 45 – 65.
doi:10.1016/j.jcp.2011.08.018.
- [17]
F. Bassi, L. Botti, A. Colombo, Agglomeration based physical frame dG discretizations: An attempt to be mesh free, Mathematical Models and Methods in Applied Sciences 24 (08) (2014) 1495–1539.
doi:10.1142/S0218202514400028.
- [18]
A. Cangiani, E. H. Georgoulis, P. Houston, hp-version discontinuous Galerkin methods on polygonal and polyhedral meshes, Mathematical Models and Methods in Applied Sciences 24 (10) (2014) 2009–2041.
doi:10.1142/S0218202514500146.
- [19]
S. Giani, P. Houston, hp-adaptive composite discontinuous Galerkin methods for elliptic problems on complicated domains, Numerical Methods for Partial Differential Equations 30 (4) (2014) 1342–1367.
- [20]
A. Cangiani, Z. Dong, E. H. Georgoulis, P. Houston, hp-version discontinuous Galerkin methods for advection-diffusion-reaction problems on polytopic meshes, Mathematical Modelling and Numerical Analysis 50 (3) (2016) 699–725.
- [21]
F. Bassi, S. Rebay, G. Mariotti, S. Pedinotti, M. Savini, A high-order accurate discontinuous finite element method for inviscid and viscous turbomachinery flows, in: R. Decuypere, G. Dibelius (Eds.), Proceedings of the 2nd European Conference on Turbomachinery Fluid Dynamics and Thermodynamics, Technologisch Instituut, Antwerpen, Belgium, 1997, pp. 99–108.
- [22]
I. Moulitsas, G. Karypis, MGridGen/ParmGridGen, Serial/Parallel library for generating coase meshes for multigrid methods, Technical Report Version 1.0, University of Minnesota, Department of Computer Science/Army HPC Research Center, http://www-users.cs.umn.edu/$\sim$moulitsa/software.html (2001).
- [23]
L. Botti, Influence of reference-to-physical frame mappings on approximation properties of discontinuous piecewise polynomial spaces, Journal of Scientific Computing 52 (3) (2012) 675–703.
- [24]
F. Bassi, L. Botti, A. Colombo, S. Rebay, Agglomeration based discontinuous Galerkin discretization of the Euler and Navier-Stokes equations, Computers & Fluids 61 (2012) 77–85.
doi:10.1016/j.compfluid.2011.11.002.
- [25]
S. C. Brenner, L. R. Scott, The Mathematical Theory of Finite Element Methods, 3rd Edition, Springer-Verlag, New York–Berlin–Heidelberg, 2008.
- [26]
D. A. Di Pietro, A. Ern, Mathematical Aspects of Discontinuous Galerkin Methods, Vol. 69 of Maths & Applications, Springer-Verlag, 2011.
- [27]
F. Brezzi, G. Manzini, D. Marini, P. Pietra, A. Russo, Discontinuous Galerkin approximations for elliptic problems, Numer. Methods Partial Differential Equations 16 (2000) 365–378.
- [28]
D. N. Arnold, F. Brezzi, B. Cockburn, D. Marini, Unified analysis of discontinuous Galerkin methods for elliptic problems, SIAM J. Numer. Anal. 39 (5) (2002) 1749–1779.
- [29]
F. Bassi, A. Crivellini, D. A. Di Pietro, S. Rebay, An artificial compressibility flux for the discontinuous Galerkin solution of the incompressible Navier-Stokes equations, J. Comput. Phys. 218 (2006) 794–815.
- [30]
D. A. Di Pietro, Analysis of a discontinuous Galerkin approximation of the Stokes problem based on an artificial compressibility flux, International Journal for Numerical Methods in Fluids 55 (8) (2007) 793–813.
- [31]
W. L. Briggs, V. E. Henson, S. F. McCormick, A multigrid tutorial (2nd ed.), Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.
- [32]
B. F. Smith, P. E. Bjørstad, W. Gropp, Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, Cambridge University Press, 2004.
- [33]
U. Trottenberg, C. W. Oosterlee, A. Schüller, Multigrid, Academic Press, Inc., 2001.
- [34]
P. F. Antonietti, B. A. de Dios, S. C. Brenner, L. yeng Sung, Schwarz methods for a preconditioned WOPSIP method for elliptic problems, Computational Methods in Applied Mathematics Comput. Methods Appl. Math. 12 (3) (2012) 241–272.
- [35]
D. Schötzau, C. Schwab, A. Toselli, Mixed hp-DGFEM for incompressible flows, SIAM Journal on Numerical Analysis 40 (6) (2002) 2171–2194.
doi:10.1137/S0036142901399124.
- [36]
K. Shahbazi, P. F. Fischer, C. R. Ethier, A high-order Discontinuous Galerkin method for the unsteady incompressible Navier-Stokes equations, J. Comput. Phys. 222 (1) (2007) 391–407.
doi:10.1016/j.jcp.2006.07.029.
- [37]
S. Balay, S. Abhyankar, M. F. Adams, J. Brown, P. Brune, K. Buschelman, L. Dalcin, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, K. Rupp, B. F. Smith, S. Zampini, H. Zhang, H. Zhang, PETSc users manual, Tech. Rep. ANL-95/11 - Revision 3.7, Argonne National Laboratory (2016).
URL http://www.mcs.anl.gov/petsc
- [38]
T. Toulorge, C. Geuzaine, J.-F. Remacle, J. Lambrechts, Robust untangling of curvilinear meshes, Journal of Computational Physics 254 (2013) 8 – 26.
doi:10.1016/j.jcp.2013.07.022.
- [39]
Y. Saad, A flexible inner-outer preconditioned GMRES algorithm, SIAM Journal on Scientific Computing 14 (2) (1993) 461–469.
- [40]
S. Balay, S. Abhyankar, M. F. Adams, J. Brown, P. Brune, K. Buschelman, L. Dalcin, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, K. Rupp, B. F. Smith, S. Zampini, H. Zhang, H. Zhang, PETSc Web page, http://www.mcs.anl.gov/petsc (2016).
URL http://www.mcs.anl.gov/petsc
- [41]
S. Balay, W. D. Gropp, L. C. McInnes, B. F. Smith, Efficient management of parallelism in object oriented numerical software libraries, in: E. Arge, A. M. Bruaset, H. P. Langtangen (Eds.), Modern Software Tools in Scientific Computing, Birkhäuser Press, 1997, pp. 163–202.
- [42]
T. J. Tautges, R. Meyers, K. Merkley, C. Stimpson, C. Ernst, MOAB: a mesh-oriented database, SAND2004-1592, Sandia National Laboratories, report (Apr. 2004).
- [43]
G. Karypis, V. Kumar, METIS, a software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices, Technical Report Version 4.0, University of Minnesota, Department of Computer Science/Army HPC Research Center (1998).
- [44]
C. Kelley, D. Keyes, Convergence analysis of pseudo-transient continuation, SIAM Journal on Numerical Analysis 35 (2) (1998) 508–523.
doi:10.1137/S0036142996304796.
- [45]
W. A. Mulder, B. Van Leer, Experiments with implicit upwind methods for the Euler equations, Journal of Computational Physics 59 (2) (1985) 232–246.
- [46]
L. I. G. Kovasznay, Laminar flow behind a two-dimensional grid, Mathematical Proceedings of the Cambridge Philosophical Society 44 (1948) 58–62.
doi:10.1017/S0305004100023999.
- [47]
L. Botti, A choice of forcing terms in inexact Newton iterations with application to pseudo-transient continuation for incompressible fluid flow computations, Applied Mathematics and Computation 266 (2015) 713 – 737.
doi:10.1016/j.amc.2015.05.136.
- [48]
J. R. Womersley, Method for the calculation of velocity, rate of flow and viscous drag in arteries when the pressure gradient is known, The Journal of Physiology 127 (3) (1955) 553–563.
- [49]
J. L. Cezeaux, A. van Grondelle, Accuracy of the inverse Womersley method for the calculation of hemodynamic variable, Annals of Biomedical Engineering 25 (3) (1997) 536–546.
- [50]
L. Antiga, M. Piccinelli, L. Botti, B. Ene-Iordache, A. Remuzzi, D. A. Steinman, An image-based modeling framework for patient-specific computational hemodynamics, Medical & Biological Engineering & Computing 46 (11) (2008) 1097–1112.
doi:10.1007/s11517-008-0420-1.
- [51]
F. Mut, R. Aubry, R. Löhner, J. R. Cebral, Fast numerical solutions of patient-specific blood flows in 3d arterial systems, International Journal for Numerical Methods in Biomedical Engineering 26 (1) (2010) 73–85.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] M. Benzi, G. H. Golub, J. Liesen, Numerical solution of saddle point problems, ACTA NUMERICA 14 (2005) 1–137.
- 2[2] K. J. Fidkowski, T. A. Oliver, J. Lu, D. L. Darmofal, p-multigrid solution of high-order discontinuous Galerkin discretizations of the compressible Navier-Stokes equations, J. Comput. Phys. 207 (1) (2005) 92–113. doi:10.1016/j.jcp.2005.01.005 . · doi ↗
- 3[3] C. R. Nastase, D. J. Mavriplis, High-order discontinuous Galerkin methods using an hp-multigrid approach, Journal of Computational Physics 213 (1) (2006) 330 – 357. doi:10.1016/j.jcp.2005.08.022 . · doi ↗
- 4[4] F. Bassi, A. Ghidoni, S. Rebay, P. Tesini, High-order accurate p-multigrid discontinuous Galerkin solution of the Euler equations, International Journal for Numerical Methods in Fluids 60 (8) (2009) 847–865. doi:10.1002/fld.1917 . · doi ↗
- 5[5] K. Shahbazi, D. J. Mavriplis, N. K. Burgess, Multigrid algorithms for high-order discontinuous Galerkin discretizations of the compressible Navier–Stokes equations, Journal of Computational Physics 228 (21) (2009) 7917–7940. doi:10.1016/j.jcp.2009.07.013 . · doi ↗
- 6[6] J. Gopalakrishnan, G. Kanschat, A multilevel discontinuous Galerkin method, Numerische Mathematik 95 (3) (2003) 527–550. doi:10.1007/s 002110200392 . · doi ↗
- 7[7] S. Brenner, J. Cui, T. Gudi, L.-Y. Sung, Multigrid algorithms for symmetric discontinuous Galerkin methods on graded meshes, Numerische Mathematik 119 (1) (2011) 21–47. doi:10.1007/s 00211-011-0379-y . · doi ↗
- 8[8] P. F. Antonietti, M. Sarti, M. Verani, Multigrid algorithms for hp-discontinuous Galerkin discretizations of elliptic problems, SIAM Journal on Numerical Analysis 53 (1) (2015) 598–618. doi:10.1137/130947015 . · doi ↗
