#### Prof. Dr. Axel Klawonn

associated member

Mathematical Institute

University of Cologne

##### Contact

- axel[dot]klawonn[at]uni-koeln[dot]de
- +49 221 4707868
- personal website

##### Hub

**Combining machine learning and domain decomposition methods for the solution of partial differential equations—A review**

Heinlein, A. and Klawonn, A. and Lanser, M. and Weber, J. *GAMM Mitteilungen* 44 (2021)

Scientific machine learning (SciML), an area of research where techniques from machine learning and scientific computing are combined, has become of increasing importance and receives growing attention. Here, our focus is on a very specific area within SciML given by the combination of domain decomposition methods (DDMs) with machine learning techniques for the solution of partial differential equations. The aim of the present work is to make an attempt of providing a review of existing and also new approaches within this field as well as to present some known results in a unified framework; no claim of completeness is made. As a concrete example of machine learning enhanced DDMs, an approach is presented which uses neural networks to reduce the computational effort in adaptive DDMs while retaining their robustness. More precisely, deep neural networks are used to predict the geometric location of constraints which are needed to define a robust coarse space. Additionally, two recently published deep domain decomposition approaches are presented in a unified framework. Both approaches use physics-constrained neural networks to replace the discretization and solution of the subdomain problems of a given decomposition of the computational domain. Finally, a brief overview is given of several further approaches which combine machine learning with ideas from DDMs to either increase the performance of already existing algorithms or to create completely new methods. © 2021 The Authors. GAMM - Mitteilungen published by Wiley-VCH GmbH.

DOI: 10.1002/gamm.202100001

**Estimating the time-dependent contact rate of SIR and SEIR models in mathematical epidemiology using physics-informed neural networks**

Grimm, V. and Heinlein, A. and Klawonn, A. and Lanser, M. and Weber, J. *Electronic Transactions on Numerical Analysis* 56 (2021)

The course of an epidemic can often be successfully described mathematically using compartment models. These models result in a system of ordinary differential equations. Two well-known examples are the SIR and the SEIR models. The transition rates between the different compartments are defined by certain parameters that are specific for the respective virus. Often, these parameters are known from the literature or can be determined using statistics. However, the contact rate or the related effective reproduction number are in general not constant in time and thus cannot easily be determined. Here, a new machine learning approach based on physics-informed neural networks is presented that can learn the contact rate from given data for the dynamical systems given by the SIR and SEIR models. The new method generalizes an already known approach for the identification of constant parameters to the variable or time-dependent case. After introducing the new method, it is tested for synthetic data generated by the numerical solution of SIR and SEIR models. The case of exact and perturbed data is considered. In all cases, the contact rate can be learned very satisfactorily. Finally, the SEIR model in combination with physics-informed neural networks is used to learn the contact rate for COVID-19 data given by the course of the epidemic in Germany. The simulation of the number of infected individuals over the course of the epidemic, using the learned contact rate, shows a very promising accordance with the data. Copyright © 2022, Kent State University.

DOI: 10.1553/etna_vol56s1

**Fully Algebraic Two-Level Overlapping Schwarz Preconditioners for Elasticity Problems**

Heinlein, A. and Hochmuth, C. and Klawonn, A. *Lecture Notes in Computational Science and Engineering* 139 (2021)

Different parallel two-level overlapping Schwarz preconditioners with Generalized Dryja–Smith–Widlund (GDSW) and Reduced dimension GDSW (RGDSW) coarse spaces for elasticity problems are considered. GDSW type coarse spaces can be constructed from the fully assembled system matrix, but they additionally need the index set of the interface of the corresponding nonoverlapping domain decomposition and the null space of the elasticity operator, i.e., the rigid body motions. In this paper, fully algebraic variants, which are constructed solely from the uniquely distributed system matrix, are compared to the classical variants which make use of this additional information; the fully algebraic variants use an approximation of the interface and an incomplete algebraic null space. Nevertheless, the parallel performance of the fully algebraic variants is competitive compared to the classical variants for a stationary homogeneous model problem and a dynamic heterogeneous model problem with coefficient jumps in the shear modulus; the largest parallel computations were performed on 4096 MPI (Message Passing Interface) ranks. The parallel implementations are based on the Trilinos package FROSch. © 2021, Springer Nature Switzerland AG.

DOI: 10.1007/978-3-030-55874-1_52

**Fully-coupled micro–macro finite element simulations of the Nakajima test using parallel computational homogenization**

Klawonn, A. and Lanser, M. and Rheinbach, O. and Uran, M. *Computational Mechanics* 68 (2021)

The Nakajima test is a well-known material test from the steel and metal industry to determine the forming limit of sheet metal. It is demonstrated how FE2TI, our highly parallel scalable implementation of the computational homogenization method FE², can be used for the simulation of the Nakajima test. In this test, a sample sheet geometry is clamped between a blank holder and a die. Then, a hemispherical punch is driven into the specimen until material failure occurs. For the simulation of the Nakajima test, our software package FE2TI has been enhanced with a frictionless contact formulation on the macroscopic level using the penalty method. The appropriate choice of suitable boundary conditions as well as the influence of symmetry assumptions regarding the symmetric test setup are discussed. In order to be able to solve larger macroscopic problems more efficiently, the balancing domain decomposition by constraints (BDDC) approach has been implemented on the macroscopic level as an alternative to a sparse direct solver. To improve the computational efficiency of FE2TI even further, additionally, an adaptive load step approach has been implemented and different extrapolation strategies are compared. Both strategies yield a significant reduction of the overall computing time. Furthermore, a strategy to dynamically increase the penalty parameter is presented which allows to resolve the contact conditions more accurately without increasing the overall computing time too much. Numerically computed forming limit diagrams based on virtual Nakajima tests are presented. © 2021, The Author(s).

DOI: 10.1007/s00466-021-02063-9

**Machine Learning in Adaptive FETI-DP: Reducing the Effort in Sampling**

Heinlein, A. and Klawonn, A. and Lanser, M. and Weber, J. *Lecture Notes in Computational Science and Engineering* 139 (2021)

The convergence rate of classic domain decomposition methods in general deteriorates severely for large discontinuities in the coefficient functions of the considered partial differential equation. To retain the robustness for such highly heterogeneous problems, the coarse space can be enriched by additional coarse basis functions. These can be obtained by solving local generalized eigenvalue problems on subdomain edges. In order to reduce the number of eigenvalue problems and thus the computational cost, we use a neural network to predict the geometric location of critical edges, i.e., edges where the eigenvalue problem is indispensable. As input data for the neural network, we use function evaluations of the coefficient function within the two subdomains adjacent to an edge. In the present article, we examine the effect of computing the input data only in a neighborhood of the edge, i.e., on slabs next to the edge. We show numerical results for both the training data as well as for a concrete test problem in form of a microsection subsection for linear elasticity problems. We observe that computing the sampling points only in one half or one quarter of each subdomain still provides robust algorithms. © 2021, Springer Nature Switzerland AG.

DOI: 10.1007/978-3-030-55874-1_58

**Stationary Flow Predictions Using Convolutional Neural Networks**

Eichinger, M. and Heinlein, A. and Klawonn, A. *Lecture Notes in Computational Science and Engineering* 139 (2021)

Computational Fluid Dynamics (CFD) simulations are a numerical tool to model and analyze the behavior of fluid flow. However, accurate simulations are generally very costly because they require high grid resolutions. In this paper, an alternative approach for computing flow predictions using Convolutional Neural Networks (CNNs) is described; in particular, a classical CNN as well as the U-Net architecture are used. First, the networks are trained in an expensive offline phase using flow fields computed by CFD simulations. Afterwards, the evaluation of the trained neural networks is very cheap. Here, the focus is on the dependence of the stationary flow in a channel on variations of the shape and the location of an obstacle. CNNs perform very well on validation data, where the averaged error for the best networks is below 3%. In addition to that, they also generalize very well to new data, with an averaged error below 10%. © 2021, Springer Nature Switzerland AG.

DOI: 10.1007/978-3-030-55874-1_53

**Topical issue scientific machine learning (2/2)**

Benner, P. and Klawonn, A. and Stoll, M. *GAMM Mitteilungen* 44 (2021)

DOI: 10.1002/gamm.202100010

**A Closer Look at Local Eigenvalue Solvers for Adaptive FETI-DP and BDDC**

Klawonn, A. and Kühn, M.J. and Rheinbach, O. *Lecture Notes in Computational Science and Engineering* 138 (2020)

In order to obtain a scalable domain decomposition method (DDM) for elliptic problems, a coarse space is necessary and an associated coarse problem has to be solved in each iteration. In the presence of arbitrary, large coefficient jumps or in case of almost incompressible elastic materials, the convergence rate of standard DDM deteriorates. © 2020, Springer Nature Switzerland AG.

DOI: 10.1007/978-3-030-56750-7_26

**A frugal FETI-DP and BDDC coarse space for heterogeneous problems**

Heinlein, A. and Klawonn, A. and Lanser, M. and Weber, J. *Electronic Transactions on Numerical Analysis* 53 (2020)

The convergence rate of domain decomposition methods is generally determined by the eigenvalues of the preconditioned system. For second-order elliptic partial differential equations, coefficient discontinuities with a large contrast can lead to a deterioration of the convergence rate. Only by implementing an appropriate coarse space, or second level, a robust domain decomposition method can be obtained. In this article, a new frugal coarse space for FETI-DP (Finite Element Tearing and Interconnecting-Dual Primal) and BDDC (Balancing Domain Decomposition by Constraints) methods is presented, which has a lower set-up cost than competing adaptive coarse spaces. In particular, in contrast to adaptive coarse spaces, it does not require the solution of any local generalized eigenvalue problems. The approach considered here aims at a low-dimensional approximation of the adaptive coarse space by using appropriate weighted averages, and it is robust for a broad range of coefficient distributions for diffusion and elasticity problems. However, in general, for completely arbitrary coefficient distributions with high contrast, some additional, adaptively chosen constraints are necessary in order to guarantee robustness. In this article, the robustness is heuristically justified as well as numerically shown for several coefficient distributions. The new coarse space is compared to adaptive coarse spaces, and parallel scalability up to 262 144 parallel cores for a parallel BDDC implementation with the new coarse space is shown. The superiority of the new coarse space over classic coarse spaces with respect to parallel weak scalability and time-to-solution is confirmed by numerical experiments. Since the new frugal coarse space is computationally inexpensive, it could serve as a new default coarse space, which, for very challenging coefficient distributions, could then still be enhanced by adaptively chosen constraints. Copyright © 2020, Kent State University.

DOI: 10.1553/ETNA_VOL53S562

**Coarse spaces for FETI-DP and BDDC methods for heterogeneous problems: Connections of deflation and a generalized transformation-of-basis approach**

Klawonn, A. and Kühn, M. and Rheinbach, O. *Electronic Transactions on Numerical Analysis* 52 (2020)

In FETI-DP (Finite Element Tearing and Interconnecting) and BDDC (Balancing Domain Decomposition by Constraints) domain decomposition methods, the convergence behavior of the iterative scheme can be improved by implementing a coarse space using a transformation of basis and local assembly. This is an alternative to coarse spaces implemented by deflation or balancing. The transformation-of-basis approaches are more robust with respect to inexact solvers than deflation and therefore more suitable for multilevel extensions. In this paper, we show a correspondence of FETI-DP or BDDC methods using a generalized transformation-of-basis approach and of FETI-DP methods using deflation or balancing, where the deflation vectors are obtained from the transformation of basis. These methods then have essentially the same eigenvalues. As opposed to existing theory, this result also applies to general scalings and highly heterogeneous problems. We note that the new methods differ slightly from the classic FETI-DP and BDDC methods using a transformation of basis and that the classic theory has to be replaced. An important application for the theory presented in this paper are FETI-DP and BDDC methods with adaptive coarse spaces, i.e., where deflation vectors are obtained from approximating local eigenvectors. These methods have recently gained considerable interest. © 2020 Kent State University.

DOI: 10.1553/etna_vol52s43

**Computational homogenization with million-way parallelism using domain decomposition methods**

Klawonn, A. and Köhler, S. and Lanser, M. and Rheinbach, O. *Computational Mechanics* 65 (2020)

Parallel computational homogenization using the well-known FE² approach is described and combined with domain decomposition and algebraic multigrid solvers. It is the purpose of this paper to show that and how the FE² method can take advantage of the largest supercomputers available and those of the upcoming exascale era for virtual material testing of micro-heterogeneous materials such as advanced steel. The FE² method is a computational micro-macro homogenization approach where at each Gauss integration point of the macroscopic finite element problem a microscopic finite element problem, defined on a representative volume element (RVE), is attached. Note that the FE² method is not embarrassingly parallel since the RVE problems are coupled through the macroscopic problem. Numerical results considering different grids on both the macroscopic and microscopic level as well as weak scaling results for up to a million parallel processes are presented. © 2019, Springer-Verlag GmbH Germany, part of Springer Nature.

DOI: 10.1007/s00466-019-01749-5

**Energy efficiency of nonlinear domain decomposition methods**

Klawonn, A. and Lanser, M. and Rheinbach, O. and Wellein, G. and Wittmann, M. *International Journal of High Performance Computing Applications* (2020)

A nonlinear domain decomposition (DD) solver is considered with respect to improved energy efficiency. In this method, nonlinear problems are solved using Newton’s method on the subdomains in parallel and in asynchronous iterations. The method is compared to the more standard Newton-Krylov approach, where a linear domain decomposition solver is applied to the overall nonlinear problem after linearization using Newton’s method. It is found that in the nonlinear domain decomposition method, making use of the asynchronicity, some processor cores can be set to sleep to save energy and to allow better use of the power and thermal budget. Energy savings on average for each socket up to 77% (due to the RAPL hardware counters) are observed compared to the more traditional Newton-Krylov approach, which is synchronous by design, using up to 5120 Intel Broadwell (Xeon E5-2630v4) cores. The total time to solution is not affected. On the contrary, remaining cores of the same processor may be able to go to turbo mode, thus reducing the total time to solution slightly. Last, we consider the same strategy for the ASPIN (Additive Schwarz Preconditioned Inexact Newton) nonlinear domain decomposition method and observe a similar potential to save energy. © The Author(s) 2020.

DOI: 10.1177/1094342020953891

**Exasteel: Towards a virtual laboratory for the multiscale simulation of dual-phase steel using high-performance computing**

Klawonn, A. and Lanser, M. and Uran, M. and Rheinbach, O. and Köhler, S. and Schröder, J. and Scheunemann, L. and Brands, D. and Balzani, D. and Gandhi, A. and Wellein, G. and Wittmann, M. and Schenk, O. and Janalík, R. *Lecture Notes in Computational Science and Engineering* 136 (2020)

We present a numerical two-scale simulation approach of the Nakajima test for dual-phase steel using the software package FE2TI, a highly scalable implementation of the well known homogenization method FE². We consider the incorporation of contact constraints using the penalty method as well as the sample sheet geometries and adequate boundary conditions. Additional software features such as a simple load step strategy and prediction of an initial value by linear extrapolation are introduced. The macroscopic material behavior of dual-phase steel strongly depends on its microstructure and has to be incorporated for an accurate solution. For a reasonable computational effort, the concept of statistically similar representative volume elements (SSRVEs) is presented. Furthermore, the highly scalable nonlinear domain decomposition methods NL-FETI-DP and nonlinear BDDC are introduced and weak scaling results are shown. These methods can be used, e.g., for the solution of the microscopic problems. Additionally, some remarks on sparse direct solvers are given, especially to PARDISO. Finally, we come up with a computationally derived Forming Limit Curve (FLC). © The Author(s) 2020.

DOI: 10.1007/978-3-030-47956-5_13

**FROSch: A Fast And Robust Overlapping Schwarz Domain Decomposition Preconditioner Based on Xpetra in Trilinos**

Heinlein, A. and Klawonn, A. and Rajamanickam, S. and Rheinbach, O. *Lecture Notes in Computational Science and Engineering* 138 (2020)

This article describes a parallel implementation of a two-level overlapping Schwarz preconditioner with the GDSW (Generalized Dryja–Smith–Widlund) coarse space described in previous work [12, 10, 15] into the Trilinos framework; cf. [16]. The software is a significant improvement of a previous implementation [12]; see Sec. 4 for results on the improved performance. © 2020, Springer Nature Switzerland AG.

DOI: 10.1007/978-3-030-56750-7_19

**Local Spectra of Adaptive Domain Decomposition Methods**

Heinlein, A. and Klawonn, A. and Kühn, M.J. *Lecture Notes in Computational Science and Engineering* 138 (2020)

For second order elliptic partial differential equations, such as diffusion or elasticity, with arbitrary and high coefficient jumps, the convergence rate of domain decomposition methods with classical coarse spaces typically deteriorates. One remedy is the use of adaptive coarse spaces, which use eigenfunctions computed from local generalized eigenvalue problems to enrich the standard coarse space; see, e.g., [19, 6, 5, 4, 22, 23, 3, 16, 17, 14, 7, 8, 24, 1, 20, 2, 13, 21, 10, 9, 11]. This typically results in a condition number estimate of the form ... © 2020, Springer Nature Switzerland AG.

DOI: 10.1007/978-3-030-56750-7_18

**Machine Learning in Adaptive FETI-DP – A Comparison of Smart and Random Training Data**

Heinlein, A. and Klawonn, A. and Lanser, M. and Weber, J. *Lecture Notes in Computational Science and Engineering* 138 (2020)

The convergence rate of classical domain decomposition methods for diffusion or elasticity problems usually deteriorates when large coefficient jumps occur along or across the interface between subdomains. In fact, the constant in the classical condition number bounds [11, 12] will depend on the coefficient jump. © 2020, Springer Nature Switzerland AG.

DOI: 10.1007/978-3-030-56750-7_24

**Parallel adaptive FETI-DP using lightweight asynchronous dynamic load balancing**

Klawonn, A. and Kühn, M.J. and Rheinbach, O. *International Journal for Numerical Methods in Engineering* 121 (2020)

A parallel FETI-DP domain decomposition method using an adaptive coarse space is presented. The implementation builds on a recently introduced adaptive FETI-DP approach for elliptic problems in three dimensions and uses small, local eigenvalue problems for faces and, additionally, for a small number of edges. The condition number of the preconditioned operator then satisfies a bound that is independent of coefficient heterogeneities in the problem. The computational cost of the local eigenvalue problems is not negligible, and also a significant load imbalance can be introduced. As a remedy, certain eigenvalue problems are discarded by a theory-guided heuristic strategy, based on the diagonal entries of the stiffness matrices. Additionally, a lightweight pairwise dynamic load balancing strategy is implemented for the eigenvalue problems. The load balancing is supervised by an orchestrating rank using asynchronous point-to-point communication. The resulting method shows good weak and strong scalability up to thousands of cores while fast convergence is obtained even for heterogeneous problems. © 2019 John Wiley & Sons, Ltd.

DOI: 10.1002/nme.6237

**Reduced dimension GDSW coarse spaces for monolithic Schwarz domain decomposition methods for incompressible fluid flow problems**

Heinlein, A. and Hochmuth, C. and Klawonn, A. *International Journal for Numerical Methods in Engineering* 121 (2020)

Monolithic preconditioners for incompressible fluid flow problems can significantly improve the convergence speed compared with preconditioners based on incomplete block factorizations. However, the computational costs for the setup and the application of monolithic preconditioners are typically higher. In this article, several techniques are applied to monolithic two-level generalized Dryja-Smith-Widlund (GDSW) preconditioners to further improve the convergence speed and the computing time. In particular, reduced dimension GDSW coarse spaces, restricted and scaled versions of the first level, hybrid, and parallel coupling of the levels, and recycling strategies are investigated. Using a combination of all these improvements, for a small time-dependent Navier-Stokes problem on 240 message passing interface (MPI) ranks, a reduction of 86% of the time-to-solution can be obtained. Even without applying recycling strategies, the time-to-solution can be reduced by more than 50% for a larger steady Stokes problem on 4608 MPI ranks. For the largest problems with 11 979 MPI ranks, the scalability deteriorates drastically for the monolithic GDSW coarse space. On the other hand, using the reduced dimension coarse spaces, good scalability up to 11 979 MPI ranks, which corresponds to the largest problem configuration fitting on the employed supercomputer, could be achieved. © 2019 The Authors. International Journal for Numerical Methods in Engineering published by John Wiley & Sons, Ltd.

DOI: 10.1002/nme.6258

**A Three-Level Extension of the GDSW Overlapping Schwarz Preconditioner in Two Dimensions**

Heinlein, A. and Klawonn, A. and Rheinbach, O. and Röver, F. *Lecture Notes in Computational Science and Engineering* 128 (2019)

A three-level extension of the GDSW overlapping Schwarz preconditioner in two dimensions is presented, constructed by recursively applying the GDSW preconditioner to the coarse problem. Numerical results, obtained for a parallel implementation using the Trilinos software library, are presented for up to 90,000 cores of the JUQUEEN supercomputer. The superior weak parallel scalability of the three-level method is verified. For large problems and a large number of cores, the three-level method is faster by more than a factor of two, compared to the standard two-level method. The three-level method can also be expected to scale when the classical method will already be out-of-memory. © Springer Nature Switzerland AG 2019.

DOI: 10.1007/978-3-030-14244-5_10

**Adaptive GDSW coarse spaces for overlapping Schwarz methods in three dimensions**

Heinlein, A. and Klawonn, A. and Knepper, J. and Rheinbach, O. *SIAM Journal on Scientific Computing* 41 (2019)

A robust two-level overlapping Schwarz method for scalar elliptic model problems with highly varying coefficient functions is introduced. While the convergence of standard coarse spaces may depend strongly on the contrast of the coefficient function, the condition number bound of the new method is independent of the coefficient function. Indeed, the condition number only depends on a user-prescribed tolerance. The coarse space is based on discrete harmonic extensions of vertex, edge, and face interface functions, which are computed from the solutions of corresponding local generalized edge and face eigenvalue problems. The local eigenvalue problems are of the size of the edges and faces of the decomposition, and the eigenvalue problems can be constructed solely from the local subdomain stiffness matrices and the fully assembled global stiffness matrix. The new AGDSW (adaptive generalized Dryja-Smith-Widlund) coarse space always contains the classical GDSW coarse space by construction of the generalized eigenvalue problems. Numerical results supporting the theory are presented for several model problems in three dimensions using structured as well as unstructured meshes and unstructured decompositions. © 2019 Alexander Heinlein, Axel Klawonn, Jascha Knepper, Oliver Rheinbach.

DOI: 10.1137/18M1220613

**Machine learning in adaptive domain decomposition methods - Predicting the geometric location of constraints**

Heinlein, A. and Klawonn, A. and Lanser, M. and Weber, J. *SIAM Journal on Scientific Computing* 41 (2019)

Domain decomposition methods are robust and parallel scalable, preconditioned iterative algorithms for the solution of the large linear systems arising in the discretization of elliptic partial differential equations by finite elements. The convergence rate of these methods is generally determined by the eigenvalues of the preconditioned system. For second-order elliptic partial differential equations, coefficient discontinuities with a large contrast can lead to a deterioration of the convergence rate. A remedy can be obtained by enhancing the coarse space with elements, which are often called constraints, that are computed by solving small eigenvalue problems on portions of the interface of the domain decomposition, i.e., edges in two dimensions or faces and edges in three dimensions. In the present work, without restriction of generality, the focus is on two dimensions. In general, it is difficult to predict where these constraints have to be added, and therefore the corresponding local eigenvalue problems have to be computed, i.e., on which edges. Here, a machine learning based strategy using neural networks is suggested to predict the geometric location of these edges in a preprocessing step. This reduces the number of eigenvalue problems that have to be solved before the iteration. Numerical experiments for model problems and realistic microsections using regular decompositions as well as decompositions from graph partitioners are provided, showing very promising results. © 2019 Alexander Heinlein, Axel Klawonn, Martin Lanser, Janine Weber.

DOI: 10.1137/18M1205364

**Monolithic overlapping Schwarz domain decomposition methods with GDSW coarse spaces for incompressible fluid flow problems**

Heinlein, A. and Hochmuth, C. and Klawonn, A. *SIAM Journal on Scientific Computing* 41 (2019)

Monolithic overlapping Schwarz preconditioners for saddle point problems of Stokes and Navier-Stokes type are presented. In order to obtain numerically scalable algorithms, coarse spaces obtained from the generalized Dryja-Smith-Widlund (GDSW) approach are used. Numerical results of our parallel implementation are presented for various incompressible fluid flow problems. In particular, cases are considered where the problem cannot or should not be reduced using local static condensation, e.g., Stokes or Navier-Stokes problems with continuous pressure spaces. In the new monolithic preconditioners, the local overlapping problems and the coarse problem are saddle point problems with the same structure as the original problem. Our parallel implementation of these preconditioners is based on the fast and robust overlapping Schwarz (FROSch) library, which is part of the Trilinos package ShyLU. The implementation is essentially algebraic in the sense that, for the class of problems presented here, the preconditioners can be constructed from the fully assembled stiffness matrix and information about the block structure of the problem. Further information about the geometry or the null space of the underlying problem can improve the performance compared to the default settings. Parallel scalability results for several thousand cores for Stokes and Navier-Stokes model problems are reported. Each of the local problems is solved using a direct solver in serial mode, whereas the coarse problem is solved using a direct solver in serial or message passing interface (MPI)-parallel mode or using an MPI-parallel iterative Krylov solver. © 2019 Alexander Heinlein, Christian Hochmuth, and Axel Klawonn.

DOI: 10.1137/18M1184047

**Multicore Performance Engineering of Sparse Triangular Solves Using a Modified Roofline Model**

Wittmann, M. and Hager, G. and Janalik, R. and Lanser, M. and Klawonn, A. and Rheinbach, O. and Schenk, O. and Wellein, G. *Proceedings - 2018 30th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2018* (2019)

The Roofline model is widely used to visualize the performance of executed code together with the upper performance bounds given by the memory bandwidth and the processor peak performance. The model can thus provide an insightful visualization of bottlenecks. In this paper, we try to establish realistic bandwidth ceilings for the sparse triangular solve step of PARDISO, a leading sparse direct solver package, which is also part of the Intel MKL library. The performance of the forward and backward substitution process is analyzed and benchmarked for a representative set of sparse matrices on seven modern x86-type multicore architectures and the Knights Landing manycore architecture. It is shown how to accurately measure the necessary quantities also for threaded code, and the measurement approach, its validation, as well as limitations are discussed. Our modeling approach covers the serial and parallel execution phases, allowing for in-socket performance predictions. © 2018 IEEE.

DOI: 10.1109/CAHPC.2018.8645938

**Preconditioning the coarse problem of BDDC methods-three-level, algebraic multigrid, and vertex-based preconditioners**

Klawonn, A. and Lanser, M. and Rheinbach, O. and Weber, J.*Electronic Transactions on Numerical Analysis*51 (2019)A comparison of three Balancing Domain Decomposition by Constraints (BDDC) methods with an approximate coarse space solver using the same software building blocks is attempted for the first time. The comparison is made for a BDDC method with an algebraic multigrid preconditioner for the coarse problem, a three-level BDDC method, and a BDDC method with a vertex-based coarse preconditioner. It is new that all methods are presented and discussed in a common framework. Condition number bounds are provided for all approaches. All methods are implemented in a common highly parallel scalable BDDC software package based on PETSc to allow for a simple and meaningful comparison. Numerical results showing the parallel scalability are presented for the equations of linear elasticity. For the first time, this includes parallel scalability tests for a vertex-based approximate BDDC method. Copyright © 2019, Kent State University.view abstract 10.1553/etna_vol51s432 **Adaptive FETI-DP and BDDC methods with a generalized transformation of basis for heterogeneous problems**

Klawonn, A. and Kühn, M. and Rheinbach, O.*Electronic Transactions on Numerical Analysis*49 (2018)In FETI-DP (Finite Element Tearing and Interconnecting) and BDDC (Balancing Domain Decomposition by Constraints) domain decomposition methods, the transformation-of-basis approach is used to improve the convergence by combining the local assembly with a change of basis. Suitable basis vectors can be constructed by the recently introduced adaptive coarse space approaches. The resulting FETI-DP and BDDC methods fulfill a condition number bound independent of heterogeneities in the problem. The adaptive method with a transformation of basis presented here builds on a recently introduced adaptive FETI-DP approach for elliptic problems in three dimensions and uses a coarse space constructed from solving small, local eigenvalue problems on closed faces and on a small number of edges. In contrast to our earlier work on adaptive FETI-DP, the coarse space correction is not implemented by using balancing (or deflation), which requires the use of an exact coarse space solver, but by using local transformations. This will make it simpler to extend the method to a large number of subdomains and large supercomputers. The recently established theory of a generalized transformation-of-basis approach yields a condition number estimate for the preconditioned operator that is independent of jumps of the coefficients across and inside subdomains when using the local adaptive constraints. It is shown that all results are also valid for BDDC. Numerical results are presented in three dimensions for FETI-DP and BDDC. We also provide a comparison of different scalings, i.e., deluxe, rho, stiffness, and multiplicity for our adaptive coarse space in 3D. Copyright © 2018, Kent State University.view abstract 10.1553/etna-vol49s1 **An adaptive gdsw coarse space for two-level overlapping schwarz methods in two dimensions**

Heinlein, A. and Klawonn, A. and Knepper, J. and Rheinbach, O.*Lecture Notes in Computational Science and Engineering*125 (2018)We propose robust coarse spaces for two-level overlapping Schwarz preconditioners, which are extensions of the energy minimizing coarse space known as GDSW (Generalized Dryja, Smith, Widlund). The resulting two-level methods with adaptive coarse spaces are robust for second order elliptic problems in two dimensions, even in the presence of a highly heterogeneous coefficient function, and reduce to the standard GDSW algorithm if no additional coarse basis functions are used. © 2018, Springer International Publishing AG, part of Springer Nature.view abstract 10.1007/978-3-319-93873-8_35 **Improving the parallel performance of overlapping Schwarz methods by using a smaller energy minimizing coarse space**

Heinlein, A. and Klawonn, A. and Rheinbach, O. and Widlund, O.B.*Lecture Notes in Computational Science and Engineering*125 (2018)We consider a recent overlapping Schwarz method with an energy-minimizing coarse space of reduced size. In numerical experiments for up to 64,000 cores, we show that the parallel efficiency and the total time to solution are improved significantly, compared to our previous overlapping Schwarz method using an alternative energy-minimizing coarse space. © 2018, Springer International Publishing AG, part of Springer Nature.view abstract 10.1007/978-3-319-93873-8_36 **Multiscale coarse spaces for overlapping Schwarz methods based on the ACMS space in 2D**

Heinlein, A. and Klawonn, A. and Knepper, J. and Rheinbach, O.*Electronic Transactions on Numerical Analysis*48 (2018)Two-level overlapping Schwarz domain decomposition methods for second-order elliptic problems in two dimensions are proposed using coarse spaces constructed from the Approximate Component Mode Synthesis (ACMS) multiscale discretization approach. These coarse spaces are based on eigenvalue problems using Schur complements on subdomain edges. It is then shown that the convergence of the resulting preconditioned Krylov method can be controlled by a user-specified tolerance and thus can be made independent of heterogeneities in the coefficient of the partial differential equation. The relations of this new approach to other known adaptive coarse space approaches for overlapping Schwarz methods are also discussed. Compared to one of the competing adaptive approaches, the new coarse space can be significantly smaller. Compared to other competing approaches, the eigenvalue problems are significantly cheaper to solve, i.e., the dimension of the eigenvalue problems is minimal among the competing adaptive approaches under consideration. Our local eigenvalue problems can be solved using one iteration of LobPCG for essentially the same cost as a Cholesky-decomposition of a Schur complement on a subdomain edge. Copyright © 2018, Kent State University.view abstract 10.1553/etna_vol48s156 **Nonlinear BDDC methods with approximate solvers**

Klawonn, A. and Lanser, M. and Rheinbach, O.*Electronic Transactions on Numerical Analysis*49 (2018)New nonlinear BDDC (Balancing Domain Decomposition by Constraints) domain decomposition methods using inexact solvers for the subdomains and the coarse problem are proposed. In nonlinear domain decomposition methods, the nonlinear problem is decomposed before linearization to improve concurrency and robustness. For linear problems, the new methods are equivalent to known inexact BDDC methods. The new approaches are therefore discussed in the context of other known inexact BDDC methods for linear problems. Relations are pointed out, and the advantages of the approaches chosen here are highlighted. For the new approaches, using an algebraic multigrid method as a building block, parallel scalability is shown for more than half a million (524 288) MPI ranks on the JUQUEEN IBM BG/Q supercomputer (JSC Jülich, Germany) and on up to 193 600 cores of the Theta Xeon Phi supercomputer (ALCF, Argonne National Laboratory, USA), which is based on the recent Intel Knights Landing (KNL) many-core architecture. One of our nonlinear inexact BDDC domain decomposition methods is also applied to three-dimensional plasticity problems. Comparisons to standard Newton-Krylov-BDDC methods are provided. Copyright © 2018, Kent State University.view abstract 10.1553/etna_vol49s244 **On the accuracy of the inner newton iteration in nonlinear domain decomposition**

Klawonn, A. and Lanser, M. and Rheinbach, O. and Uran, M.*Lecture Notes in Computational Science and Engineering*125 (2018)We introduce an energy minimizing nonlinear preconditioner for our nonlinear FETI-DP methods, and we will show numerical results for some problems in two dimensions based on the scaled p-Laplace operator. The equivalence of nonlinear FETI-DP methods and specific right-preconditioned Newton-Krylov methods was already shown. In nonlinear FETI-DP methods, the preconditioner describes a nonlinear elimination process. In the variants proposed here, the evolution of a problem dependent global energy is controlled during the elimination process, which guarantees that the application of the nonlinear preconditioner does not increase the global energy. Often, stopping the inner Newton iteration early, based on the energy criterion, gives better performance of the overall method. In this paper, a comparison of the classical nonlinear FETI-DP methods with nonlinear FETI-DP methods using an energy minimizing nonlinear preconditioner is provided. © 2018, Springer International Publishing AG, part of Springer Nature.view abstract 10.1007/978-3-319-93873-8_41 **Preconditioning of iterative eigenvalue problem solvers in adaptive FETI-DP**

Klawonn, A. and Kühn, M. and Rheinbach, O.*Lecture Notes in Computational Science and Engineering*125 (2018)Adaptive FETI-DP and BDDC methods are robust methods that can be used for highly heterogeneous problems when standard approaches fail. In these approaches, local generalized eigenvalue problems are solved approximately, and the eigenvectors are used to enhance the coarse problem. Here, a few iterations of an approximate eigensolver are usually sufficient. Different preconditioning options for the iterative LOBPCG eigenvalue problem solver are considered. Numerical results are presented for linear elasticity problems with heterogeneous coefficients. © 2018, Springer International Publishing AG, part of Springer Nature.view abstract 10.1007/978-3-319-93873-8_39 **Using algebraic multigrid in inexact BDDC domain decomposition methods**

Klawonn, A. and Lanser, M. and Rheinbach, O.*Lecture Notes in Computational Science and Engineering*125 (2018)A highly scalable implementation of an inexact BDDC (Balancing Domain Decomposition by Constraints) method is presented, and scalability results for linear elasticity problems in two and three dimensions for up to 131,072 computational cores of the JUQUEEN BG/Q are shown. In this method, the inverse action of the partially coupled stiffness matrix is replaced by V-cycles of an AMG (algebraic multigrid) method. The use of classical AMG for systems of PDEs, based on a nodal coarsening approach, is compared with a recent AMG method using an explicit interpolation of the rigid body motions (global matrix approach; GM). It is illustrated that, for systems of PDEs, an appropriate AMG interpolation is mandatory for fast convergence, i.e., using exact interpolation of rigid body modes in elasticity. © 2018, Springer International Publishing AG, part of Springer Nature.view abstract 10.1007/978-3-319-93873-8_40 **Adaptive coarse spaces for FETI-DP in three dimensions with applications to heterogeneous diffusion problems**

Klawonn, A. and Kühn, M. and Rheinbach, O.*Lecture Notes in Computational Science and Engineering*116 (2017)A new adaptive coarse space approach including a condition number bound for FETI-DP or BDDC methods for problems with coefficient jumps inside subdomains and across subdomain boundaries in three dimensions is presented. The approach is based on a known adaptive coarse space approach enriched by a small number of additional local edge eigenvalue problems. Numerical results are presented for diffusion problems with heterogeneous coefficients supporting our theoretical findings. The problems considered also include random coefficients. © Springer International Publishing AG 2017.view abstract 10.1007/978-3-319-52389-7_18 **New nonlinear FETI-DP methods based on a partial nonlinear elimination of variables**

Klawonn, A. and Lanser, M. and Rheinbach, O. and Uran, M.*Lecture Notes in Computational Science and Engineering*116 (2017)We introduce two new nonlinear FETI-DP (Finite Element Tearing and Interconnecting-Dual-Primal) methods based on a partial nonlinear elimination of variables and provide a comparison to Newton-Krylov-FETI-DP, Nonlinear-FETI-DP-1, and Nonlinear-FETI-DP-2, which have already been described earlier. In contrast to classical Newton-Krylov-FETI-DP methods, where a geometrical decomposition after a Newton linearization is performed, in nonlinear FETI-DP methods, the discretized nonlinear problem is decomposed before linearization. The approaches help to localize work and reduce communication and are thus better suited for modern computer architectures. © Springer International Publishing AG 2017.view abstract 10.1007/978-3-319-52389-7_20 **Newton-Krylov-FETI-DP with adaptive coarse spaces**

Klawonn, A. and Lanser, M. and Niehoff, B. and Radtke, P. and Rheinbach, O.*Lecture Notes in Computational Science and Engineering*116 (2017)A Newton-Krylov-FETI-DP method for solving nonlinear partial differential equations is presented. The FETI-DP method, which is applied in each Newton step, has an adaptively enriched coarse space to deal with ill-conditioned linearized operators. The adaptive coarse spaces are obtained by solving local generalized eigenvalue problems. Heuristic strategies to reduce the overhead caused by the adaptive computation of constraints are discussed and numerical examples for the p-Laplace equation are presented. © Springer International Publishing AG 2017.view abstract 10.1007/978-3-319-52389-7_19 **Parallel overlapping Schwarz with an energy-minimizing coarse space**

Heinlein, A. and Klawonn, A. and Rheinbach, O.*Lecture Notes in Computational Science and Engineering*116 (2017)Parallel results obtained with a new implementation of an overlapping Schwarz method using an energy minimizing coarse space are presented. We consider structured and unstructured domain decompositions for scalar elliptic and linear elasticity model problems in two dimensions. In particular, strong and weak parallel scalability studies for up to 1024 processor cores are presented for both types of problems. Additionally, weak scalability results for a three-dimensional linear elasticity model problem using up to 4096 processor cores are discussed. Finally, an application from fully-coupled fluid-structure interaction using a nonlinear hyperelastic material model for the structure is shown. © Springer International Publishing AG 2017.view abstract 10.1007/978-3-319-52389-7_36 **A highly scalable implementation of inexact nonlinear FETI-DP without sparse direct solvers**

Klawonn, A. and Lanser, M. and Rheinbach, O.*Lecture Notes in Computational Science and Engineering*112 (2016)A variant of a nonlinear FETI-DP domain decomposition method is considered. It is combined with a parallel algebraic multigrid method (Boomer-AMG) in a way which completely removes sparse direct solvers from the algorithm. Scalability to 524,288 MPI ranks is shown for linear elasticity and nonlinear hyperelasticity using more than half of the JUQUEEN supercomputer (JSC, Jülich; TOP500 rank: 11th). © Springer International Publishing Switzerland 2016.view abstract 10.1007/978-3-319-39929-4_25 **A Newton-Krylov-FETI-DP method with an adaptive coarse space applied to Elastoplasticity**

Klawonn, A. and Radtke, P. and Rheinbach, O.*Lecture Notes in Computational Science and Engineering*104 (2016)view abstract 10.1007/978-3-319-18827-0_28 **A nonlinear FETI-DP method with an inexact coarse problem**

Klawonn, A. and Lanser, M. and Rheinbach, O.*Lecture Notes in Computational Science and Engineering*104 (2016)view abstract 10.1007/978-3-319-18827-0_4 **A parallel implementation of a two-level overlapping schwarz method with energy-minimizing coarse space based on trilinos**

Heinlein, A. and Klawonn, A. and Rheinbach, O.*SIAM Journal on Scientific Computing*38 (2016)We describe a new implementation of a two-level overlapping Schwarz preconditioner with energy-minimizing coarse space (GDSW: generalized Dryja-Smith-Widlund) and show numerical results for an additive and a hybrid additive-multiplicative version. Our parallel implementation makes use of the Trilinos software library and provides a framework for parallel two-level Schwarz methods. We show parallel scalability for two- and three-dimensional scalar second-order elliptic and linear elasticity problems for several thousands of cores. We also discuss techniques for the parallel construction of coarse spaces which are also of interest for other parallel preconditioners and discretization methods using energy minimizing coarse functions. We finally show an application in monolithic fluid-structure interaction, where significant improvements are achieved compared to a standard algebraic, one-level overlapping Schwarz method. © 2016 Society for Industrial and Applied Mathematics.view abstract 10.1137/16M1062843 **Adaptive coarse spaces for BDDC with a transformation of basis**

Klawonn, A. and Radtke, P. and Rheinbach, O.*Lecture Notes in Computational Science and Engineering*104 (2016)view abstract 10.1007/978-3-319-18827-0_29 **Adaptive coarse spaces for FETI-DP in three dimensions**

Klawonn, A. and Kühn, M. and Rheinbach, O.*SIAM Journal on Scientific Computing*38 (2016)An adaptive coarse space approach including a condition number bound for dual primal finite element tearing and interconnecting (FETI-DP) methods applied to three dimensional problems with coefficient jumps inside subdomains and across subdomain boundaries is presented. The approach is based on a known adaptive coarse space approach enriched by a small number of additional local edge eigenvalue problems. These edge eigenvalue problems serve to make the method robust and permit a condition number bound which depends only on the tolerance of the local eigenvalue problems and some properties of the domain decomposition. The introduction of the edge eigenvalue problems thus turns a well-known condition number indicator for FETI-DP and balancing domain decomposition by constraints (BDDC) methods into a condition number estimate. Numerical results are presented for linear elasticity and heterogeneous materials supporting our theoretical findings. The problems considered include those with random coefficients and almost incompressible material components. © 2016 Axel Klawonn, Martin Kühn, Oliver Rheinbach.view abstract 10.1137/15M1049610 **FE2TI: Computational scale bridging for dual-phase steels**

Klawonn, A. and Lanser, M. and Rheinbach, O.*Advances in Parallel Computing*27 (2016)A scale bridging approach combining the FE2 method with parallel domain decomposition (FE2TI) is presented. The FE2TI approach is used in the project "EXASTEEL-Bridging Scales for Multiphase Steels" (within the German priority program "Software for Exascale Computing-SPPEXA") for the simulation of modern dual-phase steels. This approach incorporates phenomena on the microscale into the macroscopic problem by solving many independent microscopic problems on representative volume elements (RVEs). The results on the RVEs replace a phenomenological material law on the macroscale. In order to bring large micro-macro simulations to modern supercomputers, in the FE2TI approach a highly scalable implementation of the inexact reduced FETI-DP (Finite Element Tearing and Interconnecting-Dual Primal) domain decomposition method (scalable up to 786 432 Mira BlueGene/Q cores) is used as a solver on the RVEs. Weak scalability results for the FE2TI method are presented, filling the complete JUQUEEN at JSC Jülich (458 752 cores) and the complete Mira at Argonne National Laboratory (786 432 cores). © 2016 The authors and IOS Press. All rights reserved.view abstract 10.3233/978-1-61499-621-7-797 **Numerical modeling of fluid–structure interaction in arteries with anisotropic polyconvex hyperelastic and anisotropic viscoelastic material models at finite strains**

Balzani, D. and Deparis, S. and Fausten, S. and Forti, D. and Heinlein, A. and Klawonn, A. and Quarteroni, A. and Rheinbach, O. and Schröder, J.*International Journal for Numerical Methods in Biomedical Engineering*32 (2016)The accurate prediction of transmural stresses in arterial walls requires on the one hand robust and efficient numerical schemes for the solution of boundary value problems including fluid–structure interactions and on the other hand the use of a material model for the vessel wall that is able to capture the relevant features of the material behavior. One of the main contributions of this paper is the application of a highly nonlinear, polyconvex anisotropic structural model for the solid in the context of fluid–structure interaction, together with a suitable discretization. Additionally, the influence of viscoelasticity is investigated. The fluid–structure interaction problem is solved using a monolithic approach; that is, the nonlinear system is solved (after time and space discretizations) as a whole without splitting among its components. The linearized block systems are solved iteratively using parallel domain decomposition preconditioners. A simple – but nonsymmetric – curved geometry is proposed that is demonstrated to be suitable as a benchmark testbed for fluid–structure interaction simulations in biomechanics where nonlinear structural models are used. Based on the curved benchmark geometry, the influence of different material models, spatial discretizations, and meshes of varying refinement is investigated. It turns out that often-used standard displacement elements with linear shape functions are not sufficient to provide good approximations of the arterial wall stresses, whereas for standard displacement elements or F-bar formulations with quadratic shape functions, suitable results are obtained. For the time discretization, a second-order backward differentiation formula scheme is used. 
It is shown that the curved geometry enables the analysis of non-rotationally symmetric distributions of the mechanical fields. For instance, the maximal shear stresses in the fluid–structure interface are found to be higher in the inner curve, which corresponds to clinical observations indicating a high plaque nucleation probability at such locations. Copyright © 2015 John Wiley & Sons, Ltd.view abstract 10.1002/cnm.2756 **One-way and fully-coupled FE2 methods for heterogeneous elasticity and plasticity problems: Parallel scalability and an application to thermo-elastoplasticity of dual-phase steels**

Balzani, D. and Gandhi, A. and Klawonn, A. and Lanser, M. and Rheinbach, O. and Schröder, J.*Lecture Notes in Computational Science and Engineering*113 (2016)In this paper, aspects of the two-scale simulation of dual-phase steels are considered. First, we present two-scale simulations applying a top-down one-way coupling to a full thermo-elastoplastic model in order to study the emerging temperature field. We find that, for our purposes, the consideration of thermomechanics at the microscale is not necessary. Second, we present highly parallel fully-coupled two-scale FE2 simulations, now neglecting temperature, using up to 458,752 cores of the JUQUEEN supercomputer at Forschungszentrum Jülich. The strong and weak parallel scalability results obtained for heterogeneous nonlinear hyperelasticity exemplify the massively parallel potential of the FE2 multiscale method. © Springer International Publishing Switzerland 2016.view abstract 10.1007/978-3-319-40528-5_5 **Parallel two-level overlapping Schwarz methods in fluid-structure interaction**

Heinlein, A. and Klawonn, A. and Rheinbach, O.*Lecture Notes in Computational Science and Engineering*112 (2016)Parallel overlapping Schwarz preconditioners are considered and applied to the structural block in monolithic fluid-structure interaction (FSI). The two-level overlapping Schwarz method uses a coarse level based on energy minimizing functions. Linear elastic as well as nonlinear, anisotropic hyperelastic structural models are considered in an FSI problem of a pressure wave in a tube. Using our recent parallel implementation of a two-level overlapping Schwarz preconditioner based on the Trilinos library, the total computation time of our FSI benchmark problem was reduced by more than a factor of two compared to the algebraic one-level overlapping Schwarz method used previously. Finally, strong scalability for our FSI problem is also shown for up to 512 processor cores. © Springer International Publishing Switzerland 2016.view abstract 10.1007/978-3-319-39929-4_50 **Scalability of classical algebraic multigrid for elasticity to half a million parallel tasks**

Baker, A.H. and Klawonn, A. and Kolev, T. and Lanser, M. and Rheinbach, O. and Yang, U.M.*Lecture Notes in Computational Science and Engineering*113 (2016)The parallel performance of several classical Algebraic Multigrid (AMG) methods applied to linear elasticity problems is investigated. These methods include standard AMG approaches for systems of partial differential equations such as the unknown and hybrid approaches, as well as the more recent global matrix (GM) and local neighborhood (LN) approaches, which incorporate rigid body modes (RBMs) into the AMG interpolation operator. Numerical experiments are presented for both two- and three-dimensional elasticity problems on up to 131,072 cores (and 262,144 MPI processes) on the Vulcan supercomputer (LLNL, USA) and up to 262,144 cores (and 524,288 MPI processes) on the JUQUEEN supercomputer (JSC, Jülich, Germany). It is demonstrated that incorporating all RBMs into the interpolation leads generally to faster convergence and improved scalability. © Springer International Publishing Switzerland 2016.view abstract 10.1007/978-3-319-40528-5_6 **A deflation based coarse space in dual-primal FETI methods for almost incompressible elasticity**

Gippert, S. and Klawonn, A. and Rheinbach, O.*Lecture Notes in Computational Science and Engineering*103 (2015)A new coarse space for FETI-DP domain decomposition methods for mixed finite element discretizations of almost incompressible linear elasticity problems in 3D is presented. The mixed finite element discretization uses continuous piecewise triquadratic displacements and discontinuous piecewise constant pressures. The piecewise constant pressure variables are statically condensated on the element level. The new coarse space is significantly smaller than earlier known coarse spaces for FETI-DP or BDDC methods for the equations of almost incompressible elasticity or Stokes’ equations. For discretizations with discontinuous pressure elements it is well-known that a zero net flux condition on each subdomain is needed to ensure a good condition number. Usually, this constraint is enforced for each vertex, edge, and face of each subdomain separately. Here, a coarse space is discussed where all vertex and edge constraints are treated as usual but where all faces of each subdomain contribute only a single constraint. This approach is presented within a deflation based framework for the implementation of coarse spaces into FETI-DP methods. © Springer International Publishing Switzerland 2015.view abstract 10.1007/978-3-319-10705-9_56 **FETI-DP methods with an adaptive coarse space**

Klawonn, A. and Radtke, P. and Rheinbach, O.*SIAM Journal on Numerical Analysis*53 (2015)A coarse space is constructed for the dual-primal finite element tearing and interconnecting (FETI-DP) domain decomposition method applied to highly heterogeneous problems by solving local generalized eigenvalue problems. For certain problems with highly varying coefficients, e.g., from multiscale simulations, the coefficient jump will appear in the condition number bound even if standard techniques such as scaling and the weighting of constraints are used. The FETI-DP theory is revisited and two central estimates are identified where the dependency on the coefficient contrast can enter the condition number bound. The first is a Poincaré inequality and the second an extension theorem. These estimates are replaced by local eigenvalue problems. Enriching the FETI-DP coarse space by a few numerically computed eigenvectors yields independence of the contrast of the coefficients even in challenging situations. © 2015 Axel Klawonn, Patrick Radtke, Oliver Rheinbach.view abstract 10.1137/130939675 **Hybrid MPI/OpenMP parallelization in FETI-DP methods**

Klawonn, A. and Lanser, M. and Rheinbach, O. and Stengel, H. and Wellein, G.*Lecture Notes in Computational Science and Engineering*105 (2015)We present an approach to hybrid MPI/OpenMP parallelization in FETI-DP methods using OpenMP with PETSc+MPI in the finite element assembly and using the shared memory parallel direct solver Pardiso in the FETI-DP solution phase. Our approach thus uses OpenMP parallelization on subdomains and MPI in between subdomains. We investigate the efficiency of this approach for a benchmark problem from two dimensional nonlinear hyperelasticity. We observe good scalability for up to four threads for each MPI rank on a state-of-the-art Ivy Bridge architecture and incremental improvements for up to ten OpenMP threads for each MPI rank. © Springer International Publishing Switzerland 2015.view abstract 10.1007/978-3-319-22997-3_4 **The approximate component mode synthesis special finite element method in two dimensions: Parallel implementation and numerical results**

Heinlein, A. and Hetmaniuk, U. and Klawonn, A. and Rheinbach, O.*Journal of Computational and Applied Mathematics*289 (2015)A special finite element method based on approximate component mode synthesis (ACMS) was introduced in Hetmaniuk and Lehoucq (2010). ACMS was developed for second order elliptic partial differential equations with rough or highly varying coefficients. Here, a parallel implementation of ACMS is presented and parallel scalability issues are discussed for representative examples. Additionally, a parallel domain decomposition preconditioner (FETI-DP) is applied to solve the ACMS finite element system. Weak parallel scalability results for ACMS are presented for up to 1024 cores. Our numerical results also suggest a quadratic-logarithmic condition number bound for the preconditioned FETI-DP method applied to ACMS discretizations. © 2015 Elsevier B.V.view abstract 10.1016/j.cam.2015.02.053 **Toward extremely scalable nonlinear domain decomposition methods for elliptic partial differential equations**

Klawonn, A. and Lanser, M. and Rheinbach, O.*SIAM Journal on Scientific Computing*37 (2015)The solution of nonlinear problems, e.g., in material science, requires fast and highly scalable parallel solvers. Finite element tearing and interconnecting dual primal (FETI-DP) domain decomposition methods are parallel solution methods for implicit problems discretized by finite elements. Recently, nonlinear versions of the well-known FETI-DP methods for linear problems have been introduced. In these methods, the nonlinear problem is decomposed before linearization. This approach can be viewed as a strategy to further localize computational work and to extend the parallel scalability of FETI-DP methods toward extreme-scale supercomputers. Here, a recent nonlinear FETI-DP method is combined with an approach that allows an inexact solution of the FETI-DP coarse problem. We combine the nonlinear FETI-DP domain decomposition method with an algebraic multigrid (AMG) method and thus obtain a hybrid nonlinear domain decomposition/multigrid method. We consider scalar nonlinear problems as well as nonlinear hyperelasticity problems in two and three space dimensions. For the first time for a domain decomposition method, weak parallel scalability can be shown beyond half a million cores and subdomains. We can show weak parallel scalability for up to 524 288 cores on the Mira Blue Gene/Q supercomputer for our new implementation and discuss the steps necessary to obtain these results. We solve a heterogeneous nonlinear hyperelasticity problem discretized using piecewise quadratic finite elements with a total of 42 billion degrees of freedom in about six minutes. Our analysis reveals that scalability beyond 524 288 cores depends critically on both efficient construction and solution of the coarse problem. © 2015 Society for Industrial and Applied Mathematics.view abstract 10.1137/140997907 **Nonlinear feti-dp and bddc methods**

Klawonn, A. and Lanser, M. and Rheinbach, O.*SIAM Journal on Scientific Computing*36 (2014)New nonlinear FETI-DP (dual-primal finite element tearing and interconnecting) and BDDC (balancing domain decomposition by constraints) domain decomposition methods are introduced. In all these methods, in each iteration, local nonlinear problems are solved on the subdomains. The new approaches can significantly reduce communication and show a significantly improved performance, especially for problems with localized nonlinearities, compared to a standard Newton-Krylov-FETI-DP or BDDC approach. Moreover, the coarse space of the nonlinear FETI-DP methods can be used to accelerate the Newton convergence. It is also found that the new nonlinear FETI-DP and nonlinear BDDC methods are not as closely related as in the linear context. Numerical results for the p-Laplace operator are presented. © 2014 Society for Industrial and Applied Mathematics.view abstract 10.1137/130920563 **On an Adaptive Coarse Space and on Nonlinear Domain Decomposition**

Klawonn, A. and Lanser, M. and Radtke, P. and Rheinbach, O.*Domain Decomposition Methods in Science and Engineering XXI*98 (2014)view abstract 10.1007/978-3-319-05789-7_6 **A Simultaneous Augmented Lagrange Approach for the Simulation of Soft Biological Tissue**

Böse, D. and Brinkhues, S. and Erbel, R. and Klawonn, A. and Rheinbach, O. and Schröder, J.*Lecture Notes in Computational Science and Engineering*91 (2013)In this paper, we consider the elastic deformation of arterial walls as occurring, e.g., in the process of a balloon angioplasty, a common treatment in the case of atherosclerosis. Soft biological tissue is an almost incompressible material. To account for this property in finite element simulations, commonly used free energy functions contain terms penalizing volumetric changes. The incorporation of such penalty terms can, unfortunately, spoil the convergence of the nonlinear iteration scheme, i.e., of Newton's method, as well as of iterative solvers applied for the solution of the linearized systems of equations. We show that the augmented Lagrange method can improve the convergence of the linear and nonlinear iteration schemes while, at the same time, implementing a guaranteed bound for the volumetric change. Our finite element model of an atherosclerotic arterial segment, see Fig. 1, is constructed from intravascular ultrasound images; for details see [4]. © Springer-Verlag Berlin Heidelberg 2013.view abstract 10.1007/978-3-642-35275-1_43 **Augmented Lagrange methods for quasi-incompressible materials-Applications to soft biological tissue**

Brinkhues, S. and Klawonn, A. and Rheinbach, O. and Schröder, J.*International Journal for Numerical Methods in Biomedical Engineering*29 (2013)Arterial walls in the healthy physiological regime are characterized by quasi-incompressible, anisotropic, hyperelastic material behavior. Polyconvex material functions representing such materials typically incorporate a penalty function to account for the incompressibility. Unfortunately, the penalty will affect the conditioning of the stiffness matrices. For high penalty parameters, the performance of iterative solvers will degrade, and when direct solvers are used, the quality of the solutions will deteriorate. In this paper, an augmented Lagrange approach is used to cope with the quasi-incompressibility condition. Here, the penalty parameter can be chosen much smaller, and as a consequence, the arising linear systems of equations have better properties. An improved convergence is then observed for the finite element tearing and interconnecting-dual primal domain decomposition method, which is used as an iterative solver. Numerical results for an arterial geometry obtained from ultrasound imaging are presented. © 2012 John Wiley & Sons, Ltd.view abstract 10.1002/cnm.2504 **FETI-DP for Elasticity with Almost Incompressible Material Components**

Gippert, S. and Klawonn, A. and Rheinbach, O.*Lecture Notes in Computational Science and Engineering*91 (2013)view abstract 10.1007/978-3-642-35275-1_41 **Analysis of FETI-DP and BDDC for linear elasticity in 3D with almost incompressible components and varying coefficients inside subdomains**

Gippert, S. and Klawonn, A. and Rheinbach, O.*SIAM Journal on Numerical Analysis*50 (2012)FETI-DP (dual-primal finite element tearing and interconnecting) methods are nonoverlapping domain decomposition methods which are used to solve large algebraic systems of equations that arise, e.g., from problems in linear elasticity. Good convergence bounds for problems of compressible linear elasticity are well known for two- and three-dimensional problems. More recently, FETI-DP and BDDC (balancing domain decomposition by constraints) methods have been developed that are robust also in the regime of homogeneous almost incompressible linear elasticity. The coarse space of such methods is large especially in 3D (three dimensions) and its implementation needs knowledge of geometrical information. Here, the convergence of FETI-DP methods for problems in 3D with almost incompressible inclusions or compressible inclusions with different material parameters embedded in a compressible matrix material is analyzed. For such problems, where the material is compressible in the vicinity of the subdomain interface, a polylogarithmic condition number estimate is shown for the preconditioned FETI-DP system. This bound depends only on the thickness of the compressible hull but is otherwise independent of coefficient jumps between subdomains and also between the hull and the inclusion. The bound is also valid for corresponding BDDC methods. The new contribution of the current paper is a theory that provides condition number bounds for the case of varying incompressibility and also varying Young moduli inside subdomains without changing the coarse space. © 2012 Society for Industrial and Applied Mathematics.view abstract 10.1137/110838315 **Deflation, projector preconditioning, and balancing in iterative substructuring methods: Connections and new results**

Klawonn, A. and Rheinbach, O.*SIAM Journal on Scientific Computing*34 (2012)In this paper, projector preconditioning, also known as the deflation method, as well as the balancing preconditioner are applied to the dual-primal finite element tearing and interconnecting (FETI-DP) and balancing domain decomposition by constraints (BDDC) methods in order to create a second, independent coarse problem. This may help to extend the parallel scalability of classical FETI-DP and BDDC methods without the use of inexact solvers and may also be used to improve the robustness, e.g., for almost incompressible elasticity problems. Connections of FETI-DP methods applying a transformation of basis using a larger coarse space with a corresponding FETI-DP method using projector preconditioning or balancing are pointed out. It is then shown that the methods have essentially the same spectrum. Numerical results for compressible and almost incompressible linear elasticity are provided. The sensitivity of the projection methods to an inexact computation of the projections is numerically investigated and a different behavior for projector preconditioning and the balancing preconditioner is found. © 2012 Society for Industrial and Applied Mathematics.view abstract 10.1137/100811118 **Parallel simulation of patient-specific atherosclerotic arteries for the enhancement of intravascular ultrasound diagnostics**

Balzani, D. and Böse, D. and Brands, D. and Erbel, R. and Klawonn, A. and Rheinbach, O. and Schröder, J.*Engineering Computations (Swansea, Wales)*29 (2012)Purpose - The purpose of this paper is to present a computational framework for the simulation of patient-specific atherosclerotic arterial walls. Such simulations provide information regarding the mechanical stress distribution inside the arterial wall and may therefore enable improved medical indications for or against medical treatment. In detail, the paper aims to provide a framework which takes into account patient-specific geometric models obtained by in vivo measurements, as well as a fast solution strategy, giving realistic numerical results obtained in reasonable time. Design/methodology/approach - A method is proposed for the construction of three-dimensional geometrical models of atherosclerotic arteries based on intravascular ultrasound virtual histology data combined with angiographic X-ray images, which are obtained on a routine basis in the diagnostics and medical treatment of cardiovascular diseases. These models serve as a basis for finite element simulations where a large number of unknowns need to be calculated in reasonable time. Therefore, the finite element tearing and interconnecting-dual primal (FETI-DP) domain decomposition method is applied, to achieve an efficient parallel solution strategy. Findings - It is shown that three-dimensional models of patient-specific atherosclerotic arteries can be constructed from intravascular ultrasound virtual histology data. Furthermore, the application of the FETI-DP domain decomposition method leads to a fast numerical framework. In a numerical example, the importance of three-dimensional models and thereby fast solution algorithms is illustrated by showing that two-dimensional approximations differ significantly from the 3D solution. 
Originality/value - The decision for or against intravascular medical treatment of atherosclerotic arteries strongly depends on the mechanical situation of the arterial wall. The framework presented in this paper provides computer simulations of stress distributions, which therefore enable improved indications for medical methods of treatment. © Emerald Group Publishing Limited.view abstract 10.1108/02644401211271645 **Projector preconditioning and transformation of basis in FETI-DP algorithms for contact problems**

Jarošová, M. and Klawonn, A. and Rheinbach, O.*Mathematics and Computers in Simulation*82 (2012)Two strategies, using edge averages, for FETI-DP (dual-primal finite element tearing and interconnecting) methods for contact problems are considered. The first one is a preconditioning technique by a conjugate projector, where the Lagrange multipliers corresponding to the variables of the coinciding edges are aggregated. The second one is an explicit transformation of basis introducing edge averages as new, additional primal variables. It is shown that both methods iterate in the same space and thus have the same rate of convergence. The theoretical result is confirmed by the solution of a model boundary variational inequality. © 2011 IMACS. Published by Elsevier B.V. All rights reserved.view abstract 10.1016/j.matcom.2010.10.031 **FETI-DP domain decomposition methods for elasticity with structural changes: P-elasticity**

Klawonn, A. and Neff, P. and Rheinbach, O. and Vanis, S.*ESAIM: Mathematical Modelling and Numerical Analysis*45 (2011)We consider linear elliptic systems which arise in coupled elastic continuum mechanical models. In these systems, the strain tensor $\varepsilon_P := \operatorname{sym}(P^{-1}\nabla u)$ is redefined to include a matrix valued inhomogeneity $P(x)$ which cannot be described by a space dependent fourth order elasticity tensor. Such systems arise naturally in geometrically exact plasticity or in problems with eigenstresses. The tensor field $P$ induces a structural change of the elasticity equations. For such a model the FETI-DP method is formulated and a convergence estimate is provided for the special case that $P^{-T} = \nabla\psi$ is a gradient. It is shown that the condition number depends only quadratic-logarithmically on the number of unknowns of each subdomain. The dependence of the constants of the bound on $P$ is highlighted. Numerical examples confirm our theoretical findings. Promising results are also obtained for settings which are not covered by our theoretical estimates. © EDP Sciences, SMAI, 2010.view abstract 10.1051/m2an/2010067 **Highly scalable parallel domain decomposition methods with an application to biomechanics**

Klawonn, A. and Rheinbach, O.*ZAMM Zeitschrift fur Angewandte Mathematik und Mechanik*90 (2010)Highly scalable parallel domain decomposition methods for elliptic partial differential equations are considered with a special emphasis on problems arising in elasticity. The focus of this survey article is on Finite Element Tearing and Interconnecting (FETI) methods, a family of nonoverlapping domain decomposition methods where the continuity between the subdomains, in principle, is enforced by the use of Lagrange multipliers. Exact one-level and dual-primal FETI methods as well as related inexact dual-primal variants are described and theoretical convergence estimates are presented together with numerical results confirming the parallel scalability properties of these methods. New aspects such as a hybrid one-level FETI/FETI-DP approach and the behavior of FETI-DP for anisotropic elasticity problems are presented. Parallel and numerical scalability of the methods for more than 65 000 processor cores of the JUGENE supercomputer is shown. An application of a dual-primal FETI method to a nontrivial biomechanical problem from nonlinear elasticity, modeling arterial wall stress, is given, showing the robustness of our domain decomposition methods for such problems. © 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.view abstract 10.1002/zamm.200900329 **On the mechanical modeling of anisotropic biological soft tissue and iterative parallel solution strategies**

Balzani, D. and Brands, D. and Klawonn, A. and Rheinbach, O. and Schröder, J.*Archive of Applied Mechanics*80 (2010)Biological soft tissues appearing in arterial walls are characterized by a nearly incompressible, anisotropic, hyperelastic material behavior in the physiological range of deformations. For the representation of such materials we apply a polyconvex strain energy function in order to ensure the existence of minimizers and in order to satisfy the Legendre-Hadamard condition automatically. The 3D discretization results in a large system of equations; therefore, a parallel algorithm is applied to solve the equilibrium problem. Domain decomposition methods like the Dual-Primal Finite Element Tearing and Interconnecting (FETI-DP) method are designed to solve large linear systems of equations, that arise from the discretization of partial differential equations, on parallel computers. Their numerical and parallel scalability, as well as their robustness, also in the incompressible limit, has been shown theoretically and in numerical simulations. We are using a dual-primal FETI method to solve nonlinear, anisotropic elasticity problems for 3D models of arterial walls and present some preliminary numerical results. © 2009 Springer-Verlag.view abstract 10.1007/s00419-009-0379-x **Solving geometrically exact micromorphic elasticity with a staggered algorithm**

Klawonn, A. and Neff, P. and Rheinbach, O. and Vanis, S.*GAMM Mitteilungen*33 (2010)A minimization problem modeling geometrically exact generalized continua of micromorphic type is considered. The solution consists of two fields, the elastic deformation φ of a given body and a tensorial field P which can model different additional features needed for a more reliable description of solids. For the solution of this minimization problem, a staggered algorithm is introduced which decouples the original problem into two separate problems. In each of these subproblems, one of the variables, φ or P, respectively, is kept fixed and the subproblem is solved for the remaining variables, i.e., P or φ, respectively. Each of the problems is discretized with finite elements. Numerical results are presented for a cubic and a cylindrical geometry. © 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.view abstract 10.1002/gamm.201010005

#### elasticity

#### finite element method

#### number theory

#### numerical methods