#### Prof. Dr. Godehard Sutmann

Institute for Advanced Simulation/ICAMS

Forschungszentrum Jülich, Ruhr-Universität Bochum

##### Contact

- g[dot]sutmann[at]fz-juelich[dot]de
- +49 2461 61 6746

##### Author IDs

- ORCID: 0000-0002-9004-604X
- Scopus: 6701490227
- Google Scholar: EmHxA6YAAAAJ

##### Hub

**Parallel hybrid Monte Carlo / Molecular Statics for Simulation of Solute Segregation in Solids**

Ganesan, H. and Longsworth, M. and Sutmann, G. *Journal of Physics: Conference Series* 1740 (2021)

A parallel hybrid Monte Carlo/molecular statics method is presented for studying segregation of interstitial atoms in the solid state. The method is based on the efficient use of virtual atoms as placeholders to find energetically favorable sites for interstitials in a distorted environment. MC trial moves perform an exchange between a randomly chosen virtual atom with a carbon atom followed by a short energy minimization via MS to relax the lattice distortion. The proposed hybrid method is capable of modeling solute segregation in deformed crystalline metallic materials with a moderate MC efficiency. To improve sampling efficiency, the scheme is extended towards a biased MC approach, which takes into account the history of successful trial moves in the system. Parallelization of the hybrid MC/MS method is achieved by a Manager-Worker model which applies a speculative execution of trial moves, which are asynchronously executed on the cores. The technique is applied to an Fe-C system including a dislocation as a symmetry breaking perturbation in the system. © Published under licence by IOP Publishing Ltd.

DOI: 10.1088/1742-6596/1740/1/012001

**Towards blood flow in the virtual human: Efficient self-coupling of HemeLB: Virtual Human Blood Flow with HemeLB**

McCullough, J.W.S. and Richardson, R.A. and Patronis, A. and Halver, R. and Marshall, R. and Ruefenacht, M. and Wylie, B.J.N. and Odaker, T. and Wiedemann, M. and Lloyd, B. and Neufeld, E. and Sutmann, G. and Skjellum, A. and Kranzlmüller, D. and Coveney, P.V. *Interface Focus* 11 (2021)

Many scientific and medical researchers are working towards the creation of a virtual human, a personalized digital copy of an individual, that will assist in a patient's diagnosis, treatment and recovery. The complex nature of living systems means that the development of this remains a major challenge. We describe progress in enabling the HemeLB lattice Boltzmann code to simulate 3D macroscopic blood flow on a full human scale. Significant developments in memory management and load balancing allow near linear scaling performance of the code on hundreds of thousands of computer cores. Integral to the construction of a virtual human, we also outline the implementation of a self-coupling strategy for HemeLB. This allows simultaneous simulation of arterial and venous vascular trees based on human-specific geometries. © 2020 The Authors.

DOI: 10.1098/rsfs.2019.0119

**Examining Performance Portability with Kokkos for an Ewald Sum Coulomb Solver**

Halver, R. and Meinke, J.H. and Sutmann, G. *Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)* 12044 LNCS (2020)

We have implemented the computation of Coulomb interactions in particle systems using the performance portable C++ framework Kokkos. Coulomb interactions are evaluated with an Ewald-sum-based solver, where the interactions are split into long- and short-range contributions. The short-range contributions are calculated using pair-wise contributions of particles while long-range interactions are calculated using Fourier sums. We evaluate the performance portability of the implementation on Intel CPUs, including Intel Xeon Phi, and Nvidia GPUs. © 2020, Springer Nature Switzerland AG.

DOI: 10.1007/978-3-030-43222-5_4

**Kokkos implementation of an Ewald Coulomb solver and analysis of performance portability**

Halver, R. and Meinke, J.H. and Sutmann, G. *Journal of Parallel and Distributed Computing* 138 (2020)

We have implemented the computation of Coulomb interactions in particle systems using the performance portable C++ framework Kokkos. For the computation of the electrostatic interactions in particle systems we used an Ewald summation. This implementation we consider as a basis for a performance portability study. As target architectures we used Intel CPUs, including Intel Xeon Phi, as well as Nvidia GPUs. To provide a measure for performance portability we compute the number of needed operations and required cycles, i.e. runtime, and compare these with the measured runtime. Results indicate a similar quality of performance portability on all investigated architectures. © 2019 Elsevier Inc.

DOI: 10.1016/j.jpdc.2019.12.003

**Optimized parallel simulations of analytic bond-order potentials on hybrid shared/distributed memory with MPI and OpenMP**

Teijeiro, C. and Hammerschmidt, T. and Drautz, R. and Sutmann, G. *International Journal of High Performance Computing Applications* 33 (2019)

Analytic bond-order potentials (BOPs) allow to obtain a highly accurate description of interatomic interactions at a reasonable computational cost. However, for simulations with very large systems, the high memory demands require the use of a parallel implementation, which at the same time also optimizes the use of computational resources. The calculations of analytic BOPs are performed for a restricted volume around every atom and therefore have shown to be well suited for a message passing interface (MPI)-based parallelization based on a domain decomposition scheme, in which one process manages one big domain using the entire memory of a compute node. On the basis of this approach, the present work focuses on the analysis and enhancement of its performance on shared memory by using OpenMP threads on each MPI process, in order to use many cores per node to speed up computations and minimize memory bottlenecks. Different algorithms are described and their corresponding performance results are presented, showing significant performance gains for highly parallel systems with hybrid MPI/OpenMP simulations up to several thousands of threads. © The Author(s) 2017.

DOI: 10.1177/1094342017727060

**Spontaneous Fluctuations in Mesoscopic Simulations of Nematic Liquid Crystals**

Híjar, H. and Halver, R. and Sutmann, G. *Fluctuation and Noise Letters* 18 (2019)

We analyzed hydrodynamic fluctuations in nematic liquid crystals simulated by Multi-particle Collision Dynamics. Velocity effects on orientation were incorporated by allowing mesoscopic velocity gradients to exert torques on nematic particles. Backflow was included through an explicit application of angular momentum conservation during the collision events. We measured the spectra of hydrodynamic fluctuations and compared them with those derived from a linearized hydrodynamic scheme. Numerical results were found to reproduce the expected coupling between hydrodynamic modes, thus showing that the implementation simulates proper nematodynamic effects at the mesoscopic level. © 2019 World Scientific Publishing Company.

DOI: 10.1142/S0219477519500111

**Benchmarking molecular dynamics with OpenCL on many-core architectures**

Halver, R. and Homberg, W. and Sutmann, G. *Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)* 10778 LNCS (2018)

Molecular Dynamics (MD) is a widely used tool for simulations of particle systems with pair-wise interactions. Since large scale MD simulations are very demanding in computation time, parallelisation is an important factor. As in the current HPC environment different heterogeneous computing architectures are emerging, a benchmark tool for a representative number of these architectures is desirable. OpenCL as a platform-overarching standard provides the capabilities for such a benchmark. This paper describes the implementation of an OpenCL MD benchmark code and discusses the results achieved on different types of computing hardware. © Springer International Publishing AG, part of Springer Nature 2018.

DOI: 10.1007/978-3-319-78054-2_23

**Function portability of molecular dynamics on heterogeneous parallel architectures with OpenCL**

Halver, R. and Homberg, W. and Sutmann, G. *Journal of Supercomputing* 74 (2018)

Classical molecular dynamics simulation for atomistic systems is implemented in OpenCL and benchmarked on a variety of different hardware platforms. Modifying the number of particles and system size in the study provides insight into characteristics of parallel compute platforms, where latency, data transfer, memory access characteristics and compute intense work can be identified as fingerprints in benchmark runs. Data layouts are compared, for which the access of structure-of-arrays shows best performance in most cases. It is demonstrated that function portability can be achieved straightforwardly with OpenCL, while performance portability lags behind as various architectures strongly depend on specific vectorisation optimisation. © 2018, Springer Science+Business Media, LLC, part of Springer Nature.

DOI: 10.1007/s11227-017-2232-2

**MC/MD coupling for scale bridging simulations of solute segregation in solids: An application study**

Ganesan, H. and Begau, C. and Sutmann, G. *Communications in Computer and Information Science* 889 (2018)

A parallel hybrid Monte Carlo/Molecular Dynamics coupled framework has been developed to overcome the time scale limitation in simulations of segregation of interstitial atoms in solids. Simulations were performed using the proposed coupling approach to demonstrate its potential to model carbon segregation in ferritic steels with a single dislocation. Many simulations were carried out for different background carbon concentrations. This paper is a first step towards understanding the effect of segregation of interstitial atoms in solids and its influence on dislocation mobility in external fields. To this end, we carried out MD simulations, where shear forces were applied to mechanically drive screw dislocation on configurations with segregated carbon atoms. The results are compared with a reference system containing homogeneously distributed carbon atoms where the influence of segregated carbon on dislocation mobility could be observed. Simulation results gave qualitative evidence that the local concentration of interstitial solutes like carbon provides a significant pinning effect for the dislocation. © Springer Nature Switzerland AG 2018.

DOI: 10.1007/978-3-319-96271-9_7

**Parallelization comparison and optimization of a scale-bridging framework to model Cottrell atmospheres**

Ganesan, H. and Teijeiro, C. and Sutmann, G. *Computational Materials Science* 155 (2018)

Low carbon steels undergo strain aging when heat treated, which causes an increased yield strength that can be observed macroscopically. Such strengthening mechanism is driven by atomistic scale processes, i.e., solute segregation of carbon (C) or nitrogen interstitial atoms. Due to its low solubility, alloying elements can diffuse to defects (e.g., dislocations) and form the so-called Cottrell atmospheres. Consequently, the mobility of defects is strongly reduced because of the interaction with solutes, and higher stresses are needed to unpin them from the Cottrell atmosphere. As C segregation and atomistic motion take place at separate timescales, Classical Molecular Dynamics (MD) and Metropolis Monte Carlo (MC) are coupled in a unified framework to capture collective effects with underlying slow dynamics. The number of degrees of freedom and the need for large computational resources in this simulation requires the choice of an optimal parallelization technique for the MC part of such multi-scale simulations using an unbiased sampling of the configuration space. In the present work, two different parallel approaches for the MC routine applied to the simulation of Cottrell atmospheres are implemented and compared: (i) a manager-worker speculative scheme and (ii) a distributed manager-worker over a cell-based domain decomposition approach augmented by an efficient load balancing scheme. The parallel performance of different Fe-C containing defects with several millions of atoms is analyzed, and also the possible optimization of the efficiency of the MC solute segregation process is evaluated regarding energy minimization. © 2018 Elsevier B.V.

DOI: 10.1016/j.commatsci.2018.08.055

**Cluster formation in stochastic disk systems**

Sutmann, G. and Ganesan, H. and Begau, C. *AIP Conference Proceedings* 1863 (2017)

The problem of randomly distributed disks is considered in the dilute regime in a two-dimensional domain. Disks are allowed to overlap and to form clusters which may be isolated or percolating. Depending on the number and size of the disks, distribution functions are obtained for different size and bond configurations of clusters. A statistical geometrical approach is taken to derive analytical probabilities for cluster formation in systems, where a maximum of four overlapping disks is considered. Monte Carlo computations are carried out to verify our theoretical approach which is shown to be in close agreement with numerical simulations. © 2017 Author(s).

DOI: 10.1063/1.4992772

**Parallel multiphase field simulations with OpenPhase**

Tegeler, M. and Shchyglo, O. and Kamachali, R.D. and Monas, A. and Steinbach, I. and Sutmann, G. *Computer Physics Communications* 215 (2017)

The open-source software project OpenPhase allows the three-dimensional simulation of microstructural evolution using the multiphase field method. The core modules of OpenPhase and their implementation as well as their parallelization for a distributed-memory setting are presented. Especially communication and load-balancing strategies are discussed. Synchronization points are avoided by an increased halo-size, i.e. additional layers of ghost cells, which allow multiple stencil operations without data exchange. Load-balancing is considered via graph-partitioning and sub-domain decomposition. Results are presented for performance benchmarks as well as for a variety of applications, e.g. grain growth in polycrystalline materials, including a large number of phase fields as well as Mg–Al alloy solidification.

Program summary

- Program Title: OpenPhase
- Program Files doi: http://dx.doi.org/10.17632/2mnv2fvkkk.1
- Licensing provisions: GPLv3
- Programming language: C++
- Nature of problem: OpenPhase[1] allows the simulation of microstructure evolution during materials processing using the multiphase field method. In order to allow an arbitrary number of phase fields active parameter tracking is used, which can cause load imbalances in parallel computations.
- Solution method: OpenPhase solves the phase field equations using an explicit finite difference scheme. The parallel version of OpenPhase provides load-balancing using over-decomposition of the computational domain and graph-partitioning. Adaptive sub-domain sizes are used to minimize the computational overhead of the over-decomposition, while allowing appropriate load-balance.
- Additional comments including Restrictions and Unusual features: The distributed-memory parallelism in OpenPhase uses MPI. Shared-memory parallelism is implemented using OpenMP. The library uses C++11 features and therefore requires GCC version 4.7 or higher.

[1] www.openphase.de

© 2017 Elsevier B.V.

DOI: 10.1016/j.cpc.2017.01.023

**Polymer conformations in ionic microgels in the presence of salt: Theoretical and mesoscale simulation results**

Kobayashi, H. and Halver, R. and Sutmann, G. and Winkler, R.G. *Polymers* 9 (2017)

We investigate the conformational properties of polymers in ionic microgels in the presence of salt ions by molecular dynamics simulations and analytical theory. A microgel particle consists of coarse-grained linear polymers, which are tetra-functionally crosslinked. Counterions and salt ions are taken into account explicitly, and charge-charge interactions are described by the Coulomb potential. By varying the charge interaction strength and salt concentration, we characterize the swelling of the polyelectrolytes and the charge distribution. In particular, we determine the amount of trapped mobile charges inside the microgel and the Debye screening length. Moreover, we analyze the polymer extension theoretically in terms of the tension blob model taking into account counterions and salt ions implicitly by the Debye-Hückel model. Our studies reveal a strong dependence of the amount of ions absorbed in the interior of the microgel on the electrostatic interaction strength, which is related to the degree of the gel swelling. This implies a dependence of the inverse Debye screening length k on the ion concentration; we find a power-law increase of k with the Coulomb interaction strength with the exponent 3/5 for a salt-free microgel and an exponent 1/2 for moderate salt concentrations. Additionally, the radial dependence of polymer conformations and ion distributions is addressed. © 2017 by the authors.

DOI: 10.3390/polym9010015

**Complexity analysis of simulations with analytic bond-order potentials**

Teijeiro, C. and Hammerschmidt, T. and Seiser, B. and Drautz, R. and Sutmann, G. *Modelling and Simulation in Materials Science and Engineering* 24 (2016)

The modeling of materials at the atomistic level with interatomic potentials requires a reliable description of different bonding situations and relevant system properties. For this purpose, analytic bond-order potentials (BOPs) provide a systematic and robust approximation to density functional theory (DFT) and tight binding (TB) calculations at reasonable computational cost. This paper presents a formal analysis of the computational complexity of analytic BOP simulations, based on a detailed assessment of the most computationally intensive parts. Different implementation algorithms are presented alongside with optimizations for efficient numerical processing. The theoretical complexity study is complemented by systematic benchmarks of the scalability of the algorithms with increasing system size and accuracy level of the BOP approximation. Both approaches demonstrate that the computation of atomic forces in analytic BOPs can be performed with a similar scaling as the computation of atomic energies. © 2016 IOP Publishing Ltd.

DOI: 10.1088/0965-0393/24/2/025008

**Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations**

Teijeiro, C. and Hammerschmidt, T. and Drautz, R. and Sutmann, G. *Computer Physics Communications* 204 (2016)

Analytic bond-order potentials (BOPs) provide a way to compute atomistic properties with controllable accuracy. For large-scale computations of heterogeneous compounds at the atomistic level, both the computational efficiency and memory demand of BOP implementations have to be optimized. Since the evaluation of BOPs is a local operation within a finite environment, the parallelization concepts known from short-range interacting particle simulations can be applied to improve the performance of these simulations. In this work, several efficient parallelization methods for BOPs that use three-dimensional domain decomposition schemes are described. The schemes are implemented into the bond-order potential code BOPfox, and their performance is measured in a series of benchmarks. Systems of up to several millions of atoms are simulated on a high performance computing system, and parallel scaling is demonstrated for up to thousands of processors. © 2016 Elsevier B.V. All rights reserved.

DOI: 10.1016/j.cpc.2016.03.008

**Green's function enriched Poisson solver for electrostatics in many-particle systems**

Sutmann, G. *AIP Conference Proceedings* 1738 (2016)

A highly accurate method is presented for the construction of the charge density for the solution of the Poisson equation in particle simulations. The method is based on an operator adjusted source term which can be shown to produce exact results up to numerical precision in the case of a large support of the charge distribution, therefore compensating the discretization error of finite difference schemes. This is achieved by balancing an exact representation of the known Green's function of regularized electrostatic problem with a discretized representation of the Laplace operator. It is shown that the exact calculation of the potential is possible independent of the order of the finite difference scheme but the computational efficiency for higher order methods is found to be superior due to a faster convergence to the exact result as a function of the charge support. © 2016 Author(s).

DOI: 10.1063/1.4952328

**Hydrodynamics in adaptive resolution particle simulations: Multiparticle collision dynamics**

Alekseeva, U. and Winkler, R.G. and Sutmann, G. *Journal of Computational Physics* 314 (2016)

A new adaptive resolution technique for particle-based multi-level simulations of fluids is presented. In the approach, the representation of fluid and solvent particles is changed on the fly between an atomistic and a coarse-grained description. The present approach is based on a hybrid coupling of the multiparticle collision dynamics (MPC) method and molecular dynamics (MD), thereby coupling stochastic and deterministic particle-based methods. Hydrodynamics is examined by calculating velocity and current correlation functions for various mixed and coupled systems. We demonstrate that hydrodynamic properties of the mixed fluid are conserved by a suitable coupling of the two particle methods, and that the simulation results agree well with theoretical expectations. © 2016 Elsevier Inc.

DOI: 10.1016/j.jcp.2016.02.065

**Multi-threaded construction of neighbour lists for particle systems in openMP**

Halver, R. and Sutmann, G. *Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)* 9574 (2016)

The construction of neighbour lists based on the linked cell method is investigated in the context of particle simulation methods within the OpenMP shared memory programming model. Various implementations are studied which avoid memory collisions and race conditions. Performance and optimisation considerations are made along with run time behaviour and memory requirements. Performance models are proposed, which reproduce the measured runtime behaviour and which provide insight into the performance dependence on specific system parameters. Benchmarks are performed for different implementations on a number of multi-core architectures and thread numbers up to 240 are considered on the Xeon Phi architecture in the SMT mode, so that performance can be studied for a large number of threads working concurrently on the construction of linked cells on a shared memory partition. © Springer International Publishing Switzerland 2016.

DOI: 10.1007/978-3-319-32152-3_15

**Adaptive dynamic load-balancing with irregular domain decomposition for particle simulations**

Begau, C. and Sutmann, G. *Computer Physics Communications* 190 (2015)

We present a flexible and fully adaptive dynamic load-balancing scheme, which is designed for particle simulations of three-dimensional systems with short ranged interactions. The method is based on domain decomposition with non-orthogonal non-convex domains, which are constructed based on a local repartitioning of computational work between neighbouring processors. Domains are dynamically adjusted in a flexible way under the condition that the original topology is not changed, i.e. neighbour relations between domains are retained, which guarantees a fixed communication pattern for each domain during a simulation. Extensions of this scheme are discussed and illustrated with examples, which generalise the communication patterns and do not fully restrict data exchange to direct neighbours. The proposed method relies on a linked cell algorithm, which makes it compatible with existing implementations in particle codes and does not modify the underlying algorithm for calculating the forces between particles. The method has been implemented into the molecular dynamics community code IMD and performance has been measured for various molecular dynamics simulations of systems representing realistic problems from materials science. It is found that the method proves to balance the work between processors in simulations with strongly inhomogeneous and dynamically changing particle distributions, which results in a significant increase of the efficiency of the parallel code compared both to unbalanced simulations and conventional load-balancing strategies. © 2015 Elsevier B.V. All rights reserved.

DOI: 10.1016/j.cpc.2015.01.009

**Large scale Molecular Dynamics simulation of microstructure formation during thermal spraying of pure copper**

Wang, T. and Begau, C. and Sutmann, G. and Hartmaier, A. *Surface and Coatings Technology* 280 (2015)

Thermal spray processes are widely used for the manufacture of advanced coating systems, e.g. metallic coatings for wear and corrosion protection. The desired coating properties are closely related to the microstructure, which is highly influenced by the processing parameters, such as temperature, size and velocity of the sprayed particles. In this paper, large scale Molecular Dynamics simulations are conducted to investigate the microstructure formation mechanisms during the spraying process of hot nano-particles onto a substrate at room temperature using pure copper as a benchmark material representing a wider class of face-centered-cubic metals. To evaluate the influence of processing parameters on the coating morphology, a number of simulations are performed in which the initial temperature, size and velocity of copper particles are systematically varied in order to investigate the thermal and microstructural evolution during impaction. Two distinct types of microstructural formation mechanisms, resulting in different coating morphologies, are observed in the present investigation, which are either governed by plastic deformation or by the process of melting and subsequent solidification. Furthermore, a thermodynamically motivated model as a function of the particle temperature and velocity is developed, which predicts the microstructural mechanisms observed in the simulations. The results provide an elementary insight into the microstructure formation mechanisms on an atomistic scale, which can serve as basic input for continuum modeling of thermal spray process. © 2015 Published by Elsevier B.V.

DOI: 10.1016/j.surfcoat.2015.08.034

**Comparison of scalable fast methods for long-range interactions**

Arnold, A. and Fahrenberger, F. and Holm, C. and Lenz, O. and Bolten, M. and Dachsel, H. and Halver, R. and Kabadshow, I. and Gähler, F. and Heber, F. and Iseringhausen, J. and Hofmann, M. and Pippig, M. and Potts, D. and Sutmann, G. *Physical Review E - Statistical, Nonlinear, and Soft Matter Physics* 88 (2013)

Based on a parallel scalable library for Coulomb interactions in particle systems, a comparison between the fast multipole method (FMM), multigrid-based methods, fast Fourier transform (FFT)-based methods, and a Maxwell solver is provided for the case of three-dimensional periodic boundary conditions. These methods are directly compared with respect to complexity, scalability, performance, and accuracy. To ensure comparable conditions for all methods and to cover typical applications, we tested all methods on the same set of computers using identical benchmark systems. Our findings suggest that, depending on system size and desired accuracy, the FMM- and FFT-based methods are most efficient in performance and stability. © 2013 American Physical Society.

DOI: 10.1103/PhysRevE.88.063308

**GASPI - A partitioned global address space programming interface**

Alrutz, T. and Backhaus, J. and Brandes, T. and End, V. and Gerhold, T. and Geiger, A. and Grünewald, D. and Heuveline, V. and Jägersküpper, J. and Knüpfer, A. and Krzikalla, O. and Kügeler, E. and Lojewski, C. and Lonsdale, G. and Müller-Pfefferkorn, R. and Nagel, W. and Oden, L. and Pfreundt, F.-J. and Rahn, M. and Sattler, M. and Schmidtobreick, M. and Schiller, A. and Simmendinger, C. and Soddemann, T. and Sutmann, G. and Weber, H. and Weiss, J.-P.

At the threshold to exascale computing, limitations of the MPI programming model become more and more pronounced. HPC programmers have to design codes that can run and scale on systems with hundreds of thousands of cores. Setting up accordingly many communication buffers, point-to-point communication links, and using bulk-synchronous communication phases is contradicting scalability in these dimensions. Moreover, the reliability of upcoming systems will worsen. © 2013 Springer-Verlag Berlin Heidelberg.

DOI: 10.1007/978-3-642-35893-7-18

**Parallel Brownian dynamics simulations with the message-passing and PGAS programming models**

Teijeiro, C. and Sutmann, G. and Taboada, G.L. and Touriño, J. *Computer Physics Communications* 184 (2013)

The simulation of particle dynamics is among the most important mechanisms to study the behavior of molecules in a medium under specific conditions of temperature and density. Several models can be used to compute efficiently the forces that act on each particle, and also the interactions between them. This work presents the design and implementation of a parallel simulation code for the Brownian motion of particles in a fluid. Two different parallelization approaches have been followed: (1) using traditional distributed memory message-passing programming with MPI, and (2) using the Partitioned Global Address Space (PGAS) programming model, oriented towards hybrid shared/distributed memory systems, with the Unified Parallel C (UPC) language. Different techniques for domain decomposition and work distribution are analyzed in terms of efficiency and programmability, in order to select the most suitable strategy. Performance results on a supercomputer using up to 2048 cores are also presented for both MPI and UPC codes. © 2012 Elsevier B.V. All rights reserved.

DOI: 10.1016/j.cpc.2012.12.015

**Parallel simulation of brownian dynamics on shared memory systems with OpenMP and unified parallel C**

Teijeiro, C. and Sutmann, G. and Taboada, G.L. and Touriño, J. *Journal of Supercomputing* 65 (2013)

The simulation of particle dynamics is an essential method to analyze and predict the behavior of molecules in a given medium. This work presents the design and implementation of a parallel simulation of Brownian dynamics with hydrodynamic interactions for shared memory systems using two approaches: (1) OpenMP directives and (2) the Partitioned Global Address Space (PGAS) paradigm with the Unified Parallel C (UPC) language. The structure of the code is described, and different techniques for work distribution are analyzed in terms of efficiency, in order to select the most suitable strategy for each part of the simulation. Additionally, performance results have been collected from two representative NUMA systems, and they are studied and compared against the original sequential code. © 2012 Springer Science+Business Media New York.

DOI: 10.1007/s11227-012-0843-1

**Hydrodynamic fluctuations in thermostatted multiparticle collision dynamics**

Híjar, H. and Sutmann, G. *Physical Review E - Statistical, Nonlinear, and Soft Matter Physics* 83 (2011)

In this work we study the behavior of mesoscopic fluctuations of a fluid simulated by Multiparticle Collision Dynamics when this is applied together with a local thermostatting procedure that constrains the strength of temperature fluctuations. We consider procedures in which the thermostat interacts with the fluid at every simulation step as well as cases in which the thermostat is applied only at regular time intervals. Due to the application of the thermostat temperature fluctuations are forced to relax to equilibrium faster than they do in the nonthermostatted, constant-energy case. Depending on the interval of application of the thermostat, it is demonstrated that the thermodynamic state changes gradually from isothermal to adiabatic conditions. In order to exhibit this effect we compute from simulations diverse correlation functions of the hydrodynamic fluctuating fields. These correlation functions are compared with those predicted by a linearized hydrodynamic theory of a simple fluid in which a thermostat is applied locally. We find a good agreement between the model and the numerical results, which confirms that hydrodynamic fluctuations in Multiparticle Collision Dynamics in the presence of the thermostat have the properties expected for spontaneous fluctuations in fluids in contact with a heat reservoir. © 2011 American Physical Society.

DOI: 10.1103/PhysRevE.83.046708

**Tumbling of polymers in semidilute solution under shear flow**

Huang, C.-C. and Sutmann, G. and Gompper, G. and Winkler, R.G. *EPL* 93 (2011) The tumbling dynamics of individual polymers in semidilute solution is studied by large-scale non-equilibrium mesoscale hydrodynamic simulations. We find that the tumbling time is equal to the non-equilibrium relaxation time of the polymer end-to-end distance along the flow direction and strongly depends on concentration. In addition, the normalized tumbling frequency as well as the widths of the alignment distribution functions for a given concentration-dependent Weissenberg number exhibit a weak concentration dependence in the cross-over regime from a dilute to a semidilute solution. For semidilute solutions a universal behavior is obtained. This is a consequence of screening of hydrodynamic interactions at polymer concentrations exceeding the overlap concentration. Copyright © 2011 Europhysics Letters Association. DOI: 10.1209/0295-5075/93/54004

**Cell-level canonical sampling by velocity scaling for multiparticle collision dynamics simulations**

Huang, C.C. and Chatterji, A. and Sutmann, G. and Gompper, G. and Winkler, R.G. *Journal of Computational Physics* 229 (2010) A local Maxwellian thermostat for the multiparticle collision dynamics algorithm is proposed. The algorithm is based on a scaling of the relative velocities of the fluid particles within a collision cell. The scaling factor is determined from the distribution of the kinetic energy within such a cell. Thereby the algorithm ensures that the distribution of the relative velocities is given by the Maxwell-Boltzmann distribution. The algorithm is particularly useful for non-equilibrium systems, where temperature has to be controlled locally. We perform various non-equilibrium simulations for fluids in shear and pressure-driven flow, which confirm the validity of the proposed simulation scheme. In addition, we determine the dynamic structure factors for fluids with and without thermostat, which exhibit significant differences due to suppression of the diffusive part of the energy transport of the isothermal system. © 2009 Elsevier Inc. All rights reserved. DOI: 10.1016/j.jcp.2009.09.024

**High-throughput parallel-I/O using sionlib for mesoscopic particle dynamics simulations on massively parallel computers**

Freche, J. and Frings, W. and Sutmann, G. *Advances in Parallel Computing* 19 (2010) The newly developed parallel Input/Output library SIONlib is applied to the highly scalable parallel multiscale code MP2C, which couples a mesoscopic fluid method based on multi-particle collision dynamics to molecular dynamics. It is demonstrated that for fluid-benchmark systems, a significant improvement of scalability under production conditions can be achieved. It is shown that for the BlueGene/P architecture at Jülich a performance close to the bandwidth capacity of 4.7 GByte/sec can be obtained. The article discusses the ease of use of SIONlib from the point of view of application. © 2010 The authors and IOS Press. All rights reserved. DOI: 10.3233/978-1-60750-530-3-371

**Particle based simulations of complex systems with MP2C: Hydrodynamics and electrostatics**

Sutmann, G. and Westphal, L. and Bolten, M. *AIP Conference Proceedings* 1281 (2010) Particle based simulation methods are well established paths to explore system behavior on microscopic to mesoscopic time and length scales. With the development of new computer architectures it becomes more and more important to concentrate on local algorithms which do not need global data transfer or reorganisation of large arrays of data across processors. This requirement strongly addresses long-range interactions in particle systems, i.e. mainly hydrodynamic and electrostatic contributions. In this article, emphasis is given to the implementation and parallelization of the Multi-Particle Collision Dynamics method for hydrodynamic contributions and a splitting scheme based on Multigrid for electrostatic contributions. Implementations are done for massively parallel architectures and are demonstrated for the IBM Blue Gene/P architecture Jugene in Jülich. © 2010 American Institute of Physics. DOI: 10.1063/1.3498216

**Particle methods on multicore architectures: Experiences and future plans**

Schiller, A. and Sutmann, G. and Martinell, L. and Bellens, P. and Badia, R. *AIP Conference Proceedings* 1281 (2010) The requirement of high performance and memory for computer simulations is still growing. Due to hardware constraints like power consumption, heat dissipation and other physical limitations the development trend in high performance computing (HPC) tends to multicore design patterns. As new computational platforms become increasingly more complicated and heterogeneous, there is the need for portable programming models that easily enable the exploitation of these architectures. Additionally, algorithms are needed that are able to match the platform specific requirements and exploit their potential power. This work focuses on the particle-based algorithm Multiparticle Collision Dynamics (MPC) for the calculation of hydrodynamic properties of fluid and flow phenomena. This algorithm has already been ported to Cell Broadband Engine (Cell/BE) by using the high-level programming model Cell Superscalar (CellSs). Performance results of the Cell/BE implementation and a recently developed OpenMP version are presented. Furthermore, the possibilities for porting this application also to GPUs with minor effort are pointed out and a strategy for hybrid implementations to use multiple nodes in a cluster is examined. © 2010 American Institute of Physics. DOI: 10.1063/1.3498233

**Semidilute polymer solutions at equilibrium and under shear flow**

Huang, C.-C. and Winkler, R.G. and Sutmann, G. and Gompper, G. *Macromolecules* 43 (2010) The properties of semidilute polymer solutions are investigated at equilibrium and under shear flow by mesoscale simulations, which combine molecular dynamics simulations and the multiparticle collision dynamics approach. In semidilute solution, intermolecular hydrodynamic and excluded volume interactions become increasingly important due to the presence of polymer overlap. At equilibrium, the dependence of the radius of gyration, the structure factor, and the zero-shear viscosity on the polymer concentration is determined and found to be in good agreement with scaling predictions. In shear flow, the polymer alignment and deformation are calculated as a function of concentration. Shear thinning, which is related to flow alignment and finite polymer extensibility, is characterized by the shear viscosity and the normal stress coefficients. © 2010 American Chemical Society. DOI: 10.1021/ma101836x

#### computational materials science

#### long range interactions

#### many-particle systems

#### modelling and simulation

#### molecular dynamics simulations

#### numerical methods

#### parallel computing

#### soft matter

#### statistical physics