"A three-level MPI+NUMA+Threads method for constructing parallel programs to solve hydrodynamic problems for cluster systems with multiprocessor NUMA nodes"
Bogachev K.Yu., Zhabitskiy Ya.V., Klimovsky A.A., Mirgasimov A.R., Semenko A.E.

A parallel implementation of a program that solves the three-phase filtration problem for a viscous compressible fluid on distributed-memory cluster systems with multiprocessor NUMA nodes is discussed. The traditional approach to constructing parallel programs for distributed-memory clusters relies on the MPI library. Owing to the specifics of the problem, we have to deal with unstructured computational grids and to model dynamically changing sets of wells that pass through a large number of grid blocks. This causes a number of difficulties for the traditional approach: load imbalance across the computing nodes, increased data transfer between MPI processes, and increased memory usage. A three-level MPI + NUMA + Threads method of constructing a parallel program is proposed, whose purpose is to eliminate these difficulties. The method implements the idea of projecting the cluster node architecture (multi-core nodes with nonuniform memory access) onto the architecture of the parallel program. Programs implemented with the proposed method and with the traditional approach are compared in terms of speed and memory usage, and the results of numerical experiments performed on a large number of real problems are analyzed.
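The idea of projecting the node architecture onto the program can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one MPI process per cluster node, one thread group per NUMA domain, and a simple contiguous split of grid blocks, which ignores the unstructured grids and wells that the actual method has to handle. The names `split_evenly` and `three_level_partition` and all parameters are hypothetical.

```python
def split_evenly(start, stop, parts):
    """Split the half-open range [start, stop) into `parts` contiguous
    chunks whose sizes differ by at most one."""
    base, extra = divmod(stop - start, parts)
    chunks = []
    lo = start
    for i in range(parts):
        hi = lo + base + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def three_level_partition(n_blocks, n_nodes, numa_per_node, threads_per_numa):
    """Map each (mpi_rank, numa_domain, thread) triple to a block range.

    Level 1: blocks are divided among MPI ranks (assumed one per node).
    Level 2: each rank's share is divided among its NUMA domains.
    Level 3: each domain's share is divided among its threads.
    """
    layout = {}
    for rank, (n_lo, n_hi) in enumerate(split_evenly(0, n_blocks, n_nodes)):
        for numa, (m_lo, m_hi) in enumerate(
                split_evenly(n_lo, n_hi, numa_per_node)):
            for thread, (t_lo, t_hi) in enumerate(
                    split_evenly(m_lo, m_hi, threads_per_numa)):
                layout[(rank, numa, thread)] = (t_lo, t_hi)
    return layout


if __name__ == "__main__":
    # Hypothetical cluster: 2 nodes, 2 NUMA domains each, 4 threads per domain.
    for key, block_range in sorted(three_level_partition(100, 2, 2, 4).items()):
        print(key, block_range)
```

In this sketch only the first level would involve MPI communication. Threads inside a node share memory, and the second level exists so that each thread group can work on data placed in its own NUMA domain.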

Keywords: high-performance computing, hybrid MPI-threaded programs, NUMA systems, load balancing, filtration problem

Bogachev K.Yu., e-mail: bogachev@mech.math.msu.su;   Zhabitskiy Ya.V., e-mail: jjv@fromru.com;   Klimovsky A.A., e-mail: arseny@klimovsky.ru;   Mirgasimov A.R., e-mail: mirgasimov.almaz@gmail.com;   Semenko A.E., e-mail: asemenko@gmail.com – Moscow State University, Faculty of Mechanics and Mathematics; Leninskiye Gory 1, Moscow, 119899, Russia