Reproducibility and Performance of the Feltor Code on Parallel Architectures

Abstract. Feltor is both a numerical library and a scientific software package built on top of it. Its main targets are two- and three-dimensional drift- and gyro-fluid simulations, with discontinuous Galerkin methods as the main numerical discretization technique. Feltor allows developing platform-independent code that runs on a variety of parallel computer architectures, ranging from laptop CPUs to hybrid CPU+GPU distributed-memory systems. We investigate reproducibility because we observe that numerical simulations of a recently developed gyro-fluid model produce non-deterministic results in parallel computations. We show how bitwise reproducibility can be restored both algorithmically and programmatically. Furthermore, we explore important performance-tuning considerations and discuss the latencies and bandwidths of the elementary subroutines needed to implement the aforementioned algorithms and equations. We propose a parallel performance model that predicts the execution time of algorithms implemented in Feltor and test it on a selection of parallel hardware architectures. We are able to predict the execution time of more complex algorithms with a relative error of less than 25% for problem sizes between $10^{-1}$ and $10^{3}$ MB.
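Non-deterministic results in parallel runs typically stem from floating-point addition not being associative: a different summation order (for example, a different number of threads or MPI ranks) rounds differently. The abstract does not say which technique Feltor uses to restore reproducibility, so the sketch below swaps in a simpler, sequential one: Shewchuk's exact-partials summation, as used in CPython's `math.fsum`. It returns the correctly rounded sum and hence the same bits for any ordering of the inputs. It assumes IEEE-754 binary64 arithmetic with round-to-nearest, no `-ffast-math`, and no overflow in the partial sums; `exact_sum` and `naive_sum` are illustrative names, not Feltor functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Correctly rounded sum of finite doubles (Shewchuk's algorithm, as in CPython's math.fsum).
// Assumption: IEEE-754 binary64, round-to-nearest, no -ffast-math, no overflow.
double exact_sum(const std::vector<double>& xs) {
    std::vector<double> p;  // non-overlapping partials, sorted by increasing magnitude
    for (double x : xs) {
        std::size_t i = 0;
        for (std::size_t j = 0; j < p.size(); ++j) {
            double y = p[j];
            if (std::fabs(x) < std::fabs(y)) std::swap(x, y);
            double hi = x + y;         // Fast2Sum is exact because |x| >= |y|
            double lo = y - (hi - x);  // rounding error of hi, stored as a new partial
            if (lo != 0.0) p[i++] = lo;
            x = hi;
        }
        p.resize(i);
        if (x != 0.0) p.push_back(x);
    }
    // Add partials from the largest down and stop as soon as the sum becomes inexact.
    double hi = 0.0, lo = 0.0;
    std::size_t n = p.size();
    if (n > 0) {
        hi = p[--n];
        while (n > 0) {
            double x = hi;
            double y = p[--n];
            assert(std::fabs(y) < std::fabs(x));
            hi = x + y;
            lo = y - (hi - x);
            if (lo != 0.0) break;
        }
        // Fix round-half-even when the remaining partials push lo past a tie.
        if (n > 0 && ((lo < 0.0 && p[n - 1] < 0.0) || (lo > 0.0 && p[n - 1] > 0.0))) {
            double y = lo * 2.0;
            double x = hi + y;
            if (y == x - hi) hi = x;
        }
    }
    return hi;
}

// Plain left-to-right summation; its result depends on the order of xs.
double naive_sum(const std::vector<double>& xs) {
    double s = 0.0;
    for (double x : xs) s += x;
    return s;
}
```

For the inputs `{1.0, 1e100, 1.0, -1e100}`, `naive_sum` returns `0.0` in forward order and `1.0` in reverse order, whereas `exact_sum` returns `2.0` for both. Production codes usually prefer schemes that parallelize well, such as long accumulators combined with error-free transformations, but the order-independence property they aim for is the same.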
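The abstract describes a performance model built from the latencies and bandwidths of elementary subroutines, but does not give its exact form. As an assumption, the sketch below uses a generic latency-plus-bandwidth model, where one call on $n$ elements costs $T(n) = T_\mathrm{lat} + b\,n/B$ ($b$ bytes moved per element, bandwidth $B$). It predicts a composite algorithm by summing over its primitive calls. The names `PrimitiveCost`, `predict_call`, `predict_algorithm` and `relative_error`, and all parameter values, are hypothetical and not part of Feltor's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical cost parameters of one elementary subroutine (e.g. a vector update
// or a dot product), as they might be measured on one machine.
struct PrimitiveCost {
    double latency_s;          // fixed start-up cost per call, in seconds
    double bandwidth_Bps;      // sustained memory bandwidth, in bytes per second
    double bytes_per_element;  // bytes read and written per vector element
};

// How often an algorithm calls a given primitive per iteration.
struct Call {
    PrimitiveCost cost;
    std::size_t count;
};

// Predicted time of one call on n elements: latency + bytes / bandwidth.
double predict_call(const PrimitiveCost& c, std::size_t n) {
    return c.latency_s + c.bytes_per_element * static_cast<double>(n) / c.bandwidth_Bps;
}

// Predicted time of a composite algorithm: sum of its primitive calls.
double predict_algorithm(const std::vector<Call>& calls, std::size_t n) {
    double t = 0.0;
    for (const Call& call : calls)
        t += static_cast<double>(call.count) * predict_call(call.cost, n);
    return t;
}

// Relative deviation of a prediction from a measured time (measured must be > 0).
double relative_error(double predicted, double measured) {
    assert(measured > 0.0);
    return std::fabs(predicted - measured) / measured;
}
```

For example, a vector update that moves 24 bytes per element (two reads and one write of a double), with an assumed latency of 1 µs and a bandwidth of 1 GB/s, is predicted to take about 0.024 s on $10^6$ elements; ten such calls take about 0.24 s. Comparing such predictions with measured timings via `relative_error` is one way to check an accuracy target like the 25% stated above.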

Authors

    Roman Iakymchuk, KTH Royal Institute of Technology, Sweden and Fraunhofer ITWM, Germany, riakymch@kth.se
    Matthias Wiesenberger, Technical University of Denmark, Denmark, mattwi@fysik.dtu.dk
    Stef Graillat, Sorbonne University, France, stef.graillat@lip6.fr