Reproducible and Accurate Algorithms for Numerical Linear Algebra

Abstract. On modern multi-core, many-core, and heterogeneous architectures, floating-point computations,
especially reductions, may become non-deterministic and therefore non-reproducible, mainly because
floating-point operations are non-associative and work is scheduled dynamically. We address the problem of
reproducibility in the context of fundamental linear algebra operations, such as those included in the Basic
Linear Algebra Subprograms (BLAS) library, and propose algorithms that yield both reproducible and accurate
results. We extend this approach to higher-level linear algebra algorithms that are built on top of these
BLAS kernels, e.g. the LU factorization.
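
To make the non-associativity issue concrete, here is a minimal sketch in Python. It is not the algorithm proposed by the authors. It uses the standard library function math.fsum, which returns a correctly rounded sum, as a stand-in for an accurate and order-independent reduction. The helper naive_sum and the two input orderings are hypothetical and chosen only for illustration; the two orderings play the role of two different thread schedules.

```python
import math


def naive_sum(values):
    """Left-to-right floating-point reduction, as a plain sequential loop."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same four numbers in two different orders (illustrative data only).
order_a = [1e100, 1.0, -1e100, 1.0]
order_b = [1e100, -1e100, 1.0, 1.0]

# The naive reduction depends on the order of the operands.
naive_a = naive_sum(order_a)
naive_b = naive_sum(order_b)

# math.fsum tracks the rounding errors internally and rounds once at the end,
# so its result does not depend on the order of the operands.
exact_a = math.fsum(order_a)
exact_b = math.fsum(order_b)

print("naive:", naive_a, naive_b)
print("fsum: ", exact_a, exact_b)
```

In order_a the first 1.0 is absorbed when it is added to 1e100, so the naive loop returns 1.0, while order_b cancels the large terms first and returns 2.0. Both calls to math.fsum return 2.0, the exact sum.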

We present these reproducible and accurate algorithms for the BLAS routines and the LU factorization, as well
as their implementations in parallel environments such as Intel server CPUs, Intel Xeon Phi, and both NVIDIA
and AMD GPUs. We show that the performance of the proposed implementations is comparable to that of the
standard implementations.

Authors

    Roman Iakymchuk, KTH Royal Institute of Technology, Sweden, riakymch@kth.se
    David Defour, University of Perpignan, France, david.defour@univ-perp.fr
    Sylvain Collange, Inria Rennes, France, sylvain.collange@inria.fr
    Stef Graillat, University Pierre and Marie Curie (UPMC), France, stef.graillat@lip6.fr