Reproducible and Accurate Algorithms for Numerical Linear Algebra

Abstract. On modern multi-core, many-core, and heterogeneous architectures, floating-point computations, especially reductions, may become non-deterministic and therefore non-reproducible, mainly because floating-point operations are non-associative and scheduling is dynamic. We address the problem of reproducibility in the context of fundamental linear algebra operations, such as those in the Basic Linear Algebra Subprograms (BLAS) library, and propose algorithms that yield both reproducible and accurate results. We extend this approach to higher-level linear algebra algorithms that are built on top of these BLAS kernels, e.g. the LU factorization. We present these reproducible and accurate algorithms for the BLAS routines and the LU factorization, as well as their implementations in parallel environments such as Intel server CPUs, Intel Xeon Phi, and both NVIDIA and AMD GPUs. We show that the performance of the proposed implementations is comparable to that of the standard ones.

Authors
Roman Iakymchuk, KTH Royal Institute of Technology, Sweden, riakymch@kth.se
David Defour, University of Perpignan, France, david.defour@univ-perp.fr
Sylvain Collange, Inria Rennes, France, sylvain.collange@inria.fr
Stef Graillat, University Pierre and Marie Curie (UPMC), France, stef.graillat@lip6.fr
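
As an illustration of the non-associativity the abstract refers to, the following minimal Python sketch sums the same four numbers in two different orders, as a dynamically scheduled parallel reduction might. The helper names naive_sum and reproducible_sum are hypothetical and chosen only for this example. The order-independent variant relies on Python's math.fsum, which returns a correctly rounded sum; it is a stand-in used here to show the idea and is not the algorithm proposed by the authors.

```python
import math


def naive_sum(values):
    """Left-to-right floating-point summation; the result depends on the order."""
    total = 0.0
    for v in values:
        total += v
    return total


def reproducible_sum(values):
    """Order-independent summation.

    Assumption for illustration only: math.fsum returns the correctly
    rounded sum, so any permutation of the inputs gives the same result.
    This is not the authors' method.
    """
    return math.fsum(values)


# The same four numbers in two different orders.
order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1e16, -1e16, 1.0, 1.0]

if __name__ == "__main__":
    # In order_a the first 1.0 is absorbed by 1e16 and lost to rounding.
    print("naive, order a:       ", naive_sum(order_a))
    print("naive, order b:       ", naive_sum(order_b))
    print("reproducible, order a:", reproducible_sum(order_a))
    print("reproducible, order b:", reproducible_sum(order_b))
```

In this sketch the naive loop returns 1.0 for the first ordering and 2.0 for the second, while the correctly rounded sum is 2.0 for both. Correct rounding is one way to obtain a result that is reproducible and accurate at once, which are the two properties the abstract targets.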