A Guide For Achieving High Performance With Very Small Matrices On GPU: A Case Study of Batched LU and Cholesky Factorizations

Published in IEEE Transactions on Parallel and Distributed Systems, 2017

Recommended citation: A. Haidar, A. Abdelfattah, M. Zounon, S. Tomov, and J. Dongarra, "A Guide For Achieving High Performance With Very Small Matrices On GPU: A Case Study of Batched LU and Cholesky Factorizations," in IEEE Transactions on Parallel and Distributed Systems, vol. PP, no. 99, pp. 1-1. doi: 10.1109/TPDS.2017.2783929. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8214236&isnumber=4359390