GPUs are now widely used for general-purpose parallel computation. In GPU programming, the thread and block configuration is one of the most important decisions, because it determines the available parallelism and how well instruction latency is hidden. In many cases, however, there is not enough parallelism to hide all latencies, and the longest of them are typically caused by global memory accesses. To reduce the number of such accesses, shared memory can be used instead; because it is located on chip, it is much faster than global memory. The performance of the proposed thread configuration is evaluated on the GPU 960 processor. The experimental results show that the best configuration improves performance by a factor of 7.3 over the worst configuration in the experiment. The performance of shared memory relative to global memory is also discussed.
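The link between block configuration and global memory traffic can be illustrated with a small counting model. The sketch below is not the configuration evaluated in this work: it assumes a square matrix multiplication as the workload, one thread per output element, and square thread blocks that stage tiles of both inputs in shared memory. The function names `global_loads_naive` and `global_loads_tiled` are hypothetical and chosen only for this illustration.

```python
def global_loads_naive(n: int) -> int:
    """Global-memory reads for an n x n matrix multiply without shared memory.

    Assumption: one thread per output element; each of the n*n threads
    reads one full row of A and one full column of B (2*n values).
    """
    return n * n * 2 * n


def global_loads_tiled(n: int, tile: int) -> int:
    """Global-memory reads when each tile x tile block stages data in shared memory.

    Assumption: tile divides n evenly. At every step a block loads one
    tile of A and one tile of B from global memory exactly once; all
    further reads of those values are served from shared memory.
    """
    if n % tile != 0:
        raise ValueError("this sketch assumes tile divides n")
    blocks = (n // tile) ** 2          # thread blocks in the grid
    steps = n // tile                  # tiles along the shared dimension
    loads_per_step = 2 * tile * tile   # one tile of A plus one tile of B
    return blocks * steps * loads_per_step


if __name__ == "__main__":
    n = 1024
    for tile in (8, 16, 32):
        ratio = global_loads_naive(n) // global_loads_tiled(n, tile)
        print(f"tile={tile:2d}  threads/block={tile * tile:4d}  "
              f"global-load reduction={ratio}x")
```

Under these assumptions, global memory reads fall by a factor equal to the tile width, so a larger block reduces traffic further. The model counts memory accesses only. It ignores occupancy, shared memory capacity, and coalescing, all of which also affect which configuration runs fastest on real hardware.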