Dynamic Thread Block Launch: A Lightweight Execution Mechanism to Support Irregular Applications on GPUs
Jin Wang, Norm Rubin, Albert Sidelnik and Sudhakar Yalamanchili. “Dynamic Thread Block Launch: A Lightweight Execution Mechanism to Support Irregular Applications on GPUs.” The 42nd International Symposium on Computer Architecture (ISCA). June 2015.
Abstract
GPUs have been proven effective for structured applications that map well to the rigid 1D-3D grid of threads in modern bulk synchronous parallel (BSP) programming languages. However, less success has been encountered in mapping data-intensive irregular applications such as graph analytics, relational databases, and machine learning. Recently introduced nested device-side kernel launching functionality in the GPU is a step in the right direction, but still falls short of being able to effectively harness the GPU's performance potential. We propose a new mechanism called Dynamic Thread Block Launch (DTBL) to extend the bulk synchronous parallel model underlying the current GPU execution model by supporting dynamic spawning of lightweight thread blocks. This mechanism supports the nested launching of thread blocks rather than kernels to execute dynamically occurring parallel work elements. This paper describes the execution model of DTBL, device-runtime support, and microarchitecture extensions to track and execute dynamically spawned thread blocks. Experiments with a set of irregular data-intensive CUDA applications executing on a cycle-level simulator show that DTBL achieves an average 1.21x speedup over the original flat implementation and an average 1.40x speedup over the implementation with device-side kernel launches using CUDA Dynamic Parallelism.
Download
Citation
@inproceedings{wang-dtbl-isca42,
author={Jin Wang and Norm Rubin and Albert Sidelnik and Sudhakar Yalamanchili},
booktitle={Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA-42)},
title={Dynamic Thread Block Launch: A Lightweight Execution Mechanism to Support Irregular Applications on GPUs},
year={2015},
month={June},
}