A Framework for Dynamically Instrumenting GPU Compute Applications within GPU Ocelot
Naila Farooqui, Andrew Kerr, Gregory Diamos, Sudhakar Yalamanchili, and Karsten Schwan. “A Framework for Dynamically Instrumenting GPU Compute Applications within GPU Ocelot.” Fourth Workshop on General-Purpose Computation on Graphics Processing Units. March 2011.
Abstract
In this paper we present the design and implementation of a dynamic instrumentation infrastructure for PTX programs that procedurally transforms kernels and manages related data structures. We show how performing instrumentation within the GPU Ocelot dynamic compiler infrastructure provides unique capabilities not available to other profiling and instrumentation toolchains for GPU computing. We demonstrate the utility of this instrumentation capability with three example scenarios – (1) performing workload characterization accelerated by a GPU, (2) providing load imbalance information for use by a resource allocator, and (3) providing compute utilization feedback to be used online by a simulated process scheduler that might be found in a hypervisor. Additionally, we measure both (1) the compilation overheads of performing dynamic compilation and (2) the increases in runtimes when executing instrumented kernels. On average, compilation overheads due to instrumentation consisted of $69\%$ of the time needed to parse a kernel module, in the case of the Parboil benchmark suite. Slowdowns for instrumenting each basic block ranged from 1.5x to 5.5x, with the largest slowdowns attributed to kernels with large numbers of short, compute-bound blocks.
Download
A Framework for Dynamically Instrumenting GPU Compute Applications within GPU Ocelot [PDF]
Citation
@inproceedings{Farooqui:2011:FDI:1964179.1964192,
author = {Farooqui, Naila and Kerr, Andrew and Diamos, Gregory and Yalamanchili, S. and Schwan, K.},
title = {A framework for dynamically instrumenting GPU compute applications within GPU Ocelot},
booktitle = {Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units},
series = {GPGPU-4},
year = {2011},
isbn = {978-1-4503-0569-3},
location = {Newport Beach, California},
pages = {9:1--9:9},
articleno = {9},
numpages = {9},
url = {http://doi.acm.org/10.1145/1964179.1964192},
doi = {10.1145/1964179.1964192},
acmid = {1964192},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {CUDA, GPGPU, GPU computing, Ocelot, OpenCL, PTX, Parboil, Rodinia, dynamic binary compilation, instrumentation},
}