Lynx: A Dynamic Instrumentation System for Data-Parallel Applications on GPGPU Architectures

Naila Farooqui, Andrew Kerr, Greg Eisenhauer, Karsten Schwan, Sudhakar Yalamanchili. “Lynx: A Dynamic Instrumentation System for Data-Parallel Applications on GPGPU Architectures.” ISPASS. April 2012.


As parallel execution platforms continue to proliferate, there is a growing need for real-time introspection tools that provide insight into platform behavior for performance debugging and correctness checks, and that drive effective resource management schemes. To address this need, we present the Lynx dynamic instrumentation system. Lynx provides the capability to write instrumentation routines that are (1) selective, instrumenting only what is needed, (2) transparent, requiring no changes to the applications’ source code, (3) customizable, and (4) efficient. Lynx is embedded into the broader GPU Ocelot system, which provides run-time code generation of CUDA programs for heterogeneous architectures. This paper describes (1) the Lynx framework and implementation, (2) its language constructs geared to the Single Instruction Multiple Data (SIMD) model of data-parallel programming used in current general-purpose GPU (GPGPU) based systems, and (3) useful performance metrics described via Lynx’s instrumentation language that provide insights into the design of effective instrumentation routines for GPGPU systems. The paper concludes with a comparative analysis of Lynx with existing GPU profiling tools and a quantitative assessment of Lynx’s instrumentation performance, providing insights into optimization opportunities for running instrumented GPU kernels.



author={Farooqui, N. and Kerr, A. and Eisenhauer, G. and Schwan, K. and Yalamanchili, S.},
booktitle={Performance Analysis of Systems and Software (ISPASS), 2012 IEEE International Symposium on},
title={Lynx: A dynamic instrumentation system for data-parallel applications on GPGPU architectures},
pages={58--67},
keywords={CUDA programs;GPGPU architectures;Lynx;SIMD model;correctness checks;data-parallel programming;dynamic instrumentation system;general-purpose GPU;parallel execution platforms;performance debugging;real-time introspection tools;resource management;run-time code generation;single instruction multiple data model;graphics processing units;parallel architectures;parallel programming;program debugging;program verification;resource allocation},