Lynx: A Dynamic Instrumentation System for GPGPU Computing


  • Naila Farooqui
  • Andrew Kerr
  • Karsten Schwan
  • Sudhakar Yalamanchili




As parallel execution platforms continue to proliferate, there is a growing need for real-time introspection tools. Such tools provide insight into platform behavior for performance debugging and correctness checking, and they can drive effective resource management schemes. To address this need, we present the Lynx dynamic instrumentation system. Lynx provides the capability to write instrumentation routines that are (1) selective, instrumenting only what is needed; (2) transparent, requiring no changes to the application's source code; (3) customizable; and (4) efficient.

Lynx was originally implemented as a branch of GPU Ocelot, a framework that provides run-time code generation of CUDA programs for heterogeneous architectures. Lynx now exists as a stand-alone PTX editing tool. It encapsulates only the Ocelot components it depends on: Ocelot's PTX Parser, PTX IR, and CFG/DFG Analyses components.

To instrument a CUDA application with Lynx, the user writes instrumentation specifications as C code snippets and provides them to the framework. The Lynx engine JIT-compiles each C specification and translates it into PTX.

nvcc compiles a CUDA application into a C++ program with its PTX kernels embedded as string literals. When such a program is linked with our framework, our CUDA runtime parses these PTX kernels into an internal representation. Each original PTX kernel is passed to the PTX-to-PTX transformation engine, together with the instrumentation PTX generated from the C code specification. The engine applies a sequence of PTX kernel transformations to the original kernel. The result is the final instrumented PTX kernel, which is then executed on the CUDA device.




  • N. Farooqui, A. Kerr, G. Diamos, S. Yalamanchili, K. Schwan. “A Framework for Dynamically Instrumenting GPU Compute Applications within GPU Ocelot.” 4th Workshop on General Purpose Processing Using GPUs (GPGPU). March 2011.
  • N. Farooqui, A. Kerr, G. Eisenhauer, K. Schwan, S. Yalamanchili. “Lynx: A Dynamic Instrumentation System for Data-Parallel Applications on GPGPU Architectures.” IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). April 2012.
  • S. Li, N. Farooqui, S. Yalamanchili. “Software Reliability Enhancements for GPU Architectures.” Sixth Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG). January 2012.

The papers are provided for personal use and are subject to copyright of the publishers.