Exploring The Latency and Bandwidth Tolerance of CUDA Applications
Sudnya Padalikar and Gregory Diamos. “Exploring The Latency and Bandwidth Tolerance of CUDA Applications.” NFinTes Tech Report. December 2009.
Abstract
CUDA applications represent a new body of parallel programs. Although several paradigms exist for programming distributed systems and many-core processors, many users struggle to write programs that scale across systems with different hardware characteristics. This paper explores the scalability of CUDA applications on systems with varying interconnect latencies; applications that tolerate such variation hide a hardware detail from the programmer and make parallel programming more accessible to non-experts. We use a combination of the Ocelot PTX emulator and a discrete event simulator to evaluate the UIUC Parboil benchmarks on three distinct GPU configurations. We find that these applications are sensitive to neither interconnect latency nor bandwidth, and that integrated GPU-CPU systems are not likely to perform any better than discrete GPUs or GPU clusters.
Download
Exploring The Latency and Bandwidth Tolerance of CUDA Applications [PDF]
Citation
@techreport{exploring-the-latency-and-bandwidth-tolerance-of-cuda-applications,
title = {Exploring The Latency and Bandwidth Tolerance of CUDA Applications},
institution = {Georgia Institute of Technology},
author = {Sudnya Padalikar and Gregory Diamos},
year = {2009},
month = {December},
}