Optimizing Data Warehousing Applications for GPUs using Kernel Fusion/Fission
Haicheng Wu, Gregory Diamos, Ashwin Lele, Jin Wang, Srihari Cadambi, Sudhakar Yalamanchili, and Srimat Chakradhar. “Optimizing Data Warehousing Applications for GPUs using Kernel Fusion/Fission.” Workshop on Multicore and GPU Programming Models, Languages and Compilers. May 2012.
Abstract
Inspired in part by loop fusion/fission optimizations in the scientific computing community, we propose kernel fusion and kernel fission. Kernel fusion fuses the code bodies of two GPU kernels to i) eliminate redundant operations across dependent kernels, ii) reduce data movement between GPU registers and GPU memory, iii) reduce data movement between GPU memory and CPU memory, and iv) improve spatial and temporal locality of memory references. Kernel fission partitions a kernel into segments such that segment computations and data transfers between the GPU and host CPU can be overlapped. Fusion and fission can also be applied concurrently to a set of kernels. We empirically evaluate the benefits of fusion/fission on relational algebra operators drawn from the TPC-H benchmark suite. All kernels are implemented in CUDA and the experiments are performed with NVIDIA Fermi GPUs. In general, we observed data throughput improvements ranging from 13.1% to 41.4% for the SELECT operator and queries Q1 and Q21 in the TPC-H benchmark suite. We present key insights, lessons learned, and opportunities for further improvements.
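The abstract describes both transformations only in prose. The CUDA sketch below illustrates them under our own assumptions and is not the authors' implementation. It uses a SELECT over an `int` column with a hypothetical range predicate `lo < x < hi`.

In the unfused form, the predicate is split across two dependent kernels (`select_gt`, `select_lt`) that communicate through an intermediate flag array in GPU global memory. The fused kernel (`select_fused`) evaluates both predicates in one pass while the value is held in a register, so the intermediate array and the second read of the column are no longer needed.

`select_count_fission` applies fission to the fused kernel. It cuts the input into segments and queues each segment's host-to-device copy, kernel launch, and device-to-host copy on one of two CUDA streams. On GPUs with a copy engine, this lets transfers for one segment overlap computation on another.

All function names, the two-stream schedule, and the per-row flag output are hypothetical. A real SELECT would compact matching tuples, for example with a prefix sum, rather than returning flags.

```cuda
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Abort on any CUDA runtime error (minimal error handling for a sketch).
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(1);
    }
}

// Unfused, step 1: SELECT x > lo writes an intermediate flag array to global memory.
__global__ void select_gt(const int* x, unsigned char* flag, int n, int lo) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) flag[i] = x[i] > lo;
}

// Unfused, step 2: SELECT x < hi re-reads both x and the intermediate flags.
__global__ void select_lt(const int* x, const unsigned char* in,
                          unsigned char* out, int n, int hi) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] && x[i] < hi;
}

// Fused: both predicates are applied while x[i] is held in a register,
// so the intermediate array and the second read of x disappear.
__global__ void select_fused(const int* x, unsigned char* out, int n, int lo, int hi) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int v = x[i];
        out[i] = v > lo && v < hi;
    }
}

// Count the rows whose flag is set (host side).
static int count_flags(const unsigned char* f, int n) {
    int c = 0;
    for (int i = 0; i < n; ++i) c += f[i];
    return c;
}

// Baseline: two dependent kernels, one synchronous transfer in each direction.
int select_count_unfused(const int* x, int n, int lo, int hi) {
    assert(n > 0);
    int* d_x;
    unsigned char *d_tmp, *d_out;
    check(cudaMalloc(&d_x, n * sizeof(int)));
    check(cudaMalloc(&d_tmp, n));
    check(cudaMalloc(&d_out, n));
    check(cudaMemcpy(d_x, x, n * sizeof(int), cudaMemcpyHostToDevice));
    int blocks = (n + 255) / 256;
    select_gt<<<blocks, 256>>>(d_x, d_tmp, n, lo);
    select_lt<<<blocks, 256>>>(d_x, d_tmp, d_out, n, hi);
    check(cudaGetLastError());
    std::vector<unsigned char> out(n);
    check(cudaMemcpy(out.data(), d_out, n, cudaMemcpyDeviceToHost));
    cudaFree(d_x); cudaFree(d_tmp); cudaFree(d_out);
    return count_flags(out.data(), n);
}

// Fused kernel plus fission: each segment's copy-in, kernel, and copy-out are
// queued on one of two streams, so one segment's transfers can overlap another
// segment's kernel on GPUs with a copy engine.
int select_count_fission(const int* x, int n, int lo, int hi, int num_segments) {
    assert(n > 0 && num_segments > 0);
    int *h_x, *d_x;
    unsigned char *h_out, *d_out;
    check(cudaMallocHost(&h_x, n * sizeof(int)));  // pinned memory: required for async overlap
    check(cudaMallocHost(&h_out, n));
    check(cudaMalloc(&d_x, n * sizeof(int)));
    check(cudaMalloc(&d_out, n));
    std::memcpy(h_x, x, n * sizeof(int));
    cudaStream_t streams[2];
    for (cudaStream_t& s : streams) check(cudaStreamCreate(&s));
    int seg = (n + num_segments - 1) / num_segments;  // segment length, rounded up
    for (int k = 0, off = 0; off < n; ++k, off += seg) {
        int len = std::min(seg, n - off);
        cudaStream_t st = streams[k % 2];
        check(cudaMemcpyAsync(d_x + off, h_x + off, len * sizeof(int),
                              cudaMemcpyHostToDevice, st));
        select_fused<<<(len + 255) / 256, 256, 0, st>>>(d_x + off, d_out + off, len, lo, hi);
        check(cudaGetLastError());
        check(cudaMemcpyAsync(h_out + off, d_out + off, len, cudaMemcpyDeviceToHost, st));
    }
    check(cudaDeviceSynchronize());
    int matches = count_flags(h_out, n);
    for (cudaStream_t& s : streams) check(cudaStreamDestroy(s));
    cudaFreeHost(h_x); cudaFreeHost(h_out); cudaFree(d_x); cudaFree(d_out);
    return matches;
}
```

Both paths should return the same match count. For example, `select_count_fission(data, n, 10, 50, 4)` computes the same result as `select_count_unfused(data, n, 10, 50)`. The fission version deliberately uses only two streams so that consecutive segments alternate and can overlap. Using more streams, or choosing a segment size, is a tuning decision that this sketch does not attempt.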
Citation
@inproceedings{wu2012fusion,
author = {Haicheng Wu and Gregory Diamos and Ashwin Lele and Jin Wang and
Srihari Cadambi and Sudhakar Yalamanchili and Srimat Chakradhar},
title = {Optimizing Data Warehousing Applications for {GPUs} Using
Kernel Fusion/Fission},
booktitle = {Multicore and GPU Programming Models, Languages and
Compilers Workshop},
month = may,
year = 2012
}