Efficient Scientific Computing 2024

GPU Portability

Mapping CUDA Concepts to Other Models

Kernel Launch Variations

CUDA

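Kernels are launched with an execution configuration in triple angle brackets: kernel<<<gridDim, blockDim>>>(args);
Two optional further configuration arguments set the dynamic shared memory size in bytes and the stream.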
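The execution configuration is usually derived from the problem size. The following is a minimal host-only C++ sketch, not device code: the helper names blocks_for and emulate_launch are hypothetical, and the two nested loops only stand in for blockIdx.x and threadIdx.x to show how a 1D launch covers n elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of blocks needed to cover n elements.
std::size_t blocks_for(std::size_t n, std::size_t block_size) {
    assert(block_size > 0);
    return (n + block_size - 1) / block_size;  // ceiling division
}

// Host-side emulation of kernel<<<grid_dim, block_dim>>>(...) in 1D.
// Returns how many times each element index was visited.
std::vector<int> emulate_launch(std::size_t n, std::size_t block_dim) {
    std::vector<int> visits(n, 0);
    const std::size_t grid_dim = blocks_for(n, block_dim);
    for (std::size_t block_idx = 0; block_idx < grid_dim; ++block_idx) {
        for (std::size_t thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            // Same formula as blockIdx.x * blockDim.x + threadIdx.x.
            const std::size_t i = block_idx * block_dim + thread_idx;
            if (i < n) {  // guard: the last block may overshoot n
                visits[i] += 1;
            }
        }
    }
    return visits;
}
```

Rounding the block count up and guarding with i < n is the common pattern when n is not a multiple of the block size: every element is handled exactly once and the surplus threads do nothing.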

HIP

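HIP accepts the same triple-chevron syntax as CUDA. The hipLaunchKernelGGL macro is an equivalent form that takes the shared memory size and stream explicitly: hipLaunchKernelGGL(kernel, gridDim, blockDim, sharedMem, stream, args);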

SYCL

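Using command groups and lambda expressions. Unlike CUDA's gridDim, the first nd_range argument is the total number of work-items (gridDim * blockDim in CUDA terms); the second is the work-group size (blockDim):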

queue.submit([&](handler &h) {
    h.parallel_for(nd_range<1>(globalRange, localRange), [=](nd_item<1> item) {
        // Kernel code
    });
});
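Because nd_range expects a global size rather than a group count, and SYCL requires the global size to be a multiple of the local size, porting a CUDA launch configuration needs a small conversion. A minimal sketch in plain C++ (no SYCL headers), where NdRange1D, from_cuda_config, and covering are hypothetical names introduced only for illustration:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the two sizes passed to nd_range<1>.
struct NdRange1D {
    std::size_t global_size;  // total work-items
    std::size_t local_size;   // work-items per work-group
};

// Convert a 1D CUDA configuration (gridDim, blockDim) to SYCL sizes.
NdRange1D from_cuda_config(std::size_t grid_dim, std::size_t block_dim) {
    assert(block_dim > 0);
    return {grid_dim * block_dim, block_dim};
}

// Smallest range that covers n elements with global_size
// rounded up to a multiple of local_size.
NdRange1D covering(std::size_t n, std::size_t local_size) {
    assert(local_size > 0);
    const std::size_t groups = (n + local_size - 1) / local_size;
    return {groups * local_size, local_size};
}
```

For example, covering(1000, 256) yields a global size of 1024, so the kernel body would still need a bounds check on item.get_global_id(0) before touching data.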

OpenMP

Offloading code blocks with pragmas:

#pragma omp target teams distribute parallel for
for (int i = 0; i < N; i++) {
  // Loop body
}
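In a real program the offloaded loop usually also needs map clauses to describe the data movement between host and device. A minimal sketch, assuming a SAXPY loop as the loop body (the function name saxpy_offload is hypothetical). Without an offload-capable compiler or device, the loop simply runs on the host.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: y[i] = a * x[i] + y[i], offloaded with OpenMP.
void saxpy_offload(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    // Map clauses work on raw pointers with explicit array sections.
    const float* xp = x.data();
    float* yp = y.data();

    // to: copy x to the device; tofrom: copy y in and back out.
    #pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (int i = 0; i < n; i++) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Here the iterations are distributed across teams and threads by the runtime, so no explicit grid or block size appears in the code, in contrast to the CUDA, HIP, and SYCL forms.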

Examples

See the hands-on/portable_stencil directory for CUDA, HIP, SYCL, and OpenMP example code.