GPU Portability
Mapping CUDA Concepts to Other Models
- Thread Organization
- CUDA: Grids, blocks, and threads.
- OpenMP: Teams and threads.
- SYCL: ND-ranges, work groups, and work items.
- HIP: Similar to CUDA, with minor syntax changes.
- Kernel Execution
- CUDA: __global__ functions launched with <<<...>>> syntax.
- OpenMP: #pragma omp target directives to offload code.
- SYCL: Using queue.submit() with lambda functions.
- HIP: __global__ functions launched with hipLaunchKernelGGL.
- Memory Management
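- Explicit Control: Every model lets you manage data movement between host and device explicitly (for example cudaMemcpy, hipMemcpy, queue.memcpy, or OpenMP map clauses).
- Unified Memory Options: Some models offer unified memory spaces to simplify development (for example cudaMallocManaged, hipMallocManaged, or SYCL malloc_shared).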
- Memory Allocation
- CUDA: cudaMalloc(&ptr, size);
- HIP: hipMalloc(&ptr, size);
- SYCL: ptr = malloc_device(size, queue);
- OpenMP: Memory mapped with map clauses.
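The OpenMP entry is the only one above without an allocation call, so a small sketch may help show what "mapped with map clauses" looks like in practice. The function name saxpy_offload and its signature are hypothetical and chosen for illustration only. The sketch assumes a compiler with OpenMP target support; without offloading enabled, the loop simply runs on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the course material): y = a * x + y.
// With an offloading compiler the loop runs on the device; otherwise the
// pragma is ignored or the region falls back to the host.
void saxpy_offload(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    const float* xp = x.data();  // raw pointers: map clauses take array sections
    float* yp = y.data();

    // map(to: ...)     copies host -> device before the region starts
    // map(tofrom: ...) copies host -> device before, device -> host after
    #pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (int i = 0; i < n; i++) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

The map clauses play the role of the cudaMalloc / cudaMemcpy pair: the runtime allocates device storage for the listed array sections and moves the data at the region boundaries.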
Kernel Launch Variations
- CUDA: Launching kernels with execution configuration: kernel<<<gridDim, blockDim>>>(args);
- HIP: Similar syntax with slight differences: hipLaunchKernelGGL(kernel, gridDim, blockDim, sharedMem, stream, args);
- SYCL: Using command groups and lambda expressions:
queue.submit([&](handler &h) {
    h.parallel_for(nd_range<1>(globalRange, localRange), [=](nd_item<1> item) {
        // Kernel code
    });
});
- OpenMP: Offloading code blocks with pragmas:
#pragma omp target teams distribute parallel for
for (int i = 0; i < N; i++) {
    // Loop body
}
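One detail that is easy to miss when porting between these launch styles: CUDA and HIP take the number of blocks (gridDim), while a SYCL nd_range takes the total number of work items (globalRange), which must be a multiple of localRange. The host-only sketch below illustrates that arithmetic and the usual bounds guard. It does not call any GPU runtime, and the names LaunchConfig, make_launch_config, and emulate_launch are hypothetical, introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration; none of these come from a GPU runtime.
struct LaunchConfig {
    std::size_t grid_dim;      // CUDA/HIP gridDim.x: number of blocks
    std::size_t block_dim;     // CUDA/HIP blockDim.x, SYCL localRange
    std::size_t global_range;  // SYCL globalRange = grid_dim * block_dim
};

// Round the block count up so each of the n elements gets a thread.
LaunchConfig make_launch_config(std::size_t n, std::size_t block_dim) {
    assert(block_dim > 0);
    const std::size_t grid_dim = (n + block_dim - 1) / block_dim;
    return {grid_dim, block_dim, grid_dim * block_dim};
}

// Host-side emulation of a 1D launch: counts how often each element is visited.
std::vector<int> emulate_launch(std::size_t n, const LaunchConfig& cfg) {
    std::vector<int> visits(n, 0);
    for (std::size_t block = 0; block < cfg.grid_dim; ++block) {          // blockIdx.x
        for (std::size_t thread = 0; thread < cfg.block_dim; ++thread) {  // threadIdx.x
            // Same value SYCL reports as item.get_global_id(0).
            const std::size_t i = block * cfg.block_dim + thread;
            if (i < n) {  // guard: the padded range may overshoot n
                visits[i] += 1;
            }
        }
    }
    return visits;
}
```

For n = 1000 and a block size of 256 this gives 4 blocks and a padded range of 1024 work items, so the last 24 threads do nothing thanks to the i < n guard. The OpenMP version needs none of this, since the runtime distributes the loop iterations itself.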
Have a look in the hands-on/portable_stencil directory for examples of CUDA, HIP, SYCL, and OpenMP code.