Efficient Scientific Computing 2025

The goal of this section is to introduce you to GPU programming using CuPy.

set up

ssh to the machine with this command:

ssh -L XXXX:localhost:XXXX <username>@<machine-address>

where XXXX is a 4-digit number, unique for each user.

Then setup the Python enviromenti with:

source /usr/share/esc-env/bin/activate

Now you can start JupyterLab with the command:

jupyter-lab --port XXXX

Once JupyterLab is running, you will find in the output a URL like this:

...
    Or copy and paste one of these URLs:
        http://localhost:9004/lab?token=8928e7071...
        http://127.0.0.1:9004/lab?token=8928e7071...
...

Paste one of these URLs in your local browser, and you should see the JupyterLab page.

Exercises

Exercise 1: port NumPy code to CuPy

Take the code written with NumPy and modify it to use CuPy. Time both executions to compare CPU vs. GPU speedup.

Note: GPU operations are asynchronous. Use:

cp.cuda.Device().synchronize()

before stopping your timer to get accurate timings.

You can try the same with the Python code of your personal project!

Exercise 2: transfers and GPU allocation

Create two arrays of N random numbers on CPU
Copy them to the GPU
Sum them
Copy them back to the CPU.

Now avoid the copy to the GPU by creating the random arrays directly on the device with CuPy.

Exercise 3: write a kernel

Write a kernel (take one from the CUDA exercises or write your own) with CuPy using:

cp.RawKernel
cp.ElementwiseKernel (you can use the variable i for the the index within the loop and method _ind.size() for the total number of elements to apply the elementwise operation)

Exercise 4: reduction

Implement the reduction kernel:

using cp.RawKernel and the CUDA kernel you wrote during the CUDA part
using cp.ReductionKernel
using cp.sum

Tips: