Hello World from CUDA

CUDA C extends ANSI C with a small set of keywords and a kernel launch syntax. ANSI C refers to the successive standards for the C programming language published by the American National Standards Institute (ANSI) and by ISO/IEC JTC 1/SC 22/WG 14 of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).

In the C programming language, a simple Hello World program looks like this:

#include <stdio.h>

int main(void) {
    printf("Hello World from CPU!\n");
    return 0;
}

Save this code in a file named HelloWorld.cu and compile it with:

nvcc HelloWorld.cu -o HelloWorld

nvcc is the CUDA compiler driver, which ships with the CUDA Toolkit. Running the resulting executable, HelloWorld, prints:

Hello World from CPU!

Next, we will write a GPU kernel, helloFromGPU, that prints Hello World:

__global__ void helloFromGPU() {
    printf("Hello World from GPU (device)\n");
}

The qualifier __global__ tells the compiler that the function is called from the CPU (host) and executes on the GPU (device). Such a function is called a kernel, and it must return void. We launch the kernel with the following call:

helloFromGPU<<<1, 32>>>();

The triple angle brackets mark a call from the host thread to code on the device side. A kernel is executed by an array of threads, and all threads run the same code. The parameters within the triple angle brackets are the execution configuration, which specifies how many threads execute the kernel: the first value is the number of thread blocks and the second is the number of threads per block. In this example, one block of 32 threads runs the kernel, so 32 GPU threads execute it in total. Putting everything together, we get:

#include <cuda_runtime.h>
#include <stdio.h>

__global__ void helloFromGPU() {
    printf("Hello World from GPU (device)\n");
}

int main(void) {
    printf("Hello from CPU (host) before kernel execution\n");
    helloFromGPU<<<1, 32>>>();
    cudaDeviceSynchronize();
    printf("Hello from CPU (host) after kernel execution\n");

    return 0;
}

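The function cudaDeviceSynchronize blocks the host thread until all previously issued device work, including the kernel, has completed. Kernel launches are asynchronous, so without this call the program could exit before the GPU output is flushed to the screen. After saving this program in HelloWorld.cu, compiling it, and running it, we get: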

Hello from CPU (host) before kernel execution
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello World from GPU (device)
Hello from CPU (host) after kernel execution
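
All 32 GPU lines are identical because every thread runs the same code. To see the individual threads selected by the execution configuration, each thread can print its own position using the built-in variables blockIdx.x and threadIdx.x. The following is a minimal sketch, not part of the original program: the kernel name helloWithIndex and the launch sizes kBlocks and kThreadsPerBlock are hypothetical choices made for illustration.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical launch sizes, chosen only for illustration.
static const int kBlocks = 2;
static const int kThreadsPerBlock = 4;

// Each thread reports which block it belongs to and its index within that block.
__global__ void helloWithIndex() {
    printf("Hello World from GPU: block %u, thread %u\n", blockIdx.x, threadIdx.x);
}

int main(void) {
    // kBlocks * kThreadsPerBlock threads run the kernel in total.
    helloWithIndex<<<kBlocks, kThreadsPerBlock>>>();
    // Wait for the kernel to finish so that its output is flushed.
    cudaDeviceSynchronize();
    return 0;
}
```

With these sizes the kernel prints eight lines, one per thread. The order in which lines from different blocks appear is not guaranteed, so it may change between runs.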

Congratulations, we have written our first GPU kernel.
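
A natural refinement of the program above is to check whether the launch and the synchronization succeeded, since a failed kernel launch is otherwise silent. The sketch below assumes a small helper, cudaOk, which is a hypothetical name and not part of the CUDA runtime. It relies on the runtime calls cudaGetLastError, cudaDeviceSynchronize, and cudaGetErrorString.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void helloFromGPU() {
    printf("Hello World from GPU (device)\n");
}

// Hypothetical helper: returns true on success; otherwise prints the
// runtime's description of the error and returns false.
static bool cudaOk(cudaError_t status, const char *what) {
    if (status == cudaSuccess) {
        return true;
    }
    fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(status));
    return false;
}

int main(void) {
    helloFromGPU<<<1, 32>>>();

    // Errors in the launch itself (for example, an invalid configuration)
    // are reported by cudaGetLastError.
    if (!cudaOk(cudaGetLastError(), "kernel launch")) {
        return 1;
    }

    // Errors that occur while the kernel runs surface when the host
    // waits for the device.
    if (!cudaOk(cudaDeviceSynchronize(), "cudaDeviceSynchronize")) {
        return 1;
    }

    return 0;
}
```

If no CUDA-capable device is available, this version reports the failure on stderr and exits with a non-zero status instead of silently printing nothing from the GPU.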