Timing a Kernel

When tuning a kernel's performance, it helps to know how long the kernel takes to execute. There are several ways to measure this. The simplest is to use a CPU timer to measure kernel execution from the host side.

To build a CPU timer, we can use the gettimeofday system call to get the system's wall-clock time, which it reports as the number of seconds and microseconds elapsed since the epoch[^1]. We can wrap it in a small function:

#include <sys/time.h>

double cpuSecond() {
    struct timeval tp;
    gettimeofday(&tp, NULL);
    return ((double)tp.tv_sec + (double)tp.tv_usec * 1.e-6);
}

We can sandwich the code of interest between two cpuSecond calls to measure its execution time. For a kernel, the calls look like this:

const double iStart = cpuSecond();
kernel_name<<<grid, block>>>(argument list);
cudaDeviceSynchronize(); // because a kernel call is asynchronous with respect to the host
const double iElaps = cpuSecond() - iStart;
printf("kernel: kernel_name took %.6f seconds to complete...\n", iElaps); // %.6f matches the timer's microsecond resolution
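
The fragment above can be assembled into a more complete sketch. The kernel `scaleKernel`, the helper `timeScaleKernel`, the block size of 256, and the warm-up launch are illustrative assumptions and are not taken from the referenced repository. The warm-up launch is added so that one-time startup costs are less likely to be included in the measurement.

```cuda
#include <cstdio>
#include <sys/time.h>
#include <cuda_runtime.h>

// Wall-clock time in seconds, the same helper as in the text.
double cpuSecond() {
    struct timeval tp;
    gettimeofday(&tp, NULL);
    return (double)tp.tv_sec + (double)tp.tv_usec * 1.e-6;
}

// Hypothetical kernel used only for illustration: scales each element in place.
__global__ void scaleKernel(float *data, float factor, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Times a single launch of scaleKernel over n elements from the host side.
// Returns the elapsed time in seconds, or -1.0 if any CUDA call fails.
double timeScaleKernel(int n) {
    const size_t bytes = (size_t)n * sizeof(float);
    float *d_data = NULL;
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) return -1.0;
    if (cudaMemset(d_data, 0, bytes) != cudaSuccess) {
        cudaFree(d_data);
        return -1.0;
    }

    const int block = 256;                     // assumed block size
    const int grid = (n + block - 1) / block;  // enough blocks to cover n elements

    // Warm-up launch (an assumption of this sketch), excluded from the timing.
    scaleKernel<<<grid, block>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    const double iStart = cpuSecond();
    scaleKernel<<<grid, block>>>(d_data, 2.0f, n);
    // Wait for the kernel to finish before reading the timer again.
    const cudaError_t syncErr = cudaDeviceSynchronize();
    const double iElaps = cpuSecond() - iStart;

    // Launch-configuration errors are reported by cudaGetLastError.
    const cudaError_t launchErr = cudaGetLastError();
    cudaFree(d_data);

    if (syncErr != cudaSuccess || launchErr != cudaSuccess) return -1.0;
    return iElaps;
}
```

Calling `timeScaleKernel(1 << 20)` from a `main` function returns the elapsed time in seconds, which can be printed with the same `printf` pattern as above. Because the timer runs on the host, the measured interval also includes launch and synchronization overhead, so very short kernels may be dominated by that overhead.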

For a detailed code example of CPU timing, see Qazalbash/CUDAForge/code/3_Timer/CpuTimer.cu.

[^1]: Epoch: the fixed reference point from which system time is counted; on Unix-like systems it is 00:00:00 UTC on 1 January 1970.