Timing a Kernel
When tuning a kernel's performance, it helps to know how long the kernel takes to execute. There are several ways to measure this. The simplest is to use a CPU timer to measure kernel execution from the host side.
To build a CPU timer, we can use the gettimeofday system call to get the system's wall-clock time, which returns the number of seconds and microseconds elapsed since the epoch[^1]. We can wrap it in a small function,
#include <sys/time.h>

double cpuSecond() {
    struct timeval tp;
    gettimeofday(&tp, NULL);
    return ((double)tp.tv_sec + (double)tp.tv_usec * 1.e-6);
}
We can sandwich the lines of code of interest between two cpuSecond calls to measure how long they take. For a kernel, the calls look like this,
const double iStart = cpuSecond();
kernel_name<<<grid, block>>>(argument list);
cudaDeviceSynchronize(); // because a kernel call is asynchronous with respect to the host
const double iElaps = cpuSecond() - iStart;
printf("kernel: kernel_name took %.6f seconds to complete...\n", iElaps); // kernels often finish in micro- or milliseconds, so print more digits
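The pieces above can be combined into one complete program. The following is a minimal sketch, not the repository's code: the saxpy kernel, the CHECK macro, the problem size n, and the repetition count reps are all assumptions made for illustration. It adds a warm-up launch, because the first launch may also pay one-time initialization costs, and it averages over several launches, because gettimeofday only resolves microseconds.

```cuda
#include <cstdio>
#include <cstdlib>
#include <sys/time.h>
#include <cuda_runtime.h>

// Wall-clock time in seconds, same helper as in the text.
double cpuSecond() {
    struct timeval tp;
    gettimeofday(&tp, NULL);
    return ((double)tp.tv_sec + (double)tp.tv_usec * 1.e-6);
}

// Hypothetical kernel used only for illustration: y[i] = a * x[i] + y[i].
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    const int n = 1 << 24;  // number of elements (assumed size)
    const int reps = 10;    // timed launches to average over (assumed)

    float *x, *y;
    CHECK(cudaMalloc(&x, n * sizeof(float)));
    CHECK(cudaMalloc(&y, n * sizeof(float)));
    CHECK(cudaMemset(x, 0, n * sizeof(float)));
    CHECK(cudaMemset(y, 0, n * sizeof(float)));

    const dim3 block(256);
    const dim3 grid((n + block.x - 1) / block.x);

    // Warm-up launch, excluded from the measurement.
    saxpy<<<grid, block>>>(n, 2.0f, x, y);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // Timed region: launches are asynchronous, so synchronize
    // before reading the timer again.
    const double iStart = cpuSecond();
    for (int r = 0; r < reps; ++r) {
        saxpy<<<grid, block>>>(n, 2.0f, x, y);
    }
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());
    const double iElaps = (cpuSecond() - iStart) / reps;

    printf("kernel: saxpy took %.6f seconds per launch (average of %d)\n",
           iElaps, reps);

    CHECK(cudaFree(x));
    CHECK(cudaFree(y));
    return 0;
}
```

It can be compiled with nvcc, for example `nvcc timer_sketch.cu -o timer_sketch` (the file name is arbitrary). Because the timer runs on the host, the reported time includes launch overhead as well as the kernel's execution on the device.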
See Qazalbash/CUDAForge/code/3_Timer/CpuTimer.cu for a detailed code example of CPU timing.
[^1]: Epoch: the Unix epoch, 00:00:00 UTC on 1 January 1970, the reference point from which gettimeofday counts time.