Qazalbash

Summation of Two Vectors in CUDA C

Summing two vectors is an embarrassingly parallel problem: every output component can be computed independently of the others, so the work parallelizes easily. Let $A, B \in \mathbb{R}^{n}$ with $n \in \mathbb{N}$, and let $C$ be their sum. For any $1 \leq i \leq n$, the component $C_i = A_i + B_i$ depends only on the corresponding components of $A$ and $B$. The parallel algorithm therefore assigns each component to its own GPU thread. A GPU kernel for this problem would be,

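__global__ void ArrSumOnDevice(const float* A, const float* B, float* C, const int N) {
    // Global index of this thread across the whole 1D grid.
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    // Guard: threads whose index falls past the end of the vectors do nothing.
    if (idx < N) {
        C[idx] = A[idx] + B[idx];
    }
}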

Here, idx is the thread's global index, which doubles as the index into the vectors in global memory. If idx is within range, the thread reads the corresponding component of A and B, adds them, and stores the result in the corresponding component of C. The range check matters because a launch can create more threads than there are components, for example when N is not a multiple of the block size. An equivalent CPU function would look like,

void ArrSumOnHost(const float* A, const float* B, float* C, const int N) {
    for (int idx = 0; idx < N; ++idx) {
        C[idx] = A[idx] + B[idx];
    }
}
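
To see the two functions side by side, the following is a minimal sketch of a host program that launches the kernel and checks its output against the CPU loop. It is not necessarily how the linked ArrSum.cu is written: the vector length, the block size of 256, the input values, and all variable names in main are assumptions chosen for illustration, and error checking of the CUDA calls is left out to keep the sketch short.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel and CPU reference repeated from the post so this file compiles on its own.
__global__ void ArrSumOnDevice(const float* A, const float* B, float* C, const int N) {
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) {
        C[idx] = A[idx] + B[idx];
    }
}

void ArrSumOnHost(const float* A, const float* B, float* C, const int N) {
    for (int idx = 0; idx < N; ++idx) {
        C[idx] = A[idx] + B[idx];
    }
}

int main() {
    const int N = 1000;               // assumed length, deliberately not a multiple of 256
    const int threadsPerBlock = 256;  // assumed block size
    // Round up so that blocks * threadsPerBlock >= N.
    const int blocks = (N + threadsPerBlock - 1) / threadsPerBlock;
    const size_t bytes = N * sizeof(float);

    // Host buffers: two inputs, the GPU result, and the CPU reference result.
    float* hA   = (float*)malloc(bytes);
    float* hB   = (float*)malloc(bytes);
    float* hC   = (float*)malloc(bytes);
    float* hRef = (float*)malloc(bytes);
    for (int i = 0; i < N; ++i) {
        hA[i] = (float)i;
        hB[i] = 2.0f * (float)i;
    }

    // Device buffers and host-to-device copies of the inputs.
    float *dA = NULL, *dB = NULL, *dC = NULL;
    cudaMalloc((void**)&dA, bytes);
    cudaMalloc((void**)&dB, bytes);
    cudaMalloc((void**)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // One thread per component; surplus threads are filtered out by the idx < N guard.
    ArrSumOnDevice<<<blocks, threadsPerBlock>>>(dA, dB, dC, N);

    // This copy runs on the default stream, so it waits for the kernel to finish.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    // Compute the same sum on the CPU and count components that disagree.
    ArrSumOnHost(hA, hB, hRef, N);
    int mismatches = 0;
    for (int i = 0; i < N; ++i) {
        if (fabsf(hC[i] - hRef[i]) > 1e-5f) {
            ++mismatches;
        }
    }
    printf("blocks = %d, mismatches = %d\n", blocks, mismatches);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    free(hA);
    free(hB);
    free(hC);
    free(hRef);
    return mismatches == 0 ? 0 : 1;
}
```

With these assumed values the grid has 4 blocks of 256 threads, so 1024 threads are launched for 1000 components and the last 24 fail the idx < N check and do nothing. Saved under a hypothetical name such as arrsum_check.cu, the sketch should build with nvcc arrsum_check.cu -o arrsum_check on a machine with the CUDA toolkit and a CUDA-capable GPU.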

The complete code is available at Qazalbash/CUDAForge/code/1_ArrSum/ArrSum.cu.


Check out Qazalbash/CUDAForge for more CUDA examples.

