Step 1: Download the Toolkit
Go to NVIDIA's CUDA downloads page (search for "cuda-downloads") and select your operating system, architecture, and distribution. This tutorial covers only the local installer; in my case the target is Debian 12 on x86_64. The download page then shows a command of the form:
wget https://developer.download.nvidia.com/compute/cuda/12.X.Y/local_installers/cuda-repo-debian12-12-X-local_12.X.Y-Z-1_amd64.deb
Paste it into the terminal. In my case it is:
wget https://developer.download.nvidia.com/compute/cuda/12.8.1/local_installers/cuda-repo-debian12-12-8-local_12.8.1-570.124.06-1_amd64.deb
This leaves a Debian package (.deb) in the current directory.
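Before installing, it is worth a quick sanity check that the multi-gigabyte package arrived intact. A small sketch, assuming the filename from above; compare the digest against the checksum published on the download page:

```shell
# Confirm the package is present and its size looks plausible
# (the local installer is a multi-gigabyte file).
ls -lh cuda-repo-debian12-12-8-local_12.8.1-570.124.06-1_amd64.deb

# Compare this digest against the checksum listed on the download page.
md5sum cuda-repo-debian12-12-8-local_12.8.1-570.124.06-1_amd64.deb
```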
Step 2: Install the Toolkit
Install the Debian package with:
sudo dpkg -i cuda-repo-debian12-12-8-local_12.8.1-570.124.06-1_amd64.deb
After the package installs, copy the repository keyring into place:
sudo cp /var/cuda-repo-debian12-12-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
Then update the package index and install the toolkit:
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-8
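The toolkit installs under /usr/local/cuda-12.8 but does not put nvcc on the PATH by itself. A minimal environment setup, following NVIDIA's post-installation steps (append these lines to ~/.bashrc to make them permanent):

```shell
# Make the CUDA compiler and libraries visible to the shell.
export PATH=/usr/local/cuda-12.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

# Verify the compiler is found; it should report "release 12.8".
nvcc --version
```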
Step 3: Install CUDA Drivers
The final step is to install the NVIDIA driver. NVIDIA provides two flavors of the kernel module: the open flavor, installed via
sudo apt-get install -y nvidia-open
and the proprietary flavor:
sudo apt-get install -y cuda-drivers
My preference is the proprietary kernel module flavor.
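Whichever flavor you pick, a reboot loads the new kernel module; afterwards the driver can be checked before moving on to the samples. A quick sketch:

```shell
# Reboot first so the freshly installed kernel module is loaded:
#   sudo reboot

# nvidia-smi reports the driver version, CUDA version, and detected GPUs.
nvidia-smi

# The kernel module can also be checked directly; expect lines starting
# with "nvidia" (e.g. nvidia, nvidia_drm, nvidia_uvm).
lsmod | grep '^nvidia'
```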
Step 4: Verifying Installation
We can verify the installation by building and running the programs in the NVIDIA/cuda-samples repository. Clone the repository:
git clone https://github.com/NVIDIA/cuda-samples.git
and, from inside the cloned directory, run the following commands to build the examples:
cd cuda-samples
cmake -B build
cmake --build build
This builds every example. Run any one of them, for example:
./build/Samples/0_Introduction/matrixMul/matrixMul
On my machine this prints:
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Ada" with compute capability 8.9
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 580.52 GFlop/s, Time= 0.226 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
If the output ends with Result = PASS, CUDA has been installed successfully; otherwise, revisit the installation steps or consult the official documentation.
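As a lighter-weight check than building the whole samples tree, you can also compile a tiny standalone program with nvcc. This is only a sketch; the file name check.cu and its contents are my own, not part of the official samples:

```shell
# Write a minimal CUDA program that queries the runtime for devices.
cat > check.cu <<'EOF'
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("Found %d CUDA device(s)\n", count);
    return 0;
}
EOF

# Compile and run; a working install prints the device count.
nvcc check.cu -o check && ./check
```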