NVIDIA's full installation guide for CUDA and the driver on Linux is here:
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/
A simplified procedure is given below.
Attach GPU & Verify It Is Connected
administrator@ubuntu:~$ lspci | grep NVIDIA
00:07.0 3D controller: NVIDIA Corporation GA100 [A100 SXM4 80GB] (rev a1)
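If this check needs to be scripted, for example in a provisioning job, the lspci output can be counted instead of read by eye. The following is a minimal sketch: the function name count_nvidia_gpus is hypothetical, and it reads lspci-style text on standard input so that it can be tried on a machine without a GPU.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count NVIDIA GPU entries in lspci-style text read from stdin.
# Assumes GPUs are listed as "3D controller" or "VGA compatible controller" devices.
count_nvidia_gpus() {
    # grep -c prints the number of matching lines; "|| true" keeps a zero count
    # from being reported as a failure.
    grep -Eci '(3D|VGA compatible) controller: NVIDIA' || true
}
```

On a real host this would be called as `lspci | count_nvidia_gpus`. For the single A100 shown above, the expected count is 1.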
Install GPU Driver & CUDA
sudo apt install ubuntu-drivers-common
ubuntu-drivers devices    # list the available drivers, including the recommended one
sudo apt install nvidia-driver-525-server    # substitute the recommended driver if it differs
sudo apt install nvidia-cuda-toolkit
sudo reboot
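The recommended driver can also be picked out of the `ubuntu-drivers devices` listing automatically. This is a sketch only: the function name recommended_driver is hypothetical, and it assumes that driver lines have the form "driver : <package> - distro non-free recommended".

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the package that `ubuntu-drivers devices` marks
# as recommended. Reads the listing from stdin.
recommended_driver() {
    # Field 1 is "driver", field 2 is ":", field 3 is the package name.
    awk '$1 == "driver" && /recommended/ { print $3; exit }'
}
```

Typical use would be `ubuntu-drivers devices | recommended_driver`. It is worth reviewing the printed package name before passing it to `sudo apt install`.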
Note
Packages such as PyTorch and TensorFlow may be unstable with the latest versions of CUDA. Check the CUDA release notes to find which driver version is compatible with the CUDA version that your software requires: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#id4
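Once the minimum driver version for a given CUDA release has been looked up in the release notes, the comparison itself can be scripted. The sketch below uses a hypothetical helper named version_ge and relies on the -V (version sort) option of GNU sort. The value 525.60.13 is an assumed minimum for CUDA 12.0 and should be confirmed against the release notes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 is greater than or equal to version $2.
version_ge() {
    # After a version sort, the smaller of the two versions comes first.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

installed="525.105.17"   # driver version shown by nvidia-smi in this guide
required="525.60.13"     # assumed minimum for CUDA 12.0; confirm in the release notes

if version_ge "$installed" "$required"; then
    echo "driver $installed satisfies minimum $required"
else
    echo "driver $installed is older than $required"
fi
```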
Verify that the NVIDIA Command Line Tool (nvidia-smi) is installed
administrator@ubuntu:~$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:07.0 Off |                    0 |
| N/A   31C    P0    63W / 500W |      0MiB / 81920MiB |     20%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
administrator@ubuntu:~$
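For scripts, the table above is awkward to parse. nvidia-smi can print selected fields as CSV with its --query-gpu and --format options. A minimal sketch follows; the function name summarize_gpus is hypothetical, and the query only runs when nvidia-smi is present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read CSV lines (index, name, driver_version, memory.total)
# from stdin and print one summary line per GPU.
summarize_gpus() {
    awk -F', ' '{ printf "GPU %s: %s, driver %s, %s\n", $1, $2, $3, $4 }'
}

# Only query when nvidia-smi is available, so the script is harmless elsewhere.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name,driver_version,memory.total \
               --format=csv,noheader | summarize_gpus
fi
```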
Install Python & TensorFlow in a Virtual Environment
sudo apt update && sudo apt upgrade -y
sudo apt install python3-pip
pip3 install virtualenv
virtualenv my_ai_project
source my_ai_project/bin/activate
pip3 install tensorflow
deactivate
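While the virtual environment is still active (that is, before running deactivate), it is worth confirming that TensorFlow can see the GPU. The sketch below wraps TensorFlow's tf.config.list_physical_devices call. The function name tf_gpu_count is hypothetical, and its optional argument selects the Python interpreter to use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many GPUs TensorFlow can see.
# $1 is the Python interpreter to run (defaults to python3).
tf_gpu_count() {
    "${1:-python3}" -c 'import tensorflow as tf; print(len(tf.config.list_physical_devices("GPU")))'
}
```

Run `source my_ai_project/bin/activate` followed by `tf_gpu_count`. A result of 0 suggests that TensorFlow could not load the CUDA libraries it needs, even if nvidia-smi reports the GPU correctly.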
If a Linux desktop GUI is installed, you can benchmark the graphics performance of your GPU.
A quick first test is glxgears:
sudo apt install mesa-utils
glxgears
This opens a window showing an OpenGL rendering of three rotating gears. The frame rate is measured and printed to the terminal every five seconds.
glmark2 is a much richer benchmarking tool. Unlike glxgears, glmark2 offers a set of tests covering different aspects of graphics performance (buffering, building, lighting, texturing, etc.), allowing a more comprehensive and meaningful test. Each test runs for 10 seconds and its frame rate is counted individually. At the end, you get a performance score based on all of the tests:
sudo apt install glmark2
glmark2
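To compare runs, only the final score is usually needed. A minimal sketch, assuming that the summary line printed by glmark2 has the form "glmark2 Score: <number>"; the function name glmark2_score is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the final score from glmark2 output read on stdin.
glmark2_score() {
    # The score is the last whitespace-separated field on the summary line.
    awk '/glmark2 Score:/ { print $NF }'
}
```

For example, `glmark2 | tee glmark2.log | glmark2_score` keeps the full log in a file and prints only the score.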
4 x A100 SXM4 in an HGX platform (Redstone). NVLink (each GPU is connected directly to every other GPU).
8 x H100 SXM5 in an HGX platform. NVSwitch (each GPU is connected to a switch fabric).
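The interconnect can be inspected on a running system with `nvidia-smi topo -m`, which prints a matrix of link types between GPUs. The sketch below counts GPU pairs joined by NVLink. It assumes that matrix rows start with a GPU label such as GPU0 and that NVLink cells look like NV4 or NV12. The function name nvlink_pairs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count GPU pairs connected by NVLink in
# `nvidia-smi topo -m` output read on stdin.
nvlink_pairs() {
    awk '/^GPU[0-9]+/ {
             # Count every cell in a GPU row that names an NVLink connection.
             for (i = 2; i <= NF; i++) if ($i ~ /^NV[0-9]+$/) n++
         }
         # Each pair appears twice in the matrix (row A/column B and row B/column A).
         END { print n / 2 }'
}
```

Typical use would be `nvidia-smi topo -m | nvlink_pairs`. With four GPUs that are each linked to every other GPU, the expected result is 6 pairs.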