Using GPUs in CloudCIX

1) Build a GPU Project

https://youtu.be/sXjtAGuKZOg

2) Install GPU Drivers and CUDA

NVIDIA's full CUDA installation guide for Linux is available here:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/

A simplified procedure for Ubuntu follows.

Attach the GPU and verify that it is connected:

administrator@ubuntu:~$ lspci | grep NVIDIA
00:07.0 3D controller: NVIDIA Corporation GA100 [A100 SXM4 80GB] (rev a1)
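If you run this check from a provisioning script, the lspci test can be wrapped in a small function that fails when no NVIDIA device is listed. This is a minimal sketch: `require_nvidia_gpu` is a hypothetical helper name, and it only inspects the lspci text piped into it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the lspci output piped in lists an NVIDIA controller.
# Usage: lspci | require_nvidia_gpu
require_nvidia_gpu() {
  # A100/H100 appear as "3D controller"; display-capable cards appear as "VGA compatible controller".
  if grep -qE '(VGA compatible|3D) controller: NVIDIA'; then
    return 0
  fi
  echo "No NVIDIA GPU found; check that the GPU is attached to this VM." >&2
  return 1
}
```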

Install GPU Driver & CUDA

sudo apt install ubuntu-drivers-common
ubuntu-drivers devices                     # list available drivers; one is marked "recommended"
sudo apt install nvidia-driver-525-server  # replace with the recommended driver if it differs
sudo apt install nvidia-cuda-toolkit
sudo reboot
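To avoid copying the driver name by hand, the package marked "recommended" can be read from the `ubuntu-drivers devices` output. A sketch only: `recommended_driver` is a hypothetical helper, and it assumes the `driver   : <package> - distro non-free recommended` line format shown in the comment, so check it against your own output first.

```shell
#!/usr/bin/env bash
# Sketch: print the package marked "recommended" in `ubuntu-drivers devices` output.
# Assumed line format: "driver   : nvidia-driver-525-server - distro non-free recommended"
# Field 3 is the package name; only the first recommended entry is printed.
recommended_driver() {
  awk '$1 == "driver" && /recommended/ { print $3; exit }'
}

# Example (commented out so nothing is installed when this file is sourced):
# pkg="$(ubuntu-drivers devices | recommended_driver)"
# [ -n "$pkg" ] && sudo apt install -y "$pkg"
```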

Note

Packages such as PyTorch and TensorFlow may not yet support the latest CUDA release. Check the CUDA Toolkit release notes to find which driver version is compatible with the CUDA version your software requires: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#id4.
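Once you know the minimum driver version from the release notes, a script can compare it with the installed driver. A minimal sketch, assuming GNU `sort -V` (standard on Ubuntu); `version_ge` is a hypothetical helper name, and `MIN_DRIVER` in the usage comment is a placeholder you should take from the release-notes table for your CUDA version.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds if version A >= version B, using GNU `sort -V` ordering.
# The smaller version sorts first, so A >= B exactly when B comes out on top.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example usage (MIN_DRIVER is a placeholder; confirm it in the release notes):
# MIN_DRIVER=525.60.13
# installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
# version_ge "$installed" "$MIN_DRIVER" || echo "Driver $installed is older than $MIN_DRIVER" >&2
```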

Verify that the driver is working by running the NVIDIA System Management Interface (nvidia-smi):

administrator@ubuntu:~$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:07.0 Off |                    0 |
| N/A   31C    P0    63W / 500W |      0MiB / 81920MiB |     20%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
administrator@ubuntu:~$
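The "CUDA Version" in this banner is the newest CUDA version the installed driver supports; it is not necessarily the version of the toolkit installed by `nvidia-cuda-toolkit`, which `nvcc --version` reports. To log the driver and supported CUDA versions from a script, the banner can be parsed as below. This is a sketch that assumes the banner layout shown above; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract values from the nvidia-smi banner line, e.g.
# "| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |"
# Usage: nvidia-smi | smi_driver_version
#        nvidia-smi | smi_cuda_version
smi_driver_version() {
  sed -n 's/.*Driver Version: *\([0-9.]*\).*/\1/p' | head -n 1
}
smi_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p' | head -n 1
}
```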

3) Install TensorFlow in a Python Virtual Environment

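sudo apt update && sudo apt upgrade -y
sudo apt install python3-pip
pip3 install virtualenv
virtualenv my_ai_project              # create the virtual environment
source my_ai_project/bin/activate     # activate it; pip3 now installs into the environment
pip3 install tensorflow
deactivate                            # leave the environment when finished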
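To confirm that TensorFlow in the virtual environment can actually use the GPU, you can ask it how many GPUs it detects. A minimal sketch: `tf_gpu_count` is a hypothetical helper that calls the environment's own Python interpreter directly (so the environment does not need to be activated) and prints the length of `tf.config.list_physical_devices("GPU")`. A result of 0 often points to a driver or CUDA version mismatch, as described in the Note above.

```shell
#!/usr/bin/env bash
# Sketch: print the number of GPUs TensorFlow can see.
# Defaults to the interpreter of the my_ai_project environment created above;
# pass a different Python path as the first argument if yours differs.
tf_gpu_count() {
  local py="${1:-my_ai_project/bin/python}"
  "$py" -c 'import tensorflow as tf; print(len(tf.config.list_physical_devices("GPU")))'
}

# Example: warn if TensorFlow cannot see any GPU.
# [ "$(tf_gpu_count)" -gt 0 ] || echo "TensorFlow sees no GPU; check driver/CUDA compatibility." >&2
```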

4) GPU Test

If you have a Linux desktop GUI installed, you can also benchmark the graphics performance of your GPU.

How to Benchmark your GPU on Linux

You can use glxgears:

sudo apt install mesa-utils
glxgears

This opens a window showing an OpenGL rendering of three rotating gears. The frame rate is measured and printed to the terminal every five seconds.
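glxgears is often synchronized to the display's vertical refresh, in which case the reported frame rate stays close to the monitor refresh rate, so it is best treated as a quick check that OpenGL rendering works. If you want a single number from a run, the per-interval figures can be averaged. A sketch, assuming the output format shown in the comment; `glxgears_avg_fps` is a hypothetical name, and the usage line relies on `timeout` from GNU coreutils to stop glxgears after about 30 seconds.

```shell
#!/usr/bin/env bash
# Sketch: average the FPS figures glxgears prints, e.g.
#   "300 frames in 5.0 seconds = 59.940 FPS"
# The FPS value is the second-to-last field; non-matching lines are ignored.
# Usage: timeout 30 glxgears | glxgears_avg_fps
glxgears_avg_fps() {
  awk '/frames in .* seconds = .* FPS/ { sum += $(NF-1); n++ }
       END { if (n) printf "%.3f\n", sum / n; else exit 1 }'
}
```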

glmark2 is a much richer benchmarking tool. Unlike glxgears, glmark2 runs a set of tests covering different aspects of graphics performance (buffering, building, lighting, texturing, etc.), which gives a more comprehensive and meaningful result. Each test runs for 10 seconds and its frame rate is measured individually. At the end, glmark2 reports an overall performance score based on all the tests:

sudo apt install glmark2
glmark2
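On a virtual machine it is worth confirming which renderer these benchmarks actually use: if the OpenGL renderer string reports `llvmpipe`, Mesa is rendering in software on the CPU and the results say nothing about the GPU. A sketch with two hypothetical helpers, one reading the renderer string from `glxinfo` (also installed by mesa-utils) and one extracting the final score from glmark2's output; both assume the output formats shown in the comments.

```shell
#!/usr/bin/env bash
# gl_renderer: print the OpenGL renderer from glxinfo output.
# Assumed line format: "OpenGL renderer string: llvmpipe (LLVM 15.0.7, 256 bits)"
# Usage: glxinfo | gl_renderer
gl_renderer() {
  sed -n 's/^[[:space:]]*OpenGL renderer string: //p' | head -n 1
}

# glmark2_score: print the overall score from glmark2 output.
# Assumed line format: "    glmark2 Score: 2571"
# Usage: glmark2 | glmark2_score
glmark2_score() {
  awk '/glmark2 Score:/ { print $3; exit }'
}
```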

CloudCIX Region cork01 GPU types

  • 4 x A100 SXM4 in an HGX platform (Redstone), connected with NVLink: each GPU is linked directly to every other GPU.

  • 8 x H100 SXM5 in an HGX platform, connected with NVSwitch: each GPU connects to a shared switch fabric.
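You can inspect how the GPUs in your project are interconnected with `nvidia-smi topo -m`, which prints a matrix in which `NV<n>` means the pair is connected through a bonded set of n NVLinks. If every pair reports `NV<n>`, a 4-GPU project has 6 connected pairs and an 8-GPU project has 28. A sketch that counts such pairs, assuming rows begin with a `GPU<i>` label as in the comment; `nvlink_pairs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count GPU pairs joined by NVLink in `nvidia-smi topo -m` output.
# Assumed row format: "GPU0     X      NV4     NV4     NV4     0-63 ..."
# Only rows labelled GPU<i> are scanned; CPU/NUMA affinity columns never match NV<n>.
# The matrix is symmetric, so every pair is seen twice and the count is halved.
# Usage: nvidia-smi topo -m | nvlink_pairs
nvlink_pairs() {
  awk '$1 ~ /^GPU[0-9]+$/ { for (i = 2; i <= NF; i++) if ($i ~ /^NV[0-9]+$/) n++ }
       END { print n / 2 }'
}
```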