Boole HPC Platform – User Guide

Introduction

Welcome to the Boole High Performance Computing (HPC) Platform, operated by CloudCIX. The Boole HPC is designed to provide researchers, engineers, and students with powerful compute resources for a wide range of workloads — from interactive data exploration to large-scale parallel computation.

The platform combines modern web-based access with traditional HPC tools:

  • Open OnDemand – a user-friendly web portal available at https://hpc.cloudcix.com. This is the primary interface for accessing compute resources, uploading data, and launching jobs.

  • Slurm – the underlying workload manager responsible for scheduling and running jobs efficiently across compute nodes.

  • Interactive Applications – available directly from the Open OnDemand dashboard, including:

    • Boole Linux Desktop – a full remote desktop environment for GUI-based workflows.

    • Jupyter Notebooks – for data analysis, Python, and scientific computing.

    • Boole Shell Access – opens a terminal directly in your browser for command-line access to the cluster.

You can also connect directly to the cluster over SSH, or transfer files with rsync; note that direct SSH and rsync access is available from HEAnet IP addresses only.
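
For example (the username and login node address below are placeholders; use the details provided with your account):

ssh <username>@<login-node>
rsync -avz ./my_data/ <username>@<login-node>:~/my_data/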

Getting Started

We recommend watching this short introductory video, which demonstrates how to access and use the platform:

📺 Boole HPC Basics – Watch the Boole HPC Overview Video on YouTube

Slurm Basics

The Slurm workload manager is used to run jobs on the Boole HPC Platform. Users interact with Slurm through commands such as srun (for interactive jobs) and sbatch (for batch scripts). Jobs are submitted to specific partitions (queues), depending on the hardware and workload type.

Important

Default Resource Allocation

Please note that the default resource allocation for a job is 1 CPU core and 1 GB of memory per node. If your job requires more resources, request them explicitly in your srun command or sbatch script.

Available Partitions

  • cloud – runs on our virtualized cloud infrastructure. Suitable for smaller or flexible workloads.

  • physical – runs on dedicated bare-metal compute nodes. Use this for high-performance CPU jobs.

  • physical-gpu – runs on bare-metal nodes equipped with GPUs (e.g. NVIDIA A100). Use this for GPU-accelerated workloads.
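
You can check these partitions and their current node availability before submitting a job. A quick example (the format string is optional and uses standard sinfo output fields: partition, availability, time limit, node count, state):

sinfo
sinfo -o "%P %a %l %D %t"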

Interactive Jobs with srun

The srun command launches an interactive session on a compute node. This is useful for debugging, testing code, or short experimental runs.

Example: request 2 CPU cores and 4 GB of memory for 1 hour on the cloud partition:

srun --partition=cloud --cpus-per-task=2 --mem=4G --time=01:00:00 --pty bash

Once the session starts, you will be inside a compute node shell where you can run your program interactively.
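
Inside the session you can confirm what was actually allocated; Slurm exports environment variables such as SLURM_JOB_ID and SLURM_CPUS_PER_TASK (a quick sanity check, not required):

echo "Job ID: $SLURM_JOB_ID, CPUs: $SLURM_CPUS_PER_TASK"
scontrol show job $SLURM_JOB_ID | grep -E "NumCPUs|TRES"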

Example CPU Job Script

#!/bin/bash
#SBATCH --job-name=cpu_test
#SBATCH --output=cpu_test_%j.out
#SBATCH --partition=physical
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00

# Load the default compiler runtime
module load gcc-runtime/13.2.0

echo "Running on host: $(hostname)"
echo "Job started at: $(date)"

# Example program: Python hello world
python3 -c "print('Hello from Slurm on $(hostname)!')"

echo "Job finished at: $(date)"

Submit it:

sbatch cpu_test.sh

Check the output:

cat cpu_test_<jobid>.out
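
sbatch prints the ID of the submitted job; this number replaces <jobid> in the output filename. For example (the job ID 12345 is illustrative):

sbatch cpu_test.sh
# -> Submitted batch job 12345
cat cpu_test_12345.out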

Example GPU Job Script

For GPU workloads, request GPUs with --gres=gpu:<N>. Example with 1 GPU on physical-gpu:

Interactive session:

srun --partition=physical-gpu --gres=gpu:1 --cpus-per-task=4 --mem=16G --time=01:00:00 --pty bash

Batch job script:

#!/bin/bash
#SBATCH --job-name=gpu_test
#SBATCH --output=gpu_test_%j.out
#SBATCH --partition=physical-gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=01:00:00

# Load CUDA
module load cuda/12.9.0

echo "Running on host: $(hostname)"
nvidia-smi

# Example: run a CUDA program or container
# ./my_gpu_script

Submit it:

sbatch gpu_test.sh

Check the output:

cat gpu_test_<jobid>.out
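
Once the job has finished, you can also confirm its final state and runtime with sacct (the fields shown are standard sacct format fields):

sacct -j <jobid> --format=JobID,JobName,State,Elapsed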

Monitoring Jobs

  • List your jobs:

squeue -u $USER
  • Cancel a job:

scancel <jobid>
  • Show job details:

scontrol show job <jobid>
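
For a more detailed view of your queue, squeue accepts an output format string. The example below shows job ID, name, state, elapsed time, time limit, and the reason a pending job is waiting (the field letters are standard squeue format specifiers):

squeue -u $USER -o "%i %j %T %M %l %R"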

Email Notifications

Slurm can send you an email when your job starts, ends, fails, or is requeued. To enable this, add the following options to your srun command or sbatch script:

  • --mail-type=TYPE – event(s) that trigger an email.

  • --mail-user=EMAIL – the email address to send notifications to.

Valid TYPE values include:

  • BEGIN – job starts running

  • END – job finishes successfully

  • FAIL – job fails

  • REQUEUE – job is requeued

  • ALL – shorthand for all of the above (plus a few less common events)

Example: interactive session with email notification:

srun --partition=cloud --cpus-per-task=2 --mem=4G --time=01:00:00 \
     --mail-type=ALL --mail-user=you@example.com --pty bash

Example: batch script with email notification:

#!/bin/bash
#SBATCH --job-name=notify_test
#SBATCH --output=notify_test_%j.out
#SBATCH --partition=physical
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G
#SBATCH --time=00:30:00
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=you@example.com

echo "Job running on $(hostname)"
sleep 60
echo "Job finished."

This will send an email to you@example.com when the job ends or fails.

Loading Software with Modules

Our cluster uses Lmod environment modules (with software built via Spack) for managing software. Before running your jobs, you may need to load specific compilers, libraries, or applications.

  • To list available modules:

module avail
  • To search for a module by keyword:

module spider gcc
  • To load a module:

module load gcc-runtime/13.2.0
  • To see what modules you have loaded:

module list
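
Inside job scripts it is good practice to start from a clean module environment so that runs are reproducible. A minimal sketch (module purge is a standard Lmod command that unloads everything currently loaded):

module purge
module load gcc-runtime/13.2.0
module list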

Conda / Miniforge

For users who want to manage Python environments and packages independently of the cluster-wide software modules, we recommend installing Miniforge (a minimal Conda distribution) in your home directory. This allows you to create isolated environments and install Python packages without affecting other users.

Installing Miniforge

  1. Download the latest Miniforge installer:

cd $HOME
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
  2. Run the installer:

bash Miniforge3-Linux-x86_64.sh
  3. Follow the prompts:

    • Accept the license.

    • Install into your home directory (default: $HOME/miniforge3).

    • Allow the installer to initialize Conda by modifying your shell startup file.

  4. Activate Conda:

source $HOME/miniforge3/bin/activate

You can also add the initialization to your shell automatically:

conda init
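
After activating (or opening a new shell once conda init has run), you can verify that Conda is being picked up from your home directory:

conda --version
which python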

Creating and Managing Environments

  • Create a new environment:

conda create --name my_env python=3.11
  • Activate the environment:

conda activate my_env
  • Deactivate the environment:

conda deactivate
  • List all environments:

conda env list

Installing Packages

Within an activated environment, you can install Python packages independently:

conda install numpy scipy matplotlib

Or use pip inside the environment:

pip install pandas seaborn
  • Update packages:

conda update numpy
  • Remove packages:

conda remove matplotlib
  • Delete an environment:

conda env remove --name my_env
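
To use a Conda environment inside a Slurm batch job, activate it at the top of the job script. A minimal sketch, assuming the default Miniforge location ($HOME/miniforge3) and the my_env environment created above:

#!/bin/bash
#SBATCH --job-name=conda_job
#SBATCH --output=conda_job_%j.out
#SBATCH --partition=cloud
#SBATCH --cpus-per-task=1
#SBATCH --mem=2G
#SBATCH --time=00:10:00

# Make conda available in the non-interactive job shell, then activate the environment
source $HOME/miniforge3/bin/activate
conda activate my_env

python -c "import sys; print(sys.version)"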

Best Practices

  • Always activate your environment before running Python programs.

  • Keep separate environments for different projects to avoid package conflicts.

  • Avoid installing packages directly into the base Conda environment — use named environments instead.

Running Docker Images with Apptainer on HPC

This guide shows how to pull and run a Docker image using Apptainer in an HPC environment managed by Slurm.

Apptainer is the container platform available on our HPC system. It allows you to package applications, dependencies, and environments into portable container images that can run across all partitions (cloud, physical, and physical-gpu).

Step 1: Request an Interactive Slurm Session

Start an interactive session using srun:

srun --partition=cloud --cpus-per-task=2 --mem=4G --time=01:00:00 --pty bash
  • --partition=cloud → the partition to use

  • --cpus-per-task=2 → number of CPU cores

  • --mem=4G → memory allocation

  • --time=01:00:00 → run time limit (hh:mm:ss)

  • --pty bash → interactive shell

Step 2: Pull the Docker Image

Pull a Docker image and convert it to a SIF file:

apptainer pull docker://hello-world

Output:

hello-world_latest.sif
  • This creates a file hello-world_latest.sif in your current directory.

Step 3: Run the Container

Run the container with:

apptainer run hello-world_latest.sif

Expected output:

Hello from Docker!
This message shows that your installation appears to be working correctly.

Step 4 (Optional): Inspect or Enter the Container

Inspect metadata:

apptainer inspect hello-world_latest.sif

Run a command inside the container:

apptainer exec hello-world_latest.sif ls /

Interactive shell inside the container:

apptainer shell hello-world_latest.sif

Notes

  • apptainer pull → downloads the image once and converts it to a reusable .sif file.

  • apptainer run → executes the container’s default program.

  • apptainer exec → runs a custom command inside the container.

  • apptainer shell → opens an interactive shell inside the container.
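
Containers can also be run non-interactively from a batch script. A minimal sketch using the hello-world_latest.sif file pulled above (the partition, resources, and image location are illustrative):

#!/bin/bash
#SBATCH --job-name=apptainer_test
#SBATCH --output=apptainer_test_%j.out
#SBATCH --partition=cloud
#SBATCH --cpus-per-task=1
#SBATCH --mem=2G
#SBATCH --time=00:10:00

# Run the container's default program
apptainer run $HOME/hello-world_latest.sif

# For GPU jobs on the physical-gpu partition, add the --nv flag so the
# container can access the node's GPUs, e.g.:
# apptainer run --nv my_gpu_image.sif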

Quick Reference

Common Slurm Commands

Command                       Description
squeue -u $USER               Show your running/pending jobs
sbatch script.sh              Submit a batch job script
scancel <jobid>               Cancel a specific job
sinfo                         Show partition and node status
scontrol show job <jobid>     Show detailed job information
sacct -j <jobid>              Show job accounting information

Common Resource Requests

Resource Type    Slurm Option          Example
CPU cores        --cpus-per-task=N     --cpus-per-task=4
Memory           --mem=XG              --mem=16G
GPUs             --gres=gpu:N          --gres=gpu:2
Time limit       --time=HH:MM:SS       --time=02:30:00
Partition        --partition=NAME      --partition=physical-gpu

Troubleshooting

Common Issues and Solutions

Job stuck in pending state
  • Check partition availability with sinfo

  • Reduce resource requirements (CPUs, memory, time)

  • Consider using a different partition
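
To see why a particular job is still pending, check the reason Slurm reports for it (the %R format field shows the pending reason, e.g. Resources or Priority):

squeue -j <jobid> -o "%i %T %R"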

Out of memory errors
  • Increase --mem parameter

  • Check actual memory usage with sacct -j <jobid> --format=JobID,MaxRSS

Module not found
  • Use module spider <software> to search

  • Check if module name includes version number

  • Try module avail to see all available modules

Container permission errors
  • Ensure SIF file has correct permissions

  • Try rebuilding the container image

  • Check if the container requires specific bind mounts

Getting Help

Additional Resources