
Job Submission

Jobs run on the Beagle3 compute nodes via Slurm, just as on Midway3. Submitted jobs should explicitly specify --account=pi-<group> in the job submission script or interactive invocation.

An example sbatch job submission script for submitting a single core job to the standard compute partition is given below:

#!/bin/bash
  
#SBATCH --job-name=mnist
#SBATCH --output=out.txt
#SBATCH --time=00:05:00
#SBATCH --partition=beagle3
#SBATCH --ntasks=1
#SBATCH --mem=4G
#SBATCH --gres=gpu:1
#SBATCH --constraint=a100
#SBATCH --account=pi-<group>

module load python/anaconda-2021.05
module load cudnn/11.2
python mnist_convnet.py
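
The same account, partition, and resource flags apply to interactive work. A minimal sketch using the sinteractive command described under QoS Policies below, assuming it passes the standard Slurm resource flags through (the time, memory, and GPU values are illustrative, not recommendations):

# Example interactive session on a Beagle3 A100 node (values are illustrative)
sinteractive --account=pi-<group> --partition=beagle3 --ntasks=1 \
    --mem=4G --gres=gpu:1 --constraint=a100 --time=00:30:00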

Nodes on the Beagle3 platform:

Node type     Num. of nodes   Node specifications
A100          22              32 cores, 256 GB memory, 960 GB SSD
A40           22              32 cores, 256 GB memory, 960 GB SSD
Big memory    4               512 GB memory, 960 GB SSD, no GPUs

Users can specify which kind of node they want by passing the --constraint flag to the Slurm scheduler (an example follows the list):

  • --constraint=a40 jobs will only run on an A40 node
  • --constraint=a100 jobs will only run on an A100 node
  • --constraint=256g jobs will run on either an A40 or an A100 node, but not on a big-memory node (beagle3-bigmem[1-4])
  • --constraint=512g jobs will only run on a big-memory node
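
For example, to pin a single-GPU job to an A40 node, combine the constraint with a GPU request in the batch script (a sketch using the same directives as the example script above):

# request one GPU and require that it be an A40
#SBATCH --partition=beagle3
#SBATCH --gres=gpu:1
#SBATCH --constraint=a40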

Note that beagle3-[0001-0022] are all A100 nodes and beagle3-[0023-0044] are all A40 nodes. The sinfo command reports the current status of the nodes, e.g.:

sinfo -p beagle3 -O 'partition,available,nodes,features,statecompact,nodelist'
PARTITION           AVAIL               NODES               AVAIL_FEATURES      STATE               NODELIST            
beagle3             up                  1                   gold-6346,256g,a100 down*               beagle3-0010        
beagle3             up                  1                   gold-6346,256g,a40  mix                 beagle3-0023        
beagle3             up                  21                  gold-6346,256g,a100 idle                beagle3-[0001-0009,0
beagle3             up                  21                  gold-6346,256g,a40  idle                beagle3-[0024-0044] 
beagle3             up                  4                   gold-6346,512g      idle                beagle3-bigmem[1-4] 
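
To show only the nodes that are currently free, filter the output by state:

# list idle Beagle3 nodes
sinfo -p beagle3 -t idle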

QoS Policies

There are two quality-of-service (QoS) options available on the beagle3 partition. You can specify either one by using the --qos flag in your sbatch scripts or sinteractive commands.

  • --qos=beagle3: This QoS allows you to request up to 512 CPU-cores and 64 GPUs, and a maximum wall time of 48 hours. It is the default QoS for the beagle3 partition.
  • --qos=beagle3-long: This QoS allows you to request up to 128 CPU-cores and 16 GPUs, and a maximum wall time of 96 hours.
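
For example, a job that needs more than 48 hours of wall time would add the QoS flag alongside the usual directives (a sketch; the time shown is the beagle3-long maximum):

# run under the long QoS, up to 96 hours
#SBATCH --partition=beagle3
#SBATCH --qos=beagle3-long
#SBATCH --time=96:00:00
#SBATCH --account=pi-<group>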