Job Submission
Jobs run on the Beagle3 compute nodes through the Slurm scheduler, the same way as on Midway3. Every job should explicitly specify --account=pi-<group>, with <group> replaced by your group name, in its job submission script or interactive invocation.
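For interactive work the account flag goes on the command line instead. The sketch below uses standard Slurm `srun --pty` to open a shell on a GPU node. The one-hour limit and single GPU are illustrative choices. RCC's `sinteractive` wrapper, which the QoS section also mentions, is assumed here to accept the same Slurm options.

```bash
# Standard Slurm: request one GPU on the beagle3 partition and open a shell on it.
# Replace <group> with your group name before running.
srun --account=pi-<group> --partition=beagle3 --ntasks=1 --gres=gpu:1 \
     --time=01:00:00 --pty bash

# RCC's sinteractive wrapper (assumed to accept the same Slurm options):
sinteractive --account=pi-<group> --partition=beagle3 --gres=gpu:1 --time=01:00:00
```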
Below is an example sbatch script that submits a single-task job requesting one A100 GPU on the beagle3 partition:
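```bash
#!/bin/bash
#SBATCH --job-name=mnist
#SBATCH --output=out.txt
# Disabled alternative (the extra "#" makes Slurm ignore it): a one-hour limit
##SBATCH --time=01:00:00
# Wall-time limit of five minutes (HH:MM:SS)
#SBATCH --time=00:05:00
#SBATCH --partition=beagle3
#SBATCH --ntasks=1
#SBATCH --mem=4G
# One GPU, on an A100 node
#SBATCH --gres=gpu:1
#SBATCH --constraint=a100
# Replace <group> with your group name
#SBATCH --account=pi-<group>

module load python/anaconda-2021.05
module load cudnn/11.2

python mnist_convnet.py
```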
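Suppose the script above is saved as `mnist.sh` (a hypothetical file name). It can then be submitted and monitored with the standard Slurm commands:

```bash
sbatch mnist.sh        # prints "Submitted batch job <jobid>"
squeue -u "$USER"      # list your pending and running jobs
scancel <jobid>        # cancel the job if needed (replace <jobid>)
cat out.txt            # job output, written by --output=out.txt
```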
Nodes on the Beagle3 platform:
Node type | Num. of Nodes | Node Specifications |
---|---|---|
A100 | 22 | 32 cores, 256GB memory, 960GB SSD |
A40 | 22 | 32 cores, 256GB memory, 960GB SSD |
Big memory | 4 | 512GB memory, 960GB SSD, no GPUs |
Users can choose a node type with the --constraint flag of the Slurm scheduler:
- --constraint=a40: jobs run only on an A40 node.
- --constraint=a100: jobs run only on an A100 node.
- --constraint=256g: jobs run on either an A40 or an A100 node, but not on a big-memory node ("beagle3-bigmemX").
- --constraint=512g: jobs run only on a big-memory node.
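As a sketch, a job that needs a big-memory node rather than a GPU could use a header like the one below. It drops the --gres line because these nodes have no GPUs. The job name, memory request, and `analysis.py` workload are hypothetical. The 300G value is simply chosen to exceed the 256GB of a GPU node, and the document does not say how much of the 512GB a single job may request.

```bash
#!/bin/bash
#SBATCH --job-name=bigmem-job
#SBATCH --partition=beagle3
#SBATCH --account=pi-<group>
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
# Run only on a big-memory node (beagle3-bigmemX); no GPUs are requested.
#SBATCH --constraint=512g
# Illustrative request, larger than a GPU node's 256GB
#SBATCH --mem=300G

python analysis.py   # hypothetical workload
```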
Nodes beagle3-[0001-0022] are all A100 nodes and beagle3-[0023-0044] are all A40 nodes. The sinfo command reports the current state of the nodes, for example:
```
sinfo -p beagle3 -O 'partition,available,nodes,features,statecompact,nodelist'
PARTITION AVAIL NODES AVAIL_FEATURES STATE NODELIST
beagle3 up 1 gold-6346,256g,a100 down* beagle3-0010
beagle3 up 1 gold-6346,256g,a40 mix beagle3-0023
beagle3 up 21 gold-6346,256g,a100 idle beagle3-[0001-0009,0
beagle3 up 21 gold-6346,256g,a40 idle beagle3-[0024-0044]
beagle3 up 4 gold-6346,512g idle beagle3-bigmem[1-4]
```
sinfo allocates 20 characters to each -O field by default, which is why the NODELIST entry for the idle A100 nodes is cut off. A wider field can be requested explicitly, e.g. nodelist:40.
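The hostname ranges above (A100 on beagle3-0001 to beagle3-0022, A40 on beagle3-0023 to beagle3-0044, as the sinfo output shows) can be encoded in a small bash helper. `gpu_type_for_node` is hypothetical and not an RCC-provided tool. One use is to record which GPU type a job submitted with --constraint=256g actually landed on.

```bash
#!/bin/bash
# Hypothetical helper: map a Beagle3 hostname to its node type,
# using the numbering above (0001-0022 = A100, 0023-0044 = A40).
gpu_type_for_node() {
  local node="$1"
  case "$node" in
    beagle3-bigmem*) echo "bigmem"; return 0 ;;
    beagle3-[0-9][0-9][0-9][0-9]) ;;   # numbered GPU node, handled below
    *) echo "unknown"; return 1 ;;
  esac
  # Strip the prefix and force base 10 so "0008" is not parsed as octal.
  local num=$((10#${node#beagle3-}))
  if (( num >= 1 && num <= 22 )); then
    echo "a100"
  elif (( num >= 23 && num <= 44 )); then
    echo "a40"
  else
    echo "unknown"; return 1
  fi
}
```

Inside a batch script, Slurm sets `SLURMD_NODENAME` to the node running the job, so `echo "GPU type: $(gpu_type_for_node "$SLURMD_NODENAME")"` would log the node type to the job output.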
QoS Policies
There are two quality-of-service (QoS) options available on the beagle3
partition. You can specify either one by using the --qos
flag in your sbatch scripts or sinteractive commands.
- --qos=beagle3: allows you to request up to 512 CPU cores and 64 GPUs, with a maximum wall time of 48 hours. This is the default QoS for the beagle3 partition.
- --qos=beagle3-long: allows you to request up to 128 CPU cores and 16 GPUs, with a maximum wall time of 96 hours.
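To make the trade-off concrete, the hypothetical bash helper below checks a planned request against the two sets of limits listed above. It treats the limits as applying to a single request, which is an assumption: Slurm may enforce them per user or per account instead.

```bash
#!/bin/bash
# Hypothetical helper: compare a planned request with the QoS limits above.
# Assumption: limits are checked against one job's request only.
check_qos_request() {
  local qos="$1" cores="$2" gpus="$3" hours="$4"
  local max_cores max_gpus max_hours
  case "$qos" in
    beagle3)      max_cores=512; max_gpus=64; max_hours=48 ;;
    beagle3-long) max_cores=128; max_gpus=16; max_hours=96 ;;
    *) echo "unknown QoS: $qos"; return 2 ;;
  esac
  if (( cores > max_cores || gpus > max_gpus || hours > max_hours )); then
    echo "exceeds $qos limits"
    return 1
  fi
  echo "ok"
}
```

For example, a 72-hour job on 128 cores and 16 GPUs fails the beagle3 check (48-hour limit) but passes beagle3-long, so its script would need `#SBATCH --qos=beagle3-long`.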