code for article pfeilbr/aws-parallel-cluster-playground
learn AWS ParallelCluster, formerly CfnCluster (“cloud formation cluster”), a framework that deploys and maintains high performance computing (HPC) clusters on Amazon Web Services (AWS). Developed by AWS, CfnCluster facilitates both quick-start proofs of concept (POCs) and production deployments. CfnCluster supports many different types of clustered applications and can easily be extended to support different frameworks.
Notes
- distributed as a Python package and is installed using the Python pip package manager.
- supports Slurm and AWS Batch schedulers
- generates CloudFormation to create the cluster resources
- the AWS CDK is used internally to generate the CloudFormation templates
- head node - runs the scheduler; users log in here to submit jobs, which the scheduler then places on compute nodes
- compute node(s) - run the jobs
- AMI used for the head and compute nodes (EC2 instances):
- AMI Name: aws-parallelcluster-3.1.4-amzn2-hvm-x86_64-202205121006 2022-05-12T10-09-45.467Z
- AMI Location: amazon/aws-parallelcluster-3.1.4-amzn2-hvm-x86_64-202205121006 2022-05-12T10-09-45.467Z
- when you use the awsbatch scheduler, the AWS ParallelCluster CLI commands for AWS Batch are automatically installed on the head node
- AWS Batch CLI commands - e.g. awsbsub, awsbstat, awsbqueues, awsbout
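The notes above say the scheduler can be AWS Batch instead of Slurm. A minimal sketch of a ParallelCluster v3 configuration that selects `awsbatch` might look like the following, written with a heredoc the same way the demo writes its job script. The file name `hello-batch.yaml`, the region, subnet ID, key name, and queue/compute-resource names are placeholders, not values from this project.

```shell
#!/bin/bash
# Hypothetical sketch: minimal ParallelCluster v3 config using the awsbatch scheduler.
# Region, SubnetId(s), KeyName and the queue/compute-resource names are placeholders.
cat << 'EOF' > hello-batch.yaml
Region: us-east-1
Image:
  Os: alinux2            # awsbatch clusters run on Amazon Linux 2
HeadNode:
  InstanceType: t2.micro
  Networking:
    SubnetId: subnet-00000000000000000   # placeholder
  Ssh:
    KeyName: my-key                      # placeholder
Scheduling:
  Scheduler: awsbatch
  AwsBatchQueues:
    - Name: queue1
      Networking:
        SubnetIds:
          - subnet-00000000000000000     # placeholder
      ComputeResources:
        - Name: optimal
          InstanceTypes:
            - optimal                    # let AWS Batch choose instance types
          MinvCpus: 0
          DesiredvCpus: 0
          MaxvCpus: 10
EOF
```

On such a cluster, jobs are submitted from the head node with the awsb* commands rather than `sbatch`, for example `awsbsub -jn hello -cf hello-job.sh` (`-cf` marks the command as a file to transfer to the compute instance), then `awsbstat` to list jobs and `awsbout <job id>` to read the output.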
Demo
install, configure cluster, create cluster, submit job to cluster
python3 -m pip install "aws-parallelcluster" --upgrade --user
# verify install
pcluster version
# configure cluster. prompts for scheduler type, region, etc.
# when done `hello-world.yaml` is created
# if you let it create a VPC, it also creates the networking/VPC resources via a CloudFormation stack (e.g. `parallelclusternetworking-pubpriv-20220522231401` stack)
# <https://github.com/aws/aws-parallelcluster/tree/release-3.0/cli/tests/pcluster/example_configs> for example
# cluster configuration files
# <https://docs.aws.amazon.com/parallelcluster/latest/ug/cluster-configuration-file-v3.html> - configuration files spec
pcluster configure --config hello-world.yaml
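# create / provision the cluster (the networking/VPC resources already exist from the previous `configure` command)
pcluster create-cluster --cluster-name hello-world --cluster-configuration hello-world.yaml
# creation is asynchronous; check until clusterStatus is CREATE_COMPLETE before logging in
pcluster describe-cluster --cluster-name hello-world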
# log in to the cluster head node
pcluster ssh --cluster-name hello-world -i /path/to/keyfile.pem
# run the command sinfo to verify that your compute nodes are set up and configured.
sinfo
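# create the job script (hello-job.sh)
# quote 'EOF' so $(hostname) is evaluated on the compute node when the job runs, not when the file is written
cat << 'EOF' > hello-job.sh
#!/bin/bash
sleep 30
echo "Hello World from $(hostname)"
EOF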
# submit job
# if no compute node is running, Slurm launches an EC2 compute instance on demand to run the job
sbatch hello-job.sh
# view job in job queue
squeue
# once the job no longer appears in the queue, its STDOUT is in `slurm-<job id>.out` in the directory where sbatch was run (this first job has ID 1)
cat slurm-1.out
# clean up
# terminate the cluster's compute nodes (the head node keeps running)
pcluster delete-cluster-instances --cluster-name hello-world
# delete the cluster itself
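# this terminates the head node and deletes the CloudFormation stack used to create the cluster
# the networking stack created by `pcluster configure` is separate and is not removed by this command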
pcluster delete-cluster --cluster-name hello-world
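`pcluster configure` in the demo writes `hello-world.yaml` interactively. A minimal sketch of what a Slurm configuration of that kind can look like in the v3 schema follows; the file name `hello-world-sketch.yaml`, the region, subnet IDs, key name, instance types and queue names are placeholders, not the values the wizard produced for this demo.

```shell
#!/bin/bash
# Hypothetical sketch of a Slurm cluster config in the ParallelCluster v3 format.
# Region, subnet IDs, KeyName, instance types and names are placeholders.
cat << 'EOF' > hello-world-sketch.yaml
Region: us-east-1
Image:
  Os: alinux2
HeadNode:
  InstanceType: t2.micro
  Networking:
    SubnetId: subnet-00000000000000000   # head node subnet (placeholder)
  Ssh:
    KeyName: my-key                      # key pair whose .pem is passed to `pcluster ssh -i`
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ComputeResources:
        - Name: t2micro
          InstanceType: t2.micro
          MinCount: 0    # no compute nodes run while the queue is idle
          MaxCount: 10   # upper bound on nodes launched for queued jobs
      Networking:
        SubnetIds:
          - subnet-11111111111111111     # compute subnet (placeholder)
EOF
```

`MinCount: 0` is what makes `sbatch` trigger an EC2 launch on demand, as in the demo. Running `pcluster create-cluster --cluster-name hello-world --cluster-configuration hello-world-sketch.yaml --dryrun true` should validate such a file without creating resources.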
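The demo's `hello-job.sh` relies on Slurm defaults (a single node, output in `slurm-<job id>.out`). The same job can state its options explicitly with `#SBATCH` directives. The file name `hello-job-sbatch.sh`, the job name and the output pattern below are illustrative choices, not values from the original demo.

```shell
#!/bin/bash
# Illustrative variant of hello-job.sh with explicit Slurm options.
# The quoted 'EOF' keeps $(hostname) literal in the file, so it is evaluated
# on the compute node when the job runs.
cat << 'EOF' > hello-job-sbatch.sh
#!/bin/bash
#SBATCH --job-name=hello
# %j in the output file name is replaced by the Slurm job ID
#SBATCH --output=hello-%j.out
#SBATCH --nodes=1
#SBATCH --ntasks=1
sleep 30
echo "Hello World from $(hostname)"
EOF
chmod +x hello-job-sbatch.sh
```

On the head node, `sbatch hello-job-sbatch.sh` submits it, and the output lands in `hello-<job id>.out` instead of `slurm-<job id>.out`. The `#SBATCH` lines are ordinary comments to bash, so the script remains a valid shell script.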
Screenshots
EC2 instances
Resources
- AWS ParallelCluster
- aws/aws-parallelcluster
- cfncluster.readthedocs.io
- AWS services used by AWS ParallelCluster
- AWS Services used in CfnCluster
- Running your first job on AWS ParallelCluster
- Cluster configuration file
- aws-parallelcluster/cli/tests/pcluster/example_configs/ - example cluster configuration files
- AWS ParallelCluster CLI commands
- Slurm Workload Manager - Documentation