Learn AWS ParallelCluster, formerly CfnCluster (“cloud formation cluster”).
A framework that deploys and maintains high performance computing (HPC) clusters on Amazon Web Services (AWS). Developed by AWS, CfnCluster (now AWS ParallelCluster) facilitates both quick-start proofs of concept (POCs) and production deployments. It supports many different types of clustered applications and can be extended to support different frameworks.
- distributed as a Python package and is installed using the Python pip package manager.
- supports Slurm and AWS Batch schedulers
- generates CloudFormation to create the cluster resources
- CDK is used to generate templates
- head node - responsible for submitting and scheduling jobs
- compute node(s) - run jobs
- AMI for head and compute nodes (ec2 instances)
- AMI Name: aws-parallelcluster-3.1.4-amzn2-hvm-x86_64-202205121006 2022-05-12T10-09-45.467Z
- AMI Location: amazon/aws-parallelcluster-3.1.4-amzn2-hvm-x86_64-202205121006 2022-05-12T10-09-45.467Z
- When you use the awsbatch scheduler, the AWS ParallelCluster CLI commands for AWS Batch are automatically installed in the AWS ParallelCluster head node
- CLI commands for AWS Batch - e.g. awsbsub, awsbqueues, etc.
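
The pieces listed above (scheduler, head node, compute nodes) map onto sections of a version 3 cluster configuration file. The following is a minimal sketch for the Slurm scheduler, written from the shell. The file name `cluster-sketch.yaml`, the region, the subnet ID, the key name, and the instance types are all placeholder assumptions, not values from a real account; `pcluster configure` generates the real file interactively.

```shell
#!/bin/bash
# Write a minimal Slurm cluster configuration sketch.
# All IDs and names below are placeholders (assumptions), not real resources.
cat > cluster-sketch.yaml << 'EOF'
Region: us-east-1
Image:
  Os: alinux2
HeadNode:
  InstanceType: t2.micro
  Networking:
    SubnetId: subnet-00000000
  Ssh:
    KeyName: my-key
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ComputeResources:
        - Name: t2micro
          InstanceType: t2.micro
          MinCount: 0
          MaxCount: 10
      Networking:
        SubnetIds:
          - subnet-00000000
EOF

# Show which scheduler the sketch selects.
grep 'Scheduler:' cluster-sketch.yaml
```

With `MinCount: 0` the queue keeps no compute instances running while idle, so compute nodes are launched only when jobs are submitted.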
Install, configure cluster, create cluster, submit job to cluster:

    # install (or upgrade) the ParallelCluster CLI
    python3 -m pip install "aws-parallelcluster" --upgrade --user

    # verify install
    pcluster version

    # configure cluster; prompts for scheduler type, region, etc.
    # when done, `hello-world.yaml` is created, and the networking / VPC resources
    # are created via a cfn stack (e.g. `parallelclusternetworking-pubpriv-20220522231401` stack)
    # example cluster configuration files:
    # <https://github.com/aws/aws-parallelcluster/tree/release-3.0/cli/tests/pcluster/example_configs>
    # configuration file spec:
    # <https://docs.aws.amazon.com/parallelcluster/latest/ug/cluster-configuration-file-v3.html>
    pcluster configure --config hello-world.yaml

    # create / provision the cluster
    # (the networking / VPC resources already exist from the previous `configure` command)
    pcluster create-cluster --cluster-name hello-world --cluster-configuration hello-world.yaml

    # log in to the cluster head node
    pcluster ssh --cluster-name hello-world -i /path/to/keyfile.pem

    # run sinfo to verify that the compute nodes are set up and configured
    sinfo

    # create the job to run (hello-job.sh)
    # EOF is quoted so that $(hostname) is expanded when the job runs,
    # not when this file is written
    cat << 'EOF' > hello-job.sh
    #!/bin/bash
    sleep 30
    echo "Hello World from $(hostname)"
    EOF

    # submit the job
    # this creates an EC2 compute instance on the fly
    sbatch hello-job.sh

    # view the job in the job queue
    squeue

    # once the job has left the queue, its results (STDOUT) are in a `.out` file
    cat slurm-1.out

    # clean up
    # delete the cluster compute nodes (this doesn't delete the cluster head node)
    pcluster delete-cluster-instances --cluster-name hello-world

    # delete the cluster itself
    # this terminates the head node and deletes the cfn stack used to create the cluster
    pcluster delete-cluster --cluster-name hello-world

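
One detail in the job-script step is worth checking: whether the here-document delimiter is quoted decides where `$(hostname)` is evaluated. With an unquoted `EOF`, the shell expands it while writing the file (on the head node). With `'EOF'`, the text is written literally and expands only when the job runs on a compute node. A small local demonstration, using the hypothetical file names `job-expanded.sh` and `job-literal.sh`:

```shell
#!/bin/bash
# Unquoted delimiter: $(hostname) is expanded now, by the shell writing the file.
cat << EOF > job-expanded.sh
#!/bin/bash
echo "Hello World from $(hostname)"
EOF

# Quoted delimiter: the text is written literally and expands when the script runs.
cat << 'EOF' > job-literal.sh
#!/bin/bash
echo "Hello World from $(hostname)"
EOF

# Compare the two generated scripts.
cat job-expanded.sh
cat job-literal.sh
```

For a job that should report the compute node's name, the quoted form is the one to use.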
- AWS ParallelCluster
- AWS services used by AWS ParallelCluster
- AWS Services used in CfnCluster
- Running your first job on AWS ParallelCluster
- Cluster configuration file
- aws-parallelcluster/cli/tests/pcluster/example_configs/ - example cluster configuration files
- AWS ParallelCluster CLI commands
- Slurm Workload Manager - Documentation