pfeilbr/amazon-managed-workflows-for-apache-airflow-playground (code for this article)
Learn Amazon Managed Workflows for Apache Airflow (MWAA)
Install Steps
- create an S3 bucket with versioning enabled
- create a VPC; this is where the RDS Postgres database for Airflow data lives, and the Airflow web UI runs in a container here
- create the Airflow environment (boto3 sketch below)
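A minimal sketch of the environment-creation step using boto3 rather than the console. Every name, ARN, and ID below is a placeholder; the versioned bucket and the VPC from the previous steps are assumed to already exist:

import boto3

mwaa = boto3.client("mwaa")

mwaa.create_environment(
    Name="my-mwaa-env",                                 # placeholder environment name
    AirflowVersion="1.10.12",
    EnvironmentClass="mw1.small",
    DagS3Path="dags",                                   # prefix inside the source bucket
    SourceBucketArn="arn:aws:s3:::my-versioned-airflow-bucket",
    ExecutionRoleArn="arn:aws:iam::111111111111:role/my-mwaa-execution-role",
    NetworkConfiguration={
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "SubnetIds": [                                  # two private subnets in the VPC
            "subnet-0123456789abcdef0",
            "subnet-0123456789abcdef1",
        ],
    },
    WebserverAccessMode="PUBLIC_ONLY",                  # or PRIVATE_ONLY
)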
 
Notes
- you create named MWAA environments, and each has its own configuration
- the Airflow UI can be exposed publicly or kept private within the VPC
- log in to the Airflow UI with an IAM user
- a web login token can be created via the AWS CLI (boto3 sketch after this list)
- upload DAG .py files to the DagS3Path in S3
- run DAGs via the Airflow CLI, a REST endpoint, or boto3 (trigger sketch below)
- invoke a DAG via Lambda (same pattern as the trigger sketch below)
- install additional dependencies by providing a requirements.txt
- auto scales up and down via the Celery Executor by adding/removing worker containers as needed
- scale via the number of workers and the environment class (see the update_environment call below)
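A sketch of the admin calls mentioned above, using boto3 instead of the raw AWS CLI (the environment name and scaling values are placeholders):

import boto3

mwaa = boto3.client("mwaa")

# short-lived token for signing in to the Airflow web UI
resp = mwaa.create_web_login_token(Name="my-mwaa-env")
print(resp["WebServerHostname"], resp["WebToken"])

# scale by changing the worker ceiling and/or environment class
mwaa.update_environment(
    Name="my-mwaa-env",
    MaxWorkers=10,
    EnvironmentClass="mw1.medium",
)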
 
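To trigger a DAG programmatically (the same code works inside a Lambda handler), exchange a short-lived CLI token for access to the MWAA Airflow CLI endpoint. A sketch, assuming Airflow 1.10.x command syntax and placeholder environment/DAG names:

import base64
import json
import urllib.request

import boto3

def trigger_dag(env_name, dag_id):
    # short-lived token scoped to the MWAA Airflow CLI endpoint
    token = boto3.client("mwaa").create_cli_token(Name=env_name)
    req = urllib.request.Request(
        f"https://{token['WebServerHostname']}/aws_mwaa/cli",
        data=f"trigger_dag {dag_id}".encode(),  # Airflow 2.x: "dags trigger <dag_id>"
        headers={
            "Authorization": f"Bearer {token['CliToken']}",
            "Content-Type": "text/plain",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # stdout/stderr come back base64-encoded
    return base64.b64decode(body["stdout"]).decode()

print(trigger_dag("my-mwaa-env", "my_dag"))  # placeholder names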
Airflow
Plugins
- Hooks: Python modules that enable tasks to access external platforms such as AWS, Azure, and GCP.
- Sensors: Python modules used to create watcher tasks (in the most basic sense); for example, an S3 sensor creates a task that watches for a file in S3. A sensor stays in the running state until a specific condition appears.
- Operators: typically execution engines. Operators create tasks that execute some process, depending on the operator type. For example, PythonOperator can be used to create a task that runs a specific Python method. (A combined sketch follows this list.)
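A small DAG sketching all three concepts together. This assumes Airflow 1.10.x import paths (matching the environment version shown later) and placeholder bucket/key names:

from datetime import datetime

from airflow import DAG
from airflow.hooks.S3_hook import S3Hook
from airflow.operators.python_operator import PythonOperator
from airflow.sensors.s3_key_sensor import S3KeySensor

def read_file():
    # Hook: gives the task access to an external platform (S3 here)
    data = S3Hook().read_key(key="incoming/data.csv", bucket_name="my-bucket")
    print(len(data))

with DAG("hook_sensor_operator_demo", start_date=datetime(2021, 1, 1),
         schedule_interval=None) as dag:
    # Sensor: stays in the running state until the S3 key exists
    wait = S3KeySensor(task_id="wait_for_file",
                       bucket_key="s3://my-bucket/incoming/data.csv")
    # Operator: runs a specific Python callable
    process = PythonOperator(task_id="process_file", python_callable=read_file)
    wait >> process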
 
XCom
- share small bits of data between tasks (sketch below)
- stored in the Airflow Postgres metadata DB
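A minimal XCom sketch in Airflow 1.10.x style (task and DAG names are placeholders): the return value of a PythonOperator callable is pushed to XCom automatically, and a downstream task pulls it:

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def produce():
    return {"row_count": 42}          # return value is auto-pushed to XCom

def consume(**context):
    value = context["ti"].xcom_pull(task_ids="produce")
    print(value["row_count"])

with DAG("xcom_demo", start_date=datetime(2021, 1, 1),
         schedule_interval=None) as dag:
    t1 = PythonOperator(task_id="produce", python_callable=produce)
    t2 = PythonOperator(task_id="consume", python_callable=consume,
                        provide_context=True)  # needed on 1.10.x for **context
    t1 >> t2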
 
Example Environment Attributes
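This appears to be the response from aws mwaa get-environment (equivalently, boto3's get_environment); account IDs are redacted and the values are illustrative.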
{
    "Environment": {
        "AirflowConfigurationOptions": {},
        "AirflowVersion": "1.10.12",
        "Arn": "arn:aws:airflow:eu-west-1:xxxxxxxxxxxx:environment/airflow-blogpost-dublin",
        "CreatedAt": 1610632127.0,
        "DagS3Path": "dags",
        "EnvironmentClass": "mw1.medium",
        "ExecutionRoleArn": "arn:aws:iam:: xxxxxxxxxxxx:role/airflow-demo-mwaa-eks-iamrole",
        "LastUpdate": {
            "CreatedAt": 1611137820.0,
            "Status": "SUCCESS"
        },
        "LoggingConfiguration": {
            "DagProcessingLogs": {
                "CloudWatchLogGroupArn": "arn:aws:logs:: xxxxxxxxxxxx:log-group:airflow-ricsue-dublin-DAGProcessing",
                "Enabled": true,
                "LogLevel": "INFO"
            },
            "SchedulerLogs": {
                "CloudWatchLogGroupArn": "arn:aws:logs:: xxxxxxxxxxxx:log-group:airflow-ricsue-dublin-Scheduler",
                "Enabled": true,
                "LogLevel": "INFO"
            },
            "TaskLogs": {
                "CloudWatchLogGroupArn": "arn:aws:logs:: xxxxxxxxxxxx:log-group:airflow-ricsue-dublin-Task",
                "Enabled": true,
                "LogLevel": "INFO"
            },
            "WebserverLogs": {
                "CloudWatchLogGroupArn": "arn:aws:logs:: xxxxxxxxxxxx:log-group:airflow-ricsue-dublin-WebServer",
                "Enabled": true,
                "LogLevel": "INFO"
            },
            "WorkerLogs": {
                "CloudWatchLogGroupArn": "arn:aws:logs:: xxxxxxxxxxxx:log-group:airflow-ricsue-dublin-Worker",
                "Enabled": true,
                "LogLevel": "INFO"
            }
        },
        "MaxWorkers": 5,
        "Name": "ricsue-dublin",
        "NetworkConfiguration": {
            "SecurityGroupIds": [
                "sg-0c88ef4755c295zzz"
            ],
            "SubnetIds": [
                "subnet-0493dffd0282f4xxx",
                "subnet-08f416023356ffyyy"
            ]
        },
        "RequirementsS3Path": "requirements/requirements.txt",
        "ServiceRoleArn": "arn:aws:iam:: xxxxxxxxxxxx:role/aws-service-role/airflow.amazonaws.com/AWSServiceRoleForAmazonMWAA",
        "SourceBucketArn": "arn:aws:s3:::airflow-mybucket",
        "Status": "AVAILABLE",
        "Tags": {},
        "WebserverAccessMode": "PUBLIC_ONLY",
        "WebserverUrl": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.c5.eu-west-1.airflow.amazonaws.com",
        "WeeklyMaintenanceWindowStart": "SUN:14:00"
    }
}
Resources
- Amazon Managed Workflows for Apache Airflow
- Interacting with Amazon Managed Workflows for Apache Airflow via the command line
- AWS CLI Command Reference | mwaa
- Airflow — Custom Plugins
- Airflow XCOM : The Ultimate Guide
 