AWS Glue

Learn and experiment with AWS Glue.

Running a “Python Shell” Job

Running/Testing the Script Locally

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python scripts/

Running/Submitting a Python Shell Job

cp .env.sample .env
# modify .env for your environment

# following submits the job (scripts/
node src/job-runner.js run-python-shell-script scripts/
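
For readers who want to see roughly what submitting a Python Shell job involves at the API level, here is a minimal sketch using boto3. It is not the implementation of src/job-runner.js, which may work differently. The function names `build_job_definition` and `submit_job` are hypothetical, and the job name, role ARN, and S3 script location are values you would supply yourself. It assumes the script has already been uploaded to S3 and that AWS credentials are configured.

```python
"""Sketch: create and start a Glue Python Shell job with boto3 (names are hypothetical)."""


def build_job_definition(job_name, role_arn, script_s3_uri):
    """Return keyword arguments for glue.create_job describing a Python Shell job."""
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "pythonshell",            # selects the Python Shell job type
            "ScriptLocation": script_s3_uri,  # S3 URI of the already-uploaded script
            "PythonVersion": "3",
        },
        "MaxCapacity": 0.0625,  # Python Shell jobs accept 0.0625 or 1 DPU
    }


def submit_job(job_definition):
    """Create the job, start one run, and return the run's JobRunId.

    Requires boto3 and valid AWS credentials. create_job raises an error
    if a job with the same name already exists.
    """
    import boto3  # imported lazily so build_job_definition works without boto3

    glue = boto3.client("glue")
    glue.create_job(**job_definition)
    run = glue.start_job_run(JobName=job_definition["Name"])
    return run["JobRunId"]
```

Calling `submit_job(build_job_definition(...))` with your own job name, IAM role ARN, and script location would create the job and start a single run. Keeping the definition in a separate function makes it easy to inspect or log the request before anything is sent to AWS.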

Steps to Run scripts/ in a SageMaker Notebook

See scripts/

  1. Upload data.csv to S3.
  2. Create a Glue crawler for data.csv. Running the crawler creates a table in the Glue database.

    You can verify the result by previewing the data in Athena.

  3. Create an AWS Glue Dev Endpoint.

    There is no need to specify an SSH key.

  4. Create a SageMaker notebook.

    A SageMaker notebook works just like a Zeppelin notebook, but with fewer setup steps.

  5. Open the SageMaker notebook and paste in the code from scripts/
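
Steps 1 and 2 above can also be scripted instead of done in the console. The following is a rough sketch using boto3, not part of this repository. The function names `build_crawler_definition` and `upload_and_crawl` are hypothetical, and the bucket, key, database, crawler name, and role ARN are placeholders you would replace. It assumes the IAM role can read the S3 path and write to the Glue Data Catalog.

```python
"""Sketch: upload data.csv and crawl it with boto3 (names are hypothetical)."""


def build_crawler_definition(crawler_name, role_arn, database_name, s3_path):
    """Return keyword arguments for glue.create_crawler targeting one S3 path."""
    return {
        "Name": crawler_name,
        "Role": role_arn,
        "DatabaseName": database_name,  # Glue database that will hold the table
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def upload_and_crawl(local_csv, bucket, key, crawler_definition):
    """Upload the CSV to S3 (step 1), then create and start the crawler (step 2).

    Requires boto3 and valid AWS credentials. The crawler runs asynchronously,
    so the table appears only after the crawl finishes.
    """
    import boto3  # imported lazily so build_crawler_definition works without boto3

    boto3.client("s3").upload_file(local_csv, bucket, key)

    glue = boto3.client("glue")
    glue.create_crawler(**crawler_definition)
    glue.start_crawler(Name=crawler_definition["Name"])
```

The crawler should point at the S3 folder (prefix) that contains data.csv. Once the crawl completes, the resulting table can be previewed in Athena as described in step 2.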