Serverless Architecture: Key Service Considerations

A serverless architecture is typically composed of many services. The following covers the key considerations and configuration options for the AWS services most commonly used in serverless architectures.

Relevant Patterns

Common cloud-native patterns to consider for serverless architectures at scale:

  • event sourcing
  • circuit breaker - trip the circuit to prevent overloading downstream systems
  • load shedding - drop or deprioritize work to prevent backlog buildup
  • handle poison messages - a record that always fails keeps Kinesis and DynamoDB streams from progressing
  • avoid distributed transactions. e.g. a Lambda that sends a job to SQS and also stores its status in DynamoDB makes two writes that can partially fail. break it up: Lambda puts job status in DynamoDB -> DynamoDB stream -> Lambda sends job to SQS
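
As an illustration of the circuit breaker pattern listed above, here is a minimal Python sketch. The class name, thresholds, and injectable clock are assumptions made for this example, not part of any AWS SDK. The state lives in memory, so inside Lambda each execution environment would track failures on its own; sharing the state across instances would need an external store.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_seconds:
                # Still cooling down: fail fast instead of hitting the downstream system.
                raise CircuitOpenError("circuit open, call rejected")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each downstream request in `breaker.call(...)` and treat `CircuitOpenError` as a signal to shed the work or return a fallback.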


Lambda

  • synchronous vs asynchronous vs poll-based invocation (poll-based is sync) - impacts automatic retries, messages stuck behind a poison message, etc.

  • if Lambda is strictly a glue passthrough that lets API Gateway call a backend AWS service, consider API Gateway service proxies to remove the Lambda. simpler, cheaper, etc.

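  • memory - CPU is allocated in proportion to the configured memory

  • DLQ - dead-letter queue for events that fail asynchronous invocation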

  • lambda destinations (only for async invokes)

  • reserved concurrency - concurrency set aside for a specific function, which also caps that function at the reserved value. e.g. I always want fn X to be able to run 10 invocations concurrently

  • provisioned concurrency - pre-warmed Lambda instances / no cold starts. good for latency-sensitive needs

    • can optionally use auto scaling to adjust it based on metrics and/or a schedule
    • traffic above the provisioned level spills over to on-demand scaling (the Lambda default)
    • provisioned concurrency counts against your regional concurrency limit
  • concurrent executions (throttles) - default limit of 1000 per account per region (can be raised)

  • timeout - max 15min

    • set code timeouts based on remaining invocation time provided in context
  • burst concurrency - initial burst of 500 - 3000 instances, depending on region

  • after the initial burst - scales by 500 new instances / min

  • poll based options (kinesis, dynamodb, SQS)

    • on-failure destination (SNS or SQS)
    • retry attempts
    • max age of record - use to implement load shedding (prioritize newer messages)
    • split batch on error
    • concurrent batches per shard
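
The timeout bullet above suggests deriving code timeouts from the remaining invocation time. A minimal sketch of that idea follows. `get_remaining_time_in_millis()` is the method the Lambda Python runtime exposes on the context object; the helper name, the 500 ms safety margin, the 10 s cap, and `FakeContext` are assumptions made so the example runs locally.

```python
def downstream_timeout_seconds(context, safety_margin_ms=500, max_timeout_s=10.0):
    """Derive a timeout for a downstream call from the time left in this invocation."""
    remaining_ms = context.get_remaining_time_in_millis()
    # Keep a margin so the function can still log and return before Lambda kills it.
    budget_s = (remaining_ms - safety_margin_ms) / 1000.0
    if budget_s <= 0:
        raise TimeoutError("not enough invocation time left to start the call")
    return min(budget_s, max_timeout_s)


def handler(event, context):
    timeout_s = downstream_timeout_seconds(context)
    # Pass timeout_s to whatever client makes the downstream call, for example
    # the timeout argument of an HTTP client. The call itself is omitted here.
    return {"timeout_seconds": timeout_s}


class FakeContext:
    """Stand-in for the Lambda context object, for running the sketch locally."""

    def __init__(self, remaining_ms):
        self._remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self):
        return self._remaining_ms


if __name__ == "__main__":
    print(handler({}, FakeContext(3000)))  # 3000 ms left minus the margin gives 2.5 s
```

Failing fast when the budget is exhausted lets the function return a controlled error instead of being cut off mid-call by the Lambda timeout.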


SNS

  • fan out to address scale
  • KMS to encrypt payloads


SQS

  • batch size - by default a batch fails as a unit
  • visibility timeout - set to at least 6x the Lambda timeout
  • message retention period
  • delivery delay - max 15min
  • types - standard vs FIFO
    • standard - at-least-once delivery, so consumers need to be idempotent
  • alarm on queue depth
  • KMS to encrypt payloads


Kinesis

  • partition key - choose wisely, as order is guaranteed only per shard and the partition key determines the shard a record lands on
  • poison messages (retry until success - can cause backlog)
  • KMS to encrypt payloads
  • enhanced fan-out via AWS::Kinesis::StreamConsumer. each consumer gets 2 MiB per second of read throughput for every shard you subscribe to. can register a max of 20 consumers per stream.
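
The partition key bullet above can be illustrated with a small sketch. Kinesis hashes the partition key with MD5 into a 128-bit hash key and routes the record to the shard whose hash key range contains it. The function below assumes the shards split the hash space evenly, which may not hold after resharding; the function name and sample keys are made up for the example.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


if __name__ == "__main__":
    # Many distinct keys spread across shards; a single hot key always
    # lands on the same shard, which preserves order but limits throughput.
    keys = [f"order-{i}" for i in range(1000)]
    print(Counter(shard_for_key(k, 4) for k in keys))
    print({shard_for_key("tenant-42", 4) for _ in range(10)})
```

The same key always maps to the same shard, which is what gives per-key ordering, and also why a low-cardinality key concentrates load on a few shards.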


EventBridge

  • put events - 2400 requests per second per region (default quota, varies by region)
  • invocation quota - 4500 requests per second per region (default quota, varies by region; an invocation is an event matching a rule and being sent on to the rule’s targets)


DynamoDB

  • global tables - for resilient active-active, multi-region architectures
  • throttles
  • streams - 24hr data retention. poison messages (retry until success - can cause backlog)
  • partition key - pick a high-cardinality key that spreads data and traffic across partitions to minimize hot partitions
  • TTL - can the data be removed automatically?
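
As a sketch of the TTL bullet above: DynamoDB TTL expects a Number attribute holding the expiry as Unix epoch time in seconds. The attribute name `expires_at` and both helpers are assumptions for illustration. Expired items are removed by a background process rather than at the exact expiry time, so readers may still want to filter them out.

```python
import time


def with_ttl(item, ttl_seconds, now=None, ttl_attribute="expires_at"):
    """Return a copy of item with a TTL attribute set to an epoch-seconds expiry."""
    now = time.time() if now is None else now
    expiring = dict(item)
    expiring[ttl_attribute] = int(now) + int(ttl_seconds)
    return expiring


def is_expired(item, now=None, ttl_attribute="expires_at"):
    """True if the item carries a TTL attribute that is already in the past."""
    now = time.time() if now is None else now
    return ttl_attribute in item and item[ttl_attribute] <= now


if __name__ == "__main__":
    job = with_ttl({"pk": "job#1", "status": "done"}, ttl_seconds=3600)
    print(job, is_expired(job))
```

The resulting dictionary would then be written with whichever DynamoDB client the application uses, with TTL enabled on the table for the chosen attribute.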

Step Functions

API Gateway

  • REST API vs HTTP API (HTTP API is cheaper but has fewer features)
  • caching - fixed hourly cost based on cache size, not pay per use (REST API only)
  • throttles
  • timeout - 29s
  • auth - Cognito, JWT, IAM (AWS SigV4), custom Lambda authorizer
  • OpenAPI specs for payload validation
  • service proxies - no need for Lambda glue in the middle
  • custom domains
  • websockets


CloudFront

  • origin access identity (OAI) to force traffic through CloudFront and remove direct access to the S3 bucket URL
  • signed URLs or cookies
  • lambda@edge - headers only requests, rewrite URLs, server-side rendering (SSR), auth, etc.
  • cache invalidations
  • non-GET HTTP methods - support for PUT, POST, PATCH, etc. must be explicitly enabled
  • WAF in front
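
The URL rewrite use of Lambda@Edge mentioned above could look like the following sketch. It reads the request from the CloudFront event (`event["Records"][0]["cf"]["request"]`) and returns the modified request. The rewrite rule itself (mapping directory-style paths to `index.html`) is an assumption chosen for the example, and the function is assumed to be attached to a request trigger.

```python
def handler(event, context=None):
    """Request handler that maps directory-style URLs to index.html."""
    request = event["Records"][0]["cf"]["request"]
    uri = request["uri"]
    if uri.endswith("/"):
        request["uri"] = uri + "index.html"
    elif "." not in uri.rsplit("/", 1)[-1]:
        # No file extension in the last path segment: treat it as a directory.
        request["uri"] = uri + "/index.html"
    # Returning the request tells CloudFront to continue with the rewritten URI.
    return request


if __name__ == "__main__":
    sample = {"Records": [{"cf": {"request": {"uri": "/docs"}}}]}
    print(handler(sample)["uri"])
```

Paths that already name a file, such as `/app.js`, pass through unchanged.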


Global Accelerator

Uses the AWS global network to optimize the path from your users to your applications, which AWS says can improve traffic performance by as much as 60%.


WAF

  • can put in front of API Gateway or CloudFront
  • API Gateway provides functionality that overlaps with WAF. need to determine the appropriate service to use.