Cloud Native Software Architecture

A personal cheatsheet/reference for cloud native software architecture.


Software architecture: how the components are assembled and organized so that the system meets its quality attributes.

Key Questions

  • who are the users?
  • what devices and form factors will be used?
  • what is the context of their usage?
  • scale and growth?
  • who are the main actors in the system (domain objects - e.g. orders, products, etc.)?
  • data classifications (PII)?
  • data types and sizes (relation records, documents, media files, etc.)?
  • what is the time frame for delivery?
  • is there an existing product / SaaS / open-source project / etc. that provides the solution, or some of its components?
  • capacity estimation & constraints?
  • functional requirements?
  • non-functional requirements (latency, consistency, availability, throughput, etc.)?
  • what is explicitly out of scope?
  • organization and team structure?

see System Design: DoorDash — a prepared food delivery service for good reference

Organization Considerations

  • engineering (application & platform)
  • operations (application & platform)

Quality Attributes (*ities and friends)

  • reliability - ability to continue to operate under predefined conditions
  • availability - ratio of the available system time to the total working time
  • scalability - ability of the system to handle load increases without decreasing performance
  • efficiency
  • performance
  • security
  • cost
  • interoperability
  • correctness
  • maintainability
  • readability
  • extensibility
  • testability
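
The availability ratio defined above can be turned into a downtime budget. A minimal sketch; the function names and the 30-day default period are assumptions for illustration, not part of any standard:

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Ratio of available system time to total working time."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total time must be positive")
    return uptime_hours / total


def allowed_downtime_minutes(target: float, period_days: float = 30) -> float:
    """Downtime budget in minutes for a target availability (0..1) over a period."""
    if not 0 <= target <= 1:
        raise ValueError("target must be between 0 and 1")
    return (1 - target) * period_days * 24 * 60
```

For example, a 99.9% target over 30 days leaves roughly 43 minutes of allowed downtime.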


modern cloud native architecture patterns as of July 2020


Event sourcing: capture all changes to an application state as a sequence of events.

Core Design Decisions

  • Domain Entities and Events
  • Event Content
    • each event stores delta state
    • each event stores full state
      • idempotency is easy to achieve: reapplying a duplicate event yields the same state
  • Total Ordering (ordered stream of events - ledger)
    • ensure all events are processed in order; this is needed to preserve causal relationships
    • e.g. ordering matters for two messages related to the same entity
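
The design decisions above can be sketched in a few lines. This is an illustration only: the `Event` and `Projection` names, the per-entity sequence number, and the "delta"/"full" tags are all assumptions, and ordering is enforced per entity rather than as a single global ledger:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    entity_id: str
    seq: int    # per-entity sequence number, starting at 1 (assumed scheme)
    kind: str   # "delta" (changed fields only) or "full" (complete state)
    data: dict


@dataclass
class Projection:
    state: dict = field(default_factory=dict)     # entity_id -> current state
    last_seq: dict = field(default_factory=dict)  # entity_id -> last applied seq

    def apply(self, event: Event) -> bool:
        """Apply one event; return False if it was skipped as a duplicate."""
        last = self.last_seq.get(event.entity_id, 0)
        if event.seq <= last:
            return False  # already applied: skipping makes redelivery idempotent
        if event.seq != last + 1:
            raise ValueError(
                f"out-of-order event for {event.entity_id}: "
                f"expected seq {last + 1}, got {event.seq}"
            )
        if event.kind == "full":
            new_state = dict(event.data)
        elif event.kind == "delta":
            new_state = {**self.state.get(event.entity_id, {}), **event.data}
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
        self.state[event.entity_id] = new_state
        self.last_seq[event.entity_id] = event.seq
        return True


def replay(events) -> Projection:
    """Rebuild current state by folding over the event log in order."""
    projection = Projection()
    for event in events:
        projection.apply(event)
    return projection
```

Replaying a log that contains a redelivered event gives the same state as replaying it without the duplicate, while a gap in the sequence is rejected instead of silently producing a wrong state.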



Hexagonal architecture (ports and adapters): decouples core domain logic from specific storage, databases, protocols, etc.
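
A minimal sketch of the idea, assuming a hypothetical order domain: `OrderRepository` is the port, `InMemoryOrderRepository` is one adapter, and `OrderService` is domain logic that only knows about the port. None of these names come from a real library.

```python
from typing import Dict, List, Optional, Protocol


class OrderRepository(Protocol):
    """Port: the storage operations the domain needs, and nothing more."""

    def save(self, order_id: str, total: float) -> None: ...

    def get_total(self, order_id: str) -> Optional[float]: ...


class InMemoryOrderRepository:
    """Adapter: one concrete implementation of the port (handy for tests)."""

    def __init__(self) -> None:
        self._rows: Dict[str, float] = {}

    def save(self, order_id: str, total: float) -> None:
        self._rows[order_id] = total

    def get_total(self, order_id: str) -> Optional[float]:
        return self._rows.get(order_id)


class OrderService:
    """Core domain logic: depends on the port, never on a specific database."""

    def __init__(self, repo: OrderRepository) -> None:
        self._repo = repo

    def place_order(self, order_id: str, prices: List[float]) -> float:
        if not prices:
            raise ValueError("an order needs at least one item")
        total = sum(prices)
        self._repo.save(order_id, total)
        return total
```

Swapping storage then means writing another adapter with the same two methods; `OrderService` does not change.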


Topics / Concepts / Terms


Shuffle Sharding

A method of assigning tenants to resources in a multi-tenant system. Each tenant gets its own subset (shard) of the resources, which isolates tenants so that one tenant's problems don't negatively impact the others.
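
One way to sketch the assignment step, under the assumption that each tenant is mapped to a small pseudo-random subset of workers derived from a hash of its ID. The function name, worker names, and shard size are hypothetical:

```python
import hashlib
import random
from typing import List


def shuffle_shard(tenant_id: str, workers: List[str], shard_size: int) -> List[str]:
    """Pick a stable pseudo-random subset of workers for one tenant."""
    if not 0 < shard_size <= len(workers):
        raise ValueError("shard_size must be between 1 and len(workers)")
    # Seed a private RNG from a stable hash so the same tenant always gets
    # the same shard (for a fixed worker list and Python version).
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return sorted(rng.sample(workers, shard_size))
```

With 8 workers and a shard size of 2 there are 28 possible shards, so two tenants rarely share both workers; a misbehaving tenant can then take down its own shard while most other tenants keep at least one healthy worker.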


Constant Work

  • overprovision resources to the point where the system would still operate correctly even if an availability zone (AZ) became unavailable
  • if an AZ becomes unavailable, no new resources need to be provisioned, just a quick re-routing of traffic. you are essentially always operating the infrastructure in failure mode (active-active)
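
The overprovisioning rule above comes down to simple arithmetic: size each AZ so that the surviving AZs can carry peak load on their own. A sketch, where `instances_per_az` and its parameters are names made up for illustration and load is assumed to spread evenly across AZs:

```python
import math


def instances_per_az(peak_instances: int, az_count: int, az_failures_tolerated: int = 1) -> int:
    """Instances to run in each AZ so the surviving AZs still cover peak load."""
    surviving = az_count - az_failures_tolerated
    if surviving <= 0:
        raise ValueError("need at least one surviving AZ")
    return math.ceil(peak_instances / surviving)
```

For example, if peak load needs 12 instances and there are 3 AZs, each AZ runs 6 instances (18 in total), so losing one AZ still leaves 12.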



A canary release is a technique to reduce the risk of deploying a new version of software into production. The new version, referred to as the canary, is deployed alongside the stable running version and exposed to a small subset of users. Traffic is split between the two versions so that a portion of incoming requests is diverted to the canary. This approach can quickly uncover problems with the new version without impacting the majority of users.
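
The traffic split can be sketched as a deterministic, hash-based router. This is an assumption about how one might implement it, not how any particular load balancer or service mesh does; `route` and the bucket scheme are hypothetical:

```python
import hashlib


def route(user_id: str, canary_percent: float) -> str:
    """Return "canary" or "stable" for a user, given the canary share in percent."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # Hash the user ID into one of 10,000 buckets so a given user is always
    # routed to the same version while the percentage is unchanged.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000
    return "canary" if bucket < canary_percent * 100 else "stable"
```

Because the bucket depends only on the user ID, raising the percentage only moves more users onto the canary; users already on it stay there, and setting it to 0 is an immediate rollback.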



Books (