- Definition(s)
- Key Questions
- Organization Considerations
- Quality Attributes
- Patterns
- Topics / Concepts / Terms
- Resources
- let business value guide all architecture decisions
- evolutionary architectures are needed to support the rapid rate of business and technology change
Definition(s)
- “Architecture is about the important stuff. Whatever that is”
- the decisions you wish you could get right early in a project
- the shared understanding that the expert developers have of the system design
- how the components are assembled and organized so that the system meets its quality attributes
- significant design decisions that shape a system, where significant is measured by cost of change - Grady Booch
Key Questions
- who are the users
- what devices and form factors will be used
- what is the context of their usage
- scale and growth
- who are the main actors in the system (domain objects - e.g. orders, products, etc.)
- data classifications (e.g. PII)
- data types and sizes (relation records, documents, media files, etc.)
- what is the time frame for delivery
- is there an existing product / SaaS / open-source / etc. that provides the solution or a portion / components of it
- Capacity estimation & Constraints
- Functional Requirements
- Non-Functional Requirements - latency, consistency, availability, throughput, etc.
- Out of scope
- organization and teams structure
See https://medium.com/partha-pratim-sanyal/system-design-doordash-a-prepared-food-delivery-service-bf44093388e2 for a good reference example.
Organization Considerations
- engineering (application & platform)
- operations (application & platform)
Quality Attributes
- reliability - ability to continue to operate under predefined conditions
- availability - ratio of the available system time to the total working time
- scalability - ability of the system to handle load increases without decreasing performance
- efficiency
- performance
- security
- cost
- interoperability
- correctness
- maintainability
- readability
- extensibility
- testability
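The availability ratio defined above can be made concrete with a little arithmetic. This is a minimal sketch: the function names and the sample figures (a 99.9% target over a non-leap year) are illustrative assumptions, not values from these notes.

```python
def availability(uptime_hours: float, total_hours: float) -> float:
    """Return availability as the ratio of available time to total time."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return uptime_hours / total_hours


def allowed_downtime_hours(target: float, total_hours: float) -> float:
    """Return how much downtime a target availability permits in the period."""
    return (1.0 - target) * total_hours


HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)

# Hypothetical numbers: 8.76 hours of downtime over one year.
ratio = availability(uptime_hours=HOURS_PER_YEAR - 8.76, total_hours=HOURS_PER_YEAR)
budget = allowed_downtime_hours(target=0.999, total_hours=HOURS_PER_YEAR)
```

Under these assumptions, a 99.9% target leaves a budget of roughly 8.76 hours of downtime per year.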
Patterns
event-sourcing
Capture all changes to an application state as a sequence of events.
Core Design Decisions
- Domain Entities and Events
- a popular way to identify them is Event Storming
- Event Content
- each event stores delta state (only what changed)
- each event stores full state
- with full state, duplicate events are easy to handle idempotently, because reapplying the same event yields the same state
- Total Ordering (an ordered stream of events, i.e. a ledger)
- ensure all events are processed in order; this is needed to preserve causal relationships
- e.g. ordering matters for two messages related to the same entity
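The design decisions above can be sketched in a few lines. This is a minimal in-memory illustration, not a production event store: `Event`, `EventStore`, and all field names are hypothetical, events carry delta state, duplicates are detected by event id, and ordering is enforced per entity (a weaker guarantee than a single totally ordered ledger).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    event_id: str   # unique id, used to detect duplicate deliveries
    entity_id: str  # the domain entity this event belongs to, e.g. an order
    sequence: int   # position in that entity's stream: 1, 2, 3, ...
    delta: dict     # delta state: only the fields that changed


@dataclass
class EventStore:
    streams: dict = field(default_factory=dict)  # entity_id -> list of events
    seen_ids: set = field(default_factory=set)   # event ids already appended

    def append(self, event: Event) -> bool:
        """Append an event; return False if it is a duplicate delivery."""
        if event.event_id in self.seen_ids:
            return False  # idempotent: redelivery leaves the stream unchanged
        stream = self.streams.setdefault(event.entity_id, [])
        expected = len(stream) + 1
        if event.sequence != expected:
            raise ValueError(
                f"out-of-order event: expected sequence {expected}, "
                f"got {event.sequence}"
            )
        stream.append(event)
        self.seen_ids.add(event.event_id)
        return True

    def current_state(self, entity_id: str) -> dict:
        """Rebuild the entity's state by replaying its events in order."""
        state = {}
        for event in self.streams.get(entity_id, []):
            state.update(event.delta)
        return state


store = EventStore()
store.append(Event("e1", "order-1", 1, {"status": "created", "total": 30}))
store.append(Event("e2", "order-1", 2, {"status": "paid"}))
duplicate_accepted = store.append(Event("e2", "order-1", 2, {"status": "paid"}))
state = store.current_state("order-1")
```

The state is never stored directly; it is derived by replaying the stream, which is why ordering within an entity matters.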
Resources
- Scaling Event Sourcing for Netflix Downloads, Episode 1
- Scaling Event Sourcing for Netflix Downloads, Episode 2
- InfoQ | Scaling Event Sourcing for Netflix Downloads | Video + Presentation - shows in detail how they implemented event sourcing backed by Cassandra
- martinfowler.com | Event Sourcing
- Pattern: Event sourcing
- EventBridge Storming — How to build state-of-the-art Event-Driven Serverless Architectures - approach to defining the Events, Boundaries and Entities in your business domain
- Decomposing the Monolith with Event Storming
Hexagonal
Allow an application to equally be driven by users, programs, automated test or batch scripts, and to be developed and tested in isolation from its eventual run-time devices and databases.
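One way to picture this is with a port (an interface the application core owns) and interchangeable adapters. The sketch below is an illustration under assumptions: `OrderRepository`, `InMemoryOrderRepository`, and `OrderService` are hypothetical names, and a real database adapter would implement the same port.

```python
from typing import Protocol


class OrderRepository(Protocol):
    """Port: what the application core needs from storage."""

    def save(self, order_id: str, total: float) -> None: ...

    def get_total(self, order_id: str) -> float: ...


class InMemoryOrderRepository:
    """Adapter for tests; a database adapter would offer the same methods."""

    def __init__(self) -> None:
        self._orders = {}

    def save(self, order_id: str, total: float) -> None:
        self._orders[order_id] = total

    def get_total(self, order_id: str) -> float:
        return self._orders[order_id]


class OrderService:
    """Application core: depends only on the port, never on a concrete database."""

    def __init__(self, repository: OrderRepository) -> None:
        self._repository = repository

    def place_order(self, order_id: str, prices: list) -> float:
        total = sum(prices)
        self._repository.save(order_id, total)
        return total


# The core is exercised in isolation, with no real database involved.
repository = InMemoryOrderRepository()
service = OrderService(repository)
total = service.place_order("order-1", [10.0, 20.5])
```

Because `OrderService` only sees the port, the same core can be driven by a test, a batch script, or a web handler without modification.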
Resources
Topics / Concepts / Terms
Database
CAP theorem: when a network partition occurs, a distributed data store must choose between consistency and availability.
- Consistency: Every read receives the most recent write or an error
- Availability: Every request receives a (non-error) response, without the guarantee that it contains the most recent write
- Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes
Shuffle Sharding
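Isolates tenants in a multi-tenant system so that one tenant's load or failures do not degrade other tenants. It is a method of assigning each tenant to its own combination (shard) of the shared resources, so that two tenants rarely share all of them.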
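A minimal sketch of the assignment step, assuming each tenant gets a fixed-size shard chosen pseudo-randomly from a pool of workers. The function name `shuffle_shard`, the worker names, and the pool and shard sizes are all illustrative assumptions.

```python
import hashlib
import random


def shuffle_shard(tenant_id: str, workers: list, shard_size: int) -> list:
    """Deterministically pick `shard_size` workers for a tenant."""
    if shard_size > len(workers):
        raise ValueError("shard_size cannot exceed the number of workers")
    # Seed a private RNG from a stable hash of the tenant id, so the same
    # tenant always maps to the same shard across processes and restarts.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return sorted(rng.sample(workers, shard_size))


workers = [f"worker-{i}" for i in range(8)]
shard_a = shuffle_shard("tenant-a", workers, 2)
shard_b = shuffle_shard("tenant-b", workers, 2)
```

With 8 workers and a shard size of 2 there are 28 possible shards, so a misbehaving tenant fully overlaps another tenant only when both draw the same pair.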
Resources
- Workload isolation using shuffle-sharding
- AWS Well-Architected Labs | Fault isolation with shuffle sharding
Constant Work
Overprovision resources to the point where the system operates correctly even if an availability zone (AZ) becomes unavailable.
If an AZ becomes unavailable, no new resources need to be provisioned, only a quick re-routing of traffic. You are essentially always operating the infrastructure in failure mode (active-active).
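The sizing behind this idea is simple arithmetic: if one zone can disappear, the remaining zones must already hold enough capacity for peak load. The sketch below assumes load is spread evenly across zones and that only one zone fails at a time; the function name and the numbers are illustrative.

```python
import math


def instances_per_zone(peak_instances: int, zones: int) -> int:
    """Instances to run in each zone so peak load still fits after losing one zone."""
    if zones < 2:
        raise ValueError("need at least two zones to survive the loss of one")
    # The surviving (zones - 1) zones must carry the whole peak on their own.
    return math.ceil(peak_instances / (zones - 1))


# Hypothetical workload: 12 instances are needed at peak, spread over 3 zones.
zones = 3
peak = 12
per_zone = instances_per_zone(peak_instances=peak, zones=zones)
total_provisioned = per_zone * zones
overprovision_ratio = total_provisioned / peak
```

In this example each zone runs 6 instances, 18 in total, which is 1.5 times the peak requirement. That extra cost is the price of not having to provision anything during a failure.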
Resources
Canary
A canary release is a technique to reduce the risk from deploying a new version of software into production. A new version of software, referred to as the canary, is deployed to a small subset of users alongside the stable running version. Traffic is split between these two versions such that a portion of incoming requests are diverted to the canary. This approach can quickly uncover any problems with the new version without impacting the majority of users.
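The traffic split can be sketched as a deterministic routing function. This is an illustration under assumptions: routing is keyed on a hypothetical `user_id` so that each user consistently sees the same version, and the 5% share is an arbitrary example value.

```python
import hashlib


def route(user_id: str, canary_percent: int) -> str:
    """Return "canary" or "stable" for a user, given the canary traffic share."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # Hash the user id into a stable bucket in the range 0..99.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "canary" if bucket < canary_percent else "stable"


users = [f"user-{i}" for i in range(1000)]
canary_users = [u for u in users if route(u, 5) == "canary"]
```

Because the bucket is fixed per user, raising `canary_percent` only adds users to the canary; nobody already on the new version is moved back to the stable one.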
Resources
Resources
Books (oreilly.com)
- The Software Architect Elevator
- Fundamentals of Software Architecture
- Clean Architecture: A Craftsman’s Guide to Software Structure and Design, First Edition
- Software Architecture Patterns
- Building Evolutionary Architectures
- Domain-Driven Design: Tackling Complexity in the Heart of Software
- Microservices Patterns
- Patterns of Enterprise Application Architecture
- Refactoring: Improving the Design of Existing Code
- Design Patterns: Elements of Reusable Object-Oriented Software
- Designing Distributed Systems
- Designing Distributed Control Systems: A Pattern Language Approach (Wiley Software Patterns Series)
- Distributed Tracing in Practice
- Making Sense of Stream Processing
- I Heart Logs
- Streaming Systems
Organization Architecture
Websites
- The System Design Primer - great real life architecture and design examples
- martinfowler.com
- AWS Architecture Center
- AWS Architecture Blog
- Amazon Builders’ Library
- Azure Architecture Center
- medium | software architecture
- C4 model for visualizing software architecture