A solutions architect describes a design as using loose coupling. What does loose coupling mean in a cloud architecture?
Loose coupling is one of the most heavily tested design principles in the AWS Well-Architected Framework (it underpins the Reliability pillar). In a loosely coupled architecture:
• Services communicate through APIs, message queues (SQS), or event buses (EventBridge), not directly
• A failure in Service A doesn't automatically fail Service B
• Services can be scaled, deployed, or updated independently
• Contrast with tight coupling: services call each other synchronously, share databases, or rely on shared in-process state
Loose coupling patterns on exams:
• Web tier → SQS queue → processing tier (classic decoupling)
• API Gateway → Lambda (each Lambda is independent)
• SNS fan-out → multiple SQS queues (one publisher, many consumers)
• EventBridge rules routing domain events to different consumers
Anti-patterns the exam tests against:
• Tight coupling: Service A calls Service B's private database directly
• Synchronous chains: A calls B, which calls C, which calls D — one failure fails all
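The "web tier → queue → processing tier" pattern can be sketched in miniature. This is an illustration only: an in-process `queue.Queue` stands in for SQS, and both tiers run in one script; the function and message names are made up for the sketch.

```python
import queue

# Stand-in for an SQS queue. The only thing the two tiers share is this
# queue's message contract -- never each other's internals.
work_queue: queue.Queue = queue.Queue()

def web_tier_submit(order_id: str) -> None:
    """Producer: enqueue work and return immediately.
    It neither knows nor cares whether the processing tier is up."""
    work_queue.put({"order_id": order_id})

def processing_tier_poll() -> list:
    """Consumer: drain whatever messages are currently available."""
    processed = []
    while not work_queue.empty():
        msg = work_queue.get()
        processed.append(msg["order_id"])
    return processed

web_tier_submit("order-1")
web_tier_submit("order-2")
print(processing_tier_poll())  # ['order-1', 'order-2']
```

If the consumer is down, messages simply wait in the queue — the producer keeps working, which is exactly the failure isolation the card describes.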
2 / 5
A question asks which architecture is stateless. An architect says: "Our API servers are stateless." What does this mean — and why does it matter for cloud scalability?
Stateless servers are a foundational cloud architecture concept. A stateless server:
• Does not store user session data or application state in local memory or on local disk
• Every request is self-contained — any server in the fleet can handle any request
• Session data lives in ElastiCache (Redis/Memcached) or DynamoDB
• File uploads are stored in S3, not on local disk
Why it matters for scalability: If your servers store session state locally, users are "stuck" to the same server (sticky sessions). You can't freely add or remove instances, because terminating an instance destroys the state it holds.
With stateless servers + an external state store:
→ Auto Scaling can add/remove instances freely
→ All instances are identical and interchangeable
→ No sticky sessions required at the load balancer
Exam signals:
• "Stateless" → use an external cache/database for state → enables horizontal scaling
• "Sticky sessions" → the app is currently stateful → in an architecture-improvement question, refactor to stateless + ElastiCache
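A minimal sketch of stateless request handling, assuming a plain dict as a stand-in for ElastiCache/DynamoDB (server and session names are invented for the example):

```python
# External session store: a dict stands in for Redis/DynamoDB.
# Because state lives here and not in any server's memory,
# every server instance is interchangeable.
session_store: dict = {}

def handle_request(server_id: str, session_id: str) -> dict:
    """Any server can handle any session: state is fetched from the
    external store on each request, never kept locally between requests."""
    session = session_store.setdefault(session_id, {"hits": 0})
    session["hits"] += 1
    return {"served_by": server_id, "hits": session["hits"]}

# The same session bounces between servers with no sticky routing:
handle_request("server-a", "sess-42")
result = handle_request("server-b", "sess-42")
print(result)  # {'served_by': 'server-b', 'hits': 2}
```

Note the design choice: the handler takes everything it needs from the request plus the external store, so Auto Scaling can terminate `server-a` at any time without losing the session.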
3 / 5
A company needs an architecture that continues to function even when components fail. The exam calls this property _____.
Fault tolerance means the system continues operating — possibly at reduced capacity — when one or more components fail, without any intervention or downtime.
Distinguish these related terms:
• Fault tolerant — the system keeps running through failures (zero downtime)
• Highly available — the system recovers quickly from failures (minimal downtime, often through automatic failover)
• Resilient — the system can withstand disruptions and recover (broader concept covering both)
• Elastic — the system automatically scales resources to match demand
Fault tolerance patterns:
• Multi-AZ with synchronous replication → database failover in seconds
• Read replicas for database read traffic
• Multi-region active-active for extreme fault tolerance
• S3 Cross-Region Replication
• Route 53 health checks with failover routing
On the exam:
• "Continue operating even if one component fails" → fault tolerance
• "Recover quickly from failures" → high availability
• "Automatically adjust capacity" → elasticity
• "Design for failure" → AWS principle: assume components will fail and architect accordingly
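The last pattern above (health checks with failover routing) can be sketched as a tiny routing decision in the style of a Route 53 failover record set. Endpoint names and the `route` helper are illustrative, not a real Route 53 API:

```python
def route(health: dict, primary: str, secondary: str) -> str:
    """Return the endpoint traffic should go to: the primary while its
    health check passes, otherwise the secondary (failover) endpoint."""
    return primary if health.get(primary, False) else secondary

health = {"primary.example.com": True, "dr.example.com": True}
print(route(health, "primary.example.com", "dr.example.com"))
# primary.example.com

health["primary.example.com"] = False  # primary fails its health check
print(route(health, "primary.example.com", "dr.example.com"))
# dr.example.com
```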
4 / 5
An AWS exam scenario says: "The application must achieve an RPO of 1 hour and RTO of 4 hours." What does this mean for the disaster recovery design?
RPO and RTO are the two fundamental disaster recovery metrics; they appear in virtually every disaster-recovery question on the major cloud certification exams.
RPO (Recovery Point Objective): the maximum acceptable amount of data loss, measured in time.
"RPO = 1 hour" → after a disaster, you can tolerate losing at most 1 hour of data.
→ Requires backups or replication at least every hour.
RTO (Recovery Time Objective): the maximum acceptable time the system can be down after a disaster.
"RTO = 4 hours" → the system must be back online within 4 hours of the failure.
→ Determines your DR strategy: backup restore (cheaper, longer RTO) vs. warm standby vs. hot standby.
DR strategy tiers (cheapest → most expensive, longest → shortest RTO/RPO):
1. Backup and restore — hours RTO, highest RPO
2. Pilot light — core services always on, scale up on disaster
3. Warm standby — scaled-down copy always running
4. Multi-site active-active — near-zero RTO/RPO, highest cost
Lower RPO/RTO = higher cost. The exam may ask: "Which DR strategy meets an RPO of 15 minutes at the lowest cost?" → warm standby. (Watch for pilot light as a distractor: continuous data replication can also give pilot light a tight RPO, so expect the question to constrain RTO as well.)
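The "cheapest tier that meets the requirement" reasoning can be expressed as a small lookup. The RPO/RTO thresholds below are rough teaching values chosen to match the tier ordering on this card, not official AWS figures:

```python
def cheapest_dr_strategy(rpo_hours: float, rto_hours: float) -> str:
    """Walk the DR tiers from cheapest to most expensive and return the
    first one whose achievable RPO/RTO meet the requirement.

    Thresholds are illustrative approximations, not AWS-published numbers.
    """
    tiers = [
        # (strategy, achievable RPO hours, achievable RTO hours)
        ("backup and restore", 12.0, 24.0),
        ("pilot light", 1.0, 4.0),
        ("warm standby", 0.25, 1.0),
        ("multi-site active-active", 0.0, 0.0),
    ]
    for name, rpo, rto in tiers:
        if rpo <= rpo_hours and rto <= rto_hours:
            return name
    return "multi-site active-active"

print(cheapest_dr_strategy(1, 4))     # pilot light
print(cheapest_dr_strategy(0.25, 1))  # warm standby
```

With these assumed thresholds, the card's scenario (RPO 1 hour, RTO 4 hours) lands on pilot light, while a 15-minute RPO pushes you up to warm standby — the cost/recovery trade-off the exam is probing.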
5 / 5
A senior architect says: "We follow the shared responsibility model." A junior engineer asks: "Who is responsible for patching the operating system on our EC2 instances?"
The AWS Shared Responsibility Model is tested on every AWS certification. The key principle:
AWS is responsible for "security OF the cloud":
• Physical hardware, facilities, power
• Network infrastructure
• Hypervisor
• Managed service software (e.g. RDS database engine patches — AWS applies those)
Customer is responsible for "security IN the cloud":
• EC2 OS patches
• Application code
• Data encryption (at rest and in transit)
• IAM user and role management
• Security Group and NACL configuration
• Network traffic protection
The model shifts based on service type:
• EC2 (IaaS) → customer manages the OS and above
• RDS (PaaS) → AWS manages the OS and DB engine; customer manages data and access
• Lambda (serverless) → AWS manages the runtime; customer manages code and IAM
• S3 (managed storage) → AWS manages infrastructure; customer manages data, access policies, and encryption choices
Exam trick: questions about "who patches the OS" for managed services (RDS, Elastic Beanstalk) → AWS. For EC2 → customer.
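The OS-patching rule above fits in a tiny lookup table — a study aid mirroring this card's categorization, not an official AWS responsibility matrix:

```python
# Who patches the operating system, per service type on this card.
# "AWS" for Elastic Beanstalk assumes managed platform updates are enabled.
OS_PATCHING = {
    "EC2": "customer",
    "RDS": "AWS",
    "Elastic Beanstalk": "AWS",
    "Lambda": "AWS",  # no OS is exposed to the customer at all
}

def who_patches_os(service: str) -> str:
    return OS_PATCHING.get(service, "check that service's responsibility model")

print(who_patches_os("EC2"))  # customer
print(who_patches_os("RDS"))  # AWS
```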