Scaling up (vertical scaling, where you upgrade the memory/CPU of an instance) vs scaling out (horizontal scaling, where you add more instances of the same size/type)
Design distributed, stateless components that can be disposed of or added based on demand
You can create an Amazon CloudWatch alarm that monitors an Amazon EC2 instance and automatically recovers the instance if it becomes impaired due to an underlying hardware failure or a problem that requires AWS involvement to repair. Terminated instances cannot be recovered. A recovered instance is identical to the original instance, including the instance ID, private IP addresses, Elastic IP addresses, and all instance metadata.
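As a rough illustration of the recovery alarm described above, the sketch below only builds the parameters such an alarm would need. The instance ID, region, alarm name, period, and evaluation count are illustrative assumptions, and the function name recovery_alarm_params is hypothetical. The metric StatusCheckFailed_System and the arn:aws:automate:<region>:ec2:recover action are the ones AWS documents for instance recovery.

```python
def recovery_alarm_params(instance_id: str, region: str) -> dict:
    """Build keyword arguments for a CloudWatch alarm that recovers an instance."""
    return {
        "AlarmName": f"recover-{instance_id}",  # illustrative naming scheme
        "Namespace": "AWS/EC2",
        # System status check: fails on underlying hardware/host problems.
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,             # seconds per datapoint (assumed value)
        "EvaluationPeriods": 2,   # consecutive failing periods (assumed value)
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in action that asks EC2 to recover the impaired instance.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


# Hypothetical instance ID and region, for illustration only.
params = recovery_alarm_params("i-0123456789abcdef0", "us-east-1")
print(params["AlarmActions"][0])
```

With boto3 installed and credentials configured, these parameters could be passed as boto3.client("cloudwatch").put_metric_alarm(**params).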
Infrastructure as code
CloudFormation templates can be used to automate the creation and provisioning of your AWS resources in an orderly and predictable fashion, and to repeat the deployment as many times as needed
Use Auto Scaling to scale out and back
Use CloudWatch alarms/events to send SNS notifications when a particular metric goes beyond a specified threshold. SNS can trigger a Lambda function, enqueue an SQS message, or POST to an API endpoint
Lambda Scheduled Events: A Lambda function can be scheduled to run at a specified time or at regular intervals
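To make the infrastructure-as-code and scheduled-event items above more concrete, here is a minimal sketch that generates a CloudFormation template (as JSON) containing one scheduled rule. The logical names ScheduleRule and FunctionArn, the helper scheduled_rule_template, and the five-minute rate are assumptions for illustration. A real stack would also need an AWS::Lambda::Permission resource so the rule may invoke the function, which is omitted here for brevity.

```python
import json


def scheduled_rule_template(schedule: str = "rate(5 minutes)") -> str:
    """Return a CloudFormation template (JSON text) with one scheduled rule."""
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch: invoke an existing Lambda function on a schedule",
        # The function's ARN is supplied when the stack is created.
        "Parameters": {"FunctionArn": {"Type": "String"}},
        "Resources": {
            "ScheduleRule": {
                "Type": "AWS::Events::Rule",
                "Properties": {
                    "ScheduleExpression": schedule,
                    "State": "ENABLED",
                    "Targets": [
                        {"Arn": {"Ref": "FunctionArn"}, "Id": "ScheduledFunction"}
                    ],
                },
            }
        },
    }
    return json.dumps(template, indent=2)


print(scheduled_rule_template())
```

Because the template is plain data, the same file can be kept in version control and deployed repeatedly to produce identical stacks.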
Loose Coupling
Ensures that a failure in one component does not cascade to other components
Service Discovery
Instead of hard-coding the IP address of a loosely coupled service, use DNS names (Route 53 hosted zones) or ELB endpoints
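The idea can be sketched with the standard library alone: the caller knows only a stable name and looks up the current addresses at run time. The name SERVICE_HOST and the helper resolve_endpoint are hypothetical. "localhost" stands in for a Route 53 record or ELB DNS name so that the example runs without network access.

```python
import socket


def resolve_endpoint(hostname: str, port: int) -> list[str]:
    """Resolve a service name to its current IP addresses via DNS."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address as a string.
    return sorted({info[4][0] for info in infos})


# Assumption: in a real system this would be a DNS name such as an ELB endpoint.
SERVICE_HOST = "localhost"
addresses = resolve_endpoint(SERVICE_HOST, 80)
print(addresses)
```

If the instances behind the name change, callers pick up the new addresses on the next lookup without any code or configuration change.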
Asynchronous Integration
Suitable for interactions between two components of a system where an immediate response is not needed; an acknowledgement that the request was received is enough. Examples: SQS or Kinesis
Loosely coupled components make the system resilient and enable graceful failure
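The acknowledge-now, process-later pattern can be illustrated with an in-process queue. This is only a stand-in for SQS or Kinesis, not their API, and the names submit, worker, and the "order-N" messages are made up for the example.

```python
import queue
import threading

work_queue = queue.Queue()  # in-process stand-in for a durable queue like SQS
processed = []


def submit(message: str) -> str:
    """Producer side: enqueue the request and acknowledge immediately."""
    work_queue.put(message)
    return "accepted"


def worker() -> None:
    """Consumer side: drain the queue independently of the producer."""
    while True:
        message = work_queue.get()
        if message is None:  # sentinel value: no more work
            break
        processed.append(message.upper())


# The producer gets its acknowledgements before any consumer is running.
acks = [submit(m) for m in ("order-1", "order-2")]

consumer = threading.Thread(target=worker)
consumer.start()
work_queue.put(None)  # tell the worker to stop once the queue is drained
consumer.join()

print(acks, processed)
```

The producer never waits for processing to finish, so a slow or temporarily unavailable consumer delays the work but does not fail the request.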
Services NOT Servers
Use Lambda/S3/DynamoDB as opposed to EC2
Serverless architecture can scale out easily
Use Cognito as the identity store as opposed to custom solutions that live on EC2 or in a SQL database
RDS can scale reads horizontally through read replicas, as opposed to scaling vertically by upgrading to an instance type with more memory/CPU
The RDS Multi-AZ deployment feature can be used to automatically replicate your database to a different AZ and fail over automatically when disaster strikes (DR: Disaster Recovery)
Anti-pattern: If your application can maintain data integrity and has no need for major JOINs or normalization, use DynamoDB, a NoSQL database that is inherently horizontally scalable for both reads and writes
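One way an application might take advantage of read replicas is to send writes to the primary and spread reads across the replicas. The sketch below is a simplified illustration, not an RDS feature: the class ReadWriteRouter, the endpoint names, and the "starts with SELECT" test are all assumptions. Read replicas are updated asynchronously, so reads routed this way may be slightly stale.

```python
import itertools


class ReadWriteRouter:
    """Pick a database endpoint per statement: primary for writes, replicas for reads."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # Round-robin over replicas; fall back to the primary if none exist.
        self._replicas = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql: str) -> str:
        # Naive classification: anything starting with SELECT is a read.
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._replicas) if is_read else self.primary


# Hypothetical endpoint names, for illustration only.
router = ReadWriteRouter(
    "primary.db.example.internal",
    ["replica-1.db.example.internal", "replica-2.db.example.internal"],
)
print(router.endpoint_for("SELECT * FROM orders"))
print(router.endpoint_for("INSERT INTO orders VALUES (1)"))
```

Adding another replica increases read capacity without touching the primary's instance type.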
Remove single point of failure and use redundant systems
Active or Standby redundancy
Failure Detection
Alarms/Health Checks
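Health checks usually avoid reacting to a single failed probe. A common approach, sketched here under assumed names (HealthTracker, unhealthy_threshold), is to mark a target unhealthy only after several consecutive failures and to reset the count on the first success.

```python
class HealthTracker:
    """Track consecutive health-check failures for one target."""

    def __init__(self, unhealthy_threshold: int = 3):
        self.unhealthy_threshold = unhealthy_threshold
        self.consecutive_failures = 0

    def record(self, check_passed: bool) -> bool:
        """Record one check result; return True while the target counts as healthy."""
        if check_passed:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures < self.unhealthy_threshold


tracker = HealthTracker(unhealthy_threshold=2)
for result in (True, False, False, True):
    print(result, tracker.record(result))
```

Once a target is reported unhealthy, traffic can be shifted to a redundant standby or the remaining active instances.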
Cost Reduction
Right Sizing: Find the minimum configuration that is suitable. Use magnetic storage as opposed to SSD, a small EC2 instance as opposed to a large one, etc.
Use spot instances
Use auto scaling to scale back (right sizing)
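The arithmetic behind scaling out and back can be approximated with a simple proportional rule: size the fleet so the average metric returns to its target. This is a simplified sketch, not the exact algorithm AWS Auto Scaling uses, and the function name and sample numbers are assumptions.

```python
import math


def desired_capacity(current: int, metric_value: int, target_value: int,
                     min_size: int, max_size: int) -> int:
    """Proportionally resize the fleet, clamped to the group's size limits."""
    proposed = math.ceil(current * metric_value / target_value)
    return max(min_size, min(max_size, proposed))


# 4 instances at 90% average CPU with a 60% target: scale out.
print(desired_capacity(4, 90, 60, min_size=2, max_size=10))
# 4 instances at 15% average CPU with a 60% target: scale back to the minimum.
print(desired_capacity(4, 15, 60, min_size=2, max_size=10))
```

Scaling back when load drops is what turns auto scaling into a cost control: you stop paying for instances the current load does not need.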
Security
Use ACLs/Security Groups
Use IAM roles as opposed to long-lived credentials (access key ID/secret access key)
Use application firewalls
Use CloudWatch to enable real-time logging, monitoring, and auditing of resources
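For the IAM role item above, the piece that replaces stored access keys is the role's trust policy, which states who may assume the role. The sketch below builds a trust policy that lets EC2 assume a role, so instances receive temporary credentials instead of embedded keys. The helper name ec2_trust_policy is hypothetical, and the permissions attached to the role are out of scope here.

```python
import json


def ec2_trust_policy() -> str:
    """Return an IAM trust policy (JSON text) allowing EC2 to assume the role."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The EC2 service assumes the role on behalf of the instance.
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(ec2_trust_policy())
```

Code running on an instance with such a role attached can call AWS APIs without any access key appearing in source code or configuration files.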