DynamoDB: NoSQL key-value and document database

  1. AWS DynamoDB
    1. A NoSQL database providing single-digit millisecond latency
    2. Supports document and key-value storage
    3. Fully managed by AWS; highly available, redundant, and scalable
    4. Runs across three geographically separate facilities, so it is highly redundant
  2. A table contains items (rows); items have attributes (key-value pairs)
  3. Primary Key
    1. Two types of primary key
      1. Partition Key (Hash Key): determines the partition (physical location) where the item is stored.
      2. Composite key: Partition Key (Hash Key) plus Sort Key (Range Key, e.g. a date)
    2. If two items have the same partition key (same partition), they must have different sort keys; items sharing a partition key are stored together in the same partition.
  4. Secondary Indexes: Secondary indexes allow you to perform queries on attributes that are not part of the table’s primary key. Note that global secondary index read and write capacity settings are separate from those of the table, and will incur additional costs.
    1. Local Secondary Index – same partition key + different sort key (can only be created at table creation; cannot be added, removed, or modified later)
    2. Global Secondary Index – different partition key + different sort key (can be created at table creation, or added, removed, or modified later)
  5. Amazon DynamoDB synchronously replicates data across three facilities in an AWS Region, giving you high availability and data durability.
  6. Supports two read consistency models
    1. Eventually Consistent Reads
      1. A read may not reflect a write made less than about a second earlier; all copies typically become consistent within one second
      2. Best price performance
    2. Strongly consistent reads
      1. Read returns all writes that received a “successful write” response before the read is sent
      2. If you need every read to reflect all writes, even those made less than a second earlier, choose this model
  7. Pricing is based on reads, writes, and storage only, not on CPU usage or data transfer
    1. Read throughput: $0.000xxx per hour for every 40 read units
    2. Write throughput: $0.000xxx per hour for every 10 write units
    3. Writes are roughly 5x more expensive than reads
    4. Storage cost: $0.25 per GB per month
    5. At table-creation time, choose:
      1. Provisioned Read/Write capacity units
      2. Reserved Read/Write capacity units with 1 or 3 year contracts help you reduce costs
  8. Push button scaling
    1. By simply changing RCU/WCU inside the console on a table’s capacity tab
    2. Not possible in RDS, where scaling up involves some downtime
  9. DynamoDB queries
    1. The Query operation finds items in a table using only primary key attribute values; you must provide the partition key attribute name and the value to search for
    2. You can optionally provide a sort key attribute name and value to refine the search results
    3. By default, Query returns all the data attributes of the matching items. Use the ProjectionExpression parameter to return only selected attributes.
    4. Query results are always sorted by the sort key (ascending for both numbers and strings by default). To reverse the sort order, set the ScanIndexForward parameter to false.
    5. By default, queries are eventually consistent, but they can be made strongly consistent via the ConsistentRead parameter (see the boto3 sketch at the end of this section).
  10. DynamoDB scans
    1. The Scan operation examines every item in the table.
    2. By default, Scan returns all the data attributes; use the ProjectionExpression parameter to return only selected attributes.
    3. A Query operation is more efficient than a Scan; Scan is the least efficient read operation in DynamoDB.
    4. A Scan operation always scans the entire table or secondary index, then filters out values to provide the desired result, essentially adding the extra step of removing data from the result set.
    5. Because a Scan operation reads an entire page (by default, 1 MB), you can reduce the impact of the scan operation by setting a smaller page size.
  11. DynamoDB Streams
    1. Captures a time-ordered sequence of item-level modifications in any DynamoDB table.
    2. Stores this information in a log for up to 24 hours. Applications can access this log and view the data items as they appeared before and after they were modified, in near real time.
    3. DynamoDB stream is an ordered flow of information about changes to items in an Amazon DynamoDB table.
    4. When you enable a stream on a table, DynamoDB captures information about every modification to data items in the table.
    5. Used to capture any kind of modification to a DynamoDB table; a Lambda function can consume the stream events and push notifications through SNS (see the Lambda sketch at the end of this section)
    6. A table can be exported to CSV from the console (either all items or a selected few)
  12. Encryption at rest can be enabled only when you are creating a new DynamoDB table. After encryption at rest is enabled, it can’t be disabled. Uses AWS KMS for key.
  13. Point-in-time recovery provides continuous backups of DynamoDB table data. Once enabled, DynamoDB maintains continuous backups of your table for the last 35 days.
  14. TTL is a mechanism to set a specific timestamp for expiring items from your table. The timestamp should be expressed as an attribute on the items in the table. The attribute should be a Number data type containing time in epoch format. Once the timestamp expires, the corresponding item is deleted from the table in the background.
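
The query points above can be made concrete with a short example. Below is a minimal sketch (not from the source) using boto3; the table name ("Orders") and attribute names ("customer_id", "order_date", "order_total") are hypothetical.

```python
# Minimal sketch of a DynamoDB Query (hypothetical table and attribute names).
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Orders")

response = table.query(
    KeyConditionExpression=Key("customer_id").eq("C-1001")
                           & Key("order_date").begins_with("2018-"),
    ProjectionExpression="order_date, order_total",  # return only selected attributes
    ConsistentRead=True,                             # strongly consistent read
    ScanIndexForward=False,                          # reverse the default sort order
)
for item in response["Items"]:
    print(item)
```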
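For DynamoDB Streams, a common pattern is the one mentioned above: a Lambda function consumes stream records and pushes a notification through SNS. Here is a minimal sketch (not from the source); the topic ARN is a hypothetical placeholder, and it assumes the stream view type includes item images.

```python
# Minimal sketch of a Lambda handler consuming a DynamoDB stream.
import json
import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:table-changes"  # hypothetical

def handler(event, context):
    for record in event["Records"]:
        # eventName is INSERT, MODIFY, or REMOVE; NewImage/OldImage are present
        # only if the stream view type includes them.
        change = {
            "action": record["eventName"],
            "keys": record["dynamodb"]["Keys"],
            "new_image": record["dynamodb"].get("NewImage"),
            "old_image": record["dynamodb"].get("OldImage"),
        }
        sns.publish(TopicArn=TOPIC_ARN,
                    Subject="DynamoDB item changed",
                    Message=json.dumps(change))
```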

Relational Database Service (RDS)

  1. RDS is the AWS-managed relational database service.
    1. The database runs on an EC2 instance within your VPC/subnet, but you cannot SSH into it or get root access
    2. You can allow public access through the DB instance connection string (endpoint), port, user ID, and password.
  2. DB subnet groups:
    1. While creating a DB, you need to pick a DB subnet group for the DB instance to reside in.
    2. A DB subnet group is a collection of subnets (typically private) that you create in a VPC and that you then designate for your DB instances.
    3. Each DB subnet group should have subnets in at least two Availability Zones in a given AWS Region.
    4. From the DB subnet group, Amazon RDS chooses a subnet and an IP address within that subnet to associate with your DB instance.
    5. The DB instance uses the Availability Zone that contains the subnet.
    6. If the primary DB instance of a Multi-AZ deployment fails, Amazon RDS can promote the corresponding standby and subsequently create a new standby using an IP address of the subnet in one of the other Availability Zones.
    7. The subnets in a DB subnet group are either all public or all private; they can't be a mix of both. Whether a subnet is public or private depends on the configuration of its network access control lists (network ACLs) and route tables.
  3. Size limits
    1. As of April 2018, 16 TB is the size limit for MySQL on RDS. So if you have an existing 10 TB MySQL database that you want to migrate to AWS RDS and expect it to double within a year, go for Amazon Aurora (common exam question), as it is MySQL-compatible and supports DB sizes up to 64 TB.
  4. Caching
    1. Amazon Aurora offers an integrated cache that is managed within the database engine and has built-in write-through capabilities: when the underlying data changes in a table, the database updates its cache automatically
    2. You can use Amazon ElastiCache with RDS, but you need to modify your application code (e.g. an EC2 web app queries RDS for a result set, writes it to ElastiCache, and on later requests reads it from ElastiCache instead of RDS)
  5. Two types of backups are supported
    1. Automatic Backups
      1. Enabled by default
      2. Backup data is stored in S3 (you get free S3 storage allowance equal to the size of your RDS db)
      3. Daily snapshots + transaction logs
      4. Automated backups let you recover to any point in time within the retention period
      5. The retention period can be set from 1 day to 35 days
      6. Backups are taken during a specified window of time
      7. During the backup window, latency may be higher than normal, so choose a window when demand on your services is lowest
      8. Deleted automatically when you delete the RDS instance
    2. Manual Snapshots
      1. Snapshots are manual
      2. Initiated by the admins
      3. Unlike automated backups, snapshots remain available even after you delete the RDS instance
      4. When you restore a snapshot it will create a new RDS instance with a new endpoint.
  6. Encryption:
    1. You can encrypt your Amazon RDS instances and snapshots at rest by enabling the encryption option for your Amazon RDS DB instance.
    2. Amazon RDS encrypted instances use the industry standard AES-256 encryption algorithm to encrypt your data on the server’s EBS that hosts your Amazon RDS instance.
    3. Once your data is encrypted, Amazon RDS handles authentication of access and decryption of your data transparently with a minimal impact on performance. You don’t need to modify your database client applications to use encryption.
    4. Encryption at rest is done through AWS KMS
    5. Supported for MySQL, Oracle, MariaDB, SQL Server, and PostgreSQL
    6. An existing unencrypted DB can't be encrypted in place; you need to create a new encrypted RDS instance and migrate the data to it
    7. Storage, snapshots, automated backups, and read replicas are all encrypted as well if the source is encrypted
  7. Multi AZ deployment for high availability  (Resilient pillar)
    1. Synchronously replicated in another AZ
    2. Automatic failover (the DNS endpoint remains the same; no need to repoint applications to the secondary DB)
    3. Disaster Recovery (DR) purpose only (Resilient pillar). Not for performance. Use read replicas instead for scalability and performance enhancements.
  8. Read Replicas  (Performance pillar)
    1. They are different from multi Availability Zone deployment
    2. Asynchronous replication (NOT Synchronous)
    3. There is replica lag (the time taken for a change on the master to appear on the replica)
    4. Creates exact copy of master db
    5. If Multi-AZ is enabled, the read replica is created from a snapshot of the secondary DB, avoiding the brief (around one minute) I/O slowdown that would otherwise occur on the source
    6. Read replicas have their own endpoints, which applications on EC2 instances can use directly in their connection strings (see the boto3 sketch at the end of this section)
    7. Read replicas can themselves have read replicas (doubling the replica lag)
    8. MySQL, MariaDB, and PostgreSQL support 5 read replicas; Aurora supports up to 15
    9. RRs can be a different region/AZ for MySQL
    10. RRs can be promoted to be master dbs. Once promoted, the RR link will be lost and the promoted instance will act as an independent master db
    11. RRs are Read only. Can’t write to RRs
    12. Scaling up (better CPU/memory/instance type) is a manual process, unlike DynamoDB where scaling is push-button
  9. Recovery Point Objective (RPO) and Recovery Time Objective (RTO)
    1. Commonly used in disaster recovery strategy, RPO is the amount of data loss measured in time (Eg: Last 10 min of data is lost)
    2. RTO is the amount of time needed to restore a business process to its service level. (Eg. Took 2.5 hours to bring production up and running after disaster)
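
As a small illustration of the read-replica points above, here is a minimal boto3 sketch (not from the source); the instance identifiers are hypothetical.

```python
# Minimal sketch: create a read replica, then (later) promote it.
import boto3

rds = boto3.client("rds")

# Creates an asynchronously replicated copy with its own endpoint.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="orders-db-replica-1",   # hypothetical replica name
    SourceDBInstanceIdentifier="orders-db",       # hypothetical source instance
)

# Promoting the replica makes it an independent master; the replication link is lost.
# rds.promote_read_replica(DBInstanceIdentifier="orders-db-replica-1")
```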

AWS Kinesis

  1. AWS Kinesis is a cloud-based service for managing streaming data such as stock quotes, gaming data, social network (e.g. Facebook) data, geospatial data (e.g. from Lyft/Uber/Swiggy), or data collected by IoT devices
  2. Kinesis Streams
    1. Use Amazon Kinesis Data Streams to collect and process large streams of data records in real time.
    2. You’ll create data-processing applications, known as Amazon Kinesis Data Streams applications. A typical Amazon Kinesis Data Streams application reads data from a Kinesis data stream as data records. These applications can use the Kinesis Client Library, and they can run on Amazon EC2 instances.
    3. The processed records can be sent to dashboards, used to generate alerts, dynamically change pricing and advertising strategies, or send data to a variety of other AWS services.
    4. Producers send data to Kinesis streams (see the boto3 sketch at the end of this section)
    5. Data is retained for 24 hours by default, up to a maximum of 7 days
    6. Stream data is stored in shards (capacity of a stream = sum of the capacities of its shards)
    7. Streams are read by consumers (e.g. applications running on EC2)
  3. Kinesis Firehose
    1. Amazon Kinesis Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), and Splunk.
    2. You configure your data producers to send data to Kinesis Data Firehose, and it automatically delivers the data to the destination that you specified.
    3. You can also configure Kinesis Data Firehose to transform your data before delivering it.
    4. No need to worry about shards/streams
    5. A record is the data of interest that your producer sends to a Kinesis delivery stream; a record can be as large as 1,000 KB
    6. Data is not stored for later replay; it is delivered straight to the configured destination
  4. Kinesis Data Analytics
    1. You can process and analyze streaming data using standard SQL. The service enables you to quickly author and run powerful SQL code against streaming sources to perform time-series analytics, feed real-time dashboards, and create real-time metrics.
    2. To get started with Kinesis Data Analytics, you create a Kinesis data analytics application that continuously reads and processes streaming data.
    3. The service supports ingesting data from Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose streaming sources. Then, you author your SQL code using the interactive editor and test it with live streaming data. You can also configure destinations where you want Kinesis Data Analytics to send the results.
    4. Kinesis Data Analytics supports Amazon Kinesis Data Firehose (Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service), AWS Lambda, and Amazon Kinesis Data Streams as destinations.
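
As a concrete illustration of producers sending data to Kinesis, here is a minimal sketch (not from the source); the stream and delivery stream names are hypothetical.

```python
# Minimal sketch: write one record to a Kinesis data stream and one to Firehose.
import json
import boto3

kinesis = boto3.client("kinesis")
firehose = boto3.client("firehose")

quote = {"symbol": "AMZN", "price": 1450.89}

# Kinesis Data Streams: the partition key determines which shard receives the record.
kinesis.put_record(
    StreamName="stock-quotes",                      # hypothetical stream name
    Data=json.dumps(quote).encode("utf-8"),
    PartitionKey=quote["symbol"],
)

# Kinesis Data Firehose: no shards to manage; records are delivered to the
# configured destination (e.g. S3 or Redshift).
firehose.put_record(
    DeliveryStreamName="quotes-to-s3",              # hypothetical delivery stream
    Record={"Data": (json.dumps(quote) + "\n").encode("utf-8")},
)
```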

API Gateway

  1. AWS API Gateway is a low-cost, highly scalable REST API service that lets developers create a "front door" to their programs and data
    1. Your core domain functionality/business logic and/or data can be published through an API
    2. Processes running on EC2, code running on Lambda, or web applications can serve as the back end that publishes this data through the API
  2. Typically the API consumers are your end users using
    1. Web browser running JavaScript calling the REST API
    2. Mobile applications
    3. Other server based processes inside your organization or clients outside your organization
  3. API Gateway can be set up to cache API results for a predefined amount of time, known as the TTL
  4. You can throttle requests to prevent DoS attacks
  5. Requests can be logged to CloudWatch Logs
  6. Cross Origin Resource Sharing (CORS)
    1. CORS enables JavaScript served from domain1 to make REST API calls to domain2
    2. CORS can be enabled on AWS API Gateway
  7. When request submissions exceed the steady-state request limits and burst limits, API Gateway fails the limit-exceeding requests and returns a 429 Too Many Requests error.
  8. API Caching
    1. Not eligible for the free tier
  9. /ping and /sping are reserved by AWS, so you can't use them in your API path
  10. In a Lambda proxy integration, the entire request is sent to the backend Lambda function as-is, via a catch-all ANY method that represents any HTTP method. The actual HTTP method is specified by the client at run time. The ANY method lets you use a single API method setup for all of the supported HTTP methods: DELETE, GET, HEAD, OPTIONS, PATCH, POST, and PUT (see the Lambda handler sketch at the end of this section).
  11. First create Lambda.
    1. Then create API.
    2. In the API, create resource.
    3. Under the resource create a method.
  12. Build an API Gateway API with Lambda Proxy Integration describes how to create an API Gateway API that exposes an integrated Lambda function. In addition, you can create an API Gateway API to expose other AWS services, such as Amazon SNS, Amazon S3, Amazon Kinesis, and even AWS Lambda. This is made possible by the AWS integration; the Lambda integration (or Lambda proxy integration) is a special case, where a Lambda function invocation is exposed through the API Gateway API.
    1. You create an IAM role that your AWS service proxy uses to interact with the AWS service, called an AWS service proxy execution role. Without this role, API Gateway cannot interact with the AWS service. In later steps, you specify this role in the settings for the GET method you just created.
  13. In addition to exposing Lambda functions or HTTP endpoints, you can also create an API Gateway API as a proxy to an AWS service such as Amazon SNS, Amazon S3, or Kinesis, enabling your clients to access back-end AWS services through your APIs.
  14. In addition to using IAM roles and policies or custom authorizers, you can also use a user pool in Amazon Cognito to control who can access your API in API Gateway. A user pool serves as your own identity provider to maintain a user directory. It supports user registration and sign-in, as well as provisioning identity tokens for signed-in users.
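
To illustrate the Lambda proxy integration and CORS points above, here is a minimal sketch (not from the source) of a backend Lambda handler; the permissive CORS header value is an assumption for illustration.

```python
# Minimal sketch of a Lambda handler behind an API Gateway Lambda proxy integration.
# The handler receives the whole request (method, path, headers, body) and must
# return a statusCode/headers/body structure.
import json

def handler(event, context):
    method = event["httpMethod"]          # actual HTTP method chosen by the client (ANY method)
    if method == "GET":
        body = {"message": "hello", "path": event["path"]}
        status = 200
    else:
        body = {"error": "method not allowed"}
        status = 405
    return {
        "statusCode": status,
        "headers": {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": "*",   # CORS: allow calls from another domain (assumed policy)
        },
        "body": json.dumps(body),
    }
```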

Elastic Transcoder

  1. Converts media files from one format to another format over the cloud
  2. Cost is based on the transcoding minutes used and the output resolution
  3. A typical use case (see the sketch at the end of this section):
    1. Upload media file to S3
    2. S3 calls Lambda
    3. Lambda invokes Elastic Transcoder
    4. Elastic Transcoder then writes the transcoded output to a different S3 bucket
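
A minimal sketch (not from the source) of the Lambda step in the use case above; the pipeline ID and preset ID are hypothetical placeholders.

```python
# Minimal sketch: Lambda triggered by an S3 upload submits an Elastic Transcoder job.
import boto3

transcoder = boto3.client("elastictranscoder")
PIPELINE_ID = "1111111111111-abcde1"    # hypothetical pipeline (defines input/output buckets)
PRESET_ID = "1351620000001-000010"      # assumed preset ID for the target format

def handler(event, context):
    # S3 event notification: one record per uploaded object.
    for record in event["Records"]:
        key = record["s3"]["object"]["key"]
        transcoder.create_job(
            PipelineId=PIPELINE_ID,
            Input={"Key": key},
            Output={"Key": "transcoded/" + key, "PresetId": PRESET_ID},
        )
```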

AWS Simple Notification Service (SNS)

  1. SNS is a highly scalable, push-based messaging system (unlike SQS, which is pull-based).
  2. Works on the publish/subscribe (pub/sub) model
  3. Publishers send messages to an SNS topic (see the boto3 sketch at the end of this section)
  4. One or more subscribers (programs/Lambdas/EC2 instances) can subscribe to the topic.
    1. Once subscribed to a topic, every subscriber gets all the messages pushed to it.
    2. No need to poll for messages, unlike SQS
  5. Push notifications can be sent in many ways:
    1. Push messages to Android/iOS/Fire OS devices
    2. SMS text messages
    3. Emails
    4. SQS queues
    5. HTTP REST endpoints
  6. SNS achieves redundancy by storing these messages across many Availability Zones in a single region.
  7. Protocols used: HTTP, HTTPS, EMAIL, EMAIL-JSON, SQS or Application – messages can be customized for each protocol.
  8. SNS messages are stored redundantly across multiple AZs
  9. Data format used by SNS is JSON (Subject, Message, TopicArn, MessageId, unsubscribeURL)
  10. Cost is based on recipient types.
    • $0.50 per 1 million SNS requests
    • to HTTP: $0.06 / 100,000 notification deliveries
    • to EMAIL: $2 / 100,000 notification deliveries
    • to SMS: $0.75 / 100 notification deliveries
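
A minimal sketch (not from the source) of the publish/subscribe flow described above; the email address and queue ARN are hypothetical placeholders.

```python
# Minimal sketch: create a topic, add subscribers, publish a message.
import boto3

sns = boto3.client("sns")

topic_arn = sns.create_topic(Name="order-events")["TopicArn"]

# Subscribers receive messages pushed to them; no polling needed.
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="ops@example.com")
sns.subscribe(TopicArn=topic_arn, Protocol="sqs",
              Endpoint="arn:aws:sqs:us-east-1:123456789012:order-queue")

sns.publish(TopicArn=topic_arn,
            Subject="OrderCreated",
            Message='{"orderId": "1234", "status": "NEW"}')
```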

Simple Workflow Service (SWF)

  1. AWS SWF is a cloud-based web service that enables a sequence of tasks to be completed by a distributed set of programs/Lambdas/EC2 instances/humans.
    1. A good example is Amazon's own order-taking/processing/fulfillment/shipping/returns system, with a large number of tasks within a single workflow.
    2. SWF is a task-oriented service (as opposed to SQS, which is a message-oriented service)
  2. Three types of actors
    1. Starters initiate the workflow
    2. Workers are programs that can run inside EC2 or behind your firewall (see the worker sketch at the end of this section).
      1. retrieve tasks from SWF
      2. process the received tasks
      3. send results back to SWF
    3. A Decider program coordinates/orders/schedules tasks.
      1. They run inside an EC2 or behind your firewall.
      2. Also responsible for concurrency
      3. Workers and Deciders can run independently
  3. SWF responsibilities:
    1. SWF coordinates across workers/deciders.
    2. SWF stores tasks, assigns them, and monitors them
    3. A task is assigned only once and never duplicated (unlike SQS messages)
    4. SWF maintains the state of the workflow
    5. The workers/deciders do not have to remember the application state hence workers/deciders can run and scale independently
  4. Workflows are grouped into domains. Domains isolate a set of tasks/executions/types from other domains within a given AWS account.
  5. A workflow can take up to a maximum 1 year to finish.
  6. Task is only assigned once and never duplicated (main difference from SQS where messages can be processed multiple times).
  7. SWF vs. SQS design question:
    1. For mission-critical business processes such as e-commerce order processing, SWF is preferable to SQS, since you don't want to process orders more than once.
    2. SQS can't 100% guarantee that duplicate messages won't be delivered, no matter how much you increase the visibility timeout or how diligently you delete processed messages.
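
A minimal sketch (not from the source) of an SWF activity worker that polls for a task, processes it, and reports the result back; the domain, task list, and process_order helper are hypothetical.

```python
# Minimal sketch of an SWF activity worker loop.
import boto3

swf = boto3.client("swf")

def process_order(payload):
    # Hypothetical business logic for the task.
    return "processed:" + payload

while True:
    # Long-polls for up to 60 seconds; returns without a taskToken if no task is available.
    task = swf.poll_for_activity_task(
        domain="ecommerce",                 # hypothetical domain
        taskList={"name": "order-tasks"},   # hypothetical task list
        identity="worker-1",
    )
    token = task.get("taskToken")
    if not token:
        continue  # no task was assigned during this poll
    result = process_order(task.get("input", ""))
    swf.respond_activity_task_completed(taskToken=token, result=result)
```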

Simple Queue Service (SQS)

  1. AWS SQS is a web service that provides us with a message queue on the cloud to which a sender process (not humans) can send a message and a reader process (not humans) can retrieve/process/delete the message.
    1. So a message queue is basically a temporary repository of messages in the AWS cloud.
  2. SQS is asynchronous. Processes are decoupled (or loosely coupled) and distributed, hence highly elastic and scalable.
  3. SQS is a pull-based system (unlike SNS, which is push-based). One or more reader programs/Lambdas/EC2 instances can pull messages. You can set up auto scaling to add/remove polling EC2 instances based on the size of the queue.
  4. Messages can be 256 KB max, text only, billed in 64 KB chunks. You can encode binary data as text (e.g. Base64) if you need to send binary files.
  5. Two types of queues:
    1. Standard queues:
      1. No order is maintained.
      2. Messages are guaranteed to be sent at-least once. Some rare occasions they are sent twice or more. You need to architect keeping this in mind.
    2. FIFO queues:
      1. Order of messages is maintained.
      2. Messages are guaranteed to be sent only once.
      3. Multiple ordered message groups can be hosted in a single queue.
      4. Max 300 transactions per second.
  6. Messages can be retained in the queue from 1 minute up to a maximum of 14 days; the default is 4 days.
  7. Visibility Timeout (VT) of a message is the amount of time it will be invisible in the queue after a reader pulls that message.
    1. Max visibility timeout is 12 hours.
    2. Default visibility timeout is 30 sec
    3. After the timeout expires, the message automatically becomes visible again in the queue, so it is the reader's responsibility to delete the message after processing.
    4. Also make sure message processing is finished before the VT or else you as an architect need to increase the visibility timeout.
  8. Two types of polling possible by the reader
    1. Short polling returns immediately, so it can be expensive if the queue is sparse.
    2. Long polling returns only when a message is available or when the long-poll timeout (1 second minimum to 20 seconds maximum) expires, whichever comes first (see the boto3 sketch at the end of this section).
  9. Custom SQS Policy can be used to allow access to queues across accounts
    1. If you want to allow Amazon SQS access based only on an AWS account ID and basic permissions (such as for SendMessage or ReceiveMessage), you don’t need to write your own policies. You can just use the Amazon SQS AddPermission action.
    2. If you want to explicitly deny or allow access based on more specific conditions (such as the time the request comes in or the IP address of the requester), you need to write your own Amazon SQS policies and upload them to the AWS system using the Amazon SQS SetQueueAttributes action.
  10. A single request can carry 1 to 10 messages, with a maximum total payload of 256 KB
  11. Cost: the first 1 million requests are free, then $0.50 per million requests.
  12. Priority-based design: create two queues, one for higher priority (e.g. paid customers) and one for lower priority (free users). Let EC2 instances read and process the high-priority queue first, then move on to the low-priority queue.
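
A minimal sketch (not from the source) of an SQS consumer using long polling, a visibility timeout, and explicit deletes; the queue URL is a hypothetical placeholder.

```python
# Minimal sketch of an SQS consumer loop.
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"  # hypothetical

while True:
    # Long polling: wait up to 20 seconds for a message instead of returning
    # immediately (cheaper than short polling on a sparse queue).
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
        VisibilityTimeout=60,   # must exceed the time needed to process a message
    )
    for msg in resp.get("Messages", []):
        print("processing:", msg["Body"])
        # Delete after successful processing, or the message becomes visible again.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```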

Introduction to Amazon Web Services (AWS)

Amazon Web Services (AWS) is Amazon's popular cloud platform. This book will help you prepare for and successfully pass the AWS Certified Solutions Architect – Associate exam (released February 2018).

This book is for programmers who have at least 6 months of exposure to the AWS cloud. There are many books available in the market; the official study guide (OSG) for the ACSA certification is what I used to get my certification. While going through that book, I realized that even though the OSG was perfect for someone like me, with little experience and enough time on hand, there are many people who are already experienced in AWS but not yet certified, and they may not have the time to go through the entire book, as it can take up to two months to finish all the exercises.

So I decided to summarize that book and create a smaller version with precise bullet points. This version is targeted at seasoned AWS developers/architects who want to study and pass the exam in just one week. By no means is my version a copy of the original book; it is redesigned and rewritten from the ground up, keeping the original only as a model, covering all of its topics, and adding many more, since my book targets those attempting the February 2018 version of the exam.

The course is divided into topics and subtopics. For example, IAM is a topic and IAM Policies is a subtopic. I try to keep each subtopic very precise, with around 10 main bullet points, some of which have sub-bullets where needed. Each subtopic covers everything that you need to understand, and my own acronyms are provided wherever possible to make the points memorable. Please feel free to email us from the Contact page if you think an important point is missing from the bullet list for any subtopic, but utmost care has been taken not to miss anything, and I am confident that this course is enough for you to pass the exam. The only other things you may want to study are the white papers and sample tests.

IAM Authentication

  1. IAM Authentication Methods
  1. IAM authenticates a principal (human or application) in one of the following three ways:
    1. UserId/Password
      1. A password policy enforces password complexity and expiration/duration requirements
      2. MFA enables multi factor authentication
    2. Access Key
      1. An Access Key is a combination of a 20-character Access Key ID and a 40-character Secret Access Key
      2. Using an access key, an application can interact with the AWS SDK/API via IAM
      3. The aws configure CLI command stores the access key ID and secret access key locally
      4. For security purposes you need to rotate keys from time to time
    3. Access Key/Session Token
      1. A process can assume a role and obtain a temporary security token from IAM STS (see the boto3 sketch at the end of this section)
      2. The security token contains an access key (Access Key ID/Secret Access Key combo) and a session token
      3. Calls to the SDK/API must pass both of the above values to access AWS resources
      4. Security Token Service (STS) grants users temporary access to resources on AWS. There are three types of users
        1. Federated users, such as Active Directory or other LDAP-based directory service users
        2. Federation with well-known identity providers such as Google/Facebook/Twitter
        3. Users from another AWS account
      5. Identity broker is a service that can take identity from Identity Store/Pool 1 and join (federate) it with Identity Store/Pool 2
        1. In a typical scenario, a user logs into a website with an ID/password
        2. Identity broker then calls LDAP first and authenticates the user
          1. Then identity broker talks to AWS STS to get authenticated and get security token to access AWS services (like S3)
          2. Or alternatively it can request IAM role and assume that role to authenticate with STS and then get access permissions to talk to S3
      6. Active Directory users can access AWS using SAML (Security Assertion Markup Language). AD Connector is designed to give you an easy way to establish a trusted relationship between your Active Directory and AWS. When AD Connector is configured, the trust allows you to:
        1. Sign in to AWS applications such as Amazon WorkSpaces, Amazon WorkDocs, and Amazon WorkMail by using your Active Directory credentials.
        2. Seamlessly join Windows instances to your Active Directory domain either through the Amazon EC2 launch wizard or programmatically through the EC2 Simple System Manager (SSM) API.
        3. Provide federated sign-in to the AWS Management Console by mapping Active Directory identities to AWS Identity and Access Management (IAM) roles.
  2. Longevity of the authenticated session
    1. Credentials that are created by using account credentials can range from 900 seconds (15 minutes) up to a maximum of 3600 seconds (1 hour), with a default of 1 hour.
    2. The GetSessionToken action can be called by using the long-term AWS security credentials of the IAM user. Credentials that are created by IAM users are valid for the duration that you specify, from 900 seconds (15 minutes) up to a maximum of 129600 seconds (36 hours), with a default of 43200 seconds (12 hours)
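
A minimal sketch (not from the source) of obtaining temporary credentials by assuming a role with STS and then using them to call another service; the role ARN and bucket name are hypothetical.

```python
# Minimal sketch: assume a role, then use the temporary credentials with S3.
import boto3

sts = boto3.client("sts")

resp = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/ReportReader",  # hypothetical role
    RoleSessionName="report-job",
    DurationSeconds=3600,   # within the allowed session duration for the role
)
creds = resp["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken, Expiration

# All three values (access key ID, secret access key, session token) must be supplied.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(s3.list_objects_v2(Bucket="example-reports")["KeyCount"])
```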