The commoditization of technology has reached its pinnacle with the advent of the recent paradigm of Cloud Computing. Infosys Cloud Computing blog is a platform to exchange thoughts, ideas and opinions with Infosys experts on Cloud Computing


June 28, 2019

Amazon Aurora Serverless, the future of database consumption

Amazon has recently launched the Amazon Aurora Serverless database (MySQL-compatible edition). This is going to set a new trend in the way organizations consume databases. Traditionally, database setup, administration, scaling and maintenance have been tedious, time-consuming and expensive. Thanks to cloud computing, RDS takes the setup, scaling and maintenance of databases off customers' hands. Amazon Aurora Serverless takes RDS to the next level: users pay only for what they use, and only when they use it.

Amazon Aurora database architecture addresses the bottlenecks in scalability and fault tolerance of traditional databases by decoupling the storage and compute tiers and offloading some of the functions like redo logging, crash recovery, backup & restore etc. to the storage tier. Aurora database has fault-tolerant and self-healing storage built for the cloud that replicates six copies of user data across three Availability Zones.

AuroraServerlessArchitectureDiag.png

Aurora Serverless is an on-demand, auto-scaling configuration of the Aurora database in which customers do not manage database instances. It scales up and down based on the load generated by the application, which makes it an ideal choice for unpredictable workloads.

Users provision database capacity in "Aurora Capacity Units (ACUs)", each a combination of memory and processor capacity. The database scales up and down with the application load between the specified "minimum and maximum ACUs". Aurora Serverless manages a warm pool of resources to minimize scaling time. Because the compute and storage tiers are decoupled, new capacity can serve traffic in seconds.

AWS charges its customers for Aurora Serverless on three parameters - Aurora Capacity Units (ACUs), I/O (million request increments) and storage consumption (per GB-month increments).

Aurora Serverless can optionally be paused automatically after a given period of inactivity (default 5 minutes). While the database is paused, users are charged only for storage consumption. This is a game changer, especially for non-production databases: they are typically not used round the clock, so the cost-saving potential is huge. When Aurora Serverless automatically resumes from a pause, the first connection experiences a higher latency of about 25 seconds.
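To make the saving from pausing concrete, the three billing parameters can be combined into a rough monthly estimate. The sketch below is only illustrative: the unit prices are assumed placeholder values, not actual AWS rates, and the function and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerlessPricing:
    """Illustrative unit prices (assumed for this sketch, NOT real AWS rates)."""
    per_acu_hour: float = 0.06
    per_million_io: float = 0.20
    per_gb_month: float = 0.10


def monthly_cost(avg_acus, active_hours, io_requests, storage_gb,
                 pricing=ServerlessPricing()):
    """Estimate a monthly bill from ACUs, I/O requests and storage.

    Paused hours carry no compute charge, so only `active_hours` are
    multiplied by the ACU price; storage is billed for the whole month.
    """
    compute = avg_acus * active_hours * pricing.per_acu_hour
    io = (io_requests / 1_000_000) * pricing.per_million_io
    storage = storage_gb * pricing.per_gb_month
    return round(compute + io + storage, 2)


# A dev database averaging 2 ACUs, active 8 hours on 22 working days,
# compared with the same database kept running for a 30-day month.
office_hours = monthly_cost(2, 8 * 22, 5_000_000, 50)
always_on = monthly_cost(2, 24 * 30, 5_000_000, 50)
print(f"paused outside office hours: {office_hours}, always on: {always_on}")
```

With these assumed prices, the compute share of the bill shrinks in proportion to the hours the database spends paused, while the storage share stays the same.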

Amazon Aurora Serverless provides multiple levels of security for the databases similar to Amazon Aurora standard instance. It supports "VPC endpoint", encryption of "data at rest and in transit" and key management through "AWS Key Management Service (KMS)". By enabling encryption for an Aurora Serverless database, encryption is automatically enabled for "underlying storage, automated backups, snapshots and replicas".

Typical use cases for Aurora Serverless include Dev/Test environments, unpredictable workloads and infrequently used applications. With Amazon Aurora Serverless databases and other AWS Serverless components, an application stack can be set up "at the click of a button".

Courtesy: AWS documentation and white papers.

June 26, 2019

AWS Cloudformation: An underrated service with a vast potential

As businesses experience a surge in provisioning and managing infrastructure and services through cloud offerings, a collateral challenge has emerged: remaining accurate and quick while provisioning, configuring and managing medium to large scale setups with predictability, efficiency and security.
Infrastructure as Code (IaaC) is a way to manage resource provisioning, configuration and updates/changes using the tested and proven software development practices that are used for application development.

E.g.
  • Version Control
  • Testing
  • CI/CD
IaaC2.png

Key Benefits:

1)  Cost Reduction - Less time and effort spent on provisioning and management through IaaC.
2)  Speed - Faster execution through automation.
3)  Risk Reduction - Fewer errors caused by misconfiguration or human error.
4)  Predictability - Assess the impact of changes via change sets and decide accordingly.

There are several tools which can be used for deploying Infrastructure as Code.
  • Terraform
  • CloudFormation 
  • Heat
  • Ansible
  • Salt
  • Chef, Puppet

Ansible, Chef and Puppet are configuration management tools, primarily designed to install and manage software on existing servers. They can support a certain degree of infrastructure provisioning; however, some specifically designed tools are a better fit.

Orchestration tools like Terraform and CloudFormation are specially designed for infrastructure provisioning and management.  

CloudFormation is the AWS-native Infrastructure as Code offering. It has been one of the most underrated services in the Amazon cloud environment for many years. However, with increasing awareness, the service is gaining traction and many clients are willing to look at its advantages.

It allows codification of infrastructure, which helps in leveraging best software development practices and version control. A template can be authored with any code editor such as Visual Studio Code or Atom, checked into a version control system like Git and reviewed with team members before deployment into Dev/Test/Prod.

CloudFormation takes care of all the provisioning and configuration of resources, so developers can focus on development rather than spending time and effort on creating and managing resources individually.

CFNDgrm1.4.png
Resources are defined as code (JSON or YAML) in a template. The CloudFormation (CFN) service processes the template to produce a stack, a collection of AWS resources that can be managed as a single unit. In other words, we can create, update, or delete a collection of resources by creating, updating, or deleting stacks.

CloudFormation can be used to deploy anything from a simple scenario, such as spinning up a single EC2 instance, to a complex multi-tier and multi-region application deployment.

For example, all the resources required to deploy a web application, such as the web server, database server and networking components, can be defined in a template. When this template is submitted to the CloudFormation service, it deploys the desired web application. There is no need to manage the resources' dependencies on each other, as CloudFormation takes care of them.

CloudFormation treats all stack resources as a single unit, which means that for a stack creation to succeed, all the underlying resources must be created successfully. If a resource creation fails, CloudFormation by default rolls back the stack creation and deletes any resources created up to that point.

Note, however, that any resource created before the rollback will still be charged.

The example below creates a t2.micro instance named "EC2Instance" using the Amazon Linux AMI in the N. Virginia region.

Temp2.png
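For readers who cannot see the screenshot, a template of this kind can be sketched as follows. This is not the exact template from the image: it builds a comparable one as a Python dictionary and prints it as JSON. It assumes Amazon Linux 2 and resolves the AMI through the public SSM parameter for the latest Amazon Linux 2 image rather than a hard-coded AMI ID. The region is not part of the template; it is chosen when the stack is created (us-east-1 for N. Virginia).

```python
import json

# Public SSM parameter that resolves to the latest Amazon Linux 2 AMI in the
# region where the stack is created (assumption: Amazon Linux 2 is intended).
LATEST_AMI_PARAM = "/aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2"


def build_template():
    """Return a minimal CloudFormation template for one t2.micro instance."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Single t2.micro instance running Amazon Linux",
        "Parameters": {
            "LatestAmiId": {
                "Type": "AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>",
                "Default": LATEST_AMI_PARAM,
            }
        },
        "Resources": {
            # The logical ID doubles as the Name tag, as in the description above.
            "EC2Instance": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "InstanceType": "t2.micro",
                    "ImageId": {"Ref": "LatestAmiId"},
                    "Tags": [{"Key": "Name", "Value": "EC2Instance"}],
                },
            }
        },
    }


template_json = json.dumps(build_template(), indent=2)
print(template_json)
```

The printed JSON can be saved to a file, checked into version control and used to create a stack from the console or the CLI.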
 
Just as it makes creation easy, CloudFormation also allows easy deletion of a stack and cleanup of all underlying resources in a single go.

Change Sets - Updating or changing any resource always carries a risk associated with the impact of that change. For example, updating a security group description without defining a VPC in the template, or in a non-VPC environment, will recreate the security group as well as the EC2 instance associated with it. Another example is updating an RDS database name, which recreates the database instance and can have a severe impact.

CloudFormation lets you preview and assess the impact of a change through change sets, to ensure it doesn't introduce unintentional changes.

ChangeSet3.1.png

The change set example below shows that this change will -

CHangeSetAWSPart1.png
CHangeSetAWSPart2.png
 
1)  Replace the security group.
2)  Possibly replace the EC2 instance, depending on several factors that are external to this CloudFormation template and can't be assessed with certainty. For such cases, the impact can be assessed with the help of the AWS Resource and Property Types Reference (https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-template-resource-type-ref.html) document.

Conclusion: CloudFormation, the Infrastructure as Code service from AWS, unleashes the real power and flexibility of the cloud environment and has revolutionized the way we deploy and manage infrastructure. It is worth investing time and effort in exploring it.





June 25, 2019

S3- Managing Object Versions


Main2.jpg
S3, launched in 2006, has been one of the most appreciated services in the AWS environment. It provides 99.999999999% (eleven nines) of durability. As of now, it handles over a million requests per second and stores trillions of documents, images, backups and other data.

Versioning is one of the S3 features that make it even more useful. Once versioning is enabled, successive uploads or PUTs of a particular object create distinctly named and individually addressable versions of it. This is a great feature, as it provides safety against accidental deletion due to human or programmatic error. Therefore, if versioning is enabled, any version of an object stored in S3 can be preserved, retrieved or restored.

However, this comes at an additional cost: each new version that is uploaded adds to S3 usage, which is chargeable. This cost can multiply very quickly if versions that are no longer in use are not managed properly. So how can current as well as old versions be managed suitably?

This is easy. There are two options:
1)  Use of S3 Lifecycle Rules
2)  S3 Versions-Manual Delete 


Use of S3 Lifecycle Rules

When versioning is enabled, a bucket can hold multiple versions of the same file, i.e. the current one and non-current ones.
Lifecycle rules can be applied to ensure object versions are stored efficiently by defining what action should be taken for non-current versions. Lifecycle rules can define transition and expiration actions.
 
LifecyclePolicySteps.png
The example below creates a lifecycle policy for the bucket which says that all non-current versions should be transitioned to Glacier after one day and permanently deleted after thirty days.

Review2.JPG
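The same policy can also be expressed as a configuration document instead of console clicks. The sketch below builds such a document in the shape accepted by boto3's `put_bucket_lifecycle_configuration`; the rule ID and the helper name are hypothetical, and the empty prefix is an assumption that the rule should apply to the whole bucket.

```python
import json


def noncurrent_version_policy(transition_days=1, expire_days=30, prefix=""):
    """Build a lifecycle configuration for non-current object versions.

    Non-current versions move to Glacier after `transition_days` and are
    permanently deleted after `expire_days`.
    """
    if expire_days <= transition_days:
        raise ValueError("expiration must come after the Glacier transition")
    return {
        "Rules": [
            {
                "ID": "archive-then-expire-noncurrent",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix = every object
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": transition_days, "StorageClass": "GLACIER"}
                ],
                "NoncurrentVersionExpiration": {"NoncurrentDays": expire_days},
            }
        ]
    }


policy = noncurrent_version_policy()
print(json.dumps(policy, indent=2))
```

With boto3 installed and credentials configured, the dictionary could be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration` together with the bucket name.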



S3 Versions-Manual Delete
Versions can be deleted manually from the console. Because all versions are visible and accessible there, a specific version of the object can be selected and deleted.

 
ObjectConsoleDelete.JPG

However, when using the command line interface, a simple delete object command will not permanently delete the object named in the command; instead, S3 inserts a delete marker in the bucket. That delete marker becomes the current version of the object with a new Id, and all subsequent GET object requests return that delete marker, resulting in a 404 error.
So even though the object is not erased, it is not accessible, which can be confused with deletion. The object, with all its versions along with the delete marker, still exists in the bucket and keeps consuming storage, which results in additional charges.

ObjectDel1.1.png
So what is the delete marker? When a delete command is executed for a versioned object, a delete marker gets inserted in the bucket, acting as a placeholder for that versioned object. Because of this delete marker, S3 behaves as if the object is erased. Like any object, a delete marker has a key name and an Id; however, it differs from an object in that it has no data, which is why a GET returns a 404 error.

The storage size of a delete marker is equal to the size of its key name, which adds one to four bytes of bucket storage for each character in the key name. That is not huge, so why should we be concerned about it? Because the objects it blocks or hides can be huge and pile up enormous bills.
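The one-to-four-bytes-per-character figure follows from UTF-8 encoding of the key name. A minimal sketch, with hypothetical key names, shows how small a marker is:

```python
def delete_marker_bytes(key_name: str) -> int:
    """Size of a delete marker, taken as the UTF-8 length of its key name."""
    return len(key_name.encode("utf-8"))


# ASCII characters take one byte each; accented characters take two.
ascii_key = "reports/2019/june.csv"
accented_key = "donn\u00e9es/\u00e9t\u00e9.csv"

print(ascii_key, delete_marker_bytes(ascii_key))
print(accented_key, delete_marker_bytes(accented_key))
```

Even a long key name costs only a few hundred bytes, so the marker itself is negligible next to the object versions that remain stored behind it.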

Note that a delete marker is also inserted in version-suspended buckets. So if versioning is enabled and then suspended (versioning can never be disabled once it has been enabled, only suspended), all simple delete commands will still insert a delete marker.

Removing delete markers is tricky. If a simple delete request is executed to erase a delete marker without specifying its version Id, the marker is not erased; instead, another delete marker is inserted with a new unique version Id. All subsequent delete requests insert additional delete markers, so it is possible to have several delete markers for the same object in a bucket.

ObjectDel2.2.png
To permanently remove a delete marker, include its version Id in the delete object request.

ObjectDel3.3.png
Once this delete marker is removed, a simple GET request will now retrieve the current version (e.g. 20002) of the object. 

This solves the problem of unintended storage consumption. But how should we deal with the object in the first place, so that we don't have to go through this complication?
To get rid of an object permanently, we need to use the specific command "DELETE Object versionId". This command permanently deletes that version.

ObjectDel4.4.png
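The behaviour described in this section can be summarised with a toy in-memory model. It is not the S3 API, only a sketch of the rules stated above: a simple delete adds a marker, a GET on a marker returns 404, and a delete with a version Id removes exactly that version or marker. The class name and the sequential version Ids are assumptions made for illustration.

```python
import itertools


class VersionedBucket:
    """Toy model of a versioning-enabled bucket (illustration only)."""

    def __init__(self):
        self._versions = {}             # key -> list of (version_id, data)
        self._ids = itertools.count(1)  # stand-in for S3's opaque version Ids

    def put(self, key, data):
        version_id = str(next(self._ids))
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def get(self, key):
        stack = self._versions.get(key, [])
        # A delete marker is stored as data=None; on top it hides the object.
        if not stack or stack[-1][1] is None:
            return 404
        return stack[-1][1]

    def delete(self, key, version_id=None):
        stack = self._versions.setdefault(key, [])
        if version_id is None:
            # Simple delete: nothing is erased, a new delete marker is added.
            marker_id = str(next(self._ids))
            stack.append((marker_id, None))
            return marker_id
        # Delete with version Id: that version or marker is removed for good.
        stack[:] = [entry for entry in stack if entry[0] != version_id]
        return version_id

    def version_count(self, key):
        return len(self._versions.get(key, []))


bucket = VersionedBucket()
v1 = bucket.put("photo.jpg", b"original")
marker = bucket.delete("photo.jpg")              # inserts a delete marker
print(bucket.get("photo.jpg"))                   # object now looks deleted
print(bucket.version_count("photo.jpg"))         # but data and marker remain
bucket.delete("photo.jpg", version_id=marker)    # remove the marker itself
print(bucket.get("photo.jpg"))                   # object is visible again
bucket.delete("photo.jpg", version_id=v1)        # permanent delete
print(bucket.version_count("photo.jpg"))
```

The model makes the billing trap visible: after the simple delete, the object is unreachable but the version count has gone up, not down.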

Conclusion: S3 provides virtually unlimited storage in the cloud, and versioning makes it even safer by protecting objects from accidental deletion. However, it comes at a cost and should be managed cautiously. The above explains the scenario where a user deleted an S3 object but still struggled with its charges in the AWS bill.



RDS - Scaling and Load Balancing

img2.2.jpg

Solution architects often encounter this question: as with EC2 instances, can a load balancer with an Auto Scaling group be used for scaling and load balancing RDS instances and the databases hosted on them?

While the answer to this question is "NO" there are other ways to scale RDS and load balance RDS read queries.

A point to consider here is that RDS is a managed service from AWS and thus takes care of scaling the relational database to keep up with the growing requirements of the application without manual intervention. So the former part of this article explores vertical as well as horizontal scaling of RDS instances, and the latter part looks at load balancing options.

Amazon RDS was first released on 22 October 2009, supporting MySQL databases. This was followed by support for Oracle Database in June 2011, Microsoft SQL Server in May 2012, PostgreSQL in November 2013 and MariaDB in October 2015. As of today, it is one of the core PaaS services offered by AWS.


Scaling- RDS Instances 

RDS instances can be scaled vertically as well as horizontally.

Vertical Scaling

To handle a higher load, database instances can be scaled up vertically with a single click. At present there are fifty instance types and sizes to choose from for an RDS MySQL, PostgreSQL, MariaDB, Oracle or Microsoft SQL Server instance. For Aurora, there are twenty different instance sizes.

Follow the steps below to vertically scale an RDS instance.

 

img3.jpg

Remember that so far only the instance type has been scaled. Storage is separate and remains unchanged when the instance is scaled up or down, so the volume must be modified separately. We can increase the allocated storage space or improve performance by changing the storage type (such as from General Purpose SSD to Provisioned IOPS SSD).


img4.jpg
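For scripted changes, the instance class and storage modifications described above map onto a single modify request. The sketch below only assembles the parameters, using the names accepted by boto3's `modify_db_instance` call; the instance identifier `orders-db`, the helper name and the chosen sizes are hypothetical.

```python
def build_modify_request(instance_id, instance_class=None,
                         allocated_storage_gb=None, storage_type=None,
                         iops=None, apply_immediately=False):
    """Assemble keyword arguments for an RDS modify-instance request.

    Compute (instance class) and storage are independent, so each is only
    included when the caller asks for it.
    """
    if storage_type == "io1" and iops is None:
        raise ValueError("Provisioned IOPS (io1) storage needs an iops value")
    request = {
        "DBInstanceIdentifier": instance_id,
        "ApplyImmediately": apply_immediately,
    }
    optional = {
        "DBInstanceClass": instance_class,
        "AllocatedStorage": allocated_storage_gb,
        "StorageType": storage_type,
        "Iops": iops,
    }
    request.update({k: v for k, v in optional.items() if v is not None})
    return request


# Scale compute only, then storage only, for a hypothetical instance.
compute_only = build_modify_request("orders-db", instance_class="db.m5.xlarge")
storage_only = build_modify_request("orders-db", allocated_storage_gb=200,
                                    storage_type="io1", iops=3000)
print(compute_only)
print(storage_only)
```

With boto3 available, either dictionary could be unpacked into `modify_db_instance`. Leaving `ApplyImmediately` as `False` defers the change to the next maintenance window, which is usually the safer default for a Single-AZ instance.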

One important point to remember while scaling vertically is to ensure the correct license is in place for commercial engines like Oracle and SQL Server, especially in the BYOL model, because licenses are usually tied to CPU sockets or cores.

Another important consideration is that a Single-AZ instance will be unavailable during this change. However, if the database instance is Multi-AZ, the impact will be minimal because the standby database is updated first. A failover then occurs to the newly updated standby before the changes are applied to the original primary (which now becomes the standby).


Horizontal Scaling

To scale read-intensive traffic, read replicas can be used. Presently, Amazon RDS for MySQL, MariaDB and PostgreSQL allows up to five read replicas to be created for a given source database instance. Amazon Aurora permits up to fifteen read replicas for a given database cluster.

Read replicas are asynchronously replicated copies of the main database.

A read replica can -

  • Be in the same or a different AZ or region, to be placed close to users.
  • Be promoted to master for disaster recovery.
  • Have the same or a different database instance type/class.
  • Be configured as Multi-AZ - Amazon RDS for MySQL, MariaDB and PostgreSQL allows Multi-AZ configuration to be enabled on read replicas to support disaster recovery and minimize downtime from engine upgrades.
  • Have a different storage class.

a9.2.jpg
Each of these read replicas has its own endpoint, which can be used to share the read load. We can connect to a read replica just as we connect to a standard DB instance.


Load Balancing/Distribution of Read/Write Traffic on AWS RDS

AWS load balancers don't support routing traffic to RDS: the CLB, ALB and NLB cannot route traffic to RDS instances. So how can read traffic be distributed or balanced across AWS RDS read replicas?

There are two options -


1)  Using an open-source software-based load balancer, like HAProxy.

As we know, each replica has a unique Domain Name Service (DNS) endpoint. These endpoints can be used by an application to implement load balancing.

This can be done programmatically at the application level or by using one of several open-source solutions such as MaxScale, ProxySQL and MySQL Proxy.

These solutions can split read/write queries, and a proxy like HAProxy can then be placed between the application and the database servers. HAProxy can listen for reads and writes on different ports and route them accordingly.


11.2.jpg

This approach provides a single database endpoint instead of several independent DNS endpoints, one for each read replica.

It also allows a more dynamic environment, as read replicas can be transparently added or removed behind the load balancer without any need to update the application's database connection string.
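The application-level variant of this idea, splitting reads from writes and rotating reads across replica endpoints, can be sketched in a few lines. This is a simplified illustration rather than a replacement for ProxySQL or HAProxy: the endpoint names are hypothetical, and classifying a statement by its first keyword is an assumption that ignores cases such as SELECT ... FOR UPDATE, which belong on the writer.

```python
import itertools

# Statement keywords treated as read-only in this sketch (an assumption).
READ_ONLY_KEYWORDS = {"SELECT", "SHOW"}


class ReadWriteRouter:
    """Send writes to the primary and rotate reads across the replicas."""

    def __init__(self, writer_endpoint, replica_endpoints):
        if not replica_endpoints:
            raise ValueError("at least one read replica endpoint is required")
        self.writer_endpoint = writer_endpoint
        self._replicas = itertools.cycle(replica_endpoints)  # round robin

    def endpoint_for(self, sql):
        words = sql.split()
        keyword = words[0].upper() if words else ""
        if keyword in READ_ONLY_KEYWORDS:
            return next(self._replicas)
        return self.writer_endpoint


# Hypothetical endpoints; real ones come from the RDS console or API.
router = ReadWriteRouter(
    "primary.db.example.internal",
    ["replica-1.db.example.internal", "replica-2.db.example.internal"],
)
for statement in ["SELECT * FROM orders", "select 1",
                  "INSERT INTO orders VALUES (1)", "SELECT now()"]:
    print(statement, "->", router.endpoint_for(statement))
```

A dedicated proxy does the same job outside the application, which is what keeps the connection string unchanged when replicas are added or removed.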


2) The second option is to use Amazon Route 53 weighted record sets to distribute requests across read replicas.

Although there is no built-in way to do this, Route 53 weighted records offer a workaround for sharing requests across multiple read replicas and achieving the same result.

Within a Route 53 hosted zone, a record set can be created for each read replica endpoint, all with equal weight, and Route 53 then shares read traffic among the record sets.


a8.2.jpg

The application can use the Route 53 endpoint to send read requests to the database; these are distributed among all read replicas to achieve the same behavior as a load balancer.
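The weighted record sets can be described as a change batch, one CNAME record per replica with the same name and weight. The sketch below builds such a batch in the shape accepted by boto3's `change_resource_record_sets`; the record name, replica endpoints, weight and TTL are hypothetical values chosen for illustration.

```python
import json


def weighted_read_records(record_name, replica_endpoints, weight=10, ttl=60):
    """Build a Route 53 change batch with one weighted CNAME per replica.

    Equal weights give each replica an equal share of DNS answers.
    """
    changes = []
    for index, endpoint in enumerate(replica_endpoints, start=1):
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                # Weighted records sharing a name need distinct identifiers.
                "SetIdentifier": f"read-replica-{index}",
                "Weight": weight,
                "TTL": ttl,
                "ResourceRecords": [{"Value": endpoint}],
            },
        })
    return {"Comment": "Share read traffic across RDS read replicas",
            "Changes": changes}


# Hypothetical record name and replica endpoints.
batch = weighted_read_records(
    "read.db.example.internal.",
    ["replica-1.db.example.internal", "replica-2.db.example.internal"],
)
print(json.dumps(batch, indent=2))
```

Because the distribution happens at DNS resolution time, a short TTL helps clients pick up added or removed replicas sooner; unlike a proxy, this approach balances lookups rather than individual queries.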