The commoditization of technology has reached its pinnacle with the advent of the recent paradigm of Cloud Computing. The Infosys Cloud Computing blog is a platform to exchange thoughts, ideas and opinions with Infosys experts on Cloud Computing.


January 28, 2020

Demystifying AWS Instance Tenancy

Multitenancy is at the heart of resource pooling, one of the five essential characteristics of cloud computing. It makes a large pool of computing resources available to a large group of users without compromising security and confidentiality. Since resources are utilized more efficiently, multitenancy also brings cost-effectiveness.
InstanceTenancy-1.png
To visualize tenancy, consider the apartments in a building. Each resident has access to, and authorization for, their own flat, while all residents share the common areas, electricity, water and amenities.
In this article, we will explore tenancy scenarios in the AWS cloud context and try to understand the various offerings.

AWS primarily offers the following tenancy models:

  1)  Shared / Default Tenancy 
  2)  Dedicated Tenancy

           a)  Dedicated Instance
           b)  Dedicated Host 
  3)  Bare Metal 

Shared / Default Tenancy

Launched in 2006, this is the default tenancy model in the AWS cloud environment. In this model, virtual machines from different customers share the same underlying hardware. These virtual machines are isolated from one another at the hypervisor level, so even though they share the same server, individual VMs belonging to different customers can't interact with each other.

This is the most cost-effective model and is typically used unless licensing or regulatory compliance rules it out. In the truest sense, this is the commoditization of IT services in cloud computing.

InstanceTenancy-2.png
When a VM is launched or requested in the shared tenancy model, AWS carves out the requested infrastructure capacity on any available physical hardware. The host is chosen by an AWS-internal placement process that is transparent to the customer, and we have no control over this decision. Once the VM is launched, it is identified by its instance ID; you won't get details of the underlying host, such as a host ID.

This instance-to-host association is temporary and persists only while the VM is running. Once the VM is stopped, its capacity on the physical server is released back to the resource pool, and only the data on persistent storage (such as EBS volumes) remains. This is a powerful feature, as it lets you reduce cost by stopping instances that are not in use.
When the instance is started again, AWS once more looks for an available physical host on which to place it, which in most cases results in a change of the underlying hardware.

InstanceTenancy-3.png
However, when an instance is rebooted, the underlying hardware doesn't change, because a reboot is an OS-level activity.
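To make the stop/start versus reboot behaviour concrete, here is a toy Python model. It is purely illustrative: the `ToyInstance` class, the host names and the random choice of host are hypothetical stand-ins for AWS's internal placement process, which customers cannot see.

```python
import random
from typing import List, Optional


class ToyInstance:
    """Toy model of a shared-tenancy instance (not how AWS works internally).

    It only mirrors the behaviour described above: stopping releases the
    host, starting picks any available host, rebooting keeps the host.
    """

    def __init__(self, instance_id: str, host_pool: List[str],
                 rng: Optional[random.Random] = None) -> None:
        self.instance_id = instance_id      # the only identifier a customer sees
        self.host_pool = host_pool          # hypothetical pool of physical hosts
        self.rng = rng or random.Random()
        self.host_id: Optional[str] = None  # hidden from customers on real AWS

    def start(self) -> None:
        # The provider picks a host; the customer has no control over this.
        self.host_id = self.rng.choice(self.host_pool)

    def stop(self) -> None:
        # Capacity returns to the pool; only persistent storage survives.
        self.host_id = None

    def reboot(self) -> None:
        # A reboot is an OS-level action, so the host assignment is untouched.
        if self.host_id is None:
            raise RuntimeError("a stopped instance cannot be rebooted")


vm = ToyInstance("i-0example", ["host-a", "host-b", "host-c"], random.Random(7))
vm.start()
host_before_reboot = vm.host_id
vm.reboot()
assert vm.host_id == host_before_reboot  # same hardware after a reboot
vm.stop()
assert vm.host_id is None                # capacity released on stop
vm.start()                               # may land on a different host
```

The only point of the model is the asymmetry: `reboot()` never touches `host_id`, while a `stop()` followed by `start()` goes through placement again.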

Dedicated Tenancy

On the other hand, AWS also offers a dedicated tenancy model, under which a physical host runs only instances (dedicated or not) belonging to the same customer account. The host is not shared with other customers, which provides host-level isolation. Even instances belonging to different AWS accounts linked to a single payer account are isolated from one another at the physical host level.


InstanceTenancy-4.png
This is suitable for use cases where the shared model is undesirable because government regulations prohibit sharing the physical host. For example, workloads covered by the Health Insurance Portability and Accountability Act (HIPAA) were earlier required to be hosted on dedicated resources.

It also applies where there are license restrictions, such as bring your own license (BYOL) scenarios in which application licenses require visibility of physical cores and sockets.

In any case where hosting an instance on a shared physical host is unacceptable, the dedicated tenancy model is the option. However, it comes at an additional cost.

To understand dedicated tenancy in detail, let's look at the comparison between shared and dedicated tenancy.

InstanceTenancy-5.png
AWS offers two launch types under the dedicated tenancy model:
  1)  Dedicated Instance
  2)  Dedicated Host

AWS introduced the Dedicated Instance type in 2011 and followed it with the Dedicated Host type in 2015. Both allow instances to run on hardware dedicated to your account; they use the same physical hardware and are equally secure and equivalent in performance. The advantage of the Dedicated Host launch type, however, is the additional control and visibility it provides: you can place an instance on a chosen dedicated (physical) server, or in other words, decide which physical server hosts your instance.
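The difference between the two launch types shows up in the `Placement` parameter of the EC2 `RunInstances` API. The sketch below only builds keyword arguments for boto3's `run_instances`; the AMI and host IDs are hypothetical placeholders, and the helper `launch_params` is an illustration, not part of any AWS library.

```python
from typing import Any, Dict, Optional

VALID_TENANCIES = ("default", "dedicated", "host")


def launch_params(image_id: str, instance_type: str, tenancy: str = "default",
                  host_id: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EC2 run_instances call.

    tenancy: "default" (shared), "dedicated" (Dedicated Instance)
    or "host" (Dedicated Host).
    """
    if tenancy not in VALID_TENANCIES:
        raise ValueError(f"unknown tenancy: {tenancy!r}")
    placement: Dict[str, str] = {"Tenancy": tenancy}
    if host_id is not None:
        if tenancy != "host":
            # Choosing the physical server is only possible with Dedicated Hosts.
            raise ValueError("host_id requires tenancy='host'")
        placement["HostId"] = host_id
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "Placement": placement,
    }


# Hypothetical placeholder IDs, for illustration only.
dedicated_instance = launch_params("ami-0123456789abcdef0", "m5.large", "dedicated")
on_chosen_host = launch_params("ami-0123456789abcdef0", "m5.large", "host",
                               host_id="h-0123456789abcdef0")
```

With credentials configured, either dictionary could be passed as `boto3.client("ec2").run_instances(**on_chosen_host)`. Only the `"host"` variant can name a specific physical server, which is exactly the extra placement control described above.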

InstanceTenancy-6.png


Dedicated Instance
Advantages of the Dedicated Instance launch type:
  • Dedicated instances support Auto Scaling.
  • RDS can run on a dedicated database instance. To achieve this, create a VPC with dedicated tenancy and launch the RDS instance with an approved dedicated instance type.

Limitations:
  • When the instance tenancy of a VPC is set to dedicated, some AWS services won't work and some instance types can't be launched.
  • There is no instance placement control.

Dedicated Host
Advantages of the Dedicated Host launch type:
  • Affinity
  • Instance placement controls
  • Visibility of sockets and physical cores

Affinity - This is an option for the Dedicated Host launch type. When turned on, it creates a relationship between the host ID and the instance, so even if the instance is stopped and restarted, it still runs on the specified dedicated host. This differs from shared tenancy and Dedicated Instances, where the underlying hardware can change on stop and start.

Instance placement controls - A Dedicated Host gives you more control: you can maintain an instance placement scheme and decide which hardware an instance stays on. This is very useful for addressing corporate compliance, government regulatory and licensing requirements.

Visibility of sockets and physical cores - For migration use cases where existing licenses bound to physical cores, sockets or VMs need to be reused, a Dedicated Host is the suitable option. For example, with a Dedicated Host you can use your existing VM-, core- or socket-based licenses for SUSE Linux Enterprise Server, Red Hat Enterprise Linux, Microsoft Windows Server and Microsoft SQL Server.
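Affinity and host placement can be set on an existing instance through the EC2 `ModifyInstancePlacement` API (boto3 `modify_instance_placement`), which AWS only applies to a stopped instance. The sketch builds the keyword arguments only; the helper name and the IDs are hypothetical.

```python
from typing import Dict


def pin_to_host_params(instance_id: str, host_id: str) -> Dict[str, str]:
    """Keyword arguments for boto3's EC2 modify_instance_placement call.

    Affinity "host" ties the instance to host_id, so it restarts on the same
    Dedicated Host after a stop/start. Affinity "default" would instead let a
    restarted instance land on any available Dedicated Host in the account.
    """
    if not host_id.startswith("h-"):
        raise ValueError("Dedicated Host IDs look like 'h-...'")
    return {"InstanceId": instance_id, "Affinity": "host", "HostId": host_id}


# Hypothetical placeholder IDs, for illustration only.
params = pin_to_host_params("i-0123456789abcdef0", "h-0123456789abcdef0")
```

After stopping the instance, `boto3.client("ec2").modify_instance_placement(**params)` would apply the pinning; starting it again would then keep it on the chosen host.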

Instance Placement in Dedicated Host vs. Dedicated Instance
InstanceTenancy-7.png

Capacity Reservation - Dedicated Host
Dedicated hosts need to be allocated first. While allocating a host, you choose the instance type, which determines the host hardware and hence the number of sockets and physical cores on the dedicated host. The choice of instance type therefore also defines how many instances can be launched on that host.

For example, if you allocate a dedicated host for c5.xlarge instances, the host has 2 sockets and 36 physical cores (72 vCPUs). Since each c5.xlarge uses 4 vCPUs, you can launch a maximum of 18 c5.xlarge instances on that host.
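The arithmetic behind the c5.xlarge example can be written as a small helper. This is a simplified sketch that assumes two hardware threads (vCPUs) per physical core and packing by vCPU count alone; AWS publishes the exact per-size instance counts for each host type, and for some sizes these may be lower than this estimate.

```python
def max_instances_per_host(physical_cores: int, instance_vcpus: int,
                           threads_per_core: int = 2) -> int:
    """Simplified upper bound on instances of one size per Dedicated Host.

    Assumes every physical core exposes `threads_per_core` vCPUs and that
    instances can be packed by vCPU count alone.
    """
    if instance_vcpus <= 0:
        raise ValueError("instance_vcpus must be positive")
    host_vcpus = physical_cores * threads_per_core
    return host_vcpus // instance_vcpus


# c5 host from the example: 36 physical cores -> 72 vCPUs; c5.xlarge has 4 vCPUs.
print(max_instances_per_host(36, 4))  # 18
```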

InstanceTenancy-8.png

Billing 

Dedicated Instance Pricing
There are two components to AWS dedicated instance pricing.

  1. A per-hour instance usage fee.
  2. A dedicated per-region fee - Irrespective of the number of dedicated instances running, an additional fee of USD 2 per hour is charged for each region in which at least one dedicated instance is running.
Reserved dedicated instances can be purchased to further reduce cost by up to 70%.
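The two pricing components combine as in the sketch below. The USD 2 regional fee comes from the text above; the per-instance hourly rates in the example are hypothetical numbers chosen only to show the arithmetic.

```python
from typing import Dict, List

REGION_FEE_PER_HOUR = 2.00  # USD per region, per the pricing described above


def dedicated_instances_hourly_cost(usage: Dict[str, List[float]]) -> float:
    """Hourly cost of Dedicated Instances across regions.

    usage maps a region name to the hourly usage fees of the dedicated
    instances currently running there.
    """
    total = 0.0
    for instance_fees in usage.values():
        if instance_fees:
            # Flat regional fee, charged once however many instances run.
            total += REGION_FEE_PER_HOUR
        total += sum(instance_fees)
    return round(total, 2)


# Hypothetical per-instance rates, for illustration only.
print(dedicated_instances_hourly_cost({
    "us-east-1": [0.10, 0.10, 0.10],
    "eu-west-1": [0.20],
    "ap-south-1": [],        # no dedicated instances -> no regional fee
}))  # 4.5
```

Because the regional fee is flat, it weighs heavily on a region running one small dedicated instance and becomes negligible as more dedicated instances run there.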

Dedicated Host Pricing
  • The complete physical server is reserved irrespective of the number of instances hosted on it. The price of a dedicated host depends on the region and the instance family, and an hourly charge applies to the entire host until it is released.
  • For example, reserving a dedicated host in the N. Virginia region for the m5 instance family incurs an hourly charge of $5.069.
  • Note that each dedicated host supports a single instance type/family; you can't mix instance families. For example, a c3 dedicated host can hold up to 8 c3.xlarge or 4 c3.2xlarge instances.
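Because the whole host is billed, the monthly cost is fixed while the cost per instance depends on how densely the host is used. The sketch uses the m5 rate quoted above and assumes 730 hours per month, a common approximation in AWS cost estimates.

```python
HOURS_PER_MONTH = 730  # approximation commonly used in AWS cost estimates


def dedicated_host_monthly_cost(host_hourly_rate: float,
                                hours_allocated: float = HOURS_PER_MONTH) -> float:
    """The whole host is billed while allocated, regardless of how many
    instances run on it (or whether any run at all)."""
    return round(host_hourly_rate * hours_allocated, 2)


def effective_hourly_cost_per_instance(host_hourly_rate: float,
                                       instances_running: int) -> float:
    """Spread the host charge over the instances actually placed on it."""
    if instances_running < 1:
        raise ValueError("an idle host still costs the full hourly rate")
    return host_hourly_rate / instances_running


m5_host_rate = 5.069  # USD/hour, N. Virginia m5 host, as quoted above
print(dedicated_host_monthly_cost(m5_host_rate))          # 3700.37
print(effective_hourly_cost_per_instance(m5_host_rate, 4))
```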

Limitations of Dedicated Host
  • For each instance family, only two On-Demand dedicated hosts are allowed per region. This is a soft limit, and an increase can be requested.
  • Instances launched on a dedicated host don't count against the regular instance limit; they are governed by the dedicated host limit instead.
  • Some operating system AMIs can't be used with a dedicated host, even if they are available on AWS Marketplace or offered by AWS itself, for example the license-included SUSE Linux, RHEL and Windows AMIs.
  • Dedicated Hosts support instance launches only within a VPC.
  • RDS, Auto Scaling, placement groups and free tier usage are not supported on dedicated hosts.
  • EBS volumes still run on multi-tenant hardware, even if the EC2 instance is a dedicated instance or runs on a dedicated host.

Limitations of Changing Tenancy After Launch
As we have seen, every instance launched in a VPC has one of the following tenancy attributes:
  • Shared Tenancy - The customer's instances run on a multi-tenant host.
  • Dedicated Instance - The customer's instances run on a single-tenant host.
  • Dedicated Host - The customer's instances run on a single-tenant host with instance placement control and visibility of sockets and cores.
Once an instance is launched, there are some limitations on changing its tenancy:
  • If an instance is launched with default tenancy, it can't be changed to dedicated or host.
  • If an instance is launched with dedicated or host tenancy, it can't be changed to default.
  • However, tenancy can be changed from dedicated instance to dedicated host or vice versa, provided the instance is stopped first.
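The rules above fit in a small lookup table. This is a sketch of the rules as stated, not an AWS API; the helper name `can_change_instance_tenancy` is hypothetical, and the stopped-state check reflects AWS's requirement that tenancy is only modified on a stopped instance.

```python
VALID = {"default", "dedicated", "host"}
# The only post-launch tenancy changes allowed by the rules above.
ALLOWED_CHANGES = {("dedicated", "host"), ("host", "dedicated")}


def can_change_instance_tenancy(current: str, target: str, state: str) -> bool:
    """Return True if an instance's tenancy can move from current to target."""
    if current not in VALID or target not in VALID:
        raise ValueError("tenancy must be 'default', 'dedicated' or 'host'")
    if current == target:
        return True  # nothing to change
    # AWS only modifies the tenancy of a stopped instance.
    return state == "stopped" and (current, target) in ALLOWED_CHANGES


print(can_change_instance_tenancy("dedicated", "host", "stopped"))     # True
print(can_change_instance_tenancy("default", "dedicated", "stopped"))  # False
```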
Similarly, a VPC has a tenancy attribute, which is defined when the VPC is created:

Default - In a VPC with the default tenancy attribute, both default and dedicated tenancy instances can be launched.
Dedicated - In a VPC with the dedicated tenancy attribute, all instances are launched as dedicated instances unless specified as dedicated hosts. Instances with default tenancy can't be launched in such VPCs.

InstanceTenancy-9.png
Note that if a VPC is created with the dedicated tenancy attribute, it can later be changed to default. But if a VPC is created with the default tenancy attribute, it can't be changed to dedicated later.
At the time of writing, this change can be made through the AWS CLI, the AWS SDKs or the Amazon EC2 API; the console does not support it.
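Both VPC-level rules can be sketched in a few lines. `effective_instance_tenancy` is a hypothetical helper that models how a dedicated VPC turns a default-tenancy request into a dedicated instance, and `vpc_tenancy_change_params` builds keyword arguments for boto3's EC2 `modify_vpc_tenancy`, which only accepts a change to `default`.

```python
from typing import Dict


def effective_instance_tenancy(vpc_tenancy: str, requested: str) -> str:
    """Tenancy an instance actually gets, given the VPC's tenancy attribute."""
    if vpc_tenancy == "dedicated" and requested == "default":
        # A dedicated VPC never places instances on shared hosts.
        return "dedicated"
    return requested


def vpc_tenancy_change_params(vpc_id: str, current: str, target: str) -> Dict[str, str]:
    """Keyword arguments for boto3's EC2 modify_vpc_tenancy call."""
    if (current, target) != ("dedicated", "default"):
        raise ValueError("only a dedicated -> default change is supported")
    return {"VpcId": vpc_id, "InstanceTenancy": "default"}


# Hypothetical VPC ID, for illustration only.
params = vpc_tenancy_change_params("vpc-0123456789abcdef0", "dedicated", "default")
```

The same change is available from the CLI as `aws ec2 modify-vpc-tenancy --vpc-id <vpc-id> --instance-tenancy default`. Per AWS documentation, the new setting applies to instances launched afterwards; existing instances keep their tenancy.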

Bare Metal Instance
Introduced in 2017, bare metal instances are non-virtualized AWS instances. There is no hypervisor: the OS runs directly on the physical server and has access to all of its hardware.
Spinning up a bare metal instance is the same as launching any other instance; just select a bare metal instance type. Most of the latest-generation AMIs are compatible with bare metal instances.

InstanceTenancy-10.png

Once launched, bare metal instances, like any other instance in AWS, are placed in a VPC; we can attach EBS volumes and connect through SSH/RDP.
Like other EC2 instances, bare metal instances can take advantage of other AWS services such as Amazon CloudWatch, AWS Auto Scaling and Elastic Load Balancing, and can access the full suite of AWS analytics, mobile, IoT, security and artificial intelligence services.

Cost-wise, a bare metal instance is priced the same as the largest instance in its family, because the complete server is blocked for the customer.
Below is an example from the AWS calculator in which an m5.24xlarge instance and an m5.metal instance have the same cost:

InstanceTenancy-11.png

Bare metal instance families - Bare metal instance types available at the time of writing include m5.metal and r5.metal and, with instance store, m5d.metal, r5d.metal and z1d.metal.
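Since launching bare metal is just a matter of choosing the instance type, the request looks like any other `run_instances` call. The sketch below builds boto3 keyword arguments; the helpers and the AMI ID are hypothetical, and the type set holds only the types named in this article, not an exhaustive list.

```python
from typing import Any, Dict

# Bare metal types named in this article; not an exhaustive list.
BARE_METAL_TYPES = {"m5.metal", "r5.metal", "m5d.metal", "r5d.metal", "z1d.metal"}


def is_bare_metal(instance_type: str) -> bool:
    """Bare metal types have a size component starting with 'metal'."""
    size = instance_type.split(".", 1)[-1]
    return size.startswith("metal")


def bare_metal_launch_params(image_id: str, instance_type: str) -> Dict[str, Any]:
    """run_instances kwargs: launching bare metal is just a different instance type."""
    if not is_bare_metal(instance_type):
        raise ValueError(f"{instance_type} is not a bare metal instance type")
    return {"ImageId": image_id, "InstanceType": instance_type,
            "MinCount": 1, "MaxCount": 1}


# Hypothetical AMI ID, for illustration only.
params = bare_metal_launch_params("ami-0123456789abcdef0", "m5.metal")
```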

InstanceTenancy-12.png
These bare metal instances are suitable for the following use cases:
  • Software that needs to run directly on the hardware, without a hypervisor layer.
  • Customers who want to run and manage their own hypervisor.

Here, it's very important to understand that with a dedicated host, the whole server is reserved for you. You pay for the whole box irrespective of the size of the instances you run on it; stopping an instance does not stop the host charge.
With bare metal, you can stop and restart the instance. Billing stops while it is stopped, and the underlying hardware might change on restart. You do not get to know the host ID.
Even though it runs on a full physical host with no virtualization, the underlying host is still not fixed.
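The billing difference can be illustrated with a simplified monthly comparison. The flat rate of 5.0 USD/hour and the 200 running hours are hypothetical; the model assumes the host stays allocated for the whole 730-hour month and ignores EBS storage, which is still billed while an instance is stopped.

```python
HOURS_IN_MONTH = 730


def monthly_compute_cost(hourly_rate: float, running_hours: float, model: str) -> float:
    """Compare billing of a Dedicated Host with a bare metal instance.

    Simplified: assumes the host stays allocated all month and ignores
    storage (EBS) charges, which continue while an instance is stopped.
    """
    if not 0 <= running_hours <= HOURS_IN_MONTH:
        raise ValueError("running_hours must be within the month")
    if model == "dedicated_host":
        # Charged for every allocated hour, whether instances run or not.
        return hourly_rate * HOURS_IN_MONTH
    if model == "bare_metal":
        # Charged only while the instance runs; stopping it stops compute billing.
        return hourly_rate * running_hours
    raise ValueError(f"unknown model: {model!r}")


# Hypothetical flat rate of 5.0 USD/hour; workload runs 200 hours this month.
print(monthly_compute_cost(5.0, 200, "dedicated_host"))  # 3650.0
print(monthly_compute_cost(5.0, 200, "bare_metal"))      # 1000.0
```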

InstanceTenancy-13.png



-- This article is written under Atul's guidance.

January 27, 2020

AWS Datalake - Let's Dive Deep

With the surge in information technology, organizations have to deal with huge amounts of data. Storing data at such a large scale and deriving meaningful insights of great business value from it is important. There are different ways to approach this, such as data warehouse technologies and data lakes. This article explores how AWS offerings support data lake solutions.


Datalake-1.png

Image Source- www.pexels.com/ free images

Pentaho CTO James Dixon is credited with coining the term "data lake". He describes a data mart (a subset of a data warehouse) as analogous to a bottle of water, "cleansed, packaged and structured for easy consumption", "while a data lake is more like a water body in its natural state. Data flows from the streams (the source systems) to the lake. Users have access to the lake to examine, take samples or dive in".


Datalake

As we move into a world of IoT and machine learning to make better-informed business decisions, data has become the most valuable asset for organizations. From clickstreams to IoT, and from mobile apps to social media and data generated by business applications, it's all data. This volume of data is massive, and organizations are looking for ways to deal with it. Consequently, data lakes are becoming more popular by the day.

Data warehouses have long been used mainly for operational reporting and analysis. Generically, a data warehouse is a relational database suited to a pre-defined schema and data structure and optimized for fast SQL queries. The data received from transactional systems and business applications is cleaned, transformed and enhanced to serve as a "single source of truth".

Data lakes, by contrast, can capture relational as well as non-relational data without requiring the data structure or schema to be defined up front. This means that everything from structured data from business applications to non-relational data such as clickstreams, social media data and IoT device data can be captured in a data lake.

Business analysts and data scientists can then use this diverse data to run SQL queries, real-time analytics, big data analytics and machine learning to derive trends and conclusions of great business value, e.g. using machine learning to predict future outcomes and prescribe actions for rapid response.

Datalake-2.png
Think of a data lake as a centralized repository that can store all data in real time, irrespective of its source, structure, size and type. Data is kept in its raw form and is only transformed when it is ready to be used.
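Keeping data raw and transforming it only when it is used is often called schema-on-read. The toy sketch below illustrates the idea in memory: the JSON records and field names are hypothetical, and in an AWS data lake the raw zone would typically be S3, with a query engine such as Athena applying the schema.

```python
import json
from typing import Any, Dict, Iterable, List

# Raw zone: events are stored exactly as received, with no schema enforced.
raw_zone: List[str] = [
    '{"user": "u1", "page": "/home", "ts": 1}',
    '{"user": "u2", "page": "/cart", "ts": 2, "device": "mobile"}',
    '{"sensor": "t-100", "temp_c": 21.5, "ts": 3}',
]


def read_with_schema(records: Iterable[str], fields: List[str]) -> List[Dict[str, Any]]:
    """Apply a schema only at read time (schema-on-read)."""
    rows = []
    for line in records:
        doc = json.loads(line)
        # Keep only records that carry every field this consumer needs.
        if all(f in doc for f in fields):
            rows.append({f: doc[f] for f in fields})
    return rows


clickstream = read_with_schema(raw_zone, ["user", "page"])
print(clickstream)  # [{'user': 'u1', 'page': '/home'}, {'user': 'u2', 'page': '/cart'}]
```

The same raw records can serve a different consumer later, for example `read_with_schema(raw_zone, ["sensor", "temp_c"])` for IoT analysis, without reloading anything.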

Datalake-3.png
Analysts can process this raw data with analytics tools and frameworks, e.g. open source frameworks such as Apache Hadoop, Apache Spark and Presto, as well as commercial offerings from many business intelligence vendors. All of this analytics can be done without moving the data to a separate analytics system.

However, the biggest challenge with data lakes today is that deploying and managing them requires a lot of complex and laborious manual tasks, such as:

  • Load data from diverse sources and monitor those data flows.
  • Match linked records.
  • Turn on encryption and manage keys.
  • Provide access to data sets.
  • Set up partitions.
  • Define transformation jobs and monitor their operation.
  • Deduplicate redundant data.
  • Re-organize data into columnar format.
  • Configure access control settings and audit them periodically.
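Two of these tasks, deduplication and re-organizing data into columnar format, can be sketched in plain Python. This is a toy illustration with hypothetical order records; real data lakes would use engines such as Spark or AWS Glue and columnar file formats such as Parquet or ORC.

```python
from typing import Any, Dict, List


def deduplicate(records: List[Dict[str, Any]], key: str) -> List[Dict[str, Any]]:
    """Keep the first record seen for each key value."""
    seen: set = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


def to_columnar(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per column (missing -> None)."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for name in record:
            columns.setdefault(name, [])
    for record in records:
        for name, values in columns.items():
            values.append(record.get(name))
    return columns


rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": 5.5, "coupon": "NEW"},
    {"order_id": 1, "amount": 10.0},  # redundant copy of order 1
]
print(to_columnar(deduplicate(rows, "order_id")))
# {'order_id': [1, 2], 'amount': [10.0, 5.5], 'coupon': [None, 'NEW']}
```

A columnar layout lets an analytical query read only the columns it needs (say, `amount`) instead of scanning every full record.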


Deploying a data lake in AWS is easier, and there are two ways to do it: an automated data lake build from a combination of services, implemented through AWS's infrastructure-as-code service, CloudFormation; or a managed service called AWS Lake Formation.


Data Lake Solution on AWS

This solution describes the AWS services required to build a data lake in a JSON or YAML template. The template can be executed to deploy a data lake using AWS native services such as Amazon S3, Amazon Athena, AWS Glue, Amazon DynamoDB, Amazon CloudWatch and Amazon Elasticsearch Service.
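To show what such a template looks like, here is a deliberately tiny CloudFormation template built as a Python dictionary and serialized to JSON. It is not the template of the AWS data lake solution; it only declares an encrypted S3 bucket for raw data and a Glue Data Catalog database, and the database name `example_datalake_db` is hypothetical.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal data lake storage layer (illustrative sketch only)",
    "Resources": {
        "RawDataBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                # Encrypt objects at rest with an AWS KMS key by default.
                "BucketEncryption": {
                    "ServerSideEncryptionConfiguration": [
                        {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}
                    ]
                }
            },
        },
        "LakeDatabase": {
            # Catalog database that Glue crawlers and Athena can use.
            "Type": "AWS::Glue::Database",
            "Properties": {
                "CatalogId": {"Ref": "AWS::AccountId"},
                "DatabaseInput": {"Name": "example_datalake_db"},
            },
        },
    },
    "Outputs": {"RawDataBucketName": {"Value": {"Ref": "RawDataBucket"}}},
}

template_body = json.dumps(template, indent=2)
```

The resulting string could be deployed with `boto3.client("cloudformation").create_stack(StackName="example-datalake", TemplateBody=template_body)`; the stack name is again only an example.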

Features of the data lake solution on AWS:

  • Flexible and scalable - Flexible enough to ingest all types of data (as-is) at scale. The design components support data encryption, search, analysis and querying at scale.
  • Access control and data security - Granular access-control policies and data security mechanisms protect all data stored in the data lake.
  • Leverage AWS managed services - E.g. Amazon Kinesis, AWS Direct Connect or AWS Snowball/Snowmobile to transfer large amounts of data, and AWS Data Pipeline, Amazon EMR and Amazon Elasticsearch Service for data processing and analysis.

Datalake-4.png

Image Source- Amazon Web Services


The AWS data lake solution has a serverless architecture (no EC2 instances to deploy or manage). It uses S3 for storage, and processing is done by a microservices layer written using AWS Lambda. The solution deploys a data lake console into an Amazon S3 bucket configured for static website hosting, and configures an Amazon CloudFront distribution as the solution's console entry point.

Below are the AWS offerings used in the data lake solution:

  • Amazon API Gateway - Provides access to the data lake microservices. These microservices interact with Amazon S3, Amazon Athena, AWS Glue, Amazon Elasticsearch Service, Amazon DynamoDB and Amazon CloudWatch Logs to provide data storage, management and audit functions.
  • AWS Lambda - For microservices.
  • Amazon Elasticsearch Service - For robust search capabilities.
  • Amazon Cognito - For user authentication.
  • AWS Glue - For data transformation.
  • Amazon Athena - For analysis.
  • Amazon S3 - For storage, leveraging the security, durability and scalability of S3.
  • Amazon DynamoDB - For managing metadata.
  • AWS KMS - For security (encryption key management).
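A microservice in this architecture is a Lambda function invoked through API Gateway. The sketch below shows the general shape of such a handler with Lambda proxy integration; the `GET /datasets/{dataset_id}` route and the in-memory `DATASET_METADATA` (a stand-in for the DynamoDB metadata table) are hypothetical, and this is not the code of the AWS solution.

```python
import json
from typing import Any, Dict

# Hypothetical in-memory stand-in for the DynamoDB metadata table.
DATASET_METADATA = {
    "ds-001": {"name": "clickstream-2020-01", "s3_prefix": "raw/clickstream/"},
}


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda handler for a hypothetical GET /datasets/{dataset_id} route."""
    # With proxy integration, path parameters arrive in the event (or are null).
    dataset_id = (event.get("pathParameters") or {}).get("dataset_id")
    item = DATASET_METADATA.get(dataset_id)
    if item is None:
        return {"statusCode": 404, "body": json.dumps({"error": "dataset not found"})}
    # API Gateway expects statusCode, optional headers and a string body.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(item),
    }
```

Keeping each route in a small, stateless function like this is what makes the layer serverless: API Gateway, Cognito and Lambda handle routing, authentication and scaling.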


AWS Lake Formation

At re:Invent 2018, AWS announced a new managed service for data lake deployment called AWS Lake Formation, which simplifies data lake deployment even further.
AWS Lake Formation was initially available only in limited preview; since August 2019 it has been generally available.


Datalake-5.png

Image Source- Amazon Web Services


With Lake Formation, the user just needs to define where the data should reside and which data access and security policies to apply. Lake Formation then automatically takes care of the following, using machine learning algorithms:

  • Import data from databases in AWS, from external data sources and from other AWS sources into the Amazon S3 data lake.
  • Catalog, label and transform data.
  • Clean and deduplicate data.
  • Take care of security by enforcing encryption, managing access control and implementing audit logging.
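Access policies in Lake Formation are expressed as grants on catalog resources. The sketch builds keyword arguments for boto3's `lakeformation` client `grant_permissions` call to give a principal `SELECT` on one table; the helper name, role ARN, database and table names are hypothetical.

```python
from typing import Any, Dict, Sequence

# Table-level permissions (a subset of the Lake Formation permission values).
TABLE_PERMISSIONS = {"ALL", "SELECT", "ALTER", "DROP", "DELETE", "INSERT", "DESCRIBE"}


def grant_table_params(principal_arn: str, database: str, table: str,
                       permissions: Sequence[str] = ("SELECT",)) -> Dict[str, Any]:
    """Keyword arguments for boto3 lakeformation grant_permissions on one table."""
    unknown = set(permissions) - TABLE_PERMISSIONS
    if unknown:
        raise ValueError(f"unsupported table permissions: {sorted(unknown)}")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


# Hypothetical principal and catalog names, for illustration only.
params = grant_table_params("arn:aws:iam::111122223333:role/AnalystRole",
                            "example_datalake_db", "clickstream")
```

With credentials configured, `boto3.client("lakeformation").grant_permissions(**params)` would apply the grant centrally, instead of editing S3 bucket policies and IAM policies one by one.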

Once this processed and transformed data is available, any analytics or machine learning service, such as Amazon EMR for Apache Spark, Amazon Redshift, Amazon Athena, Amazon QuickSight and Amazon SageMaker, can be used to draw meaningful insights from this data.
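As one example of consuming the lake, an Athena query is started with `StartQueryExecution`. The sketch builds keyword arguments for boto3's `athena` client `start_query_execution`; the SQL, database, table and S3 results location are hypothetical.

```python
from typing import Any, Dict


def athena_query_params(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Keyword arguments for boto3 athena start_query_execution."""
    if not output_location.startswith("s3://"):
        raise ValueError("Athena writes query results to an S3 location")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical table, database and results bucket, for illustration only.
params = athena_query_params(
    "SELECT page, COUNT(*) AS views FROM clickstream GROUP BY page",
    "example_datalake_db",
    "s3://example-athena-results/queries/",
)
```

Athena runs queries asynchronously: `start_query_execution(**params)` returns a `QueryExecutionId`, whose state can be polled with `get_query_execution` before the rows are fetched with `get_query_results`.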



-- This article is written under Atul's guidance.