Tech scrap book: January 2016

Sunday, January 3, 2016

SAML explained

You might have seen a lot of web sites / services that ask you to log in through a login page you have already used somewhere else.

To understand SAML, you might need to understand the following terms.

Identity Provider (IP): The company / enterprise holding the user credentials.
Service provider (SP): The company providing the service - say Salesforce.
User: The person who is going to use the service.

Traditionally the credentials are maintained by the Identity provider, but a second set of credentials is also maintained in the Service provider, so that whenever the user tries to access the SP he is authenticated within the service provider itself.
You might have seen this on sites that ask you to register with a username and password before you can access the service.

The problems with this approach are:
- the service provider has to maintain user credentials even though the credentials already exist in the Identity provider.
- the user has to remember different credentials for the IP and for each SP.
- the identity provider has to follow a cumbersome process of disconnecting access to the SPs whenever the user leaves the company or his role changes.

How it's solved.

Typically, the user requests access to the service (SP), the SP redirects him to the Identity provider (IP), and the Identity provider validates his userid and password.
The Identity provider then provides a token (a document which contains user information - but not the password) to the service provider.
The service provider now knows that it's a valid user and allows him to access the service.

With this approach, the SP doesn't have to maintain passwords, the user has to remember only one password, and it's easy for the IP to manage user information and keep control over which services the user can access.
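To make the redirect-and-token idea concrete, here is a toy sketch of the flow in Python. It is not SAML: the user store, the key and the function names are all made up for illustration, and the token is signed with a shared HMAC key only to keep the example short (real SAML uses XML signatures made with the IP's private key).

```python
import hashlib
import hmac
import json

# Hypothetical signing key, shared here only to keep the sketch short.
IP_SIGNING_KEY = b"demo-signing-key"

# Hypothetical user store. It lives only at the Identity Provider.
IP_USERS = {"alice": "s3cret"}


def ip_login(username, password):
    """Identity Provider: check the credentials and return a signed token, or None."""
    if IP_USERS.get(username) != password:
        return None
    # The token carries user information but never the password.
    payload = json.dumps({"user": username}, sort_keys=True)
    signature = hmac.new(IP_SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def sp_access(token):
    """Service Provider: allow access only when the token was signed by the IP."""
    if token is None:
        return "redirect to IP login page"
    expected = hmac.new(IP_SIGNING_KEY, token["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["signature"]):
        return "access denied"
    return "welcome " + json.loads(token["payload"])["user"]
```

Notice that sp_access never sees a password; it only checks that the token really came from the IP.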

Now what is SAML:
SAML (Security Assertion Markup Language) defines the token we talked about. It's an XML-based data format for exchanging user information between the Identity provider and the service provider.
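To get a feel for what such an XML token looks like, here is a minimal sketch that builds a heavily simplified assertion with Python's standard library. The element names and the namespace come from SAML 2.0, but the function name and sample values are my own, and a real assertion also carries an ID, an issue timestamp and a digital signature, all of which are left out here.

```python
import xml.etree.ElementTree as ET

SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
ET.register_namespace("saml", SAML_NS)


def build_assertion(issuer, user_id, not_on_or_after):
    """Build a simplified, unsigned SAML-style assertion and return it as text."""
    assertion = ET.Element("{%s}Assertion" % SAML_NS, {"Version": "2.0"})
    # Who issued the token (the Identity Provider).
    ET.SubElement(assertion, "{%s}Issuer" % SAML_NS).text = issuer
    # Who the token is about (the user). No password is included.
    subject = ET.SubElement(assertion, "{%s}Subject" % SAML_NS)
    ET.SubElement(subject, "{%s}NameID" % SAML_NS).text = user_id
    # Until when the token may be accepted.
    ET.SubElement(assertion, "{%s}Conditions" % SAML_NS, {"NotOnOrAfter": not_on_or_after})
    return ET.tostring(assertion, encoding="unicode")
```

Calling build_assertion("https://ip.example.com", "alice@example.com", "2016-01-03T12:00:00Z") returns a small XML document with the issuer, the user and the expiry condition.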

When the user is authenticated by the IP, the user can then access the services with the token provided by the IP.

How long can the user access the service with the token provided by the IP? What if the user leaves the company?
The token issued by the IP also has an expiry time. When the SP receives the token, it is valid only until that time; after that the service provider redirects the user back to the identity provider, and the user has to get another token with a later expiry time. If the user has left the company, the IP will not issue a new token, so access ends once the current token expires.
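The expiry check on the SP side can be sketched in a few lines. In SAML the expiry is carried in an attribute called NotOnOrAfter; the function names and return strings below are hypothetical and only illustrate the decision the SP makes on each request.

```python
from datetime import datetime, timezone


def token_is_valid(not_on_or_after, now=None):
    """Return True while the current time is still before the token's expiry."""
    expiry = datetime.strptime(not_on_or_after, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # "Not on or after" means the token is already invalid at the expiry instant.
    return now < expiry


def handle_request(not_on_or_after, now=None):
    """Service Provider: serve the request or send the user back to the IP."""
    if token_is_valid(not_on_or_after, now):
        return "serve request"
    return "redirect to IP for a new token"
```

The optional now argument is there only so the check can be tried with fixed timestamps.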

What does SAML describe:
- the structure of the data.
- how the data should be transported between the IP and the SP.

How does communication happen between the IP and the SP:

In the above picture you can see the sequence in which the user is authenticated and then allowed to access the resource in the SP.

Friday, January 1, 2016

AWS Glacier - Overview

Glacier storage service is meant for storing archival data. It is best suited for data that is not frequently accessed.

It is priced very low compared to S3. Accessing an object in Glacier is very slow - for example, a retrieval job can take several hours before the data is available.
Integrates well with S3 lifecycle policies.
With Glacier you don't have to worry about server provisioning, and you can store as much data as you want - as usual.

A single archive can be as large as 40 TB.

You can configure Glacier to behave as "write once, read many" storage to prevent losing data by mistake.
You can also set data retrieval policies such as a maximum retrieval rate. You can also see which users have accessed the vault over the last month, or identify who deleted data.

You can use the AWS console or the AWS SDK APIs to store data in Glacier.
You can also set up lifecycle policies in S3 so that data is archived from S3 to Glacier on a schedule, for example every month. The AWS Import/Export product can be used to accelerate moving large amounts of data.

Pricing is based on how much data you plan to store and how often you plan to retrieve it. Data transfer costs and PUT requests also matter, but they are low.

For example, if you want to store 100GB of data with no retrieval, at roughly $0.007 per GB per month it would cost approximately 70 cents per month.
You can use the AWS calculator to calculate how much it is going to cost you based on your data requirements.

Amazon S3 - overview

S3 is a storage product from AWS that helps users store documents such as PDFs, PNGs and any other files. An object in S3 can be thought of as a single file.

Characteristics:
- an object can be up to 5TB.
- you can encrypt data before saving it to S3 and decrypt it when you download it to your desktop.
- supports unlimited data upload and multi-threaded uploads.
- High availability: automatically stores data in multiple zones within your region.
- you can add lifecycle policies to a bucket; for example, you can configure data to move to another storage class such as Glacier 2 months after upload.
- supports versioning of objects within S3, which means when you accidentally delete an object, you can recover it by accessing an old version of the object.
- secured storage, which means you can set privileges so that only you can store or access the data.
- you can store as much data as you want; it will scale based on your needs.
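The lifecycle policy mentioned in the list above could look roughly like this with boto3, the AWS SDK for Python. This is a sketch under assumptions: the rule ID, the function names and the 60-day figure (standing in for "2 months") are mine, and the second function needs boto3 installed and AWS credentials configured.

```python
def glacier_transition_rule(days_after_upload, prefix=""):
    """Build a lifecycle configuration that moves objects to Glacier after N days."""
    return {
        "Rules": [
            {
                "ID": "archive-to-glacier",    # hypothetical rule name
                "Filter": {"Prefix": prefix},  # empty prefix = every object in the bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_after_upload, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


def apply_rule(bucket_name, config):
    """Attach the lifecycle configuration to a bucket (calls AWS)."""
    import boto3  # imported here so the builder above works without the SDK

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name, LifecycleConfiguration=config
    )
```

For example, apply_rule("my-example-bucket", glacier_transition_rule(60)) would archive everything in that (made-up) bucket about two months after upload.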

What operations can you perform using S3:
- CRUD operations on buckets and on the objects in them.
- you can choose the region you want to store the data in and use the S3 APIs to store the data.
- you can send event notifications whenever someone uploads data into a bucket.
- can be integrated with other AWS services; for example, whenever an object is loaded into a bucket, the data can be forwarded to Kinesis (typically through a Lambda function triggered by the event notification).
- use S3 as backup storage.
- perform disaster recovery of your systems by storing backup images in S3.
- host static web sites or files.
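As an illustration of the event notifications in the list above, here is a small sketch of a handler that reads the bucket name and object key out of an S3 event. It assumes the notification is delivered to an AWS Lambda function written in Python, which is my assumption and only one of the possible targets; the function names are made up.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification, so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append((bucket, key))
    return results


def lambda_handler(event, context):
    """Entry point Lambda would call; here it only logs each new object."""
    objects = uploaded_objects(event)
    for bucket, key in objects:
        print("new object: s3://%s/%s" % (bucket, key))
    return len(objects)
```

From here the handler could pass the object on to any other service you like.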

S3 pricing:
- Storage cost: you pay approximately $0.03 per GB per month for the first 1 TB.
- Request pricing:
  - You pay $0.004 per 10,000 requests for all GET requests.
  - You pay $0.005 per 1,000 requests for all PUT, COPY and POST requests. DELETE requests are not charged.
- Data transfer pricing:
  - Data transfer IN to S3 is not charged.
  - Data transfer OUT within the same region is not charged.
  - Data transfer OUT to another region or over the internet is charged: the first 1 GB is free, after that it is charged at 9 cents per GB.
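To see how these numbers add up, here is a small back-of-the-envelope estimator using only the rates quoted above. The function name and the sample workload are made up, and it only covers the first 1 TB storage tier, so treat it as a rough sketch and use the AWS calculator for real figures.

```python
# Rates as quoted in this post (they change over time).
STORAGE_PER_GB = 0.03        # per GB per month, first 1 TB tier only
GET_PER_10K = 0.004          # per 10,000 GET requests
PUT_PER_1K = 0.005           # per 1,000 PUT/COPY/POST requests
TRANSFER_OUT_PER_GB = 0.09   # per GB out, after the free allowance
FREE_TRANSFER_GB = 1


def estimate_monthly_cost(storage_gb, get_requests, put_requests, transfer_out_gb):
    """Rough monthly S3 bill in dollars for a workload under 1 TB of storage."""
    storage = storage_gb * STORAGE_PER_GB
    requests = get_requests / 10000 * GET_PER_10K + put_requests / 1000 * PUT_PER_1K
    billable_transfer = max(0, transfer_out_gb - FREE_TRANSFER_GB)
    transfer = billable_transfer * TRANSFER_OUT_PER_GB
    return round(storage + requests + transfer, 4)
```

For example, estimate_monthly_cost(100, 10000, 1000, 11) works out to $3.00 for storage, less than one cent for requests and $0.90 for transfer, so about $3.91 in total.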

You can use the AWS calculator to approximately determine how much it will cost for the services you choose based on the data you provide.
http://calculator.s3.amazonaws.com/index.html

Full pricing information can be found here - https://aws.amazon.com/s3/pricing/

How do you operate on S3:
- use the AWS console to operate on S3.
- use the AWS SDK, which supports different languages such as Java, .NET, Python, PHP, Node.js and Ruby. You can also use the AWS Mobile SDK to upload data directly into S3 from a mobile device.
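As a small taste of the SDK route, here is a sketch of an upload helper in Python. It takes the S3 client as a parameter so it can be tried without touching AWS; with boto3 (the Python SDK) the client would come from boto3.client("s3"). The helper name, bucket and key are made up for illustration.

```python
def upload_text(s3_client, bucket, key, text):
    """Store a small text document as an S3 object and return its s3:// URL."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=text.encode("utf-8"))
    return "s3://%s/%s" % (bucket, key)


# With boto3 installed and AWS credentials configured, usage would look like:
#   import boto3
#   upload_text(boto3.client("s3"), "my-example-bucket", "notes/hello.txt", "hello")
```

The other SDKs follow the same idea: create a client, then call a put-object style operation with the bucket, the key and the data.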

S3 FAQ: https://aws.amazon.com/s3/faqs/