Wednesday, April 22, 2015

Implementing SSL

SSL Basics

SSL and TLS are technologies that allow web browsers and servers to communicate over a secured connection (i.e., the data is encrypted by one side, transmitted, and then decrypted by the other side before processing).

Client authentication: the server may also request a certificate from your web browser, as proof that you are who you claim to be.

CERTIFICATE: In order to implement SSL, the web server must have an associated certificate for each external IP address.
  • You can think of a certificate as the digital driver's licence of a web server.
  • It states which company the site is associated with, along with basic information about the site owner or administrator.
  • A certificate is extremely difficult for anyone else to forge.
  • A certificate is issued by a well-known Certificate Authority (CA) such as VeriSign, Thawte or Symantec.

Overall process: 

  • The site owner purchases the certificate from a CA.
  • The site owner installs and configures the certificate on the web server.
  • When an end user attempts to access a secured page on your site, the browser receives the certificate. If the browser cannot verify it, the user is asked whether to accept the certificate as valid and continue with the transaction.
  • The data is then encrypted and sent to the server.
  • The server decrypts the data and processes it.
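
The browser side of the process above can be sketched with Python's standard ssl module. This is only a minimal illustration, not how any particular browser works: the helper name make_client_context is made up for this example, and no connection is opened.

```python
import ssl


def make_client_context():
    """Build a TLS client context that checks the server's certificate.

    make_client_context is a hypothetical helper name used only in this sketch.
    """
    # create_default_context() loads the CA certificates trusted by the system
    # and enables both certificate verification and hostname checking.
    return ssl.create_default_context()


context = make_client_context()

# With these settings a handshake fails unless the server presents a
# certificate that chains to a trusted CA and matches the requested hostname.
print("certificate required:", context.verify_mode == ssl.CERT_REQUIRED)
print("hostname checked:", context.check_hostname)
```

To actually talk to a server you would wrap a connected socket with context.wrap_socket(sock, server_hostname=...), after which everything sent over that socket is encrypted.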

What is a self-signed certificate:
  • Self-signed certificates are user-generated certificates which are not officially registered with any well-known CA.
  • They are not guaranteed to be authentic at all.
  • This may or may not be important to you, depending on your needs.
A web site that handles credit card transactions might want a certificate from a popular vendor like VeriSign to reassure customers, whereas a site that displays recipes might go for a lesser-known vendor.

Self-signed certificates are normally used for testing purposes.

Tuesday, April 21, 2015

AWS Storage services

AWS provides several storage services that applications can use to store their data.

The following types of storage are supported by AWS:
  1. Instance storage
  2. Elastic Block Store (EBS)
  3. Simple Storage Service (S3)
  4. Glacier
  5. Elastic File System (EFS)

Instance storage: 

The data is stored directly with the instance. If the instance is terminated, the data is lost as well.

Elastic Block Store (EBS):

If you want the data to be persisted irrespective of the instance life cycle, then you can go for EBS option. 

There are 3 different volume types in EBS: 
  • Magnetic - still around (based on magnetic disks)
  • General Purpose - default one
  • Provisioned IOPS - premium offering, extremely fast. 

You can think of EBS as an external hard drive where the data remains available even if the system is switched off. When you create an EC2 instance, you can choose the EBS option, pick a size, and attach the volume to the instance.

Simple Storage Service (S3):

S3 is a storage-as-a-service solution from Amazon that lets users upload and download content using the AWS REST APIs. S3 data can be replicated across different regions. You pay only for the storage you use.

Glacier: 

Glacier is similar to S3 in that it offers storage as a service, but it is meant for archival data that is read infrequently. You can store large volumes of data and pay very little for it, but you might need to wait for hours to retrieve the data.

Elastic File System (EFS):

EFS is a new file storage service from Amazon where the storage capacity is elastic, growing and shrinking automatically as you add and remove files, so applications always have the storage they need. Multiple EC2 instances can access the same Amazon EFS file system.

AWS Glacier - Basics

What is Amazon Glacier?

Amazon Glacier is another storage solution from Amazon. It is a secure, low-cost storage service mainly used for data archiving and backup.

Glacier is optimised for infrequently accessed data where slower retrieval is acceptable, whereas S3 is meant for frequently accessed data. A single archive can be as large as 40 TB.

Characteristics: 

  • Extremely low cost.
  • Meant for storing archival data where retrieval is not frequent.
  • Secure: supports data transfer over SSL.
  • Durable: your data is stored redundantly across multiple facilities.
  • Auditable: logs record who accessed which Glacier data.

Uploading data to AWS: 

  • AWS provides SDKs which help you transfer large amounts of data.
  • If you want to transfer large amounts of data from your corporate network, you can use the AWS Import / Export feature and AWS Direct Connect.

Downloading data from Amazon:

  • Downloading data is an asynchronous operation. First, Glacier prepares your archive, which takes hours. You then have 24 hours to download the data from the staging location.

How pricing works: 

You normally pay for the following:
  • Data storage (approx. $0.01 per GB per month)
  • Requests
    • Upload and retrieval requests: $0.05 per 1,000 requests.
  • Data transfer
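
As a rough worked example of the figures above, the sketch below estimates a monthly Glacier bill. The function name and the sample usage numbers are made up for illustration, the storage rate is assumed to be per month, and data transfer is left out because no rate is quoted here.

```python
# Rates quoted in the list above (April 2015); treat them as approximate.
STORAGE_USD_PER_GB_MONTH = 0.01
REQUEST_USD_PER_1000 = 0.05


def glacier_monthly_cost(stored_gb, requests):
    """Estimate storage plus request charges; data transfer is not included."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    request_fees = requests / 1000 * REQUEST_USD_PER_1000
    return storage + request_fees


# Hypothetical usage: 500 GB archived, 2,000 upload/retrieval requests.
glacier_estimate = glacier_monthly_cost(stored_gb=500, requests=2000)
print(f"Estimated monthly Glacier cost: ${glacier_estimate:.2f}")
```

For this sample usage the storage part is $5.00 and the request part is $0.10.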

Up-to-date pricing information is available on the Amazon Glacier pricing page.

More about Glacier can be found on the Amazon Glacier home page.

AWS S3 basics - Getting started

What is AWS S3:

S3 (Simple Storage Service) is a storage solution from Amazon where you can store objects such as images, videos, machine images and other documents. With S3, you pay for the storage and the bandwidth. S3 is very reliable and is designed for 99.99% availability.

You might use S3 to store the static content of a web site, or the data generated by a project, so that it is available globally. S3 is often used with other AWS products such as EC2.

It is very easy to use. The following are the different ways in which you can store and retrieve data:
  • Using the command line interface.
  • Using the AWS console.
  • Using the AWS SDK.

Characteristics: 

  • Data is stored as objects, and objects are stored in containers called buckets.
  • You may store as many objects as you want in a bucket. 
  • You can read, upload and delete objects in your buckets. 
  • You can control access to the bucket. 
  • Supports versioning of data objects. 

Main features: 

  • Cross region replication. 
  • Event notifications (for example, whenever an object is placed in S3) can be sent to SQS, Lambda, etc.

How pricing works in S3: 

  • With S3, you pay only for the storage you use. There is no minimum fee or setup cost. As of April 2015, you pay 3 cents per GB per month for the first 1 TB.
  • You also pay for PUT, POST, COPY and GET requests.
      • GET: $0.004 per 10,000 requests.
      • PUT, POST: $0.005 per 1,000 requests.
  • There are no charges for delete requests.
  • You also pay for data transfer.
Prices change every now and then, so check the S3 pricing page for the current figures.
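
To make the pricing above concrete, here is a small estimate using only the April 2015 figures quoted in this post. The function name and the sample usage are hypothetical, data transfer is left out because no rate is given, and the sketch refuses to guess the rate beyond the first 1 TB.

```python
GB_PER_TB = 1024
STORAGE_USD_PER_GB_MONTH = 0.03   # first 1 TB, as quoted above
GET_USD_PER_10000 = 0.004
PUT_POST_USD_PER_1000 = 0.005


def s3_monthly_cost(stored_gb, get_requests, put_post_requests):
    """Estimate storage and request charges; delete requests are free."""
    if stored_gb > GB_PER_TB:
        # The post only gives the rate for the first 1 TB.
        raise ValueError("rate beyond the first 1 TB is not covered here")
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    gets = get_requests / 10_000 * GET_USD_PER_10000
    puts = put_post_requests / 1_000 * PUT_POST_USD_PER_1000
    return storage + gets + puts


# Hypothetical usage: 100 GB stored, 50,000 GETs and 2,000 PUT/POST requests.
s3_estimate = s3_monthly_cost(100, get_requests=50_000, put_post_requests=2_000)
print(f"Estimated monthly S3 cost: ${s3_estimate:.2f}")
```

For this sample usage the bill works out to $3.00 for storage, $0.02 for GETs and $0.01 for PUT/POST requests.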

Get started: 

  • Sign up for an AWS account.
  • Go to the AWS console and choose S3.
  • Create buckets.
  • Upload objects to the buckets.
  • Assign permissions to the buckets.
  • Create events for the buckets.

Programming with S3:

AWS provides SDKs for Java, PHP, .NET, Python, Node.js and Ruby, as well as the AWS Mobile SDK. You can use these APIs to create and retrieve documents.

Visit the S3 home page for more information:
http://aws.amazon.com/s3/

Sunday, April 19, 2015

AWS Lambda - Basics

What is Lambda:

AWS Lambda is a service that runs your code in response to events without you having to manage the infrastructure (you don't have to worry about the server at all). It responds to events such as an image upload or a web site click.

What is a Lambda function:

The code (i.e., function) you run on AWS Lambda is called a Lambda function. Lambda functions are stateless. You can write functions that respond to events from services such as S3, a DynamoDB table, Kinesis or an SNS notification.

What language should I learn: 

  • JavaScript.

How is a Lambda function called:

A Lambda function is called using one of two approaches:
  • Push:
An external system asynchronously triggers Lambda to do something.

  • Pull:
Lambda watches a data stream for changes and invokes the function itself. For example, it could pick up a change in the data in DynamoDB or Kinesis.

You can also invoke Lambda functions using the AWS API or the command line.


Features: 

  • Run your code without managing infrastructure.
  • Respond to events quickly, within milliseconds of an event.
  • Thousands of functions can run in parallel.
  • Runs your code only when needed.
  • Cost effective and efficient: charges a low fee per request.
  • Automatic scaling (no limit on the number of requests your code can handle).
  • You can also choose the amount of memory allocated to your Lambda function.

Where is Lambda normally used:

  • You can watch for a pattern change and trigger an alert (e.g., from data stream updates from Kinesis or DynamoDB).
  • Perform nightly archive cleanups.
  • It can be triggered by events from connected devices; for example, you can create an SNS notification in response to an event from a smart thermostat.

OK, where do I write code:

You can write the function code directly in the AWS Lambda console, or you can upload the code. Uploading is required mainly if your code depends on some libraries and you want to upload the code together with those libraries.

How do I write my code locally and upload it:

  • Install the required libraries using npm install commands.
  • Create the JavaScript code.
  • Zip the entire folder, which includes the code and the libraries, and upload it.
  • Attach a role to the function (you can do this in the console). This is required since the function might need permission to use AWS services (such as reading or writing an image in S3).

OK, how do I test my function:

The Lambda console provides an option to write a test event and test the function with it.

Hmmm, how do I know how many requests have been made:

Lambda provides a console where you can see the request count, request duration and execution error count.

Limitations: 

  • The only available runtime is Node.js.
  • You need to be a JavaScript programmer.
  • Only standard AWS event sources are available: S3, Kinesis, DynamoDB.
  • Debugging can be tricky.


Costing: 

  • You are charged based on the number of requests and the time taken by the code to execute.
  • The free tier includes 1 million free requests per month.
  • $0.20 per 1 million requests thereafter.
  • The price also depends on the time taken to execute and how much memory the function uses.
  • You are charged $0.00001667 for every GB-second.
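
The figures above can be combined into a rough monthly estimate. This is a sketch under stated assumptions: the function name and sample workload are made up, a GB-second is taken to mean one second of execution with 1 GB of memory allocated, and any free compute allowance or rounding of execution time is ignored because this post does not describe them.

```python
FREE_REQUESTS_PER_MONTH = 1_000_000
USD_PER_MILLION_REQUESTS = 0.20
USD_PER_GB_SECOND = 0.00001667


def lambda_monthly_cost(requests, avg_duration_s, memory_mb):
    """Estimate request and compute charges for one month."""
    # Only requests beyond the free tier are billed.
    billable_requests = max(0, requests - FREE_REQUESTS_PER_MONTH)
    request_cost = billable_requests / 1_000_000 * USD_PER_MILLION_REQUESTS
    # Compute is billed on execution time multiplied by allocated memory in GB.
    gb_seconds = requests * avg_duration_s * (memory_mb / 1024)
    compute_cost = gb_seconds * USD_PER_GB_SECOND
    return request_cost + compute_cost


# Hypothetical workload: 3 million requests, 0.2 s each, 512 MB of memory.
lambda_estimate = lambda_monthly_cost(3_000_000, avg_duration_s=0.2, memory_mb=512)
print(f"Estimated monthly Lambda cost: ${lambda_estimate:.2f}")
```

For this workload, 2 million billable requests cost $0.40 and 300,000 GB-seconds cost about $5.00.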

Some interesting links: 

Home page: https://console.aws.amazon.com/lambda/home?region=us-east-1#/

Saturday, April 18, 2015

AWS Beanstalk basics

What is AWS Beanstalk:

AWS Elastic Beanstalk is an AWS product for deploying and managing web applications on Amazon infrastructure. Beanstalk manages the application stack for you, so you don't have to spend time managing and configuring servers, databases, load balancers, firewalls and networks. Beanstalk also takes care of autoscaling. You can use languages such as Java, .NET, PHP, Node.js, Python and Ruby.

Once you configure and deploy your application, AWS takes care of capacity provisioning, load balancing, autoscaling and health monitoring.

Beanstalk depends on other AWS services such as Amazon EC2, Auto Scaling, ELB and S3. You can use CloudWatch to monitor the performance of the infrastructure.

How do I get started:

You can use the AWS Elastic Beanstalk CLI or the Beanstalk API. AWS also provides toolkits.

Features:

  • Directly deploy applications from your local desktop to the cloud using the AWS CLI or Eclipse plugins.
  • Monitor performance metrics of the instances using CloudWatch.
  • Autoscale applications and balance load using Auto Scaling and Elastic Load Balancing.
  • Access server log files without logging into the server.
  • Choose the instance configuration, such as memory size and disk space.
  • Implement HTTPS on the load balancer in an easy way.

Supported platforms:

  • Java applications using Apache Tomcat.
  • PHP applications using Apache HTTP Server.
  • Python applications using Apache HTTP Server.
  • Node.js applications.
  • Ruby applications.
  • .NET applications using Microsoft IIS.

AWS is working to extend support to more development stacks and programming languages.

Cost:

There is no additional charge for Elastic Beanstalk. You pay only for the AWS resources needed to store and run your applications, such as EC2, S3 and load balancing.

Friday, April 17, 2015

Cassandra basics


What is Cassandra:

Cassandra is a highly scalable open source NoSQL database. It is designed to handle big data workloads across multiple nodes with no single point of failure. Each node exchanges information across the cluster every second.

Characteristics:
  • Distributed in nature. It is based on a masterless architecture.
  • Every single node is independent and shares nothing with the other nodes. Each node is responsible for a portion of the dataset. If you need more capacity, add more nodes.
  • Replicated. A client writes to a local node and the data is synchronised across the other nodes.

What happens when you write to or read from Cassandra:
  • A sequentially written commit log on each node captures write activity to ensure data durability. 
  • Data is then indexed and written to an in-memory structure, called a memtable, which resembles a write-back cache. 
  • Once the memory structure is full, the data is written to disk in an SSTable data file.
  • Client read or write requests can be sent to any node in the cluster. When a client connects to a node with a request, that node serves as the coordinator for that particular client operation. 
  • The coordinator acts as a proxy between the client application and the nodes that own the data being requested. 
  • The coordinator determines which nodes in the ring should get the request based on how the cluster is configured.
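
The write path described above (commit log, then memtable, then a flush to an SSTable) can be mimicked with a toy model. This is not Cassandra code: the ToyNode class, its memtable_limit parameter and the in-memory lists standing in for files on disk are all inventions for this sketch.

```python
class ToyNode:
    """A toy, single-node imitation of the commit log / memtable / SSTable flow."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only record of every write, for durability
        self.memtable = {}     # in-memory structure holding recent writes
        self.sstables = []     # sorted snapshots that stand in for files on disk
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # 1. Record the write sequentially in the commit log.
        self.commit_log.append((key, value))
        # 2. Apply it to the in-memory memtable.
        self.memtable[key] = value
        # 3. Once the memtable is full, flush it as a sorted table.
        if len(self.memtable) >= self.memtable_limit:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def read(self, key):
        # Check the newest data first: the memtable, then the latest SSTables.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for stored_key, stored_value in table:
                if stored_key == key:
                    return stored_value
        return None


node = ToyNode(memtable_limit=2)
node.write("b", 1)
node.write("a", 2)   # memtable is now full, so it is flushed
node.write("b", 3)   # newer value for "b" lives in the fresh memtable
print(node.sstables, node.memtable, node.read("b"), node.read("a"))
```

The flushed tables are never modified afterwards, which mirrors the append-only nature of SSTables; a read simply prefers the newest copy of a key.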

Key terminologies: 
  • Node: where the data is stored. 
  • Data center: Collection of nodes. 
  • Cluster: contains one or more data centers and can span physical locations.
  • Commit log: every write is recorded in the commit log for durability.
  • SSTable: a sorted string table, which is append-only and stored on disk sequentially.

Key components: 
  • Gossip: a peer-to-peer communication protocol which nodes use to discover and share information about other nodes in the cluster. The gossip process runs every second and exchanges state messages with up to three other nodes in the cluster, so all nodes eventually learn about all other nodes in the cluster.
  • Partitioner: determines how to distribute data across the nodes in the cluster. Each row of data is identified by a partition key and distributed across the cluster.
  • Replication factor: determines how many copies of the data are stored in the cluster.
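
To illustrate how a partitioner and a replication factor work together, here is a simplified sketch. It is not Cassandra's actual algorithm: hashing with MD5, placing one token per node on a ring and walking clockwise for replicas are simplifying assumptions, and the node names and the replicas_for function are hypothetical.

```python
import hashlib
from bisect import bisect_right


def token(value):
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas_for(partition_key, nodes, replication_factor):
    """Return the nodes that would hold copies of the row with this key."""
    # Place every node on the ring at the position given by its own token.
    ring = sorted((token(name), name) for name in nodes)
    tokens = [position for position, _ in ring]
    # The first replica is the next node clockwise from the key's token.
    start = bisect_right(tokens, token(partition_key)) % len(ring)
    # Further replicas are the following nodes on the ring.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
owners = replicas_for("user:42", nodes, replication_factor=3)
print(owners)
```

The same key always maps to the same nodes, so any coordinator can work out where a row lives without asking a central master.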

——————————————————————————————
Installing
——————————————————————————————