Saturday, July 11, 2015

Amazon Kinesis Overview

What is Kinesis: 
Kinesis is a managed cloud service from Amazon for processing real-time data over large, distributed data streams. The data can be captured from a variety of sources such as website clickstreams, social feeds, financial transactions, sensor data, IT logs and location-tracking events.

Features: 
  • Manages high-volume streams of data coming into Kinesis.
  • Can emit data to other AWS services such as S3, Redshift, Lambda etc.
  • Client libraries support different languages such as Java, Python, Ruby etc.
  • Durable: preserves the data for 24 hours.
  • Elastic and highly scalable.
  • You can also use the Kinesis connector library to integrate Kinesis with other AWS services such as S3, Redshift, DynamoDB etc.
  • Parallel processing: you can have multiple Kinesis applications processing the same stream concurrently.
  • Replicates your data across 3 facilities in an AWS region.

Architecture: 

What is a shard: A shard is the base throughput unit of an Amazon Kinesis stream.
  • 1 shard can accept 1 MB/sec of data input.
  • 1 shard can provide 2 MB/sec of data output.
  • 1 shard can accept up to 1000 PUT records per second.
When you create a data stream, you have to specify how many shards you need. You can dynamically add or remove shards as your data throughput changes; this is called resharding.
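
As a rough sizing aid, the per-shard limits above can be turned into a back-of-the-envelope estimate. The sketch below is only an illustration, not an official formula: the function name `shards_needed` and its parameters are made up for this post, 1 MB is treated as 1000 KB, and it assumes every consuming application reads the whole stream.

```python
import math

# Per-shard limits quoted in the text above (1 MB treated as 1000 KB here).
WRITE_KB_PER_SEC = 1000
READ_KB_PER_SEC = 2000
WRITE_RECORDS_PER_SEC = 1000


def shards_needed(records_per_sec, avg_record_kb, consumers=1):
    """Estimate the shard count for a stream (hypothetical helper)."""
    write_kb = records_per_sec * avg_record_kb
    # Assumption: each consumer application reads the full stream.
    read_kb = write_kb * consumers
    return max(
        1,
        math.ceil(write_kb / WRITE_KB_PER_SEC),
        math.ceil(read_kb / READ_KB_PER_SEC),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
    )
```

For example, 5000 records per second of 2 KB each is bound by the 1 MB/sec input limit, so `shards_needed(5000, 2)` gives 10. The estimate takes the largest of the three limits because whichever one is hit first decides the shard count.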


A partition key is used to route data records to different shards. This key is specified by the producer when placing data into Kinesis.
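
To make the routing idea concrete: Kinesis hashes the partition key with MD5 into a 128-bit integer, and each shard owns a range of those hash values. The sketch below is a simplified local model, assuming the shards split the hash space into equal-width ranges; on a real stream each shard has an explicit hash key range that may not be equal-width after resharding. The function names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key):
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key, shard_count):
    """Pick a shard index, assuming equal-width hash key ranges."""
    return hash_key(partition_key) * shard_count // HASH_SPACE
```

The same key always lands on the same shard, so records for one user or device stay in order within that shard. A partition key with few distinct values can overload a single shard, which is why producers usually pick a key with many distinct values.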


The main steps in Kinesis development are:
  • Creating the data stream: You can use the Kinesis console to create a data stream. 
  • Pushing data to the stream: There are different ways to send data: 
    • HTTP POST
    • AWS SDK
    • AWS Mobile SDK
  • Receiving data from the stream, processing it and storing the results in the output database: 
    • Receiving: Receive data using the Kinesis Client Library (KCL).
    • Processing: 
      • Transformer: Performs any necessary data conversion. 
      • Filter: Validates data and removes unwanted records. 
      • Buffer: Buffers records before they are stored. 
    • Storing output data:
      • You can store in S3, Redshift or Glacier using the Kinesis connector libraries. 
      • You can also store in the NoSQL database of your choice. 
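
The transformer, filter and buffer stages above can be sketched in a few lines of plain Python. This is only a model of the idea, not the Kinesis connector library API: the names `transform`, `keep`, `Buffer` and `process`, the JSON record format and the `user`/`value` fields are all assumptions made for illustration.

```python
import json


def transform(raw):
    """Transformer: decode a raw record (bytes) into a dict."""
    return json.loads(raw.decode("utf-8"))


def keep(event):
    """Filter: keep only events that pass a simple validation."""
    return isinstance(event.get("user"), str) and event.get("value", -1) >= 0


class Buffer:
    """Buffer: collect events and hand them to `emit` in batches."""

    def __init__(self, emit, batch_size=3):
        self.emit = emit
        self.batch_size = batch_size
        self.items = []

    def add(self, event):
        self.items.append(event)
        if len(self.items) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.items:
            self.emit(self.items)
            self.items = []


def process(raw_records, buffer):
    """Run every record through transform, filter and buffer."""
    for raw in raw_records:
        event = transform(raw)
        if keep(event):
            buffer.add(event)
    buffer.flush()  # emit whatever is left over
```

Here `emit` stands in for the storage step: in a real application it would write the batch to S3, Redshift or a database instead of a local list.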

You can also use the Kinesis Storm spout (a pre-built library that helps you easily integrate Kinesis with Apache Storm).


Limitations: 
  • Records are retained for up to 24 hours from the time they are added to the stream. 
  • The maximum size of a data blob is 1 MB.
  • Each shard supports up to 1000 PUT records per second.
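
A producer can check the blob size limit on its own side before sending. The sketch below assumes `client` is a boto3 Kinesis client (created with `boto3.client("kinesis")`) and uses its `put_record` call; the helper name `put_event` is hypothetical, and 1 MB is taken here as 1,048,576 bytes.

```python
MAX_BLOB_BYTES = 1024 * 1024  # 1 MB data blob limit quoted above


def put_event(client, stream_name, partition_key, data):
    """Send one record after a client-side size check.

    `client` is assumed to be a boto3 Kinesis client; `data` is bytes.
    """
    if len(data) > MAX_BLOB_BYTES:
        raise ValueError("data blob exceeds 1 MB: %d bytes" % len(data))
    return client.put_record(
        StreamName=stream_name,
        Data=data,
        PartitionKey=partition_key,
    )
```

A call would look like `put_event(boto3.client("kinesis"), "example-stream", "user-42", b"hello")`, where the stream name and key are placeholders. Passing the client in as a parameter also makes the function easy to try out with a stub object instead of a real AWS connection.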

Your pricing and evaluation depend on the following questions: 
  1. How much data will your application put into Kinesis? 
  2. How frequently will it put data? 
  3. What are the maximum and minimum sizes of each record it generates? 
  4. Will the data be consumed immediately by the consumer? What is the maximum time it will wait in the stream? 

Pricing: 
  • Shard hour
    • 1 shard costs $0.015 per hour, so running 1 shard 24 * 7 costs $11.16 for a 31-day month (744 hours). 
    • If you are using 4 shards, it will cost you around $44 a month ($44.64).
  • PUT payload unit
    • 1 million PUT payload units cost $0.014.
    • 1 PUT payload unit is a 25 KB chunk of a record. 
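
The arithmetic can be written down as a small calculator. This is a rough sketch using only the prices quoted above, which may be out of date: the function names are hypothetical, a 31-day month is assumed by default, and each record is rounded up to whole 25 KB units.

```python
import math

SHARD_HOUR_USD = 0.015             # per shard per hour, as quoted above
PUT_UNITS_PER_MILLION_USD = 0.014  # per 1 million PUT payload units
PUT_UNIT_KB = 25


def put_payload_units(record_kb):
    """Each record is billed in 25 KB chunks, rounded up."""
    return max(1, math.ceil(record_kb / PUT_UNIT_KB))


def monthly_cost(shards, records_per_sec, record_kb, days=31):
    """Estimate the monthly bill in USD for a steady workload."""
    hours = 24 * days
    shard_cost = shards * hours * SHARD_HOUR_USD
    units = put_payload_units(record_kb) * records_per_sec * 3600 * hours
    put_cost = units / 1e6 * PUT_UNITS_PER_MILLION_USD
    return round(shard_cost + put_cost, 2)
```

With no traffic, one shard over a 31-day month comes to $11.16 and four shards to $44.64, matching the figures above. A 5 KB record counts as one PUT payload unit, while a 45 KB record counts as two.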

Please visit this link to get the latest pricing details, as Amazon changes its prices every now and then. 

PS: 
  • Amazon does not charge for data transfer from Kinesis to your Kinesis consumer applications. 

Why Kinesis:
  • Kinesis manages the infrastructure, storage and configuration needed to stream your data. 
  • Preserves data for 24 hours so that you don’t lose it, replicates it across 3 facilities, and is elastic and scalable. 
  • You don’t have to worry about provisioning or ongoing maintenance of hardware. 

What else you can do:
  • You can integrate with CloudWatch to view and analyse metrics that show how your streams behave. 
  • Integrate with IAM to ensure that only certain user groups can put messages into Kinesis. 
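
As an illustration of the IAM point, a policy that only allows writing to one stream could look like the document built below. This is a minimal sketch: the helper name `put_only_policy` is hypothetical, and the region, account ID and stream name in the ARN are placeholders. You would attach a policy like this to the user group that is allowed to produce data.

```python
import json


def put_only_policy(stream_arn):
    """Build an IAM policy document allowing only writes to one stream."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["kinesis:PutRecord", "kinesis:PutRecords"],
                "Resource": stream_arn,
            }
        ],
    }


# Placeholder ARN: region, account ID and stream name are made up.
EXAMPLE_ARN = "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream"
print(json.dumps(put_only_policy(EXAMPLE_ARN), indent=2))
```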

Applications of Kinesis: 
  • Social data processing. 
  • Processing sensor data from devices. 
  • IT log processing.
  • Processing gaming data feeds etc. 

