What is Storm
Storm is a distributed, real-time stream processing system. Work that would normally take a long time in a single process is delegated to different components, each responsible for performing one task.
At a high level, Storm comprises the following stages.
- Input: Data is received from different sources by a component called a spout. The sources can be files or messaging queues.
- Processing: The processing is done by components called bolts. The work can run on one node or be spread across different nodes.
- Output: Once the data is processed, the results can be stored in a database or in files.
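The three stages above can be sketched in plain Python. This is only a single-process illustration and does not use the Storm API: the names sentence_spout, split_bolt, count_bolt and run_pipeline are made up for this example, and the "sources" are just a list of strings instead of a real file or queue.

```python
from collections import Counter


def sentence_spout(lines):
    """Spout stand-in: emits one tuple per input line (a list here, not a real queue)."""
    for line in lines:
        yield (line,)


def split_bolt(stream):
    """Bolt stand-in: splits each sentence tuple into one lowercase word tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word.lower(),)


def count_bolt(stream):
    """Bolt stand-in: keeps a running count per word and returns the totals."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
    return counts


def run_pipeline(lines):
    """Wire input -> processing -> output and return what would be stored."""
    return dict(count_bolt(split_bolt(sentence_spout(lines))))


if __name__ == "__main__":
    print(run_pipeline(["storm processes streams", "Storm scales"]))
```

In a real cluster each stage would run as many parallel instances on different nodes. Here everything runs in one process so that the flow of tuples is easy to follow.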
Advantages:
- See results in real time while the Storm components handle the processing at high speed by spreading it across different nodes.
Applications of Storm:
- Process real-time data from different devices and analyse it as soon as it flows into the system.
- Live statistics.
- Build predictive models for real-time data.
- Build monitoring and alerting systems.
Why Storm:
- Simple to program.
- Support for multiple programming languages.
- Fault tolerant: handles workers going down and reassigns tasks when necessary.
- Scalable: can scale out across multiple nodes.
Operation modes:
- Local mode: the topology runs on a single machine.
- Remote mode: we submit our topology to the Storm cluster, which is composed of different processes, usually running on different machines.
Nodes:
- Master node
  - runs a daemon called Nimbus.
  - responsible for distributing code around the cluster.
  - assigns tasks to worker nodes.
  - monitors failures.
- Worker node
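To make the "assign tasks" and "monitor failures" duties more concrete, here is a toy model in Python. It is an assumption-laden sketch, not how Nimbus actually schedules work: the round-robin policy and the function names assign_tasks and reassign_after_failure are invented for illustration only.

```python
def assign_tasks(tasks, workers):
    """Toy scheduler: spread tasks over workers round-robin (hypothetical policy)."""
    if not workers:
        raise ValueError("no live workers to assign tasks to")
    assignment = {worker: [] for worker in workers}
    for i, task in enumerate(tasks):
        assignment[workers[i % len(workers)]].append(task)
    return assignment


def reassign_after_failure(assignment, failed_worker):
    """Move the failed worker's tasks onto the surviving workers."""
    survivors = [w for w in assignment if w != failed_worker]
    if not survivors:
        raise ValueError("no surviving workers to take over the tasks")
    # Copy the lists so the original assignment is left untouched.
    new_assignment = {w: list(assignment[w]) for w in survivors}
    orphaned = assignment.get(failed_worker, [])
    for i, task in enumerate(orphaned):
        new_assignment[survivors[i % len(survivors)]].append(task)
    return new_assignment


if __name__ == "__main__":
    plan = assign_tasks(["t1", "t2", "t3", "t4", "t5"], ["w1", "w2", "w3"])
    print(plan)
    print(reassign_after_failure(plan, "w1"))
```

The point is only the shape of the job: the master keeps a mapping from workers to tasks and rewrites that mapping when a worker disappears.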
Types of grouping:
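- Shuffle grouping
  - sends each tuple emitted by the source to a randomly chosen instance of the receiving bolt.
  - useful for operations that can handle each tuple independently, such as mathematical operations.
  - not suitable for operations that cannot be randomly distributed, e.g. per-key counting, where all tuples for a key must reach the same instance.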
- Fields grouping
  - controls how tuples are sent to bolts, based on one or more fields of the tuple; tuples with the same values for those fields always go to the same bolt instance.
- All grouping
  - sends a single copy of each tuple to all instances of the receiving bolt.
  - used to send signals to all bolts, e.g. to refresh a cache.
- Direct grouping
- Global grouping
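The shuffle, fields and all groupings described above can be sketched as small routing functions. This is plain Python, not Storm code: tuples are modelled as dicts, bolt instances are numbered 0 to num_tasks - 1, and the function names and the CRC32 hash are assumptions made for this example.

```python
import random
import zlib


def shuffle_grouping(tup, num_tasks, rng=random):
    """Shuffle grouping: route the tuple to one randomly chosen bolt instance."""
    return [rng.randrange(num_tasks)]


def fields_grouping(tup, field_names, num_tasks):
    """Fields grouping: route by a stable hash of the chosen fields,
    so equal field values always land on the same instance."""
    key = "|".join(str(tup[name]) for name in field_names)
    return [zlib.crc32(key.encode("utf-8")) % num_tasks]


def all_grouping(tup, num_tasks):
    """All grouping: every bolt instance receives a copy of the tuple."""
    return list(range(num_tasks))


if __name__ == "__main__":
    event = {"word": "storm", "count": 1}
    print("shuffle:", shuffle_grouping(event, 4))
    print("fields :", fields_grouping(event, ["word"], 4))
    print("all    :", all_grouping(event, 4))
```

Each function returns the list of instance numbers that should receive the tuple, which makes the difference easy to see: one random target, one target fixed by the key, or all targets.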