Cluster Architecture

How is a job executed on a Spark cluster?
- When the driver submits a job, it sends a resource request to the YARN ResourceManager.
- The ResourceManager allocates containers on the best available worker nodes, taking data locality into account, and Spark launches executors in those containers.
- The job is then split into stages at shuffle boundaries, and each stage is split into tasks based on data locality and available resources (see the first sketch after this list).
- Before a task executes, the driver serializes the task's code and metadata and ships it to the executor that will run it.
- The driver keeps track of the currently executing tasks and publishes the job's status to its web UI; when running on YARN, this UI is also reachable through the ResourceManager UI (see the second sketch after this list).
- Once the job completes, the executors return their results to the driver, which aggregates them into the final value.
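
As a concrete illustration, here is a minimal Scala sketch of a job that exercises this flow. The input path and the spark-submit flags in the comments are placeholders, not values from this document. The narrow transformations stay in one stage, `reduceByKey` introduces a shuffle boundary that starts a second stage, and the final action returns each task's results to the driver:

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // In practice the app is launched with spark-submit, which contacts
    // the YARN ResourceManager to allocate executor containers, e.g.:
    //   spark-submit --master yarn --deploy-mode cluster \
    //     --num-executors 4 --executor-memory 2g wordcount.jar
    val spark = SparkSession.builder()
      .appName("WordCount")
      .getOrCreate()
    val sc = spark.sparkContext

    // Placeholder HDFS path; any distributed input works here.
    val lines = sc.textFile("hdfs:///data/input.txt")

    // flatMap and map are narrow transformations, so they stay in the
    // first stage; reduceByKey forces a shuffle, so the DAG scheduler
    // cuts a second stage here. Each stage runs one task per partition.
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // The action triggers the job; executors run the tasks and return
    // their partial results to the driver, which aggregates them.
    counts.take(10).foreach(println)

    spark.stop()
  }
}
```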
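
To observe the driver-side tracking and monitoring described above, a small sketch (assuming the same `SparkSession` as in the previous example) can attach a `SparkListener` that logs each task as it finishes and print the URL of the driver's web UI, which defaults to port 4040 and is proxied through the ResourceManager UI on YARN:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Attach a listener so the driver reports each task completion; this
// mirrors the bookkeeping the driver already performs internally.
sc.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    println(s"Stage ${taskEnd.stageId}: task finished " +
      s"(${taskEnd.taskInfo.status}) on ${taskEnd.taskInfo.host}")
  }
})

// The driver hosts the job-monitoring web UI.
sc.uiWebUrl.foreach(url => println(s"Spark UI: $url"))
```

Note that `addSparkListener` is a developer API; for routine monitoring, the web UI and the YARN ResourceManager pages already expose the same per-stage and per-task status.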