Introduction

Q. Why Spark?

  • Much faster in computation than MapReduce, because MapReduce relies on disk I/O between stages while Spark keeps data in memory most of the time
  • MapReduce is very slow for graph processing and iterative algorithms
  • Spark programs can be written in a functional style, which is modular and convenient for programmers
  • Spark offers simple APIs for streaming, batch processing, ad-hoc queries, machine learning, graph processing, etc., so there is no need to learn separate specialized frameworks
  • Writing a Spark application is much simpler, as it takes far fewer lines of code than the equivalent MapReduce job

Q. Why is iterative processing slow in MapReduce?

Every time a MapReduce job executes, it reads its input from HDFS (ultimately from disk) and writes its output back to HDFS (again to disk). If your job needs multiple such iterations, it becomes very slow due to the disk I/O at every iteration.

Apache Spark, by contrast, keeps the output of the previous stage in memory, so the next iteration can read it from memory, which is much faster than disk I/O.

Driver

The driver program, which is part of every Spark application, launches the application on the Spark cluster.

Spark Context

In Spark, we access the cluster through an object of type SparkContext.
