Spark has several deploy modes, and the mode affects how the Spark driver communicates with the executors, so I want to say a little about them. From the Spark deploy-mode documentation, we know that Spark has three deploy modes:
- Standalone: without Yarn or Mesos, you can run your own cluster by starting a master and workers manually. It works much the same way as when I played with Hadoop 1.
- Yarn: Yarn becomes the cluster manager in this mode, and of course we have something called the Spark ApplicationMaster. This mode is popular since many frameworks run on top of Yarn.
- Mesos: the same as standalone mode, except that the cluster manager, which was the Spark master, is replaced by Mesos. I know little about Mesos; hopefully I will have a chance to play with it later on.
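As a quick illustration, the cluster manager is picked through the `--master` option of `spark-submit`. This is only a sketch: the host names, ports, and application jar below are placeholders, not values from any real cluster.

```shell
# Standalone: point at the Spark master you started yourself
spark-submit --master spark://master-host:7077 my_app.jar

# Yarn: the ResourceManager address is read from the Hadoop configuration
spark-submit --master yarn my_app.jar

# Mesos: point at the Mesos master
spark-submit --master mesos://mesos-host:5050 my_app.jar
```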
Each deploy mode also comes in two flavors. For standalone and Yarn, we have client mode and cluster mode; for Mesos, we have coarse-grained and fine-grained mode.
**Yarn / Standalone**

| Client mode | Cluster mode |
| --- | --- |
| Driver is launched in the same process that submitted the job. | Driver is launched in a worker (standalone) or the ApplicationMaster (Yarn) inside the cluster. |
| Client must wait for the result until the job finishes. | Client can quit without waiting for job results. |
| Used for interactive jobs (shell); good for debugging and testing. | Cannot get job results back to the client; useful for running long jobs. |
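This choice maps directly to the `--deploy-mode` flag of `spark-submit`. Again just a sketch, with a placeholder application jar:

```shell
# Client mode: the driver runs inside the spark-submit process itself,
# so the shell stays attached until the job finishes
spark-submit --master yarn --deploy-mode client my_app.jar

# Cluster mode: the driver runs inside the cluster (in the ApplicationMaster
# on Yarn), so spark-submit can return right after submission
spark-submit --master yarn --deploy-mode cluster my_app.jar
```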
**Mesos**

| Fine-grained mode | Coarse-grained mode |
| --- | --- |
| Each Spark task runs as a separate Mesos task. | Only one long-running Mesos task per machine; Spark tasks are scheduled inside it. |
| Useful for sharing the cluster. | No sharing; the whole cluster is utilized. |
| Overhead at launching each task. | Lower startup overhead. |
| Good for interactive jobs. | Good for batch jobs. |
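On Mesos this choice is a configuration property rather than a deploy-mode flag; as far as I know it is `spark.mesos.coarse` (shown here as a sketch with placeholder hosts and jar; check the behavior and default for the Spark version you run):

```shell
# Fine-grained: one Mesos task per Spark task
spark-submit --master mesos://mesos-host:5050 \
  --conf spark.mesos.coarse=false my_app.jar

# Coarse-grained: one long-running Mesos task per machine
spark-submit --master mesos://mesos-host:5050 \
  --conf spark.mesos.coarse=true my_app.jar
```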
It’s a little bit messy, but if you look at the figures I made below for the two Yarn modes, it becomes easier and clearer.
Standalone mode is very useful for debugging and testing. With 2-3 virtual machines on your own computer, you can see Spark in action quite well. If you need more details about these modes and their configuration, go to the Apache Spark documentation. 🙂
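For example, a tiny standalone cluster can be brought up by hand with the scripts that ship with Spark. The master host below is a placeholder, and note that older Spark releases name the worker script `start-slave.sh` instead of `start-worker.sh`:

```shell
# On the master machine: start the Spark master
# (its web UI then shows the spark:// URL to connect to)
$SPARK_HOME/sbin/start-master.sh

# On each worker machine (or VM): register with the master
$SPARK_HOME/sbin/start-worker.sh spark://master-host:7077
```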