[Overview] Spark deploy modes

Spark has several deploy modes, and the choice affects how the Spark driver communicates with the executors. So, I want to say a little about these modes.

According to the Spark deploy mode documentation, Spark has three deploy modes:

  • Standalone: without Yarn or Mesos, you can run your own cluster by starting a master and workers manually. It looks a lot like what I did when I played with Hadoop 1.
  • Yarn: Yarn becomes the cluster manager in this mode, and of course we have something called the Spark Application Master. This mode is popular because many frameworks run on top of Yarn.
  • Mesos: the same as standalone mode, except that the cluster manager, which was the Spark master, is replaced by Mesos. I know little about Mesos; hopefully I will have a chance to play with it later on.
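
In practice, the cluster manager is chosen through the value passed to spark-submit's --master option. Here is a minimal sketch of that mapping, assuming recent Spark versions and the default ports (7077 for a standalone master, 5050 for a Mesos master). The helper `master_url` and the host name `master-host` are made up for illustration and are not part of Spark.

```python
# Illustrative helper (not part of Spark): maps a cluster manager name to the
# value you would pass to spark-submit's --master option.
def master_url(manager, host="localhost", port=None):
    if manager == "standalone":
        # 7077 is the default port of a standalone Spark master.
        return f"spark://{host}:{port or 7077}"
    if manager == "mesos":
        # 5050 is the default port of a Mesos master.
        return f"mesos://{host}:{port or 5050}"
    if manager == "yarn":
        # The ResourceManager address is read from the Hadoop configuration
        # (HADOOP_CONF_DIR / YARN_CONF_DIR), so no host is needed here.
        return "yarn"
    raise ValueError(f"unknown cluster manager: {manager}")


for name in ("standalone", "yarn", "mesos"):
    print(name, "->", master_url(name, host="master-host"))
```

Older Spark releases used yarn-client and yarn-cluster as master values instead of plain yarn, so check the documentation of the version you run.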

Each deploy mode also has two sub-modes. For standalone and Yarn, we have client mode and cluster mode.

For Mesos, we have coarse-grained and fine-grained modes.

Yarn / Standalone

| Client | Cluster |
|--------|---------|
| Driver is launched in the same process that submitted the job. | Driver is launched on a worker (standalone) or in the Application Master (Yarn) inside the cluster. |
| The client needs to wait for the job to finish to get the result. | The client can quit without waiting for the job results. |
| Used for interactive jobs (shell). Good for debugging and testing. | Cannot get the job result back to the client. Useful for running long jobs. |
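
To make the client/cluster choice concrete, here is a small sketch that builds a spark-submit command line. The --class, --master and --deploy-mode options are real spark-submit flags. The helper name `spark_submit_args`, the jar `my-app.jar` and the class `com.example.MyApp` are placeholders I made up for illustration.

```python
# Illustrative helper (not part of Spark): builds the argument list for
# spark-submit with an explicit deploy mode.
def spark_submit_args(master, deploy_mode, jar, main_class):
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        "spark-submit",
        "--class", main_class,
        "--master", master,
        # client: the driver runs inside the submitting process.
        # cluster: the driver runs inside the cluster and the client may exit.
        "--deploy-mode", deploy_mode,
        jar,
    ]


# my-app.jar and com.example.MyApp are hypothetical names.
print(" ".join(spark_submit_args("yarn", "cluster", "my-app.jar", "com.example.MyApp")))
# prints: spark-submit --class com.example.MyApp --master yarn --deploy-mode cluster my-app.jar
```

The same list can be handed to `subprocess.run` on a machine where spark-submit is on the PATH.
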
Mesos

| Fine-grained | Coarse-grained |
|--------------|----------------|
| Each Spark task runs as a separate Mesos task. | Only one long-running Spark task runs on each Mesos machine; Spark tasks are scheduled inside it. |
| Useful for sharing the cluster. | No sharing; utilizes the cluster. |
| Overhead when launching tasks. | Lower startup overhead. |
| Good for interactive jobs. | Good for batch jobs. |
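
On Mesos, the choice between the two modes is made with the spark.mesos.coarse configuration property. Below is a hedged sketch of how that could be passed to spark-submit. The helper name `mesos_mode_conf` is my own invention. Also note that the default value of the property differs between Spark versions and that fine-grained mode was deprecated in later releases, so treat this as an illustration only.

```python
# Illustrative helper (not part of Spark): returns the spark-submit arguments
# that select coarse-grained or fine-grained mode on Mesos.
def mesos_mode_conf(coarse_grained):
    # spark.mesos.coarse=true  -> coarse-grained (one long-running Mesos task per machine)
    # spark.mesos.coarse=false -> fine-grained (one Mesos task per Spark task)
    value = "true" if coarse_grained else "false"
    return ["--conf", f"spark.mesos.coarse={value}"]


print(mesos_mode_conf(coarse_grained=True))
# prints: ['--conf', 'spark.mesos.coarse=true']
```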

It’s a little bit messy, but the figures I made below for the two Yarn modes make it easier and clearer.

Standalone mode is very useful for debugging and testing. With 2-3 virtual machines on your own machine, you can see Spark in action quite well. If you need more details about these modes and their configuration, go to the Apache Spark documentation. 🙂


5 thoughts on “[Overview] Spark deploy modes”

  1. Pingback: [Arch] Spark job submission breakdown | Quang-Nhat HOANG-XUAN

    • Hi Hoang-Mai, there are many systems for cluster management, and Spark wants to support all kinds of them, so it has many modes. For sure it cannot support every cluster management system, but I think Yarn and Mesos are the two most popular ones. Standalone mode is very useful for testing/debugging or for running on an isolated Spark cluster.


      • Nice. Actually, I am pretty interested in the client mode and cluster mode of Spark on YARN. As you said, client mode is for debugging and testing, so do people usually use cluster mode in practice? And can you give more advantages and disadvantages of client versus cluster mode? In cluster mode, I see that the SparkClient can quit right away after it submits the job, so what is it for?


        • No, client mode is not for debugging and testing. It’s better to use standalone mode, or even local mode, for that instead.
          – Advantages/Disadvantages: think about overhead of resource management.
          – About cluster mode on Yarn (where the client can quit after submitting), this is very useful if your job runs day after day and you need to do something else in the meantime.
