[SysDeg] Worksharing Framework and its design: Some modifications

Figure: the modified system design (sparksql-server-system-design-3)

After many discussions, we decided to modify the system design slightly to make it more general and extensible. The new design is shown in the figure above.

The WorkSharing Detector remains the same as in the old design. Its goal is to generate bags of DAGs, each labeled either with a sharing type or as no-sharing.
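To make the Detector's output more concrete, here is a minimal Python sketch of what labeled bags could look like. It assumes (this is not part of our design) that each DAG can be reduced to the single input it scans, so jobs reading the same input land in a scan-sharing bag and all other jobs go into one no-sharing bag. The names `Job`, `Bag`, `SharingType`, and `detect_scan_sharing`, as well as the paths, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum


class SharingType(Enum):
    SCAN_SHARING = "scan-sharing"
    NO_SHARING = "no-sharing"


@dataclass(frozen=True)
class Job:
    # Hypothetical stand-in for a DAG: only its name and the input it scans.
    name: str
    input_path: str


@dataclass
class Bag:
    label: SharingType
    jobs: list = field(default_factory=list)


def detect_scan_sharing(jobs):
    """Group jobs by the input they scan. Every group of two or more jobs
    becomes a scan-sharing bag; all remaining jobs go into one no-sharing bag."""
    by_input = defaultdict(list)
    for job in jobs:
        by_input[job.input_path].append(job)

    bags, unshared = [], []
    for group in by_input.values():
        if len(group) >= 2:
            bags.append(Bag(SharingType.SCAN_SHARING, group))
        else:
            unshared.extend(group)
    if unshared:
        bags.append(Bag(SharingType.NO_SHARING, unshared))
    return bags


if __name__ == "__main__":
    jobs = [Job("J1", "/data/logs"), Job("J2", "/data/logs"), Job("J3", "/data/users")]
    for bag in detect_scan_sharing(jobs):
        print(bag.label.value, [j.name for j in bag.jobs])
    # scan-sharing ['J1', 'J2']
    # no-sharing ['J3']
```

A real detector would compare DAG operators rather than just input paths; grouping by input is only the simplest signal for scan sharing.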

The Cost Model is a new component. It provides an interface through which users can plug their own (user-defined) cost models into the system.
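One way such a plug-in point could work is an abstract interface plus a registry keyed by sharing type. This is a sketch under assumptions: `CostModel`, `register_cost_model`, `cost_model_for`, and `SumOfJobsCostModel` are hypothetical names, and a "plan" is simplified to a list of per-job costs.

```python
from abc import ABC, abstractmethod


class CostModel(ABC):
    """Hypothetical plug-in interface: estimate the cost of a candidate plan."""

    @abstractmethod
    def cost(self, plan) -> float:
        ...


# Registry mapping a sharing-type label to the cost model used for it.
_COST_MODELS = {}


def register_cost_model(sharing_type: str, model: CostModel) -> None:
    if not isinstance(model, CostModel):
        raise TypeError("model must implement CostModel")
    _COST_MODELS[sharing_type] = model


def cost_model_for(sharing_type: str) -> CostModel:
    try:
        return _COST_MODELS[sharing_type]
    except KeyError:
        raise LookupError(f"no cost model registered for {sharing_type!r}") from None


class SumOfJobsCostModel(CostModel):
    """Toy user-defined model: a plan's cost is the sum of its per-job costs."""

    def cost(self, plan) -> float:
        return float(sum(plan))


register_cost_model("scan-sharing", SumOfJobsCostModel())

if __name__ == "__main__":
    print(cost_model_for("scan-sharing").cost([1.5, 2.5]))  # 4.0
```

Keying the registry by sharing type mirrors the idea that each bag label comes with its own cost model.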

Given a bag output by the WorkSharing Detector and the cost model associated with its sharing type, the Optimizer does the actual work: it picks the plan with the lowest cost.
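At its core, the Optimizer's step is an argmin over candidate plans. A minimal sketch, assuming the cost model is exposed as a plain function from plan to estimated cost; `pick_cheapest` and the cost figures in the usage are hypothetical.

```python
from typing import Callable, Iterable, Tuple, TypeVar

Plan = TypeVar("Plan")


def pick_cheapest(candidates: Iterable[Plan],
                  cost: Callable[[Plan], float]) -> Tuple[Plan, float]:
    """Return the candidate with the lowest estimated cost, plus that cost.
    On ties, the first candidate in iteration order wins."""
    plans = list(candidates)
    if not plans:
        raise ValueError("no candidate plans to choose from")
    best = min(plans, key=cost)
    return best, cost(best)


if __name__ == "__main__":
    # Hypothetical cost estimates for three ways of running one bag.
    estimates = {"no-grouping": 45.0, "group J1,J2": 38.0, "cache": 32.0}
    print(pick_cheapest(estimates, estimates.get))  # ('cache', 32.0)
```

Because the cost function is passed in, the same Optimizer works with any registered cost model.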

The best plan becomes the input of the Rewriter, which knows which rule to apply to obtain the transformed plan. Rules are also pluggable, so users can add their own.
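Pluggable rules can be kept in a table from optimizer strategy to rule function, so adding a user rule is one more registration. In this hypothetical sketch a plan is a plain dict with `strategy`, `jobs`, and `group` keys, and `merge_scans` is a toy scan-sharing rule that replaces the grouped jobs with a single merged job; none of these names come from Spark.

```python
from typing import Callable, Dict

# A plan is modeled as a plain dict (assumption); a rule maps plan -> new plan.
Rule = Callable[[dict], dict]


class Rewriter:
    def __init__(self) -> None:
        self._rules: Dict[str, Rule] = {}

    def add_rule(self, strategy: str, rule: Rule) -> None:
        """Register (or replace) the rule used for a given optimizer strategy."""
        self._rules[strategy] = rule

    def rewrite(self, plan: dict) -> dict:
        strategy = plan["strategy"]
        if strategy not in self._rules:
            raise KeyError(f"no rewrite rule for strategy {strategy!r}")
        return self._rules[strategy](plan)


def merge_scans(plan: dict) -> dict:
    """Toy scan-sharing rule: the jobs in plan["group"] become one merged job
    (which would read the shared input once); other jobs are left untouched."""
    merged = "+".join(plan["group"])
    rest = [j for j in plan["jobs"] if j not in plan["group"]]
    return {**plan, "jobs": [merged] + rest}


if __name__ == "__main__":
    rw = Rewriter()
    rw.add_rule("scan-sharing", merge_scans)
    plan = {"strategy": "scan-sharing", "jobs": ["J1", "J2", "J3"], "group": ["J1", "J2"]}
    print(rw.rewrite(plan)["jobs"])  # ['J1+J2', 'J3']
```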

Let me illustrate with scan sharing, the simplest form of work sharing.

Suppose the WS Detector outputs bag1, labeled scan-sharing, and bag2, labeled no-sharing. Assume bag1 contains the jobs J = {J1, …, Jn}.

The cost model should compute:

  • T(J): the cost of running all jobs without packaging (grouping) or caching.
  • T({Jg, J \ Jg}): from {J1, …, Jn} we can form many combinations of jobs. Each combination defines a group Jg of jobs that are packed together, leaving J \ Jg, the jobs we do not group. We need to choose a combination for which T({Jg, J \ Jg}) ≤ T(J).
  • T(Jc): the cost of running the jobs without grouping but with caching.
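The three costs above can be illustrated with a toy model. The formulas here are my own assumptions, not the post's cost model: every job pays a scan cost S for the shared input plus its own compute cost c_i; packing a job into a group saves its separate scan but adds a merge penalty p_i (for example, a larger shuffle); with caching, the first job pays S plus a cache-write cost W and every later job reads the cache at cost R. `best_grouping` brute-forces all groups of two or more jobs (2^n subsets), which is only practical for small bags.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class JobCost:
    name: str
    compute: float        # c_i: the job's own processing cost (assumed)
    merge_penalty: float  # p_i: extra cost when packed into a group (assumed)


def t_separate(jobs, scan):
    """T(J): every job scans the shared input itself; no grouping, no caching."""
    return sum(scan + j.compute for j in jobs)


def t_grouped(jobs, group, scan):
    r"""T({Jg, J \ Jg}): the jobs in `group` share one scan; the rest run alone."""
    names = {j.name for j in group}
    packed = scan + sum(j.compute + j.merge_penalty for j in group)
    rest = [j for j in jobs if j.name not in names]
    return packed + t_separate(rest, scan)


def t_cached(jobs, scan, cache_write, cache_read):
    """T(Jc): no grouping; the first job reads from disk and fills the cache,
    every later job reads the cached input instead."""
    return (scan + cache_write + (len(jobs) - 1) * cache_read
            + sum(j.compute for j in jobs))


def best_grouping(jobs, scan):
    r"""Try every group Jg of two or more jobs and return (Jg, cost) for the
    cheapest one with T({Jg, J \ Jg}) <= T(J), or None if no group qualifies."""
    baseline = t_separate(jobs, scan)
    best = None
    for k in range(2, len(jobs) + 1):
        for group in combinations(jobs, k):
            cost = t_grouped(jobs, group, scan)
            if cost <= baseline and (best is None or cost < best[1]):
                best = (group, cost)
    return best


if __name__ == "__main__":
    jobs = [JobCost("J1", 5, 1), JobCost("J2", 4, 2), JobCost("J3", 6, 15)]
    print(t_separate(jobs, scan=10))                              # 45
    group, cost = best_grouping(jobs, scan=10)
    print([j.name for j in group], cost)                          # ['J1', 'J2'] 38
    print(t_cached(jobs, scan=10, cache_write=3, cache_read=2))   # 32
```

In this made-up example J3's merge penalty (15) exceeds the scan it would save (10), so the best group leaves J3 out, which is exactly the "J \ Jg" situation discussed in the note at the end.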

The Optimizer then picks the plan with the lowest of these costs and passes that information to the Rewriter, which selects the associated rule and applies it to generate the final plan.

Finally, the Scheduler takes over and submits the jobs to the cluster manager according to its scheduling strategy.
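A scheduling strategy can itself be a pluggable function that orders the rewritten jobs before they are submitted. This sketch assumes each pending job carries an estimated cost from the cost model; `fifo`, `shortest_first`, and `schedule` are hypothetical, and `submit` is a stand-in for whatever call actually hands a job to the cluster manager.

```python
from typing import Callable, List, Sequence, Tuple

# A pending job: (name, estimated cost). The estimate would come from the cost model.
PendingJob = Tuple[str, float]
Strategy = Callable[[Sequence[PendingJob]], List[PendingJob]]


def fifo(jobs: Sequence[PendingJob]) -> List[PendingJob]:
    """Submit jobs in arrival order."""
    return list(jobs)


def shortest_first(jobs: Sequence[PendingJob]) -> List[PendingJob]:
    """Submit the cheapest estimated jobs first (stable for equal estimates)."""
    return sorted(jobs, key=lambda job: job[1])


def schedule(jobs: Sequence[PendingJob], strategy: Strategy,
             submit: Callable[[str], None]) -> List[str]:
    """Order jobs with the given strategy and hand each one to `submit`."""
    order = [name for name, _ in strategy(jobs)]
    for name in order:
        submit(name)
    return order


if __name__ == "__main__":
    submitted = []
    order = schedule([("J1+J2", 38.0), ("J3", 16.0)], shortest_first, submitted.append)
    print(order)  # ['J3', 'J1+J2']
```

Inside a single Spark application, Spark's own scheduler additionally offers FIFO and FAIR modes through `spark.scheduler.mode`; the sketch above only models the ordering decision made before submission.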

Note: our system runs on top of Spark, so we can take advantage of the framework's caching. Let's revisit the second case computed by the cost model in the example above. There, we obtained a plan consisting of a grouped job Jg and the remaining jobs J \ Jg. The cost model tells us that the jobs in J \ Jg are not worth packing together, so can we use caching for them instead? Hopefully, I will have the answer within the next three months of my internship.
