[SysDeg] Worksharing Framework and its design – Part 3: Prototype for the first version

Long time no see! After one month playing with caching in Spark, I learned many valuable lessons (which will be covered in other blog posts, about the Cache Manager and Block Manager of Spark). Our team came back to the design of the system - the SparkSQL server. To be honest, I spent too much time … Continue reading [SysDeg] Worksharing Framework and its design – Part 3: Prototype for the first version


[SysDeg] Worksharing Framework and its design – Part 2: Communication method

After gaining a basic understanding of Spark and SparkSQL, I came back to my system. The high-level design of the system remains the same as I described two months ago. It is a client-server model, but the server has changed from the Spark server to the SparkSQL server. I spent roughly two weeks on some coding … Continue reading [SysDeg] Worksharing Framework and its design – Part 2: Communication method

[Arch] SparkSQL Internals – Part 2: SparkSQL Data Flow

The SparkSQL paper provides a very nice figure of the SparkSQL data flow. I have had experience with Apache Pig for more than a year, so I realized it would be better to put them all together. I created a new figure that includes the data flows of Hive, Pig, and SparkSQL. I know a little … Continue reading [Arch] SparkSQL Internals – Part 2: SparkSQL Data Flow

[Overview] Spark deploy modes

Spark has several deploy modes, and they affect the way our Spark driver communicates with the executors. So, I want to say a little about these modes. From the Spark deploy mode documentation, we know that Spark has three deploy modes. Standalone: without Yarn or Mesos, you can run your own cluster by starting a master … Continue reading [Overview] Spark deploy modes