[Overview] Spark deploy modes

Spark has several deploy modes, and the choice affects how the Spark driver communicates with the executors, so I want to say a little about them. According to the Spark deploy mode documentation, Spark has three deploy modes: Standalone: without YARN or Mesos, you can run your own cluster by starting a master …

[Arch] SparkContext and its components

When you work with Spark or read its documentation, you will definitely come across SparkContext, which lives inside the driver on the client side. It confused me and made me curious when I first heard about it, so I decided to dig into it. To summarize in a few words, I would say that SparkContext, in general, is …

[Sysdeg] Worksharing framework and its design – Part 1

As I mentioned in the previous post, my internship will mostly focus on designing and implementing a worksharing framework for GROUPING SETS on Apache Spark. In this post I will briefly discuss the concept of the framework and its design. Many sharing (scan, computation) frameworks have already been proposed, both in traditional database systems and in …

Some experiences on building my own Pig

Pig is a high-level platform for creating MapReduce programs used with Hadoop. The language for this platform is called Pig Latin. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for RDBMS systems. Pig Latin can be extended using UDF …