[Arch] An overview of Spark components and their dependencies

Here I sketch the components inside Spark and their dependencies to give a general overview of Spark. Each component is in charge of a particular function. Most of the components and their functions are straightforward to understand, so I only explain the ones that are not. - repl: the interactive shell …

[Sysdeg] Worksharing framework and its design – Part 1

As I mentioned in the previous post, my internship will mostly focus on designing and implementing a worksharing framework for GROUPING SETS on Apache Spark. In this post I will briefly discuss the concept of the framework and its design. Many sharing frameworks (scan, computation) have been proposed both in traditional database systems and in …

My internship and some documents on Apache Spark

I heard about what I would do in my internship six months ago. To be precise, it was right after I finished my summer internship. The task is to design and build a worksharing framework (scan, computation) for Pig queries on Hadoop MapReduce, mostly focusing on the GROUPING SETS operation. Six months later, my internship still …