Here I sketch the components inside Spark and their dependencies to give you a general overview of Spark.
Each component is in charge of a particular function (of course). Most of the components and their functions are self-explanatory, so I only explain the ones that are not so easy to understand:
– repl: the interactive shell for Spark, such as spark-shell and pyspark.
– bagel: Spark's implementation of Google's Pregel graph-processing framework. It will be replaced by GraphX.
– catalyst: a query optimization framework for Spark; it is the optimizer behind Spark SQL.
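To give a feel for what "query optimization framework" means, here is a toy model of the idea behind Catalyst: a query is a tree, an optimization is a rule that rewrites part of that tree, and the rules are applied repeatedly until the tree stops changing (a fixed point). This is a minimal sketch in plain Python, not Catalyst's API. Catalyst itself is written in Scala and expresses rules with pattern matching. The classes `Literal`, `Attribute` and `Add`, and the functions `transform_up`, `constant_folding`, `simplify_add_zero` and `optimize`, are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical toy expression nodes (loosely inspired by Catalyst, not its API).
@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


# A rule returns a rewritten node, or None if it does not apply.
Rule = Callable[[object], Optional[object]]


def transform_up(node, rule: Rule):
    """Rewrite the children first (bottom-up), then try the rule on this node."""
    if isinstance(node, Add):
        node = Add(transform_up(node.left, rule), transform_up(node.right, rule))
    rewritten = rule(node)
    return node if rewritten is None else rewritten


def constant_folding(node):
    # Add(Literal a, Literal b) -> Literal(a + b)
    if isinstance(node, Add) and isinstance(node.left, Literal) and isinstance(node.right, Literal):
        return Literal(node.left.value + node.right.value)
    return None


def simplify_add_zero(node):
    # x + 0 -> x and 0 + x -> x
    if isinstance(node, Add):
        if node.left == Literal(0):
            return node.right
        if node.right == Literal(0):
            return node.left
    return None


def optimize(plan, rules, max_iterations=100):
    """Apply every rule in order, repeating until the tree reaches a fixed point."""
    for _ in range(max_iterations):
        new_plan = plan
        for rule in rules:
            new_plan = transform_up(new_plan, rule)
        if new_plan == plan:
            return plan
        plan = new_plan
    return plan


expr = Add(Attribute("x"), Add(Literal(1), Literal(-1)))
print(optimize(expr, [constant_folding, simplify_add_zero]))  # Attribute(name='x')
```

Constant folding first turns `1 + (-1)` into `0`, which then lets the second rule reduce `x + 0` to `x`. Neither rule knows about the other; running them together to a fixed point is what lets small, independent rules compose into a larger optimization.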
The next “ARCH” posts will focus on the most important components of Spark-Core.