I assume that you’ve already read these documents about SparkSQL. Things to keep in mind: the DataFrame API is where relational processing meets procedural processing; Catalyst is an extensible query optimizer that works on trees and rules, performs optimization lazily, and makes it easy to add new rules. In this post, I will introduce to you … Continue reading [Arch] SparkSQL Internals – Part 1: SQLContext
Maybe you still remember the draft design of the system I proposed here. I delayed posting part 2, which focuses mostly on technical details, because Spark is new to me and I needed time to dig deeper into it. However, I don’t think the design will change much. Come back to … Continue reading [Sysdeg] Moving to SparkSQL, why not?