The situation: one of our projects needs to be benchmarked. It is still the SparkSQL Server that I designed and developed, but a new sharing technique has been implemented and integrated into it. So we decided to use Databricks' spark-sql-perf to benchmark our work, with the TPC-DS queries as the workload.
After six months of work, I defended my master's thesis in the middle of September. The system works properly and efficiently. Sorry that there weren't any posts from July to August. As described in the previous posts, I had finished the design of my system; what remained was implementing it. The … Continue reading [Implementation]Simultaneous Pipeline technique of MRShare
After a lot of discussion, we decided to change the design of the system a little so that it can be more general and extensible. The new design is shown in the figure above. The WorkSharing Detector remains the same as in the old design. Its goal is to generate bags of DAGs which are labeled with … Continue reading [SysDeg] Worksharing Framework and its design: Some modifications
Long time no see! After a month of playing with caching in Spark, I learned many valuable lessons (which I will cover in other posts about Spark's Cache Manager and Block Manager). Our team came back to the design of the system, the SparkSQL server. To be honest, I spent too much time … Continue reading [SysDeg] Worksharing Framework and its design – Part 3: Prototype for the first version
After gaining a basic understanding of Spark and SparkSQL, I came back to my system. The high-level design of the system remains the same as I described two months ago. It is a client-server model, but the server has changed from the Spark server to the SparkSQL server. I spent roughly two weeks on some coding … Continue reading [SysDeg] Worksharing Framework and its design – Part 2: Communication method