The situation: one of our projects needed to be benchmarked. It is still the SparkSQL Server that I designed and developed, but a new sharing technique has been implemented and integrated into it. So we decided to use Databricks' spark-sql-perf to benchmark our work, with TPC-DS as the set of benchmark queries.
Spark has several deploy modes, and the choice affects how the Spark driver communicates with the executors, so I want to say a little about them. According to the Spark deploy mode documentation, Spark has three deploy modes: Standalone: without having YARN or Mesos, you can run your own cluster by starting a master … Continue reading [Overview] Spark deploy modes
Pig is a high-level platform for creating MapReduce programs used with Hadoop. The language for this platform is called Pig Latin. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for RDBMS systems. Pig Latin can be extended using UDF … Continue reading Some experiences on building my own Pig