Spark on farm
To launch a standalone Spark cluster on the farm, follow these steps:
1) Deploy a Spark installation in your home directory (~)
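A minimal sketch (the release version and package name are placeholders; pick a current build from the Apache Spark download page):
# Download and unpack a Spark release into the home directory (version is a placeholder)
cd ~
wget https://archive.apache.org/dist/spark/spark-<VERSION>/spark-<VERSION>-bin-hadoop3.tgz
tar xzf spark-<VERSION>-bin-hadoop3.tgz
export SPARK_HOME=~/spark-<VERSION>-bin-hadoop3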
2) On a UI, start a Spark master using this command and write down the master URL it prints:
${SPARK_HOME}/sbin/start-master.sh -h $(hostname)
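If you missed the URL in the terminal output, it is usually also recorded in the master log (a sketch, assuming the default log location; with the default port the URL has the form spark://<hostname>:7077):
grep "Starting Spark master" ${SPARK_HOME}/logs/*Master*.out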
3) Create a submit script like the one below, substituting the <VARIABLES>:
#!/bin/bash
# Each worker task offers NUM_CORES cores and MEMORY of RAM to the cluster
SPARK_MASTER=<MASTER_URL>
NUM_CORES=1
MEMORY=4g

cd ${TMPDIR}
PYSPARK_PYTHON=<PATH_TO_VIRTUALENV_BIN_PYTHON> \
  <SPARK_HOME>/bin/spark-class org.apache.spark.deploy.worker.Worker \
  ${SPARK_MASTER} -c ${NUM_CORES} -m ${MEMORY} -d ${TMPDIR}
4) Submit the script to the farm as an array job, one task per worker:
qsub -q <QUEUE> -t 1-<NUM_WORKERS> <SUBMIT_SCRIPT>
5) After a few minutes, the workers should start registering with the master.
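To verify this (a sketch; assumes an SGE-style scheduler and the standalone master's default web UI port of 8080):
# The worker array tasks should be running on the farm
qstat -u $USER
# The master's web UI also lists registered workers; open it in a browser
# from the node where the master runs: http://<MASTER_HOSTNAME>:8080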
6) Create your SparkSession and SparkContext, connecting them to the <MASTER_URL>:
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("<MASTER_URL>").getOrCreate()
sc = spark.sparkContext
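A quick sanity check that jobs actually run on the cluster (a minimal sketch):
# Sums 0..999 across the workers; should print 499500
print(sc.parallelize(range(1000)).sum())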
7) Enjoy!