Spark on farm
To launch a standalone Spark cluster on the farm, follow these steps:
1) Deploy a Spark installation in your home directory (~)
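For example, a prebuilt release can be unpacked into the home directory (a sketch; the Spark version and Hadoop build below are assumptions, pick whatever your site supports):
cd ~
wget https://archive.apache.org/dist/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz
tar xzf spark-3.1.1-bin-hadoop3.2.tgz
export SPARK_HOME=~/spark-3.1.1-bin-hadoop3.2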
2) On a UI (interactive) node, start a Spark master using the command below and write down the master URL it reports
${SPARK_HOME}/sbin/start-master.sh -h $(hostname)
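The master URL has the form spark://<hostname>:7077 (7077 is the default port). If you missed it on screen, it can usually be recovered from the master log; a sketch, assuming the default log location under ${SPARK_HOME}/logs:
grep -o 'spark://[^[:space:]]*' ${SPARK_HOME}/logs/*Master*.out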
3) Create a submit script like the one below, substituting the <VARIABLES> placeholders
#!/bin/bash
SPARK_MASTER=<MASTER_URL>
NUM_CORES=1
MEMORY=4g

# ${TMPDIR} is the per-job scratch directory set by the batch system
cd ${TMPDIR}

# Point Spark's JVM temp dir, shuffle/spill space and Python executable at job-local paths
echo "spark.executor.extraJavaOptions -Djava.io.tmpdir=${TMPDIR}" > spark-defaults.conf
echo "spark.local.dir ${TMPDIR}" >> spark-defaults.conf
echo "spark.pyspark.python <PYTHON_VENV_EXECUTABLE_PATH>" >> spark-defaults.conf

# Launch a worker: -c cores, -m memory, -d work directory
PYSPARK_PYTHON=<PYTHON_VENV_EXECUTABLE_PATH> \
<SPARK_HOME>/bin/spark-class \
org.apache.spark.deploy.worker.Worker ${SPARK_MASTER} \
-c ${NUM_CORES} -m ${MEMORY} -d ${TMPDIR} --properties-file spark-defaults.conf
4) Submit the script to the farm as an array job, one task per worker
qsub -q <QUEUE> -t 1-<NUM_WORKERS> <SUBMIT_SCRIPT>
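While the tasks start, the array job can be watched with qstat (assuming an SGE-style scheduler, which the qsub -t syntax above suggests):
qstat -u $USER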
5) After a few minutes, workers should start registering with the master.
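One way to confirm is the master's web UI, which lists registered workers; this assumes the default web UI port (8080) was not overridden:
curl -s http://$(hostname):8080/ # run on the master's node, or open the URL in a browser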
6) Create your SparkSession (and SparkContext), connecting it to the <MASTER_URL>
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("<MASTER_URL>").getOrCreate()
sc = spark.sparkContext
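As a quick sanity check, run a trivial job across the cluster; the sum below should print 499500:
print(sc.parallelize(range(1000)).sum())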
7) Enjoy!