Spark on farm
Latest revision as of 09:14, 27 April 2021

In order to launch a Spark cluster, follow these steps:

1) Deploy a Spark installation in your home directory (~)

2) On a UI node, start a Spark master with the following command and write down the master URL it reports

   ${SPARK_HOME}/sbin/start-master.sh -h $(hostname)
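The master URL reported by start-master.sh has the form spark://&lt;hostname&gt;:7077 (7077 is Spark's default master port). As a minimal sketch, this is how it could be pulled out of a master log line; the log line and hostname below are illustrative placeholders, not output from a real run:

```python
import re

# Illustrative log line; start-master.sh logs the master URL in this form.
log_line = "INFO Master: Starting Spark master at spark://ui01.example.org:7077"

# Grab the first spark:// URL in the line.
match = re.search(r"spark://\S+", log_line)
master_url = match.group(0)
print(master_url)  # spark://ui01.example.org:7077
```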

3) Create a submit script like the one below, substituting the <VARIABLES>

   SPARK_MASTER=<MASTER_URL>
   NUM_CORES=1
   MEMORY=4g
   
   cd ${TMPDIR}
   
   echo "spark.executor.extraJavaOptions -Djava.io.tmpdir=${TMPDIR}" > spark-defaults.conf
   echo "spark.local.dir ${TMPDIR}" >> spark-defaults.conf
   echo "spark.pyspark.python <PYTHON_VENV_EXECUTABLE_PATH>" >> spark-defaults.conf
   
   PYSPARK_PYTHON=<PYTHON_VENV_EXECUTABLE_PATH> \
   <SPARK_HOME>/bin/spark-class \
   org.apache.spark.deploy.worker.Worker ${SPARK_MASTER} \
   -c ${NUM_CORES} -m ${MEMORY} -d ${TMPDIR} --properties-file spark-defaults.conf
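For reference, a Spark properties file is just plain "key value" lines, one property per line, with whitespace separating key and value. A sketch of the same spark-defaults.conf the script above writes, built in Python; the paths are placeholder assumptions standing in for ${TMPDIR} and your virtualenv:

```python
# Placeholder paths; on the farm these come from ${TMPDIR} and your virtualenv.
tmpdir = "/tmp/spark-job"
venv_python = "/home/user/venv/bin/python"

# Spark properties files separate key and value with whitespace.
properties = {
    "spark.executor.extraJavaOptions": f"-Djava.io.tmpdir={tmpdir}",
    "spark.local.dir": tmpdir,
    "spark.pyspark.python": venv_python,
}
conf_text = "\n".join(f"{key} {value}" for key, value in properties.items())
print(conf_text)
```

Since the script cd's into ${TMPDIR} first, each array task writes its own copy of the file, so workers on different nodes do not clobber each other's configuration.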

4) Submit the previous script to the farm as an array job of <NUM_WORKERS> tasks

   qsub -q <QUEUE> -t 1-<NUM_WORKERS> <SUBMIT_SCRIPT>

5) After a few minutes, workers should start registering with the master. You can follow this on the master's web UI, which by default listens on http://<hostname>:8080.

6) Create your SparkContext, connecting it to the <MASTER_URL> (passed as a string)

   from pyspark.sql import SparkSession
   
   spark = SparkSession.builder.master("<MASTER_URL>").getOrCreate()
   sc = spark.sparkContext

7) Enjoy!