Difference between revisions of "HTCondor"
Line 1: | Line 1: | ||
= Introduction = | = Introduction = | ||
− | At PIC | + | At PIC, HTCondor is the current production batch system, replacing the old Torque/Maui environment. |
− | The aim of this document is to show how to submit jobs to the | + | The aim of this document is to show all non-grid users of the PIC batch system how to submit jobs to the HTCondor infrastructure. In other words, it is a guide to submitting local jobs to the HTCondor infrastructure at PIC. |
− | + | This User Guide begins with a presentation of the batch system concepts in HTCondor, compared with the old Torque/Maui ones. Then there is a Quick Start section focused on the minimum knowledge needed to submit a job to an HTCondor pool. The remaining sections take a deeper look at how to submit, monitor and remove jobs in the PIC HTCondor pool. |
+ | |||
+ | Furthermore, you can access the HTCondor User Guide slides and a git repository with several examples. | ||
+ | |||
+ | * HTCondor User Guide Tutorial: https://docs.google.com/presentation/d/1-64fEcfLyxLzSpZH-tbV-SjMAc0CvMpWhE0I4oaDKkQ/edit?usp=sharing | ||
+ | * HTCondor Tutorial examples: https://github.com/PortdInformacioCientifica/htcondor-tutorial | ||
− | + | We recommend looking at the HTCondor User Manual [[https://htcondor.readthedocs.io/en/v9_0/users-manual/index.html 1]] if you want a deeper understanding of the HTCondor concepts. |
= Basic batch concepts = | = Basic batch concepts = | ||
Line 13: | Line 18: | ||
HTCondor does not work like other batch systems, where you submit your job to a dedicated queue with fixed specifications. It uses the language of ClassAds (the same concept as classified advertisements) to match workload requests with resources. In other words, jobs and machines each have their own attributes (number of CPUs, memory, etc.) and the HTCondor central manager does the matchmaking between these attributes. | HTCondor does not work like other batch systems, where you submit your job to a dedicated queue with fixed specifications. It uses the language of ClassAds (the same concept as classified advertisements) to match workload requests with resources. In other words, jobs and machines each have their own attributes (number of CPUs, memory, etc.) and the HTCondor central manager does the matchmaking between these attributes. | ||
− | Furthermore, | + | Furthermore, as in Torque/Maui, there is the concept of fair-share, which aims to ensure that all groups and users receive resources in proportion to their respective quota (e.g. the Atlas T2 quota equals 9% of our resources). Fair-share implies that your jobs and the jobs of your experiment have higher priority while the experiment's usage is at or below its share; if you are consuming more resources than your share, the next job to be prioritized will belong to another experiment. |
− | [[image: HTCondor- | + | [[image: HTCondor General Diagram-PIC.png]] |
− | When a user submits a job from the | + | When a user submits a job from the user interface, it is queued in the HTCondor queue system (schedd) and, according to its priority and its requirements, the job is assigned by the batch system's Central Manager (running collector and negotiator daemons) to be executed in a Worker Node (startd) that matches its requirements. Once the job has finished, files such as the job log, the standard output and the standard error are retrieved back from the Worker Node to the submit machine. |
+ | |||
+ | Note that, as with all batch systems, the time a job spends in the queue depends strongly on the resources it requests. In other words, single-core jobs needing 2 GB of RAM will find a match faster than multi-core jobs that need many cores or jobs that require a lot of memory. | ||
= Quick start = | = Quick start = | ||
Line 25: | Line 32: | ||
Before taking a deeper look at all the elements of job submission, we will show you the basic commands in a quick start guide to HTCondor. | Before taking a deeper look at all the elements of job submission, we will show you the basic commands in a quick start guide to HTCondor. | ||
− | In our old Torque/Maui environment, the user would log into a machine, prepare the input and submit jobs to a queue using qsub command. Now, in a very similar way, the user logs into | + | In our old Torque/Maui environment, the user would log into a machine, prepare the input and submit jobs to a queue using the qsub command. Now, in a very similar way, the user logs into the User Interface, prepares a submit file, and then creates and inserts jobs into the queue using the condor_submit command. |
− | '' | + | ''Since November 2019, all users can submit their jobs directly from the user interfaces.'' |
Next, you can find the basic skeleton of an HTCondor submit file (called test.sub in this case). | Next, you can find the basic skeleton of an HTCondor submit file (called test.sub in this case). | ||
Line 70: | Line 77: | ||
-- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 02/25/19 13:07:52 | -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 02/25/19 13:07:52 | ||
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS | OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS | ||
− | + | testuser ID: 281 2/25 13:07 _ _ 1 1 281.0 | |
Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended | Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended | ||
− | Total for | + | Total for testuser: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended |
Total for all users: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended | Total for all users: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended | ||
</pre> | </pre> | ||
Line 87: | Line 94: | ||
-- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 02/25/19 13:10:10 | -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 02/25/19 13:10:10 | ||
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD | ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD | ||
− | 281.0 | + | 281.0 testuser 2/25 13:09 0+00:00:00 I 0 0.0 test.sh -c 1 -t 60 |
Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended | Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended | ||
− | Total for | + | Total for testuser: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended |
Total for all users: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended | Total for all users: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended | ||
</pre> | </pre> | ||
Line 103: | Line 110: | ||
= Submitting your jobs = | = Submitting your jobs = | ||
− | After a basic view of how to submit a job, we are going to explain more details about the job submission, in particular about the options of the submit file. You can find detailed information about '''condor_submit''' and the characteristics of the submit file in the documentation [[ | + | After a basic view of how to submit a job, we are going to explain more details about the job submission, in particular about the options of the submit file. You can find detailed information about '''condor_submit''' and the characteristics of the submit file in the documentation [[https://htcondor.readthedocs.io/en/v9_0/users-manual/submitting-a-job.html 2]]. |
+ | |||
+ | Note first that the only mandatory entries in a submit file are the "executable" option and the "queue" command. | ||
− | == Executable, input, arguments, outputs and logs == | + | == Executable, input, arguments, outputs, errors and logs == |
− | You can specify the | + | You must specify your executable and you can specify the input, output and error logs in your submit files as we have seen before: |
<pre> | <pre> | ||
Line 117: | Line 126: | ||
</pre> | </pre> | ||
− | Thus, you can specify the location of the input of your application, considering that HTCondor uses the input to pipe into the stdin of the executable. On the other hand, there is the output which contains the standard output (stdout) and the error which contains the standard error (stderr). The log file reports the status of the job by HTCondor. | + | Thus, you can specify the location of the input of your application, bearing in mind that HTCondor pipes the input file into the stdin of the executable. The output file contains the standard output (stdout) and the error file contains the standard error (stderr). The log file records the status of the job as reported by HTCondor. If the output or error option is not defined in the submit file, HTCondor redirects the corresponding stream to /dev/null. |
− | In the log file, you can see the submission host, the node where the job is executed, information about the memory consumption and a final summary of the resources used by your job. Here you have an example of | + | In the log file, you can see the submission host, the node where the job is executed, information about the memory consumption and a final summary of the resources used by your job. Here you have an example of a typical log file: |
<pre> | <pre> | ||
Line 148: | Line 157: | ||
</pre> | </pre> | ||
− | + | The log file allows us to monitor our jobs using the '''condor_wait''' command. You will find more information about condor_q and condor_wait later in this document. | |
<pre> | <pre> | ||
Line 157: | Line 166: | ||
All jobs done.The command '''condor_wait''' is used to track the information in log file. This command will be explained in next sections. | All jobs done.The command '''condor_wait''' is used to track the information in log file. This command will be explained in next sections. | ||
</pre> | </pre> | ||
+ | |||
+ | '''IMPORTANT NOTE'''. The executable is mandatory, but the other elements are not. This is especially relevant for the HTCondor log file, which can put extra load on the shared file systems if you submit many jobs at the same time, each writing a log. If you do not plan to check the HTCondor log, you can omit it and query your jobs with condor_q commands. | ||
=== ClusterId and JobId === | === ClusterId and JobId === | ||
Line 166: | Line 177: | ||
input = input.txt | input = input.txt | ||
arguments = arg | arguments = arg | ||
− | output = | + | output = test-$(ClusterId).$(ProcId).out |
− | error = | + | error = test-$(ClusterId).$(ProcId).err |
− | log = | + | log = test-$(ClusterId).$(ProcId).log |
</pre> | </pre> | ||
Line 178: | Line 189: | ||
executable = test.sh | executable = test.sh | ||
arguments = -c 1 -t 60 | arguments = -c 1 -t 60 | ||
− | output = | + | output = test-$(ClusterId).$(ProcId).out |
− | error = | + | error = test-$(ClusterId).$(ProcId).err |
− | log = | + | log = test-$(ClusterId).$(ProcId).log |
+JobBatchName="MyJobs" | +JobBatchName="MyJobs" | ||
Line 200: | Line 211: | ||
-- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 03/25/19 11:27:30 | -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 03/25/19 11:27:30 | ||
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS | OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS | ||
− | + | testuser MyJobs 3/25 11:27 _ _ 2 4 740.0-1 | |
Total for query: 2 jobs; 0 completed, 0 removed, 2 idle, 0 running, 0 held, 0 suspended | Total for query: 2 jobs; 0 completed, 0 removed, 2 idle, 0 running, 0 held, 0 suspended | ||
Line 213: | Line 224: | ||
-- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 03/25/19 11:27:33 | -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 03/25/19 11:27:33 | ||
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD | ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD | ||
− | 740.0 | + | 740.0 testuser 3/25 11:27 0+00:00:00 I 0 0.0 test.sh -c 1 -t 60 |
− | 740.1 | + | 740.1 testuser 3/25 11:27 0+00:00:00 I 0 0.0 test.sh -c 1 -t 60 |
Total for query: 2 jobs; 0 completed, 0 removed, 2 idle, 0 running, 0 held, 0 suspended | Total for query: 2 jobs; 0 completed, 0 removed, 2 idle, 0 running, 0 held, 0 suspended | ||
Line 222: | Line 233: | ||
== Requests == | == Requests == | ||
− | You can request the cpu, disk and memory that your jobs need. This is done by | + | You can request the cpu, disk and memory that your jobs need. This is done by request_cpus, request_disk and request_memory options in your submit file. You can use units in your requests. |
<pre> | <pre> | ||
executable = test.sh | executable = test.sh | ||
args = -c 8 -t 60 | args = -c 8 -t 60 | ||
− | output = | + | output = test-$(ClusterId).$(ProcId).out |
− | error = | + | error = test-$(ClusterId).$(ProcId).err |
− | log = | + | log = test-$(ClusterId).$(ProcId).log |
request_memory = 4 GB | request_memory = 4 GB | ||
Line 247: | Line 258: | ||
Here you can see the report of the requested resources: the number of cpus, the memory in MB and the disk in KB. | Here you can see the report of the requested resources: the number of cpus, the memory in MB and the disk in KB. | ||
− | There are default values already defined: | + | There are default values already defined, so, if you do not use request_cpus, request_memory or request_disk in your submit file, your job will ask for these default values: |
* 1 cpu | * 1 cpu | ||
− | * 2 GB of memory per | + | * 2 GB of memory per CPU |
− | * 15 GB of disk per | + | * 15 GB of disk per CPU |
+ | |||
+ | The CPU and memory needed by your job are the most important requirements. The disk request, on the other hand, refers to the local, temporary disk used by your job and can be left at the default value in most cases. | ||
+ | |||
+ | If you think that your job requires an unusually large number of cores, or a lot of memory or disk, do not hesitate to ask your contact person at PIC. | ||
+ | |||
+ | ''Note that in request_memory or request_disk, M or MB indicates MiB, G or GB indicates GiB, etc. '' | ||
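
As a rough illustration of the unit convention in the note above, the following shell sketch converts a request such as 4 GB into MiB. The helper name to_mib is hypothetical and is not part of HTCondor; it only mirrors the stated rule that M/MB are read as MiB and G/GB as GiB.

```shell
# Hypothetical helper (not an HTCondor command): convert "<value> <unit>"
# into MiB, reading M/MB as MiB and G/GB as GiB as stated in the note.
to_mib() {
    case "$2" in
        M|MB) echo "$1" ;;
        G|GB) echo $(( $1 * 1024 )) ;;
        *)    echo "unsupported unit: $2" >&2; return 1 ;;
    esac
}

to_mib 4 GB     # request_memory = 4 GB  -> prints 4096
to_mib 15 GB    # default disk per CPU   -> prints 15360
```

So a job with request_memory = 4 GB asks for 4096 MiB, which is consistent with the memory being reported in MB in the job log.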
+ | |||
+ | === Multi-core === | ||
+ | |||
+ | The request_cpus option in your submit file defines whether your job is single-core or multi-core. The slot is then created in the WNs with the resources that satisfy your request. In other words, submitting multi-core jobs is as easy as setting request_cpus=X where X > 1. Although you can ask for any number of cpus, note that our pool is better tuned for multi-core jobs of 8 cpus, so such requests are easier to satisfy and these jobs spend less time in the queue. | ||
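
A minimal sketch of an 8-core submission, assuming the same test.sh executable and file naming used elsewhere in this guide. The file name multicore.sub is hypothetical, and memory and disk are left at their per-CPU defaults.

```shell
# Write a submit file asking for 8 cpus (multicore.sub is a hypothetical name).
# The quoted 'EOF' keeps $(ClusterId) and $(ProcId) literal for HTCondor.
cat > multicore.sub <<'EOF'
executable   = test.sh
arguments    = -c 8 -t 60
output       = test-$(ClusterId).$(ProcId).out
error        = test-$(ClusterId).$(ProcId).err
log          = test-$(ClusterId).$(ProcId).log

request_cpus = 8

queue
EOF

# Submit only where HTCondor and the executable are actually available.
if command -v condor_submit >/dev/null 2>&1 && [ -x test.sh ]; then
    condor_submit multicore.sub
else
    echo "wrote multicore.sub; submit it from a user interface with condor_submit"
fi
```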
+ | |||
+ | === GPUs === | ||
+ | |||
+ | Right now, there are 18 GPUs at PIC. Since January 2024, the 2 machines with GeForce GTX 1050 Ti GPUs, gpu02 and gpu03, are not directly accessible to users from batch or Jupyter. | ||
+ | |||
+ | There are two other machines with GPUs at PIC: gpu01, with 8 RTX 2080 Ti GPUs, and gpu05, with 8 V100 GPUs. Priority use of gpu01 is for Jupyter, while priority use of gpu05 is for the Magnesia group. You can access either of these machines directly by using the "request_gpus" option. However, bear in mind that you will run in preemption mode, which means that if a higher-priority job from Jupyter or Magnesia needs the resources, your job will be killed and put back in the queue. | ||
− | + | Example of submit file to run in gpu01 or gpu05: | |
+ | |||
+ | <pre> | ||
+ | executable = test.sh | ||
+ | output = test.out | ||
+ | error = test.err | ||
+ | log = test.log | ||
+ | |||
+ | request_cpus=1 | ||
+ | request_memory = 4GB | ||
+ | request_gpus=1 | ||
+ | |||
+ | queue | ||
+ | </pre> | ||
+ | |||
+ | If you want to run specifically on gpu01 or gpu05, you can add requirements to your submit file such as: | ||
+ | |||
+ | <pre> | ||
+ | Requirements = regexp("gpu01",Name) | ||
+ | </pre> | ||
+ | |||
+ | Doing this, your job will land on the desired machine. | ||
== Flavours == | == Flavours == | ||
− | The maximum walltime of your job can be specified using a '''flavour'''. There are 3 such flavours: short, medium and long. | + | The maximum walltime of your job can be specified using a '''flavour'''. There are 3 such flavours: short, medium and long. |
− | |||
* short: 3 hours | * short: 3 hours | ||
* medium: 48 hours | * medium: 48 hours | ||
Line 275: | Line 322: | ||
</pre> | </pre> | ||
− | If you do not choose any flavour explicitly, the flavour medium is the default which corresponds to 48 hours of walltime. | + | If you do not choose any flavour explicitly, the default is the flavour medium, which corresponds to 48 hours of walltime. Please do not forget the double quotes ("") around the flavour name; otherwise the flavour medium will be chosen. |
Once the job arrives at the time limit, it will be held and it remains in this status for 6 hours before being removed from the queue. Thus, the user can check the jobs held in the queue for 6 hours. You will find more information about the JobStatus in later sections. | Once the job arrives at the time limit, it will be held and it remains in this status for 6 hours before being removed from the queue. Thus, the user can check the jobs held in the queue for 6 hours. You will find more information about the JobStatus in later sections. | ||
− | The priority works in this order short > medium > long being the shortest jobs the ones with | + | Priority follows the order short > medium > long, so the shortest jobs have the highest priority. |
+ | |||
+ | Moreover, the flavours strictly control the memory consumption of your jobs. Jobs that exceed their requested memory by more than 50% will also be held. | ||
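
Reading the rule above as a limit of 1.5 times the requested memory (an interpretation, not a confirmed detail of the PIC configuration), the hold threshold can be estimated as follows. The function name hold_threshold_mib is hypothetical.

```shell
# Hypothetical helper: memory (in MiB) above which a job would be held,
# assuming the limit is the requested memory plus 50%.
hold_threshold_mib() {
    echo $(( $1 * 3 / 2 ))
}

hold_threshold_mib 2048    # default 2 GB request -> prints 3072
hold_threshold_mib 4096    # request_memory = 4 GB -> prints 6144
```

If your job regularly approaches this value, raising request_memory is safer than relying on the margin.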
− | + | If for some reason your jobs need a walltime of more than 96 hours, please ask your contact person. |
== Environment == | == Environment == | ||
Line 313: | Line 362: | ||
<pre> | <pre> | ||
$ cat output-128.0.out | $ cat output-128.0.out | ||
− | My HOME directory is: / | + | My HOME directory is: /home/testuser |
My Workdir is: /home/execute/dir_13143 | My Workdir is: /home/execute/dir_13143 | ||
My PATH is: /bin:/usr/local/bin:/usr/bin | My PATH is: /bin:/usr/local/bin:/usr/bin | ||
Line 319: | Line 368: | ||
</pre> | </pre> | ||
− | + | As you can see in the output, the $HOME directory is defined, but $HOME/bin is not in the $PATH. Although $HOME/bin is added to the $PATH in the user's .bashrc, HTCondor does not load the .bashrc by default. Furthermore, the $SOFT_DIR variable is also empty. Since these variables, $PATH and $SOFTWARE, are known and defined (for instance, in the .bashrc), we can submit the job in different ways: using the environment command or adding the needed exports directly to your script. |
* 1) Using environment command | * 1) Using environment command | ||
Line 337: | Line 386: | ||
<pre> | <pre> | ||
$ cat output-129.0.out | $ cat output-129.0.out | ||
− | My HOME directory is: / | + | My HOME directory is: /home/testuser |
My Workdir is: /home/execute/dir_13420 | My Workdir is: /home/execute/dir_13420 | ||
− | My PATH is: /bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/ | + | My PATH is: /bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/testuser/bin:/home/testuser/bin |
My SOFTWARE directory is: /software/dteam/ | My SOFTWARE directory is: /software/dteam/ | ||
</pre> | </pre> | ||
Line 372: | Line 421: | ||
<pre> | <pre> | ||
$ cat output-130.0.out | $ cat output-130.0.out | ||
− | My HOME directory is: / | + | My HOME directory is: /home/testuser |
My Workdir is: /home/execute/dir_15903 | My Workdir is: /home/execute/dir_15903 | ||
− | My PATH is: /bin:/usr/local/bin:/usr/bin:/ | + | My PATH is: /bin:/usr/local/bin:/usr/bin:/home/testuser/bin |
My SOFTWARE directory is: /software/dteam | My SOFTWARE directory is: /software/dteam | ||
</pre> | </pre> | ||
Line 382: | Line 431: | ||
== Queue == | == Queue == | ||
− | Queue is the command in your submit file to select the number of job instances you want to submit. Basically, you can specify the number of jobs of your batch using queue N at the end of your submit file. This is a very powerful tool that allows users to submit several jobs in different ways using the same submission script | + | Queue is the command in your submit file that selects the number of job instances you want to submit. Basically, you can specify the number of jobs in your batch using queue N at the end of your submit file, where N is the number of jobs per batch. This is a very powerful tool that allows users to submit several jobs in different ways using the same submission script. |
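
As a sketch of queue N, the snippet below writes a submit file that queues 3 instances of the same job and then lists the output files those instances would produce. The file name queue_n.sub, the helper expected_outputs and the cluster number 740 are hypothetical; the only fact relied on is that ProcId starts at 0, as the job IDs 740.0 and 740.1 shown earlier indicate.

```shell
# Submit file queuing 3 instances of the same job (hypothetical file name).
cat > queue_n.sub <<'EOF'
executable = test.sh
arguments  = -c 1 -t 60
output     = test-$(ClusterId).$(ProcId).out
error      = test-$(ClusterId).$(ProcId).err
log        = test-$(ClusterId).$(ProcId).log

queue 3
EOF

# Hypothetical helper: list the stdout files that "queue N" would create
# for a given cluster id, with ProcId running from 0 to N-1.
expected_outputs() {
    eo_proc=0
    while [ "$eo_proc" -lt "$2" ]; do
        echo "test-$1.$eo_proc.out"
        eo_proc=$(( eo_proc + 1 ))
    done
}

expected_outputs 740 3
```

Using $(ProcId) in the file names is what keeps the instances from overwriting each other's output.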
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | === Matching pattern === | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
As another example, suppose we want to submit one job for each file in our current directory whose name matches a pattern. | As another example, suppose we want to submit one job for each file in our current directory whose name matches a pattern. | ||
Line 412: | Line 441: | ||
executable = /bin/echo | executable = /bin/echo | ||
arguments = $(filename) | arguments = $(filename) | ||
− | output = | + | output = test-$(ClusterId).$(ProcId).out |
− | error = | + | error = test-$(ClusterId).$(ProcId).err |
− | log = | + | log = test-$(ClusterId).$(ProcId).log |
queue filename matching files Hello* | queue filename matching files Hello* | ||
Line 441: | Line 470: | ||
</pre> | </pre> | ||
− | + | === From file === | |
We have an executable test.sh that takes two arguments, -c $(arg1) and -t $(arg2), and we want to submit 4 jobs, each with a different set of arguments. This can be done using queue ... from file. | We have an executable test.sh that takes two arguments, -c $(arg1) and -t $(arg2), and we want to submit 4 jobs, each with a different set of arguments. This can be done using queue ... from file. | ||
Line 448: | Line 477: | ||
executable = test.sh | executable = test.sh | ||
arguments = -c $(arg1) -t $(arg2) | arguments = -c $(arg1) -t $(arg2) | ||
− | output = | + | output = test-$(ClusterId).$(ProcId).out |
− | error = | + | error = test-$(ClusterId).$(ProcId).err |
− | log = | + | log = test-$(ClusterId).$(ProcId).log |
queue arg1,arg2 from arg_list.txt | queue arg1,arg2 from arg_list.txt | ||
Line 465: | Line 494: | ||
</pre> | </pre> | ||
− | + | === In list === | |
As with ''from file'', you can specify your different elements in a list. The next example will submit 4 jobs, each one with one of the arguments specified in the list. | As with ''from file'', you can specify your different elements in a list. The next example will submit 4 jobs, each one with one of the arguments specified in the list. | ||
Line 472: | Line 501: | ||
executable = test.sh | executable = test.sh | ||
arguments = -c 1 -t $(arg1) | arguments = -c 1 -t $(arg1) | ||
− | output = | + | output = test-$(ClusterId).$(ProcId).out |
− | error = | + | error = test-$(ClusterId).$(ProcId).err |
− | log = | + | log = test-$(ClusterId).$(ProcId).log |
queue arg1 in (10 15 5 20) | queue arg1 in (10 15 5 20) | ||
</pre> | </pre> | ||
− | + | === Multiple queue statements (deprecated) === | |
+ | |||
+ | <b>IMPORTANT NOTE</b>. Although this was the simplest and most widely used way to submit several jobs using queue, the HTCondor developers want to remove it in favor of the other ways of using the queue command. For instance, multiple queue statements no longer work for DAGMan jobs (since the HTCondor 23.0.6 update) and in the future will not work with condor_submit either. Hence, we do not recommend using it. We keep it here so that users can recognize it and avoid it. | ||
+ | |||
+ | We want to submit 2 jobs using the same executable, each one with their arguments and different cpus. | ||
+ | |||
+ | <pre> | ||
+ | executable = test.sh | ||
+ | |||
+ | output = test-$(ClusterId).$(ProcId).out | ||
+ | error = test-$(ClusterId).$(ProcId).err | ||
+ | log = test-$(ClusterId).$(ProcId).log | ||
+ | |||
+ | args = -c 1 -t 60 | ||
+ | request_cpus = 1 | ||
+ | queue | ||
+ | args = -c 8 -t 60 | ||
+ | request_cpus = 8 | ||
+ | queue | ||
+ | </pre> | ||
+ | |||
+ | Thus, there will be two jobs submitted, the first one using 1 cpu and the second one using 8. | ||
+ | |||
+ | In conclusion, as you can see, there are several ways to use the queue command. You can find more examples in the documentation [[https://htcondor.readthedocs.io/en/v9_0/users-manual/submitting-a-job.html#using-the-power-and-flexibility-of-the-queue-command 3]]. | ||
== Transfer files == | == Transfer files == | ||
− | + | First of all, note that the standard output and standard error of your job will always be transferred back when the job finishes, where finishing covers all cases: completed jobs, removed jobs and held jobs. However, any other files generated in the scratch dir of the WN will not be transferred back unless you specify them. |
− | + | To transfer the input file you should use input= or transfer_input_files= options in your submit file. | |
+ | |||
+ | At PIC, the option when_to_transfer_output is forced to be ON_EXIT_OR_EVICT, thus, the output files chosen would be transferred back when the job finishes in any state or when the job is evicted (held). | ||
+ | |||
+ | === transfer_output_files === | ||
+ | |||
+ | The next example shows a script called test-output.sh that writes in two files: | ||
<pre> | <pre> | ||
Line 491: | Line 549: | ||
for i in {1..10}; do | for i in {1..10}; do | ||
− | echo "Number $i" >> 1.txt | + | echo "Number $i" >> $_CONDOR_SCRATCH_DIR/1.txt |
sleep 2s | sleep 2s | ||
− | echo "Hello" >> 2.txt | + | echo "Hello" >> $_CONDOR_SCRATCH_DIR/2.txt |
done | done | ||
</pre> | </pre> | ||
− | will generate the file 1.txt and 2.txt in the scratch directory | + | Thus, after submitting this job, the script will generate the files 1.txt and 2.txt in the scratch directory. Note that the condor variable $_CONDOR_SCRATCH_DIR is really useful to make sure that you are pointing to the scratch directory of the node. '''This is very important''': if you need to recover any of the files generated in the $_CONDOR_SCRATCH_DIR of the WN, you should use the option ''transfer_output_files'' in your submit file. |
<pre> | <pre> | ||
Line 509: | Line 567: | ||
</pre> | </pre> | ||
− | With transfer_output_files you can decide the files you want to be transferred back. Using the same script test-output.sh that generates 1.txt and 2.txt, | + | With transfer_output_files command, you can decide the files you want to be transferred back. Using the same script test-output.sh that generates 1.txt and 2.txt, the 1.txt is the only file which is going to be transferred to the submit machine when the job finishes. |
+ | |||
+ | Unfortunately, transfer_output_files does not allow wildcards. Note that if you use a wrong filename, in other words, a file that is not really in the scratch directory, an empty file will be generated after job completion. | ||
+ | |||
+ | Thus, as wildcards are not allowed, the best way to transfer several output files is to use directories. Consider the next script: | ||
+ | |||
+ | <pre> | ||
+ | $ cat test-output-dir.sh | ||
+ | #!/bin/bash | ||
+ | |||
+ | cd $_CONDOR_SCRATCH_DIR | ||
+ | mkdir my_files | ||
+ | cd my_files | ||
+ | for i in {1..10}; do | ||
+ | echo "Number $i" >> ./1.txt | ||
+ | sleep 2s | ||
+ | echo "Hello" >> ./2.txt | ||
+ | done | ||
+ | </pre> | ||
+ | |||
+ | We are creating a local directory in our scratch dir called my_files and putting there the 1.txt and 2.txt files. Then, we transfer back the whole directory because we are using transfer_output_files=my_files: | ||
+ | |||
+ | <pre> | ||
+ | executable = test-output-dir.sh | ||
+ | output = test-out1-$(ClusterId).$(ProcId).out | ||
+ | error = test-out1-$(ClusterId).$(ProcId).err | ||
+ | log = test-out1-$(ClusterId).$(ProcId).log | ||
+ | transfer_output_files=my_files | ||
+ | queue | ||
+ | </pre> | ||
+ | |||
+ | In our submit directory, after the job finishes, we are going to find the directory with the files. | ||
+ | |||
+ | <pre> | ||
+ | $ ls my_files/ | ||
+ | 1.txt 2.txt | ||
+ | </pre> | ||
+ | |||
+ | Of course, if you only want one file from the created directory, you just need to use ''transfer_output_files=my_files/2.txt'', for instance. In that case, only the file will be transferred; the directory is not going to be created on the submit side. | ||
− | + | === transfer_output_remaps === | |
− | + | On the other hand, if you want to transfer the output file to another directory, you should use the ''transfer_output_remaps'' option. |
− | |||
<pre> | <pre> | ||
− | $ | + | executable = test-output.sh |
+ | output = test-$(ClusterId).$(ProcId).out | ||
+ | error = test-$(ClusterId).$(ProcId).err | ||
+ | log = test-$(ClusterId).$(ProcId).log | ||
+ | transfer_output_files=1.txt | ||
+ | transfer_output_remaps="1.txt=outputs/1-$(ClusterId).$(ProcId).txt" | ||
+ | |||
+ | queue | ||
</pre> | </pre> | ||
− | The | + | The submit file above is going to transfer the 1.txt file from the scratch dir to ./outputs/1-$(ClusterId).$(ProcId).txt |
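
A small precaution when remapping into a subdirectory: as far as we know, HTCondor does not create the destination directory of a remap, so it is safest to create it before submitting. This is an assumption worth checking against your HTCondor version. The submit file name test-remap.sub is hypothetical.

```shell
# Create the destination directory named in transfer_output_remaps.
mkdir -p outputs

# Submit only where HTCondor and the (hypothetical) submit file are available.
if command -v condor_submit >/dev/null 2>&1 && [ -f test-remap.sub ]; then
    condor_submit test-remap.sub
else
    echo "outputs/ is ready; run condor_submit test-remap.sub from a user interface"
fi
```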
+ | |||
+ | === transfer_executable === | ||
+ | |||
+ | The Transfer executable mechanism defaults to false for all HTCondor jobs. This means that the executable is searched in the job destination, in other words, the WN. | ||
+ | |||
+ | If for some reason, this configuration does not fit your needs, you should add in your submit file: | ||
+ | |||
+ | <pre>+transfer_exec = true</pre> | ||
+ | |||
+ | Using this option, your executable will be transferred to the WN on the $_CONDOR_SCRATCH_DIR directory. | ||
+ | |||
+ | You may find a (rare) situation where the executable is not present in your user interface but you know for sure that it is present in the WNs. Take into account that HTCondor always looks for your executable before submitting your job unless you explicitly add transfer_executable = false in your submit file. | ||
== Accounting Group == | == Accounting Group == | ||
− | The priority of your job is calculated depending on the Accounting Group you belong to. The common users do not have to worry about the Accounting Group, as it will be automatically taken considering | + | The priority of your job is calculated depending on the Accounting Group you belong to. The common users do not have to worry about the Accounting Group, as it will be automatically taken considering the primary group. |
Anyway, if you are in two groups and need to change your Accounting Group for a given submission, you can add the +experiment="experiment" option in your submit file. For instance, consider a user whose main group is vip and whose secondary group is virgo, and who wants to submit a job that is accounted only to virgo. | Anyway, if you are in two groups and need to change your Accounting Group for a given submission, you can add the +experiment="experiment" option in your submit file. For instance, consider a user whose main group is vip and whose secondary group is virgo, and who wants to submit a job that is accounted only to virgo. | ||
Line 531: | Line 645: | ||
executable = test.sh | executable = test.sh | ||
args = --cpu 1 --timeout 120 | args = --cpu 1 --timeout 120 | ||
− | output = | + | output = test-$(ClusterId).$(ProcId).out |
− | error = | + | error = test-$(ClusterId).$(ProcId).err |
− | log = | + | log = test-$(ClusterId).$(ProcId).log |
+experiment="virgo" | +experiment="virgo" | ||
Line 545: | Line 659: | ||
*atlas | *atlas | ||
+ | *bioinfo | ||
*co2flux | *co2flux | ||
*cta | *cta | ||
Line 554: | Line 669: | ||
*neutrinos | *neutrinos | ||
*paus | *paus | ||
+ | *uab-giq | ||
*vip | *vip | ||
*virgo | *virgo | ||
Furthermore, the automatic assignment of Accounting Groups is also done for those groups that have dedicated User Interfaces (such as mic.magic, at3 and ui01-virgo). In that case, the Accounting Group is set according to the group that owns the User Interface. In other words, regardless of the user's main group, if I submit a job from the mic.magic machine, my AccountingGroup belongs to magic. | Furthermore, the automatic assignment of Accounting Groups is also done for those groups that have dedicated User Interfaces (such as mic.magic, at3 and ui01-virgo). In that case, the Accounting Group is set according to the group that owns the User Interface. In other words, regardless of the user's main group, if I submit a job from the mic.magic machine, my AccountingGroup belongs to magic. | ||
+ | |||
+ | == Priority per user basis == | ||
+ | |||
+ | In your submit file you can also add a priority argument, which can be any integer; 0 is the default. Jobs with a higher numerical priority run before jobs with a lower one. This works on a per-user basis: it is a way to order the execution of your own jobs. Thus, a user with many jobs can use this command to order them, but it has no effect on whether these jobs run before another user's jobs. | ||
+ | |||
+ | Note that it is recommended to use + and - to specify a positive or negative priority value for your jobs. | ||
+ | |||
+ | This can also be done with the ''condor_prio'' command [[https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_prio.html 4]]: | ||
+ | |||
+ | <pre>condor_prio -p +10 52165.0</pre> | ||
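The submit-file form of the same idea can be sketched as follows. This is a minimal example, assuming the standard priority submit command; the file name prio.sub and the test.sh executable are only illustrative.

```shell
#!/bin/bash
# Write an illustrative submit file (prio.sub is a hypothetical name).
cat > prio.sub << 'EOF'
executable = test.sh
output     = test-$(ClusterId).$(ProcId).out
error      = test-$(ClusterId).$(ProcId).err
log        = test-$(ClusterId).$(ProcId).log
# Higher values run earlier among your own jobs; the default is 0.
priority   = 10
queue
EOF

# Show the file; submit it with: condor_submit prio.sub
cat prio.sub
```

The heredoc delimiter is quoted so that the shell leaves $(ClusterId) and $(ProcId) untouched for HTCondor to expand.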
+ | |||
+ | == Interactive submission == | ||
+ | As with Torque/Maui, HTCondor lets you submit interactive jobs using the "-i" (or "-interactive") option of condor_submit. | ||
+ | |||
+ | <pre> | ||
+ | $ condor_submit -i | ||
+ | </pre> | ||
+ | |||
+ | Your job waits in the queue and, once it starts, you get a shell inside a Worker Node (WN): | ||
+ | |||
+ | <pre> | ||
+ | $ condor_submit -i | ||
+ | Submitting job(s). | ||
+ | 1 job(s) submitted to cluster 52166. | ||
+ | Waiting for job to start... | ||
+ | Welcome to slot1_9@td730.pic.es! | ||
+ | You will be logged out after 7200 seconds of inactivity. | ||
+ | [testuser@td730 dir_23132]$ | ||
+ | </pre> | ||
+ | |||
+ | Notice that the session is closed after 7200 seconds of inactivity. Other defaults are: | ||
+ | |||
+ | * 1 Core | ||
+ | * 512 MB of RAM (caution, not the 2048 MB for the other jobs!) | ||
+ | * Walltime of 48 hours (as for all jobs with the medium flavour) | ||
+ | |||
+ | |||
+ | You can also use a submit file to launch an interactive job: | ||
+ | |||
+ | <pre> | ||
+ | $ condor_submit -i test.sub | ||
+ | </pre> | ||
+ | |||
+ | The session created in the node is subject to the same restrictions on CPU, memory, disk, etc. as a regular job. However, some submit file options make no sense in an interactive session, such as executable or arguments. | ||
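As a rough sketch, a submit file for an interactive session could look like the one written below. It assumes the standard request_cpus and request_memory submit commands; the file name interactive.sub and the requested values are illustrative, not PIC recommendations.

```shell
#!/bin/bash
# Write an illustrative submit file for an interactive session
# (interactive.sub is a hypothetical name).
cat > interactive.sub << 'EOF'
# No executable or arguments: they make no sense in an interactive session.
request_cpus   = 2
# Memory in MB, raised above the 512 MB interactive default.
request_memory = 2048
queue
EOF

# Show the file; start the session with: condor_submit -i interactive.sub
cat interactive.sub
```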
+ | |||
+ | == Notifications == | ||
+ | |||
+ | Although HTCondor has an email notification system, we do not allow users to use this system on our cluster. Potential misuse of this feature can result in thousands of emails and security issues. | ||
+ | |||
+ | == Dagman == | ||
+ | |||
+ | HTCondor Dagman (Directed Acyclic Graph Manager) is a meta-scheduler for HTCondor [[https://htcondor.readthedocs.io/en/v9_0/users-manual/dagman-workflows.html 5]]. In other words, you can submit to the queue a scheduler that manages the execution order (the hierarchy) of several jobs. This is the best tool to handle multiple interdependent jobs. | ||
+ | |||
+ | For instance, suppose we have two jobs, A and B, where B depends directly on A. This means that A must have finished before B is submitted. This is the simplest use case of a dagman workflow. | ||
+ | |||
+ | Next, you can see the 2 jobs. Job A writes several files in your scratch directory and job B removes them. Note that we need to create one submit file per job. | ||
+ | |||
+ | <pre> | ||
+ | $ cat jobA.sh | ||
+ | #!/bin/bash | ||
+ | |||
+ | ARG1=$1 | ||
+ | |||
+ | cd $_CONDOR_SCRATCH_DIR | ||
+ | mkdir test | ||
+ | cd test | ||
+ | cat << EOF > ./dagmanfile${ARG1}.test | ||
+ | EOF | ||
+ | |||
+ | sleep 30 | ||
+ | |||
+ | $ cat jobB.sh | ||
+ | #!/bin/bash | ||
+ | |||
+ | for a in $(seq 1 100); do | ||
+ | rm -f $HOME/condor/dagman/test/dagmanfile${a}.test | ||
+ | echo "dagmanfile$a.test erased" | ||
+ | done | ||
+ | |||
+ | sleep 30 | ||
+ | </pre> | ||
+ | |||
+ | And here you have the 2 submit files. Take into account that we are transferring the test directory back from scratch to our submit path. Job B is going to remove these files later. | ||
+ | |||
+ | <pre> | ||
+ | $ cat dagjobA.sub | ||
+ | executable = jobA.sh | ||
+ | args = $(Item) | ||
+ | output = jobA-output-$(ClusterId).$(ProcId).out | ||
+ | error = jobA-error-$(ClusterId).$(ProcId).err | ||
+ | log = jobA-log-$(ClusterId).$(ProcId).log | ||
+ | transfer_output_files=test | ||
+ | queue from seq 1 1 100 | | ||
+ | |||
+ | $ cat dagjobB.sub | ||
+ | executable = jobB.sh | ||
+ | output = jobB-output-$(ClusterId).$(ProcId).out | ||
+ | error = jobB-error-$(ClusterId).$(ProcId).err | ||
+ | log = jobB-log-$(ClusterId).$(ProcId).log | ||
+ | queue | ||
+ | </pre> | ||
+ | |||
+ | Then, there is the dagman file: | ||
+ | |||
+ | <pre> | ||
+ | $ cat dagjob.dag | ||
+ | JOB A dagjobA.sub | ||
+ | JOB B dagjobB.sub | ||
+ | PARENT A CHILD B | ||
+ | </pre> | ||
+ | |||
+ | To submit the dagman file you should use ''condor_submit_dag''. | ||
+ | |||
+ | <pre> | ||
+ | $ condor_submit_dag dagjob.dag | ||
+ | |||
+ | ----------------------------------------------------------------------- | ||
+ | File for submitting this DAG to HTCondor : dagjob.dag.condor.sub | ||
+ | Log of DAGMan debugging messages : dagjob.dag.dagman.out | ||
+ | Log of HTCondor library output : dagjob.dag.lib.out | ||
+ | Log of HTCondor library error messages : dagjob.dag.lib.err | ||
+ | Log of the life of condor_dagman itself : dagjob.dag.dagman.log | ||
+ | |||
+ | Submitting job(s). | ||
+ | 1 job(s) submitted to cluster 60771. | ||
+ | ----------------------------------------------------------------------- | ||
+ | </pre> | ||
+ | |||
+ | Several files are generated: the submit file to HTCondor of your dagman jobs, the log, the library standard output and standard error messages and the log file of condor_dagman. Take into account that this standard output and error refer to the dagman execution itself; the regular standard output and error of your jobs are created as for normal HTCondor jobs. | ||
+ | |||
+ | Querying with condor_q, we can follow what is happening. | ||
+ | |||
+ | <pre> | ||
+ | $ condor_q 60771 | ||
+ | |||
+ | |||
+ | -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 07/05/19 16:21:39 | ||
+ | OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS | ||
+ | testuser dagjob.dag+60771 7/5 16:21 _ 100 _ 1 60772.0-99 | ||
+ | |||
+ | Total for query: 100 jobs; 0 completed, 0 removed, 0 idle, 100 running, 0 held, 0 suspended | ||
+ | Total for all users: 920 jobs; 148 completed, 0 removed, 632 idle, 140 running, 0 held, 0 suspended | ||
+ | </pre> | ||
+ | |||
+ | The dagman scheduler has ClusterId 60771, and the first 100 jobs, generated by job A, have the ClusterId.ProcId 60772.0-99. | ||
+ | |||
+ | <pre> | ||
+ | $ condor_q 60771 | ||
+ | |||
+ | |||
+ | -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 07/05/19 16:22:53 | ||
+ | OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS | ||
+ | testuser dagjob.dag+60771 7/5 16:21 _ 1 _ 1 60773.0 | ||
+ | |||
+ | Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended | ||
+ | Total for all users: 816 jobs; 144 completed, 0 removed, 631 idle, 41 running, 0 held, 0 suspended | ||
+ | </pre> | ||
+ | |||
+ | After the 100 instances of job A have finished, you can see a test directory in your submit path with several dagmanfileX.test files. Then, job B starts. | ||
+ | |||
+ | jobA and jobB work as regular HTCondor jobs (flavours, requests, etc.). The standard output, error and log files are generated for all the jobs, in addition to the dagman files described above. | ||
+ | |||
+ | Consider a more complex workflow like the following one: | ||
+ | |||
+ | [[image:dagman.png]] | ||
+ | |||
+ | The dagman file has to be something like this: | ||
+ | |||
+ | <pre> | ||
+ | JOB A jobA.sub | ||
+ | JOB B jobB.sub | ||
+ | JOB C jobC.sub | ||
+ | JOB D jobD.sub | ||
+ | JOB E jobE.sub | ||
+ | JOB F jobF.sub | ||
+ | JOB G jobG.sub | ||
+ | JOB H jobH.sub | ||
+ | PARENT A CHILD B C D | ||
+ | PARENT B C CHILD E | ||
+ | PARENT D CHILD F G H | ||
+ | </pre> | ||
+ | |||
+ | === SCRIPT === | ||
+ | |||
+ | The SCRIPT command specifies any processing you need to run before (PRE) or after (POST) a job. For instance, PRE can be used to place files somewhere and POST to clean up files. Take into account that if your script returns an exit value different from 0, the workflow is going to stop. | ||
+ | |||
+ | A simple example: jobA creates several files in the scratch directory and jobB writes a number into each file. | ||
+ | |||
+ | <pre> | ||
+ | $ cat jobA.sh | ||
+ | #!/bin/bash | ||
+ | |||
+ | ARG1=$1 | ||
+ | cd $_CONDOR_SCRATCH_DIR | ||
+ | mkdir test | ||
+ | cd test/ | ||
+ | touch ./dagmanfile${ARG1}.test | ||
+ | |||
+ | sleep 30 | ||
+ | |||
+ | $ cat jobB.sh | ||
+ | #!/bin/bash | ||
+ | |||
+ | cd $_CONDOR_SCRATCH_DIR | ||
+ | for a in $(seq 1 100); do | ||
+ | echo $a > $HOME/condor/dagman/test/dagmanfile${a}.test | ||
+ | done | ||
+ | |||
+ | sleep 30 | ||
+ | |||
+ | $ cat dagjobA.sub | ||
+ | executable = jobA.sh | ||
+ | args = $(Item) | ||
+ | output = jobA-output-$(ClusterId).$(ProcId).out | ||
+ | error = jobA-error-$(ClusterId).$(ProcId).err | ||
+ | log = jobA-log-$(ClusterId).$(ProcId).log | ||
+ | transfer_output_files=test | ||
+ | queue from seq 1 1 100 | | ||
+ | |||
+ | $ cat dagjobB.sub | ||
+ | executable = jobB.sh | ||
+ | output = jobB-output-$(ClusterId).$(ProcId).out | ||
+ | error = jobB-error-$(ClusterId).$(ProcId).err | ||
+ | log = jobB-log-$(ClusterId).$(ProcId).log | ||
+ | queue | ||
+ | </pre> | ||
+ | |||
+ | After job B execution, we want to create a tar file and remove the created test/ directory. | ||
+ | |||
+ | <pre> | ||
+ | $ cat tar.sh | ||
+ | #!/bin/bash | ||
+ | |||
+ | tar -cvf file-test.tar test/*.test | ||
+ | rm -rf test | ||
+ | </pre> | ||
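Because a non-zero exit value of the script stops the workflow, a POST script can be written so that the data is deleted only when the archive was created. The following is a more defensive variant of tar.sh, given as a sketch: the function name archive_and_clean and the demo data are illustrative additions, while the test/ directory and file-test.tar names come from the example above.

```shell
#!/bin/bash
# Defensive variant of tar.sh (illustrative sketch).
# Archive the test/ directory and delete it only if tar succeeded.
archive_and_clean() {
    # Fail early if the directory is missing.
    [ -d test ] || { echo "test/ not found" >&2; return 1; }
    # With '&&', rm runs only when tar exits with 0.
    tar -cf file-test.tar test/ && rm -rf test
}

# Demo with dummy data so the sketch can be run outside a DAG.
mkdir -p test
touch test/dagmanfile1.test test/dagmanfile2.test
archive_and_clean
status=$?
echo "exit status: $status"
```

In a real POST script you would drop the demo lines and finish with the status of archive_and_clean as the exit value of the script.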
+ | |||
+ | Thus, this is the dagman file we need: | ||
+ | |||
+ | <pre> | ||
+ | $ cat dagjob-SCRIPT.dag | ||
+ | JOB A dagjobA.sub | ||
+ | JOB B dagjobB.sub | ||
+ | SCRIPT POST B tar.sh | ||
+ | PARENT A CHILD B | ||
+ | </pre> | ||
+ | |||
+ | We submit the job: | ||
+ | |||
+ | <pre> | ||
+ | $ condor_submit_dag dagjob-SCRIPT.dag | ||
+ | |||
+ | ----------------------------------------------------------------------- | ||
+ | File for submitting this DAG to HTCondor : dagjob-SCRIPT.dag.condor.sub | ||
+ | Log of DAGMan debugging messages : dagjob-SCRIPT.dag.dagman.out | ||
+ | Log of HTCondor library output : dagjob-SCRIPT.dag.lib.out | ||
+ | Log of HTCondor library error messages : dagjob-SCRIPT.dag.lib.err | ||
+ | Log of the life of condor_dagman itself : dagjob-SCRIPT.dag.dagman.log | ||
+ | |||
+ | Submitting job(s). | ||
+ | 1 job(s) submitted to cluster 60807. | ||
+ | ----------------------------------------------------------------------- | ||
+ | </pre> | ||
+ | |||
+ | Monitor the behavior. | ||
+ | |||
+ | <pre> | ||
+ | $ condor_q | ||
+ | |||
+ | |||
+ | -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 07/08/19 10:28:15 | ||
+ | OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS | ||
+ | testuser dagjob-SCRIPT.dag+60807 7/8 10:28 _ _ _ 1 0.0 | ||
+ | |||
+ | Total for query: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended | ||
+ | Total for testuser: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended | ||
+ | Total for all users: 223 jobs; 0 completed, 0 removed, 0 idle, 223 running, 0 held, 0 suspended | ||
+ | |||
+ | $ condor_q | ||
+ | |||
+ | |||
+ | -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 07/08/19 10:28:18 | ||
+ | OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS | ||
+ | testuser dagjob-SCRIPT.dag+60807 7/8 10:28 _ _ 100 1 60808.0-99 | ||
+ | |||
+ | Total for query: 100 jobs; 0 completed, 0 removed, 100 idle, 0 running, 0 held, 0 suspended | ||
+ | Total for testuser: 100 jobs; 0 completed, 0 removed, 100 idle, 0 running, 0 held, 0 suspended | ||
+ | Total for all users: 323 jobs; 0 completed, 0 removed, 100 idle, 223 running, 0 held, 0 suspended | ||
+ | </pre> | ||
+ | |||
+ | After job B finishes, the file-test.tar file is created and the test directory is removed. | ||
+ | |||
+ | == Submitting lots of jobs (factory) == | ||
+ | |||
+ | There is a maximum total number of jobs you can have: 5000 jobs in total. However, the limit per submission is 2500 jobs, in other words, you cannot submit more than 2500 jobs using the same submit file. | ||
+ | |||
+ | When you need to submit a lot of jobs on HTCondor, you can reduce the load on the servers and control the number of jobs on queue with the '''late materialization job factory''' [[https://htcondor.readthedocs.io/en/latest/users-manual/submitting-a-job.html#submitting-lots-of-jobs 6]]. | ||
+ | |||
+ | What do we consider lots of jobs? More than 2000 jobs in the same submission at PIC can be an approximate number. | ||
+ | |||
+ | === max_materialize === | ||
+ | |||
+ | The scenario could be that you need to submit lots of jobs but do not want to have all of them running at the same time or affect other colleagues that belong to the same experiment. | ||
+ | |||
+ | As a dummy example, I want to submit 100 jobs but I want at most 10 of them in the queue (idle or running) at the same time, so no more than 10 run concurrently. Add max_materialize to your submit file: | ||
+ | |||
+ | <pre> | ||
+ | max_materialize=10 | ||
+ | </pre> | ||
+ | |||
+ | <pre> | ||
+ | $ condor_q 8116114 | ||
+ | |||
+ | |||
+ | -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 02/08/24 13:27:09 | ||
+ | OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS | ||
+ | cacosta ID: 8116114 2/8 13:26 _ 10 _ 100 8116114.0-9 | ||
+ | |||
+ | Total for query: 10 jobs; 0 completed, 0 removed, 0 idle, 10 running, 0 held, 0 suspended | ||
+ | Total for all users: 67 jobs; 0 completed, 0 removed, 1 idle, 59 running, 7 held, 0 suspended | ||
+ | |||
+ | </pre> | ||
+ | |||
+ | There are 100 jobs in total, but no more than 10 run at the same time. | ||
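For reference, a complete submit file for this scenario might look like the sketch below. The file name factory.sub and the test.sh executable are illustrative; max_materialize and queue 100 reflect the 10-out-of-100 example above.

```shell
#!/bin/bash
# Write an illustrative factory submit file (factory.sub is a hypothetical name).
cat > factory.sub << 'EOF'
executable = test.sh
args       = --cpu 1 --timeout 120
output     = test-$(ClusterId).$(ProcId).out
error      = test-$(ClusterId).$(ProcId).err
log        = test-$(ClusterId).log
# Keep at most 10 of the 100 jobs materialized in the queue at once.
max_materialize = 10
queue 100
EOF

# Show the file; submit it with: condor_submit factory.sub
cat factory.sub
```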
+ | |||
+ | There is a special way to query this kind of submission: | ||
+ | |||
+ | <pre> | ||
+ | $ condor_q 8116114 -factory | ||
+ | |||
+ | |||
+ | -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 02/08/24 13:41:05 | ||
+ | ID OWNER SUBMITTED LIMIT PRESNT RUN IDLE HOLD NEXTID MODE DIGEST | ||
+ | 8116114. cacosta 2/8 13:26 10 10 10 0 0 20 Norm /var/lib/condor/spool/6114/condor_submit.8116114.digest | ||
+ | </pre> | ||
+ | |||
+ | |||
+ | === max_idle === | ||
+ | |||
+ | There is also the max_idle option, which controls the number of idle jobs. | ||
+ | |||
+ | <pre> | ||
+ | max_idle=10 | ||
+ | </pre> | ||
+ | |||
+ | |||
+ | The same example as before, 100 jobs. | ||
+ | |||
+ | |||
+ | <pre> | ||
+ | $ condor_q | ||
+ | |||
+ | |||
+ | -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 02/08/24 14:30:03 | ||
+ | OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS | ||
+ | cacosta ID: 8116197 2/8 14:29 _ 5 10 100 8116197.0-14 | ||
+ | |||
+ | Total for query: 15 jobs; 0 completed, 0 removed, 10 idle, 5 running, 0 held, 0 suspended | ||
+ | Total for cacosta: 15 jobs; 0 completed, 0 removed, 10 idle, 5 running, 0 held, 0 suspended | ||
+ | Total for all users: 133 jobs; 0 completed, 0 removed, 30 idle, 96 running, 7 held, 0 suspended | ||
+ | </pre> | ||
+ | |||
+ | At most 10 jobs are idle at any time, but this is the only constraint, so in the end all 100 jobs can run: | ||
+ | |||
+ | <pre> | ||
+ | $ condor_q -factory | ||
+ | |||
+ | |||
+ | -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 02/08/24 14:36:42 | ||
+ | ID OWNER SUBMITTED LIMIT PRESNT RUN IDLE HOLD NEXTID MODE DIGEST | ||
+ | 8116197. cacosta 2/8 14:29 100 100 100 0 0 100 Norm /var/lib/condor/spool/6197/condor_submit.8116197.digest | ||
+ | |||
+ | </pre> | ||
= Monitoring your jobs = | = Monitoring your jobs = | ||
Line 565: | Line 1,054: | ||
== condor_q == | == condor_q == | ||
− | The '''condor_q''' command is the one to query the queue of the schedd [[ | + | The '''condor_q''' command queries the queue of the schedd [[https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_q.html 7]]. Like other HTCondor commands, condor_q lets you specify exactly what you want, using the "-constraint" option to filter on the Job Attributes you want to query. To see all the attributes of a job, use condor_q -l job_id. Furthermore, you can tune the output with the "-autoformat" (or "-af") option. | ||
It is better to show the potential of condor_q in an example: | It is better to show the potential of condor_q in an example: | ||
Line 575: | Line 1,064: | ||
-- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 03/14/19 09:42:46 | -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 03/14/19 09:42:46 | ||
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD | ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD | ||
− | 630.0 | + | 630.0 testuser 3/14 09:42 0+00:00:00 I 0 0.0 test.sh --cpu 1 --timeout 10s |
− | 630.1 | + | 630.1 testuser 3/14 09:42 0+00:00:00 I 0 0.0 test.sh --cpu 1 --timeout 10s |
− | 630.2 | + | 630.2 testuser 3/14 09:42 0+00:00:00 I 0 0.0 test.sh --cpu 1 --timeout 10s |
− | 630.3 | + | 630.3 testuser 3/14 09:42 0+00:00:00 I 0 0.0 test.sh --cpu 1 --timeout 10s |
− | 630.4 | + | 630.4 testuser 3/14 09:42 0+00:00:00 I 0 0.0 test.sh --cpu 1 --timeout 10s |
Total for query: 5 jobs; 0 completed, 0 removed, 5 idle, 0 running, 0 held, 0 suspended | Total for query: 5 jobs; 0 completed, 0 removed, 5 idle, 0 running, 0 held, 0 suspended | ||
− | Total for | + | Total for testuser: 5 jobs; 0 completed, 0 removed, 5 idle, 0 running, 0 held, 0 suspended |
Total for all users: 7 jobs; 0 completed, 0 removed, 5 idle, 2 running, 0 held, 0 suspended | Total for all users: 7 jobs; 0 completed, 0 removed, 5 idle, 2 running, 0 held, 0 suspended | ||
</pre> | </pre> | ||
Line 598: | Line 1,087: | ||
630 4 4 2048 | 630 4 4 2048 | ||
</pre> | </pre> | ||
+ | |||
+ | === analyze and better-analyze === | ||
Furthermore, the condor_q command allows the options -analyze and -better-analyze that show you the reason why your job is not running. | Furthermore, the condor_q command allows the options -analyze and -better-analyze that show you the reason why your job is not running. | ||
Line 711: | Line 1,202: | ||
* '''OwnerGroup'''. Main group or experiment of the submitter user. | * '''OwnerGroup'''. Main group or experiment of the submitter user. | ||
− | There are other HTCondor "magic" numbers that you can consult [[https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=MagicNumbers | + | There are other HTCondor "magic" numbers that you can consult [[https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=MagicNumbers 8]]. |
== condor_wait== | == condor_wait== | ||
− | The '''condor_wait''' command allows us to watch and extract information from the user log file [[ | + | The '''condor_wait''' command allows us to watch and extract information from the user log file [[https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_wait.html 9]]. This command waits forever until the job is finished unless a wait time is specified (with -wait option). Furthermore, as condor_wait monitors the log file, it requires a job successfully submitted to be executed. |
It is not as useful as condor_q but can give you information about several jobs if you collect them in the same log file. | It is not as useful as condor_q but can give you information about several jobs if you collect them in the same log file. | ||
Line 737: | Line 1,228: | ||
== condor_history == | == condor_history == | ||
− | Once the job is removed from the queue, you can query it using '''condor_history''' [[ | + | Once the job is removed from the queue, you can query it using '''condor_history''' [[https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_history.html 10]]. The condor_history command accepts constraints similar to those of condor_q. The number of jobs kept in history for remote access is limited; thus, if you need information about an old job that does not appear in your condor_history query, please ask your contact. | ||
It is recommended to use the option "-limit N" where N is the number of jobs you want to query to perform faster queries. | It is recommended to use the option "-limit N" where N is the number of jobs you want to query to perform faster queries. | ||
<pre> | <pre> | ||
− | $ condor_history -const 'Owner == " | + | $ condor_history -const 'Owner == "testuser" && JobStatus ==4' -limit 5 |
ID OWNER SUBMITTED RUN_TIME ST COMPLETED CMD | ID OWNER SUBMITTED RUN_TIME ST COMPLETED CMD | ||
− | 78.0 | + | 78.0 testuser 3/14 10:05 0+00:01:42 C 3/14 10:07 /home/testuser/condor/test-local/remote/test.sh --cpu 1 --timeout 100s |
− | 76.0 | + | 76.0 testuser 3/14 10:03 0+00:01:41 C 3/14 10:04 /home/testuser/condor/test-local/remote/test.sh --cpu 1 --timeout 100s |
− | 75.0 | + | 75.0 testuser 3/14 09:59 0+00:01:45 C 3/14 10:01 /home/testuser/condor/test-local/remote/test.sh --cpu 1 --timeout 100s |
− | 74.0 | + | 74.0 testuser 3/14 09:13 0+00:00:25 C 3/14 09:14 /home/testuser/condor/test-local/remote/test-output.sh |
− | 73.0 | + | 73.0 testuser 3/14 09:07 0+00:00:22 C 3/14 09:08 /home/testuser/condor/test-local/remote/test-output.sh |
</pre> | </pre> | ||
== condor_tail == | == condor_tail == | ||
− | The '''condor_tail''' command checks the standard output and error of job while this is running [[ | + | The '''condor_tail''' command checks the standard output and error of a job while it is running [[https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_tail.html 11]]. | ||
− | |||
− | |||
<pre> | <pre> | ||
Line 762: | Line 1,251: | ||
</pre> | </pre> | ||
− | The default is to check the standard output and the "-f" option, follow, acts as ''tail -f'' Linux command, | + | The default is to check the standard output. The "-f" (follow) option no longer acts like the Linux ''tail -f'' command: it tails the file until interrupted and shows the last 1024 bytes. To check the standard error, use the "-stderr" option. | ||
+ | |||
+ | Take into account that condor_tail needs the ClusterId.ProcId of the job to work correctly. | ||
== condor_ssh_to_job == | == condor_ssh_to_job == | ||
− | Finally, another way to monitor how your job is evolving is the '''condor_ssh_to_job''' command [[ | + | Finally, another way to monitor how your job is evolving is the '''condor_ssh_to_job''' command [[https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_ssh_to_job.html 12]]. Using this command, the user can enter into the job directory of the node and check what is happening. |
<pre> | <pre> | ||
Line 778: | Line 1,269: | ||
= Removing your jobs = | = Removing your jobs = | ||
− | To remove your jobs, you have to use '''condor_rm''' command [[ | + | To remove your jobs, you have to use '''condor_rm''' command [[https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_rm.html 13]]. |
The most common way to use condor_rm is just specifying the ClusterId and/or the ProcId of your job. For instance, we have a batch of 4 jobs. | The most common way to use condor_rm is just specifying the ClusterId and/or the ProcId of your job. For instance, we have a batch of 4 jobs. | ||
Line 788: | Line 1,279: | ||
-- Schedd: condor-ui01.pic.es : <193.109.175.231:9618?... @ 04/01/19 15:37:55 | -- Schedd: condor-ui01.pic.es : <193.109.175.231:9618?... @ 04/01/19 15:37:55 | ||
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD | ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD | ||
− | 154.0 | + | 154.0 testuser 4/1 15:37 0+00:00:00 I 0 0.0 test.sh --timeout 600s --vm 2 --vm-bytes 2G |
− | 154.1 | + | 154.1 testuser 4/1 15:37 0+00:00:00 I 0 0.0 test.sh --timeout 600s --vm 2 --vm-bytes 2G |
− | 154.2 | + | 154.2 testuser 4/1 15:37 0+00:00:00 I 0 0.0 test.sh --timeout 600s --vm 2 --vm-bytes 2G |
− | 154.3 | + | 154.3 testuser 4/1 15:37 0+00:00:00 I 0 0.0 test.sh --timeout 600s --vm 2 --vm-bytes 2G |
Total for query: 4 jobs; 0 completed, 0 removed, 4 idle, 0 running, 0 held, 0 suspended | Total for query: 4 jobs; 0 completed, 0 removed, 4 idle, 0 running, 0 held, 0 suspended | ||
Line 845: | Line 1,336: | ||
Remember that there is no concept of differentiated queues with static requirements in HTCondor. Therefore, executing condor_q you are querying your jobs in the whole schedd, do not expect the different queues showed as with qstat command. At PIC we use the flavour option to limit the walltime of the jobs in a similar way as they were limited by the queues in Torque/Maui. | Remember that there is no concept of differentiated queues with static requirements in HTCondor. Therefore, executing condor_q you are querying your jobs in the whole schedd, do not expect the different queues showed as with qstat command. At PIC we use the flavour option to limit the walltime of the jobs in a similar way as they were limited by the queues in Torque/Maui. | ||
+ | |||
+ | = Issues = | ||
+ | |||
+ | == Why are my jobs held? == | ||
+ | |||
+ | There are several reasons why your jobs can be held. You can ask HTCondor why your jobs are held using the following command: | ||
+ | |||
+ | <pre> | ||
+ | $ condor_q -const 'JobStatus == 5' -af HoldReason | ||
+ | </pre> | ||
+ | |||
+ | There are at least 3 situations in which your job is held, each triggered by a different Periodic Hold check configured by the administrator. | ||
+ | |||
+ | === Your job walltime XXs exceeded the Maximum Walltime XXs === | ||
+ | |||
+ | The HoldReason indicates that your job has exceeded the maximum walltime for its flavour. Remember that the maximum time your job can run (10800, 172800 or 345600 seconds for the short, medium or long flavour, respectively) is set by the flavour you request. | ||
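To make these limits easier to read, the small sketch below converts the walltime of each flavour from seconds to hours. The function name flavour_hours is hypothetical; the values in seconds are the ones quoted above.

```shell
#!/bin/bash
# Maximum walltime in hours for a flavour (seconds taken from this guide).
flavour_hours() {
    case "$1" in
        short)  echo $((10800 / 3600)) ;;
        medium) echo $((172800 / 3600)) ;;
        long)   echo $((345600 / 3600)) ;;
        *)      echo "unknown flavour: $1" >&2; return 1 ;;
    esac
}

for f in short medium long; do
    echo "$f: $(flavour_hours "$f") hours"
done
# prints:
# short: 3 hours
# medium: 48 hours
# long: 96 hours
```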
+ | |||
+ | === Your job memory XXMB exceeded the Job Memory Limit XXMB === | ||
+ | |||
+ | In this case, your job is consuming more memory than requested (more than 1.5 times the requested memory!). You should resubmit your job with a larger memory request. | ||
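As a worked example of this threshold, the sketch below computes the memory above which a job would be held, reading the 1.5 in the text as a factor applied to the requested memory. The function name memory_limit_mb is hypothetical, and 2048 MB is the default for regular jobs mentioned earlier in this guide.

```shell
#!/bin/bash
# Memory (in MB) above which a job is held, assuming a factor of 1.5
# over the requested memory.
memory_limit_mb() {
    local request_mb=$1
    # Integer arithmetic: 1.5 * request = request * 3 / 2
    echo $(( request_mb * 3 / 2 ))
}

echo "request 2048 MB -> held above $(memory_limit_mb 2048) MB"
# prints: request 2048 MB -> held above 3072 MB
```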
+ | |||
+ | === CPU usage exceeded request_cpus === | ||
+ | |||
+ | The HoldReason, in this case, means that your job has been using more CPUs than requested during the last hour. You should resubmit your job increasing the number of requested CPUs. | ||
+ | |||
+ | == Why are my jobs idle? == | ||
+ | |||
+ | This is one of the most typical questions when you submit jobs to a batch system. You should always assume that your job will be idle for some time, and this time is not always the same: it depends on the requirements of your jobs, the share of your group and the total use of the farm. | ||
+ | |||
+ | Typically, if your needs are smaller (1 CPU and 1 GB of RAM for instance) your job will be idle for less time than if you need 8 CPUs or more than 2 GB RAM per core. | ||
+ | |||
+ | Anyway, remember that the -analyze and -better-analyze options of the condor_q command give you information about why your job is idle (without taking the priority into account). These options are especially useful to detect jobs that are idle because their requirements are not met. | ||
+ | |||
+ | https://pwiki.pic.es/index.php?title=HTCondor_User_Guide#analyze_and_better-analyze | ||
+ | |||
+ | When your job's requirements cannot be matched by the available WorkerNodes, you will see a WARNING message like this using -analyze or -better-analyze: | ||
+ | |||
+ | <pre> | ||
+ | $ condor_q -analyze 267201 | ||
+ | |||
+ | |||
+ | -- Schedd: submit01.pic.es : <193.109.174.82:9618?... | ||
+ | The Requirements expression for job 267201.000 is | ||
+ | |||
+ | (TARGET.WN_property == ifThenElse(MY.WN_property is undefined,"default",MY.WN_property)) && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && | ||
+ | (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.Cpus >= RequestCpus) && ((TARGET.FileSystemDomain == MY.FileSystemDomain) || | ||
+ | (TARGET.HasFileTransfer)) | ||
+ | |||
+ | |||
+ | No successful match recorded. | ||
+ | Last failed match: Fri Feb 7 16:58:04 2020 | ||
+ | |||
+ | Reason for last match failure: no match found | ||
+ | |||
+ | 267201.000: Run analysis summary ignoring user priority. Of 338 machines, | ||
+ | 338 are rejected by your job's requirements | ||
+ | 0 reject your job because of their own requirements | ||
+ | 0 match and are already running your jobs | ||
+ | 0 match but are serving other users | ||
+ | 0 are able to run your job | ||
+ | |||
+ | WARNING: Be advised: | ||
+ | No machines matched the jobs's constraints | ||
+ | </pre> | ||
+ | |||
+ | = AlmaLinux 9 migration = | ||
+ | |||
+ | AlmaLinux 9 is the default OS for all the WorkerNodes of our cluster since 2024/03/27. | ||
+ | |||
+ | * The general ui.pic.es user interfaces are now on AlmaLinux 9 | ||
+ | * Do not use +WN_property="alma9" anymore. If you use it, your jobs will stay idle forever. | ||
+ | * Remember that you can use the apptainer image to run inside a container using the option: | ||
+ | |||
+ | <pre> | ||
+ | +SingularityImage = "/opt/apptainer-images/pic-centos7.sif" | ||
+ | </pre> | ||
= References and links = | = References and links = | ||
Line 850: | Line 1,417: | ||
You can find a lot of documentation about HTCondor. Here you have a list of useful links from this manual. | You can find a lot of documentation about HTCondor. Here you have a list of useful links from this manual. | ||
− | *[1] | + | *[1] https://htcondor.readthedocs.io/en/v9_0/users-manual/index.html |
+ | |||
+ | *[2] condor_submit: https://htcondor.readthedocs.io/en/9_0/users-manual/submitting-a-job.html | ||
+ | |||
+ | *[3] Queue command: https://htcondor.readthedocs.io/en/v9_0/users-manual/submitting-a-job.html | ||
+ | |||
+ | *[4] condor_prio command: https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_prio.html | ||
+ | |||
+ | *[5] DAGMAN workflows: https://htcondor.readthedocs.io/en/v9_0/users-manual/dagman-workflows.html | ||
− | *[ | + | *[6] Submitting a lot of jobs: https://htcondor.readthedocs.io/en/latest/users-manual/submitting-a-job.html#submitting-lots-of-jobs |
− | *[ | + | *[7] condor_q: https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_q.html |
− | *[ | + | *[8] HTCondor magic numbers: https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=MagicNumbers |
− | *[ | + | *[9] condor_wait: https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_wait.html |
− | *[ | + | *[10] condor_history: https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_history.html |
− | *[ | + | *[11] condor_tail: https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_tail.html |
− | *[ | + | *[12] condor_ssh_to_job: https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_ssh_to_job.html |
− | *[ | + | *[13] condor_rm: https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_rm.html | ||
− | * | + | * HTCondor User Guide Tutorial: https://docs.google.com/presentation/d/1-64fEcfLyxLzSpZH-tbV-SjMAc0CvMpWhE0I4oaDKkQ/edit?usp=sharing | ||
− | * | + | * HTCondor Tutorial examples: https://github.com/PortdInformacioCientifica/htcondor-tutorial |
Latest revision as of 10:41, 28 November 2024
Introduction
At PIC, HTCondor is the current production batch system, replacing the old Torque/Maui environment.
The aim of this document is to show all non-grid users of the PIC batch system how to submit jobs to the HTCondor infrastructure. In other words, this document is a guide to submitting local jobs to the HTCondor infrastructure at PIC.
This User Guide begins with a presentation of the batch system concepts in HTCondor in comparison with the old Torque/Maui ones. Then, there is a Quick Start section focused on the minimum knowledge needed to submit a job in an HTCondor pool. The remaining sections take a deeper look at how to submit, monitor and remove jobs in the PIC HTCondor pool.
Furthermore, you can access the HTCondor User Guide slides and a git repository with several examples.
- HTCondor User Guide Tutorial: https://docs.google.com/presentation/d/1-64fEcfLyxLzSpZH-tbV-SjMAc0CvMpWhE0I4oaDKkQ/edit?usp=sharing
- HTCondor Tutorial examples: https://github.com/PortdInformacioCientifica/htcondor-tutorial
We recommend looking at the HTCondor User Manual [1] if you want a deeper understanding of the HTCondor concepts.
Basic batch concepts
HTCondor does not work like other batch systems, where you submit your job to a differentiated queue that has some specifications. It employs the language of ClassAds (the same concept as classified advertisements) in order to match workload requests and resources. In other words, the jobs and the machines have their particular attributes (number of CPUs, memory, etc.) and the central manager of HTCondor does the matchmaking between these attributes.
Furthermore, similarly to Torque/Maui, there is the concept of fair-share, which aims at ensuring that all groups and users are provided resources as needed in accordance with their respective quota (e.g. the Atlas T2 quota equals 9% of our resources). The fair-share concept implies that your jobs and the jobs of your experiment have a greater priority while your usage stays at or below the share; if you are consuming more resources than your share, the next job to get priority should belong to another experiment.
When a user submits a job from the user interface, it is queued in the HTCondor queue system (schedd) and, according to its priority and its requirements, the job is assigned by the batch system's Central Manager (running collector and negotiator daemons) to be executed in a Worker Node (startd) that matches its requirements. Once the job has finished, files such as the job log, the standard output and the standard error are retrieved back from the Worker Node to the submit machine.
It has to be remarked that, as it happens for all the batch systems, there is a strong dependence between the resources you require and the time on the queue. In other words, single-core jobs needing 2 GB of RAM will find a match faster than multi-core jobs that need many cores or jobs that require a lot of memory.
Quick start
Before taking a deeper look at all the elements of job submission, we will show you the basic commands in a quick start guide to HTCondor.
In our old Torque/Maui environment, the user would log into a machine, prepare the input and submit jobs to a queue using the qsub command. Now, in a very similar way, the user logs into the User Interface, prepares a submit file, and then creates and inserts jobs into the queue using the condor_submit command.
Since November 2019, all users can submit their jobs directly from the user interfaces.
Next, you can find the basic skeleton of an HTCondor submit file (called test.sub in this case).
$ cat test.sub
executable = test.sh
args = -c 1 -t 60
output = condor.out
error = condor.err
log = condor.log
queue
The script just executes a stress command (a simple workload generator for UNIX systems). You can specify the number of cpus you want to stress (-c 1) and the timeout of the process (-t 60 seconds).
$ cat test.sh
#!/bin/bash
/bin/stress $@
This example can be easily understood as follows: it must include the executable with your script or command, the arguments (args) of your command and where to store the STDOUT (output), the STDERR (error) and the HTCondor log which reports the status of the job. Finally, you can find the "queue" command which tells condor how many instances of the job you want to run ("queue 1", or simply "queue", to submit one, or "queue N" to submit N jobs, for instance). You can find more information about these variables in the next sections of this manual.
Then, you can submit your job using condor_submit.
$ condor_submit test.sub
Submitting job(s).
1 job(s) submitted to cluster 281.
Make sure that your script or executable is correctly created. This means, for example, that it has execute permission and that your script has a shebang (the line starting with the character sequence #! at the beginning of the script). In other words, your executable has to be runnable from the command line.
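The two conditions above can be checked before calling condor_submit. The following is only a minimal sketch: check_job_script is a hypothetical helper name, not an HTCondor command, and it only tests for a regular file, the execute bit and the first two bytes of the file.

```shell
# Hypothetical helper (not part of HTCondor): checks that a job script
# exists, is executable and starts with a shebang line.
check_job_script() {
    script="$1"
    if [ ! -f "$script" ]; then
        echo "missing file: $script" >&2
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "not executable, try: chmod +x $script" >&2
        return 1
    fi
    # The first two bytes of the file must be the shebang marker "#!".
    if [ "$(head -c 2 "$script")" != '#!' ]; then
        echo "no shebang on the first line of $script" >&2
        return 1
    fi
    return 0
}
```

A possible use is check_job_script test.sh && condor_submit test.sub, so that the submission only happens when the script passes the checks.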
On the other hand, in order to monitor the status of your job, you can query the queue with the condor_q command (in a similar way as you do with qstat in Torque). The default condor_q output can vary slightly from one version to another in HTCondor.
$ condor_q
-- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 02/25/19 13:07:52
OWNER    BATCH_NAME   SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
testuser ID: 281     2/25 13:07      _      _      1      1 281.0
Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for testuser: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
It returns the owner, the batch name of your job, the submission date, the status (Done, Run or Idle) and the JobIds.
With the -nobatch option, condor_q reports the jobs without grouping them.
$ condor_q -nobatch
-- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 02/25/19 13:10:10
 ID      OWNER     SUBMITTED    RUN_TIME   ST PRI SIZE CMD
 281.0   testuser  2/25 13:09   0+00:00:00 I  0   0.0  test.sh -c 1 -t 60
Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for testuser: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Finally, to remove the jobs, you can use condor_rm that works as qdel in Torque/Maui.
$ condor_rm 281
All jobs in cluster 281 have been marked for removal
Submitting your jobs
After a basic view of how to submit a job, we are going to explain more details about the job submission, in particular about the options of the submit file. You can find detailed information about condor_submit and the characteristics of the submit file in the documentation [2].
First of all, it has to be mentioned that the only options that are mandatory in a submit file are the "executable" and the "queue" command.
Executable, input, arguments, outputs, errors and logs
You must specify your executable and you can specify the input, output and error logs in your submit files as we have seen before:
executable = exec
input = input.txt
output = out.txt
error = err.txt
log = log.txt
Thus, you can specify the location of the input of your application, considering that HTCondor uses the input to pipe into the stdin of the executable. On the other hand, there is the output which contains the standard output (stdout) and the error which contains the standard error (stderr). The log file reports the status of the job by HTCondor. If any of these options is not defined in the submit file, HTCondor redirects standard output and standard error to /dev/null.
In the log file, you can see the submission host, the node where the job is executed, information about the memory consumption and a final summary of the resources used by your job. Here you have an example of a typical log file:
$ cat test-736.0.log
000 (736.000.000) 03/25 10:00:22 Job submitted from host: <193.109.174.82:9618?addrs=193.109.174.82-9618&noUDP&sock=961738_da40_3>
...
001 (736.000.000) 03/25 10:00:40 Job executing on host: <192.168.101.48:9618?addrs=192.168.101.48-9618&noUDP&sock=14755_47fd_3>
...
006 (736.000.000) 03/25 10:00:49 Image size of job updated: 972
    1  -  MemoryUsage of job (MB)
    952  -  ResidentSetSize of job (KB)
...
005 (736.000.000) 03/25 10:00:50 Job terminated.
    (1) Normal termination (return value 0)
        Usr 0 00:00:10, Sys 0 00:00:00  -  Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
        Usr 0 00:00:10, Sys 0 00:00:00  -  Total Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
    111  -  Run Bytes Sent By Job
    28  -  Run Bytes Received By Job
    111  -  Total Bytes Sent By Job
    28  -  Total Bytes Received By Job
    Partitionable Resources :    Usage  Request Allocated
       Cpus                 :     0.02        1         1
       Disk (KB)            :       17        1    891237
       Memory (MB)          :        1      200       256
...
The log file allows us to monitor our jobs using the condor_wait command. You will find more information about condor_q and condor_wait later in this document.
$ condor_wait -status log-62.0.log
62.0.0 submitted
62.0.0 executing on host <192.168.100.29:9618?addrs=192.168.100.29-9618+[--1]-9618&noUDP&sock=73457_f878_3>
62.0.0 completed
All jobs done.
The condor_wait command tracks the events recorded in the log file.
IMPORTANT NOTE. The executable is mandatory, but the other elements are not. This is especially relevant for the HTCondor log file, which can put extra load on the shared file systems if you submit too many jobs that write a log at the same time. If you do not plan to check the HTCondor log, you can omit it and query your jobs with condor_q commands.
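When you do keep the log, the termination event shown above can be extracted with standard text tools. This is a minimal sketch, assuming the log contains a line of the form "Normal termination (return value N)" as in the example; job_return_value is a hypothetical helper name.

```shell
# Hypothetical helper: prints the return value recorded in the last
# "Normal termination" line of an HTCondor job log.
# Prints nothing when the log has no such line (job not finished yet).
job_return_value() {
    sed -n 's/.*Normal termination (return value \([0-9][0-9]*\)).*/\1/p' "$1" | tail -n 1
}
```

For the log shown above, job_return_value test-736.0.log would print the value found in the "Job terminated" event.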
ClusterId and JobId
When you submit multiple jobs, it is in general useful to assign unique filenames, for example typically containing the cluster and job ID variables (ClusterId and ProcId respectively). For instance:
executable = test.sh
input = input.txt
arguments = arg
output = test-$(ClusterId).$(ProcId).out
error = test-$(ClusterId).$(ProcId).err
log = test-$(ClusterId).$(ProcId).log
The job identifiers in HTCondor are $(ClusterId).$(ProcId). The jobs in the queue are grouped in batches or clusters: the ClusterId is the number of your batch, while the different jobs inside your batch are identified by the ProcId. In other words, if you submit only one job, you will obtain $(ClusterId).0, while if you submit, for instance, 3 jobs using the same submit file, you will obtain $(ClusterId).0, $(ClusterId).1 and $(ClusterId).2. Furthermore, you can define a name for your batch using the +JobBatchName option.
The next test.sub file submits two jobs to the queue.
executable = test.sh
arguments = -c 1 -t 60
output = test-$(ClusterId).$(ProcId).out
error = test-$(ClusterId).$(ProcId).err
log = test-$(ClusterId).$(ProcId).log
+JobBatchName="MyJobs"
queue 2
$ condor_submit test.sub
Submitting job(s)....
2 job(s) submitted to cluster 740.
Using condor_q we can see the batch name and the jobs grouped.
$ condor_q 740
-- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 03/25/19 11:27:30
OWNER    BATCH_NAME   SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
testuser MyJobs      3/25 11:27      _      _      2      4 740.0-1
Total for query: 2 jobs; 0 completed, 0 removed, 2 idle, 0 running, 0 held, 0 suspended
Total for all users: 24 jobs; 0 completed, 0 removed, 5 idle, 19 running, 0 held, 0 suspended
Using condor_q -nobatch we monitor the status of the jobs ungrouped.
$ condor_q -nobatch 740
-- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 03/25/19 11:27:33
 ID      OWNER     SUBMITTED    RUN_TIME   ST PRI SIZE CMD
 740.0   testuser  3/25 11:27   0+00:00:00 I  0   0.0  test.sh -c 1 -t 60
 740.1   testuser  3/25 11:27   0+00:00:00 I  0   0.0  test.sh -c 1 -t 60
Total for query: 2 jobs; 0 completed, 0 removed, 2 idle, 0 running, 0 held, 0 suspended
Total for all users: 24 jobs; 0 completed, 0 removed, 5 idle, 19 running, 0 held, 0 suspended
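Because the ProcId starts at 0 and increases by one per job, the filenames produced by the submit file above are predictable. The sketch below lists them for a given cluster; expected_outputs is a hypothetical helper name, and the test-ClusterId.ProcId.out pattern is simply the one used in this example.

```shell
# Hypothetical helper: prints the output filenames expected for a cluster
# of N jobs submitted with output = test-$(ClusterId).$(ProcId).out
expected_outputs() {
    cluster_id="$1"
    njobs="$2"
    proc_id=0                      # ProcId values start at 0
    while [ "$proc_id" -lt "$njobs" ]; do
        echo "test-${cluster_id}.${proc_id}.out"
        proc_id=$((proc_id + 1))
    done
}
```

For the two jobs of cluster 740, expected_outputs 740 2 prints test-740.0.out and test-740.1.out, which can be compared with the files actually present once the jobs finish.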
Requests
You can request the cpu, disk and memory that your jobs need. This is done by request_cpus, request_disk and request_memory options in your submit file. You can use units in your requests.
executable = test.sh
args = -c 8 -t 60
output = test-$(ClusterId).$(ProcId).out
error = test-$(ClusterId).$(ProcId).err
log = test-$(ClusterId).$(ProcId).log
request_memory = 4 GB
request_cpus = 8
request_disk = 30 GB
queue
Thus, this job asks for 4 GB of RAM, 8 CPUs and 30 GB of disk.
$ condor_q 1462 -af RequestCpus RequestMemory RequestDisk
8 4096 31457280
Here you can see the report of the requested resources: the number of cpus, the memory in MB and the disk in KB.
There are default values already defined, so, if you do not use request_cpus, request_memory or request_disk in your submit file, your job will ask for these default values:
- 1 cpu
- 2 GB of memory per CPU
- 15 GB of disk per CPU
The CPU and the memory needed by your job are the most important requirements. The disk, on the other hand, refers to the local, temporary disk used by your job and can be left at the default value in most cases.
In case you think that your job requires too many cores, memory, or disk, do not hesitate to ask your Contact person at PIC.
Note that in request_memory or request_disk, M or MB indicates MiB, G or GB indicates GiB, etc.
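The values reported by condor_q in the example above follow from these binary units: memory is reported in MiB and disk in KiB. A small arithmetic sketch (the function names are hypothetical, and only whole numbers of GiB are handled):

```shell
# Hypothetical helpers: convert a request given in GiB into the units
# reported by condor_q (RequestMemory in MiB, RequestDisk in KiB).
gib_to_mib() {
    echo $(( $1 * 1024 ))
}
gib_to_kib() {
    echo $(( $1 * 1024 * 1024 ))
}
```

With these, gib_to_mib 4 gives 4096 and gib_to_kib 30 gives 31457280, matching the RequestMemory and RequestDisk values of the job that requested 4 GB and 30 GB.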
Multi-core
The request_cpus option in your submit file defines whether your job is single-core or multi-core. The slot is then created in the WNs with the resources that satisfy your request. In other words, submitting multi-core jobs is as easy as indicating request_cpus=X where X > 1. Although you can ask for the number of cpus you need, note that our pool is better tuned for multi-core jobs of 8 cpus (such requests are easier to satisfy, so these jobs spend less time in the queue).
GPUs
Right now, there are 18 GPUs at PIC. Since January 2024, the 2 machines with GeForce GTX 1050 Ti GPUs, gpu02 and gpu03, are not directly accessible to users from batch or Jupyter.
There are two other machines with GPUs at PIC: gpu01, with 8 RTX 2080 Ti GPUs, and gpu05, with 8 V100 GPUs. The priority use of gpu01 is Jupyter, while the priority use of gpu05 is the Magnesia group. You can access any of these machines by using the "request_gpus" option. However, take into consideration that you will run in preemption mode, which means that if a higher-priority job from Jupyter or Magnesia needs the resources, your job will be killed and put back in the queue.
Example of submit file to run in gpu01 or gpu05:
executable = test.sh
output = test.out
error = test.err
log = test.log
request_cpus = 1
request_memory = 4GB
request_gpus = 1
queue
If you want to run specifically on gpu01 or gpu05, you can add requirements to your submit file such as:
Requirements = regexp("gpu01",Name)
Doing this, your job will land on the desired machine.
Flavours
The maximum walltime of your job can be specified using a flavour. There are 3 such flavours: short, medium and long.
- short: 3 hours
- medium: 48 hours
- long: 96 hours
executable = test.sh
args = -c 1 -t 60
output = output-$(ClusterId).$(ProcId).out
error = error-$(ClusterId).$(ProcId).err
log = log-$(ClusterId).$(ProcId).log
+flavour="short"
queue
If you do not choose any flavour explicitly, the default flavour is medium, which corresponds to 48 hours of walltime. Please do not forget the quotation marks ("") around the flavour name, or the medium flavour will be chosen.
Once the job reaches the time limit, it is held and remains in this status for 6 hours before being removed from the queue. Thus, the user can check the held jobs in the queue for 6 hours. You will find more information about the JobStatus in later sections.
The priority follows the order short > medium > long, so the shortest jobs are the ones with the highest priority.
Moreover, the flavours strictly control the memory consumption of your jobs. Jobs that exceed the requested memory by more than 50% will also be held.
If for some reason your jobs need walltime over the 96 hours, please ask your contact.
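Since shorter flavours get higher priority, it pays to choose the smallest flavour that covers the expected walltime. The sketch below applies the limits listed above (3, 48 and 96 hours); pick_flavour is a hypothetical helper name and it only accepts whole hours.

```shell
# Hypothetical helper: prints the smallest flavour whose walltime limit
# covers the given number of (whole) hours; fails above 96 hours.
pick_flavour() {
    hours="$1"
    if [ "$hours" -le 3 ]; then
        echo short
    elif [ "$hours" -le 48 ]; then
        echo medium
    elif [ "$hours" -le 96 ]; then
        echo long
    else
        echo "walltime over 96 hours, please ask your contact" >&2
        return 1
    fi
}
```

The printed name is the value to place between the quotation marks of the +flavour option in the submit file.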
Environment
The jobs find several grid variables defined, the $HOME variable and a general $PATH (/bin:/usr/local/bin:/usr/bin). However, the user may define environment variables for the job's environment by using the environment command.
For instance, for the next script and submission file:
$ cat test.sh
#!/bin/bash
echo 'My HOME directory is: ' $HOME
echo 'My Workdir is: ' $PWD
echo 'My PATH is: ' $PATH
echo 'My SOFTWARE directory is: ' $SOFT_DIR
$ cat test.sub
executable = test.sh
output = output-$(ClusterId).$(ProcId).out
error = error-$(ClusterId).$(ProcId).err
log = log-$(ClusterId).$(ProcId).log
queue
Once the job is executed, the next output file is obtained:
$ cat output-128.0.out
My HOME directory is: /home/testuser
My Workdir is: /home/execute/dir_13143
My PATH is: /bin:/usr/local/bin:/usr/bin
My SOFTWARE directory is:
As you can see in the output, the $HOME directory is defined but there is no $HOME/bin directory in the $PATH. Although $HOME/bin is added to the $PATH in the .bashrc of the user, HTCondor does not load the .bashrc by default. Furthermore, the $SOFT_DIR variable is also empty. Assuming that these variables, $PATH and $SOFT_DIR, are known and defined, for instance, in the .bashrc, we can submit the job in two different ways: using the environment command or adding the needed exports directly to your script.
- 1) Using environment command
$ cat test1.sub
executable = test.sh
output = output-$(ClusterId).$(ProcId).out
error = error-$(ClusterId).$(ProcId).err
log = log-$(ClusterId).$(ProcId).log
environment=PATH=$ENV(PATH);SOFT_DIR=/software/dteam/
queue
$ cat output-129.0.out
My HOME directory is: /home/testuser
My Workdir is: /home/execute/dir_13420
My PATH is: /bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/testuser/bin:/home/testuser/bin
My SOFTWARE directory is: /software/dteam/
The $ENV(variable) allows access to the environment variables available in the submit server.
- 2) Adding exports in your script
$ cat test2.sh
#!/bin/bash
export PATH=$PATH:$HOME/bin
export SOFT_DIR=/software/dteam
echo 'My HOME directory is: ' $HOME
echo 'My Workdir is: ' $PWD
echo 'My PATH is: ' $PATH
echo 'My SOFTWARE directory is: ' $SOFT_DIR
$ cat test2.sub
executable = test2.sh
output = output-$(ClusterId).$(ProcId).out
error = error-$(ClusterId).$(ProcId).err
log = log-$(ClusterId).$(ProcId).log
queue
$ cat output-130.0.out
My HOME directory is: /home/testuser
My Workdir is: /home/execute/dir_15903
My PATH is: /bin:/usr/local/bin:/usr/bin:/home/testuser/bin
My SOFTWARE directory is: /software/dteam
Notice that the PATH is different from the first example: it only adds $HOME/bin and not the whole PATH that was loaded with $ENV(PATH).
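To see in advance how a script behaves without the variables of your login shell, you can run it locally with a reduced environment. This is only an approximation: the sketch assumes that keeping HOME and the general PATH shown above is representative, whereas the real job environment at PIC also contains several grid variables. run_like_job is a hypothetical helper name.

```shell
# Hypothetical helper: runs a command with an almost empty environment,
# keeping only HOME and the general PATH seen in the job output above.
run_like_job() {
    env -i HOME="${HOME:-}" PATH=/bin:/usr/local/bin:/usr/bin "$@"
}
```

For example, run_like_job ./test.sh should print an empty SOFTWARE directory even if SOFT_DIR is exported in your interactive shell, which is the situation the two approaches above are meant to solve.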
Queue
Queue is the command in your submit file that selects the number of job instances you want to submit. Basically, you can specify the number of jobs of your batch using queue N at the end of your submit file, where N is the number of jobs per batch. This is a very powerful tool that allows users to submit several jobs in different ways using the same submit file.
Matching pattern
As another example, suppose we want to submit one job for each file in our current directory whose name matches a pattern.
executable = /bin/echo
arguments = $(filename)
output = test-$(ClusterId).$(ProcId).out
error = test-$(ClusterId).$(ProcId).err
log = test-$(ClusterId).$(ProcId).log
queue filename matching files Hello*
In our current directory:
$ ls Hello*
Hello1 Hello2 Hello3
So, 3 jobs will be submitted.
$ condor_submit test-queue.sub
Submitting job(s)...
3 job(s) submitted to cluster 321.
$ grep Hello test-321.*out
test-321.0.out:Hello1
test-321.1.out:Hello2
test-321.2.out:Hello3
From file
We have an executable test.sh that takes two arguments, -c $(arg1) and -t $(arg2), and we want to submit 4 jobs, each with a different set of arguments. This can be done using queue ... from file.
executable = test.sh
arguments = -c $(arg1) -t $(arg2)
output = test-$(ClusterId).$(ProcId).out
error = test-$(ClusterId).$(ProcId).err
log = test-$(ClusterId).$(ProcId).log
queue arg1,arg2 from arg_list.txt
Where the arg_list.txt is:
$ cat arg_list.txt
1, 15
2, 10
1, 12
4, 13
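Before submitting, it can be useful to preview the argument string that each line of the list would produce. The sketch below mimics the substitution of $(arg1) and $(arg2) with plain shell field splitting; preview_arguments is a hypothetical helper name, and the parsing is an approximation of what condor_submit does with comma-separated items, not its actual implementation.

```shell
# Hypothetical helper: for each "arg1, arg2" line of the list file, prints
# the argument string that "-c $(arg1) -t $(arg2)" would expand to.
preview_arguments() {
    # IFS=', ' splits on commas and on the spaces around them.
    while IFS=', ' read -r arg1 arg2; do
        # Skip empty lines.
        [ -n "$arg1" ] || continue
        printf '%s\n' "-c $arg1 -t $arg2"
    done < "$1"
}
```

With the arg_list.txt above, preview_arguments arg_list.txt prints one line per job, starting with "-c 1 -t 15".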
In list
Similarly to queue ... from file, you can specify the different elements in a list. The next example will submit 4 jobs, each one with the argument specified in the list.
executable = test.sh
arguments = -c 1 -t $(arg1)
output = test-$(ClusterId).$(ProcId).out
error = test-$(ClusterId).$(ProcId).err
log = test-$(ClusterId).$(ProcId).log
queue arg1 in (10 15 5 20)
Multiple queue statements (deprecated)
IMPORTANT NOTE. Although this was the simplest and most used way to submit several jobs using queue, the HTCondor developers want to remove it in favour of the other ways of using the queue command. For instance, multiple queue statements no longer work for DAGMan jobs (since the HTCondor 23.0.6 update) and in the future they will not work with condor_submit either. Hence, we do not recommend using them. We keep the example here so that users can recognise this pattern and avoid it.
We want to submit 2 jobs using the same executable, each one with its own arguments and a different number of cpus.
executable = test.sh
output = test-$(ClusterId).$(ProcId).out
error = test-$(ClusterId).$(ProcId).err
log = test-$(ClusterId).$(ProcId).log
args = -c 1 -t 60
request_cpus = 1
queue
args = -c 8 -t 60
request_cpus = 8
queue
Thus, there will be two jobs submitted, the first one using 1 cpu and the second one using 8.
In conclusion, as you can see, there are several ways to use the queue command. You can find more examples in the documentation [3].
Transfer files
First of all, note that the standard output and standard error of your job will always be transferred back when the job finishes, where finishing covers all cases: completed jobs, removed jobs and held jobs. However, any other files generated in the scratch dir of the WN will not be transferred back unless you specify them.
To transfer input files you should use the input= or transfer_input_files= options in your submit file.
At PIC, the option when_to_transfer_output is forced to be ON_EXIT_OR_EVICT. Thus, the chosen output files are transferred back when the job finishes in any state or when the job is evicted (held).
transfer_output_files
The next example shows a script called test-output.sh that writes in two files:
#!/bin/bash
for i in {1..10}; do
  echo "Number $i" >> $_CONDOR_SCRATCH_DIR/1.txt
  sleep 2s
  echo "Hello" >> $_CONDOR_SCRATCH_DIR/2.txt
done
Thus, after submitting this job, the script will generate the files 1.txt and 2.txt in the scratch directory. Note that the HTCondor variable $_CONDOR_SCRATCH_DIR is really useful to be sure that you are pointing to the scratch directory of the node. If you need to recover any of the files generated in the $_CONDOR_SCRATCH_DIR of the WN, you should use the option transfer_output_files in your submit file.
executable = test-output.sh
output = test-$(ClusterId).$(ProcId).out
error = test-$(ClusterId).$(ProcId).err
log = test-$(ClusterId).$(ProcId).log
transfer_output_files=1.txt
queue
With transfer_output_files command, you can decide the files you want to be transferred back. Using the same script test-output.sh that generates 1.txt and 2.txt, the 1.txt is the only file which is going to be transferred to the submit machine when the job finishes.
Unfortunately, transfer_output_files does not allow wildcards. Note that if you use a wrong filename, in other words, a file that is not really in the scratch directory, an empty file will be generated after job completion.
Thus, as wildcards are not allowed, the best way to transfer several output files is to use directories. Consider the next script:
$ cat test-output-dir.sh
#!/bin/bash
cd $_CONDOR_SCRATCH_DIR
mkdir my_files
cd my_files
for i in {1..10}; do
  echo "Number $i" >> ./1.txt
  sleep 2s
  echo "Hello" >> ./2.txt
done
We are creating a local directory called my_files in our scratch dir and putting the 1.txt and 2.txt files there. Then, we transfer back the whole directory because we are using transfer_output_files=my_files:
executable = test-output-dir.sh
output = test-out1-$(ClusterId).$(ProcId).out
error = test-out1-$(ClusterId).$(ProcId).err
log = test-out1-$(ClusterId).$(ProcId).log
transfer_output_files=my_files
queue
In our submit directory, after the job finishes, we are going to find the directory with the files.
$ ls my_files/
1.txt 2.txt
Of course, if you only want one file from the created directory, you just need to use transfer_output_files=my_files/2.txt, for instance. In that case, only the file will be transferred and the directory will not be created on the submit side.
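When a job writes its files directly into the scratch directory, they can be gathered into one directory at the end of the script, where the shell itself expands the wildcard. A minimal sketch, with collect_outputs as a hypothetical helper name:

```shell
# Hypothetical helper: moves the given files into a destination directory
# so that a single directory can be named in transfer_output_files.
collect_outputs() {
    dest="$1"
    shift
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        # An unmatched wildcard is passed literally, so skip non-files.
        if [ -f "$f" ]; then
            mv "$f" "$dest"/ || return 1
        fi
    done
    return 0
}
```

At the end of a job script this could be called as cd "$_CONDOR_SCRATCH_DIR" && collect_outputs my_files *.txt, together with transfer_output_files=my_files in the submit file.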
transfer_output_remaps
On the other hand, if you want to transfer the output file to another directory, you should use the transfer_output_remaps option.
executable = test-output.sh
output = test-$(ClusterId).$(ProcId).out
error = test-$(ClusterId).$(ProcId).err
log = test-$(ClusterId).$(ProcId).log
transfer_output_files=1.txt
transfer_output_remaps="1.txt=outputs/1-$(ClusterId).$(ProcId).txt"
queue
The submit file above is going to transfer the 1.txt file from the scratch dir to ./outputs/1-$(ClusterId).$(ProcId).txt
transfer_executable
The Transfer executable mechanism defaults to false for all HTCondor jobs. This means that the executable is searched in the job destination, in other words, the WN.
If for some reason, this configuration does not fit your needs, you should add in your submit file:
+transfer_exec = true
Using this option, your executable will be transferred to the WN on the $_CONDOR_SCRATCH_DIR directory.
You may find a (rare) situation where the executable is not present in your user interface but you know for sure that it is present in the WNs. Take into account that HTCondor always looks for your executable before submitting your job unless you explicitly add transfer_executable = false to your submit file.
Accounting Group
The priority of your job is calculated depending on the Accounting Group you belong to. Most users do not have to worry about the Accounting Group, as it is automatically taken from their primary group.
Anyway, if you belong to two groups and need to change the Accounting Group for a submission, you can add the +experiment="experiment" option to your submit file. For instance, consider a user whose main group is vip and whose secondary group is virgo, and who wants to submit a job that is only accounted to virgo.
executable = test.sh
args = --cpu 1 --timeout 120
output = test-$(ClusterId).$(ProcId).out
error = test-$(ClusterId).$(ProcId).err
log = test-$(ClusterId).$(ProcId).log
+experiment="virgo"
queue
With the option +experiment="virgo", the job will have the share of the virgo experiment and will also be accounted in our records as a virgo job.
The local experiments available now at PIC are:
- atlas
- bioinfo
- co2flux
- cta
- des
- desi
- euclid
- magic
- mice
- neutrinos
- paus
- uab-giq
- vip
- virgo
Furthermore, the Accounting Group is also assigned automatically for those groups that have dedicated User Interfaces (such as mic.magic, at3 and ui01-virgo). In that case, the Accounting Group is set according to the group that owns the User Interface. In other words, the main group of the user does not matter: if you submit a job from the mic.magic machine, your AccountingGroup is magic.
Priority per user basis
In your submit file you can also add a priority argument, which can be any integer, with 0 as the default. Jobs with a higher numerical priority will run before jobs with a lower numerical priority. This works on a per-user basis; in other words, it is a way to order the execution of your own jobs. Thus, a user with many jobs can use this command to order them, but it has no effect on whether or not these jobs will run before another user's jobs.
Note that it is recommended to use + and - to specify a positive or negative priority value for your jobs.
This can be done also with condor_prio command [4]:
condor_prio -p +10 52165.0
Interactive submission
As it happens with Torque/Maui, HTCondor also has the possibility to submit interactive jobs using the option "-i" or "-interactive".
$ condor_submit -i
You have to wait in queue and finally you enter inside a WN:
$ condor_submit -i
Submitting job(s).
1 job(s) submitted to cluster 52166.
Waiting for job to start...
Welcome to slot1_9@td730.pic.es!
You will be logged out after 7200 seconds of inactivity.
[testuser@td730 dir_23132]$
Notice that you will be logged out after 7200 seconds of inactivity. Other defaults are:
- 1 Core
- 512 MB of RAM (caution, not the 2048 MB for the other jobs!)
- Walltime of 48 hours (such as all jobs with medium flavour)
Anyway, you can use a submit file to launch an interactive job:
$ condor_submit -i test.sub
The session created in the node is subject to the same restrictions on cpu, memory, disk, etc. However, some options of the submit file, such as executable or arguments, make no sense in an interactive session.
Notifications
Although HTCondor has an email notification system, we do not allow users to use this system on our cluster. Potential misuse of this feature can result in thousands of emails and security issues.
Dagman
HTCondor DAGMan (Directed Acyclic Graph Manager) is a meta-scheduler for HTCondor [5]. In other words, you can submit to the queue a scheduler that manages the execution order, the hierarchy, of several jobs. This is the best tool to handle multiple interdependent jobs.
For instance, we have two jobs, A and B, where B depends directly on A. This means that we have to be sure that A has finished before submitting B. This is the simplest use case of a dagman workflow.
Next, you can see the 2 jobs. Job A writes several files in your scratch dir and job B removes them. Note that we need to create one submit file per job.
$ cat jobA.sh
#!/bin/bash
ARG1=$1
cd $_CONDOR_SCRATCH_DIR
mkdir test
cd test
cat << EOF > ./dagmanfile${ARG1}.test
EOF
sleep 30
$ cat jobB.sh
#!/bin/bash
for a in $(seq 1 100); do
  rm -f $HOME/condor/dagman/test/dagmanfile${a}.test
  echo "dagmanfile$a.test erased"
done
sleep 30
And here you have the 2 submit files. Take into account that we are transferring back the test directory from scratch to our submit path. Job B is going to remove these files later.
$ cat dagjobA.sub
executable = jobA.sh
args = $(Item)
output = jobA-output-$(ClusterId).$(ProcId).out
error = jobA-error-$(ClusterId).$(ProcId).err
log = jobA-log-$(ClusterId).$(ProcId).log
transfer_output_files=test
queue from seq 1 1 100 |
$ cat dagjobB.sub
executable = jobB.sh
output = jobB-output-$(ClusterId).$(ProcId).out
error = jobB-error-$(ClusterId).$(ProcId).err
log = jobB-log-$(ClusterId).$(ProcId).log
queue
Then, there is the dagman file:
$ cat dagjob.dag
JOB A dagjobA.sub
JOB B dagjobB.sub
PARENT A CHILD B
To submit the dagman file you should use condor_submit_dag.
$ condor_submit_dag dagjob.dag
-----------------------------------------------------------------------
File for submitting this DAG to HTCondor : dagjob.dag.condor.sub
Log of DAGMan debugging messages         : dagjob.dag.dagman.out
Log of HTCondor library output           : dagjob.dag.lib.out
Log of HTCondor library error messages   : dagjob.dag.lib.err
Log of the life of condor_dagman itself  : dagjob.dag.dagman.log
Submitting job(s).
1 job(s) submitted to cluster 60771.
-----------------------------------------------------------------------
Several files are generated: the HTCondor submit file of your dagman job, the debugging log, the library standard output and standard error messages and the log file of condor_dagman. Take into account that this standard output and error belong to the dagman execution itself; the regular standard output and error of your jobs will be created as for normal HTCondor jobs.
Querying with the condor_q we can try to understand what is happening.
$ condor_q 60771
-- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 07/05/19 16:21:39
OWNER    BATCH_NAME         SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
testuser dagjob.dag+60771  7/5  16:21      _    100      _      1 60772.0-99
Total for query: 100 jobs; 0 completed, 0 removed, 0 idle, 100 running, 0 held, 0 suspended
Total for all users: 920 jobs; 148 completed, 0 removed, 632 idle, 140 running, 0 held, 0 suspended
The dagman scheduler has Id 60771, and the first 100 jobs, generated by job A, have the ClusterId.ProcId 60772.0-99.
$ condor_q 60771
-- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 07/05/19 16:22:53
OWNER    BATCH_NAME         SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
testuser dagjob.dag+60771  7/5  16:21      _      1      _      1 60773.0
Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for all users: 816 jobs; 144 completed, 0 removed, 631 idle, 41 running, 0 held, 0 suspended
After the 100 instances of job A have finished, you can see a test directory in your submit path with several dagmanfileX.test files. Then, job B starts.
jobA and jobB work as regular HTCondor jobs (flavours, requests, etc.). The standard output, error and log files for all the jobs will be generated. Furthermore, there are the different dagman files described above.
Consider a more complex workflow in which A is the parent of B, C and D, B and C are the parents of E, and D is the parent of F, G and H.
The dagman file has to be something like this:
JOB A jobA.sub
JOB B jobB.sub
JOB C jobC.sub
JOB D jobD.sub
JOB E jobE.sub
JOB F jobF.sub
JOB G jobG.sub
JOB H jobH.sub
PARENT A CHILD B C D
PARENT B C CHILD E
PARENT D CHILD F G H
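As a DAG grows, it is easy to name a node in a PARENT ... CHILD line that was never declared. The following sketch is a simple consistency check and not a DAGMan feature: check_dag_nodes is a hypothetical helper name, and it assumes that all JOB lines appear before the PARENT lines, as in the files above.

```shell
# Hypothetical helper: reports nodes used in PARENT/CHILD lines that have
# no JOB declaration. Assumes JOB lines precede the PARENT lines.
check_dag_nodes() {
    awk '
        BEGIN { bad = 0 }
        $1 == "JOB" { declared[$2] = 1; next }
        $1 == "PARENT" {
            for (i = 2; i <= NF; i++) {
                if ($i != "CHILD" && !($i in declared)) {
                    print "undeclared node: " $i
                    bad = 1
                }
            }
        }
        END { exit bad }
    ' "$1"
}
```

The function returns a non-zero status when at least one undeclared node is found, so it can be chained as check_dag_nodes dagjob.dag && condor_submit_dag dagjob.dag.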
SCRIPT
The SCRIPT command specifies processing that has to run before (PRE) or after (POST) a job. For instance, a PRE script can be used to place files somewhere and a POST script to clean up files. Take into account that if your scripts return an exit value different from 0, the workflow is going to stop.
A simple example: jobA creates several files in the scratch directory and jobB writes a number into each file.
    $ cat jobA.sh
    #!/bin/bash
    ARG1=$1
    cd $_CONDOR_SCRATCH_DIR
    mkdir test
    cd test/
    touch ./dagmanfile${ARG1}.test
    sleep 30

    $ cat jobB.sh
    #!/bin/bash
    cd $_CONDOR_SCRATCH_DIR
    for a in $(seq 1 100); do
        echo $a > $HOME/condor/dagman/test/dagmanfile${a}.test
    done
    sleep 30

    $ cat dagjobA.sub
    executable = jobA.sh
    args = $(Item)
    output = jobA-output-$(ClusterId).$(ProcId).out
    error = jobA-error-$(ClusterId).$(ProcId).err
    log = jobA-log-$(ClusterId).$(ProcId).log
    transfer_output_files=test
    queue from seq 1 1 100 |

    $ cat dagjobB.sub
    executable = jobB.sh
    output = jobB-output-$(ClusterId).$(ProcId).out
    error = jobB-error-$(ClusterId).$(ProcId).err
    log = jobB-log-$(ClusterId).$(ProcId).log
    queue
After jobB has run, we want to create a tar file and remove the test/ directory that was created.
    $ cat tar.sh
    #!/bin/bash
    tar -cvf file-test.tar test/*.test
    rm -rf test
Thus, this is the dagman file we need:
    $ cat dagjob-SCRIPT.dag
    JOB A dagjobA.sub
    JOB B dagjobB.sub
    SCRIPT POST B tar.sh
    PARENT A CHILD B
We submit the job:
    $ condor_submit_dag dagjob-SCRIPT.dag
    -----------------------------------------------------------------------
    File for submitting this DAG to HTCondor  : dagjob-SCRIPT.dag.condor.sub
    Log of DAGMan debugging messages          : dagjob-SCRIPT.dag.dagman.out
    Log of HTCondor library output            : dagjob-SCRIPT.dag.lib.out
    Log of HTCondor library error messages    : dagjob-SCRIPT.dag.lib.err
    Log of the life of condor_dagman itself   : dagjob-SCRIPT.dag.dagman.log

    Submitting job(s).
    1 job(s) submitted to cluster 60807.
    -----------------------------------------------------------------------
Monitor the behavior.
    $ condor_q

    -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 07/08/19 10:28:15
    OWNER    BATCH_NAME               SUBMITTED   DONE   RUN   IDLE  TOTAL JOB_IDS
    testuser dagjob-SCRIPT.dag+60807  7/8 10:28      _     _      _      1 0.0

    Total for query: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
    Total for testuser: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
    Total for all users: 223 jobs; 0 completed, 0 removed, 0 idle, 223 running, 0 held, 0 suspended

    $ condor_q

    -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 07/08/19 10:28:18
    OWNER    BATCH_NAME               SUBMITTED   DONE   RUN   IDLE  TOTAL JOB_IDS
    testuser dagjob-SCRIPT.dag+60807  7/8 10:28      _     _    100      1 60808.0-99

    Total for query: 100 jobs; 0 completed, 0 removed, 100 idle, 0 running, 0 held, 0 suspended
    Total for testuser: 100 jobs; 0 completed, 0 removed, 100 idle, 0 running, 0 held, 0 suspended
    Total for all users: 323 jobs; 0 completed, 0 removed, 100 idle, 223 running, 0 held, 0 suspended
After jobB finishes, the file-test.tar file is created and the test directory is removed.
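Since a script with a non-zero exit value stops the workflow, a POST script may want to delete the test directory only when the archive was really written. The following is a hedged sketch of a more defensive variant of tar.sh, under the assumption that it runs in the directory that contains test/. The function name pack_and_clean is hypothetical.

```shell
#!/bin/bash
# pack_and_clean [DIR]
# Archive DIR/test/*.test into DIR/file-test.tar and remove DIR/test only if
# tar succeeded. The exit status of the first failing step is returned, so
# DAGMan would see a non-zero value instead of a silent data loss.
pack_and_clean() {
    (
        # Subshell: the "cd" does not affect the caller.
        cd "${1:-.}" &&
        tar -cvf file-test.tar test/*.test &&
        rm -rf test
    )
}

# In a real POST script the last line would be the call itself, e.g.:
#   pack_and_clean "$PWD"
```

The steps are chained with && rather than relying on "set -e", because "set -e" is ignored inside functions that are called as part of a condition.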
Submitting lots of jobs (factory)
There is a maximum total number of jobs you can have in the queue: 5000. In addition, the limit per submission is 2500 jobs; in other words, you cannot submit more than 2500 jobs using the same submit file.
When you need to submit a lot of jobs to HTCondor, you can reduce the load on the servers and control the number of jobs in the queue with the late materialization job factory [6].
What do we consider lots of jobs? At PIC, roughly more than 2000 jobs in the same submission.
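The two limits quoted in this section (2500 jobs per submission, 5000 jobs in total) translate into simple arithmetic. The following sketch only illustrates it; the function and variable names are hypothetical and nothing here queries HTCondor.

```shell
MAX_PER_SUBMIT=2500   # per-submission limit quoted in this section
MAX_TOTAL=5000        # total limit quoted in this section

# submissions_needed N: number of submit files needed for N jobs (ceiling).
submissions_needed() {
    echo $(( ($1 + MAX_PER_SUBMIT - 1) / MAX_PER_SUBMIT ))
}

# fits_in_queue QUEUED NEW: succeeds if NEW more jobs keep you within the total.
fits_in_queue() {
    [ $(( $1 + $2 )) -le "$MAX_TOTAL" ]
}
```

For instance, submissions_needed 6000 prints 3, and fits_in_queue 3000 2500 fails because 5500 jobs would exceed the total limit.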
max_materialize
A typical scenario is that you need to submit lots of jobs but do not want all of them running at the same time, or do not want to affect colleagues that belong to the same experiment.
As a toy example, suppose you want to submit 100 jobs but want at most 10 of them to run concurrently. Add max_materialize to your submit file:
max_materialize=10
    $ condor_q 8116114

    -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 02/08/24 13:27:09
    OWNER   BATCH_NAME   SUBMITTED   DONE   RUN   IDLE  TOTAL JOB_IDS
    cacosta ID: 8116114  2/8 13:26      _    10      _    100 8116114.0-9

    Total for query: 10 jobs; 0 completed, 0 removed, 0 idle, 10 running, 0 held, 0 suspended
    Total for all users: 67 jobs; 0 completed, 0 removed, 1 idle, 59 running, 7 held, 0 suspended
There are 100 jobs in total, and no more than 10 of them run at the same time.
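For reference, a complete submit file for this toy example could look like the one written by the sketch below. Only max_materialize=10 and the 100 jobs come from the example above; the executable name, the output, error and log file names and the function name write_factory_submit are placeholders chosen for illustration.

```shell
# write_factory_submit FILE
# Writes a minimal submit file that queues 100 jobs but materializes at most
# 10 of them at any time. The here-document is quoted ('EOF') so that the
# shell does not try to expand $(ClusterId) and $(ProcId).
write_factory_submit() {
    cat > "$1" <<'EOF'
executable      = test.sh
output          = job-$(ClusterId).$(ProcId).out
error           = job-$(ClusterId).$(ProcId).err
log             = job-$(ClusterId).log
max_materialize = 10
queue 100
EOF
}
```

After write_factory_submit factory.sub, the file would be submitted as usual with condor_submit factory.sub.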
There is a special way to query this kind of submission:
    $ condor_q 8116114 -factory

    -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 02/08/24 13:41:05
    ID        OWNER    SUBMITTED  LIMIT PRESNT  RUN  IDLE  HOLD NEXTID MODE DIGEST
    8116114.  cacosta  2/8 13:26     10     10   10     0     0     20 Norm /var/lib/condor/spool/6114/condor_submit.8116114.digest
max_idle
There is also the max_idle option, which controls the number of idle jobs.
max_idle=10
The same example as before, with 100 jobs:
    $ condor_q

    -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 02/08/24 14:30:03
    OWNER   BATCH_NAME   SUBMITTED   DONE   RUN   IDLE  TOTAL JOB_IDS
    cacosta ID: 8116197  2/8 14:29      _     5     10    100 8116197.0-14

    Total for query: 15 jobs; 0 completed, 0 removed, 10 idle, 5 running, 0 held, 0 suspended
    Total for cacosta: 15 jobs; 0 completed, 0 removed, 10 idle, 5 running, 0 held, 0 suspended
    Total for all users: 133 jobs; 0 completed, 0 removed, 30 idle, 96 running, 7 held, 0 suspended
Up to 10 jobs are kept idle at any time but, since this is the only constraint, all 100 jobs can eventually run:
    $ condor_q -factory

    -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 02/08/24 14:36:42
    ID        OWNER    SUBMITTED  LIMIT PRESNT  RUN  IDLE  HOLD NEXTID MODE DIGEST
    8116197.  cacosta  2/8 14:29    100    100  100     0     0    100 Norm /var/lib/condor/spool/6197/condor_submit.8116197.digest
Monitoring your jobs
The basic command line tools to monitor your jobs are condor_q (the principal one), condor_wait, condor_history, condor_tail and condor_ssh_to_job.
condor_q
The condor_q command queries the queue of the schedd [7]. Like other HTCondor commands, condor_q lets you select exactly what you want with the "-constraint" option, which filters on Job Attributes. To list all the attributes of a job, use condor_q -l job_id. Furthermore, you can tune the output with the "-autoformat" (or "-af") option.
It is better to show the potential of condor_q in an example:
    $ condor_q -const "RequestCpus > 1 && JobStatus == 1" -nobatch

    -- Schedd: submit01.pic.es : <193.109.174.82:9618?... @ 03/14/19 09:42:46
     ID      OWNER     SUBMITTED    RUN_TIME   ST PRI SIZE CMD
    630.0    testuser  3/14 09:42   0+00:00:00 I  0   0.0  test.sh --cpu 1 --timeout 10s
    630.1    testuser  3/14 09:42   0+00:00:00 I  0   0.0  test.sh --cpu 1 --timeout 10s
    630.2    testuser  3/14 09:42   0+00:00:00 I  0   0.0  test.sh --cpu 1 --timeout 10s
    630.3    testuser  3/14 09:42   0+00:00:00 I  0   0.0  test.sh --cpu 1 --timeout 10s
    630.4    testuser  3/14 09:42   0+00:00:00 I  0   0.0  test.sh --cpu 1 --timeout 10s

    Total for query: 5 jobs; 0 completed, 0 removed, 5 idle, 0 running, 0 held, 0 suspended
    Total for testuser: 5 jobs; 0 completed, 0 removed, 5 idle, 0 running, 0 held, 0 suspended
    Total for all users: 7 jobs; 0 completed, 0 removed, 5 idle, 2 running, 0 held, 0 suspended
We use the constraint (-const) to filter our jobs. Here, the query selects the jobs that request more than one CPU and are idle (JobStatus == 1); with -nobatch, the command shows the jobs ungrouped.
On the other hand, filtering the same jobs, we can decide the format of the output just showing the job attributes we are interested in.
    $ condor_q -const "RequestCpus > 1 && JobStatus == 1" -nobatch -af ClusterId ProcId RequestCpus RequestMemory
    630 0 4 2048
    630 1 4 2048
    630 2 4 2048
    630 3 4 2048
    630 4 4 2048
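Because the -af output is plain whitespace-separated columns, it is easy to post-process with standard tools. The sketch below sums the requests of the listed jobs. It assumes the four columns of the example above (ClusterId, ProcId, RequestCpus, RequestMemory) in that order, and the function name summarise_requests is hypothetical. The sample input is the output shown above, so the sketch runs without access to a schedd.

```shell
# summarise_requests: reads "ClusterId ProcId RequestCpus RequestMemory" lines
# from standard input and prints the totals.
summarise_requests() {
    awk '{ jobs++; cpus += $3; mem += $4 }
         END { printf "%d jobs, %d CPUs, %d MB of memory requested\n", jobs, cpus, mem }'
}

# Sample input copied from the condor_q -af output above.
summarise_requests <<'EOF'
630 0 4 2048
630 1 4 2048
630 2 4 2048
630 3 4 2048
630 4 4 2048
EOF
# prints: 5 jobs, 20 CPUs, 10240 MB of memory requested
```

With a live queue, the same function could be fed directly from the condor_q -af command of the example through a pipe.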
analyze and better-analyze
Furthermore, the condor_q command accepts the options -analyze and -better-analyze, which show the reason why your job is not running.
    $ condor_q -analyze 146.0

    -- Schedd: condor-ui01.pic.es : <193.109.175.231:9618?...
    The Requirements expression for job 146.000 is

        (TARGET.WN_property == ifThenElse(MY.WN_property is undefined,"default",MY.WN_property)) &&
        (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) &&
        (TARGET.Memory >= RequestMemory) && ((TARGET.FileSystemDomain == MY.FileSystemDomain) ||
        (TARGET.HasFileTransfer))

    146.000: Job has not yet been considered by the matchmaker.

    146.000: Run analysis summary ignoring user priority. Of 276 machines,
          3 are rejected by your job's requirements
          1 reject your job because of their own requirements
          0 match and are already running your jobs
          0 match but are serving other users
        272 are able to run your job
    $ condor_q -better-analyze 148

    -- Schedd: condor-ui01.pic.es : <193.109.175.231:9618?...
    The Requirements expression for job 148.000 is

        (TARGET.WN_property == ifThenElse(MY.WN_property is undefined,"default",MY.WN_property)) &&
        (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) &&
        (TARGET.Memory >= RequestMemory) && ((TARGET.FileSystemDomain == MY.FileSystemDomain) ||
        (TARGET.HasFileTransfer))

    Job 148.000 defines the following attributes:

        FileSystemDomain = "condor-ui01.pic.es"
        RequestCpus = 1
        RequestDisk = 20971520 * RequestCpus
        RequestMemory = 2048 * RequestCpus

    The Requirements expression for job 148.000 reduces to these conditions:

             Slots
    Step    Matched  Condition
    -----  --------  ---------
    [0]        3761  TARGET.WN_property == ifThenElse(MY.WN_property is undefined,"default",MY.WN_property)
    [5]         277  TARGET.Disk >= RequestDisk
    [6]         274  [0] && [5]
    [7]        3744  TARGET.Memory >= RequestMemory
    [8]         253  [6] && [7]
    [10]       3765  TARGET.HasFileTransfer

    148.000: Job has not yet been considered by the matchmaker.

    148.000: Run analysis summary ignoring user priority. Of 276 machines,
          3 are rejected by your job's requirements
          1 reject your job because of their own requirements
          0 match and are already running your jobs
          0 match but are serving other users
        272 are able to run your job
The jobs in these examples have not been considered by the matchmaker yet, but you can see that there are 272 machines available that can run them.
Useful Job Attributes
Your job has many Job Attributes. Here is a list of a few of them:
- JobStatus. Number indicating the status of your job. Relevant Job Status numbers:
JobStatus | Name | Symbol | Description |
---|---|---|---|
1 | Idle | I | Job is idle, queued waiting for resources |
2 | Running | R | Job is running |
3 | Removed | X | Job has been removed by user or admin |
4 | Completed | C | Job is completed |
5 | Held | H | Job is in hold state, it will not be scheduled until released |
- RemoteHost. The worker node (WN) where the job is running.
- ResidentSetSize_RAW. The maximum observed physical memory consumed by the job in KiB while running.
- DiskUsage_RAW. The maximum observed disk usage by the job in KiB while running.
- RemoteUserCpu. Total number of seconds of user CPU time the job has used.
- Owner. The submitter user.
- OwnerGroup. Main group or experiment of the submitter user.
There are other HTCondor "magic" numbers that you can consult [8].
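When JobStatus is printed with -af it appears as a bare number. A small helper such as the following sketch can translate it back into the names of the table above. The function name job_status_name is hypothetical, and only the five values listed in the table are covered.

```shell
# job_status_name N: print the name of JobStatus value N (see the table above).
job_status_name() {
    case "$1" in
        1) echo "Idle" ;;
        2) echo "Running" ;;
        3) echo "Removed" ;;
        4) echo "Completed" ;;
        5) echo "Held" ;;
        *) echo "Unknown($1)" ;;
    esac
}

# Possible use with a live queue (not executed here):
#   condor_q -af ClusterId ProcId JobStatus |
#       while read -r cluster proc status; do
#           echo "$cluster.$proc $(job_status_name "$status")"
#       done
```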
condor_wait
The condor_wait command allows us to watch and extract information from the user log file [9]. It waits until the job is finished, forever if necessary, unless a wait time is specified (with the -wait option). Furthermore, as condor_wait monitors the log file, it can only be run once the job has been successfully submitted.
It is not as useful as condor_q but can give you information about several jobs if you collect them in the same log file.
For instance, monitoring 3 jobs:
    $ condor_wait -status log-test.log
    176.0.0 submitted
    176.1.0 submitted
    176.2.0 submitted
    176.1.0 executing on host <192.168.100.15:9618?addrs=192.168.100.15-9618+[2001-67c-1148-301--213]-9618&noUDP&sock=24729_9d98_3>
    176.2.0 executing on host <192.168.100.173:9618?addrs=192.168.100.173-9618+[2001-67c-1148-301--73]-9618&noUDP&sock=14012_a51f_3>
    176.0.0 executing on host <192.168.101.68:9618?addrs=192.168.101.68-9618&noUDP&sock=2583_4d37_3>
    176.1.0 completed
    176.2.0 completed
    176.0.0 completed
    All jobs done.
condor_history
Once the job is removed from the queue, you can query it using condor_history [10]. The condor_history command accepts constraints similar to those of condor_q. The number of jobs kept in the history for remote access is limited; thus, if you need information about an old job that does not appear in your condor_history query, please ask your contact.
It is recommended to use the option "-limit N" where N is the number of jobs you want to query to perform faster queries.
    $ condor_history -const 'Owner == "testuser" && JobStatus ==4' -limit 5
     ID     OWNER     SUBMITTED    RUN_TIME   ST COMPLETED   CMD
    78.0    testuser  3/14 10:05   0+00:01:42 C  3/14 10:07  /home/testuser/condor/test-local/remote/test.sh --cpu 1 --timeout 100s
    76.0    testuser  3/14 10:03   0+00:01:41 C  3/14 10:04  /home/testuser/condor/test-local/remote/test.sh --cpu 1 --timeout 100s
    75.0    testuser  3/14 09:59   0+00:01:45 C  3/14 10:01  /home/testuser/condor/test-local/remote/test.sh --cpu 1 --timeout 100s
    74.0    testuser  3/14 09:13   0+00:00:25 C  3/14 09:14  /home/testuser/condor/test-local/remote/test-output.sh
    73.0    testuser  3/14 09:07   0+00:00:22 C  3/14 09:08  /home/testuser/condor/test-local/remote/test-output.sh
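The RUN_TIME column uses the days+HH:MM:SS format, which is awkward to compare with limits expressed in seconds. The following sketch converts such a value into seconds, assuming exactly that format. The function name runtime_to_seconds is hypothetical.

```shell
# runtime_to_seconds D+HH:MM:SS: convert a RUN_TIME value into seconds.
runtime_to_seconds() {
    echo "$1" | awk -F'[+:]' '{ print $1 * 86400 + $2 * 3600 + $3 * 60 + $4 }'
}

runtime_to_seconds "0+00:01:42"   # prints 102 (job 78.0 in the listing above)
```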
condor_tail
The condor_tail command checks the standard output and error of a job while it is running [11].
    $ condor_tail -f 65.0
    stress: info: [9] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
By default, condor_tail checks the standard output and shows the last 1024 bytes. The "-f" (follow) option behaves like the Linux tail -f command: it keeps tailing the file until interrupted. To check the standard error, use the "-stderr" option.
Take into account that condor_tail needs the ClusterId.ProcId of the job to work correctly.
condor_ssh_to_job
Finally, another way to monitor how your job is evolving is the condor_ssh_to_job command [12]. With this command, you can enter the job directory on the worker node and check what is happening.
    $ condor_ssh_to_job 66.0
    Welcome to slot1_7@td622.pic.es!
    Your condor job is running with pid(s) 577 1163.
    $ ls
    condor_exec.exe  _condor_stderr  _condor_stdout  out.txt  tmp  var
Removing your jobs
To remove your jobs, use the condor_rm command [13].
The most common way to use condor_rm is just specifying the ClusterId and/or the ProcId of your job. For instance, we have a batch of 4 jobs.
    $ condor_q 154 -nobatch

    -- Schedd: condor-ui01.pic.es : <193.109.175.231:9618?... @ 04/01/19 15:37:55
     ID      OWNER     SUBMITTED   RUN_TIME   ST PRI SIZE CMD
    154.0    testuser  4/1 15:37   0+00:00:00 I  0   0.0  test.sh --timeout 600s --vm 2 --vm-bytes 2G
    154.1    testuser  4/1 15:37   0+00:00:00 I  0   0.0  test.sh --timeout 600s --vm 2 --vm-bytes 2G
    154.2    testuser  4/1 15:37   0+00:00:00 I  0   0.0  test.sh --timeout 600s --vm 2 --vm-bytes 2G
    154.3    testuser  4/1 15:37   0+00:00:00 I  0   0.0  test.sh --timeout 600s --vm 2 --vm-bytes 2G

    Total for query: 4 jobs; 0 completed, 0 removed, 4 idle, 0 running, 0 held, 0 suspended
    Total for all users: 7 jobs; 0 completed, 2 removed, 4 idle, 0 running, 1 held, 0 suspended
Use ClusterId.ProcId to remove just one job of the cluster:
    $ condor_rm 154.2
    Job 154.2 marked for removal
Use ClusterId to remove all the jobs in a cluster:
    $ condor_rm 154
    All jobs in cluster 154 have been marked for removal
You can use constraints to remove all the jobs that meet a given condition.
    $ condor_rm -const 'RequestCpus > 1'
    All jobs matching constraint (RequestCpus > 1) have been marked for removal
Or you can remove all your jobs just using the "-all" option.
From Torque to HTCondor
Although HTCondor has more commands than the ones you are used to in Torque/PBS, the most common Torque commands (qsub, qdel and qstat) have their equivalent in HTCondor.
Torque | HTCondor | Description |
---|---|---|
qsub | condor_submit | To submit jobs to the farm |
qdel | condor_rm | To remove one job or all the jobs of a user |
qstat | condor_q | To query the state of your jobs |
HTCondor has a powerful language to query the pool, and more interesting options to monitor your jobs that have no equivalent in Torque/Maui (condor_history, for instance). Do not hesitate to look into the HTCondor documentation to build your queries and learn more about the commands.
Remember that there is no concept of differentiated queues with static requirements in HTCondor. Therefore, when you run condor_q you are querying your jobs in the whole schedd; do not expect to see different queues as with the qstat command. At PIC we use the flavour option to limit the walltime of the jobs, in a similar way as the queues limited it in Torque/Maui.
Issues
Why are my jobs held?
There are several reasons why your jobs can be held. You can ask HTCondor why your jobs are held using the following command:
$ condor_q -const 'JobStatus == 5' -af HoldReason
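When many jobs are held, it can help to group identical reasons. The following is a minimal sketch that tallies the lines it reads from standard input, most frequent first. The function name tally_hold_reasons is hypothetical, and it treats each input line as one HoldReason, which is what the -af output of the command above should produce.

```shell
# tally_hold_reasons: count identical lines from standard input and print
# "count<TAB>reason", sorted by decreasing count.
tally_hold_reasons() {
    awk '{ count[$0]++ }
         END { for (r in count) printf "%d\t%s\n", count[r], r }' | sort -rn
}

# Possible use with a live queue (not executed here):
#   condor_q -const 'JobStatus == 5' -af HoldReason | tally_hold_reasons
```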
There are at least 3 situations in which your job is held, each one enforced by a different periodic hold check configured by the administrators.
Your job walltime XXs exceeded the Maximum Walltime XXs
The HoldReason indicates that your job has exceeded the maximum walltime for its flavour. Remember that the maximum time your job can run (10800, 172800 or 345600 seconds for the short, medium or long flavour respectively) is set by the flavour you request.
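The three walltime limits quoted above correspond to 3, 48 and 96 hours. The sketch below encodes them so that an expected run time can be checked against a flavour before submitting. The function names are hypothetical and the values are only the ones stated in this section.

```shell
# flavour_walltime FLAVOUR: print the maximum walltime in seconds.
flavour_walltime() {
    case "$1" in
        short)  echo 10800 ;;    #  3 hours
        medium) echo 172800 ;;   # 48 hours
        long)   echo 345600 ;;   # 96 hours
        *)      echo "unknown flavour: $1" >&2; return 1 ;;
    esac
}

# fits_flavour FLAVOUR SECONDS: succeeds if SECONDS is within the limit.
fits_flavour() {
    limit=$(flavour_walltime "$1") || return 1
    [ "$2" -le "$limit" ]
}
```

For example, fits_flavour short 20000 fails, while fits_flavour medium 20000 succeeds.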
Your job memory XXMB exceeded the Job Memory Limit XXMB
In this case, your job is consuming more memory than requested (more than 1.5 times the requested memory!). You should resubmit your job increasing the requested memory.
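As a worked example of the 1.5 factor mentioned above: a job that requests 2048 MB would be held once it uses more than 3072 MB. The sketch below does this arithmetic, assuming the limit is exactly 1.5 times the request. The function name memory_hold_limit is hypothetical.

```shell
# memory_hold_limit REQUEST_MB: print 1.5 x REQUEST_MB using integer arithmetic.
memory_hold_limit() {
    echo $(( $1 * 3 / 2 ))
}

memory_hold_limit 2048   # prints 3072
```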
CPU usage exceeded request_cpus
The HoldReason, in this case, means that your job has been using more CPUs than it requested during the last hour. You should resubmit your job increasing the number of requested CPUs.
Why are my jobs idle?
This is one of the most typical questions when you submit jobs to a batch system. You always have to assume that your job will be idle for some time, and this time is not always the same: it depends on the requirements of your jobs, the share of your group and the total use of the farm.
Typically, if your needs are smaller (1 CPU and 1 GB of RAM for instance) your job will be idle for less time than if you need 8 CPUs or more than 2 GB RAM per core.
Anyway, remember that the condor_q command has the options -analyze and -better-analyze, which give you information about why your job is idle (without taking the priority into account). These options are especially useful to detect jobs that are idle because their requirements are not met.
https://pwiki.pic.es/index.php?title=HTCondor_User_Guide#analyze_and_better-analyze
When the requirements of your job cannot be matched by any available worker node, you will see a WARNING message like this one when using -analyze or -better-analyze:
    $ condor_q -analyze 267201

    -- Schedd: submit01.pic.es : <193.109.174.82:9618?...
    The Requirements expression for job 267201.000 is

        (TARGET.WN_property == ifThenElse(MY.WN_property is undefined,"default",MY.WN_property)) &&
        (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) &&
        (TARGET.Memory >= RequestMemory) && (TARGET.Cpus >= RequestCpus) &&
        ((TARGET.FileSystemDomain == MY.FileSystemDomain) || (TARGET.HasFileTransfer))

    No successful match recorded.
    Last failed match: Fri Feb 7 16:58:04 2020

    Reason for last match failure: no match found

    267201.000: Run analysis summary ignoring user priority. Of 338 machines,
        338 are rejected by your job's requirements
          0 reject your job because of their own requirements
          0 match and are already running your jobs
          0 match but are serving other users
          0 are able to run your job

    WARNING: Be advised:
       No machines matched the jobs's constraints
AlmaLinux 9 migration
AlmaLinux 9 is the default OS for all the WorkerNodes of our cluster since 2024/03/27.
- The general user interfaces (ui.pic.es) are now on AlmaLinux 9.
- Do not use +WN_property="alma9" anymore. If you use it, your jobs will stay idle forever.
- Remember that you can run your job inside a container by pointing to an Apptainer image with the option:
+SingularityImage = "/opt/apptainer-images/pic-centos7.sif"
References and links
There is a lot of documentation about HTCondor. Here is a list of the useful links cited in this manual.
- [2] condor_submit: https://htcondor.readthedocs.io/en/v9_0/users-manual/submitting-a-job.html
- [3] Queue command: https://htcondor.readthedocs.io/en/v9_0/users-manual/submitting-a-job.html
- [4] condor_prio command: https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_prio.html
- [5] DAGMAN workflows: https://htcondor.readthedocs.io/en/v9_0/users-manual/dagman-workflows.html
- [6] Submitting a lot of jobs: https://htcondor.readthedocs.io/en/latest/users-manual/submitting-a-job.html#submitting-lots-of-jobs
- [8] HTCondor magic numbers: https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=MagicNumbers
- [9] condor_wait: https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_wait.html
- [10] condor_history: https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_history.html
- [11] condor_tail: https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_tail.html
- [12] condor_ssh_to_job: https://htcondor.readthedocs.io/en/v9_0/man-pages/condor_ssh_to_job.html
- HTcondor User Guide Tutorial: https://docs.google.com/presentation/d/1-64fEcfLyxLzSpZH-tbV-SjMAc0CvMpWhE0I4oaDKkQ/edit?usp=sharing
- HTCondor Tutorial examples: https://github.com/PortdInformacioCientifica/htcondor-tutorial