From 91e64eb13a61eb550919f195014106f9a9ae213c Mon Sep 17 00:00:00 2001
From: dave <dave@dtu.dk>
Date: Fri, 26 Aug 2016 16:07:02 +0200
Subject: [PATCH] updated how-to-dlcs docs

---
 docs/howto-make-dlcs.md | 69 +++++++++++++++++++++++++++++++----------
 1 file changed, 53 insertions(+), 16 deletions(-)

diff --git a/docs/howto-make-dlcs.md b/docs/howto-make-dlcs.md
index 46db2adf..d6a92f59 100644
--- a/docs/howto-make-dlcs.md
+++ b/docs/howto-make-dlcs.md
@@ -389,24 +389,53 @@ number of cpu's requested (using ```-c``` or ```--nr_cpus```) and minimum of
 required free cpu's on the cluster (using ```--cpu_free```, 48 by default).
 Jobs will be launched after a predefined sleep time (as set by the
 ```--tsleep``` option, and set to 5 seconds by default). After the initial sleep
-time a new job will be launched every 0.1 second. If the launch condition is not
-met (```nr_cpus > cpu's used by user AND cpu's free on cluster > cpu_free```),
-the program will wait 5 seconds before trying to launch a new job again.
+time a new job will be launched every 0.5 seconds. If the launch condition is
+not met:
+
+```
+nr_cpus > cpu's used by user
+AND cpu's free on cluster > cpu_free
+AND jobs queued by user < cpu_user_queue
+```
+
+the program will sleep 5 seconds before trying to launch a new job again.
 Depending on the amount of jobs and the required computation time, it could
 take a while before all jobs are launched. When running the launch script from
 the login node, this might be a problem when you have to close your ssh/putty
-session before all jobs are launched. In that case the user should use a
-dedicated compute node for launching jobs. To run the launch script on a
-compute instead of the login node, use the ```--node``` option. You can inspect
-the progress in the ```launch_scheduler_log.txt``` file.
+session before all jobs are launched. In that case the user can use the
+```--crontab``` argument: it will trigger the ```launch.py``` script every 5
+minutes to check if more jobs can be launched until all jobs have been
+executed. The user does not need to have an active ssh/putty session for this
+to work. You can follow the progress and configuration of ```launch.py``` in
+crontab mode in the following files:
+
+* ```launch_scheduler_log.txt```
+* ```launch_scheduler_config.txt```: you can change your launch settings on the fly
+* ```launch_scheduler_state.txt```
+* ```launch_pbs_filelist.txt```: remaining jobs; when a job is launched it is
+removed from this list
+
+You can check if ```launch.py``` is actually active as a crontab job with:
+
+```
+crontab -l
+```
+
+```launch.py``` will clean up the crontab after all jobs are launched, but if
+you need to prevent it from launching new jobs before that, you can clean up
+your crontab with:
+
+```
+crontab -r
+```
 
-The ```launch.py``` script has some different options, and you can read about
+The ```launch.py``` script has various options, and you can read about
 them by using the help function (the output is included for your convenience):
 
 ```bash
 g-000 $ launch.py --help
-usage: 
+Usage: 
 
 launch.py -n nr_cpus
 
 
@@ -414,7 +443,7 @@ launch.py --crontab when running a single iteration of launch.py as a crontab jo
 File list is read from "launch_pbs_filelist.txt", and the configuration can be
 changed on the fly by editing the file "launch_scheduler_config.txt".
 
-options:
+Options:
   -h, --help            show this help message and exit
   --depend              Switch on for launch depend method
   -n NR_CPUS, --nr_cpus=NR_CPUS
@@ -425,7 +454,11 @@ options:
                         full pbs file path. Escape backslashes! By default it
                         will select all *.p files in pbs_in/.
   --dry                 dry run: do not alter pbs files, do not launch
-  --tsleep=TSLEEP       Sleep time [s] after qsub command. Default=5 seconds
+  --tsleep=TSLEEP       Sleep time [s] when cluster is too busy to launch new
+                        jobs. Default=5 seconds
+  --tsleep_short=TSLEEP_SHORT
+                        Sleep time [s] between successive job launches.
+                        Default=0.5 seconds.
   --logfile=LOGFILE     Save output to file.
   -c, --cache           If on, files are read from cache
   --cpu_free=CPU_FREE   No more jobs will be launched when the cluster does
@@ -436,15 +469,19 @@ options:
                         No more jobs will be launched after having
                         cpu_user_queue number of jobs in the queue. This
                         prevents users from filling the queue, while still
-                        allowing to aim for a high cpu_free target.
-                        Default=500
+                        allowing to aim for a high cpu_free target. Default=5
   --qsub_cmd=QSUB_CMD   Is set automatically by --node flag
   --node                If executed on dedicated node. Although this works,
                         consider using --crontab instead. Default=False
   --sort                Sort pbs file list. Default=False
-  --crontab             Crontab mode. Implies --cache, and not compatible with
-                        --node. When all jobs are done, crontab -r will remove
-                        all the user's current crontab jobs. Default=False
+  --crontab             Crontab mode: %prog will check every 5 (default)
+                        minutes if more jobs can be launched. Not compatible
+                        with --node. When all jobs are done, crontab -r will
+                        remove all existing crontab jobs of the current user.
+                        Use crontab -l to inspect current crontab jobs, and
+                        edit them with crontab -e. Default=False
+  --every_min=EVERY_MIN
+                        Crontab update interval in minutes. Default=5
   --debug               Debug print statements. Default=False
 ```
-- 
GitLab
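
For reference, a crontab-mode launch session using the options documented in
this patch could look like the sketch below. The folder name and cpu numbers
are hypothetical placeholders; the flags themselves are taken from the
```launch.py``` help output quoted above.

```bash
# Hypothetical session on the cluster login node (g-000). All flags are
# documented in the launch.py --help output quoted in the patch above.
g-000 $ cd my_simulation_folder    # placeholder for your project folder

# Launch in crontab mode: launch.py re-checks every 5 minutes whether more
# jobs can be launched, keeps 48 cpu's free on the cluster, and allows at
# most 5 of your jobs waiting in the queue.
g-000 $ launch.py -n 100 --crontab --every_min=5 --cpu_free=48 --cpu_user_queue=5

# Confirm the crontab entry that launch.py installed:
g-000 $ crontab -l

# Stop launching early if needed (removes ALL of your crontab entries):
g-000 $ crontab -r
```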