From 501f1263e3dc0e0e7105aad25dfe96fcaaf4825c Mon Sep 17 00:00:00 2001
From: David Robert Verelst <dave@dtu.dk>
Date: Thu, 1 Mar 2018 15:40:20 +0100
Subject: [PATCH] add docs for launch.py as taken from wetb

---
 docs/launch.md | 139 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 139 insertions(+)
 create mode 100644 docs/launch.md

diff --git a/docs/launch.md b/docs/launch.md
new file mode 100644
index 0000000..9869f8b
--- /dev/null
+++ b/docs/launch.md
@@ -0,0 +1,139 @@
+Launching the jobs on the cluster
+---------------------------------
+
+Use ssh (Linux, Mac) or PuTTY (MS Windows) to connect to the cluster.
+
+```launch.py``` is a generic tool that helps with launching an arbitrary
+number of PBS launch scripts on a PBS Torque cluster. Launch scripts here
+are defined as files with a ```.p``` extension. The script will look for any
+```.p``` files in a specified folder (```pbs_in/``` by default, which the user
+can change using the ```-p``` or ```--path_pbs_files``` flag) and save them in a
+file list called ```pbs_in_file_cache.txt```. When using the option ```-c``` or
+```--cache```, the script will not look for pbs files, but instead read them
+directly from the ```pbs_in_file_cache.txt``` file.
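+
+A minimal sketch of the two modes (the first invocation scans the folder and
+writes the file cache, a later invocation can then reuse it):
+
+```bash
+# scan pbs_in/ for *.p files and save the list to pbs_in_file_cache.txt
+g-000 $ launch.py -n 100 -p pbs_in/
+# on a later invocation, read the file list from the cache instead of scanning
+g-000 $ launch.py -n 100 --cache
+```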
+
+The launch script has a simple built-in scheduler that has been successfully
+used to launch 50,000 jobs. This scheduler is configured by two main parameters:
+the number of CPUs requested (using ```-n``` or ```--nr_cpus```) and the minimum
+number of free CPUs required on the cluster (using ```--cpu_free```, 48 by
+default). Jobs will be launched after a predefined sleep time (as set by the
+```--tsleep``` option, 5 seconds by default). After the initial sleep time a new
+job will be launched every 0.5 seconds, as long as the launch condition holds:
+
+```
+nr_cpus > CPUs used by user
+AND CPUs free on cluster > cpu_free
+AND jobs queued by user < cpu_user_queue
+```
+
+If this condition is not met, the program will sleep for ```--tsleep``` seconds
+(5 by default) before trying to launch a new job again.
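+
+As an illustration, the scheduler behaviour could be tuned as follows (the
+values shown are arbitrary examples, not recommendations):
+
+```bash
+# request 100 CPUs, keep at least 100 CPUs free for others, allow at most
+# 10 of your own jobs in the queue, and retry every 10 seconds when busy
+g-000 $ launch.py -n 100 --cpu_free=100 --cpu_user_queue=10 --tsleep=10
+```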
+
+Depending on the number of jobs and the required computation time, it could
+take a while before all jobs are launched. When running the launch script from
+the login node, this might be a problem if you have to close your SSH/PuTTY
+session before all jobs are launched. In that case the user can use the
+```--crontab``` argument: it will trigger the ```launch.py``` script every 5
+minutes to check if more jobs can be launched, until all jobs have been
+launched. The user does not need to have an active SSH/PuTTY session for this
+to work. You can follow the progress and configuration of ```launch.py``` in
+crontab mode in the following files (see the example after this list):
+
+* ```launch_scheduler_log.txt```
+* ```launch_scheduler_config.txt```: you can change your launch settings on the fly
+* ```launch_scheduler_state.txt```
+* ```launch_pbs_filelist.txt```: remaining jobs, when a job is launched it is
+removed from this list
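+
+A small sketch of how progress could be monitored with standard shell tools
+(the file names are the ones listed above):
+
+```bash
+# follow the scheduler log as it is written
+g-000 $ tail -f launch_scheduler_log.txt
+# count how many pbs files are still waiting to be launched
+g-000 $ wc -l launch_pbs_filelist.txt
+```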
+
+You can check if ```launch.py``` is actually active as a crontab job with:
+
+```
+crontab -l
+```
+
+```launch.py``` will clean up the crontab after all jobs are launched, but if
+you need to prevent it from launching new jobs before that, you can clear your
+crontab with:
+
+```
+crontab -r
+```
+
+The ```launch.py``` script has various options, and you can read about
+them using the ```--help``` flag (the output is included here for convenience):
+
+```bash
+g-000 $ launch.py --help
+Usage:
+
+launch.py -n nr_cpus
+
+launch.py --crontab when running a single iteration of launch.py as a crontab job every 5 minutes.
+File list is read from "launch_pbs_filelist.txt", and the configuration can be changed on the fly
+by editing the file "launch_scheduler_config.txt".
+
+Options:
+  -h, --help            show this help message and exit
+  --depend              Switch on for launch depend method
+  -n NR_CPUS, --nr_cpus=NR_CPUS
+                        number of cpus to be used
+  -p PATH_PBS_FILES, --path_pbs_files=PATH_PBS_FILES
+                        optionally specify location of pbs files
+  --re=SEARCH_CRIT_RE   regular expression search criterium applied on the
+                        full pbs file path. Escape backslashes! By default it
+                        will select all *.p files in pbs_in/.
+  --dry                 dry run: do not alter pbs files, do not launch
+  --tsleep=TSLEEP       Sleep time [s] when cluster is too bussy to launch new
+                        jobs. Default=5 seconds
+  --tsleep_short=TSLEEP_SHORT
+                        Sleep time [s] between between successive job
+                        launches. Default=0.5 seconds.
+  --logfile=LOGFILE     Save output to file.
+  -c, --cache           If on, files are read from cache
+  --cpu_free=CPU_FREE   No more jobs will be launched when the cluster does
+                        not have the specified amount of cpus free. This will
+                        make sure there is room for others on the cluster, but
+                        might mean less cpus available for you. Default=48
+  --cpu_user_queue=CPU_USER_QUEUE
+                        No more jobs will be launched after having
+                        cpu_user_queue number of jobs in the queue. This
+                        prevents users from filling the queue, while still
+                        allowing to aim for a high cpu_free target. Default=5
+  --qsub_cmd=QSUB_CMD   Is set automatically by --node flag
+  --node                If executed on dedicated node. Although this works,
+                        consider using --crontab instead. Default=False
+  --sort                Sort pbs file list. Default=False
+  --crontab             Crontab mode: %prog will check every 5 (default)
+                        minutes if more jobs can be launched. Not compatible
+                        with --node. When all jobs are done, crontab -r will
+                        remove all existing crontab jobs of the current user.
+                        Use crontab -l to inspect current crontab jobs, and
+                        edit them with crontab -e. Default=False
+  --every_min=EVERY_MIN
+                        Crontab update interval in minutes. Default=5
+  --debug               Debug print statements. Default=False
+
+```
+
+Then launch the actual jobs (each job is a ```*.p``` file in ```pbs_in```) using
+100 CPUs:
+
+```bash
+g-000 $ cd /mnt/mimer/hawc2sim/demo/A0001
+g-000 $ launch.py -n 100 -p pbs_in/
+```
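+
+Once jobs are submitted you can keep an eye on them with the standard PBS
+Torque queue tools (assuming ```qstat``` is available on the login node;
+```launch.py``` itself does not provide this):
+
+```bash
+# list your own queued and running jobs
+g-000 $ qstat -u $USER
+```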
+
+If the launching process takes hours, and you have to close your SSH/PuTTY
+session before it reaches the end, you can use either the ```--node``` or the
+```--crontab``` argument. When using ```--node```, ```launch.py``` will run on
+a dedicated cluster node, submitted as a PBS job. When using ```--crontab```,
+```launch.py``` will be run once every 5 minutes as a ```crontab``` job on the
+login node. The latter is preferred since you are not occupying a node with a
+very simple and light job. ```launch.py``` will remove all the user's crontab
+jobs at the end with ```crontab -r```.
+
+```bash
+g-000 $ cd /mnt/mimer/hawc2sim/demo/A0001
+g-000 $ launch.py -n 100 -p pbs_in/ --crontab
+```
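+
+Alternatively, the same launch could be submitted to a dedicated node (a sketch
+using the ```--node``` flag described above; as noted, ```--crontab``` is
+generally the better choice):
+
+```bash
+g-000 $ cd /mnt/mimer/hawc2sim/demo/A0001
+g-000 $ launch.py -n 100 -p pbs_in/ --node
+```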
+
-- 
GitLab