Title: | A Lightweight Wrapper for 'Slurm' |
---|---|
Description: | 'Slurm', Simple Linux Utility for Resource Management <https://slurm.schedmd.com/>, is a popular 'Linux' based software used to schedule jobs in 'HPC' (High Performance Computing) clusters. This R package provides a specialized lightweight wrapper of 'Slurm' with a syntax similar to that found in the 'parallel' R package. The package also includes a method for creating socket cluster objects spanning multiple nodes that can be used with the 'parallel' package. |
Authors: | George Vega Yon [aut, cre] , Paul Marjoram [ctb, ths] , National Cancer Institute (NCI) [fnd] (Grant Number 5P01CA196569-02), Michael Schubert [rev] (JOSS reviewer, <https://orcid.org/0000-0002-6862-5221>), Michel Lang [rev] (JOSS reviewer, <https://orcid.org/0000-0001-9754-0393>) |
Maintainer: | George Vega Yon <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.5-4 |
Built: | 2024-11-05 05:27:58 UTC |
Source: | https://github.com/uscbiostats/slurmr |
When submitting array jobs using sbatch, users can specify indices in several ways. These could be specified as, for example, ranges, "1-9", lists, "1,2,5", or intervals as "1-7:3", which translates into "1, 4, 7". This function expands those cases.
expand_array_indexes(x)
x |
A character vector. Array indexes (see details). |
x is assumed to be in the form of [jobid](_[array expression]), where the expression after the underscore is optional. The function will return an expanded version of this, e.g., if x = "8123_[1,3-6]", the resulting expression will be the vector "8123_1", "8123_3", "8123_4", "8123_5", and "8123_6".
This function was developed mainly to be used internally.
A character vector with the expanded indices.
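The expansion rule described in this topic can be sketched in base R. This is an illustrative re-implementation, not the package's internal code: the helper names expand_one_spec and expand_job_id are hypothetical, and the number after the colon is assumed to be the step, as in "1-7:3" giving 1, 4, 7.

```r
# Hypothetical helper: expand one piece of an array expression,
# e.g. "5", "3-6", or "1-7:3" (a range with a step).
expand_one_spec <- function(spec) {
  spec <- gsub("[[:space:]]+", "", spec)
  m <- regmatches(spec, regexec("^([0-9]+)(-([0-9]+))?(:([0-9]+))?$", spec))[[1]]
  if (length(m) == 0L) stop("Cannot parse array expression: ", spec)
  from <- as.integer(m[2])
  to   <- if (nzchar(m[4])) as.integer(m[4]) else from
  by   <- if (nzchar(m[6])) as.integer(m[6]) else 1L
  seq(from, to, by = by)
}

# Hypothetical helper: expand "jobid_[expr]" into "jobid_1", "jobid_2", ...
expand_job_id <- function(x) {
  m <- regmatches(x, regexec("^([0-9]+)_\\[(.+)\\]$", x))[[1]]
  if (length(m) == 0L) return(x)  # plain "512" or "123_1": nothing to expand
  pieces <- strsplit(m[3], ",", fixed = TRUE)[[1]]
  idx <- unlist(lapply(pieces, expand_one_spec))
  paste0(m[2], "_", idx)
}

ids <- c("512", "55_[1-5]", "122_[1, 5-6]", "44_[1-3:2]")
expanded <- unlist(lapply(ids, expand_job_id))
print(expanded)
```

Entries without a bracketed expression are returned untouched, which mirrors the statement that the array expression is optional.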
expand_array_indexes(c("512", "123_1", "55_[1-5]", "122_[1, 5-6]", "44_[1-3:2]"))
# [1] "512"   "123_1" "55_1"  "55_2"  "55_3"  "55_4"  "55_5"
#     "122_1" "122_5" "122_6" "44_1"  "44_3"
This data frame contains information regarding the job state codes that Slurm returns when querying the status of a given job. The last column, type, shows a description of how that corresponding state is considered in the package's various operations. This is used in the function status.
JOB_STATE_CODES
A data frame with 24 rows and 4 columns.
Slurm's website https://slurm.schedmd.com/squeue.html
This function is essentially a wrapper of the function parallel::makePSOCKcluster. The main feature of makeSlurmCluster is adding node addresses.
makeSlurmCluster(
  n,
  job_name = random_job_name(),
  tmp_path = opts_slurmR$get_tmp_path(),
  cluster_opt = list(),
  max_wait = 300L,
  verb = TRUE,
  ...
)

## S3 method for class 'slurm_cluster'
stopCluster(cl)
n |
Integer scalar. Size of the cluster object (see details). |
job_name |
Character. Name of the job to be passed to |
tmp_path |
Character. Path to the directory where all the data (including scripts) will be stored. Notice that this path must be accessible by all the nodes in the network (See opts_slurmR). |
cluster_opt |
A list of arguments passed to parallel::makePSOCKcluster. |
max_wait |
Integer scalar. Wait time before exiting with error while trying to read the nodes information. |
verb |
Logical scalar. If |
... |
Further arguments passed to Slurm_EvalQ via |
cl |
An object of class |
By default, if the time option is not specified via ..., then it is set to the value 01:00:00, that is, 1 hour.
Once a job is submitted via Slurm, the user gets access to the nodes associated with it, which allows users to start new processes within those. By means of this, we can create Socket, also known as "PSOCK", clusters across nodes in a Slurm environment. The names of the hosts are retrieved and passed later on to parallel::makePSOCKcluster.
It has been the case that R fails to create the cluster with the following message in the Slurm log file:
srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive
In such cases, setting the memory, for example, upfront can solve the problem. For example:
cl <- makeSlurmCluster(20, mem = 20)
If the problem persists, i.e., the cluster cannot be created, make sure that your Slurm cluster allows Socket connections between nodes.
The method stopCluster for slurm_cluster stops the cluster doing the following:
Closes the connection by calling the stopCluster method for PSOCK objects.
Cancels the Slurm job using scancel.
An object of class c("slurm_cluster", "SOCKcluster", "cluster"). It is the same as what is returned by parallel::makePSOCKcluster with the main difference that it has two extra attributes:
SLURM_JOBID
Which is the id of the Job that initialized that cluster.
By default, R limits the number of simultaneous connections (see this thread in R-sig-hpc https://stat.ethz.ch/pipermail/r-sig-hpc/2012-May/001373.html). The current maximum is 128 (R version 3.6.1). To modify that limit, you would need to reinstall R updating the macro NCONNECTIONS in the file src/main/connections.c.
For now, if the user sets n above 128, they will get an immediate warning pointing to this issue, in particular, specifying that the cluster object may not be able to be created.
## Not run: 
# Creating a cluster with 100 workers/offspring/child R sessions
cl <- makeSlurmCluster(100)

# Computing the mean of 100 random uniforms within each worker;
# for this we can use any of the functions available in the parallel package.
ans <- parallel::parSapply(cl, 1:200, function(x) mean(runif(100)))

# We simply call stopCluster as we would do with any other cluster
# object
stopCluster(cl)

# We can also specify SBATCH options directly (...)
cl <- makeSlurmCluster(200, partition = "thomas", time = "02:00:00")
stopCluster(cl)
## End(Not run)
This function will create an object of class slurmR_rscript that can be used to write the R component in a batch job.
new_rscript(
  njobs,
  tmp_path,
  job_name,
  pkgs = list_loaded_pkgs(),
  libPaths = .libPaths()
)
njobs |
Integer. Number of jobs to use in the job-array. This specifies the number of R sessions to initialize. This does not specify the number of cores to be used. |
tmp_path |
Character. Path to the directory where all the data (including scripts) will be stored. Notice that this path must be accessible by all the nodes in the network (See opts_slurmR). |
job_name |
Character. Name of the job to be passed to |
pkgs |
A named list with packages to be included. Each element of the list must be a path to the R library, while the names of the list are the names of the R packages to be loaded. |
libPaths |
A character vector. See .libPaths. |
An environment of class slurmR_rscript. This has the following accessible components:
add_rds
Adds rds files to be loaded in each job. x is a named list with the objects that should be loaded in the jobs. If index = TRUE, the function assumes that the user will be accessing a particular subset of x during the job, which is accessed according to INDICES[[ARRAY_ID]]. The option compress is passed to saveRDS.
One important side effect is that when this function is called, the object will be saved in the current job directory, that is, opts_slurmR$get_tmp_path().
append
Adds a line to the R script. Its only argument, x, is a character vector that will be added to the R script.
rscript
A character vector. This is the actual R script that will be written at the end.
finalize
Adds the final line of the R script. This function receives a character scalar x, which is used as the name of the object to be saved. If missing, the function will save a NULL object. The compress argument is passed to saveRDS.
set_seed
Adds a vector of seeds to be used across the jobs. This vector of seeds should be of length njobs. The other two parameters of the function are passed to set.seed. By default, the seed is picked as follows:
seeds <- sample.int(.Machine$integer.max, njobs, replace = FALSE)
write
Finalizes the process by writing the R script in the corresponding folder to be used with Slurm.
sbatch and slurmR internals
Most of the functions in the slurmR package use tmp_path and job-name options to write and submit jobs to Slurm. These options have global defaults that are set and retrieved using opts_slurmR. These options also include SBATCH options and things to do before calling RScript, e.g., loading modules on an HPC cluster.
opts_slurmR
An object of class opts_slurmR of length 17.
Whatever the path specified on tmp_path, all nodes should have access to it. Moreover, it is recommended to use a path located in a high-performing drive. See for example disk staging.
The tmp_path directory is only created at the time that one of the functions needs to read or write files. Job creation calls like Slurm_EvalQ and Slurm_lapply do so.
The "preamble" options can be specified if, for example, the current cluster needs to load R, a compiler, or other programs via a module command.
Current supported options are:
Debugging mode
debug_on : function ()
Activates the debugging mode. When active, jobs will be submitted using sh and not sbatch. Also, only a single chunk of the data will be processed.
debug_off : function ()
Deactivates the debugging mode.
get_debug : function ()
Returns TRUE if debug mode is on.
Verbose mode
verbose_on : function ()
Activates the verbose mode. When ON, sbatch prints the Rscript and batch files on screen so that the user knows what will be submitted to Slurm.
verbose_off : function ()
Deactivates the verbose mode.
get_verbose : function ()
Returns TRUE if verbose mode is on.
Slurm options
set_tmp_path : function (path, recursive = TRUE)
Sets the tempfile path for I/O
get_tmp_path : function ()
Retrieves tempfile path for I/O
set_job_name : function (path, check = TRUE, overwrite = TRUE)
Changes the job-name. When changing the name of the job, the function will check whether the folder chdir/job-name is empty or not. If empty/not created it will create it, otherwise it will delete its contents (if overwrite = TRUE, else it will return with an error).
get_job_name : function (check = TRUE)
Returns the current value of 'job-name'.
set_preamble : function (...)
Sets "preamble" to the RScript call. For example, it could be used for loading modules, setting env variables, etc., needed during the R session. Options are passed as characters.
get_preamble : function ()
Returns the preamble, e.g., module loads, environment variable definitions, etc., that will be included in sbatch submissions.
Other options
get_cmd : function ()
If debug mode is active, then it returns 'sh', otherwise 'sbatch'
For general set/retrieve options
set_opts : function (...)
A generic function to set options.
get_opts_job : function (...)
A generic function to retrieve options for the job (Slurm).
get_opts_r : function (...)
A generic function to retrieve options in R.
Nuke
While reloading the package should reset all the options, if needed, the user can also use the function opts_slurmR$reset().
# Common setup
## Not run: 
opts_slurmR$set_tmp_path("/staging/pdt/vegayon")
opts_slurmR$set_job_name("simulations-1")
opts_slurmR$set_opts(partition = "thomas", account = "lc_pdt")
opts_slurmR$set_preamble("module load gcc") # if needed
## End(Not run)
Utility function
parse_flags(...)

## Default S3 method:
parse_flags(...)

## S3 method for class 'list'
parse_flags(x, ...)
... |
Options to be parsed as bash flags. |
x |
A named list. |
A character vector with the processed flags.
Other utilities: Slurm_clean(), Slurm_env(), Slurm_log(), WhoAmI(), snames(), status()
cat(parse_flags(a=1, b=TRUE, hola=2, y="I have spaces", ms=2, `cpus-per-task`=4))
# -a 1 -b --hola=2 -y "I have spaces" --ms=2 --cpus-per-task=4
Generate a random job name
random_job_name()
A character scalar that can be used as a job name. All names will start with the prefix slurmr-job- and then some random string. This is a wrapper of the function tempfile() and uses opts_slurmR$get_tmp_path() as its tmpdir argument.
random_job_name()
Read a slurm batch file and capture the SBATCH options
read_sbatch(x)
x |
Character scalar. Either the path to the batch file to process, or a character vector. |
A named vector of the options starting with #SBATCH in the file. If no option is found, then it returns a character vector of length 0.
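To make the idea of capturing #SBATCH options concrete, here is a minimal base R sketch. It is not the package's parser: the helper sketch_read_sbatch is hypothetical, the batch script is made up for illustration, and only the long `--key=value` form is handled.

```r
# Hypothetical helper, not part of slurmR: pull "#SBATCH --key=value"
# lines out of a batch script given as a character vector of lines.
sketch_read_sbatch <- function(lines) {
  hits <- grep("^#SBATCH[[:space:]]+--", lines, value = TRUE)
  if (length(hits) == 0L) return(character(0))
  opts <- trimws(sub("^#SBATCH[[:space:]]+--", "", hits))
  keys <- sub("=.*$", "", opts)
  # Flags without "=" get an empty string as their value
  vals <- ifelse(grepl("=", opts, fixed = TRUE), sub("^[^=]*=", "", opts), "")
  stats::setNames(vals, keys)
}

# A made-up batch script, used only for illustration
script <- c(
  "#!/bin/sh",
  "#SBATCH --job-name=my-job",
  "#SBATCH --time=01:00:00",
  "#SBATCH --array=1-4",
  "Rscript my-script.R"
)
parsed <- sketch_read_sbatch(script)
print(parsed)
```

As in the documented behavior, a script with no #SBATCH lines yields a character vector of length 0.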
# Reading in an example script
x <- system.file("example.slurm", package="slurmR")
read_sbatch(x)
The functions sbatch, scancel, squeue, sacct, and slurm.conf are wrappers of calls to Slurm functions via system2.
slurm_available()

squeue(x = NULL, ...)

## Default S3 method:
squeue(x = NULL, ...)

## S3 method for class 'slurm_job'
squeue(x, ...)

scancel(x = NULL, ...)

## Default S3 method:
scancel(x = NULL, ...)

## S3 method for class 'slurm_job'
scancel(x = NULL, ...)

sacct(x, ...)

## Default S3 method:
sacct(x = NULL, brief = TRUE, parsable = TRUE, allocations = TRUE, ...)

## S3 method for class 'slurm_job'
sacct(x, ...)

slurm.conf()

SchedulerParameters()

sacct_(x = NULL, ..., no_sacct = FALSE)

sbatch(x, wait = FALSE, submit = TRUE, ...)

## S3 method for class 'slurm_job'
sbatch(x, wait = FALSE, submit = TRUE, ...)

## S3 method for class 'character'
sbatch(x, wait = FALSE, submit = TRUE, ...)
x |
Either an object of class |
... |
Further flags passed to the command line function. |
brief , parsable , allocations
|
Logical. When |
no_sacct |
Logical. Skip |
wait |
Logical scalar. When |
submit |
Logical, when |
The function slurm_available checks whether Slurm is available in the system or not. It is usually called before calling any bash wrapper. If available, the function will return TRUE, otherwise FALSE.
The wrapper of squeue includes the flag -o%all, which returns all available fields separated by a vertical bar. This cannot be changed since it is the easiest way of processing the information in R.
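The reason a vertical-bar separator is convenient can be seen with a small base R sketch. The text below is invented for illustration and shows only three columns, whereas real output has many more fields. How the package itself parses the output is not shown in this manual, so treat this as one plausible approach.

```r
# Made-up sample resembling pipe-separated squeue output; the three
# columns and their values are illustrative only.
raw <- c(
  "JOBID|NAME|STATE",
  "8123_1|slurmr-job-a|RUNNING",
  "8123_2|slurmr-job-a|PENDING"
)

# Split on the vertical bar and build a data frame of character columns
queue <- utils::read.table(
  text = raw, sep = "|", header = TRUE,
  stringsAsFactors = FALSE, quote = "", comment.char = "",
  colClasses = "character"
)
print(queue)
```

Disabling quote and comment handling keeps job names containing characters such as # or ' from being misread.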
The function slurm.conf is a wrapper of the function scontrol that returns configuration info about Slurm, in particular, the underlying command that is called is scontrol show conf. This returns a named character vector with configuration info about the cluster. The name of this function matches the name of the file that holds this information.
The function SchedulerParameters is just a wrapper of slurm.conf. It processes the field "SchedulerParameters" included in the configuration file and has information relevant for the scheduler.
sacct_ is an alternative that works around cases in which sacct fails because accounting is not enabled. This function is not intended to be called directly.
In the case of sbatch, the function takes an object of class slurm_job and submits it to the queue. In debug mode, the job will be submitted via sh instead.
The method for character scalars is used to submit jobs using a Slurm script.
In the case of sbatch, the return value depends on what x is:
If x is of class slurm_job, then it returns the same object including the Slurm job ID (if the job was submitted to the queue).
If x is a file path (e.g., a bash script), it returns an integer with the job ID number (again, if the job was submitted to Slurm).
The functions squeue and sacct return a data frame with the information returned by the command line utilities. The function scancel returns NULL.
slurm_available() returns a logical scalar equal to TRUE if Slurm is available.
slurm.conf() and SchedulerParameters() return information about the Slurm cluster, if available.
# Are we under a Slurm Cluster?
slurm_available()

## Not run: 
# What is the maximum number of jobs (array size) that the system
# allows?
sconfig <- slurm.conf() # We first retrieve the info.
sconfig["MaxArraySize"]

## End(Not run)

## Not run: 
# Submitting a simple job
job <- Slurm_EvalQ(slurmR::WhoAmI(), njobs = 4L, plan = "submit")

# Checking the status of the job (we can simply print)
job
status(job) # or use the status function
sacct(job)  # or get more info with the sacct wrapper.

# Suppose one of the jobs is taking too long to complete (say #4)
# we can stop it and resubmit the job as follows:
scancel(job)

# Resubmitting only 4
sbatch(job, array = 4) # A new jobid will be assigned

## End(Not run)
The functions of the family Slurm_*apply generate a set of temporary files that are used for the job design, submission, and collection. This function will remove all the contents of the directory created by calling those functions.
Slurm_clean(x)
x |
An object of class |
If the job is finalized, it returns 0 if able to clean the directory; otherwise, it returns whatever unlink returns after trying to remove the job path.
Other post submission: Slurm_collect(), Slurm_log(), status()
Other utilities: Slurm_env(), Slurm_log(), WhoAmI(), parse_flags(), snames(), status()
## Not run: 
job <- Slurm_EvalQ(1 + 1, 2, plan = "collect")

# This will remove all the files generated by Slurm_EvalQ
Slurm_clean(job)
## End(Not run)
This function takes an object of class slurm_job and retrieves the results, that is, combines the R objects generated by each job.
Slurm_collect(...)

## S3 method for class 'slurm_job'
Slurm_collect(x, any. = FALSE, wait = 10L, ...)
... |
Further arguments passed to the method. |
x |
An object of class slurm_job. |
any. |
Logical. When |
wait |
Integer scalar. Number of seconds to wait before checking the
state of a job if the first try returned |
If the given job has hooks, which is a list of functions, these will be applied sequentially to the set of retrieved results before returning.
By default, it returns a concatenated list of the output files generated by each job. If the job object has a hook, it will apply each hook to the full list before returning. See new_slurm_job.
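Applying hooks sequentially can be pictured with a short base R sketch. The results and hooks below are stand-ins made up for illustration, and the sketch assumes each hook takes the full collected result as its only argument. It does not reproduce the package's internal code.

```r
# Stand-in for the collected per-job results (one list element per job)
results <- list(c(1, 2), c(3, 4), c(5, 6))

# Hypothetical hooks: each takes the full result and returns a new one
hooks <- list(
  function(res) unlist(res),  # flatten the list into a vector
  function(res) res * 10      # then rescale
)

# Apply the hooks one after another, feeding each the previous output
for (hook in hooks) {
  results <- hook(results)
}
print(results)
```

Because each hook receives the output of the previous one, the order of the list matters: swapping the two hooks above would fail, since a list cannot be multiplied by 10.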
Other post submission: Slurm_clean(), Slurm_log(), status()
## Not run: 
# Collecting a job after calling it
job <- Slurm_EvalQ(slurmR::WhoAmI(), njobs = 4, plan = "wait")
Slurm_collect(job)

# Collecting a job from a previous R session
job <- read_slurm_job("/path/to/a/job/tmp_dir")
Slurm_collect(job)
## End(Not run)
This function is used within the R script written by slurmR to get the current value of SLURM_ARRAY_TASK_ID, an environment variable that Slurm creates when running an array. In the case that opts_slurmR$get_debug() == TRUE, the function will return a 1 (see opts_slurmR).
Slurm_env(x = "SLURM_ARRAY_TASK_ID")
x |
Character scalar. Environment variable to get. |
If Slurm is available and the R session is running under a job array, meaning that SLURM_ARRAY_TASK_ID is defined, then it returns that value; otherwise, it will return 1.
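The fallback behavior described here can be mimicked with base R's Sys.getenv(). The helper sketch_array_id is hypothetical and only illustrates the "value if defined, otherwise 1" rule. Whether the package returns an integer or a character value is an assumption of this sketch.

```r
# Hypothetical helper mirroring the documented fallback: return the value
# of an environment variable, or 1 when it is not defined.
sketch_array_id <- function(x = "SLURM_ARRAY_TASK_ID") {
  value <- Sys.getenv(x, unset = NA_character_)
  if (is.na(value) || !nzchar(value)) 1L else as.integer(value)
}

Sys.setenv(SLURM_ARRAY_TASK_ID = "3")  # simulate running inside an array task
inside <- sketch_array_id()
Sys.unsetenv("SLURM_ARRAY_TASK_ID")    # simulate a plain R session
outside <- sketch_array_id()
print(c(inside = inside, outside = outside))
```

Returning 1 outside of a job array lets the same R script run unchanged in an ordinary session, where it simply processes the first chunk.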
Other utilities: Slurm_clean(), Slurm_log(), WhoAmI(), parse_flags(), snames(), status()
Submit an expression to be evaluated to multiple jobs.
Slurm_EvalQ(
  expr,
  njobs = 2L,
  job_name = opts_slurmR$get_job_name(),
  tmp_path = opts_slurmR$get_tmp_path(),
  plan = "collect",
  sbatch_opt = list(),
  rscript_opt = list(),
  seeds = NULL,
  compress = TRUE,
  export = NULL,
  export_env = NULL,
  libPaths = .libPaths(),
  hooks = NULL,
  overwrite = TRUE,
  preamble = NULL
)
expr |
An expression to be passed to Slurm. |
njobs |
Integer. Number of jobs to use in the job-array. This specifies the number of R sessions to initialize. This does not specify the number of cores to be used. |
job_name |
Character. Name of the job to be passed to |
tmp_path |
Character. Path to the directory where all the data (including scripts) will be stored. Notice that this path must be accessible by all the nodes in the network (See opts_slurmR). |
plan |
A character scalar. (See the_plan). |
sbatch_opt |
List of options to be passed to |
rscript_opt |
List. Options to be passed to |
seeds |
Integer vector of length |
compress |
Logical scalar (default |
export |
A named list with objects to be included in the Spawned sessions. |
export_env |
An environment. Environment where the objects listed in
|
libPaths |
A character vector. See .libPaths. |
hooks |
A list of functions (passed to new_slurm_job). |
overwrite |
Logical scalar. When |
preamble |
Character vector. Each element is then added to the Slurm
batch file between the |
A list of length njobs.
Utilities to deal with objects of class slurm_job. The function new_slurm_job, which is mostly intended for internal use, creates an object of class slurm_job. The function last_submitted_job returns the last submitted job in the current R session, and the functions read/write_slurm_job are utility functions to read and write R jobs, respectively.
new_slurm_job(
  call,
  rscript,
  bashfile,
  robjects,
  njobs,
  opts_job,
  opts_r,
  hooks = NULL
)

## S3 method for class 'slurm_job'
print(x, ...)

read_slurm_job(path)

write_slurm_job(x, path = NULL)

last_submitted_job()

last_job()
call |
The original call |
rscript , bashfile
|
The R script and bash file path. |
robjects |
A character vector of R objects that will be imported in the job. |
njobs |
Integer. Number of jobs to start (array). |
opts_job , opts_r
|
List. In the case of |
hooks |
List of functions. To be called on the collected results after it finalizes. |
x |
An object of class |
... |
Further arguments passed to the method. |
path |
Character scalar. Path to either a directory with a |
In the case of the function new_slurm_job, besides creating the object of class slurm_job, the function calls write_slurm_job and stores the job object in an rds file. The name and location of the saved rds file is generated using the function snames("job").
The read_slurm_job function can help the user recover a previously saved slurm_job object. If path is a directory, then the function will assume that the file it is looking for lives within that directory and is named job.rds. Otherwise, if it is a file, then it will read it directly. In any case, it will check that the read object is an object of class slurm_job.
The write_slurm_job function simply takes a slurm_job object and saves it. If path is not specified, the object is saved in the job$options$chdir folder under the name job.rds. If a path is specified, then it is directly passed to saveRDS().
The last_submitted_job function will return the latest slurm_job object that was submitted via sbatch in the current session. The last_job function is just an alias of the former. If no job has been submitted, then the resulting value will be NULL.
An environment of class slurm_job. This has the following items:
call
The original call (Slurm_lapply, Slurm_Map, etc.)
rscript
The full path to the R script to be executed by the bash file.
bashfile
The full path to the bash file to be executed by sbatch.
robjects
Ignored.
njobs
The number of jobs to be submitted (job array).
opts_job, opts_r
Two lists of options as returned by opts_slurmR$get_opts_job() and opts_slurmR$get_opts_r() at the moment of the creation of the slurm_job.
hooks
A list of functions to be called on the collected objects by Slurm_collect.
In the case of the function write_slurm_job, it returns the full path to the file.
## Not run: 
# The last_job function can be handy when `plan = "collect"` is used in a call,
# for example
job <- Slurm_lapply(1:1000, function(i) runif(100), njobs = 2, plan = "collect")

# Post collection analysis
status(last_job())
## End(Not run)
After submission, the functions of type Slurm_*apply generate log files, one per each job in the job array. The Slurm_log function can be used to check the log files of jobs in the array that failed.
Slurm_log(x, which. = NULL, cmd = NULL)
x |
An object of class slurm_job. |
which. |
An integer scalar. The number of the array job to check. This
should range between 1 and |
cmd |
Character scalar. The name of the command to use to call view the
log file. Default to |
If a command other than less is used, then the function will check whether it is available by calling cmd --version. If that call returns with an error, it assumes the command is not available. Using the cmd argument only works in interactive mode.
Whatever the command-line call returns.
Other post submission: Slurm_clean(), Slurm_collect(), status()
Other utilities: Slurm_clean(), Slurm_env(), WhoAmI(), parse_flags(), snames(), status()
## Not run: 
x <- Slurm_EvalQ(slurmR::WhoAmI(), plan = "wait")
Slurm_log(x) # Checking the R log
## End(Not run)
*apply family of functions.
The Slurm version of the *apply family of functions.
Slurm_Map(
  f,
  ...,
  njobs = 2L,
  mc.cores = 1L,
  job_name = opts_slurmR$get_job_name(),
  tmp_path = opts_slurmR$get_tmp_path(),
  plan = "collect",
  sbatch_opt = list(),
  rscript_opt = list(),
  seeds = NULL,
  compress = TRUE,
  export = NULL,
  export_env = NULL,
  libPaths = .libPaths(),
  hooks = NULL,
  overwrite = TRUE,
  preamble = NULL
)

Slurm_lapply(
  X,
  FUN,
  ...,
  njobs = 2L,
  mc.cores = 1L,
  job_name = opts_slurmR$get_job_name(),
  tmp_path = opts_slurmR$get_tmp_path(),
  plan = "collect",
  sbatch_opt = list(),
  rscript_opt = list(),
  seeds = NULL,
  compress = TRUE,
  export = NULL,
  export_env = NULL,
  libPaths = .libPaths(),
  hooks = NULL,
  overwrite = TRUE,
  preamble = NULL
)

Slurm_sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
njobs |
Integer. Number of jobs to use in the job-array. This specifies the number of R sessions to initialize. This does not specify the number of cores to be used. |
job_name |
Character. Name of the job to be passed to |
tmp_path |
Character. Path to the directory where all the data (including scripts) will be stored. Notice that this path must be accessible by all the nodes in the network (See opts_slurmR). |
plan |
A character scalar. (See the_plan). |
sbatch_opt |
List of options to be passed to |
rscript_opt |
List. Options to be passed to |
seeds |
Integer vector of length |
compress |
Logical scalar (default |
export |
A named list with objects to be included in the Spawned sessions. |
export_env |
An environment. Environment where the objects listed in
|
libPaths |
A character vector. See .libPaths. |
hooks |
A list of functions (passed to new_slurm_job). |
overwrite |
Logical scalar. When |
preamble |
Character vector. Each element is then added to the Slurm
batch file between the |
X , FUN , f , mc.cores , ...
|
Arguments passed to either parallel::mclapply or parallel::mcMap. |
simplify , USE.NAMES
|
Logical scalar. See sapply. |
The function Slurm_lapply will submit njobs to the queue and distribute X according to parallel::splitIndices. For example, if X is a list with 1,000 elements, and njobs = 2, then Slurm_lapply will submit 2 jobs with 500 elements of X each (2 chunks of data). The same principle applies to Slurm_sapply and Slurm_Map, that is, the data is split by chunks so all the information is sent at once when the job is submitted.
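The chunking in the 1,000-element example can be reproduced directly with parallel::splitIndices, which ships with R. The subsetting step afterwards is only an illustration of how a chunk maps back to X. The variable names are made up, and the package's own bookkeeping may differ.

```r
# parallel ships with R; splitIndices() divides 1:nx into ncl chunks
chunks <- parallel::splitIndices(1000, 2)
print(lengths(chunks))

# Illustration only: each job would then work on its own slice of X
X <- as.list(seq_len(1000))
X_job1 <- X[chunks[[1]]]
X_job2 <- X[chunks[[2]]]
```

With njobs = 2, each chunk holds 500 indices, matching the description of 2 jobs with 500 elements each.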
Just like sapply is to lapply, Slurm_sapply is just a wrapper of Slurm_lapply with an extra argument, simplify. When TRUE, once the job is collected, the function simplify2array is called.
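The effect of the simplification step can be seen locally with base R alone. The list below stands in for a collected result; it is not produced by Slurm.

```r
# What a collected lapply-style result might look like (a plain list)
collected <- lapply(1:3, function(i) i^2)

# simplify2array() turns equal-length list elements into a vector or array
simplified <- simplify2array(collected)
print(simplified)

# This matches what sapply() returns for the same computation
same <- identical(simplified, sapply(1:3, function(i) i^2))
print(same)
```

When the elements have different lengths, simplify2array() leaves the list as it is, so the result of a simplified call is not always a vector or matrix.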
If plan == "collect", then whatever the analogous function returns; otherwise, an object of class slurm_job.
Job Array Support https://slurm.schedmd.com/job_array.html
For resubmitting a job, see the example in sbatch.
## Not run: 
# A job drawing 1e6 uniforms on 10 jobs (array)
# The option plan = "wait" makes it return only once the job is completed.
job1 <- Slurm_lapply(1:20, function(i) runif(1e6), njobs=10, plan = "wait")

# To collect
ans <- Slurm_collect(job1)

# As before, but this time not waiting, and now we are passing more
# arguments to the function
# plan = "none" only creates the job object (and the files), we submit
# later
job1 <- Slurm_lapply(1:20, function(i, a) runif(1e6, a), a = -1,
   njobs=10, plan = "none")

# We submit
job1 <- sbatch(job1)

# In order to cancel a job
scancel(job1)

# How to clean up
Slurm_clean(job1)

## End(Not run)
'Slurm', Simple Linux Utility for Resource Management https://slurm.schedmd.com/, is a popular 'Linux' based software used to schedule jobs in 'HPC' (High Performance Computing) clusters. This R package provides a specialized lightweight wrapper of 'Slurm' with a syntax similar to that found in the 'parallel' R package. The package also includes a method for creating socket cluster objects spanning multiple nodes that can be used with the 'parallel' package.
To cite slurmR in publications use:
Vega Yon et al., (2019). slurmR: A lightweight wrapper for HPC with Slurm. Journal of Open Source Software, 4(39), 1493, https://doi.org/10.21105/joss.01493
A BibTeX entry for LaTeX users is
@Article{,
  title = {slurmR: A lightweight wrapper for HPC with Slurm},
  author = {George {Vega Yon} and Paul Marjoram},
  journal = {The Journal of Open Source Software},
  year = {2019},
  month = {jul},
  volume = {4},
  number = {39},
  doi = {10.21105/joss.01493},
  url = {https://doi.org/10.21105/joss.01493},
}
Helper functions to use slurmR's docker image. This requires having an internet connection and docker installed in your system.
docker_available(path = "")

slurmr_docker_pull(path = "")

slurmr_docker_run(path = "", pull = TRUE, timeout = 60)

slurmr_docker_stop(UUID = "", path = "")
path |
Path to the |
pull |
Logical scalar. When |
timeout |
Integer. Number of seconds to wait for docker to start the slurmR image. |
UUID |
String. Universally Unique Identifier. |
Starting version 0.5-0, a Docker image with Slurm, R, and slurmR is available at https://hub.docker.com/r/uscbiostats/slurmr. The source code (Dockerfile) is available in the project GitHub repository: https://github.com/USCbiostats/slurmR.
# This example requires having Docker installed in the system
## Not run: 
# Start the docker image. By default it will try to pull the
# image from Docker Hub if available
# This opens a bash session with R + Slurm + slurmR
slurmr_docker_run()

# Will pull the docker image
slurmr_docker_pull()

## End(Not run)
Using opts_slurmR$get_tmp_path and opts_slurmR$get_job_name, this function creates file names with the full path to the objects. It is intended for internal use only.
snames(type, array_id = NULL, tmp_path = NULL, job_name = NULL)
type |
can be any of r, sh, out, or rds. |
array_id |
Integer. ID of the array to create the name. |
tmp_path |
Character scalar. Path to the temp directory used by the job to write files. |
job_name |
Character scalar. Name of the job. |
By default, the parameters tmp_path and job_name are retrieved from the current options specified in opts_slurmR.
A character scalar. The normalized path to the corresponding file.
Other utilities: Slurm_clean(), Slurm_env(), Slurm_log(), WhoAmI(), parse_flags(), status()
This function sources R scripts using Slurm by creating a batch script file and submitting it via sbatch.
sourceSlurm(
  file,
  job_name = NULL,
  tmp_path = opts_slurmR$get_tmp_path(),
  rscript_opt = list(vanilla = TRUE),
  plan = "submit",
  ...
)

slurmr_cmd(
  cmd_path,
  cmd_name = "slurmr",
  add_alias = TRUE,
  bashrc_path = "~/.bashrc"
)
file |
Character. Path to the R script to source using Slurm. |
job_name |
Character. Name of the job to be passed to |
tmp_path |
Character. Path to the directory where all the data (including scripts) will be stored. Notice that this path must be accessible by all the nodes in the network (See opts_slurmR). |
rscript_opt |
List. Options to be passed to |
plan |
A character scalar. (See the_plan). |
... |
Further options passed to sbatch. |
cmd_path |
Character scalar. Path (directory) where to put the command function. This is usually your home directory. |
cmd_name |
Character scalar. Name of the command (of the file). |
add_alias , bashrc_path
|
Logical scalar and character scalar. When
|
sourceSlurm checks for flags that may be included in the Slurm job file. If the R script starts with #!/bin/ or similar, then #SBATCH flags will be read from the R script and added to the Slurm job file.
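The idea of reading #SBATCH flags from a script that begins with a shebang line can be illustrated with base R alone. This is a rough sketch under stated assumptions and not the parser slurmR uses; the script contents and the variable names (script, has_shebang, flags, opts) are hypothetical.

```r
# Contents of a hypothetical R script that starts with a shebang line
script <- c(
  "#!/bin/sh",
  "#SBATCH --time=01:00:00",
  "#SBATCH --mem-per-cpu=4G",
  "Sys.sleep(10)"
)

# Only look for flags when the first line looks like a shebang
has_shebang <- grepl("^#!/bin/", script[1])
flags <- if (has_shebang) {
  grep("^#SBATCH", script, value = TRUE)
} else {
  character(0)
}

# Strip the "#SBATCH" prefix to obtain the bare options
opts <- sub("^#SBATCH[[:space:]]+", "", flags)
opts
```

The resulting options are the pieces that would be carried over to the job file.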
The function slurmr_cmd writes a simple command that works as a wrapper of sourceSlurm. In particular, to source an R script with sourceSlurm from the command line, the user can either run:
$ Rscript -e "slurmR::sourceSlurm('path/to/the/script.R', plan = 'submit')"
Or, after calling slurmr_cmd from within R, do the following instead:
$ ./slurmr path/to/the/script.R
And, if you used the option add_alias = TRUE, then, after restarting bash, you can run R scripts with Slurm as follows:
$ slurmr path/to/the/script.R
The main side effect of this function is that it creates a file named cmd_name in the directory specified by cmd_path and, if add_alias = TRUE, it will create (if not found) or modify (if found) the .bashrc file, adding a line with an alias. For more information on .bashrc see here.
In the case of sourceSlurm, whatever sbatch returns. The function slurmr_cmd returns invisible().
# In this example we will be sourcing an R script that also has #SBATCH
# flags. Here are the contents
file <- system.file("example.R", package="slurmR")

cat(readLines(file), sep="\n")
# #!/bin/sh
# #SBATCH --account=lc_ggv
# #SBATCH --time=01:00:00
# #SBATCH --mem-per-cpu=4G
# #SBATCH --job-name=Waiting
# Sys.sleep(10)
# message("done.")

# We can directly submit this R script as a job by calling `sourceSlurm`.
# (of course you need Slurm to do this!)
## Not run: 
sourceSlurm(file)

## End(Not run)

# The function will create a bash script that is used later to be submitted to
# the queue using `sbatch`. The resulting file looks something like this
# #!/bin/sh
# #SBATCH --job-name=Waiting
# #SBATCH --output=/home/vegayon/Documents/slurmR/Waiting.out
# #SBATCH --account=lc_ggv
# #SBATCH --time=01:00:00
# #SBATCH --mem-per-cpu=4G
# /usr/lib/R/bin/Rscript --vanilla /usr/local/lib/R/site-library/slurmR/example.R
Using the sacct function, it checks the status of a particular job and returns information about its current state, with details regarding the jobs (if an array) that are done, running, pending, or failed.
status(x)

## S3 method for class 'slurm_job'
status(x)

## Default S3 method:
status(x)

## S3 method for class 'slurm_status'
x$name
x |
Either a Job id, an object of class |
name |
Character scalar. List of status to retrieve. This can be any of
|
An integer with attributes of class slurm_status. The attributes are integer vectors indicating which jobs fall in the categories of done, failed, pending, and running (see JOB_STATE_CODES). Possible return values are:
-1: No job found. This may be a false negative, as the job may still be on its way to be submitted.
0: Job completed.
1: All jobs are pending resource allocation or are on their way to start.
2: All jobs are currently running.
3: One or more jobs are still running.
99: One or more jobs failed.
If the job is not an array, then the function will return the corresponding code, but the attributes will only have a single number, 1, according to the state of the job (completed, failed, pending).
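When acting on the returned code programmatically, it can help to translate it into a readable message. The helper below, describe_status, is hypothetical and not part of slurmR; it is a small sketch that only encodes the return values listed above.

```r
# Hypothetical helper (not part of slurmR): map a status code to its meaning
describe_status <- function(code) {
  switch(
    as.character(code),
    "-1" = "No job found",
    "0"  = "Job completed",
    "1"  = "All jobs are pending",
    "2"  = "All jobs are currently running",
    "3"  = "One or more jobs are still running",
    "99" = "One or more jobs failed",
    stop("Unknown status code: ", code)
  )
}

describe_status(0)
describe_status(99)
```

In practice, the integer returned by status() would be passed in place of the literal codes used here.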
Other utilities: Slurm_clean(), Slurm_env(), Slurm_log(), WhoAmI(), parse_flags(), snames()
Other post submission: Slurm_clean(), Slurm_collect(), Slurm_log()
## Not run: 
x <- Slurm_EvalQ(Sys.sleep(100), njobs = 2)

status(x) # A possible result: An integer with attributes
# Status: All jobs are pending resource allocation or are on it's way to start. (Code 1)
# This is a job array. The status of each job, by array id, is the following:
# done      :  -
# failed    :  -
# pending   : 1, 2.
# running   :  -

## End(Not run)
slurm_job wrapper
Users can choose whether to submit the job or not, whether to wait for it, and whether to collect the results right away after the job has finished. This function helps developers figure out what set of actions needs to be taken depending on the plan.
the_plan(plan)
plan |
A character scalar with either of the following values:
|
This is a helper function that returns a list with three logical values: wait, collect, and submit. There are four possible cases:
plan == "collect": all three are TRUE.
plan == "wait": all but collect are TRUE.
plan == "submit": only submit equals TRUE.
plan == "none": all three are FALSE.
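The four cases can be written down as a small lookup. The function plan_flags below is a hypothetical re-implementation for illustration only and is not slurmR's source; it simply encodes the mapping described above.

```r
# Hypothetical re-implementation for illustration (not slurmR's source)
plan_flags <- function(plan) {
  switch(
    plan,
    collect = list(collect = TRUE,  wait = TRUE,  submit = TRUE),
    wait    = list(collect = FALSE, wait = TRUE,  submit = TRUE),
    submit  = list(collect = FALSE, wait = FALSE, submit = TRUE),
    none    = list(collect = FALSE, wait = FALSE, submit = FALSE),
    stop("Unknown plan: ", plan)
  )
}

str(plan_flags("wait"))
```

A caller can then branch on the individual flags, for example submitting only when the submit element is TRUE.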
In general, both wait and submit will be passed to sbatch. When collect == TRUE, it usually means that the function will call Slurm_collect right after submitting the job via sbatch.
A list with three logical scalars.
This is used in apply functions and in Slurm_EvalQ.
the_plan("none")
# $collect
# [1] FALSE
#
# $wait
# [1] FALSE
#
# $submit
# [1] FALSE

the_plan("wait")
# $collect
# [1] FALSE
#
# $wait
# [1] TRUE
#
# $submit
# [1] TRUE
Wait for a Slurm job to be completed
wait_slurm(x, ...)

## S3 method for class 'slurm_job'
wait_slurm(x, ...)

## S3 method for class 'integer'
wait_slurm(x, timeout = -1, freq = 0.1, force = TRUE, ...)
x |
Either a job id number, or an object of class slurm_job. |
... |
Further arguments passed to the method |
timeout |
Integer. Maximum wait time in seconds. If |
freq |
Frequency in seconds to query for the state of the job. |
force |
Logical scalar. When |
Invisible NULL.
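The roles of timeout and freq can be pictured as a polling loop. The sketch below is a generic illustration and not the implementation of wait_slurm: the helper poll_until and its is_done argument are hypothetical, treating a non-positive timeout as "no limit" is an assumption, and, unlike wait_slurm, the helper returns a logical so that the outcome is easy to inspect.

```r
# Hypothetical polling helper; `is_done` stands in for a real job-state query
poll_until <- function(is_done, timeout = -1, freq = 0.1) {
  start <- Sys.time()
  repeat {
    if (is_done()) return(invisible(TRUE))
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    # Assumption: a non-positive timeout means "wait without a time limit"
    if (timeout > 0 && elapsed >= timeout) {
      warning("Timeout reached before the job finished.")
      return(invisible(FALSE))
    }
    Sys.sleep(freq)  # wait `freq` seconds before querying again
  }
}

# A fake job that reports completion on the third query
calls <- 0
done <- poll_until(function() { calls <<- calls + 1; calls >= 3 },
                   timeout = 5, freq = 0.01)
done
```

A smaller freq queries the state more often, at the cost of more calls to the scheduler.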
# Waiting is only available if there are Slurm clusters
if (slurm_available()) {
  job <- Slurm_EvalQ(Sys.sleep(1000), plan = "submit", njobs = 2)
  wait_slurm(job, timeout = 1) # This will return a warning
  scancel(job)
  Slurm_clean(job)
}
This returns a named vector with the following variables: SLURM_LOCALID, SLURMD_NODENAME, SLURM_ARRAY_TASK_ID, SLURM_CLUSTER_NAME, SLURM_JOB_PARTITION, SLURM_TASK_PID
WhoAmI()

whoami()
whoami is just an alias of WhoAmI.
A character vector with the corresponding system environment variables' values.
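A comparable named character vector can be obtained with base R's Sys.getenv. This is a sketch of the idea rather than the function's actual source; the names vars and env_values are illustrative, and outside of a Slurm job the values will simply be empty strings.

```r
# Environment variables listed in the description above
vars <- c(
  "SLURM_LOCALID", "SLURMD_NODENAME", "SLURM_ARRAY_TASK_ID",
  "SLURM_CLUSTER_NAME", "SLURM_JOB_PARTITION", "SLURM_TASK_PID"
)

# Sys.getenv() returns a named character vector when given several names
env_values <- Sys.getenv(vars)
env_values
```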
Other utilities: Slurm_clean(), Slurm_env(), Slurm_log(), parse_flags(), snames(), status()