LRMS shell-backends overview for developers

CONFIG variables used in the LRMS shell-backends:

lrms_common.sh:

$CONFIG_runtimedir                    [arex]
$CONFIG_shared_scratch                [arex]
$CONFIG_shared_filesystem             [arex]
$CONFIG_scratchdir                    [arex]
$CONFIG_gnu_time                      [lrms]
$CONFIG_nodename                      [lrms]
$CONFIG_enable_perflog_reporting      [common]
$CONFIG_perflogdir                    [common]
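
For orientation, a minimal sketch of how a backend script can consume these variables once parse_arc_conf() has populated them (the CONFIG_* names are the ones listed above; the decision logic and the scratch/gnu_time variable names are illustrative assumptions, not the actual lrms_common.sh code):

  # CONFIG_* variables become available after parse_arc_conf()
  if [ "$CONFIG_shared_filesystem" = "no" ] && [ -n "$CONFIG_scratchdir" ]; then
      # no shared filesystem: run the job from node-local scratch
      scratch="$CONFIG_scratchdir"
  fi
  # wrap the payload with GNU time (when configured) to collect accounting data
  gnu_time="${CONFIG_gnu_time:-/usr/bin/time}"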

submit_common.sh:

$CONFIG_defaultmemory                 [queue] [lrms]
$CONFIG_hostname                      [common]
$CONFIG_controldir                    [arex]

lrms=boinc:

$CONFIG_boinc_app_id   [lrms]
$CONFIG_boinc_db_host  [lrms]
$CONFIG_boinc_db_port  [lrms]
$CONFIG_boinc_db_user  [lrms]
$CONFIG_boinc_db_pass  [lrms]
$CONFIG_boinc_db_name  [lrms]
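
All six options describe the connection to the BOINC project database; a hedged sketch of how they might be combined (boinc_db_query is a hypothetical helper name, and the actual backend may access the database differently):

  # hypothetical helper: run an SQL statement against the BOINC project DB
  boinc_db_query () {
      mysql --host="$CONFIG_boinc_db_host" --port="$CONFIG_boinc_db_port" \
            --user="$CONFIG_boinc_db_user" --password="$CONFIG_boinc_db_pass" \
            -e "$1" "$CONFIG_boinc_db_name"
  }
  # e.g. count results belonging to the configured application
  boinc_db_query "SELECT COUNT(*) FROM result WHERE appid=$CONFIG_boinc_app_id;"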

lrms=condor [3]:

# $CONFIG_enable_perflog_reporting    [common] not in reference
# $CONFIG_perflogdir                  [common] not in reference
# $CONFIG_controldir                  [arex] (for perflog)

$CONFIG_condor_requirements           [queue] [lrms]
$CONFIG_condor_rank                   [lrms]
# $CONFIG_shared_filesystem           [arex]
$CONFIG_condor_bin_path               [lrms]
$CONFIG_condor_config                 [lrms]
[3] Here and in the following, the # prefix marks options that are used in the *_common scripts and are therefore not unique to a particular backend.
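
For context, condor_requirements and condor_rank presumably end up as requirements/rank attributes in the generated Condor submit description (standard Condor submit-file syntax); a hedged sketch, where $descfile is a hypothetical name for the generated file:

  # append matchmaking attributes to the generated submit description file
  [ -n "$CONFIG_condor_requirements" ] && \
      echo "requirements = $CONFIG_condor_requirements" >> "$descfile"
  [ -n "$CONFIG_condor_rank" ] && \
      echo "rank = $CONFIG_condor_rank" >> "$descfile"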

lrms=fork:

no variables

lrms=ll:

# $CONFIG_enable_perflog_reporting    [common] not in reference
# $CONFIG_perflogdir                  [common] not in reference
# $CONFIG_controldir                  [arex] (for perflog)

$CONFIG_ll_bin_path                   [lrms]
$CONFIG_ll_consumable_resources       [lrms]
$CONFIG_ll_parallel_single_jobs       *not in reference
# $CONFIG_scratchdir                  [arex]

lrms=lsf:

# $CONFIG_enable_perflog_reporting    [common] not in reference
# $CONFIG_perflogdir                  [common] not in reference
# $CONFIG_controldir                  [arex] (for perflog)

$CONFIG_lsf_architecture              [lrms]
$CONFIG_lsf_bin_path                  [lrms]

lrms=pbs:

# $CONFIG_enable_perflog_reporting    [common] not in reference
# $CONFIG_perflogdir                  [common] not in reference
# $CONFIG_controldir                  [arex] (for perflog)

$CONFIG_pbs_queue_node                [queue]
$CONFIG_pbs_bin_path                  [lrms]
$CONFIG_nodememory                    [queue] ([infosys/cluster] parser substitution fallback only)
$CONFIG_pbs_log_path                  [lrms]
# $CONFIG_shared_filesystem           [arex]

lrms=sge:

# $CONFIG_enable_perflog_reporting    [common] not in reference
# $CONFIG_perflogdir                  [common] not in reference
# $CONFIG_controldir                  [arex] (for perflog)

$CONFIG_sge_root                      [lrms]
$CONFIG_sge_cell                      [lrms]
$CONFIG_sge_qmaster_port              [lrms]
$CONFIG_sge_execd_port                [lrms]
$CONFIG_sge_bin_path                  [lrms]
$CONFIG_sge_jobopts                   [queue] [lrms]
# $CONFIG_scratchdir                  [arex]
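
The sge_* options map onto the standard Grid Engine client environment; a minimal sketch of the setup a backend needs before calling qsub/qstat (the SGE_* variable names are standard Grid Engine; the exact backend code may differ):

  # standard Grid Engine client environment, derived from the arc.conf options
  export SGE_ROOT="$CONFIG_sge_root"
  export SGE_CELL="$CONFIG_sge_cell"
  [ -n "$CONFIG_sge_qmaster_port" ] && export SGE_QMASTER_PORT="$CONFIG_sge_qmaster_port"
  [ -n "$CONFIG_sge_execd_port" ] && export SGE_EXECD_PORT="$CONFIG_sge_execd_port"
  PATH="$CONFIG_sge_bin_path:$PATH"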

lrms=slurm:

# $CONFIG_enable_perflog_reporting    [common] not in reference
# $CONFIG_perflogdir                  [common] not in reference
# $CONFIG_controldir                  [arex] (for perflog)

$CONFIG_slurm_wakeupperiod            [lrms]
$CONFIG_slurm_use_sacct               [lrms]
$CONFIG_slurm_bin_path                [lrms]
# $CONFIG_shared_filesystem           [arex]
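
slurm_use_sacct selects where the scan takes job states from; a hedged sketch of the pattern (the sacct/squeue options shown are standard SLURM, $jobid is a placeholder, and the real scan script uses different format strings and batching):

  # query one job's state from accounting (sacct) or from the queue (squeue)
  if [ "$CONFIG_slurm_use_sacct" = "yes" ]; then
      state=$("${CONFIG_slurm_bin_path:-/usr/bin}/sacct" -j "$jobid" -n -o State | head -n 1)
  else
      state=$("${CONFIG_slurm_bin_path:-/usr/bin}/squeue" -j "$jobid" -h -o '%T')
  fi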

Call graphs

Submitting jobs

digraph {
  subgraph cluster_0 {
    node [style=filled, shape=Rectangle];
    label = "submit_LRMS_job.sh";
    "define joboption_lrms" -> "source lrms_common.sh" -> "source submit_common.sh";
    "source submit_common.sh" -> "common_init" -> lslogic;
    lslogic [ label="LRMS-specific submit" ];
  }
  subgraph cluster_1 {
    label = "submit_common.sh";
    style = "dashed";
    node [style=filled];
    "common_init()";
    aux1 [ label="RTEs()" ];
    aux2 [ label="Moving files()" ];
    aux3 [ label="I/O redirection()" ];
    aux4 [ label="Defining user ENV()" ];
    aux5 [ label="Memory requirements()" ];
    aux1 -> lslogic; aux2 -> lslogic; aux3 -> lslogic; aux4 -> lslogic; aux5 -> lslogic;
    # rank hack
    aux1 -> aux2 -> aux3 -> aux4 -> aux5 [style=invis];
  }
  subgraph cluster_2 {
    label = "lrms_common.sh";
    style = "dashed";
    node [style=filled];
    "packaging paths" -> "source lrms_common.sh";
    "parse_arc_conf()" -> "common_init()";
    "parse_grami()" -> "common_init()";
    "init_lrms_env()" -> "common_init()";
    "packaging paths" [shape=Rectangle]
  }
  subgraph cluster_3 {
    label = "configure_LRMS_env.sh";
    node [style=filled, shape=Rectangle];
    "set LRMS-specific ENV/functions" -> "common_init()";
  }
  "a-rex" -> "define joboption_lrms";
  "common_init()" -> "common_init";
  "arc.conf" -> "parse_arc_conf()";
  "grami file" -> "parse_grami()";
  # rank hack
  "packaging paths" -> "set LRMS-specific ENV/functions" [style=invis];
  "a-rex" [shape=Mdiamond];
  "grami file" [shape=Msquare];
  "arc.conf" [shape=Msquare];
  lslogic -> "LRMS";
  "LRMS" [shape=Mdiamond];
}
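
In shell terms, every submit wrapper in the graph above follows the same skeleton; a simplified sketch with slurm as an example (pkgdatadir stands in for the "packaging paths" node; the final submit command is backend-specific):

  #!/bin/sh
  # simplified skeleton of a submit_<lrms>_job.sh wrapper (slurm as an example)
  joboption_lrms="slurm"
  # "packaging paths": pkgdatadir is substituted at build time
  . "${pkgdatadir}/lrms_common.sh"
  . "${pkgdatadir}/submit_common.sh"
  # parse arc.conf and the grami file, set up the LRMS-specific environment
  common_init
  # ... LRMS-specific submit: generate the job script and hand it to the LRMS ...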

Scanning jobs

digraph {
  subgraph cluster_0 {
    node [style=filled, shape=Rectangle];
    label = "scan_LRMS_job.sh";
    lslogic [ label="LRMS-specific scan" ];
    "define joboption_lrms" -> "source lrms_common.sh" -> "source scan_common.sh";
    "source scan_common.sh" -> "common_init" -> lslogic;
  }
  subgraph cluster_1 {
    label = "scan_common.sh";
    style = "dashed";
    node [style=filled];
    "common_init()";
    aux1 [ label="Timestamp conversion()" ];
    aux2 [ label="Owner UID()" ];
    aux3 [ label="Read/Write diag()" ];
    aux4 [ label="Save commentfile()" ];
    aux1 -> lslogic; aux2 -> lslogic; aux3 -> lslogic; aux4 -> lslogic;
    # rank hack
    "common_init()" -> aux1 -> aux2 -> aux3 -> aux4 [style=invis];
  }
  subgraph cluster_2 {
    label = "lrms_common.sh";
    style = "dashed";
    node [style=filled];
    "packaging paths" -> "source lrms_common.sh";
    "parse_arc_conf()" -> "common_init()";
    "init_lrms_env()" -> "common_init()";
    "parse_grami()";
    "packaging paths" [shape=Rectangle]
  }
  subgraph cluster_3 {
    label = "configure_LRMS_env.sh";
    node [style=filled, shape=Rectangle];
    "set LRMS-specific ENV/functions" -> "common_init()";
  }
  "a-rex" -> "define joboption_lrms";
  "common_init()" -> "common_init";
  "arc.conf" -> "parse_arc_conf()";
  "controldir" -> lslogic;
  lslogic -> "LRMS";
  # rank hack
  "source lrms_common.sh" -> "set LRMS-specific ENV/functions" [style=invis];
  "a-rex" [shape=Mdiamond];
  "controldir" [shape=Msquare];
  "arc.conf" [shape=Msquare];
  "LRMS" [shape=Mdiamond];
}

Canceling jobs

digraph {
  subgraph cluster_0 {
    node [style=filled, shape=Rectangle];
    label = "cancel_LRMS_job.sh";
    lslogic [ label="LRMS-specific cancel" ];
    "define joboption_lrms" -> "source lrms_common.sh" -> "source cancel_common.sh";
    "source cancel_common.sh" -> "common_init" -> lslogic;
  }
  subgraph cluster_1 {
    label = "cancel_common.sh";
    style = "dashed";
    node [style=filled];
    "common_init()";
  }
  subgraph cluster_2 {
    label = "lrms_common.sh";
    style = "dashed";
    node [style=filled];
    "packaging paths" -> "source lrms_common.sh";
    "parse_arc_conf()" -> "common_init()";
    "init_lrms_env()"
    "parse_grami()" -> "common_init()";
    "packaging paths" [shape=Rectangle]
  }
  subgraph cluster_3 {
    label = "configure_LRMS_env.sh";
    node [style=filled, shape=Rectangle];
    "set LRMS-specific ENV/functions" -> "common_init()";
  }
  "a-rex" -> "define joboption_lrms";
  "common_init()" -> "common_init";
  "arc.conf" -> "parse_arc_conf()";
  "grami file" -> "parse_grami()";
  lslogic -> "LRMS";
  # rank hack
  "source lrms_common.sh" -> "set LRMS-specific ENV/functions" [style=invis];
  "a-rex" [shape=Mdiamond];
  "grami file" [shape=Msquare];
  "arc.conf" [shape=Msquare];
  "LRMS" [shape=Mdiamond];
}

Changes in ARC6 memory limits processing:

Current logic of memory limits processing:

  • nodememory - memory advertised for matchmaking: the maximum memory available on the nodes (set in the [infosys/cluster] block or per-queue)
  • defaultmemory - memory limit enforced at submission time if the job description does not specify one (set in the [lrms] block or per-queue)

The ARC6 logic is: no enforcement = no limit [1]

[1] The ARC5 logic was: no enforcement = max node memory, or 1GB if nodememory is not published (and not used for matchmaking)
Backend behaviour when no memory enforcement limit is set:
  • boinc - set to a hardcoded 2GB
  • condor - no enforcement
  • fork - no memory handling at all
  • ll - no enforcement
  • lsf - no enforcement
  • pbs - no enforcement [2]
  • sge - no enforcement
  • slurm - no enforcement
[2] exclusivenode is memory-based, and the nodememory value is used in this case
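
A minimal sketch of the ARC6 rule described above (joboption_memory and CONFIG_defaultmemory are the real variable names; the condition structure is an illustration, not verbatim submit_common.sh code):

  # enforce defaultmemory only when the job description carries no memory limit
  if [ -z "$joboption_memory" ]; then
      if [ -n "$CONFIG_defaultmemory" ]; then
          joboption_memory="$CONFIG_defaultmemory"   # enforce the configured default
      fi
      # ARC6: if defaultmemory is also unset, no limit is passed to the LRMS
  fi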