CandyPond technical description

This page describes technical details of CandyPond, which stands for “Cache and deliver your pilot on-demand data”.

Description and Purpose

The ARC caching system automatically saves job input files to local disk so that future jobs can reuse them. By itself, the cache is completely internal to the computing element and cannot be accessed or manipulated from the outside. CandyPond exposes selected cache operations to the outside, which is useful in a pilot job model where the input data for a job is not known until the job is running on the worker node. When the pilot picks its payload, it can contact CandyPond to gain access to a file that is already cached or, if the file is not cached, ask CandyPond to download it to the cache.

Installation and Configuration

CandyPond is an integral part of A-REX and is available as part of the nordugrid-arc-arex package.

It is enabled in A-REX by adding the block [arex/ws/candypond] to arc.conf. The [arex], [arex/ws] and [arex/data-staging] blocks are also required.
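As a minimal sketch of the block layout described above, the relevant part of arc.conf could look like the fragment below. Only the block names come from this page; options inside each block are omitted, and a working site will have further blocks and site-specific settings that are not shown here.

```ini
# Blocks required for CandyPond (options omitted in this sketch)
[arex]

[arex/data-staging]

[arex/ws]

# Adding this block enables CandyPond
[arex/ws/candypond]
```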

Runtime Environment Configuration

The runtime environment ENV/CANDYPOND provides a convenient Python module, arccandypond, to the job running on the worker node. The module can be used either as a command line interface or as a Python API.

Note that the ENV/PROXY runtime environment is also needed in order to have access to the proxy on the worker node.

ENV/CANDYPOND automatically detects the correct URL of the CandyPond service. To use a different URL, set it with:

arcctl rte params-set ENV/CANDYPOND CANDYPOND_URL <url>

Command Line Interface

arccandypond get <url> <file> can be used in place of whatever command the job would normally use to download input data. This command asks CandyPond to download the URL to the cache if it is not already present, and then to link the cached file to <file> in the job’s working directory.

arccandypond check <url> can be used to check whether the given URL already exists in the cache. It exits with 0 if the file is present, 1 if it is not, or 2 if an error occurred.
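The exit codes of the check command can be handled in a job script along the following lines. This is only a sketch: the function name describe_check_status and the messages are made up for illustration, and the example URL is the one used later on this page. The call to arccandypond is guarded so that the script does nothing on a host where the command is not available.

```shell
#!/bin/sh
# Map the documented exit codes of "arccandypond check" to a short message.
# describe_check_status is a hypothetical helper, not part of arccandypond.
describe_check_status() {
  case "$1" in
    0) echo "cached" ;;
    1) echo "not cached" ;;
    2) echo "error" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}

url="http://www.nordugrid.org:80/data/run.sh"

# Only call the command where the ENV/CANDYPOND runtime environment provides it.
if command -v arccandypond >/dev/null 2>&1; then
  status=0
  arccandypond check "$url" || status=$?
  describe_check_status "$status"
fi
```

A pilot could use the result to decide, for example, whether to prefer a payload whose input is already cached.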

Python API

The job can import the arccandypond module and use the CacheLink and CacheCheck methods to perform the equivalent of the get and check commands.

Example Use Case

In this example, a submitted job uses arccandypond to download input data to the cache and make it available in the job’s working directory.

The xRSL file defines the required runtime environments. Note that no input files are specified.

$ cat candypond.xrsl
&
("executable" = "candypond.sh")
("runtimeenvironment" = "ENV/CANDYPOND")
("runtimeenvironment" = "ENV/PROXY")
("jobname" = "candypond_test" )
("walltime" = "3600" )
("cputime" = "3600" )
("stderr" = "stderr")
("stdout" = "stdout")
("gmlog" = "gmlog")
("outputfiles" =
   ("stdout" "")
   ("stderr" "")
   ("gmlog" "")
)

The job script uses arccandypond to download the input file to the cache and link it into the job’s working directory:

$ cat candypond.sh
#!/bin/sh
arccandypond get http://www.nordugrid.org:80/data/run.sh run.sh
ls -lrt
echo
cat run.sh

Submit the job:

$ arcsub candypond.xrsl
Job submitted with jobid: https://...

Check the output:

$ arccat https://...
{'http://www.nordugrid.org:80/data/run.sh': ('0', 'Success')}
total 28
-rwx------ 1 dcameron dcameron   257 Apr 10 20:23 candypond.sh
drwx------ 4 dcameron dcameron  4096 Apr 10 20:23 arc
-rw------- 1 dcameron dcameron 10662 Apr 10 20:23 user.proxy
-rw------- 1 dcameron dcameron     0 Apr 10 20:23 stderr
lrwxrwxrwx 1 dcameron dcameron    89 Apr 10 20:23 run.sh -> /opt/var/arc/cache/joblinks/eOGODmV3WYunPSAtDmVmuSEmABFKDmABFKDmJSFKDmGBFKDm5lgA5m/run.sh
-rw------- 1 dcameron dcameron    62 Apr 10 20:23 stdout

#!/bin/sh

GCC=`which g++ 2>/dev/null`
echo $GCC
if [ -z $GCC ]; then
  echo "Could not find the g++-compiler!"
  exit 0
fi

make
chmod 755 prime
./prime $1

Note that run.sh in the directory listing is a symbolic link into the cache.

Issues and Notes

  • Calls to arccandypond get may block for a long time if the file needs to be downloaded to the cache and either A-REX is already busy with data staging or the file is very large. A timeout option will be added in the future.
  • CandyPond (like the A-REX web service interface it is part of) does not accept legacy proxies. This type of proxy is created by default by older versions of grid-proxy-init and voms-proxy-init, but an RFC-compliant proxy can be generated using the -rfc option.
  • CandyPond links files to the session directory. If a scratch directory is used for executing the job, the cache files are moved there from the session directory. This requires that the scratch directory is accessible from the CandyPond host, so CandyPond cannot be used in situations where the scratch directory can only be accessed by the underlying LRMS.
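Until a timeout option exists, a job script can bound the waiting time of arccandypond get itself. The sketch below assumes that the GNU coreutils timeout command is available on the worker node; run_with_limit is a hypothetical helper name and the 600 second limit is arbitrary. It only stops the client side from waiting; what happens to a download already in progress on the A-REX side is not covered here.

```shell
#!/bin/sh
# run_with_limit SECONDS COMMAND [ARGS...]
# Runs COMMAND under GNU coreutils "timeout". An exit status of 124
# means the time limit was reached and the command was terminated.
run_with_limit() {
  limit="$1"
  shift
  timeout "$limit" "$@"
}

# Hypothetical use in a job script:
#   status=0
#   run_with_limit 600 arccandypond get http://www.nordugrid.org:80/data/run.sh run.sh || status=$?
#   if [ "$status" -eq 124 ]; then
#     echo "gave up waiting for the cache" >&2
#   fi
```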