ARC CE REST interface specification

Note

The current interface version is 1.1

The REST API endpoint

The various functionalities of the service are accessible through HTTP(S) URLs built on the following pattern:

<service endpoint URL>/rest/<version>/<functionality>

  • <service endpoint URL> represents the mount point of the service and may look like https://arc.example.org:443/arex.

  • <version> is a two-part number separated by a dot. The current version is 1.1.

  • <functionality> is one of the keywords defined below.

In the rest of this document, the part <service endpoint URL>/rest/<version> is referred to as <base URL>.

All parts of the URL to the right of the hostname are case-sensitive.

Depending on the Accept header in the HTTP request (Accept: application/json, Accept: text/xml or Accept: application/xml), the information in the response is rendered in either JSON or XML format. If no Accept header is specified, the format defaults to text/html and the output is compatible with an ordinary web browser.
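The URL pattern and the Accept header handling described above can be sketched in Python as follows. This is a minimal illustration, not part of the interface: the helper names (base_url, functionality_url, accept_headers) are hypothetical, and the default version 1.1 is taken from the note at the top of this document.

```python
# Accept header values listed in the text above.
ACCEPT = {"json": "application/json", "xml": "application/xml"}


def base_url(endpoint: str, version: str = "1.1") -> str:
    """Return <service endpoint URL>/rest/<version>."""
    return f"{endpoint.rstrip('/')}/rest/{version}"


def functionality_url(endpoint: str, functionality: str, version: str = "1.1") -> str:
    """Return <base URL>/<functionality>.

    The path is case-sensitive, so nothing is lower-cased or normalised here.
    """
    return f"{base_url(endpoint, version)}/{functionality.lstrip('/')}"


def accept_headers(fmt: str) -> dict:
    """Return request headers asking for a JSON ("json") or XML ("xml") rendering."""
    return {"Accept": ACCEPT[fmt]}
```

For example, functionality_url("https://arc.example.org:443/arex", "jobs") yields https://arc.example.org:443/arex/rest/1.1/jobs.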

In the HTTP response headers, the HTTP Status-Code (RFC7231) indicates the status of the overall request (e.g. 403 corresponds to Forbidden).

For operations that support multiple (bulk) requests per single API call, per-request Status-Codes are returned in addition to the Status-Code in the HTTP header. They are included as part of the response array in the HTTP body, using the same RFC7231 values and following the syntax defined below.

Description of functionalities and operations

Requesting supported versions

GET <service endpoint URL>/rest

Operations:

  • GET - returns list of supported REST API versions

  • POST, PUT, DELETE - not supported

Example response:

The XML response is like:

<versions>
  <version>1.0</version>
  <version>1.1</version>
  <version>1.2</version>
</versions>

The JSON is:

{"version": ["1.0", "1.1", "1.2"]}

or

{"version": "1.0"}
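Because the two JSON shapes shown above differ (an array in one case, a plain string in the other), a client may want to normalise them. The following is a small sketch under the assumption that the server emits standard JSON with quoted keys; the function name supported_versions is hypothetical.

```python
import json


def supported_versions(body: str) -> list:
    """Return the supported versions as a list, whichever JSON shape was received."""
    value = json.loads(body)["version"]
    # A lone version may arrive as a plain string rather than a one-element array.
    return [value] if isinstance(value, str) else list(value)
```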

Obtaining CE resource information

GET <base URL>/info[?schema=glue2]

Operations:

  • GET - retrieves generic information about cluster properties. It accepts the optional schema parameter. The default and only supported value in the current ARC release is glue2. The CRR rendering might be added in future ARC releases. XML or JSON is returned according to the request headers.

  • HEAD - supported

  • PUT, POST, DELETE - not supported.

Example QUERY:

GET https://host.domain.org:443/arex/rest/1.0/info?schema=glue2 HTTP/1.1
Accept: application/xml

The XML response is:

<InfoRoot>
  <Domains xmlns="http://schemas.ogf.org/glue/2009/03/spec_2.0_r1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  xsi:schemaLocation="https://raw.github.com/OGF-GLUE/XSD/master/schema/GLUE2.xsd">
    <AdminDomain BaseType="Domain" CreationTime="2018-11-06T20:26:46Z" Validity="10800">
      <ID>urn:ad:UNDEFINEDVALUE</ID>
      <Name>UNDEFINEDVALUE</Name>
      <Distributed>false</Distributed>
      <Services>
        <ComputingService BaseType="Service" CreationTime="2018-11-06T20:26:46Z" Validity="10800">
          <ID>urn:ogf:ComputingService:arc.zero:arex</ID>
          <Capability>data.transfer.cepush.srm</Capability>
          <Capability>executionmanagement.jobmanager</Capability>
 ... output omitted ...

Operating jobs

GET <base URL>/jobs[?state=<state1>[,<state2>[…]]]

POST <base URL>/jobs?action=new[&queue=<name>][&delegation_id=<id>]

POST <base URL>/jobs?action={info|status|kill|clean|restart|delegations}

Operations:

  • GET - get list of jobs

  • HEAD - supported

  • POST - job submission and management

  • PUT, DELETE - not supported

Get list of jobs

GET <base URL>/jobs retrieves the list of jobs belonging to the authenticated user as application/xml or application/json. The returned document contains a list of job IDs.

It accepts the optional state parameter. When it is defined, the returned document contains only jobs in the requested state(s).

Example QUERY:

GET https://host.domain.org:443/arex/rest/1.0/jobs HTTP/1.1
Accept: application/xml

The XML response is:

<jobs>
  <job>
    <id>1234567890abcdef</id>
  </job>
  <job>
    <id>fedcba0987654321</id>
  </job>
</jobs>

The JSON is:

{
  "job":[
    {"id":"1234567890abcdef"},
    {"id":"fedcba0987654321"}
  ]
}
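As an illustration of the job listing call, the sketch below builds the request URL with the optional state filter and extracts the job IDs from the JSON response. The function names are hypothetical. Handling a single job rendered as an object instead of an array is an assumption made by analogy with the versions response; the text above does not state it.

```python
import json
from urllib.parse import urlencode


def jobs_url(base_url: str, states=()) -> str:
    """Return <base URL>/jobs, optionally with ?state=<state1>,<state2>..."""
    if not states:
        return f"{base_url}/jobs"
    # Keep the comma literal so the states form one comma-separated value.
    query = urlencode({"state": ",".join(states)}, safe=",")
    return f"{base_url}/jobs?{query}"


def job_ids(body: str) -> list:
    """Extract job IDs from the JSON job list."""
    jobs = json.loads(body).get("job", [])
    if isinstance(jobs, dict):  # assumption: a single job may not be wrapped in an array
        jobs = [jobs]
    return [job["id"] for job in jobs]
```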

Job submission (create a new job)

POST <base URL>/jobs?action=new initiates creation of a new job instance or multiple jobs.

The request body contains the job description(s) in one of the supported formats: ADL as Content-type: application/xml or xRSL as Content-type: application/rsl.

The optional queue parameter defines the default value for the computing element queue. The value has the same effect as xRSL xrsl_queue (or ADL QueueName) and is applied to all job descriptions that do not specify it. If the xRSL/ADL already contains a queue, the value from the xRSL/ADL is used instead.

The optional delegation_id parameter defines the default value for the delegation ID for data staging. The value has the same effect as xRSL xrsl_delagationid (or ADL DelegationID) and is applied to all job descriptions that do not specify it. If the xRSL/ADL already contains a delegationid, the value from the xRSL/ADL is used instead.
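The submission URL and headers described above could be assembled as in the following sketch. The function name submission_request is hypothetical, requesting a JSON response is a choice made for the example, and application/rsl is assumed to be the intended spelling of the xRSL content type.

```python
from urllib.parse import urlencode

# Content types for the two supported job description formats.
CONTENT_TYPES = {"adl": "application/xml", "xrsl": "application/rsl"}


def submission_request(base_url: str, fmt: str, queue=None, delegation_id=None):
    """Return (url, headers) for POST <base URL>/jobs?action=new."""
    params = {"action": "new"}
    # Both parameters are optional defaults; values inside a job description take precedence.
    if queue is not None:
        params["queue"] = queue
    if delegation_id is not None:
        params["delegation_id"] = delegation_id
    url = f"{base_url}/jobs?{urlencode(params)}"
    headers = {"Content-Type": CONTENT_TYPES[fmt], "Accept": "application/json"}
    return url, headers
```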

To pass multiple job descriptions of the same type in one request body:

  • ADL descriptions are enclosed in <ActivityDescriptions> element

  • xRSL uses + to merge multiple jobs.

The response contains the 201 code. The response body contains an array of elements corresponding to the job descriptions in the request, in the same order. Each element of the array contains:

  • status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231)

  • reason: a short textual description of the Status-Code

  • id: job UUID or None if not assigned (unsuccessful submission)

  • state: the job state according to state model or None if not available (unsuccessful submission)

The XML response is:

<jobs>
  <job>
    <status-code>201</status-code>
    <reason>Created</reason>
    <id>1234567890abcdef</id>
    <state>ACCEPTING</state>
  </job>
  <job>
    <status-code>500</status-code>
    <reason>Requested RTE is missing</reason>
  </job>
</jobs>

The JSON is:

{
  "job":[
    {
      "status-code":"201",
      "reason":"Created",
      "id":"1234567890abcdef",
      "state":"ACCEPTING"
    },
    {
      "status-code":"500",
      "reason":"Requested RTE is missing"
    }
  ]
}
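Since the overall HTTP status does not reveal which individual descriptions were accepted, a client has to inspect the per-job entries. The sketch below splits a JSON submission response into accepted and rejected entries, keeping each entry's position so it can be matched to the corresponding job description. The function name is hypothetical, and it assumes well-formed JSON in which status-code may be a string, as in the example above.

```python
import json


def split_submission_results(body: str):
    """Return (accepted, rejected) lists built from the per-job status codes.

    accepted holds (index, job id, state); rejected holds (index, status code, reason).
    """
    items = json.loads(body)["job"]
    if isinstance(items, dict):  # assumption: a single result may not be wrapped in an array
        items = [items]
    accepted, rejected = [], []
    for index, item in enumerate(items):
        code = int(item["status-code"])
        if code == 201:
            accepted.append((index, item["id"], item.get("state")))
        else:
            rejected.append((index, code, item.get("reason", "")))
    return accepted, rejected
```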

Jobs management

POST <base URL>/jobs?action={info|status|kill|clean|restart|delegations} - job management operations supporting arrays of jobs.

The request body contains the list of job IDs as JSON or XML (e.g. the output of GET <base URL>/jobs can be reused).

Example of the body in XML:

<jobs>
  <job>
    <id>1234567890abcdef</id>
  </job>
  <job>
    <id>fedcba0987654321</id>
  </job>
</jobs>

And in JSON:

{
  "job":[
    {"id":"1234567890abcdef"},
    {"id":"fedcba0987654321"}
  ]
}
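A request for any of the management actions can be prepared as in this sketch, which builds the URL and the JSON body from a list of job IDs. The helper names are hypothetical; the set of actions is the one listed in the URL pattern above.

```python
import json

# Management actions accepted by POST <base URL>/jobs?action=...
MANAGEMENT_ACTIONS = {"info", "status", "kill", "clean", "restart", "delegations"}


def management_url(base_url: str, action: str) -> str:
    """Return the URL for a bulk job management action."""
    if action not in MANAGEMENT_ACTIONS:
        raise ValueError(f"unsupported job management action: {action}")
    return f"{base_url}/jobs?action={action}"


def jobs_body(job_ids) -> str:
    """Serialise job IDs into the JSON request body shown above."""
    return json.dumps({"job": [{"id": job_id} for job_id in job_ids]})
```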

Response depends on the requested action:

Job info

POST <base URL>/jobs?action=info retrieves full information about job(s) according to the GLUE2 activity information XML document, or in JSON format.

The response contains the 201 code. The response body contains an array of elements corresponding to the job IDs in the request. Each element of the array contains:

  • status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231). 200 is the only positive response.

  • reason: a short textual description of the Status-Code

  • id: job UUID

  • info_document: GLUE2 activity information about the job, or an empty document if not available (the request is not satisfiable)

Job status

POST <base URL>/jobs?action=status retrieves information about the current state of the job(s).

The response body contains an array of elements corresponding to the job IDs in the request. Each element of the array contains:

  • status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231). 200 is the only positive response.

  • reason: a short textual description of the Status-Code

  • id: job UUID

  • state: the job state according to state model or None if not available

Killing jobs

POST <base URL>/jobs?action=kill sends a request to kill job(s).

The response body contains an array of elements corresponding to the job IDs in the request. Each element of the array contains:

  • status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231). The response code 202 indicates that the request is queued for later execution; it is the only positive response.

  • reason: a short textual description of the Status-Code

  • id: job UUID

Clean job files

POST <base URL>/jobs?action=clean sends a request to clean the files of job(s).

The response body contains an array of elements corresponding to the job IDs in the request. Each element of the array contains:

  • status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231). The response code 202 indicates that the request is queued for later execution; it is the only positive response.

  • reason: a short textual description of the Status-Code

  • id: job UUID

Restart job

POST <base URL>/jobs?action=restart sends a request to restart job(s).

The response body contains an array of elements corresponding to the job IDs in the request. Each element of the array contains:

  • status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231). The response code 202 indicates that the request is queued for later execution.

  • reason: a short textual description of the Status-Code

  • id: job UUID

Job delegations

POST <base URL>/jobs?action=delegations retrieves the list of delegations associated with the job(s).

The response body contains an array of elements corresponding to the job IDs in the request. Each element of the array contains:

  • status-code: a 3-digit integer result code of the attempt to understand and satisfy the request (according to RFC7231). 200 is the only positive response.

  • reason: a short textual description of the Status-Code

  • id: job UUID

  • delegation_id: an array of assigned delegation IDs
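The per-action positive status codes listed in the subsections above can be collected in one lookup, which makes it straightforward to find the entries that need attention. This is a sketch with hypothetical names. For restart the text gives 202 without calling it the only positive response; treating it like kill and clean is an assumption.

```python
# Positive per-job status code for each management action, as described above.
POSITIVE_CODE = {
    "info": 200,
    "status": 200,
    "delegations": 200,
    "kill": 202,
    "clean": 202,
    "restart": 202,  # assumption: 202 is treated as the only positive code here too
}


def failed_items(action: str, items) -> list:
    """Return the response entries whose status-code is not the positive one."""
    expected = POSITIVE_CODE[action]
    # status-code may be rendered as a string, so compare as integers.
    return [item for item in items if int(item["status-code"]) != expected]
```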

File operations

Files belonging to a specific job are accessed through the <base URL>/jobs/<job id> URL.

Working with session directory

GET <base URL>/jobs/<job id>/session/<path>

DELETE <base URL>/jobs/<job id>/session/<path>

PUT <base URL>/jobs/<job id>/session/<path>

Operations:

  • GET, HEAD, PUT, DELETE - supported for files stored in the job’s session directory; they perform the usual actions.

  • GET, HEAD - for directories, retrieve the list of stored files (consider WebDAV for format)

  • DELETE - for directories, removes the whole directory

  • PUT - not supported for directories.

  • POST - not supported.
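A session directory URL can be built as in the sketch below, which percent-encodes the path while keeping the slashes between path segments. The names session_url and ALLOWED_METHODS are hypothetical. The method table merely restates the list above.

```python
from urllib.parse import quote

# Methods supported for files and for directories in the session directory.
ALLOWED_METHODS = {
    "file": {"GET", "HEAD", "PUT", "DELETE"},
    "directory": {"GET", "HEAD", "DELETE"},
}


def session_url(base_url: str, job_id: str, path: str = "") -> str:
    """Return <base URL>/jobs/<job id>/session/<path> with the path percent-encoded."""
    encoded_path = quote(path.lstrip("/"))  # "/" stays literal by default
    return f"{base_url}/jobs/{quote(job_id, safe='')}/session/{encoded_path}"
```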

Delegation functionality

GET <base URL>/delegations[?type={x509|jwt}]

POST <base URL>/delegations?action=new[&type={x509|jwt}]

Operations:

  • GET - retrieves list of delegations belonging to authenticated user

  • HEAD - supported

  • POST - create new delegation

  • PUT, DELETE - not supported

POST <base URL>/delegations/<delegation id>?action={get|renew|delete}

PUT <base URL>/delegations/<delegation id>

Operations:

  • GET, HEAD - not supported

  • POST - manage particular delegation ID

  • PUT - store x509 delegation public part for particular delegation ID

Get list of delegations

GET <base URL>/delegations[?type={x509|jwt}] - retrieves the list of delegations belonging to the authenticated user. It accepts the optional type parameter, which allows filtering delegations by type. By default all types are returned. Supported values are: x509 for proxy-certificate delegation and jwt for data staging token delegation.

QUERY:

GET https://host.domain.org:443/arex/rest/1.0/delegations HTTP/1.1
Accept: application/xml

The XML response is:

<delegations>
  <delegation>
    <id>1234567890abcdef</id>
    <type>x509</type>
  </delegation>
  <delegation>
    <id>fedcba0987654321</id>
    <type>jwt</type>
  </delegation>
</delegations>

The JSON is:

{
  "delegation": [
    { "id":"1234567890abcdef", "type": "x509"},
    { "id":"fedcba0987654321", "type": "jwt"}
  ]
}
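When all delegation types are returned, a client may want to group the IDs by type. The following sketch does that for the JSON rendering. The function name is hypothetical, and the code assumes standard JSON with quoted keys and tolerates a single delegation that is not wrapped in an array.

```python
import json
from collections import defaultdict


def delegations_by_type(body: str) -> dict:
    """Group delegation IDs from the JSON list by their type (x509 or jwt)."""
    items = json.loads(body).get("delegation", [])
    if isinstance(items, dict):  # assumption: a single entry may not be wrapped in an array
        items = [items]
    grouped = defaultdict(list)
    for item in items:
        grouped[item["type"]].append(item["id"])
    return dict(grouped)
```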

New delegation

Delegation protocol depends on the delegation type (x509 or jwt) specified with an optional type parameter. If not explicitly specified the default delegation type is x509 for backward compatibility with REST API 1.0.

X.509 delegation

X.509 delegation is a 2-step process:

  1. The server generates a private/public key pair and sends an X.509 certificate request to the client.

  2. The client signs the public key and stores the delegated certificate to finish the delegation procedure.

Corresponding REST API calls:

Step 1

POST <base URL>/delegations?action=new&type=x509 starts a new delegation process. The response is 201 and contains a certificate request of application/x-pem-file type, and the URL of the delegation, with the assigned delegation id, in the Location HTTP header.

Step 2

PUT <base URL>/delegations/<delegation id> stores the public part. The request body contains the signed certificate (Content-type: application/x-pem-file). The response is 200 on success.
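The two-step exchange can be sketched as below. To keep the example free of any particular HTTP or cryptography library, the transport and the signing step are passed in as callables; both callables, and the function name delegate_x509, are hypothetical. Taking the delegation id from the last path segment of the Location header is an assumption about the URL layout.

```python
from typing import Callable, Mapping, Tuple

# transport(method, url, headers, body) -> (status, response headers, response body)
Transport = Callable[[str, str, Mapping[str, str], bytes],
                     Tuple[int, Mapping[str, str], bytes]]


def delegate_x509(base_url: str, transport: Transport,
                  sign_csr: Callable[[bytes], bytes]) -> str:
    """Run the two-step X.509 delegation and return the delegation id."""
    # Step 1: the server creates a key pair and returns a certificate request.
    status, headers, csr = transport(
        "POST", f"{base_url}/delegations?action=new&type=x509", {}, b"")
    if status != 201:
        raise RuntimeError(f"delegation request failed with status {status}")
    # Assumption: the id is the last path segment of the Location URL.
    delegation_id = headers["Location"].rstrip("/").rsplit("/", 1)[-1]

    # Step 2: the client signs the request and uploads the delegated certificate.
    signed = sign_csr(csr)
    status, _, _ = transport(
        "PUT", f"{base_url}/delegations/{delegation_id}",
        {"Content-Type": "application/x-pem-file"}, signed)
    if status != 200:
        raise RuntimeError(f"certificate upload failed with status {status}")
    return delegation_id
```

Injecting the transport also makes the flow easy to exercise against a stub before pointing it at a real endpoint.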

JWT delegation

JWT delegation is a single API request.

POST <base URL>/delegations?action=new&type=jwt stores the provided JWT in the ARC CE delegation database. The request should contain the X-Delegation header that provides the bearer token. No verification of the delegation token is performed on the ARC CE side. It is passed as is via the Authorization header to endpoints supporting JWT. The response is 200 and contains the URL of the delegation, with the assigned delegation id, in the Location HTTP header.

QUERY:

POST https://host.domain.org:443/arex/rest/1.0/delegations?action=new&type=jwt HTTP/1.1
X-Delegation: bearer eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICI3ZzZGSzBPam43YW.....
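The same request can be prepared programmatically, as in this sketch. The function name is hypothetical. The "bearer " prefix in the header value mirrors the example query above.

```python
def jwt_delegation_request(base_url: str, token: str):
    """Return (method, url, headers) for storing a JWT delegation."""
    url = f"{base_url}/delegations?action=new&type=jwt"
    # The token is not verified by ARC CE; it is stored and forwarded as is.
    headers = {"X-Delegation": f"bearer {token}"}
    return "POST", url, headers
```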

Delegations management

Delegations are managed one-by-one. The same delegation ID can be re-used for multiple jobs (submitted separately or in batch).

The delegation ID to be used in the job context must be either explicitly specified as part of the job description, in the way defined by the description language (e.g. DelegationID in ADL and xrsl_delagationid in xRSL), or passed as the delegation_id parameter during job submission.

POST <base URL>/delegations/<delegation id>?action={get|renew|delete} is used to manage a delegation.

Request body is empty and action is defined by action value.

Response is structured depending on the action:

Get delegation

POST <base URL>/delegations/<delegation id>?action=get returns, depending on the delegation type:

  • for x509, the public part of the stored delegation (application/x-pem-file content type)

  • for jwt, the stored delegation token (application/jwt content type)

Delete delegation

POST <base URL>/delegations/<delegation id>?action=delete removes delegation. Response is 200 with no body expected.

Renew delegation

The process is similar to creation of the new delegation and depends on delegation type (x509 or jwt).

For x509, the POST <base URL>/delegations/<delegation id>?action=renew API call initiates renewal of the delegation. The response is 201 with a certificate request of application/x-pem-file type, which should be followed by the PUT <base URL>/delegations/<delegation id> call to upload the signed certificate.

For jwt the token stored in the ARC CE delegation database will be replaced by the new one supplied via X-Delegation header. Response is 200 on success.

A-REX control directory files access for debugging purposes

GET <base URL>/jobs/<job id>/diagnose/<file type>

Operations:

  • GET - return the content of file in A-REX control directory for requested jobID

  • HEAD - supported

  • POST, PUT, DELETE - not supported

The <file type> matches the controldir file suffix and can be one of the following:

  • failed

  • local

  • errors

  • description

  • diag

  • comment

  • status

  • acl

  • xml

  • input

  • output

  • input_status

  • output_status

  • statistics
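A client can validate the file type locally before issuing the request, as in this sketch. The names DIAGNOSE_FILE_TYPES and diagnose_url are hypothetical; the allowed values are exactly those listed above.

```python
# Control directory file suffixes accepted as <file type>.
DIAGNOSE_FILE_TYPES = frozenset({
    "failed", "local", "errors", "description", "diag", "comment", "status",
    "acl", "xml", "input", "output", "input_status", "output_status",
    "statistics",
})


def diagnose_url(base_url: str, job_id: str, file_type: str) -> str:
    """Return <base URL>/jobs/<job id>/diagnose/<file type>."""
    if file_type not in DIAGNOSE_FILE_TYPES:
        raise ValueError(f"unknown control directory file type: {file_type}")
    return f"{base_url}/jobs/{job_id}/diagnose/{file_type}"
```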

REST Interface Job States

Table 6 State identifiers used with ARC REST API

| REST API State Name | Description | A-REX Internal State |
|---|---|---|
| ACCEPTING | This is the initial job state. The job has reached the cluster, a session directory was created, the submission client can optionally upload files to the sessiondir. The job waits to be detected by the A-REX, the job processing on the CE hasn’t started yet. | ACCEPTED |
| ACCEPTED | In the ACCEPTED state the newly created job has been detected by A-REX but can’t go to the next state due to an internal A-REX limit. The submission client can optionally upload files to the sessiondir. | PENDING:ACCEPTED |
| PREPARING | The job is undergoing the data stage-in process, input data is being gathered into the session directory (via external downloads or making cached copies available). During this state the submission client still can upload files to the session directory. This is an I/O heavy job state. | PREPARING |
| PREPARED | The job successfully completed the data stage-in process and is being held waiting in A-REX’s internal queue before it can be passed over to the batch system. | PENDING:PREPARING |
| SUBMITTING | The job environment (via using RTEs) and the job batch submission script is being prepared to be followed by the submission to the batch system via using the available batch submission client interface. | SUBMIT |
| QUEUING | The job is under the control of the local batch system and is “queuing in the batch system”, waiting for a node/available slot. | INLRMS |
| RUNNING | The job is under the control of the local batch system and is “running in the batch system”, executing on an allocated node under the control of the batch system. | INLRMS |
| HELD | The job is under the control of the local batch system and is being put on hold or being suspended, for some reason the job is in a “pending state” of the batch system. | INLRMS |
| EXITINGLRMS | The job is under the control of the local batch system and is finishing its execution on the worker node, the job is “exiting” from the batch system either because the job is completed or because it was terminated. | INLRMS |
| OTHER | The job is under the control of the local batch system and is in some “other” native batch system state which can not be mapped to any of the previously described batch systems states. | INLRMS |
| EXECUTED | The job has successfully completed in the batch system. The job is waiting to be picked up by the A-REX for further processing or waiting for an available data stage-out slot. | PENDING:INLRMS |
| FINISHING | The job is undergoing the data stage-out process, A-REX is moving output data to the specified output file locations, the session directory is being cleaned up. Note that failed or terminated jobs can also undergo the FINISHING state. This is an I/O heavy job state. | FINISHING |
| FINISHED | Successful completion of the job on the cluster. The job has finished ALL its activity on the cluster AND no errors occurred during the job’s lifetime. | FINISHED |
| FAILED | Unsuccessful completion of the job. The job failed during one of the processing stages. The job has finished ALL its activity on the cluster and there occurred some problems during the lifetime of the job. | FINISHED |
| KILLING | The job was requested to be terminated by an authorized user and as a result it is being killed. A-REX is terminating any active process related to the job, e.g. it interacts with the LRMS by running the job-cancel script or stops data staging processes. Once the job has finished ALL its activity on the cluster it will be moved to the KILLED state. | CANCELLING |
| KILLED | The job was terminated as a result of an authorized user request. The job has finished ALL its activity on the cluster. | FINISHED |
| WIPED | The generated result of jobs are kept available in the session directory on the cluster for a while after the job reaches its final state (FINISHED, FAILED or KILLED). Later, the job’s session directory and most of the job related data are going to be deleted from the cluster when an expiration time is exceeded. Jobs with expired session directory lifetime are “deleted” from the cluster in the sense that only a minimal set of info is kept about such a job and their state is changed to WIPED. | DELETED |
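For client code, the state table above can be captured as a lookup from REST API state names to A-REX internal states. The sketch below is illustrative only and its names are hypothetical. The set of final states follows the WIPED description, which names FINISHED, FAILED and KILLED.

```python
# REST API state name -> A-REX internal state, transcribed from the table above.
REST_TO_INTERNAL = {
    "ACCEPTING": "ACCEPTED",
    "ACCEPTED": "PENDING:ACCEPTED",
    "PREPARING": "PREPARING",
    "PREPARED": "PENDING:PREPARING",
    "SUBMITTING": "SUBMIT",
    "QUEUING": "INLRMS",
    "RUNNING": "INLRMS",
    "HELD": "INLRMS",
    "EXITINGLRMS": "INLRMS",
    "OTHER": "INLRMS",
    "EXECUTED": "PENDING:INLRMS",
    "FINISHING": "FINISHING",
    "FINISHED": "FINISHED",
    "FAILED": "FINISHED",
    "KILLING": "CANCELLING",
    "KILLED": "FINISHED",
    "WIPED": "DELETED",
}

# Final states named in the WIPED description; WIPED itself follows them after expiry.
FINAL_STATES = frozenset({"FINISHED", "FAILED", "KILLED"})


def is_final(state: str) -> bool:
    """True if the job has finished all its activity on the cluster."""
    return state in FINAL_STATES


def in_batch_system(state: str) -> bool:
    """True if the job is currently under the control of the local batch system."""
    return REST_TO_INTERNAL.get(state) == "INLRMS"
```

A polling loop could, for instance, stop querying a job once is_final returns True for its reported state.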

Status of This Document

This document provides the normative specification for the ARC REST Interface version 1.1.

This specification was designed to meet the requirements listed below:

  1. Support for versioning: via URL paths like https://arc.zero:443/arex/rest/1.1/jobs

  2. Usable with simple tools (wget, curl)

  3. Friendly to common HTTP REST frameworks

  4. Interactive access to session directory content

  5. Machine readable error/result codes/messages

  6. No drastic changes to information representation and jobs handling

  7. Support for different response formats: xml, json

Plans for functionality extension post version 1.1:

  1. More effective bulk operations with HTTP v2. This requires HTTP v2 development for HED, so the feature is postponed to later versions.

  2. Resource information functionality: consider filtering through URL options, consider supporting references (relative URLs) to underlying resources.

  3. Scalability for many jobs and delegations: consider filtering through URL options

  4. Jobs: consider a way to provide the list of all jobs per site or per VO to special monitoring agents

  5. For sessiondir access, add PATCH to modify parts of files. The body format needs to be defined, with all files treated as binary. Currently only non-standard PUT with ranges is supported.