
Introduction

  /$$$$$$                                         
 /$$__  $$                                        
| $$  \ $$  /$$$$$$   /$$$$$$  /$$    /$$ /$$$$$$ 
| $$$$$$$$ /$$__  $$ |____  $$|  $$  /$$//$$__  $$
| $$__  $$| $$  \ $$  /$$$$$$$ \  $$/$$/| $$$$$$$$
| $$  | $$| $$  | $$ /$$__  $$  \  $$$/ | $$_____/
| $$  | $$|  $$$$$$$|  $$$$$$$   \  $/  |  $$$$$$$
|__/  |__/ \____  $$ \_______/    \_/    \_______/
           /$$  \ $$                              
          |  $$$$$$/                              
           \______/                               

The Agave Platform (http://agaveapi.co) is an open source, science-as-a-service API platform for powering your digital lab. Agave allows you to bring together your public, private, and shared high performance computing (HPC), high throughput computing (HTC), Cloud, and Big Data resources under a single, web-friendly REST API.

  • Run code
  • Manage data
  • Collaborate meaningfully
  • Integrate anywhere

The Agave documentation site contains documentation, guides, tutorials, and lots of examples to help you build your own digital lab.

Conventions

Throughout the documentation you will regularly encounter the following variables. These represent user-specific values that should be replaced when attempting any of the calls using your account. Once you log into this site, these values will be replaced with values appropriate for you to use when copying and pasting the examples on your own.

Variable Description Example
${API_HOST} Base hostname of the API. public.agaveapi.co
${API_VERSION} Version of the API endpoint. v2
${API_USERNAME} Username of the current user. nryan
${API_KEY} Client key used to request an access token from the Agave Auth service. hZ_z3f4Hf3CcgvGoMix0aksN4BOD6
${API_SECRET} Client secret used to request an access token from the Agave Auth service. gTgpCecqtOc6Ao3GmZ_FecVSSV8a
${API_TOKEN} OAuth 2.0 access token sent in the Authorization header of API requests. de32225c235cf47b9965997270a1496c

JSON Notation

{
    "active": true,
    "created": "2014-09-04T16:59:33.000-05:00",
    "frequency": 60,
    "id": "0001409867973952-5056a550b8-0001-014",
    "internalUsername": null,
    "lastCheck": [
      {
        "created": "2014-10-02T13:03:25.000-05:00",
        "id": "0001412273000497-5056a550b8-0001-015",
        "message": null,
        "result": "PASSED",
        "type": "STORAGE"
      },
      {
        "created": "2014-10-02T13:03:25.000-05:00",
        "id": "0001411825368981-5056a550b8-0001-015",
        "message": null,
        "result": "FAILED",
        "type": "LOGIN"
      }
    ],
    "lastSuccess": "2014-10-02T11:03:13.000-05:00",
    "lastUpdated": "2014-10-02T13:03:25.000-05:00",
    "nextUpdate": "2014-10-02T14:03:15.000-05:00",
    "owner": "systest",
    "target": "demo.storage.example.com",
    "updateSystemStatus": false,
    "_links": {
        "checks": {
            "href": "https://public.tenants.agaveapi.co/monitor/v2/0001409867973952-5056a550b8-0001-014/checks"
        },
        "notifications": {
            "href": "https://public.tenants.agaveapi.co/notifications/v2/?associatedUuid=0001409867973952-5056a550b8-0001-014"
        },
        "owner": {
            "href": "https://public.tenants.agaveapi.co/profiles/v2/systest"
        },
        "self": {
            "href": "https://public.tenants.agaveapi.co/monitor/v2/0001409867973952-5056a550b8-0001-014"
        },
        "system": {
            "href": "https://public.tenants.agaveapi.co/systems/v2/demo.storage.example.com"
        }
    }
}

When describing the JSON objects passed back and forth with the APIs, JavaScript dot notation is used to refer to individual properties. For example, consider the JSON object above.

  • active refers to the top level active attribute in the response object.
  • lastCheck.[].result generically refers to the result attribute contained within any of the objects contained in the lastCheck array.
  • lastCheck.[0].result specifically refers to the result attribute contained within the first object in the lastCheck array.
  • _links.self.href refers to the href attribute in the self object within the _links object.
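The following minimal Python sketch resolves these paths against an object like the one above. The `resolve` function is hypothetical and is not part of any Agave SDK. It returns a list because the generic `[]` segment matches every element of an array.

```python
import re
from typing import Any, List

def resolve(obj: Any, path: str) -> List[Any]:
    """Resolve a documentation-style path such as 'lastCheck.[0].result'."""
    current = [obj]
    for segment in path.split("."):
        index = re.fullmatch(r"\[(\d*)\]", segment)  # '[]' or '[N]'
        matches = []
        for value in current:
            if index:
                if not isinstance(value, list):
                    continue
                if index.group(1) == "":
                    matches.extend(value)            # '[]' fans out over all elements
                elif int(index.group(1)) < len(value):
                    matches.append(value[int(index.group(1))])
            elif isinstance(value, dict) and segment in value:
                matches.append(value[segment])       # plain attribute lookup
        current = matches
    return current

check = {
    "active": True,
    "lastCheck": [{"result": "PASSED"}, {"result": "FAILED"}],
    "_links": {"self": {"href": "https://example.com/monitor/v2/1"}},
}
print(resolve(check, "lastCheck.[].result"))  # ['PASSED', 'FAILED']
```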

Versioning

The major version of each API is given in the URI immediately following the API resource name. For example, if the endpoint is https://public.tenants.agaveapi.co/jobs/v2/, the API version is v2. The current major version of Agave is v2.

Slugs

In certain situations, usually where file system paths and names are involved, Agave generates slugified object names to make them safe to use. Slugs are created on the fly by applying the following rules:

  1. Lowercase the string
  2. Replace spaces with a dash
  3. Remove any special characters and punctuation that might require encoding in the URL. Allowed characters are alphanumeric characters, dashes, underscores, and periods.
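The three rules can be approximated in Python as shown below. This is a sketch, not Agave's implementation. It assumes "alphanumeric" means ASCII letters and digits. It also leaves repeated dashes in place, because the rules above do not say whether Agave collapses them.

```python
import re

def slugify(name: str) -> str:
    """Approximate the slug rules above (assumes ASCII-only alphanumerics)."""
    slug = name.lower()                        # 1. lowercase the string
    slug = slug.replace(" ", "-")              # 2. replace spaces with a dash
    slug = re.sub(r"[^a-z0-9_.-]", "", slug)   # 3. drop everything not allowed
    return slug

print(slugify("My Job: Run #1.v2"))  # my-job-run-1.v2
```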

Secure communication

Agave uses SSL/TLS to secure communication with clients. If HTTPS is not specified in the request, the request is redirected to a secure channel.

Rate limiting

To keep the API fast for everybody, rate limits apply. Unsigned requests are processed at the lowest rate limit. Signed requests with a valid access token benefit from higher rate limits; this is true even if the endpoint doesn’t require an access token.

Requests

The Agave API is based on REST principles: data resources are accessed via standard HTTPS requests in UTF-8 format to an API endpoint. Where possible, the API strives to use the appropriate HTTP verb for each action:

Verb Description
GET Used for retrieving resources.
POST Used for creating resources.
PUT Used for manipulating resources or collections.
DELETE Used for deleting resources.

Standard query parameters

Several URL query parameters are common across all services. The following table lists them for reference.

Name Values Purpose
offset integer (zero-based) Skips the first offset results in the response.
limit integer Limits the number of results returned to at most this number.
pretty boolean If true, pretty prints the response. Default false.
naked boolean If true, returns only the value of the result attribute in the standard response wrapper.
filter string A comma-delimited list of fields to return for each object in the response. Each field may be referenced using JSON notation. See Customizing Responses for more info.
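Here is a minimal Python sketch that assembles these parameters into a request URL using the standard library. The `build_url` helper is hypothetical. It assumes booleans should be sent as lowercase `true`/`false`, as in the examples in this document, and it leaves the commas in `filter` unencoded for readability.

```python
from urllib.parse import urlencode

def build_url(base: str, **params) -> str:
    """Append standard query parameters (offset, limit, pretty, naked, filter)."""
    # Drop unset parameters and render booleans the way the docs do: true/false.
    query = {
        key: (str(value).lower() if isinstance(value, bool) else value)
        for key, value in params.items()
        if value is not None
    }
    if not query:
        return base
    # safe="," keeps comma-delimited filter lists readable.
    return f"{base}?{urlencode(query, safe=',')}"

print(build_url("https://public.tenants.agaveapi.co/jobs/v2/",
                offset=50, limit=25, pretty=True, filter="name,status"))
# https://public.tenants.agaveapi.co/jobs/v2/?offset=50&limit=25&pretty=true&filter=name,status
```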

Experimental query parameters

Starting with the 2.1.10 release, two new query parameters have been introduced into the Jobs API as an experimental feature. The following table lists them for reference.

Name Values Purpose
sort asc, desc The sort order of the response. asc by default.
sortBy string The field by which to sort the response. Any field present in the full representation of the resource that you are querying is supported. Multiple values are not currently supported.

Responses

All data is received and returned as a JSON object. The Live Docs provide a description of all the retrievable objects.

Response Details

{
    "status": "error",
    "message": "Permission denied. You do not have permission to view this system",
    "version": "2.1.10-r8228",
    "result": {}
}

Apart from the response code, all responses from Agave are in the form of a JSON object. The object takes the following form.

Key Value Type Value Description
status string “success” if the call succeeded, or “error” if it failed.
message string A short description of the cause of the error.
result object, array The JSON response object or array.
version string The current full release version of Agave, e.g. “2.2.0-r8228”.

The example response above, for instance, is what you receive when trying to fetch information for a system to which you do not have access.

Naked Responses

If you do not want to parse the response wrapper to get at the raw response data, you can request a naked response by adding naked=true to the request URL. This returns just the value of the result attribute in the response wrapper.
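When you keep the wrapper, a client typically checks status before reading result. Below is a minimal Python sketch of that check. The `unwrap` helper and the `AgaveError` exception are hypothetical names. With naked=true you would skip this step and parse the body directly.

```python
import json

class AgaveError(Exception):
    """Raised when the response wrapper reports status 'error'."""

def unwrap(body: str):
    """Return the result value of a standard Agave response wrapper."""
    wrapper = json.loads(body)
    if wrapper.get("status") != "success":
        # message may be null, so fall back to a generic description
        raise AgaveError(wrapper.get("message") or "unknown error")
    return wrapper.get("result")

print(unwrap('{"status": "success", "message": null, "version": "2.1.10-r8228", "result": [1, 2]}'))
# [1, 2]
```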

Formatting

By default, all responses are compact serialized JSON. To receive pretty-printed JSON, add pretty=true to any query string.

Pagination

Pagination using limit and offset query parameters.

curl -sk -H \
    "Authorization: Bearer ${API_TOKEN}" \
    "https://public.tenants.agaveapi.co/jobs/v2/?offset=50&limit=25"
jobs-list -o 50 -l 25

All resource collections support paging through the dataset using the offset and limit query parameters, as in the example above.

Note that offset numbering is zero-based and that omitting the offset parameter returns the first limit elements. By default, all search and listing responses from the Science APIs are paginated in groups of 250 objects. The lone exception is the Files API, which returns all results by default.

Check the documentation for each endpoint for details.
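The offset/limit walk can be wrapped in a generator. In this Python sketch, `fetch_page(offset, limit)` is a hypothetical stand-in for an HTTP call such as the curl request above. Stopping on a short page is an assumption, since this page does not describe a total-count field.

```python
from typing import Callable, Iterator, List

def paginate(fetch_page: Callable[[int, int], List[dict]],
             limit: int = 250) -> Iterator[dict]:
    """Yield every item in a collection by walking offset/limit pages."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        if len(page) < limit:   # a short or empty page means no more results
            return
        offset += limit         # offsets are zero-based

# Usage with an in-memory stand-in for a Jobs API listing.
jobs = [{"id": n} for n in range(60)]
everything = list(paginate(lambda offset, limit: jobs[offset:offset + limit], limit=25))
print(len(everything))  # 60
```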

Timestamps

Timestamps are returned in ISO 8601 format with millisecond precision and a US Central Time UTC offset (-05:00 during daylight saving time, -06:00 otherwise), for example YYYY-MM-DDTHH:MM:SS.sss-05:00.
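Python's standard library can parse these timestamps directly, which also makes values with different offsets comparable. The sketch below assumes Python 3.7 or later, where `datetime.fromisoformat` accepts the three-digit fractional seconds used here. The `parse_timestamp` name is hypothetical.

```python
from datetime import datetime, timezone

def parse_timestamp(value: str) -> datetime:
    """Parse an Agave timestamp into a timezone-aware datetime."""
    return datetime.fromisoformat(value)

checked = parse_timestamp("2014-10-02T13:03:25.000-05:00")
created = parse_timestamp("2013-11-16T11:25:38.900-06:00")
print(checked.astimezone(timezone.utc).isoformat())  # 2014-10-02T18:03:25+00:00
print(created < checked)                              # True
```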

CORS

Many modern applications choose to implement client-server communication exclusively in JavaScript. For this reason, Agave provides cross-origin resource sharing (CORS) support, so AJAX requests from a web browser are not blocked by the browser's same-origin policy and can safely make GET, PUT, POST, and DELETE requests to the API.

Hypermedia

{
    "associationIds": [],
    "created": "2013-11-16T11:25:38.900-06:00",
    "internalUsername": null,
    "lastUpdated": "2013-11-16T11:25:38.900-06:00",
    "name": "color",
    "owner": "nryan",
    "uuid": "0001384622738900-5056a550b8-0001-012",
    "value": "red",
    "_links": {
        "self": {
            "href": "https://public.tenants.agaveapi.co/meta/v2/data/0001384622738900-5056a550b8-0001-012"
        },
        "owner": {
            "href": "https://public.tenants.agaveapi.co/profiles/v2/nryan"
        }
    }
}

Agave strives to be a fully descriptive hypermedia API. Given any endpoint, you should be able to walk the API through the links provided in the _links object in each resource representation. The user metadata object above contains two referenced objects. The first, self, is common to all objects and contains the URL of that object. The second, owner, contains the URL of the profile of the user who created the object.
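Walking the API then amounts to reading _links and following each href. Here is a minimal Python sketch of the first half. The `links` helper is hypothetical, and fetching each URL with your HTTP client and access token is left out.

```python
from typing import Dict

def links(resource: dict) -> Dict[str, str]:
    """Map each relation name in a resource's _links object to its URL."""
    return {
        rel: link["href"]
        for rel, link in resource.get("_links", {}).items()
        if isinstance(link, dict) and "href" in link
    }

metadata = {
    "name": "color",
    "value": "red",
    "_links": {
        "self": {"href": "https://public.tenants.agaveapi.co/meta/v2/data/0001384622738900-5056a550b8-0001-012"},
        "owner": {"href": "https://public.tenants.agaveapi.co/profiles/v2/nryan"},
    },
}
print(links(metadata)["owner"])  # https://public.tenants.agaveapi.co/profiles/v2/nryan
```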

Customizing Responses

Returns the username and email for the authenticated user

curl -sk -H \
    "Authorization: Bearer ${API_TOKEN}" \
    "https://public.tenants.agaveapi.co/profiles/v2/me?filter=username,email"
profiles-list -v --filter=username,email me

The response would look something like the following:

{
  "username": "nryan",
  "email": "nryan@rangers.mlb.com"
}

Returns the name, status, app id, and the URL of the archived job output for the first two of the user's jobs

curl -sk -H \
    "Authorization: Bearer ${API_TOKEN}" \
    "https://public.tenants.agaveapi.co/jobs/v2/?limit=2&filter=name,status,appId,_links.archiveData.href"
jobs-list -v --limit=2 --filter=name,status,appId,_links.archiveData

The response would look something like the following:

[
  {
    "name" : "demo-pyplot-demo-advanced test-1414139896",
    "status": "FINISHED",
    "appId" : "demo-pyplot-demo-advanced-0.1.0",
    "_links": {
      "archiveData": {
        "href": "https://agave.iplantc.org/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/listings"
      }
    }
  },
  {
    "name": "demo-pyplot-demo-advanced test-1414270831",
    "status": "FINISHED",
    "appId" : "demo-pyplot-demo-advanced-0.1.0",
    "_links": {
      "archiveData": {
        "href": "https://agave.iplantc.org/jobs/v2/3259859908028273126-242ac115-0001-007/outputs/listings"
      }
    }
  }
]

Returns the system id, type, whether it is your default system, and the hostname from the system’s storage config

curl -sk -H \
    "Authorization: Bearer ${API_TOKEN}" \
    "https://public.tenants.agaveapi.co/systems/v2/?limit=2&filter=id,type,default,storage.host"
systems-list -v --limit=2 --filter=id,type,default,storage.host  

The response would look something like the following:

[
  {
    "id": "data.agaveapi.co",
    "type": "STORAGE",
    "default": true,
    "storage": {
      "host": "dtn01.prod.agaveapi.co"
    }
  },
  {
    "id": "docker.tacc.utexas.edu",
    "type": "EXECUTION",
    "default": true,
    "storage": {
      "host": "129.114.6.50"
    }
  }
]

In many situations, Agave may return too much or too little information in the response to a query. For example, when searching jobs, the inputs and parameters fields are not included in the default summary response objects. You can customize the responses you receive from all the Science APIs using the filter query parameter.

The filter query parameter takes a comma-delimited list of fields to return for each object in the response. Each field may be referenced using JSON notation similar to the search syntax (minus the .[operation] suffix). The examples above show sample requests and responses.
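To make the selection rule concrete, this Python sketch emulates on the client side what a filter list keeps from a full object. It is an illustration only, not how Agave implements filter. The `project` helper is hypothetical, it ignores arrays, and it skips fields that are absent.

```python
from typing import Any, Dict, Iterable

def project(obj: Dict[str, Any], fields: Iterable[str]) -> Dict[str, Any]:
    """Keep only the listed dotted fields of obj, preserving their nesting."""
    out: Dict[str, Any] = {}
    for field in fields:
        parts = field.split(".")
        value: Any = obj
        for key in parts:                 # first confirm the path exists
            if not isinstance(value, dict) or key not in value:
                break
            value = value[key]
        else:                             # path found: copy it into the result
            target = out
            for key in parts[:-1]:
                target = target.setdefault(key, {})
            target[parts[-1]] = value
    return out

system = {  # the non-selected fields here are illustrative
    "id": "data.agaveapi.co", "type": "STORAGE", "default": True,
    "status": "UP", "storage": {"host": "dtn01.prod.agaveapi.co", "port": 22},
}
print(project(system, "id,type,default,storage.host".split(",")))
```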

Status Codes

The API uses the following HTTP response status codes, as defined in RFC 2616, for successful and unsuccessful requests.

Success Codes

Response Code Meaning Description
200 Success The request succeeded. Life is good.
201 Created The request succeeded and a new resource was created. Only applicable on PUT and POST actions.
202 Accepted The request has been accepted for processing, but the processing has not been completed. Common for all async actions such as job submissions, file transfers, etc.
206 Partial Content The server has fulfilled the partial GET request for the resource. This will always be the return status of a request using a Range header.
301 Moved Permanently The requested resource has been assigned a new permanent URI. You should follow the Location header, repeating the request.
304 Not Modified You requested an action that succeeded, but did not modify the resource. Sound, fury, that whole thing.

Error Codes

Response Code Meaning Description
400 Bad Request Your request was invalid
401 Unauthorized Authentication required, but not provided
403 Forbidden You do not have permission to access the given resource
404 Not Found No resource was found at the given URL
405 Method Not Allowed You tried to access a resource with an invalid method
406 Not Acceptable You requested a response format that isn’t supported
410 Gone The resource you requested has been removed and/or deleted
429 Too Many Requests Curb your enthusiasm. You’re going way too fast.
500 Internal Server Error It’s not you, it’s us. We had a problem processing your request. Try again later.
503 Service Unavailable The service is temporarily unavailable. Please try again later.
504 Gateway Timeout The service, while acting as a gateway or proxy, did not receive a timely response from the upstream server.
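Clients usually branch on these codes. A 2xx code means the request is done, and the codes whose descriptions above say to try again later are natural candidates for a retry. In this Python sketch, the choice of retryable codes (429, 500, 503, 504) and the capped exponential backoff schedule are assumptions, not documented Agave behavior.

```python
from typing import List

RETRYABLE = {429, 500, 503, 504}   # assumption: treated as transient

def classify(status: int) -> str:
    """Bucket an HTTP status from the tables above into an action."""
    if status == 301:
        return "redirect"              # follow the Location header
    if 200 <= status < 300 or status == 304:
        return "success"
    if status in RETRYABLE:
        return "retry"
    return "fail"                      # e.g. 400/403/404: fix the request first

def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Seconds to wait before each retry: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]

print(classify(202), classify(429), classify(403))  # success retry fail
print(backoff_delays(6))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```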

SDK

Web API

The Agave Science APIs power the Science-as-a-Service functionality of the Agave Platform. These web APIs allow you to manage all aspects of your code, collaborations, data, and your digital lab.

The Science APIs follow basic REST concepts and use JSON to exchange data. Formal documentation of all endpoints is available in Swagger 1.2 and OpenAPI formats. An interactive API browser is available as well.

Tutorials

Guides

The Agave REST APIs enable applications to create and manage digital laboratories that span campuses, the cloud, and multiple data centers using a cohesive set of web-friendly interfaces.

Authorization

  /$$$$$$   /$$$$$$              /$$     /$$      
 /$$__  $$ /$$__  $$            | $$    | $$      
| $$  \ $$| $$  \ $$ /$$   /$$ /$$$$$$  | $$$$$$$
| $$  | $$| $$$$$$$$| $$  | $$|_  $$_/  | $$__  $$
| $$  | $$| $$__  $$| $$  | $$  | $$    | $$  \ $$
| $$  | $$| $$  | $$| $$  | $$  | $$ /$$| $$  | $$
|  $$$$$$/| $$  | $$|  $$$$$$/  |  $$$$/| $$  | $$
 \______/ |__/  |__/ \______/    \___/  |__/  |__/                                                          

Most requests to the Agave REST APIs require authorization; that is, the user must have granted permission for an application to access the requested data. To prove that the user has granted permission, the request header sent by the application must include a valid access token.

Before you can begin the authorization process, you will need to register your client application. That will give you a unique client key and secret key to use in the authorization flows.

Supported Authorization Flows

The Agave REST APIs currently support four authorization flows:

  1. The Authorization Code flow first gets a code then exchanges it for an access token and a refresh token. Since the exchange uses your client secret key, you should make that request server-side to keep the integrity of the key. An advantage of this flow is that you can use refresh tokens to extend the validity of the access token.
  2. The Implicit Grant flow is carried out client-side and does not involve secret keys. The access tokens that are issued are short-lived and there are no refresh tokens to extend them when they expire.
  3. The Resource Owner Password Credentials flow is suitable for native and mobile applications as well as web services. It allows client applications to obtain an access token for a user by directly providing the user credentials in an authentication request. Because this flow exposes the user’s credentials to the client application, it is primarily used where the client application is highly trusted, such as at the command line.
  4. The Client Credentials flow enables users to interact with their own protected resources directly without requiring browser interaction. This is a critical addition for use at the command line, in scripts, and in offline programs. This flow assumes the person registering the client application and the user on whose behalf requests are made be the same person.
Flow Can fetch a user’s data by requesting access? Uses secret key? (key exchange must happen server-side!) Access token can be refreshed?
Authorization Code Yes Yes Yes
Implicit Grant Yes No No
Resource Owner Password Credentials Yes Yes Yes
Client Credentials No Yes No
Unauthorized No No No

Token lifetimes

There are two kinds of tokens you will obtain: access and refresh. Access token lifetimes are configured by the organization operating each tenant and vary based on the flow used to obtain them. By default, access tokens are valid for 4 hours.

Authorization Flow Access Token Lifetime Refresh Token Lifetime
Authorization Code 4 hours infinite
Implicit 1 hour n/a
Resource Owner Password Credentials 4 hours infinite
Client Credentials 4 hours n/a

Authorization Code

This method is suitable for long-running applications in which the user logs in once and the access token can be refreshed. Since the token exchange involves sending your secret key, it should happen in a secure location, like a backend service, not from a client such as a browser or mobile app. This flow is described in RFC-6749. It is also the authorization flow used in our REST API Tutorial.

Authorization Code Flow Diagram

1. Your application requests authorization

A typical request will look something like this

https://public.tenants.agaveapi.co/authorize/?client_id=gTgp...SV8a&response_type=code&redirect_uri=https%3A%2F%2Fexample.com%2Fcallback&scope=PRODUCTION&state=866

The authorization process starts with your application sending a request to the Agave Authorization service. (The reason your application sends this request can vary: it may be a step in the initialization of your application or in response to some user action, like a button click.) The request is sent to the /authorize endpoint of the Authorization service, as in the example above.

The request will include parameters in the query string:

Query parameter Value
response_type Required. As defined in the OAuth 2.0 specification, this field must contain the value “code”.
client_id Required. The application’s client ID, obtained when the client application was registered with Agave (see Client Registration).
redirect_uri Required. The URI to redirect to after the user grants/denies permission. This URI needs to have been entered in the Redirect URI whitelist that you specified when you registered your application. The value of redirect_uri here must exactly match one of the values you entered when you registered your application, including upper/lowercase, terminating slashes, etc.
scope Optional. A space-separated list of scopes. Currently only PRODUCTION is supported.
state Optional, but strongly recommended. The state can be useful for correlating requests and responses. Because your redirect_uri can be guessed, using a state value can increase your assurance that an incoming connection is the result of an authentication request. If you generate a random string or encode the hash of some client state (e.g., a cookie) in this state variable, you can validate the response to additionally ensure that the request and response originated in the same browser. This provides protection against attacks such as cross-site request forgery. See RFC-6749.
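Below is a minimal Python sketch of this step. It builds the /authorize URL with a random state value that your server should store (for example, in the user's session) and compare in step 3. The client ID is a placeholder and `authorization_url` is a hypothetical helper. The sketch uses `secrets` because the state must be unguessable.

```python
import secrets
from typing import Tuple
from urllib.parse import urlencode

AUTHORIZE_URL = "https://public.tenants.agaveapi.co/authorize/"

def authorization_url(client_id: str, redirect_uri: str,
                      scope: str = "PRODUCTION") -> Tuple[str, str]:
    """Return the /authorize URL and the state value to remember for step 3."""
    state = secrets.token_urlsafe(16)   # random, URL-safe CSRF token
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,   # must match a registered Redirect URI
        "scope": scope,
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}", state

url, state = authorization_url("your-client-id", "https://example.com/callback")
# Redirect the user's browser to `url`; keep `state` server-side.
```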

2. The user is asked to authorize access within the scopes

The Agave Authorization service presents details of the scopes for which access is being sought. If the user is not logged in, they are prompted to do so using their API username and password.

When the user is logged in, they are asked to authorize access to the actions and services defined in the scopes.

3. The user is redirected back to your specified URI

Let’s assume you provided the following callback URL.

https://example.com/callback

After the user accepts (or denies) your request, the Agave Authorization service redirects back to the redirect_uri. If the user has accepted your request, the response query string contains a code parameter with the authorization code you will use in the next step to retrieve an access token.

Sample success redirect back from the server

https://example.com/callback?code=Pq3S..M4sY&state=866
Query parameter Value
code An authorization code that can be exchanged for an access token.
state The value of the state parameter supplied in the request.

If the user has denied access, there will be no authorization code and the final URL will have a query string containing the following parameters:

# Sample denial redirect back from the server
https://example.com/callback?error=access_denied&state=866
Query parameter Value
error The reason authorization failed, for example: “access_denied”
state The value of the state parameter supplied in the request.
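When the browser returns to your redirect_uri, your handler should compare state with the value saved in step 1 before trusting the response. Here is a minimal Python sketch of that check. The `handle_callback` helper is hypothetical, and the exception types are arbitrary choices.

```python
from urllib.parse import parse_qs, urlparse

def handle_callback(callback_url: str, expected_state: str) -> str:
    """Return the authorization code from a redirect, or raise on failure."""
    params = {k: v[0] for k, v in parse_qs(urlparse(callback_url).query).items()}
    # Reject responses that don't echo the state saved in step 1 (CSRF check).
    if params.get("state") != expected_state:
        raise ValueError("state mismatch; discard this response")
    if "error" in params:
        raise PermissionError(f"authorization failed: {params['error']}")
    return params["code"]

print(handle_callback("https://example.com/callback?code=Pq3S..M4sY&state=866", "866"))
# Pq3S..M4sY
```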

4. Your application requests refresh and access tokens

POST https://public.tenants.agaveapi.co/token

When the authorization code has been received, you will need to exchange it for an access token by making a POST request to the Agave Authorization service, this time to its /token endpoint. The body of this POST request must contain the following parameters:

Request body parameter Value
grant_type Required. As defined in the OAuth 2.0 specification, this field must contain the value “authorization_code”.
code Required. The authorization code returned from the initial request to the /authorize endpoint.
redirect_uri Required. This parameter is used for validation only (there is no actual redirection). The value of this parameter must exactly match the value of redirect_uri supplied when requesting the authorization code.
client_id Required. The application’s client ID, obtained when the client application was registered with Agave (see Client Registration).
client_secret Required. The application’s client secret key, obtained when the client application was registered with Agave (see Client Registration).
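Because this exchange carries your client secret, it belongs in server-side code. The Python sketch below form-encodes the parameters above and prepares the POST with the standard library, but does not send it. The `token_request` helper is hypothetical and the values are placeholders.

```python
import urllib.request
from urllib.parse import urlencode

TOKEN_URL = "https://public.tenants.agaveapi.co/token"

def token_request(code: str, redirect_uri: str,
                  client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the authorization_code exchange POST for the /token endpoint."""
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,   # must match the value sent to /authorize
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

req = token_request("Pq3S..M4sY", "https://example.com/callback",
                    "your-client-id", "your-client-secret")
# urllib.request.urlopen(req) would send it; the JSON reply carries the tokens.
```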

5. The tokens are returned to your application

# An example cURL request
curl -X POST -d "grant_type=authorization_code" \
    -d "code=Pq3S..M4sY" \
    -d "client_id=gTgp...SV8a" \
    -d "client_secret=hZ_z3f...BOD6" \
    -d "redirect_uri=https%3A%2F%2Fexample.com%2Fcallback" \
    https://public.tenants.agaveapi.co/token

The response would look something like this:

{
    "access_token": "a742...12d2",
    "expires_in": 14400,
    "refresh_token": "d77c...Sacf",
    "token_type": "bearer"
}

On success, the response from the Agave Authorization service has the status code 200 OK in the response header, and a JSON object with the fields in the following table in the response body:

Key Value type Value description
access_token string An access token that can be provided in subsequent calls, for example to Agave REST APIs.
token_type string How the access token may be used: always “Bearer”.
expires_in int The time period (in seconds) for which the access token is valid. (Maximum 14400 seconds, or 4 hours.)
refresh_token string A token that can be sent to the Agave Authorization service in place of an authorization code. (When the access token expires, send a POST request to the /token endpoint, using this token in place of an authorization code. A new access token will be returned. A new refresh token might be returned too.)
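Since expires_in is relative, a client has to record when the token arrived to know when to refresh it. In this Python sketch, the `Token` class and the 60-second safety margin are assumptions, not part of the Agave API.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Token:
    access_token: str
    expires_in: int                       # seconds, from the /token response
    refresh_token: Optional[str] = None   # absent for some flows
    issued_at: float = field(default_factory=time.time)

    @classmethod
    def from_response(cls, data: dict) -> "Token":
        """Build from the parsed JSON body returned by /token."""
        return cls(data["access_token"], int(data["expires_in"]),
                   data.get("refresh_token"))

    def needs_refresh(self, now: Optional[float] = None, margin: float = 60.0) -> bool:
        """True once we are within `margin` seconds of expiry."""
        now = time.time() if now is None else now
        return now >= self.issued_at + self.expires_in - margin

token = Token.from_response({"access_token": "a742...12d2", "expires_in": 14400,
                             "refresh_token": "d77c...Sacf", "token_type": "bearer"})
print(token.needs_refresh())  # False right after issue
```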

6. Use the access token to access the Agave REST APIs

Make a call to the API

curl -H "Authorization: Bearer a742...12d2" \
    "https://public.tenants.agaveapi.co/profiles/v2/me?pretty=true&naked=true"

The response would look something like this:

{
    "create_time": "20140905072223Z",
    "email": "rjohnson@mlb.com",
    "first_name": "Randy",
    "full_name": "Randy Johnson",
    "last_name": "Johnson",
    "mobile_phone": "(123) 456-7890",
    "phone": "(123) 456-7890",
    "status": "Active",
    "uid": 0,
    "username": "rjohnson"
}

Once you have a valid access token, you can include it in the Authorization header of all subsequent requests to APIs in the Platform.

7. Requesting access token from refresh token

curl -sk -H "Authorization: Basic Qt3c...Rm1y=" \
    -d grant_type=refresh_token \
    -d refresh_token=d77c...Sacf \
    https://public.tenants.agaveapi.co/token

The response would look something like this.

{
    "access_token": "61e6...Mc96",
    "expires_in": 14400,
    "token_type": "bearer"
}

Access tokens are deliberately set to expire after a short time, usually 4 hours, after which new tokens may be granted by supplying the refresh token originally obtained during the authorization code exchange.

The request is sent to the token endpoint of the Agave Authorization service:

POST https://public.tenants.agaveapi.co/token

The body of this POST request must contain the following parameters:

Request body parameter Value
grant_type Required. Set it to “refresh_token”.
refresh_token Required. The refresh token returned from the authorization code exchange.

The header of this POST request must contain the following parameter:

Header parameter Value
Authorization Required. Base 64 encoded string that contains the client ID and client secret key. The field must have the format: Authorization: Basic <base64 encoded client_id:client_secret>. (This can also be achieved with curl using the `-u` option and specifying the raw colon-separated client_id and client_secret.)
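The Basic header value is the base64 encoding of `client_id:client_secret`, which is what curl's `-u` option builds for you. The Python sketch below shows the header and the refresh request body. Both helper names are hypothetical. The printed value comes from the standard example in RFC 7617, not from real Agave credentials.

```python
import base64
from urllib.parse import urlencode

def basic_auth_header(client_id: str, client_secret: str) -> str:
    """Value for the Authorization header: 'Basic ' + base64(id:secret)."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

def refresh_body(refresh_token: str) -> bytes:
    """Form-encoded body for the refresh_token grant."""
    return urlencode({"grant_type": "refresh_token",
                      "refresh_token": refresh_token}).encode("ascii")

print(basic_auth_header("Aladdin", "open sesame"))  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```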

Implicit Grant

The Implicit Grant flow is for clients that are implemented entirely in JavaScript and run in the resource owner’s browser. You do not need any server-side code to use it. This flow is described in RFC-6749.

Implicit Flow

1. Your application requests authorization

https://public.tenants.agaveapi.co/authorize?client_id=gTgp...SV8a&redirect_uri=https%3A%2F%2Fexample.com%2Fcallback&scope=PRODUCTION&response_type=token&state=867

The flow starts off with your application redirecting the user to the /authorize endpoint of the Authorization service. The request will include parameters in the query string:

Query parameter Value
response_type Required. As defined in the OAuth 2.0 specification, this field must contain the value “token”.
client_id Required. The application’s client ID, obtained when the client application was registered with Agave (see Client Registration).
redirect_uri Required. The URI to redirect to after the user grants or denies permission. The value must exactly match one of the Redirect URIs you entered when you registered your application, including upper/lowercase, terminating slashes, etc.
scope Required. A space-separated list of scopes. Currently only PRODUCTION is supported.
state Optional, but strongly recommended. The state can be useful for correlating requests and responses. Because your redirect_uri can be guessed, using a state value can increase your assurance that an incoming connection is the result of an authentication request. If you generate a random string or encode the hash of some client state (e.g., a cookie) in this state variable, you can validate the response to additionally ensure that the request and response originated in the same browser. This provides protection against attacks such as cross-site request forgery. See RFC-6749.
show_dialog Optional. Whether or not to force the user to approve the app again if they’ve already done so. If false (default), a user who has already approved the application may be automatically redirected to the URI specified by redirect_uri. If true, the user will not be automatically redirected and will have to approve the app again.

2. The user is asked to authorize access within the scopes

The Agave Authorization service presents details of the scopes for which access is being sought. If the user is not logged in, they are prompted to do so using their API username and password.

When the user is logged in, they are asked to authorize access to the services defined in the scopes. By default, all of the Core Science APIs fall under a single scope called PRODUCTION.

3. The user is redirected back to your specified URI

Let’s assume we specified the following callback address.

https://example.com/callback

A valid success response would be

https://example.com/callback?access_token=Vr17...amUa&token_type=bearer&expires_in=3600&state=867

After the user grants (or denies) access, the Agave Authorization service redirects the user to the redirect_uri. If the user has granted access, the final URL will contain the following data parameters in the query string.

Query parameter Value
access_token An access token that can be provided in subsequent calls, for example to the Agave REST APIs.
token_type Value: “bearer”
expires_in The time period (in seconds) for which the access token is valid.
state The value of the state parameter supplied in the request.

If the user has denied access, there will be no access token and the final URL will have a query string containing the following parameters:

A failed response would resemble something like

https://example.com/callback?error=access_denied&state=867
Query parameter Value
error The reason authorization failed, for example: “access_denied”
state The value of the state parameter supplied in the request.

4. Use the access token to access the Agave REST APIs

curl -H "Authorization: Bearer Vr17...amUa" "https://public.tenants.agaveapi.co/profiles/v2/me?pretty=true"

The response would look something like this:

{
    "create_time": "20140905072223Z",
    "email": "nryan@mlb.com",
    "first_name": "Nolan",
    "full_name": "Nolan Ryan",
    "last_name": "Ryan",
    "mobile_phone": "(123) 456-7890",
    "phone": "(123) 456-7890",
    "status": "Active",
    "uid": 0,
    "username": "nryan"
}

The access token allows you to make requests to any of the Agave REST APIs on behalf of the authenticated user.

Resource Owner Password Credentials

This method is suitable for scenarios where there is a high degree of trust between the end user and the client application. This could be a desktop application, shell script, or server-to-server communication where user authorization is needed. This flow is described in RFC-6749.

1. Your application requests authorization

curl -sk -H "Authorization: Basic Qt3c...Rm1y=" \
    -d grant_type=password \
    -d username=rjohnson \
    -d password=password \
    -d scope=PRODUCTION \
    https://public.tenants.agaveapi.co/token

The response would look something like this:

{
    "access_token": "3Dsr...pv21",
    "expires_in": 14400,
    "refresh_token": "dyVa...MqR0",
    "token_type": "bearer"
}

The request is sent to the /token endpoint of the Agave Authentication service. The request will include the following parameters in the request body:

Request body parameter Value
grant_type Required. Set it to “password”.
username Required. The username of an active API user.
password Required. The password of an active API user.
scope Required. A space-separated list of scopes. Currently only PRODUCTION is supported.

The header of this POST request must contain the following parameter:

Header parameter Value
Authorization Required. Base 64 encoded string that contains the client ID and client secret key. The field must have the format: Authorization: Basic <base64 encoded client_id:client_secret>. (This can also be achieved with curl using the `-u` option and specifying the raw colon-separated client_id and client_secret.)
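The header value is the Base64 encoding of `client_id:client_secret` prefixed with `Basic `, which is the same value curl's `-u` option generates. Here is a minimal Python sketch that builds it, plus a decoder that is handy for checking a header by hand; both helper names are hypothetical.

```python
import base64


def basic_auth_header(client_key, client_secret):
    """Build the value of the Authorization header sent to /token."""
    raw = f"{client_key}:{client_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header_value):
    """Reverse the encoding to see which key and secret a header carries."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic credential")
    # Split on the first colon only; the key itself must not contain one.
    key, _, secret = base64.b64decode(encoded).decode("utf-8").partition(":")
    return key, secret
```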

2. Use the access token to access the Agave REST APIs

curl -H "Authorization: Bearer 3Dsr...pv21" \
    "https://public.tenants.agaveapi.co/profiles/v2/me?pretty=true"

The response would look something like this:

{
    "create_time": "20140905072223Z",
    "email": "rjohnson@mlb.com",
    "first_name": "Randy",
    "full_name": "Randy Johnson",
    "last_name": "Johnson",
    "mobile_phone": "(123) 456-7890",
    "phone": "(123) 456-7890",
    "status": "Active",
    "uid": 0,
    "username": "rjohnson"
}

The access token allows you to make requests to any of the Agave REST APIs on behalf of the authenticated user.

3. Request a new access token using the refresh token

curl -sk -H "Authorization: Basic Qt3c...Rm1y=" \
    -d grant_type=refresh_token \
    -d refresh_token=dyVa...MqR0 \
    -d scope=PRODUCTION \
    https://public.tenants.agaveapi.co/token

The response would look something like this:

{
    "access_token": "8erF...NGly",
    "expires_in": 14400,
    "token_type": "bearer"
}

Access tokens are deliberately set to expire after a short time, usually 4 hours (the expires_in value of 14400 seconds in the sample responses), after which a new token may be obtained by supplying the refresh token returned with the original request.

The request is sent to the token endpoint of the Agave Authorization service. The body of this POST request must contain the following parameters:

Request body parameter Value
grant_type Required. Set it to “refresh_token”.
refresh_token Required. The refresh token returned with the original access token.
scope Required. A space-separated list of scopes. Currently only PRODUCTION is supported.

The header of this POST request must contain the following parameter:

Header parameter Value
Authorization Required. Base 64 encoded string that contains the client ID and client secret key. The field must have the format: Authorization: Basic <base64 encoded client_id:client_secret>. (This can also be achieved with curl using the `-u` option and specifying the raw colon-separated client_id and client_secret.)
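Because access tokens expire (14400 seconds in the samples above), a client usually records when a token was issued and refreshes it shortly before expiry. Below is a minimal Python sketch. The class name, the five-minute safety margin, and keeping the previous refresh token when a refresh response omits one (as the sample refresh response above does) are all assumptions, not documented Agave behavior.

```python
import time
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlencode


@dataclass
class AgaveToken:
    """Hypothetical container for a /token response plus its issue time."""
    access_token: str
    expires_in: int
    refresh_token: Optional[str] = None
    issued_at: float = field(default_factory=time.time)

    def needs_refresh(self, now=None, margin=300):
        """True once we are within `margin` seconds (assumed 5 min) of expiry."""
        now = time.time() if now is None else now
        return now >= self.issued_at + self.expires_in - margin

    def refresh_form(self, scope="PRODUCTION"):
        """Form body for the refresh_token grant described in the table above."""
        if not self.refresh_token:
            raise ValueError("no refresh token available")
        return urlencode({
            "grant_type": "refresh_token",
            "refresh_token": self.refresh_token,
            "scope": scope,
        })

    def updated(self, response, now=None):
        """Apply a refresh response, keeping the old refresh token if none is returned."""
        return AgaveToken(
            access_token=response["access_token"],
            expires_in=int(response["expires_in"]),
            refresh_token=response.get("refresh_token", self.refresh_token),
            issued_at=time.time() if now is None else now,
        )
```

Refreshing a little early avoids sending a token that expires while the request is in flight.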

Client Credentials

This method is suitable for authenticating requests that your own client application makes to the Agave REST APIs. This flow is described in RFC-6749.

1. Your application requests authorization

curl -sk -H "Authorization: Basic Qt3c...Rm1y=" \
    -d grant_type=client_credentials \
    -d scope=PRODUCTION \
    https://public.tenants.agaveapi.co/token

The response would look something like this:

{
    "access_token": "61e6...Mc96",
    "expires_in": 14400,
    "token_type": "bearer"
}

The request is sent to the /token endpoint of the Agave Authentication service. The request must include the following parameters in the request body:

Request body parameter Value
grant_type Required. Set it to “client_credentials”.
scope Optional. A space-separated list of scopes. Currently only PRODUCTION is supported.

The header of this POST request must contain the following parameter:

Header parameter Value
Authorization Required. Base 64 encoded string that contains the client ID and client secret key. The field must have the format: Authorization: Basic <base64 encoded client_id:client_secret>. (This can also be achieved with curl using the `-u` option and specifying the raw colon-separated client_id and client_secret.)
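The password, refresh token, and client credentials grants all send the same Basic Authorization header. They differ only in the form body posted to /token. The following minimal Python sketch checks the required body parameters listed in the tables of this section and then builds the body. The helper name and the table-driven design are illustrative only, and the code sends no request.

```python
from urllib.parse import urlencode

# Required body parameters per grant type, taken from the tables in this
# section (scope is optional for client_credentials).
REQUIRED_PARAMS = {
    "password": ("username", "password", "scope"),
    "refresh_token": ("refresh_token", "scope"),
    "client_credentials": (),
}


def token_request_body(grant_type, **params):
    """Return an application/x-www-form-urlencoded body for POST /token."""
    if grant_type not in REQUIRED_PARAMS:
        raise ValueError(f"unsupported grant_type: {grant_type}")
    missing = [p for p in REQUIRED_PARAMS[grant_type] if not params.get(p)]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    return urlencode({"grant_type": grant_type, **params})
```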

2. Use the access token to access the Agave REST APIs

curl -H "Authorization: Bearer 61e6...Mc96" \
     https://public.tenants.agaveapi.co/profiles/v2/me

The response would look something like this:

{
    "email": "nryan@mlb.com",
    "firstName" : "Nolan",
    "lastName" : "Ryan",
    "position" : "null",
    "institution" : "Houston Astros",
    "phone": "(123) 456-7890",
    "fax" : null,
    "researchArea" : null,
    "department" : null,
    "city" : "Houston",
    "state" : "TX",
    "country" : "USA",
    "gender" : "M",
    "_links" : {
      "self" : {
        "href" : "https://public.tenants.agaveapi.co/profiles/v2/nryan"
      },
      "users" : {
        "href" : "https://public.tenants.agaveapi.co/profiles/v2/nryan/users"
      }
    }
}

The access token allows you to make requests to any of the Agave REST APIs on behalf of the authenticated user.

Clients and API Keys

  /$$$$$$  /$$ /$$                       /$$
 /$$__  $$| $$|__/                      | $$
| $$  \__/| $$ /$$  /$$$$$$  /$$$$$$$  /$$$$$$   /$$$$$$$
| $$      | $$| $$ /$$__  $$| $$__  $$|_  $$_/  /$$_____/
| $$      | $$| $$| $$$$$$$$| $$  \ $$  | $$   |  $$$$$$
| $$    $$| $$| $$| $$_____/| $$  | $$  | $$ /$$\____  $$
|  $$$$$$/| $$| $$|  $$$$$$$| $$  | $$  |  $$$$//$$$$$$$/
 \______/ |__/|__/ \_______/|__/  |__/   \___/ |_______/

By now you already have a user account. Your user account identifies you to the web applications you interact with. A username and password are sufficient for interacting with an application that has a user interface, because the application knows that the authenticated user is the same one interacting with it. The Agave API does not have a user interface, so simply providing it a username and password is not sufficient. Agave needs to know both the user on whose behalf it is acting and the client application that is making the call. Whereas every person has a single user account, they may use multiple services to do their daily work. They may start out using Agave ToGo to kick off an analysis, then switch to MyPlant to discuss some results, then receive a Slack notice that new data has been shared with them, click a PostIt link that allows them to download the data directly to their desktop, edit the file locally, and save it in a local folder that syncs with their iPlant cloud storage in the background.

In each of the above interactions, the user is the same, but the context in which they interact with Agave is different. Further, the above interactions all involved client applications developed by the same organization. The situation is further complicated when one or more third-party client applications are used to leverage the infrastructure. Agave needs to track both the users and the client applications with which it interacts. It does this through the issuance of API keys.

Agave uses OAuth2 to authenticate users and make authorization decisions about which APIs client applications have permission to access. A discussion of OAuth2 is beyond the scope of this tutorial. You can read more about it on the OAuth2 website or on the websites of the many other service providers using it today. In this section, we will walk you through getting your API keys so we can stay focused on learning how to interact with Agave’s APIs.

Creating a new client application

In order to interact with any of the Agave APIs, you will need to first get a set of API keys. You can get your API keys from the Clients service. The example below shows how to get your API keys using both curl and the Agave CLI.

curl -sku "$API_USERNAME:$API_PASSWORD" -X POST -d "client_name=my_cli_app" -d "description=Client app used for scripting up cool stuff" https://public.tenants.agaveapi.co/clients/v2
clients-create -S -v -N my_cli_app -D "Client app used for scripting up cool stuff"

Note: the -S option will store the new API keys for future use so you don’t need to manually enter them when you authenticate later.

The response to this call will look something like:

{  
   "callbackUrl":"",
   "key":"gTgp...SV8a",
   "secret":"hZ_z3f...BOD6",
   "description":"Client app used for scripting up cool stuff",
   "name":"my_cli_app",
   "tier":"Unlimited",
   "_links":{  
      "self":{  
         "href":"https://public.tenants.agaveapi.co/clients/v2/my_cli_app"
      },
      "subscriber":{  
         "href":"https://public.tenants.agaveapi.co/profiles/v2/nryan"
      },
      "subscriptions":{  
         "href":"https://public.tenants.agaveapi.co/clients/v2/my_cli_app/subscriptions/"
      }
   }
}

Your API keys should be kept in a secure place and not shared with others. This will prevent other, unauthorized client applications from impersonating your application. If you are developing a web application, you should also provide a valid callbackUrl when creating your keys. This will reduce the risk of your keys being reused even if they are compromised. You should also create a unique set of API keys for each client application you develop. This will allow you to better monitor your usage on a client application-to-application basis and reduce the possibility of inadvertently hitting usage quotas due to cumulative usage across client applications.
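One way to keep the key and secret out of shared places is to write them to a file that only your user can read. The following minimal Python sketch assumes a POSIX file system. The helper name and file layout are hypothetical and have no relation to how the CLI's -S option stores keys.

```python
import json
import os


def save_client_keys(response_text, path):
    """Store the key and secret from a client-creation response.

    Hypothetical helper and file layout, for illustration only.
    """
    client = json.loads(response_text)
    creds = {"name": client["name"], "key": client["key"], "secret": client["secret"]}
    # Create the file readable and writable by the owner only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(creds, fh)
    # os.open applies the mode only when it creates the file; tighten an
    # existing file as well.
    os.chmod(path, 0o600)
    return creds
```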

Listing your existing client applications

curl -sku "$API_USERNAME:$API_PASSWORD" https://public.tenants.agaveapi.co/clients/v2
clients-list -v

The response to this call will look something like:

[  
   {  
      "callbackUrl":"",
      "key":"xn8b...0y3d",
      "description":"",
      "name":"DefaultApplication",
      "tier":"Unlimited",
      "_links":{  
         "self":{  
            "href":"https://public.tenants.agaveapi.co/clients/v2/DefaultApplication"
         },
         "subscriber":{  
            "href":"https://public.tenants.agaveapi.co/profiles/v2/nryan"
         },
         "subscriptions":{  
            "href":"https://public.tenants.agaveapi.co/clients/v2/DefaultApplication/subscriptions/"
         }
      }
   },
   {  
      "callbackUrl":"",
      "key":"gTgp...SV8a",
      "description":"Client app used for scripting up cool stuff",
      "name":"my_cli_app",
      "tier":"Unlimited",
      "_links":{  
         "self":{  
            "href":"https://public.tenants.agaveapi.co/clients/v2/my_cli_app"
         },
         "subscriber":{  
            "href":"https://public.tenants.agaveapi.co/profiles/v2/nryan"
         },
         "subscriptions":{  
            "href":"https://public.tenants.agaveapi.co/clients/v2/my_cli_app/subscriptions/"
         }
      }
   }
]

Over time you may develop several client applications. Managing several sets of API keys can become tricky. You can see which applications you have created by querying the Clients service.
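Before it registers a new client, a script can check whether a client with the same name already exists. Here is a minimal Python sketch over the listing JSON shown above; the helper name is hypothetical. The sample listing includes each client's key but not its secret, so the secret has to come from the response you received when the client was created.

```python
import json


def find_client(list_response_text, name):
    """Return name, key, and self link for a registered client, or None."""
    for client in json.loads(list_response_text):
        if client.get("name") == name:
            return {
                "name": client["name"],
                "key": client.get("key"),
                "self": client.get("_links", {}).get("self", {}).get("href"),
            }
    return None
```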

Deleting client registrations

curl -sku "$API_USERNAME:$API_PASSWORD" -X DELETE https://public.tenants.agaveapi.co/clients/v2/my_cli_app
clients-delete -v my_cli_app

The response to this call is simply a null result object.

At some point you may need to delete a client. You can do this by requesting a DELETE on your client in the Clients service.

Listing current subscriptions

curl -sku "$API_USERNAME:$API_PASSWORD" https://public.tenants.agaveapi.co/clients/v2/my_cli_app/subscriptions
clients-subscriptions-list -v my_cli_app

The response to this call will look something like:

[
  {
     "context":"/apps",
     "name":"Apps",
     "provider":"admin",
     "status":"PUBLISHED",
     "version":"v2",
     "tier":"Unlimited",
     "_links":{
        "api":{
           "href":"https://public.tenants.agaveapi.co/apps/v2/"
        },
        "client":{
           "href":"https://public.tenants.agaveapi.co/clients/v2/my_cli_app"
        },
        "self":{
           "href":"https://public.tenants.agaveapi.co/clients/v2/my_cli_app/subscriptions/"
        }
     }   
  },
  {
     "context":"/files",
     "name":"Files",
     "provider":"admin",
     "status":"PUBLISHED",
     "version":"v2",
     "tier":"Unlimited",
     "_links":{
        "api":{
           "href":"https://public.tenants.agaveapi.co/files/v2/"
        },
        "client":{
           "href":"https://public.tenants.agaveapi.co/clients/v2/my_cli_app"
        },
        "self":{
           "href":"https://public.tenants.agaveapi.co/clients/v2/my_cli_app/subscriptions/"
        }
     }
  },
  ...
]

When you register a new client application and get your API keys, you are given access to all the Agave APIs by default. You can see the APIs you have access to by querying the subscriptions collection of your client.

Updating client subscriptions

curl -sku "$API_USERNAME:$API_PASSWORD" -X POST -d "name=transforms" https://public.tenants.agaveapi.co/clients/v2/my_cli_app/subscriptions
clients-subscriptions-update -v -N transforms my_cli_app

You can also use a wildcard to resubscribe to all active APIs.

curl -sku "$API_USERNAME:$API_PASSWORD" -X POST -d "name=*" https://public.tenants.agaveapi.co/clients/v2/my_cli_app/subscriptions
clients-subscriptions-update -v -N '*' my_cli_app

The response to this call will be a JSON array identical to the one returned when listing your subscriptions.

Over time, new APIs will be deployed. When this happens you will need to subscribe to the new APIs. You can do this by POSTing a request to the subscriptions collection with the name of the new API.
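To decide which of these POSTs to make, a script can compare the subscriptions listing against the APIs it needs. The following is a minimal Python sketch with hypothetical helper names; the `wanted` list is something you supply. The sample listing shows capitalized names such as "Apps" while the POST example uses lowercase "transforms", so the case-insensitive comparison here is an assumption.

```python
import json
from urllib.parse import urlencode


def missing_subscriptions(subscriptions_text, wanted):
    """Names from `wanted` with no matching subscription (case-insensitive)."""
    subscribed = {sub["name"].lower() for sub in json.loads(subscriptions_text)}
    return [name for name in wanted if name.lower() not in subscribed]


def subscription_bodies(names):
    """One form body per API, matching the -d "name=..." curl examples above."""
    return [urlencode({"name": name}) for name in names]
```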

Systems

  /$$$$$$                     /$$
 /$$__  $$                   | $$
| $$  \__//$$   /$$ /$$$$$$$/$$$$$$   /$$$$$$ /$$$$$$/$$$$
|  $$$$$$| $$  | $$/$$_____|_  $$_/  /$$__  $| $$_  $$_  $$
 \____  $| $$  | $|  $$$$$$  | $$   | $$$$$$$| $$ \ $$ \ $$
 /$$  \ $| $$  | $$\____  $$ | $$ /$| $$_____| $$ | $$ | $$
|  $$$$$$|  $$$$$$$/$$$$$$$/ |  $$$$|  $$$$$$| $$ | $$ | $$
 \______/ \____  $|_______/   \___/  \_______|__/ |__/ |__/
          /$$  | $$
         |  $$$$$$/
          \______/

A system in Agave represents a server or collection of servers. A server can be physical, virtual, or a collection of servers exposed through a single hostname or IP address. Systems are identified and referenced in Agave by a unique ID unrelated to their IP address or hostname. Because of this, a single physical system may be registered multiple times. This allows different users to configure and use a system in whatever way suits their specific needs.

Systems come in two flavors: storage and execution. Storage systems are only used for storing and interacting with data. Execution systems are used for running apps (aka jobs or batch jobs) as well as storing and interacting with data.

The Systems service gives you the ability to add and discover storage and compute resources for use in the rest of the API. You may add as many or as few storage systems as you need to power your digital lab. When you register a system, it is private to you and you alone. Systems can also be published into the public space for all users to use. Depending on who is administering Agave for your organization, this may have already happened and you may already have one or more storage systems available to you by default.

In this tutorial we walk you through how to discover, manage, share, and configure systems for your specific needs. This tutorial is best done hands-on, so if you do not have a compute or storage system of your own to use, you can grab a VM from our sandbox.

Discovering systems

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://public.tenants.agaveapi.co/systems/v2/
systems-list -v

The response will be something like this:

[
  {
    "id" : "data.agaveapi.co",
    "name" : "iPlant Data Store",
    "type" : "STORAGE",
    "description" : "The iPlant Data Store is where your data are stored. The Data Store is cloud-based and is the central repository from which data is accessed by all of iPlant's technologies.",
    "status" : "UP",
    "public" : true,
    "default" : true,
    "_links" : {
      "self" : {
        "href" : "https://public.tenants.agaveapi.co/systems/v2/data.agaveapi.co"
      }
    }
  },
  {
    "id" : "docker.iplantcollaborative.org",
    "name" : "Demo Docker VM",
    "type" : "EXECUTION",
    "description" : "Atmosphere VM used for Docker demonstrations and tutorials.",
    "status" : "UP",
    "public" : true,
    "default" : false,
    "_links" : {
      "self" : {
        "href" : "https://public.tenants.agaveapi.co/systems/v2/docker.iplantcollaborative.org"
      }
    }
  }
]

The Systems service allows you to list and search for systems you have registered and systems that have been shared with you. To get a list of all your systems, make a GET request on the Systems collection.

System descriptions can get rather verbose, so a summary object is returned when listing a resource collection. The summary object contains only the most critical fields, which reduces the response size when retrieving a user’s systems. You can customize this behavior using the filter query parameter.

Filtering results

Only storage systems

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" "https://public.tenants.agaveapi.co/systems/v2/?type=storage"
systems-list -v -S

Only execution systems

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" "https://public.tenants.agaveapi.co/systems/v2/?type=execution"
systems-list -v -E

Only public systems

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" "https://public.tenants.agaveapi.co/systems/v2/?publicOnly=true"
systems-list -v -P

Only private systems

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" "https://public.tenants.agaveapi.co/systems/v2/?privateOnly=true"
systems-list -v -Q

Only return default systems

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" "https://public.tenants.agaveapi.co/systems/v2/?default=true"
systems-list -v -D

You can further filter the results by type, scope, and default status. See the search section for further filtering options.
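Each of the filters above maps directly onto a query parameter. The following minimal Python sketch only builds the URL; the helper name is hypothetical. The examples above use one filter at a time, so combining several filters in one request, and rejecting publicOnly together with privateOnly, are assumptions of this sketch.

```python
from urllib.parse import urlencode

SYSTEMS_URL = "https://public.tenants.agaveapi.co/systems/v2/"


def systems_list_url(system_type=None, public_only=False, private_only=False, default_only=False):
    """Build a Systems listing URL from the filters shown above."""
    if public_only and private_only:
        raise ValueError("publicOnly and privateOnly are mutually exclusive")
    params = {}
    if system_type:
        params["type"] = system_type.lower()  # "storage" or "execution"
    if public_only:
        params["publicOnly"] = "true"
    if private_only:
        params["privateOnly"] = "true"
    if default_only:
        params["default"] = "true"
    return SYSTEMS_URL + ("?" + urlencode(params) if params else "")
```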

System details

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://public.tenants.agaveapi.co/systems/v2/data.agaveapi.co
systems-list -v data.agaveapi.co

The response will be something like this:

{
  "site": "agaveapi.co",
  "id": "data.agaveapi.co",
  "revision": 4,
  "default": true,
  "lastModified": "2016-09-30T21:43:11.000-05:00",
  "status": "UP",
  "description": "Cloud storage system for the Agave Public tenant",
  "name": "Agave Cloud Storage",
  "owner": "dooley",
  "_links": {
    "roles": {
      "href": "https://public.agaveapi.co/systems/v2/data.agaveapi.co/roles"
    },
    "credentials": {
      "href": "https://public.agaveapi.co/systems/v2/data.agaveapi.co/credentials"
    },
    "self": {
      "href": "https://public.agaveapi.co/systems/v2/data.agaveapi.co"
    },
    "metadata": {
      "href": "https://public.agaveapi.co/meta/v2/data/?q=%7B%22associationIds%22%3A%224602981590618992154-242ac116-0001-006%22%7D"
    }
  },
  "globalDefault": true,
  "available": true,
  "uuid": "4602981590618992154-242ac116-0001-006",
  "public": true,
  "type": "STORAGE",
  "storage": {
    "mirror": false,
    "port": 22,
    "homeDir": "/home",
    "protocol": "SFTP",
    "host": "corral.tacc.utexas.edu",
    "publicAppsDir": "/apps",
    "proxy": null,
    "rootDir": "/gpfs/corral3/repl/projects/agave/root",
    "auth": {
      "type": "SSHKEYS"
    }
  }
}

To query for detailed information about a specific system, add the system id to the url and make another GET request.

This time, the response will be a JSON object with a full system description. The following is the description of a storage system. In the next section we talk more about storage systems and how to register one of your own.

Storage systems

A storage system can be thought of as an individual data repository that you want to access through Agave. The following JSON object shows how a basic storage system is described.

{
   "id":"sftp.storage.example.com",
   "name":"Example SFTP Storage System",
   "type":"STORAGE",
   "description":"My example storage system using SFTP to store data for testing",
   "storage":{
      "host":"storage.example.com",
      "port":22,
      "protocol":"SFTP",
      "rootDir":"/",
      "homeDir":"/home/systest",
      "auth":{
         "username":"systest",
         "password":"changeit",
         "type":"PASSWORD"
      }
   }
}

The first four attributes are common to both storage and execution systems. The storage attribute describes the connectivity and authentication information needed to connect to the remote system. Here we describe an SFTP server accessible on port 22 at host storage.example.com. We specify that we want the rootDir, or virtual system root exposed through Agave, to be the system’s physical root directory, and we want the authenticated user’s home directory to be the homeDir, or virtual home directory and base of all relative paths given to Agave. Finally, we tell Agave to use password-based authentication and provide the necessary credentials.

The full list of storage system attributes is described in the following table.

Attribute Type Description
available boolean Whether the system is currently available for use in the API. Unavailable systems will not be visible to anyone but the owner. This differs from the status attribute in that a system may be UP, but not available for use in Agave. Defaults to true
description string Verbose description of this system.
id string Required: A unique identifier you assign to the system. A system id must be globally unique across a tenant and cannot be reused once deleted.
name string Required: Common display name for this system.
site string The site associated with this system. Primarily for logical grouping.
status UP, DOWN, MAINTENANCE, UNKNOWN The functional status of the system. Systems must be in UP status to be used.
storage JSON Object Required: Storage configuration describing the storage config defining how to connect to this system for data staging.
type STORAGE, EXECUTION Required: Must be STORAGE.

Supported data and authentication protocols

The example above described a system accessible by SFTP. Agave supports many different data and authentication protocols for interacting with your data. Sample configurations for many protocol combinations are given below.

Sample storage system definition with each supported data protocol and authentication configuration.

{
   "id":"sftp.storage.example.com",
   "name":"Example SFTP Storage System",
   "status":"UP",
   "type":"STORAGE",
   "description":"My example storage system using SFTP to store data for testing",
   "site":"example.com",
   "storage":{
      "host":"storage.example.com",
      "port":22,
      "protocol":"SFTP",
      "rootDir":"/",
      "homeDir":"/home/systest",
      "auth":{
         "username":"systest",
         "password":"changeit",
         "type":"PASSWORD"
      }
   }
}

In each of the examples above, the storage objects were slightly different, each unique to the protocol used. Descriptions of every attribute in the storage object and its children are given in the following tables.

storage attributes give basic connectivity information describing things like how to connect to the system and on what port.

Attribute Type Description
auth JSON object Required: A JSON object describing the default authentication credential for this system.
container string The container to use when interacting with an object store. Specifying a container provides isolation when exposing your cloud storage accounts so users do not have access to your entire storage account. This should be used in combination with delegated cloud credentials such as an AWS IAM user credential.
homeDir string The path on the remote system, relative to rootDir, to use as the virtual home directory for all API requests. This will be the base of any requested paths that do not begin with a ’/’. Defaults to ’/’, thus being equivalent to rootDir.
host string Required: The hostname or ip address of the storage server
port int Required: The port number of the storage server.
mirror boolean Whether the permissions set on the server should be pushed to the storage system itself. Currently, this only applies to IRODS systems.
protocol FTP, GRIDFTP, IRODS, IRODS4, LOCAL, S3, SFTP Required: The data protocol used to connect to the storage server.
publicAppsDir string The path on the remote system where apps will be stored if this system is used as the default public storage system.
proxy JSON Object The proxy server through which Agave will tunnel when submitting jobs. Currently proxy servers will use the same authentication mechanism as the target server.
resource string The name of the default resource to use when defining an IRODS system.
rootDir string The path on the remote system to use as the virtual root directory for all API requests. Defaults to ’/’.
zone string The name of the default zone to use when defining an IRODS system.
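The rootDir and homeDir rules above determine where a path given to Agave ends up on the remote file system: absolute paths start at rootDir, relative paths start at homeDir, and homeDir is itself relative to rootDir. The following is a minimal Python sketch of that mapping; the helper name is hypothetical. Refusing to let ".." climb above rootDir is a guard the sketch adds on its own, not documented service behavior.

```python
import posixpath


def resolve_physical_path(root_dir, home_dir, path):
    """Map a path given to Agave onto the remote file system."""
    if path.startswith("/"):
        # Absolute paths are relative to the virtual root.
        virtual = path
    else:
        # Relative paths start at the virtual home directory.
        virtual = posixpath.join("/", home_dir.lstrip("/"), path)
    # Normalizing an absolute path drops any ".." above "/", so the result
    # cannot climb out of the virtual root.
    virtual = posixpath.normpath(virtual)
    physical = posixpath.join(root_dir, virtual.lstrip("/"))
    return posixpath.normpath(physical)
```

With the SFTP example above (rootDir "/", homeDir "/home/systest"), a relative path such as data/file.txt resolves to /home/systest/data/file.txt.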

storage.auth attributes give authentication information describing how to authenticate to the system specified in the storage config above.

Attribute Type Description
credential string The credential used to authenticate to the remote system. Depending on the authentication protocol of the remote system, this could be an OAuth token or an X.509 certificate.
internalUsername string The username of the internal user associated with this credential.
password string The password on the remote system used to authenticate.
privateKey string The private ssh key used to authenticate to the remote system.
publicKey string The public ssh key used to authenticate to the remote system.
server JSON object A JSON object describing the authentication server from which a valid credential may be obtained. Currently only auth type X509 supports this attribute.
type APIKEYS, LOCAL, PAM, PASSWORD, SSHKEYS, or X509 Required: The authentication mechanism used to authenticate to the remote system.
username string The remote username used to authenticate.

storage.auth.server attributes give information about how to obtain a credential that can be used in the authentication process. Currently only systems using the X509 authentication can leverage this feature to communicate with MyProxy and MyProxy Gateway servers.

Attribute Type Description
name string A descriptive name given to the credential server
endpoint string Required: The endpoint of the authentication server.
port integer Required: The port on which to connect to the server.
protocol MPG, MYPROXY Required: The protocol with which to obtain an authentication credential.

storage.proxy configuration attributes give information about how to connect to a remote system through a proxy server. This is often needed when the target system is behind a firewall or a NAT. Currently proxy servers can only reuse the authentication configuration provided by the target system.

Attribute Type Description
name string Required: A descriptive name given to the proxy server.
host string Required: The hostname of the proxy server.
port integer Required: The port on which to connect to the proxy server. If null, the port in the parent storage config is used.
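The required attributes and allowed values in the tables above can be checked locally before you upload a description. The following minimal Python sketch covers only the constraints those tables state. It is a hypothetical client-side pre-check, and the Systems service remains the authority on what it accepts. The proxy port is not checked because the table says a null port falls back to the parent storage port.

```python
PROTOCOLS = {"FTP", "GRIDFTP", "IRODS", "IRODS4", "LOCAL", "S3", "SFTP"}
AUTH_TYPES = {"APIKEYS", "LOCAL", "PAM", "PASSWORD", "SSHKEYS", "X509"}


def validate_storage_system(desc):
    """Return a list of problems with a storage system description (empty if none)."""
    errors = []
    for attr in ("id", "name", "type", "storage"):
        if attr not in desc:
            errors.append(f"{attr} is required")
    if desc.get("type", "STORAGE") != "STORAGE":
        errors.append("type must be STORAGE")

    storage = desc.get("storage") or {}
    for attr in ("host", "port", "protocol", "auth"):
        if attr not in storage:
            errors.append(f"storage.{attr} is required")
    port = storage.get("port")
    if "port" in storage and (isinstance(port, bool) or not isinstance(port, int)):
        errors.append("storage.port must be an integer")
    if "protocol" in storage and storage["protocol"] not in PROTOCOLS:
        errors.append(f"unsupported storage.protocol: {storage['protocol']}")

    auth = storage.get("auth") or {}
    if "auth" in storage and auth.get("type") not in AUTH_TYPES:
        errors.append("storage.auth.type is missing or unsupported")
    # Only X509 authentication may name a credential server.
    if "server" in auth and auth.get("type") != "X509":
        errors.append("storage.auth.server is only supported with X509")

    proxy = storage.get("proxy")
    if proxy:
        for attr in ("name", "host"):
            if attr not in proxy:
                errors.append(f"storage.proxy.{attr} is required")
    return errors
```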

Creating a new storage system

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -F "fileToUpload=@sftp-password.json" https://public.tenants.agaveapi.co/systems/v2
systems-addupdate -v -F sftp-password.json

The response from the service will be similar to the following:

{
  "site": null,
  "id": "sftp.storage.example.com",
  "revision": 1,
  "default": false,
  "lastModified": "2016-09-06T17:46:42.621-05:00",
  "status": "UP",
  "description": "My example storage system using SFTP to store data for testing",
  "name": "Example SFTP Storage System",
  "owner": "nryan",
  "globalDefault": false,
  "available": true,
  "uuid": "4036169328045649434-242ac117-0001-006",
  "public": false,
  "type": "STORAGE",
  "storage": {
    "mirror": false,
    "port": 22,
    "homeDir": "/home/systest",
    "protocol": "SFTP",
    "host": "storage.example.com",
    "publicAppsDir": null,
    "proxy": null,
    "rootDir": "/",
    "auth": {
      "type": "PASSWORD"
    }
  },
  "_links": {
    "roles": {
      "href": "https://public.agaveapi.co/systems/v2/sftp.storage.example.com/roles"
    },
    "owner": {
      "href": "https://public.agaveapi.co/profiles/v2/nryan"
    },
    "credentials": {
      "href": "https://public.agaveapi.co/systems/v2/sftp.storage.example.com/credentials"
    },
    "self": {
      "href": "https://public.agaveapi.co/systems/v2/sftp.storage.example.com"
    },
    "metadata": {
      "href": "https://public.agaveapi.co/meta/v2/data/?q=%7B%22associationIds%22%3A%224036169328045649434-242ac117-0001-006%22%7D"
    }
  }
}

Congratulations, you just added your first system. This storage system can now be used by the Files service to manage data, by the Transfer service as a source or destination of data movement, by the Apps service as an application repository, and by the Jobs service as both a staging and archiving destination.

Notice that the JSON returned from the Systems service is different from what was submitted. Several fields have been added, and several others have been removed. The uuid field has been added. This is the same UUID that is used in notifications and metadata references. The status value was added and assigned a default value since we did not specify it. Ditto for the site attribute, which defaults to null.

Three new fields were added: revision, public, and lastModified. revision is the number of times this system has been updated. This being our first time registering the system, it is set to 1. public tells whether this system is published as a shared resource for all users. We will cover this more in the section on System scope. lastModified is a timestamp of the last time the system was updated.

In the storage object, the publicAppsDir and mirror fields were both added and set to their default values. In this example we are not using a proxy server, so it was defaulted to null. Last, and most important, all authentication credentials have been omitted from the response object; only the auth type is returned. Regardless of the authentication type, no user credential information will ever be returned once it is stored.
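To see these differences for your own system, you can compare the JSON file you uploaded with the service's response. The following minimal Python sketch reports added and removed keys as dotted paths; the helper name is hypothetical. Applied to the submitted SFTP description and the response above, it reports storage.auth.username and storage.auth.password as removed.

```python
def key_diff(submitted, returned, prefix=""):
    """Return (added, removed) dotted key paths between two JSON objects."""
    added = [prefix + k for k in returned if k not in submitted]
    removed = [prefix + k for k in submitted if k not in returned]
    # Recurse into nested objects present on both sides.
    for k in submitted:
        a, b = submitted[k], returned.get(k)
        if isinstance(a, dict) and isinstance(b, dict):
            sub_added, sub_removed = key_diff(a, b, f"{prefix}{k}.")
            added += sub_added
            removed += sub_removed
    return sorted(added), sorted(removed)
```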

Execution Systems

In contrast to storage systems, execution systems specify compute resources where application binaries can be run. In addition to the storage attribute found in storage systems, execution systems also have a login attribute describing how to connect to the remote system to submit jobs as well as several other attributes that allow Agave to determine how to stage data and run software on the system. The full list of execution system attributes is given in the following tables.

Name Type Description
available boolean Whether the system is currently available for use in the API. Unavailable systems will not be visible to anyone but the owner. This differs from the status attribute in that a system may be UP, but not available for use in Agave. Defaults to true
description string Verbose description of this system.
environment String List of key-value pairs that will be added to the environment prior to execution of any command.
executionType HPC, Condor, CLI Required: Specifies how jobs are submitted to the system. HPC and Condor will leverage a batch scheduler. CLI will fork processes.
id string Required: A unique identifier you assign to the system. A system id must be globally unique across a tenant and cannot be reused once deleted.
maxSystemJobs integer Maximum number of jobs that can be queued or running on a system across all queues at a given time. Defaults to unlimited.
maxSystemJobsPerUser integer Maximum number of jobs that can be queued or running on a system for an individual user across all queues at a given time. Defaults to unlimited.
name string Required: Common display name for this system.
queues JSON Array An array of batch queue definitions providing descriptive and quota information about the queues you want to expose on your system. If not specified, no other system queues will be available to jobs submitted using this system.
scheduler LSF, LOADLEVELER, PBS, SGE, CONDOR, FORK, COBALT, TORQUE, MOAB, SLURM, CUSTOM_LSF, CUSTOM_LOADLEVELER, CUSTOM_PBS, CUSTOM_SGE, CUSTOM_CONDOR, CUSTOM_COBALT, CUSTOM_TORQUE, CUSTOM_MOAB, CUSTOM_SLURM, UNKNOWN Required: The type of batch scheduler available on the system. This only applies to systems with executionType HPC or CONDOR. The CUSTOM_* version of each scheduler lets you override the default scheduler directives added by Agave and explicitly add your own through the customDirectives field in each of the batch queue definitions for your system.
scratchDir string Path to use for a job scratch directory. This value is the first choice for creating a job's working directory at runtime. The path will be resolved relative to the rootDir value in the storage config if it begins with a “/”, and relative to the system homeDir otherwise.
site string The site associated with this system. Primarily for logical grouping.
startupScript string Path to a script that will be run prior to execution of any command on this system. The path will be a standard path on the remote system. A limited set of system macros is supported in this field: rootDir, homeDir, systemId, and workDir. The standard set of runtime job attributes is also supported. Between the two sets of macros, you should be able to construct distinct paths per job, user, and app. Any environment variables defined in the system description will be added after this script is sourced. If this script fails, output will be logged to the .agave.log file in your job directory. Job submission will still continue regardless of the exit code of the script.
status UP, DOWN, MAINTENANCE, UNKNOWN The functional status of the system. Systems must be in UP status to be used.
storage JSON Object Required: Storage configuration describing how to connect to this system for data staging.
type STORAGE, EXECUTION Required: Must be EXECUTION.
workDir string Path to use for a job working directory. This value will be used if no scratchDir is given. The path will be resolved relative to the rootDir value in the storage config if it begins with a “/”, and relative to the system homeDir otherwise.

Startup script

Every time Agave establishes a connection to an execution system, local or remote, it will attempt to source the startupScript provided in your system definition. The value of startupScript may be an absolute path on the system (e.g., “/usr/local/bin/common_aliases.sh”, “/home/nryan/.bashrc”) or a path relative to the physical home directory of the account used to authenticate to the system (“.bashrc”, “.profile”, “agave/scripts/startup.sh”, etc.).

The startupScript field supports the use of template variables, which Agave will resolve at runtime before establishing a connection. If you would prefer to specify the startup script as a virtualized path on the system, prepend ${SYSTEM_ROOT_DIR} to the path. If the system will be made public, you can specify a file relative to the home directory of the calling user by prefixing your startupScript value with ${SYSTEM_ROOT_DIR}/${SYSTEM_HOME_DIR}/${USERNAME}. A full list of the available variables is given in the following table.

Variable Description
SYSTEM_ID ID of the system (ex. ssh.execute.example.com)
SYSTEM_UUID The UUID of the system
SYSTEM_STORAGE_PROTOCOL The protocol used to move data to and from this system
SYSTEM_STORAGE_HOST The storage host for this system
SYSTEM_STORAGE_PORT The storage port for this system
SYSTEM_STORAGE_RESOURCE The system resource for iRODS systems
SYSTEM_STORAGE_ZONE The system zone for iRODS systems
SYSTEM_STORAGE_ROOTDIR The virtual root directory exposed on this system
SYSTEM_STORAGE_HOMEDIR The home directory on this system relative to SYSTEM_STORAGE_ROOTDIR
SYSTEM_STORAGE_AUTH_TYPE The storage authentication method for this system
SYSTEM_STORAGE_CONTAINER The object store bucket in which the rootDir resides.
SYSTEM_LOGIN_PROTOCOL The protocol used to establish a session with this system (e.g., SSH, GSISSH)
SYSTEM_LOGIN_HOST The login host for this system
SYSTEM_LOGIN_PORT The login port for this system
SYSTEM_LOGIN_AUTH_TYPE The login authentication method for this system
SYSTEM_OWNER The username of the user who created the system.
AGAVE_JOB_NAME The slugified version of the name of the job. See the section on Conventions for more information about slugs.
AGAVE_JOB_ID The unique identifier of the job.
AGAVE_JOB_APP_ID The appId for which the job was requested.
AGAVE_JOB_BATCH_QUEUE The batch queue on the AGAVE_JOB_EXECUTION_SYSTEM to which the job was submitted.
AGAVE_JOB_EXECUTION_SYSTEM The Agave execution system id where this job is running.
AGAVE_JOB_ARCHIVE_PATH The path on the archiveSystem where the job output will be copied if archiving is enabled.
AGAVE_JOB_OWNER The username of the job owner.
AGAVE_JOB_TENANT The id of the tenant to which the job was submitted.
MONITOR_ID The ID of the monitor.
MONITOR_CHECK_ID The ID of the monitor check making the request.
MONITOR_OWNER The username of the user who created the monitor.
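Agave substitutes these variables on the server before it connects, so you never run the substitution yourself. Still, to check what a templated startupScript value might expand to, the POSIX sh sketch below replaces each ${NAME} token using NAME=value arguments. The function name is hypothetical and the values you pass are your own guesses; it illustrates building per-user, per-job, or per-app paths and is not Agave's resolver.

```sh
#!/bin/sh
# Hypothetical preview helper (not part of Agave): substitute ${NAME} tokens
# in a startupScript template using NAME=value arguments. Agave performs the
# real resolution server-side; this only mimics the idea.
# Usage: preview_startup_script TEMPLATE [NAME=value ...]
preview_startup_script() {
  local result pair key value token out rest before
  result=$1
  shift
  for pair in "$@"; do
    key=${pair%%=*}        # text before the first "="
    value=${pair#*=}       # text after the first "="
    token="\${$key}"       # the literal token, e.g. ${SYSTEM_ID}
    out=''
    rest=$result
    # Replace every occurrence of the token, scanning left to right.
    while :; do
      case "$rest" in
        *"$token"*)
          before=${rest%%"$token"*}
          out=$out$before$value
          rest=${rest#*"$token"}
          ;;
        *) break ;;
      esac
    done
    result=$out$rest
  done
  printf '%s\n' "$result"
}
```

For example, preview_startup_script '${SYSTEM_STORAGE_ROOTDIR}/agave/${AGAVE_JOB_APP_ID}/startup.sh' SYSTEM_STORAGE_ROOTDIR=/home/nryan AGAVE_JOB_APP_ID=wc-1.00 (a made-up app id) prints /home/nryan/agave/wc-1.00/startup.sh. Tokens without a matching argument are left untouched.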

Schedulers and system execution types

Agave supports job execution both interactively and through batch queueing systems (aka schedulers). We cover the mechanics of job submission in the Job Management tutorial. Here we just point out that regardless of how your job is actually run on the underlying system, the process of submitting, monitoring, sharing, and otherwise interacting with your job through Agave is identical. Describing the scheduler and execution types for your system is really just a matter of picking the most efficient and/or available mechanism for running jobs on your system.

As you saw in the table above, executionType refers to the classification of jobs going into the system and scheduler refers to the type of batch scheduler used on a system. These two fields help limit the range of job submission options used on a specific system. For example, it is not uncommon for an HPC system to accept jobs from both a Condor scheduler and a batch scheduler. It is also possible, though generally discouraged, to fork jobs directly on the command line. With so many options, how would users publishing apps on such a system know what mechanism to use? Specifying the execution type and scheduler helps narrow down the options to a single execution mechanism.

Thankfully, picking the right combination is pretty simple. The following table illustrates the available combinations.

executionType scheduler Description
HPC LSF, LOADLEVELER, PBS, SGE, COBALT, TORQUE, MOAB, SLURM Jobs will be submitted to the local scheduler using the appropriate scheduler commands. Systems with this execution type will not allow forked jobs.
CONDOR CONDOR Jobs will be submitted to the Condor scheduler running locally on the remote system. Agave will not do any installation for you, so the setup and administration of the Condor server is up to you.
CLI FORK Jobs will be started as a forked process and monitored using the system process id.
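Because only these three pairings are valid, a mismatch is easy to catch before you submit a system description. The sketch below is a hypothetical local check, not an Agave command. It treats the CUSTOM_ schedulers listed in the attribute table as equivalent to their base scheduler, which is an assumption.

```sh
#!/bin/sh
# Hypothetical pre-flight check (not an Agave command): succeed only when the
# executionType/scheduler pair appears in the table above. CUSTOM_* schedulers
# are treated like their base scheduler (an assumption).
# Usage: is_valid_combination EXECUTION_TYPE SCHEDULER
is_valid_combination() {
  local execution_type scheduler
  execution_type=$1
  scheduler=${2#CUSTOM_}    # e.g. CUSTOM_SLURM -> SLURM
  case "$execution_type" in
    HPC)
      case "$scheduler" in
        LSF|LOADLEVELER|PBS|SGE|COBALT|TORQUE|MOAB|SLURM) return 0 ;;
      esac
      ;;
    CONDOR)
      [ "$scheduler" = "CONDOR" ] && return 0
      ;;
    CLI)
      [ "$scheduler" = "FORK" ] && return 0
      ;;
  esac
  return 1
}
```

For example, is_valid_combination HPC SLURM succeeds, while is_valid_combination HPC FORK fails, since HPC systems do not allow forked jobs.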

Defining batch queues  

Agave supports the notion of multiple submit queues. On HPC systems, queues should map to actual batch scheduler queues on the target server. Additionally, queues are used by Agave as a mechanism for implementing quotas on job throughput in a given queue or across an entire system. Queues are defined as a JSON array of objects assigned to the queues attribute. The following table summarizes all supported queue parameters.

Name Type Description
name string Arbitrary name for the queue. This will be used in the job submission process, so it should line up with the name of an actual queue on the execution system.
maxJobs integer Maximum number of jobs that can be queued or running within this queue at a given time. Defaults to 10. -1 for no limit
maxUserJobs integer Maximum number of jobs that can be queued or running by any single user within this queue at a given time. Defaults to 10. -1 for no limit
maxNodes integer Maximum number of nodes that can be requested for any job in this queue. -1 for no limit
maxProcessorsPerNode integer Maximum number of processors per node that can be requested for any job in this queue. -1 for no limit
maxMemoryPerNode string Maximum memory per node for jobs submitted to this queue in ###.#[E|P|T|G]B format.
maxRequestedTime string Maximum run time for any job in this queue given in hh:mm:ss format.
customDirectives string Arbitrary text that will be appended to the end of the scheduler directives in a batch submit script. This could include a project number, system-specific directives, etc.
default boolean True if this is the default queue for the system, false otherwise.
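Because maxMemoryPerNode and maxRequestedTime are free-form strings, a typo is easy to miss. The sketch below checks them against the formats in the table. The function names are hypothetical and the patterns are assumptions: they accept any number of digits and hours above 24, since the sample queues in this guide use values such as 96:00:00. The service remains the final authority on what it accepts.

```sh
#!/bin/sh
# Hypothetical format checks for batch queue limits (not an Agave tool).
# The patterns are assumptions about what the service accepts.

# maxMemoryPerNode: digits, optional decimal part, then E, P, T, or G and "B"
# (e.g. 64GB, 2TB, 1.5PB).
is_valid_memory() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+(\.[0-9]+)?[EPTG]B$'
}

# maxRequestedTime: hh:mm:ss, where hours may exceed 24 (e.g. 96:00:00).
is_valid_walltime() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+:[0-5][0-9]:[0-5][0-9]$'
}
```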

Configuring quotas

In the batch queues table above, several attributes exist to specify limits on the number of total jobs and user jobs in a given queue. Corresponding attributes exist in the execution system to specify limits on the number of total and user jobs across an entire system. Used appropriately, these attributes tell Agave how to enforce limits on the concurrent activity of any given user. They can also ensure that Agave will not unfairly monopolize your systems as your application usage grows.

If you have ever used a shared HPC system before, you should be familiar with batch queue quotas. If not, the important thing to understand is that they are a critical tool to ensure fair usage of any shared resource. As the owner/administrator for your registered system, you can use the batch queues you define to enforce whatever usage policy you deem appropriate.

Consider an example in which you use a VM to run image analysis routines on demand through Agave. If too many processes run at once, your server will become memory bound and its performance will degrade. To avoid this, you can define a batch queue configuration that limits the number of tasks that can run simultaneously on your server.

Another example where quotas can be helpful is in properly partitioning your system resources. Consider a user analyzing unstructured data. The problem is computationally and memory intensive. To preserve resources, you could create one queue with a moderate value of maxJobs and conservative maxMemoryPerNode, maxProcessorsPerNode, and maxNodes values to allow good throughput of small jobs. You could then create another queue with large maxMemoryPerNode, maxProcessorsPerNode, and maxNodes values while only allowing a single job to run at a time. This gives you both high throughput and high capacity on a single system.

The following sample queue definition illustrates a queue for short jobs.

{
    "name":"short_job",
    "mappedName": null,
    "maxJobs":100,
    "maxUserJobs":10,
    "maxNodes":32,
    "maxMemoryPerNode":"64GB",
    "maxProcessorsPerNode":12,
    "maxRequestedTime":"00:15:00",
    "customDirectives":null,
    "default":true
}
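The partitioning approach described earlier, with one queue for high throughput of small jobs and another for large jobs that runs only one job at a time, might look like the following queues array. This is a sketch: the queue names and every limit are made-up values, and the names must match real queues on your scheduler.

```sh
#!/bin/sh
# Write a hypothetical two-queue configuration to queues.json.
# Names and limits are illustrative only; adjust them to your system.
cat > queues.json <<'EOF'
[
  {
    "name": "small",
    "maxJobs": 50,
    "maxUserJobs": 5,
    "maxNodes": 1,
    "maxProcessorsPerNode": 4,
    "maxMemoryPerNode": "8GB",
    "maxRequestedTime": "04:00:00",
    "customDirectives": null,
    "default": true
  },
  {
    "name": "bigmem",
    "maxJobs": 1,
    "maxUserJobs": 1,
    "maxNodes": 4,
    "maxProcessorsPerNode": 32,
    "maxMemoryPerNode": "1TB",
    "maxRequestedTime": "48:00:00",
    "customDirectives": null,
    "default": false
  }
]
EOF
```

The resulting array would go in the queues attribute of your execution system description.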
System login protocols

As with storage systems, Agave supports several different protocols and mechanisms for job submission. We already covered scheduler and queue support. Here we illustrate the different login configurations possible. For brevity, only the value of the login JSON object is shown.

The full list of login configuration options is given in the following table. The login.auth and login.proxy attributes are identical to those used in the storage config.

Attribute Type Description
auth JSON object Required: A JSON object describing the default login authentication credential for this system.
host string Required: The hostname or ip address of the server where the job will be submitted.
port int The port number of the server where the job will be submitted. Defaults to the default port of the protocol used.
protocol SSH, GSISSH, LOCAL Required: The protocol used to submit jobs for execution.
proxy JSON Object The proxy server through which Agave will tunnel when submitting jobs. Currently, proxy servers will use the same authentication mechanism as the target server.

Scratch and work directories

In the Job Management tutorial we will dive into how Agave manages the end-to-end lifecycle of running a job. Here we point out two relevant attributes that control where data is staged and where your job will physically run. The scratchDir and workDir attributes control where the working directories for each job will be created on an execution system. The following table summarizes the decision making process Agave uses to determine where the working directories should be created.

rootDir value homeDir value scratchDir value Effective system path for job working directories
/ / (not set) /
/ / / /
/ / /scratch /scratch
/ /home/nryan (not set) /home/nryan
/ /home/nryan / /
/ /home/nryan /scratch /scratch
/home/nryan / (not set) /home/nryan
/home/nryan / / /home/nryan
/home/nryan / /scratch /home/nryan/scratch
/home/nryan /home (not set) /home/nryan/home
/home/nryan /home / /home/nryan
/home/nryan /home /scratch /home/nryan/scratch
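The table follows a few rules: with no scratchDir, the job directory falls back to rootDir plus homeDir, and a scratchDir starting with “/” is joined to rootDir. Per the scratchDir description, any other value is resolved under rootDir plus homeDir. The sketch below encodes those rules so you can predict where job directories will land. The function name is hypothetical, the fallback from scratchDir to workDir comes from the attribute descriptions earlier in this section, and it ignores “..” and symbolic links.

```sh
#!/bin/sh
# Hypothetical helper mirroring the table above: predict the effective system
# path for job working directories. Does not handle ".." or symbolic links.
# Usage: resolve_job_dir ROOT_DIR HOME_DIR [SCRATCH_DIR] [WORK_DIR]
resolve_job_dir() {
  local root_dir home_dir dir path
  root_dir=$1
  home_dir=$2
  dir=${3:-${4:-}}          # scratchDir wins; workDir is the fallback

  case "$dir" in
    '') path="$root_dir/$home_dir" ;;        # nothing set: use homeDir
    /*) path="$root_dir/$dir" ;;             # absolute: relative to rootDir
    *)  path="$root_dir/$home_dir/$dir" ;;   # relative: relative to homeDir
  esac

  path=$(printf '%s' "$path" | tr -s '/')    # collapse repeated slashes
  if [ "$path" != "/" ]; then
    path=${path%/}                           # drop a trailing slash
  fi
  printf '%s\n' "$path"
}
```

For example, resolve_job_dir /home/nryan / /scratch prints /home/nryan/scratch, matching the table.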

While it is not required, it is a best practice to always specify scratchDir and workDir values for your execution systems and, whenever possible, place them outside of the system homeDir to ensure data privacy. The reason for this is that the file system available on many servers is actually made up of a combination of physically attached storage, mounted volumes, and network mounts. Often, your home directory will have a very conservative quota while the mounted storage will essentially be quota free. As the above table shows, when you do not specify a scratchDir or workDir, Agave will attempt to create your job work directories in your system homeDir. It is very likely that, in the course of running simulations, you will reach the quota on your home directory, thereby causing that job and all future jobs to fail on the system until you clear up more space. To avoid this, we recommend specifying a location with sufficient available space to handle the work you want to do.

Another common error that arises from not specifying thoughtful scratchDir and workDir values for your execution systems is jobs failing due to “permission denied” errors. This often happens when your scratchDir and/or workDir resolve to the actual system root. Usually the account you are using to access the system will not have permission to write to /, so all attempts to create a job working directory fail, accurately, due to a “permission denied” error.

Creating a new execution system

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -F "fileToUpload=@ssh-password.json" https://public.tenants.agaveapi.co/systems/v2
systems-addupdate -v -F ssh-password.json

The response from the server will be similar to the following.

{
   "id":"demo.execute.example.com",
   "uuid":"0001323106792914-5056a550b8-0001-006",
   "name":"Example SSH Execution Host",
   "status":"UP",
   "type":"EXECUTION",
   "description":"My example system using ssh to submit jobs used for testing.",
   "site":"example.com",
   "revision":1,
   "public":false,
   "lastModified":"2013-07-02T10:16:11.000-05:00",
   "executionType":"HPC",
   "scheduler":"SGE",
   "environment":null,
   "startupScript":"./bashrc",
   "maxSystemJobs":100,
   "maxSystemJobsPerUser":10,
   "workDir":"/work",
   "scratchDir":"/scratch",
   "queues":[
      {
         "name":"normal",
         "maxJobs":100,
         "maxUserJobs":10,
         "maxNodes":32,
         "maxMemoryPerNode":"64GB",
         "maxProcessorsPerNode":12,
         "maxRequestedTime":"48:00:00",
         "customDirectives":null,
         "default":true
      },
      {
         "name":"largemem",
         "maxJobs":25,
         "maxUserJobs":5,
         "maxNodes":16,
         "maxMemoryPerNode":"2TB",
         "maxProcessorsPerNode":4,
         "maxRequestedTime":"96:00:00",
         "customDirectives":null,
         "default":false
      }
   ],
   "login":{
      "host":"texas.rangers.mlb.com",
      "port":22,
      "protocol":"SSH",
      "proxy":null,
      "auth":{
         "type":"PASSWORD"
      }
   },
   "storage":{
      "host":"texas.rangers.mlb.com",
      "port":22,
      "protocol":"SFTP",
      "rootDir":"/home/nryan",
      "homeDir":"",
      "proxy":null,
      "auth":{
         "type":"PASSWORD"
      }
   }
}
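The contents of ssh-password.json are not shown above. The sketch below writes a plausible version, reconstructed from the response and trimmed to a single queue. The username and password fields in each auth object, and all of the values, are assumptions to replace with your own. Because the file contains a plaintext password, the script restricts its permissions.

```sh
#!/bin/sh
# Write a hypothetical ssh-password.json for the registration call above.
# Field names follow the response shown; auth.username/auth.password and all
# values are placeholders (assumptions).
cat > ssh-password.json <<'EOF'
{
  "id": "demo.execute.example.com",
  "name": "Example SSH Execution Host",
  "type": "EXECUTION",
  "description": "My example system using ssh to submit jobs used for testing.",
  "site": "example.com",
  "executionType": "HPC",
  "scheduler": "SGE",
  "maxSystemJobs": 100,
  "maxSystemJobsPerUser": 10,
  "workDir": "/work",
  "scratchDir": "/scratch",
  "queues": [
    {
      "name": "normal",
      "maxJobs": 100,
      "maxUserJobs": 10,
      "maxNodes": 32,
      "maxMemoryPerNode": "64GB",
      "maxProcessorsPerNode": 12,
      "maxRequestedTime": "48:00:00",
      "customDirectives": null,
      "default": true
    }
  ],
  "login": {
    "host": "texas.rangers.mlb.com",
    "port": 22,
    "protocol": "SSH",
    "auth": { "type": "PASSWORD", "username": "nryan", "password": "changeme" }
  },
  "storage": {
    "host": "texas.rangers.mlb.com",
    "port": 22,
    "protocol": "SFTP",
    "rootDir": "/home/nryan",
    "homeDir": "",
    "auth": { "type": "PASSWORD", "username": "nryan", "password": "changeme" }
  }
}
EOF
chmod 600 ssh-password.json   # the file holds a plaintext password
```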

Disabling a system

Disable a system

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X PUT --data-binary '{"action": "disable"}' \
    https://public.agaveapi.co/systems/v2/$SYSTEM_ID
systems-disable $SYSTEM_ID

The response will look something like the following:

{
  "site": null,
  "id": "sftp.storage.example.com",
  "revision": 1,
  "default": false,
  "lastModified": "2016-09-06T17:46:42.621-05:00",
  "status": "UP",
  "description": "My example storage system using SFTP to store data for testing",
  "name": "Example SFTP Storage System",
  "owner": "nryan",
  "globalDefault": false,
  "available": false,
  "uuid": "4036169328045649434-242ac117-0001-006",
  "public": false,
  "type": "STORAGE",
  "storage": {
    "mirror": false,
    "port": 22,
    "homeDir": "/home/systest",
    "protocol": "SFTP",
    "host": "storage.example.com",
    "publicAppsDir": null,
    "proxy": null,
    "rootDir": "/",
    "auth": {
      "type": "PASSWORD"
    }
  },
  "_links": {
    "roles": {
      "href": "https://public.agaveapi.co/systems/v2/sftp.storage.example.com/roles"
    },
    "owner": {
      "href": "https://public.agaveapi.co/profiles/v2/nryan"
    },
    "credentials": {
      "href": "https://public.agaveapi.co/systems/v2/sftp.storage.example.com/credentials"
    },
    "self": {
      "href": "https://public.agaveapi.co/systems/v2/sftp.storage.example.com"
    },
    "metadata": {
      "href": "https://public.agaveapi.co/meta/v2/data/?q=%7B%22associationIds%22%3A%224036169328045649434-242ac117-0001-006%22%7D"
    }
  }
}

There may be times when you need to disable a system. If your system has scheduled maintenance periods, you may want to disable the system until the maintenance period ends. You can do this by making a PUT request on the system URL with the action field set to “disable”, or simply by updating the status to “MAINTENANCE”. While the system is disabled, all apps and jobs that use it are disabled as well, and all file operations on it are rejected. Once the system is restored, all operations will pick back up.

Enabling a system

Enable a system

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X PUT --data-binary '{"action": "enable"}' \
    https://public.agaveapi.co/systems/v2/$SYSTEM_ID
systems-enable $SYSTEM_ID

The response will look something like the following:

{
  "site": null,
  "id": "sftp.storage.example.com",
  "revision": 1,
  "default": false,
  "lastModified": "2016-09-06T17:46:42.621-05:00",
  "status": "UP",
  "description": "My example storage system using SFTP to store data for testing",
  "name": "Example SFTP Storage System",
  "owner": "nryan",
  "globalDefault": false,
  "available": true,
  "uuid": "4036169328045649434-242ac117-0001-006",
  "public": false,
  "type": "STORAGE",
  "storage": {
    "mirror": false,
    "port": 22,
    "homeDir": "/home/systest",
    "protocol": "SFTP",
    "host": "storage.example.com",
    "publicAppsDir": null,
    "proxy": null,
    "rootDir": "/",
    "auth": {
      "type": "PASSWORD"
    }
  },
  "_links": {
    "roles": {
      "href": "https://public.agaveapi.co/systems/v2/sftp.storage.example.com/roles"
    },
    "owner": {
      "href": "https://public.agaveapi.co/profiles/v2/nryan"
    },
    "credentials": {
      "href": "https://public.agaveapi.co/systems/v2/sftp.storage.example.com/credentials"
    },
    "self": {
      "href": "https://public.agaveapi.co/systems/v2/sftp.storage.example.com"
    },
    "metadata": {
      "href": "https://public.agaveapi.co/meta/v2/data/?q=%7B%22associationIds%22%3A%224036169328045649434-242ac117-0001-006%22%7D"
    }
  }
}

Similarly, to enable a system, make a PUT request on the system URL with the action field set to “enable”. Once re-enabled, the system resumes normal operation.

Deleting systems

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X DELETE https://public.tenants.agaveapi.co/systems/v2/$SYSTEM_ID
systems-delete $SYSTEM_ID

The call will return an empty result.

In the event you wish to delete a system, you can make a DELETE request on the system URL. Deleting a system disables the system and all applications published on it. Any running jobs will continue to run, but all pending, archiving, paused, and staged jobs will be killed, and any data archived on that system will no longer be available. Restoring a deleted system requires intervention from your tenant admin. Once deleted, the system id cannot be reused at a later time. Use this operation with care.

Multi-user environments

If your application supports a multi-user environment and those users do not have API accounts, then you may run into a situation where you are juggling multiple user credentials for a single system. Agave has a solution for this problem in the form of its Internal User feature. You can map your application users into a private user store Agave provides you and assign those users credentials on your systems. This allows you to move seamlessly from community users to private users and back without having to alter your application code. For a deep discussion on the mechanics and implications of credential management with internal users, see the Internal User Credential Management guide.

System roles

Systems you register are private to you and you alone. You can, however, allow other Agave clients to utilize the system you define by granting them a role on the system using the systems roles service. The available roles are given in the table below.

Role Description
GUEST Gives any authenticated user readonly access to the system. No file operations or job executions are allowed for users with GUEST access.
USER Gives a user the ability to run jobs and access data on the system.
PUBLISHER All the rights of USER as well as the ability to publish applications listing the system as an execution host.
ADMIN All the rights of PUBLISHER as well as the ability to edit the system details and grant roles on the system. Admins may use the system to access data and run jobs using the default credential assigned to the system, but they may not view or update any of the credentials stored by the system owner. It is not possible for anyone but the system owner to assign or leverage internal user credentials on a system.
OWNER Reserved for the user that originally created the system. This role is non-revocable.
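Reading the table as a hierarchy (GUEST, USER, PUBLISHER, ADMIN, OWNER), each role carries the rights of the roles before it. The sketch below ranks roles that way so you can reason about whether a grant is sufficient. It is a hypothetical client-side helper, not part of the roles service; Agave enforces the actual permissions.

```sh
#!/bin/sh
# Hypothetical helper: rank system roles in the order of the table above.
role_rank() {
  case "$1" in
    GUEST)     echo 1 ;;
    USER)      echo 2 ;;
    PUBLISHER) echo 3 ;;
    ADMIN)     echo 4 ;;
    OWNER)     echo 5 ;;
    *)         echo 0 ;;   # unknown role or no role
  esac
}

# Succeed if a user holding GRANTED has at least the rights of REQUIRED.
# Usage: role_satisfies GRANTED REQUIRED
role_satisfies() {
  local granted required
  granted=$(role_rank "$1")
  required=$(role_rank "$2")
  [ "$required" -gt 0 ] && [ "$granted" -ge "$required" ]
}
```

For example, role_satisfies USER PUBLISHER fails, reflecting that a user with the USER role cannot publish applications on the system.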

System scope

Throughout these tutorials and Beginner’s Guides, we have referred to both public and private systems. In addition to roles, systems have a concept of scope associated with them. Not to be confused with OAuth scope mentioned in the Authentication Guide, system scope refers to the availability of a system to the general user community. The following table lists the available scopes and their meanings.

Scope Required role Description
private Admin System is visible and available for use to the owner and to anyone whom they grant a role.
read only Tenant admin Storage system is visible and available for data browsing and download by any API user. Write access is restricted unless explicitly granted to a specific user.
public Tenant admin System is visible and available to all users for reading and writing. Virtual user home directories are enforced and write access outside of a user’s home directory is restricted unless explicitly granted by a system admin.

Private systems

All systems are private by default. This means that no one can use a system you register without you or another user with “admin” permissions granting them a role on that system. Most of the time, unless you are configuring a tenant for your organization, all the systems you register will stay private. Do not mistake the term private for isolated. Private simply means not public. Another way to think of private systems is as “invitation only.” You are free to share your system with as many or as few people as you want and it will still remain a private system.

Readonly systems

Readonly systems are systems that have granted a GUEST role to the world group. Once this grant is made, any user will be able to browse the system’s entire file system regardless of individual permissions. Be careful when making a system readonly. Usually, the only reason you would do this is because you have configured the system rootDir to point to a dataset or volume that you want to publish for others to use. Carelessly making systems readonly can expose personal data stored on the system to every other API user. While your intentions may be pure, theirs may not be, so think through the implications of this action before you take it.

Public systems

Public systems are available for use by every API user within your tenant. Once public, systems inherit specific behavior unique to their type. We will cover each system type in turn.

Public Storage Systems

Public storage systems enforce a virtual user home directory with implied user permissions. The following table gives a brief summary of the permission implications. You can read more about data permissions in the Data Permissions tutorial.

rootDir homeDir URL path User permission (for user systest)
/ /home (empty) READ
/ /home / READ
/ /home /var READ
/ /home systest ALL
/ /home systest/some/subdir ALL
/ /home rjohnson NONE

Notice in the above example that on public systems, users will have implied ownership of a folder matching their username in the system’s homeDir. In the table, this means that user “systest” will have ownership of the physical home directory /home/systest on the system after it’s public. It is important that, before publishing a system, you make sure that the account used to access the system can actually write to these folders. Otherwise, users will not be able to access their data on the system you make public.
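The table reduces to a simple rule: a user's own folder under homeDir is theirs, other users' folders under homeDir are off limits, and everything else is readable. The sketch below encodes that reading. It is hypothetical, works on virtual paths (relative to rootDir), assumes homeDir is not “/”, ignores “..”, and extends the rule to absolute paths inside homeDir, which the table does not show explicitly.

```sh
#!/bin/sh
# Hypothetical helper: implied permission for USERNAME on a public storage
# system, following the table above. Paths are virtual (relative to rootDir);
# paths without a leading "/" are resolved against HOME_DIR.
# Usage: implied_permission HOME_DIR USERNAME [URL_PATH]
implied_permission() {
  local home_dir username path
  home_dir=${1%/}
  username=$2
  path=${3:-}

  case "$path" in
    /*) ;;                                   # already absolute
    *)  path="$home_dir/$path" ;;            # relative to homeDir
  esac
  path=$(printf '%s' "$path" | tr -s '/')    # collapse repeated slashes
  path=${path%/}                             # drop a trailing slash

  case "$path" in
    "$home_dir/$username" | "$home_dir/$username"/*)
      echo ALL ;;     # the user's own virtual home directory
    "$home_dir"/*)
      echo NONE ;;    # another user's home directory
    *)
      echo READ ;;    # homeDir itself or anywhere outside it
  esac
}
```

For example, implied_permission /home systest rjohnson prints NONE, as in the last row of the table.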

Public Execution Systems

Public execution systems do not share the same behavior as public storage systems. Unless explicit permission has been given, public execution systems are not accessible for data access by non-privileged users. This is because public systems allow all users to run applications on them and granting public access to the file system would expose user job data to all users. If you do need to expose the data on a public execution system, either register it again as a storage system (using an appropriate rootDir outside of the system scratchDir and workDir paths), or grant specific users a role on the system.

Publishing a system

To publish a system and make it public, you make a PUT request on the system’s url.

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     -X PUT \
     --data-binary '{"action":"publish"}' \
     https://public.tenants.agaveapi.co/systems/v2/$SYSTEM_ID
systems-publish -v $SYSTEM_ID

The response from the service will be the same system description we saw before, this time with the public attribute set to true.

Unpublishing a system

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     -X PUT \
     --data-binary '{"action":"unpublish"}' \
     https://public.tenants.agaveapi.co/systems/v2/$SYSTEM_ID
systems-unpublish -v $SYSTEM_ID

The response from the service will be the same system description we saw before, this time with the public attribute set to false.

To unpublish a system, make the same request with the action attribute set to unpublish.

Default systems

As you continue to use Agave over time, it will not be uncommon for you to accumulate additional storage and execution systems through both self-registration and other people sharing their systems with you. It may even be the case that you have multiple public systems available to you. In this situation, it is helpful for both you and your users to specify what the default systems should be.

Default systems are the systems that are used when the user does not specify a system to use when performing a remote action in Agave. For example, a job request might specify an archivePath but no archiveSystem, or an app description might specify a deploymentPath but no deploymentSystem. In these situations, Agave will use the user’s default storage system.

Four types of default systems are possible. The following table describes them.

Type Scope Role needed to set Description
storage user default USER Default storage system for an individual user. This takes priority over any global defaults and will be used in all data operations in lieu of a system being specified for this user.
storage global default Tenant admin Default storage system for an entire tenant. This will be used as the default storage system whenever a user has not explicitly specified another. Only public systems may be made the global default.
execution user default USER Default execution system for an individual user. This takes priority over any global defaults and will be used in all app and job operations in lieu of an execution system being specified for this user. In the case of app registration, normal user role requirements apply.
execution global default Tenant admin Default execution system for an entire tenant. This will be used as the default execution system whenever a user has not explicitly specified another. Only public systems may be made the global default.
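Taken together, the paragraph and table above describe a precedence order: a system named explicitly in the request, then the user's default, then the tenant's global default. The hypothetical sketch below models that order for one system type (storage or execution); it does not query Agave.

```sh
#!/bin/sh
# Hypothetical helper: pick the system used for one system type, in the
# documented order of precedence. Pass "" for any value that is not set.
# Usage: pick_system EXPLICIT_ID USER_DEFAULT_ID GLOBAL_DEFAULT_ID
pick_system() {
  local candidate
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"   # first non-empty value wins
      return 0
    fi
  done
  echo "no system specified and no default configured" >&2
  return 1
}
```

For example, pick_system "" my.storage.system tenant.storage.system prints my.storage.system, because a user default takes priority over the global default.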

Setting user default system

To set a system as the user’s default, you make a PUT request on the system’s url. Only systems the user has access to may be used as their default.

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     -X PUT \
     --data-binary '{"action":"setDefault"}' \
     https://public.tenants.agaveapi.co/systems/v2/$SYSTEM_ID
systems-setdefault -v $SYSTEM_ID

The response from the service will be the same system description we saw before, this time with the default attribute set to true.

Unsetting user default system

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     -X PUT \
     --data-binary '{"action":"unsetDefault"}' \
     https://public.tenants.agaveapi.co/systems/v2/$SYSTEM_ID
systems-unsetdefault -v $SYSTEM_ID

The response from the service will be the same system description we saw before, this time with the default attribute set to false.

To remove a system as the user’s default, make the same request with the action attribute set to unsetDefault. Keep in mind that you cannot remove the global default system from being the user’s default. You can only set a different one to replace it.

Setting global default system

Tenant administrators may wish to set default storage and execution systems for an entire tenant. These are called global default systems. There may be at most one system of each type set as a global default. To set a global default system, first make sure that the system is public. Only public systems may be set as a global default. Next, make sure you have administrator permissions for your tenant. Only tenant admins may publish systems and manage the global defaults. Lastly, make a PUT request on the system’s url with an action attribute in the body set to setGlobalDefault.

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     -X PUT \
     --data-binary '{"action":"setGlobalDefault"}' \
     https://public.tenants.agaveapi.co/systems/v2/$SYSTEM_ID
systems-setdefault -v -G $SYSTEM_ID

The response from the service will be the same system description we saw before, this time with both the default and public attributes set to true.

To remove a system from being the global default, make the same request with the action attribute set to unsetGlobalDefault.

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     -X PUT \
     --data-binary '{"action":"unsetGlobalDefault"}' \
     https://public.tenants.agaveapi.co/systems/v2/$SYSTEM_ID
systems-unsetdefault -v -G $SYSTEM_ID

This time the response from the service will have default set to false and public set to true.
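For scripted use, the four default-system actions in this section can be wrapped in one helper. The following is a minimal Python sketch using only the standard library. The helper name `system_default_request` is hypothetical, and `API_BASE` assumes the public tenant URL used in the examples above. It builds the same PUT request as the curl examples without sending it.

```python
import json
import urllib.request

# Assumed tenant base URL, taken from the examples above; substitute your own.
API_BASE = "https://public.tenants.agaveapi.co"

# The four actions described in this section. Per the rules above, the
# *GlobalDefault variants are reserved for tenant admins and public systems.
SYSTEM_DEFAULT_ACTIONS = {"setDefault", "unsetDefault", "setGlobalDefault", "unsetGlobalDefault"}


def system_default_request(token: str, system_id: str, action: str) -> urllib.request.Request:
    """Build (without sending) the PUT request that changes a system's default status."""
    if action not in SYSTEM_DEFAULT_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return urllib.request.Request(
        f"{API_BASE}/systems/v2/{system_id}",
        data=json.dumps({"action": action}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )
```

Pass the result to `urllib.request.urlopen` to send it. Validating the action locally catches typos before a round trip to the service.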

Files

 /$$$$$$$$ /$$ /$$
| $$_____/|__/| $$
| $$       /$$| $$  /$$$$$$   /$$$$$$$
| $$$$$   | $$| $$ /$$__  $$ /$$_____/
| $$__/   | $$| $$| $$$$$$$$|  $$$$$$
| $$      | $$| $$| $$_____/ \____  $$
| $$      | $$| $$|  $$$$$$$ /$$$$$$$/
|__/      |__/|__/ \_______/|_______/

The Agave Files service allows you to manage data across multiple storage systems using multiple protocols. It supports the operations common to most file services, such as directory listing, renaming, copying, deleting, and uploading/downloading. It also supports importing files from arbitrary locations, metadata assignment, and a full access control layer that lets you keep your data private, share it with your colleagues, or make it publicly available.

Files service URL structure

Canonical URL for all file items accessible in the Platform

https://public.tenants.agaveapi.co/files/v2/media/system/$SYSTEM_ID/$PATH

Every file and directory referenced through the Files service has a canonical URL, shown in the example above. The following table defines each component:

Token Description
$SYSTEM_ID The id of the system where the file or directory lives. This corresponds to the ids returned from the Systems service.
$PATH (Optional) The path on the remote system. By default, all paths are relative to the home directory defined in the system description. To specify an absolute path, prefix the path with a `/`. For more on path resolution, see the next section.

Agave also supports the concept of default systems. If you omit the /system/$SYSTEM_ID segment from the URL above, the Files service assumes you are referencing your default storage system. Thus, if your default system is data.agaveapi.co, the following two examples are identical.

If data.agaveapi.co is your default storage system then

https://public.tenants.agaveapi.co/files/v2/media/shared

is equivalent to this:

https://public.tenants.agaveapi.co/files/v2/media/system/data.agaveapi.co/shared

This comes in especially handy when referencing your default system paths in other contexts such as job requests and when interacting with the Agave CLI. A good example of this situation is when you have a global default storage system accessible to all your users. In this case, most users will use that for all of their data staging and archiving needs. These users may find it easier not to even think about the system they are using. The default system support in the Files service allows them to do just that.
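To make the two URL forms concrete, here is a minimal Python sketch of a URL builder. The function name `media_url` and the `API_BASE` value are assumptions for illustration. Leaving `system_id` unset omits the `/system/$SYSTEM_ID` segment, which, as described above, makes the Files service fall back to your default storage system.

```python
from typing import Optional

# Assumed tenant base URL, taken from the examples above.
API_BASE = "https://public.tenants.agaveapi.co"


def media_url(path: str = "", system_id: Optional[str] = None) -> str:
    """Return the Files media URL for a path.

    With system_id=None the /system/$SYSTEM_ID segment is omitted, so the
    service resolves the path against the caller's default storage system.
    The path is appended untouched: following the rule above, a leading "/"
    keeps its meaning as an absolute path, which yields a double slash.
    """
    url = f"{API_BASE}/files/v2/media"
    if system_id:
        url += f"/system/{system_id}"
    return f"{url}/{path}" if path else url
```

For example, `media_url("shared")` and `media_url("shared", "data.agaveapi.co")` produce the two equivalent URLs shown above when data.agaveapi.co is your default system.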

Understanding file paths

One powerful, but potentially confusing, feature of Agave is its support for virtualizing system paths. Every registered system specifies both a root directory, rootDir, and a home directory, homeDir, in its storage configuration. rootDir tells Agave the absolute path on the remote system that it should treat as /. Similar to the Linux chroot command, no request made to Agave will ever resolve to a location outside of rootDir.

Type of storage system Examples of rootDir values
Linux
  • Actual system root directory, `/`
  • RAID array physically attached to the system
  • NFS mount you want to share
  • An arbitrary file path, such as your `$HOME` directory, from which you want to serve application data.
Cloud
  • A bucket on S3
  • A folder/marker file in your object store
iRODS
  • A specific resource or zone you want to expose.
  • A collection you want to publish for use
  • Your personal home folder

homeDir specifies the path, relative to rootDir, that Agave should use for relative paths. Since Agave is stateless, there is no concept of a current working directory. Thus, when you specify a path to Agave that does not begin with a /, Agave will always prefix the path with the value of homeDir. The following table gives several examples of how different combinations of rootDir, homeDir, and URL paths will be resolved by Agave. For a deeper dive into this subject, please see the Understanding Agave File Paths section.

“rootDir” value “homeDir” value Agave URL path Resolved path on system
/ / (empty) /
/ / .. /
/ / home /home
/ / /home /home
/ /home/nryan (empty) /home/nryan
/ /home/nryan / /
/ /home/nryan .. /home
/ /home/nryan nryan /home/nryan/nryan
/ /home/nryan /nryan /nryan
/home/nryan / (empty) /home/nryan
/home/nryan / .. /home/nryan
/home/nryan /home / /home/nryan
/home/nryan /home .. /home/nryan
/home/nryan /home home /home/nryan/home/home
/home/nryan /home /bgibson /home/nryan/bgibson
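The resolution rules in the table above can be expressed as a short function. This is a minimal Python sketch inferred from the table, not Agave's own implementation; the name `resolve_agave_path` is hypothetical. It prefixes relative paths with homeDir, normalizes the result as if rootDir were `/` (so `..` can never climb out, like chroot), and then prepends rootDir.

```python
import posixpath


def resolve_agave_path(root_dir: str, home_dir: str, url_path: str) -> str:
    """Map an Agave URL path to the physical path on the remote system."""
    if url_path.startswith("/"):
        virtual = url_path                            # absolute: relative to rootDir
    else:
        virtual = posixpath.join(home_dir, url_path)  # relative: prefixed with homeDir
    # Force exactly one leading slash, then collapse ".", ".." and duplicate
    # slashes. normpath drops ".." at the root, which gives the chroot behaviour.
    virtual = posixpath.normpath("/" + virtual.lstrip("/"))
    if virtual == "/":
        return posixpath.normpath(root_dir)
    return posixpath.normpath(root_dir.rstrip("/") + virtual)
```

Every row of the table resolves as listed, and an attempt to escape, such as `resolve_agave_path("/home/nryan", "/home", "../../etc")`, stays inside rootDir and yields `/home/nryan/etc`.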

Transferring data

Before we talk about how to do basic operations on your data, let’s first talk about how you can move your data around. You already have a storage system available to you, so we will start with the “hello world” of data movement, uploading a file.

Uploading data

Uploading a file

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X POST \
    -F "fileToUpload=@files/picksumipsum.txt" \
    https://public.tenants.agaveapi.co/files/v2/media/system/data.agaveapi.co/nryan
files-upload -v -F files/picksumipsum.txt -S data.agaveapi.co nryan

The response will look something like this:

{
    "internalUsername": null,
    "lastModified": "2014-09-03T10:28:09.943-05:00",
    "name": "picksumipsum.txt",
    "nativeFormat": "raw",
    "owner": "nryan",
    "path": "/home/nryan/picksumipsum.txt",
    "source": "http://127.0.0.1/picksumipsum.txt",
    "status": "STAGING_QUEUED",
    "systemId": "data.agaveapi.co",
    "uuid": "0001409758089943-5056a550b8-0001-002",
    "_links": {
        "history": {
            "href": "https://public.tenants.agaveapi.co/files/v2/history/system/data.agaveapi.co/nryan/picksumipsum.txt"
        },
        "self": {
            "href": "https://public.tenants.agaveapi.co/files/v2/media/system/data.agaveapi.co/nryan/picksumipsum.txt"
        },
        "system": {
            "href": "https://public.tenants.agaveapi.co/systems/v2/data.agaveapi.co"
        }
    }
}

You may upload data to a remote system by performing a multipart POST on the Files service. If you are using the Agave CLI, you can perform recursive directory uploads. If you are calling curl manually or building an app with the Agave SDK, you will need to implement the recursion yourself; the files-upload script shows how this is done. The example at the start of this section uploads picksumipsum.txt, which we will use in the remainder of this tutorial.

You will see a progress bar while the file uploads, followed by a response from the server describing the uploaded file. Agave does not block during data movement operations, so it may take a moment before the file physically shows up on the remote system.
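Outside curl, the `-F "fileToUpload=@..."` upload has to be encoded as multipart/form-data by hand or with a library. The following minimal Python sketch does it with the standard library only; the helper name `upload_request` and `API_BASE` are assumptions, while the form field name `fileToUpload` comes from the curl example. To upload a directory tree yourself, you would call a helper like this once per file after creating the target directories.

```python
import urllib.request
import uuid

# Assumed tenant base URL, taken from the examples above.
API_BASE = "https://public.tenants.agaveapi.co"


def upload_request(token: str, system_id: str, remote_dir: str,
                   filename: str, content: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to curl -F "fileToUpload=@file"."""
    boundary = uuid.uuid4().hex  # random token, very unlikely to appear in the payload
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="fileToUpload"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),  # closing boundary ends the form
    ])
    return urllib.request.Request(
        f"{API_BASE}/files/v2/media/system/{system_id}/{remote_dir}",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Reading the local file with `open(path, "rb").read()` and passing the bytes as `content` reproduces the upload above.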

Importing data

You can also have Agave download data from an external URL. Rather than making a multipart file upload request, you can pass in a JSON object with the URL and an optional target file name, type, and array of notification subscriptions. Agave supports the ingestion protocols listed in the next table.

Schema Details
http Supported with and without user info
https Supported with and without user info
ftp Anonymous FTP only
sftp User info required in URL
agave No user info supported.
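The scheme rules in the table above can be checked on the client before an import request is sent. This is a minimal Python sketch; the helper names `check_import_url` and `import_body` are hypothetical, and treating any user info in an `ftp` URL as non-anonymous is an interpretive assumption. The request body contains only the `url` field, since the optional fields are not spelled out here.

```python
import json
from urllib.parse import urlsplit

SUPPORTED_SCHEMES = {"http", "https", "ftp", "sftp", "agave"}


def check_import_url(url: str) -> None:
    """Raise ValueError if the URL breaks the scheme rules in the table above."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    has_user_info = parts.username is not None
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    # Assumption: any user info in an ftp URL is treated as non-anonymous.
    if scheme in ("ftp", "agave") and has_user_info:
        raise ValueError(f"{scheme} URLs may not carry user info")
    if scheme == "sftp" and not has_user_info:
        raise ValueError("sftp URLs must include user info")
    # http and https are accepted with or without user info.


def import_body(url: str) -> bytes:
    """JSON body for an import request, validated first."""
    check_import_url(url)
    return json.dumps({"url": url}).encode("utf-8")
```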

To demonstrate how this works, we will import a README.md file from the Agave Samples git repository in Bitbucket.

Download a file from a web accessible URL

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data '{"url":"https://bitbucket.org/agaveapi/science-api-samples/raw/master/README.md"}' \
    https://public.tenants.agaveapi.co/files/v2/media/system/data.agaveapi.co/nryan
files-import -v -U "https://bitbucket.org/agaveapi/science-api-samples/raw/master/README.md" \
    -S data.agaveapi.co nryan

The response will look something like this:

{
    "name" : "README.md",
    "uuid" : "0001409758713912-5056a550b8-0001-002",
    "owner" : "nryan",
    "internalUsername" : null,
    "lastModified" : "2014-09-10T20:00:55.266-05:00",
    "source" : "https://bitbucket.org/agaveapi/science-api-samples/raw/master/README.md",
    "path" : "/home/nryan/README.md",
    "status" : "STAGING_QUEUED",
    "systemId" : "data.agaveapi.co",
    "nativeFormat" : "raw",
    "_links" : {
      "self" : {
        "href" : "https://public.tenants.agaveapi.co/files/v2/media/system/data.agaveapi.co/nryan/README.md"
      },
      "system" : {
        "href" : "https://public.tenants.agaveapi.co/systems/v2/data.agaveapi.co"
      },
      "history" : {
        "href" : "https://public.tenants.agaveapi.co/files/v2/history/system/data.agaveapi.co/nryan/README.md"
      }
    }
}

Downloading data from a third party is done asynchronously in the background, so the server responds right away. Note that the file length given in the response will always be -1, because Agave generally does not know the actual source file size until after the response is sent back. The file size will be updated as the download progresses. You can track the progress by querying the destination file item’s history, which will contain an entry showing the progress of the download.

For this exercise, the file we just downloaded is just a few KB, so you should see it appear in your home folder on data.agaveapi.co almost immediately. If you were importing larger datasets, the transfer could take significantly longer depending on the network quality between Agave and the source location. In this case, you would see the file size continue to increase until it completed. In the event of a failed transfer, Agave will retry several times before canceling the transfer.

Transferring data

Transferring data between systems

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data-binary '{"url":"agave://stampede.tacc.utexas.edu//etc/motd"}' \
    https://public.tenants.agaveapi.co/files/v2/media/system/data.agaveapi.co/nryan
files-import -v -U "agave://stampede.tacc.utexas.edu//etc/motd" -S data.agaveapi.co nryan

The response from the service will be the same as the one we received importing a file.

Much like downloading data, Agave can manage the transfer of data between registered systems. This is, in fact, how data is staged prior to running a simulation. Data transfers are carried out asynchronously, so you can simply start a transfer and go about your business; Agave will ensure it completes. If you would like a notification when the transfer completes or reaches a certain stage, you can subscribe one or more emails, webhooks, and/or realtime notifications, and Agave will alert them as the transfer progresses. The available file events are listed in the File history section. For more information about the events and notifications systems, please see the Notifications Guide and Event Reference.

In the example below, we will transfer a file from stampede.tacc.utexas.edu to data.agaveapi.co. While the request looks pretty basic, there is a lot going on behind the scenes. Agave will authenticate to both systems, check permissions, stream data out of Stampede using GridFTP and proxy it into data.agaveapi.co using the SFTP protocol, adjusting the transfer buffer size along the way to optimize throughput. Doing this by hand is both painful and error prone. Doing it with Agave is nearly identical to copying a file from one directory to another on your local system.

One of the benefits of the Files service is that it frees you up to work in parallel and scale with your application demands. In the next example we will use the Files service to create redundant archives of a shared project directory.

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data-binary '{"url":"agave://data.agaveapi.co/nryan/foo_project"}' \
    https://public.tenants.agaveapi.co/files/v2/media/system/nryan.storage1/

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data-binary '{"url":"agave://data.agaveapi.co/nryan/foo_project"}' \
    https://public.tenants.agaveapi.co/files/v2/media/system/nryan.storage2/
files-import -v -U "agave://data.agaveapi.co/nryan/foo_project" -S nryan.storage1

files-import -v -U "agave://data.agaveapi.co/nryan/foo_project" -S nryan.storage2
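Because each import returns immediately and Agave completes the transfer on its own, fanning one source out to several archives is just one request per destination. This minimal Python sketch builds those requests; the helper name `archive_requests` and `API_BASE` are assumptions, and the destination system ids are the ones from the example above.

```python
import json
import urllib.request

# Assumed tenant base URL, taken from the examples above.
API_BASE = "https://public.tenants.agaveapi.co"


def archive_requests(token: str, source_url: str, destinations: list) -> list:
    """Build one import request per destination system for the same source."""
    body = json.dumps({"url": source_url}).encode("utf-8")
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    return [
        urllib.request.Request(
            f"{API_BASE}/files/v2/media/system/{system_id}/",
            data=body,
            headers=headers,
            method="POST",
        )
        for system_id in destinations
    ]
```

Sending them in a plain loop with `urllib.request.urlopen` is enough; the transfers themselves then proceed in parallel on the service side.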

Basic data operations

Now that we understand how to move data into, out of, and between systems, we will look at how to perform file operations on the data. Again, remember that the Files service gives you a common REST interface to all your storage and execution systems regardless of the authentication mechanism or protocol they use. The examples below will use your default public storage system, but they would work identically with any storage system you have access to.

Directory listing

Listing a file or directory

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    https://public.tenants.agaveapi.co/files/v2/listings/system/data.agaveapi.co/nryan
files-list -v -S data.agaveapi.co nryan

The response would look something like this:

[
    {
        "format": "folder",
        "lastModified": "2012-08-03T06:30:12.000-05:00",
        "length": 0,
        "mimeType": "text/directory",
        "name": ".",
        "path": "nryan",
        "permissions": "ALL",
        "system": "data.agaveapi.co",
        "type": "dir",
        "_links": {
            "self": {
                "href": "https://public.tenants.agaveapi.co/files/v2/media/system/data.agaveapi.co/nryan"
            },
            "system": {
                "href": "https://public.tenants.agaveapi.co/systems/v2/data.agaveapi.co"
            }
        }
    },
    {
        "format": "raw",
        "lastModified": "2014-09-10T19:47:44.000-05:00",
        "length": 3235,
        "mimeType": "text/plain",
        "name": "picksumipsum.txt",
        "path": "nryan/picksumipsum.txt",
        "permissions": "ALL",
        "system": "data.agaveapi.co",
        "type": "file",
        "_links": {
            "self": {
                "href": "https://public.tenants.agaveapi.co/files/v2/media/system/data.agaveapi.co/nryan/picksumipsum.txt"
            },
            "system": {
                "href": "https://public.tenants.agaveapi.co/systems/v2/data.agaveapi.co"
            }
        }
    }
]

Obtaining a directory listing, or information about a specific file is done by making a GET request on the /files/v2/listings/ resource.

The response contains a summary listing of the contents of your home directory on data.agaveapi.co. Appending a file path to the URL (or to the CLI path argument) returns information about that specific file.
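A listing response mixes an entry for the listed directory itself (named `.`) with its contents. This minimal Python sketch separates subdirectories from files in a parsed listing array like the one above; the helper name `split_listing` is hypothetical, and it assumes the `name` and `type` fields shown in the sample response.

```python
def split_listing(listing: list) -> tuple:
    """Split a parsed listing array into (directory names, file names)."""
    dirs, files = [], []
    for item in listing:
        if item.get("name") == ".":
            continue  # this entry describes the listed directory, not its contents
        (dirs if item.get("type") == "dir" else files).append(item["name"])
    return dirs, files
```

Applied to the sample response above, it returns no directories and the single file `picksumipsum.txt`.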

Move, copy, rename, delete

Basic file operations are available by sending a POST request to the /files/v2/media/ collection with the following parameters.

Attribute Description
action The action you want to perform. Select one of “move”, “copy”, “rename”, “mkdir”.
path Full path to the destination file or folder. This may be the name of a new directory or renamed file, or an absolute or relative Agave path where the file or directory should be copied/moved.
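The two attributes above form a small JSON body. The following minimal Python sketch builds it with `json.dumps`, which also sidesteps the shell pitfall where a variable inside single quotes is never expanded. The helper name `file_action_request` and `API_BASE` are assumptions; the POST verb and action names are the ones given in this section.

```python
import json
import urllib.request

# Assumed tenant base URL, taken from the examples above.
API_BASE = "https://public.tenants.agaveapi.co"
FILE_ACTIONS = {"copy", "move", "rename", "mkdir"}


def file_action_request(token: str, system_id: str, path: str,
                        action: str, target: str) -> urllib.request.Request:
    """Build a copy/move/rename/mkdir request for the file item at `path`."""
    if action not in FILE_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return urllib.request.Request(
        f"{API_BASE}/files/v2/media/system/{system_id}/{path}",
        data=json.dumps({"action": action, "path": target}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",  # the verb stated in this section
    )
```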

Copying files and directories

Copy a file item within the same system.

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data-binary "{\"action\":\"copy\",\"path\":\"$DESTPATH\"}" \
    https://public.tenants.agaveapi.co/files/v2/media/system/data.agaveapi.co/$PATH
files-copy -D $DESTPATH -S data.agaveapi.co $PATH

The response from a copy operation will be a JSON object describing the new file or folder.


Copying can be performed on any remote system. Unlike the Unix cp command, all copy invocations in Agave will overwrite the destination target if it exists. In the event of a directory collision, the contents of the two directory trees will be merged with the source overwriting the destination. Any overwritten files will maintain their provenance records and have an additional entry added to record the copy operation.

Moving files and directories

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data-binary "{\"action\":\"move\",\"path\":\"$DESTPATH\"}" \
    https://public.tenants.agaveapi.co/files/v2/media/system/data.agaveapi.co/$PATH
files-move -D $DESTPATH -S data.agaveapi.co $PATH

The response will reflect the new file item.

Moving can be performed on any remote system. Moving a file or directory will overwrite the destination target if it exists. Unlike copy operations, the destination will be completely replaced by the source in the event of a collision. No merge will take place. Further, the provenance of the source will replace that of the target.

Renaming files and directories

Renaming a file item

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data-binary "{\"action\":\"rename\",\"path\":\"$NEWNAME\"}" \
    https://public.tenants.agaveapi.co/files/v2/media/system/$SYSTEM_ID/$PATH
files-rename -N $NEWNAME -S $SYSTEM_ID $PATH

The response will reflect the renamed file item.

Renaming, like copying and moving, is only applicable within the context of a single system. Unlike on Unix systems, renaming and moving are not synonymous. When specifying a new name for a file or directory, the new name is relative to the parent directory of the original file or directory. Also, if a file or directory already exists with that name, the operation will fail and an error message will be returned. All provenance information will follow the renamed file or directory.

Creating a new directory

Creating a new directory

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data-binary "{\"action\":\"mkdir\",\"path\":\"$NEWDIR\"}" \
    https://public.tenants.agaveapi.co/files/v2/media/system/$SYSTEM_ID/$PATH
files-mkdir -N $NEWDIR -S $SYSTEM_ID $PATH

The response will reflect the new directory.

Creating a new directory is a recursive action in Agave. If the parent directories do not exist, they will be created on the fly. If a file or directory already exists with that name, the operation will fail and an error message will be returned.

Deleting a file item

Deleting a file item

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X DELETE \
    https://public.tenants.agaveapi.co/files/v2/media/system/$SYSTEM_ID/$PATH
files-delete -S $SYSTEM_ID $PATH

A standard Agave response with an empty result value will be returned.

As with creating a directory, deleting a file or directory is a recursive action in Agave. No prompt or warning will be given once the request is sent. It is up to you to implement such checks in your application logic and/or user interface.
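One way to add such a check is to refuse obviously dangerous paths and require an explicit confirmation before the DELETE request is even built. The following is a minimal Python sketch of that idea; the helper name `guarded_delete_request`, the `API_BASE` value, and the set of protected paths are all assumptions chosen for illustration.

```python
import urllib.request
from typing import Callable, Optional

# Assumed tenant base URL, taken from the examples above.
API_BASE = "https://public.tenants.agaveapi.co"

# Paths that resolve to homeDir ("" and "."), its parent (".."), or rootDir ("/").
PROTECTED_PATHS = {"", ".", "..", "/"}


def guarded_delete_request(token: str, system_id: str, path: str,
                           confirm: Callable[[str], bool]) -> Optional[urllib.request.Request]:
    """Return a DELETE request only after the caller confirms; None if declined."""
    if path.strip() in PROTECTED_PATHS:
        raise ValueError(f"refusing to delete {path!r}: it resolves to a top-level directory")
    if not confirm(f"Permanently delete {path} on {system_id}, including any contents?"):
        return None
    return urllib.request.Request(
        f"{API_BASE}/files/v2/media/system/{system_id}/{path}",
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )
```

In a terminal tool, `confirm` could be `lambda msg: input(msg + " [y/N] ").lower() == "y"`; in a web UI it would be a dialog callback.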

File history

A full history of changes, permission changes, and access events made through the Files API is recorded for every file and folder on registered Agave systems. The recorded history events represent a subset of the events thrown by the Files API. Generally speaking, the events saved in a file item’s history represent mutations on the physical file item or its metadata.

Direct vs indirect events

Agave will record both direct and indirect events made on a file item. Examples of direct events are transferring a directory from one system to another or renaming a file. An example of an indirect event is a user manually deleting a file from the command line. The table below lists all the provenance actions recorded.

Event Description
CREATED File or directory was created
DELETED The file was deleted
RENAME The file was renamed
MOVED The file was moved to another path
OVERWRITTEN The file was overwritten
PERMISSION_GRANT A user permission was added
PERMISSION_REVOKE A user permission was deleted
STAGING_QUEUED File/folder queued for staging
STAGING File or directory is currently in flight
STAGING_FAILED Staging failed
STAGING_COMPLETED Staging completed successfully
PREPROCESSING Preparing file for processing
TRANSFORMING_QUEUED File/folder queued for transform
TRANSFORMING Transforming file/folder
TRANSFORMING_FAILED Transform failed
TRANSFORMING_COMPLETED Transform completed successfully
UPLOADED New content was uploaded to the file.
CONTENT_CHANGED Content changed within this file/folder. If a folder, this event will be thrown whenever content changes in any file within this folder at most one level deep.

Out of band file system changes

Agave does not own the storage and execution systems you access through the Science APIs, so it cannot guarantee that every change made to the file system is recorded. Thus, Agave takes a best-effort approach to provenance, allowing you to choose, through your own use of best practices, how thorough the provenance trail of your data will be.

Listing file history

List the history of a file item

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    https://public.agaveapi.co/files/v2/history/nryan/picksumipsum.txt
files-history -v nryan/picksumipsum.txt

The response contains a listing of the history events recorded for the file item:

[
  {
    "status": "DOWNLOAD",
    "created": "2016-09-20T19:47:56.000-05:00",
    "createdBy": "public",
    "description": "File was downloaded"
  },
  {
    "status": "STAGING_QUEUED",
    "created": "2016-09-20T19:48:12.000-05:00",
    "createdBy": "nryan",
    "description": "File/folder queued for staging"
  },
  {
    "status": "STAGING_COMPLETED",
    "created": "2016-09-20T19:48:16.000-05:00",
    "createdBy": "nryan",
    "description": "Staging completed successfully"
  },
  {
    "status": "TRANSFORMING_COMPLETED",
    "created": "2016-09-20T19:48:17.000-05:00",
    "createdBy": "nryan",
    "description": "Your scheduled transfer of http://129.114.97.92/picksumipsum.txt completed staging. You can access the raw file on iPlant Data Store at /home/nryan/picksumipsum.txt or via the API at https://public.agaveapi.co/files/v2/media/system/data.agaveapi.co//nryan/picksumipsum.txt."
  }
]

Basic paginated listing of file item history events is available, as shown in the example. Currently, the file history service is read-only. The only way to erase the history of a file item is to delete the file item through the API.
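When you poll a history listing to follow a transfer, usually only the newest event matters. This minimal Python sketch picks it out of a parsed history array like the one above; the helper name `latest_event` is hypothetical, and it assumes the `created` timestamp format shown in the sample.

```python
from datetime import datetime
from typing import Optional


def latest_event(history: list) -> Optional[dict]:
    """Return the most recent entry of a parsed file history array, or None if empty."""
    if not history:
        return None
    # Timestamps such as 2016-09-20T19:48:16.000-05:00 parse with
    # datetime.fromisoformat (Python 3.7+) into timezone-aware datetimes,
    # so entries compare correctly regardless of their order in the array.
    return max(history, key=lambda event: datetime.fromisoformat(event["created"]))
```

Checking whether `latest_event(history)["status"]` is `STAGING_COMPLETED` or `STAGING_FAILED` is one way to decide when to stop polling.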

Searching file history

Search a file item’s history

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    'https://public.agaveapi.co/files/v2/history/nryan/picksumipsum.txt?limit=2&offset=1&createdBy.like=*ryan'
files-history-search -v -l 2 -o 1 -S data.agaveapi.co nryan/picksumipsum.txt createdBy.like=*ryan

The response is a JSON array of every action performed on the file by users with a username ending in ryan.

[
  {
    "status": "STAGING_QUEUED",
    "created": "2016-09-20T19:48:12.000-05:00",
    "createdBy": "nryan",
    "description": "File/folder queued for staging"
  },
  {
    "status": "STAGING_COMPLETED",
    "created": "2016-09-20T19:48:16.000-05:00",
    "createdBy": "nryan",
    "description": "Staging completed successfully"
  }
]

File histories can get rather lengthy over time. Full text search is available on the file history service using the standard search syntax.
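Building search URLs programmatically avoids the shell quoting issues around `&` and `*`. This minimal Python sketch composes the history URL with pagination and search terms; the helper name `history_search_url` is hypothetical, `API_BASE` assumes the host used in the history examples, and it assumes the service decodes the percent-encoded `*` (`%2A`) that `urlencode` produces, as standards-compliant servers do.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed base URL, taken from the history examples above.
API_BASE = "https://public.agaveapi.co"


def history_search_url(path: str, system_id: Optional[str] = None,
                       limit: Optional[int] = None, offset: Optional[int] = None,
                       filters: Optional[dict] = None) -> str:
    """Compose a file history URL with optional paging and search terms."""
    url = f"{API_BASE}/files/v2/history"
    if system_id:  # omit to use your default storage system
        url += f"/system/{system_id}"
    url += f"/{path}"
    params = {}
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    params.update(filters or {})  # e.g. {"createdBy.like": "*ryan"}
    return f"{url}?{urlencode(params)}" if params else url
```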

File metadata management

In many systems, the concept of metadata is directly tied to the notion of a file system. Agave takes a broader view of metadata and supports it as its own first class resource in the REST API. For more information on how to leverage metadata in Agave, please consult the Metadata Guide. It covers all aspects of how to manage, search, validate, and associate metadata across your entire digital lab.

File permissions

Agave has a fine-grained permission model supporting use cases from creating and exposing read-only storage systems to sharing individual files and folders with one or more users. The permissions available for file items are listed in the following table. Please note that a user must have WRITE permissions to grant or revoke permissions on a file item.

Name Description
READ User can view, but not edit or execute the resource
WRITE User can edit, but not view or execute the resource
EXECUTE User can execute, but not view or edit the resource
READ_WRITE User can view and write the resource, but not execute
READ_EXECUTE User can view and execute the resource, but not edit it
WRITE_EXECUTE User can edit and execute the resource, but not view it
ALL User has full control over the resource
NONE User has all permissions revoked on the given resource
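The permission responses in this section report read, write and execute as separate booleans, while requests use the names in the table above. This minimal Python sketch converts between the two forms; the names `PERMISSION_FLAGS`, `permission_flags` and `permission_name` are hypothetical, and the mapping is transcribed from the table.

```python
# name: (read, write, execute), transcribed from the permissions table above
PERMISSION_FLAGS = {
    "READ": (True, False, False),
    "WRITE": (False, True, False),
    "EXECUTE": (False, False, True),
    "READ_WRITE": (True, True, False),
    "READ_EXECUTE": (True, False, True),
    "WRITE_EXECUTE": (False, True, True),
    "ALL": (True, True, True),
    "NONE": (False, False, False),
}
# Every combination of three booleans has exactly one name, so this inverts cleanly.
NAME_BY_FLAGS = {flags: name for name, flags in PERMISSION_FLAGS.items()}


def permission_flags(name: str) -> dict:
    """Expand a permission name into the read/write/execute object used in responses."""
    read, write, execute = PERMISSION_FLAGS[name.upper()]
    return {"read": read, "write": write, "execute": execute}


def permission_name(flags: dict) -> str:
    """Collapse a response's permission object back to a name; missing keys count as False."""
    key = (bool(flags.get("read")), bool(flags.get("write")), bool(flags.get("execute")))
    return NAME_BY_FLAGS[key]
```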

Listing all permissions

List the permissions on a file item

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    'https://public.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/picksumipsum.txt?pretty=true'
files-pems-list \
    -S data.agaveapi.co \
    nryan/picksumipsum.txt

The response will look something like the following:

[
  {
    "username": "nryan",
    "internalUsername": null,
    "permission": {
      "read": true,
      "write": true,
      "execute": true
    },
    "recursive": true,
    "_links": {
      "self": {
        "href": "https://public.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/picksumipsum.txt?username.eq=nryan"
      },
      "file": {
        "href": "https://public.agaveapi.co/files/v2/media/system/data.agaveapi.co/nryan/picksumipsum.txt"
      },
      "profile": {
        "href": "https://public.agaveapi.co/profiles/v2/nryan"
      }
    }
  }
]

To list all permissions for a file item, make a GET request on the file item’s permission collection.

List permissions for a specific user

List the permissions on a file item for a given user

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    'https://public.tenants.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/picksumipsum.txt?username=rclemens'
files-pems-list \
    -u rclemens \
    -S data.agaveapi.co \
    nryan/picksumipsum.txt

The response will look something like the following:

{
  "username":"rclemens",
  "permission":{
    "read":true,
    "write":true
  },
  "_links":{
    "self":{
      "href":"https://public.tenants.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/picksumipsum.txt?username=rclemens"
    },
    "parent":{
      "href":"https://public.tenants.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/picksumipsum.txt"
    },
    "profile":{
      "href":"https://public.agaveapi.co/profiles/v2/rclemens"
    }
  }
}

Checking permissions for a single user is done using agave URL query search syntax.

Grant permissions

Grant read access to a file item

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data '{"username":"rclemens", "permission":"READ"}' \
    https://public.tenants.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/picksumipsum.txt
files-pems-update \
    -u rclemens \
    -p READ \
    -S data.agaveapi.co \
    nryan/picksumipsum.txt

Grant read and write access to a file item

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data '{"username":"rclemens", "permission":"READ_WRITE"}' \
    https://public.tenants.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/picksumipsum.txt
files-pems-update \
    -u rclemens \
    -p READ_WRITE \
    -S data.agaveapi.co \
    nryan/picksumipsum.txt

The response will look something like the following

[
  {
    "username": "rclemens",
    "internalUsername": null,
    "permission": {
      "read": true,
      "write": true,
      "execute": false
    },
    "recursive": false,
    "_links": {
      "self": {
        "href": "https://public.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/picksumipsum.txt?username.eq=rclemens"
      },
      "file": {
        "href": "https://public.agaveapi.co/files/v2/media/system/data.agaveapi.co/nryan/picksumipsum.txt"
      },
      "profile": {
        "href": "https://public.agaveapi.co/profiles/v2/rclemens"
      }
    }
  }
]

To grant another user read access to your file item, assign them READ permission. To enable another user to update a file item, grant them READ_WRITE or ALL access.
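Grants, revocations (permission `NONE`) and wildcard updates (username `*`) in this section all share one request shape. This minimal Python sketch builds it; the helper name `pems_update_request` and `API_BASE` are assumptions, while the field names, the `recursive` attribute and the POST verb come from the examples here. Using `json.dumps` also guarantees the `"username": value` pair is well-formed JSON.

```python
import json
import urllib.request

# Assumed tenant base URL, taken from the examples above.
API_BASE = "https://public.tenants.agaveapi.co"
PERMISSIONS = {"READ", "WRITE", "EXECUTE", "READ_WRITE",
               "READ_EXECUTE", "WRITE_EXECUTE", "ALL", "NONE"}


def pems_update_request(token: str, system_id: str, path: str, username: str,
                        permission: str, recursive: bool = False) -> urllib.request.Request:
    """Build a permission grant/revoke request for a file item."""
    permission = permission.upper()
    if permission not in PERMISSIONS:
        raise ValueError(f"unknown permission: {permission!r}")
    payload = {"username": username, "permission": permission}
    if recursive:
        payload["recursive"] = True  # apply to a directory's whole tree
    return urllib.request.Request(
        f"{API_BASE}/files/v2/pems/system/{system_id}/{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```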

Delete single user permissions

Delete permission for single user on a file item

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     -X POST \
     --data '{"username":"rclemens", "permission":"NONE"}' \
     https://public.tenants.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/picksumipsum.txt
files-pems-update  \
    -u rclemens \
    -p 'NONE' \
    -S data.agaveapi.co \
    nryan/picksumipsum.txt

A response similar to the following will be returned

[
  {
    "username": "rclemens",
    "internalUsername": null,
    "permission": {
      "read": false,
      "write": false,
      "execute": false
    },
    "recursive": false,
    "_links": {
      "self": {
        "href": "https://public.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/picksumipsum.txt?username.eq=rclemens"
      },
      "file": {
        "href": "https://public.agaveapi.co/files/v2/media/system/data.agaveapi.co/nryan/picksumipsum.txt"
      },
      "profile": {
        "href": "https://public.agaveapi.co/profiles/v2/rclemens"
      }
    }
  }
]

Permissions may be removed for a single user by setting that user’s permission to NONE on the file item’s permission collection, as shown above. This will immediately revoke all of that user’s permissions on the file item.

Deleting all permissions

Delete all permissions on a file item

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     -X POST \
     --data '{"username":"*", "permission":"NONE"}' \
     https://public.tenants.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/picksumipsum.txt

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -X DELETE \
     https://public.tenants.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/picksumipsum.txt
files-pems-delete \
    -S data.agaveapi.co \
    nryan/picksumipsum.txt

An empty response will be returned from the service.

Permissions may be cleared for all users on a file item by making a DELETE request on the file item permission collection.

Recursive operations

Recursively update or delete permissions on a directory

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     -X POST \
     --data '{"username":"*", "permission":"READ_WRITE", "recursive": true}' \
     https://public.tenants.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/picksumipsum.txt

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -X DELETE \
     'https://public.tenants.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/picksumipsum.txt?recursive=true'
files-pems-delete \
    --recursive \
    -S data.agaveapi.co \
    nryan/picksumipsum.txt

An empty response will be returned from the service on delete. Update will return something like the following.

[
  {
    "username": "nryan",
    "internalUsername": null,
    "permission": {
      "read": true,
      "write": true,
      "execute": true
    },
    "recursive": true,
    "_links": {
      "self": {
        "href": "https://public.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/picksumipsum.txt?username.eq=nryan"
      },
      "file": {
        "href": "https://public.agaveapi.co/files/v2/media/system/data.agaveapi.co/nryan/picksumipsum.txt"
      },
      "profile": {
        "href": "https://public.agaveapi.co/profiles/v2/nryan"
      }
    }
  }
]

When dealing with directories, the permission operations you perform apply only to the directory item itself. Permissions do not automatically propagate to the directory contents. To apply permissions recursively to the entire directory tree, include "recursive": true in your permission object, or add recursive=true to the URL query string when making a DELETE request.

Publishing data

Agave provides multiple ways to share your data with your colleagues and the general public. In addition to the standard permission model enabling you to share your data with one or more authenticated users within the Platform, you also have the ability to publish your data and make it available via an unauthenticated public URL. Unlike traditional web and cloud hosting, your data remains in its original location and is served in situ by Agave upon user request.

Publishing a file or folder is simply a matter of granting the special public user READ permission on it. Similar to the way listings and permissions are exposed through unique paths in the Files API, published data is served from a custom /files/v2/download path. The public data URLs have the following structure:

https://public.agaveapi.co/files/v2/download/<username>/system/<system_id>/<path>

Notice two things. First, a username is inserted after the download path element. This is needed because a public request has no authenticated user against whom to validate system or file ownership; the username provides the context in which to verify the availability of the requested system and file item. Second, the system_id is mandatory in public data requests. This ensures that the public URL remains the same even if the publishing user's default storage system changes.
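The URL structure above can be assembled with a small shell helper. This is a minimal sketch: the function name public_download_url is hypothetical, and the API host is passed in explicitly (for example, your ${API_HOST} value).

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds a public download URL following
# https://<host>/files/v2/download/<username>/system/<system_id>/<path>
# Usage: public_download_url HOST USERNAME SYSTEM_ID PATH
public_download_url() {
  local host="$1" owner="$2" system_id="$3"
  local path="${4#/}"   # drop a single leading slash so the URL has no '//'
  printf 'https://%s/files/v2/download/%s/system/%s/%s\n' \
    "$host" "$owner" "$system_id" "$path"
}
```

Because published data is served without authentication, the resulting URL can be fetched with no Authorization header, for example curl -sk -O "$(public_download_url public.agaveapi.co nryan data.agaveapi.co nryan/picksumipsum.txt)".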

The following sections give examples of publishing files and folders in the Agave Platform.

Publishing individual files

Publish file item on your default storage system for public access

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data '{"username":"public", "permission":"READ"}' \
    https://public.tenants.agaveapi.co/files/v2/pems/nryan/picksumipsum.txt

files-pems-addupdate \
    -u public \
    -p READ \
    nryan/picksumipsum.txt

Publish file item on a named system for public access

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data '{"username":"public", "permission":"READ"}' \
    https://public.tenants.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/picksumipsum.txt

files-pems-addupdate \
    -u public \
    -p READ \
    -S data.agaveapi.co \
    nryan/picksumipsum.txt

The response will look something like the following:

{
  "username": "public",
  "permission": {
    "read": true,
    "write": false,
    "execute": false
  },
  "recursive": false,
  "_links": {
    "self": {
      "href": "https://public.tenants.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/picksumipsum.txt?username.eq=public"
    },
    "file": {
      "href": "https://public.tenants.agaveapi.co/files/v2/media/system/data.agaveapi.co/nryan/picksumipsum.txt"
    },
    "profile": {
      "href": "https://public.agaveapi.co/profiles/v2/public"
    }
  }
}

Once the public user has READ permission on the file, it will be available at the following URL:

https://public.tenants.agaveapi.co/files/v2/download/nryan/system/data.agaveapi.co/nryan/picksumipsum.txt

Publishing directories

Publish directory on your default storage system for public access

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data '{"username":"public", "permission":"READ", "recursive": true}' \
    https://public.tenants.agaveapi.co/files/v2/pems/nryan/public

files-pems-addupdate \
    --recursive \
    -u public \
    -p READ \
    nryan/public

Publish directory on a named system for public access

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -X POST \
    --data '{"username":"public", "permission":"READ", "recursive": true}' \
    https://public.tenants.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/public

files-pems-addupdate \
    --recursive \
    -u public \
    -p READ \
    -S data.agaveapi.co \
    nryan/public

The response will look something like the following:

{
  "username": "public",
  "permission": {
    "read": true,
    "write": false,
    "execute": false
  },
  "recursive": true,
  "_links": {
    "self": {
      "href": "https://public.tenants.agaveapi.co/files/v2/pems/system/data.agaveapi.co/nryan/public?username.eq=public"
    },
    "file": {
      "href": "https://public.tenants.agaveapi.co/files/v2/media/system/data.agaveapi.co/nryan/public"
    },
    "profile": {
      "href": "https://public.agaveapi.co/profiles/v2/public"
    }
  }
}

Publishing an entire directory works the same way as publishing a single file item. To make the directory's contents public as well, include a recursive field with a value of true in your request. Once published, the directory and all of its contents will be available for download. The above example makes every file and folder in the “nryan/public” directory of “data.agaveapi.co” available for download at the following URL:

https://public.agaveapi.co/files/v2/download/nryan/system/data.agaveapi.co/nryan/public

Publishing considerations

Publishing data through Agave can be a great way to share and access data. However, there are situations in which it may not be the ideal choice. Below are several pitfalls users run into when publishing their data.

Large file publishing

Before publishing large datasets, take a step back and consider how you might leverage the Files or Transfers API to reliably serve up your data. HTTP is not the fastest way to move large data, and a public download URL may not be the best usage pattern for applications hoping to consume it. Thinking through your use case is well worth the time, even if publishing ends up being the best approach.

Static website hosting

Website hosting is a fairly common use case for data publishing. The challenge is that your assets are still hosted remotely from our API servers and fetched on demand, which can add significant latency when serving many assets. Depending on the nature of your backend storage solution, it may not easily handle access patterns common to the web. In those situations, you may see some files fail to load from time to time. If your site has many files, even a small failure rate can keep your site from reliably loading.

If you are going to use the file publishing service for web hosting, the following tips can help improve your overall experience.

  1. Whenever possible, reference versions of your CSS, fonts, and JavaScript dependencies hosted on a public CDN. CloudFlare, Google, and Amazon all host public mirrors of the most popular JavaScript libraries and frameworks. Linking to those can greatly speed up your load time.
  2. Use a technology like Webpack to reduce the number of files needed to serve your application.
  3. Lazy load your assets with ocLazyLoad or RequireJS, or by including async attributes on your <script> elements.
  4. Store your assets on a storage system with as little connection and protocol overhead as possible. That means avoiding tape archives, GridFTP, overprovisioned shared resources, and systems only accessible through a proxied connection. While the service will still work in all of these situations, the overhead of establishing a connection and authenticating often takes longer than the actual file transfer when the file is small. Simply avoiding slower storage protocols can greatly speed up your application’s load time.

Apps

  /$$$$$$
 /$$__  $$
| $$  \ $$ /$$$$$$  /$$$$$$  /$$$$$$$
| $$$$$$$$/$$__  $$/$$__  $$/$$_____/
| $$__  $| $$  \ $| $$  \ $|  $$$$$$
| $$  | $| $$  | $| $$  | $$\____  $$
| $$  | $| $$$$$$$| $$$$$$$//$$$$$$$/
|__/  |__| $$____/| $$____/|_______/
         | $$     | $$
         | $$     | $$
         |__/     |__/

An app, in the context of Agave, is executable code available for invocation through Agave’s Jobs service on a specific execution system. Put another way, an app is a piece of code that you can run on a specific system. If a single code needs to be run on multiple systems, each combination of code and system needs to be defined as a separate app.

Apps are language agnostic and may or may not carry with them their own dependencies. (More on bundling your app in a moment.) Any code that can be forked at the command line or submitted to a batch scheduler can be registered as an Agave app and run through the Jobs service.

The Apps service is the central registry for all Agave apps. The Apps service provides permissions, validation, archiving, and revision information about each app in addition to the usual discovery capability. The rest of this tutorial explains in detail how to register an app to the Apps service, how to manage and share apps, and what the different application scopes mean.

App History

Inputs and Parameters

In this section we take a detailed look at the inputs and parameters sections of your app description. Each of these sections takes an array of JSON objects. Each JSON object represents either a data source that needs to be staged in prior to job execution or a primary value passed into your app as a parameter. In either case, the JSON object only requires an id by which to reference it in a job request and, if the object represents a parameter, a value.type field indicating its primary type.

In practice, you will want to add some descriptive information, constraints, and runtime validation checks to reduce the number of errors users run into when attempting to run your app. The full lists of app input and parameter attributes are provided in their respective sections below. However, before we dive into app inputs, let’s first get a big-picture view of what we are doing when we define our app’s inputs and parameters.

Input and Parameter Information Flow

When a user submits a job request in step 1, they specify the inputs and parameters needed to run that job. Those attributes are defined in your app description. The Jobs service uses your app description to validate the values in the job request and either rejects it with a descriptive error message, as in step 2, or accepts it, as in step 4. Once the job request is accepted, the values given for the inputs and parameters in the job request replace their corresponding template placeholders in the wrapper script. For example, suppose the job request assigns the value foo to the input with id input1. Before submitting the job to the remote system, the Jobs service will replace all occurrences of ${input1} in the app wrapper script with foo. The same happens with param1 and param2: all occurrences of ${param1} will be replaced with bar and all occurrences of ${param2} will be replaced with 2, just as specified in the job request.
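The substitution step described above can be approximated locally to preview what a rendered wrapper might look like. This is a sketch only, assuming placeholders are the literal text ${id} and values are inserted verbatim; the real replacement happens inside the Jobs service, and render_template is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: replace every literal ${id} placeholder in a template string
# with the value given for that id.
# Usage: render_template TEMPLATE id=value [id=value ...]
render_template() {
  local template="$1"
  shift
  local pair id value placeholder
  for pair in "$@"; do
    id="${pair%%=*}"          # text before the first '='
    value="${pair#*=}"        # text after the first '='
    placeholder="\${${id}}"   # the literal string ${id}
    # Quoting the pattern and the replacement makes both match/insert literally.
    template=${template//"$placeholder"/"$value"}
  done
  printf '%s\n' "$template"
}
```

For example, render_template 'my_app -i ${input1} -p ${param1}' input1=foo param1=bar prints my_app -i foo -p bar. Because ${input1} is also valid shell syntax, a placeholder that is never replaced would simply expand as an ordinary (usually empty) shell variable when a bash wrapper runs.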

As we look at how to define inputs and parameters for your app, keep this big picture in mind. The purpose of inputs is to specify data that need to be staged prior to your job running and to tell your wrapper script about them. The purpose of parameters is to specify variables that need to be passed to your wrapper script. To do this, we only need a simple id by which to reference the values in a job request. The rest of what we will discuss in this tutorial is the mechanism that Agave provides for you to validate, describe, discover, and restrict application inputs and parameters to provide better user and developer experiences using your app.

Inputs

The inputs attribute of your app description contains a JSON array of input objects. An input represents one or more pieces of data that your app will use at runtime. That data can be a single file, a directory, or a response from a web service. It can reside on a system that Agave knows about, or at a publicly accessible URL. Regardless of where it lives and what it is, Agave will grab the data (recursively if need be) and copy it to your job’s working directory just before execution.

A minimal input object contains a single inputs.[].id attribute that uniquely identifies it within the context of your app. Any alphanumeric value under 64 characters can be an identifier, but it must be unique among all the inputs and parameters in that app.

{
  "id": "input1"
}
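The identifier rules above can be checked locally before you register an app. The following sketch is hypothetical: check_ids takes every input and parameter id of an app and reports the first id that is not alphanumeric and 1 to 63 characters long, or that appears twice. It follows the wording of this section; the service itself may apply different or additional rules.

```shell
#!/usr/bin/env bash
# Sketch: validate input/parameter ids as described in this section
# (alphanumeric, under 64 characters, unique across inputs and parameters).
# Usage: check_ids ID [ID ...]
check_ids() {
  local re='^[A-Za-z0-9]{1,63}$'
  local seen=' ' id
  for id in "$@"; do
    if ! [[ $id =~ $re ]]; then
      echo "invalid id: $id"
      return 1
    fi
    # Ids contain no spaces once validated, so a space-delimited list is safe.
    case "$seen" in
      *" $id "*) echo "duplicate id: $id"; return 1 ;;
    esac
    seen="$seen$id "
  done
  echo "ok"
}
```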

Most of the time, such a minimal definition is not helpful. At the very least, you would want some descriptive information, a restriction on the cardinality, and potentially a default value. This can be achieved with the details, semantics, and value objects. The full list of input attributes is shown in the following table. We cover each attribute in the corresponding section below.

Name Type Description
id String Required: The textual id of this input. This value must be unique among all inputs and parameters in an app description.
details JSON object Information specifying how to describe this input in different contexts and how to pass it on the command line.
details.argument string A command line argument or flag to be prepended before the input value.
details.description string Human-readable description of the input. Often used to create contextual help in automatically generated UI.
details.label string Human-readable label for the input. Often implemented as text label next to the field in automatically generated UI.
details.showArgument boolean Whether to include the argument value for this input when performing the template variable replacement during job submission. If true, the details.argument value will be prepended, without spaces, to the actual input value(s).
details.repeatArgument boolean When multiple values are provided for this input, this attribute determines whether to include the argument value before each user-supplied value when performing the template variable replacement during job submission. The details.showArgument value must be true for this value to be applied.
semantics JSON object Describes the semantic definition of this input and the file types it represents. Multiple ontologies and values are supported.
semantics.fileTypes JSON array Array of string values describing the file types represented by this input. The types correspond to values from the Transforms service. Use “raw-0” for the time being.
semantics.minCardinality integer Minimum number of values this input must have.
semantics.maxCardinality integer Maximum number of values this input can have. A null value or value of -1 indicates no limit.
semantics.ontology JSON array List of ontology terms (or URIs pointing to ontology terms) applicable to the input. We recommend at least specifying an XML Schema Simple Type.
value JSON object A description of the anticipated value and the situations when it is required.
value.default string, JSON array The default value for this input. This value is optional except when value.required is true and value.visible is false. Values may be absolute or relative paths on the user’s default storage system, an Agave URI, or any valid URL with a supported scheme.
value.order integer The order in which this input should appear when auto-generating a command line invocation.
value.required boolean Required: Is specification of this input mandatory to run a job?
value.validator string Perl-formatted regular expression to restrict valid values.
value.visible boolean When automatically generating a UI, should this field be visible to end users? If false, users will not be able to set this value in their job request.
value.enquote boolean Whether the value should be surrounded by quotation marks before being injected into the wrapper template at job runtime.

Input details section

The inputs.[].details object contains information specifying how to describe an input in different contexts. The description and label values provide human-readable information appropriate for a tool tip and form label respectively. Neither of these attributes is required; however, they dramatically improve the readability of your app description if you include them.

Often you will need to translate your input value into actual command line arguments. By default, Agave replaces every occurrence of ${<id>} in your wrapper script, where <id> is the value of inputs.[].id, with the value given for that input in your job request. That means that you are responsible for inserting any command line flags or arguments into the wrapper script yourself. This is a straightforward process; however, when an input is optional, the resulting command line could be broken if the user does not specify an input value in their job request. One way to work around this is to add a conditional check to the variable assignment and exclude the command line flag or argument if it does not have a value set. Another is to use the inputs.[].details.argument attribute.
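The conditional-check approach might look like the following. It is a sketch under stated assumptions: an optional input the user omits is assumed to be rendered as an empty string, the value is passed as a function argument so the logic can be exercised outside a job, and build_input_args and my_app are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch of guarding an optional input in a wrapper template.
# In a real wrapper, the Jobs service substitutes ${input1} before the script
# runs; here the value arrives as $1 instead.
# Assumption: an omitted optional input becomes an empty string.
build_input_args() {
  local input1="$1"
  local args=()
  # Only emit the -f flag when a value was actually supplied.
  if [ -n "${input1}" ]; then
    args+=(-f "${input1}")
  fi
  printf '%s\n' "${args[*]}"
}
```

In an actual wrapper you would build the args array the same way and call your executable directly, for example my_app "${args[@]}", which keeps values containing spaces intact.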

The inputs.[].details.argument value describes the command line argument that corresponds to this input, and the inputs.[].details.showArgument attribute specifies whether the inputs.[].details.argument value should be injected into the wrapper template in front of the actual runtime value. The following table illustrates the result of these attributes in different scenarios.

argument showArgument Input value from job request Value injected into wrapper template
(none) true /etc/motd /etc/motd
-f true /etc/motd -f/etc/motd
"-f " (trailing space) true /etc/motd -f /etc/motd
-f false /etc/motd /etc/motd
--filename true /etc/motd --filename/etc/motd
--filename= true /etc/motd --filename=/etc/motd
--filename false /etc/motd /etc/motd
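The argument rules in the table, together with details.repeatArgument, can be sketched as a small shell function. The behavior for multiple values without repeatArgument (argument prepended once, values separated by single spaces) is an assumption, and render_input is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch of how details.argument / showArgument / repeatArgument combine.
# Usage: render_input ARGUMENT SHOW_ARGUMENT REPEAT_ARGUMENT VALUE [VALUE ...]
render_input() {
  local argument="$1" show="$2" repeat="$3"
  shift 3
  local out="" v
  if [ "$show" != "true" ]; then
    # Argument suppressed: only the values, space-separated.
    out="$*"
  elif [ "$repeat" = "true" ]; then
    # Argument glued (no added space) in front of every value.
    for v in "$@"; do
      out="${out:+$out }${argument}${v}"
    done
  else
    # Assumption: argument glued once in front of the space-joined values.
    out="${argument}$*"
  fi
  printf '%s\n' "$out"
}
```

Note that, as in the table, no space is added between the argument and the value; include a trailing space in the argument (as in "-f ") if your program needs one.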

Input semantics section

The inputs.[].semantics object contains semantic information about the input. The minCardinality attribute specifies the minimum number of data sources that must be specified for the input, and the maxCardinality attribute specifies the maximum (a null value or -1 means no limit). Both are used to validate the value(s) provided for the input in a job request. The ontology attribute specifies a JSON array of URLs pointing to the ontology definitions of this file type. (We recommend at least specifying an XML Schema Simple Type.) Finally, the fileTypes attribute contains a JSON array of file type strings as specified in the Transforms service. (In most situations you will leave the fileTypes attribute null or specify raw-0 as the single file type in the array.)

Input value section

The inputs.[].value object contains the information needed to validate user-supplied input values in a job request. The validator attribute accepts a Perl regular expression which will be applied to the input value(s). Any submissions that do not match the validator expression will be rejected.

The default attribute allows you to specify a default value for the input. This will be used in lieu of a user-supplied value if the input is required but not visible. All default values must match the validator expression, if provided.

The required attribute specifies whether the input must be specified during a job submission.

The visible attribute takes a boolean value specifying whether the input should be accepted as a user-supplied value in a job request. If false, the value will be ignored at job submission and the default value will be used instead. Whenever visible is set to false, required must be true.

The order attribute is used to specify the order in which inputs should be listed in the response from the API and in command-line generation. By default, order is set to zero. Thus, providing a value greater than zero is sufficient to force any single input to be listed last.
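The relationship between visible, required, and default described above can be checked before registering an app. This hypothetical check_hidden_input helper encodes only the two rules stated in this section: a hidden input must be required, and a hidden, required input needs a default value.

```shell
#!/usr/bin/env bash
# Sketch: consistency check for an input's value.visible / value.required /
# value.default settings, per the rules described in this section.
# Usage: check_hidden_input VISIBLE REQUIRED DEFAULT
check_hidden_input() {
  local visible="$1" required="$2" default="$3"
  if [ "$visible" = "false" ]; then
    if [ "$required" != "true" ]; then
      echo "invalid: hidden inputs must be required"
      return 1
    fi
    if [ -z "$default" ]; then
      echo "invalid: hidden inputs need a default value"
      return 1
    fi
  fi
  echo "ok"
}
```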

Validating inputs

The previous section covered the different ways you can tell Agave to validate and restrict the data inputs to your app. When a user submits a job request, these checks are applied in the following order:

  1. visible
  2. required
  3. minCardinality
  4. maxCardinality
  5. validator

Once an input passes these tests, Agave will check that it exists and that the user has permission to access the data. Assuming everything passes, the input is accepted and scheduled for staging.
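The validation order above might be modeled as follows. This is a sketch, not the Jobs service implementation: validate_input is a hypothetical name, bash extended regular expressions stand in for Perl regular expressions (the two dialects differ in places), and the handling of an optional input with no value is an assumption.

```shell
#!/usr/bin/env bash
# Sketch of the documented check order for one input:
# visible, required, minCardinality, maxCardinality, validator.
# Usage: validate_input VISIBLE REQUIRED MIN MAX REGEX [VALUE ...]
# (MIN/MAX may be empty; MAX of -1 or empty means no limit.)
validate_input() {
  local visible="$1" required="$2" min="$3" max="$4" regex="$5"
  shift 5
  local count="$#" v

  # 1. visible: user values for hidden inputs are ignored.
  if [ "$visible" != "true" ]; then
    echo "accepted: default value used"; return 0
  fi
  # 2. required
  if [ "$count" -eq 0 ]; then
    if [ "$required" = "true" ]; then
      echo "rejected: value required"; return 1
    fi
    echo "accepted: no value supplied"; return 0
  fi
  # 3. minCardinality
  if [ -n "$min" ] && [ "$count" -lt "$min" ]; then
    echo "rejected: too few values (min $min)"; return 1
  fi
  # 4. maxCardinality
  if [ -n "$max" ] && [ "$max" -ne -1 ] && [ "$count" -gt "$max" ]; then
    echo "rejected: too many values (max $max)"; return 1
  fi
  # 5. validator, applied to every value.
  if [ -n "$regex" ]; then
    for v in "$@"; do
      if ! [[ $v =~ $regex ]]; then
        echo "rejected: '$v' does not match validator"; return 1
      fi
    done
  fi
  echo "accepted"
}
```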

Parameters

The parameters attribute of your app description contains a JSON array of parameter objects. A parameter represents one or more arguments that your app will use at runtime. Those arguments can be more or less anything you want them to be. If, for some reason, your app handles data staging on its own and you do not want Agave to move the data on your behalf, but you do need a data reference passed in, you can define it as a parameter rather than an input.

A minimal parameter object contains a single id attribute that uniquely identifies it within the context of your app and a value.type attribute specifying the primary type of the parameter. Any alphanumeric value under 64 characters can be an identifier, but it must be unique among all the inputs and parameters in that app. The parameter type is restricted to a handful of primary types listed in the table below.

{
  "id": "parameter1",
  "value": {
    "type": "string"
  }
}

In most situations you will want some descriptive information and validation of the user-supplied values for this parameter. As with your app inputs, app parameters have details, semantics, and value objects that allow you to do just that. The full list of parameter attributes is shown in the following table. We cover each attribute in the corresponding section below.

Name Type Description
id String Required: The textual id of this parameter. This value must be unique among all inputs and parameters in an app description.
details JSON object Information specifying how to describe this parameter in different contexts and how to pass it on the command line.
details.argument string A command line argument or flag to be prepended before the parameter value.
details.description string Human-readable description of the parameter. Often used to create contextual help in automatically generated UI.
details.label string Human-readable label for the parameter. Often implemented as text label next to the field in automatically generated UI.
details.showArgument boolean Whether to include the argument value for this parameter when performing the template variable replacement during job submission. If true, the details.argument value will be prepended, without spaces, to the actual parameter value(s).
details.repeatArgument boolean When multiple values are provided for this parameter, this attribute determines whether to include the argument value before each user-supplied value when performing the template variable replacement during job submission. The details.showArgument value must be true for this value to be applied.
semantics JSON object Describes the semantic definition of this parameter. Multiple ontologies and values are supported.
semantics.minCardinality integer Minimum number of values this parameter must have.
semantics.maxCardinality integer Maximum number of values this parameter can have. A null value or value of -1 indicates no limit.
semantics.ontology JSON array List of ontology terms (or URIs pointing to ontology terms) applicable to the parameter. We recommend at least specifying an XML Schema Simple Type.
value JSON object A description of the anticipated value and the situations when it is required.
value.default string, JSON array The default value for this parameter. This value is optional except when value.required is true and value.visible is false. If the value.type of this parameter is enumeration, this value must be one of the specified value.enumValues. If the value.type of this parameter is bool or flag, then only boolean values are accepted here.
value.enumValues JSON array An array of values specifying the possible values this parameter may have when value.type is enumeration. Both JSON objects and strings are supported in the array. If a JSON object is given, it must contain a single attribute. The key will be the value passed into the wrapper template. The value will be the display value shown when auto-generating the option element in the select box representing this parameter.
value.order integer The order in which this parameter should appear when auto-generating a command line invocation.
value.required boolean Required: Is specification of this parameter mandatory to run a job?
value.type string, number, enumeration, bool, flag JSON type for this parameter (used to generate and validate UI).
value.validator string Perl-formatted regular expression to restrict valid values.
value.visible boolean When automatically generating a UI, should this field be visible to end users? If false, users will not be able to set this value in their job request.
value.enquote boolean Whether the value should be surrounded by quotation marks before being injected into the wrapper template at job runtime.

Parameter details section

The parameters.[].details object contains information specifying how to describe a parameter in different contexts and is identical to the inputs.[].details object.

Parameter semantics section

The parameters.[].semantics object contains semantic information about the parameter. Unlike the inputs.[].semantics object, it only has a single attribute, ontology. The ontology attribute specifies a JSON array of URLs pointing to the ontology definitions of this parameter type. (We recommend at least specifying an XML Schema Simple Type.)

Parameter value section

The parameters.[].value object contains the information needed to validate user-supplied parameter values in a job request. The type attribute defines the primary type of this parameter’s values. The available types are:

  • number: any real number
  • string: any json-escaped alphanumeric string.
  • bool: true or false
  • flag: true or false. Identical to bool, but only the `argument` value will be inserted into the wrapper template.
  • enumeration: a JSON array of string values or JSON objects representing the acceptable values for this parameter. If an array of JSON objects is given, each object should have a single attribute with the key being a desired enumeration value, and the value being a human-readable descriptive name for the enumerated value. The advantage of objects over strings is that they let you create more descriptive user interfaces by customizing both the content and value of an HTML select box’s option elements. An example of both is given below.
[
  "red",
  "white",
  "green",
  "black"
]

[
  { "red": "Deep Cherry Red" },
  { "white": "Bright White" },
  { "green": "Black Forest Green" },
  { "black": "Brilliant Black Crystal Pearl" }
]
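The difference between bool and flag parameters can be sketched as follows. The assumptions are that a flag contributes its argument only when its value is true (and nothing otherwise), while other types have the argument prepended directly to the value, as with showArgument; render_param is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: rendering a flag parameter vs. other parameter types.
# Usage: render_param TYPE ARGUMENT VALUE
render_param() {
  local type="$1" argument="$2" value="$3"
  if [ "$type" = "flag" ]; then
    # Assumption: a flag inserts only its argument, and only when true.
    if [ "$value" = "true" ]; then
      printf '%s\n' "$argument"
    else
      printf '\n'
    fi
  else
    # Other types: argument glued to the value, as with showArgument=true.
    printf '%s%s\n' "$argument" "$value"
  fi
}
```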

The validator attribute accepts a Perl regular expression which will be applied to the parameter value(s). Any submissions that do not match the validator expression will be rejected. This attribute is available to parameters of type number and string. It is not available to bool or flag parameters, or to enumeration parameters, which use the enumValues attribute instead.

The default attribute allows you to specify a default value for the parameter. This will be used in lieu of a user-supplied value if the parameter is required but not visible. All default values must match the appropriate validator if type is number or string, or be one of the values in the enumValues array if type is enumeration.

The enumValues attribute is a JSON array of alphanumeric values specifying the acceptable values for this parameter. This attribute only exists for enumeration parameter types.

The required attribute specifies whether the parameter must be specified during a job submission.

The visible attribute takes a boolean value specifying whether the parameter should be accepted as a user-supplied value in a job request. If false, the value will be ignored at job submission and the default value will be used instead. Whenever visible is set to false, required must be true.

The order attribute is used to specify the order in which parameters should be listed in the response from the API and in command-line generation. By default, order is set to 0. Thus, providing a value greater than zero is sufficient to force any single parameter to be listed last.

Validating parameters

The previous section covered the different ways you can tell Agave to validate and restrict the parameters to your app. When a user submits a job request, these checks are applied in the following order:

  1. visible
  2. required
  3. type
  4. validator / enumValues
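The type and enumValues checks might look like the following sketch. check_param_type is hypothetical, the number pattern is one reasonable reading of "any real number", and only string-valued enumValues are handled (for object-valued entries you would compare against each object's key).

```shell
#!/usr/bin/env bash
# Sketch of the "type" and "enumValues" checks for a single parameter value.
# Returns 0 if the value is acceptable for the type, 1 otherwise.
# Usage: check_param_type TYPE VALUE [ENUM_VALUE ...]
check_param_type() {
  local type="$1" value="$2"
  shift 2
  # Integers, decimals, and optional exponent (bash ERE, not Perl regex).
  local number_re='^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$'
  local allowed
  case "$type" in
    number)    [[ $value =~ $number_re ]] ;;
    string)    return 0 ;;
    bool|flag) [ "$value" = "true" ] || [ "$value" = "false" ] ;;
    enumeration)
      for allowed in "$@"; do
        [ "$value" = "$allowed" ] && return 0
      done
      return 1 ;;
    *) return 1 ;;   # unknown type
  esac
}
```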

Wrapper Templates

In order to run your application, you will need to create a wrapper template that calls your executable code. The wrapper template is a simple script that Agave will filter and execute to start your app. The filtering Agave applies to your wrapper script is to inject runtime values from a job request into the script to replace the template variables representing the inputs and parameters of your app.

The order in which wrapper templates are processed in HPC and Condor apps is as follows.

  1. Environment variables are injected.
  2. The startupScript is run.
  3. Scheduler directives are prepended to the wrapper template.
  4. additionalDirectives are concatenated after the scheduler directives.
  5. Custom modules are concatenated after the additionalDirectives.
  6. inputs and parameters template variables are replaced with values from the job request.
  7. Blacklisted commands, if present, are disabled in the script.
  8. The resulting script is written to the remote job execution folder and executed.

The order in which wrapper templates are processed in CLI apps is as follows.

  1. The shell environment is sourced.
  2. Environment variables are injected.
  3. The startupScript is run.
  4. Custom modules are prepended to the top of the wrapper.
  5. inputs and parameters template variables are replaced with values from the job request.
  6. Blacklisted commands, if present, are disabled in the script.
  7. The resulting script is forked into the background immediately.

Environment

The job's environment comes from the system definition. If you cannot change the system definition to suit your needs, handle any environment setup in your wrapper script, and ship whatever you need with your app’s assets.

Modules

See the Environment Modules and Lmod documentation for more about modules. Modules can be used to customize your environment, locate your application, and improve portability between systems. Agave does not install or manage the module installation on a particular system; however, it does know how to interact with it. Specifying the modules needed to run your app, either in your wrapper template or in your system definition, can greatly help you during the development process.

Default job macros

Agave provides information about the job, system, and user as predefined macros you can use in your wrapper templates. The full list of runtime job macros is given in the following table.

Variable Description
AGAVE_JOB_APP_ID The appId for which the job was requested.
AGAVE_JOB_ARCHIVE Binary boolean value indicating whether the current job will be archived after the wrapper template exits.
AGAVE_JOB_ARCHIVE_SYSTEM The system to which the job will be archived after the wrapper template exits.
AGAVE_JOB_ARCHIVE_URL The fully qualified URL to the archive folder where the job output will be copied if archiving is enabled, or the URL of the output listing
AGAVE_JOB_ARCHIVE_PATH The path on the archiveSystem where the job output will be copied if archiving is enabled.
AGAVE_JOB_BATCH_QUEUE The batch queue on the AGAVE_JOB_EXECUTION_SYSTEM to which the job was submitted.
AGAVE_JOB_EXECUTION_SYSTEM The Agave execution system id where this job is running.
AGAVE_JOB_ID The unique identifier of the job.
AGAVE_JOB_MEMORY_PER_NODE The amount of memory per node requested at submit time.
AGAVE_JOB_NAME The slugified version of the name of the job. See the section on Conventions for more information about slugs.
AGAVE_JOB_NAME_RAW The name of the job as given at submit time.
AGAVE_JOB_NODE_COUNT The number of nodes requested at submit time.
AGAVE_JOB_OWNER The username of the job owner.
AGAVE_JOB_PROCESSORS_PER_NODE The number of processors (cores) per node requested at submit time.
AGAVE_JOB_SUBMIT_TIME The time at which the job was submitted in ISO-8601 format.
AGAVE_JOB_TENANT The id of the tenant to which the job was submitted.
AGAVE_JOB_ARCHIVE_URL The Agave url to which the job will be archived after the job completes.
AGAVE_JOB_CALLBACK_RUNNING Represents a call back to the API stating the job has started.
AGAVE_JOB_CALLBACK_CLEANING_UP Represents a call back to the API stating the job is cleaning up.
AGAVE_JOB_CALLBACK_ALIVE Represents a call back to the API stating the job is still alive. This will essentially update the timestamp on the job and add an entry to the job’s history record.
AGAVE_JOB_CALLBACK_NOTIFICATION Represents a call back to the API telling it to forward a notification to the registered endpoint for that job. If no notification is registered, this will be ignored.
AGAVE_JOB_CALLBACK_FAILURE Represents a call back to the API stating the job failed. Use this with caution as it will tell the API the job failed even if it has not yet completed. Upon receiving this callback, Agave will abandon the job and skip any archiving that may have been requested. Think of this as kill -9 for the job lifecycle.
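The fragment below is a sketch of how a few of these macros might be used. It writes a small provenance file into the job directory. In a real wrapper template, Agave substitutes each ${AGAVE_JOB_*} token before the script runs. The assignments at the top are hypothetical stand-in values that let the fragment run by hand, and `job-summary.txt` is an arbitrary filename chosen for this example.

```shell
#!/bin/bash
# Local stand-ins for values Agave injects at runtime (hypothetical values).
# Delete these assignments when pasting the fragment into a wrapper template.
AGAVE_JOB_ID="local-test-job"
AGAVE_JOB_NAME="samtools-sort-test"
AGAVE_JOB_OWNER="nryan"
AGAVE_JOB_EXECUTION_SYSTEM="stampede.tacc.utexas.edu"
AGAVE_JOB_PROCESSORS_PER_NODE=16

# Record basic provenance about this run in the job directory
cat > job-summary.txt <<EOF
job_id=${AGAVE_JOB_ID}
job_name=${AGAVE_JOB_NAME}
owner=${AGAVE_JOB_OWNER}
execution_system=${AGAVE_JOB_EXECUTION_SYSTEM}
processors_per_node=${AGAVE_JOB_PROCESSORS_PER_NODE}
EOF

echo "Wrote job-summary.txt for job ${AGAVE_JOB_ID}"
```

Files written to the job directory are archived along with the rest of the job output when archiving is enabled. That makes a summary file like this a cheap way to keep run metadata next to the results.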

Input data

Agave will stage the files and folders you specify as inputs to your app. These will be available in the top level of your job directory at runtime. Additionally, the name of each input will be injected into your wrapper template for you to use in your application logic. Be aware that Agave will not attempt to resolve namespace conflicts between your app inputs. If a job specifies two inputs with the same file name, one will overwrite the other during the input staging phase of the job. Although the variable names will still be correctly injected into the wrapper script, your job will most likely fail due to missing data.
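A name collision usually surfaces later as a missing file. A wrapper can therefore check for its staged inputs up front and fail with a clear message. The `require_inputs` helper below is a hypothetical sketch, not an Agave-provided function.

```shell
#!/bin/bash
# require_inputs PATH...
# Fails (returns 1) if any expected input is missing from the current
# directory, printing one error line per missing file.
require_inputs() {
  local missing=0
  local f
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "ERROR: expected input '$f' was not found in $(pwd)" >&2
      missing=1
    fi
  done
  return "$missing"
}

# In a wrapper template the arguments would be injected input values, e.g.
# require_inputs "${inputBam}" || exit 1
```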

Variable injection

If you refer back to the app definition we used in the App Management Tutorial, you will see there are multiple inputs and parameters defined for that app. Each input and parameter object had an id attribute. That id value is the attribute name you use to associate runtime values with app inputs and parameters. When a job is submitted to Agave, prior to physically running the wrapper template, all instances of that id are replaced with the actual value from the job request. The example below shows our app description, a job request, and the resulting wrapper template at run time.

Type declarations

During the jobs submission process, Agave will store your inputs and parameters as serialized JSON. At the point that variable injection occurs, Agave will replace all occurrences of your input and parameter with their value provided in the job request. In order for Agave to properly identify your input and parameter ids, wrap them in brackets and prepend a dollar sign. For example, if you have a parameter with id param1, you would include it in your wrapper script as ${param1}. Case sensitivity is honored at all times.
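The sketch below mimics the substitution rule with bash pattern replacement, to make it concrete. It assumes injection behaves as a plain, case-sensitive text replacement of `${id}` tokens, as described above. It is an illustration only, not Agave's actual implementation, and the `render_template` helper and sample values are hypothetical.

```shell
#!/bin/bash
# render_template TEMPLATE id=value [id=value ...]
# Hypothetical helper: every literal ${id} token is replaced with its value,
# and matching is case-sensitive.
render_template() {
  local text="$1"
  shift
  local pair id value token
  for pair in "$@"; do
    id="${pair%%=*}"     # text before the first '='
    value="${pair#*=}"   # text after the first '='
    token="\${${id}}"    # the literal string ${id}
    # Quoting pattern and replacement keeps both literal (no globbing)
    text=${text//"$token"/"$value"}
  done
  printf '%s\n' "$text"
}

# Example: resolve one line of a wrapper template against job request values
line='samtools sort ${ARGS} ${inputBam} ${outputPrefix}'
render_template "$line" inputBam=ex1.bam outputPrefix=sorted
```

The example prints `samtools sort ${ARGS} ex1.bam sorted`. `${ARGS}` is not an input or parameter id, so it is left for the shell to expand when the script runs.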

Boolean values

Boolean values are passed in as truthy values: true is injected as 1, and false is injected as an empty string.
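Because false arrives as an empty string, a numeric comparison such as `-eq` fails with an error when the value is false. A string comparison avoids that. The `is_true` helper below is a hypothetical sketch of the idea.

```shell
#!/bin/bash
# is_true: succeed only when an injected boolean equals "1".
# Agave injects true as 1 and false as an empty string, so an arithmetic
# test such as [ $nameSort -eq 1 ] errors out when the value is empty.
is_true() {
  [ "$1" = "1" ]
}

# Local stand-in for the injected value. In a wrapper template this line
# would read nameSort="${nameSort}" so Agave can substitute the real value.
nameSort=""
ARGS=""
if is_true "${nameSort}"; then ARGS="${ARGS} -n"; fi
echo "ARGS='${ARGS}'"
```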

Cardinality

Cardinality is not used in resolving wrapper template variables.

Validation

Validators are not used in resolving wrapper template variables.

Parameter Flags

If your parameter is of type “flag”, Agave will replace all occurrences of the template variable with the value you provided in the argument field.

App packaging

Agave API apps have a generalized structure that allows them to carry dependencies around with them. In the case below, package-name-version.dot.dot is a folder that you build on your local system, then store in your Agave Cloud Storage in a designated location (we recommend /home//applications/). It contains binaries, support scripts, test data, etc. all in one package. Agave basically uses a very rough form of containerized applications (more on this later). We suggest you set your apps up to look something like the following:

package-name-version.dot.dot
|--system_name
|----bin.tgz (optional)
|----lib.tgz (optional)
|----include.tgz (optional)
|----test.sh
|----script.template
|----test_data (optional)
|----app.json
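If you are starting a new app, a sketch like the following can create the suggested layout in one step. The bundle and system directory names are the placeholders from the tree above, so substitute your own. The optional bin.tgz, lib.tgz, and include.tgz archives are left for you to create, and `make_app_skeleton` is a hypothetical helper name.

```shell
#!/bin/bash
# make_app_skeleton BUNDLE SYSTEM
# Creates the suggested bundle layout with empty placeholder files.
make_app_skeleton() {
  local root="$1/$2"
  mkdir -p "${root}/test_data" || return 1
  touch "${root}/test.sh" "${root}/script.template" "${root}/app.json" || return 1
  chmod u+x "${root}/test.sh"
}

# Example using the placeholder names from the layout above
# make_app_skeleton package-name-version.dot.dot system_name
```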

Agave runs a job by first transferring a copy of this directory into a temporary directory on the target executionSystem. Then, the input data files (we’ll show you how to specify those later) are staged into place automatically. Next, Agave writes a scheduler submit script (using a template you provide, i.e. script.template) and puts it in the queue on the target system. The Agave service then monitors the progress of the job and, assuming it completes, copies all newly created files to the location specified when the job was submitted. Along the way, critical milestones and metadata are recorded in the job’s history.
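When debugging, it can help to approximate these steps on the execution system itself before involving Agave. The hypothetical `dry_run_bundle` helper below does three things:

  1. Copies a bundle into a fresh temporary directory.
  2. Copies the given input files into the top level of that directory.
  3. Runs a wrapper script there with bash.

It assumes the wrapper's template variables have already been replaced (your hand-written test script, for instance). It skips the scheduler, variable injection, and archiving entirely, so treat it as a rough local check rather than a model of how Agave launches jobs.

```shell
#!/bin/bash
# dry_run_bundle BUNDLE_DIR WRAPPER [INPUT ...]
# Copies BUNDLE_DIR into a new temporary directory, copies each INPUT into
# the top level of that directory, runs WRAPPER there with bash, and prints
# the directory path. Wrapper output is captured in wrapper.log.
dry_run_bundle() {
  local bundle="$1" wrapper="$2"
  shift 2
  local workdir f
  workdir="$(mktemp -d)" || return 1

  # Copy the app bundle (including hidden files) into the work directory
  cp -R "${bundle}/." "${workdir}/" || return 1

  # Stage inputs into the top level of the work directory
  for f in "$@"; do
    cp -R "$f" "${workdir}/" || return 1
  done

  # Run the wrapper from inside the work directory
  if ! ( cd "${workdir}" && bash "./${wrapper}" > wrapper.log 2>&1 ); then
    echo "wrapper failed; see ${workdir}/wrapper.log" >&2
    return 1
  fi
  echo "${workdir}"
}
```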

Agave app development proceeds via the following steps:

  1. Build the application locally on the executionSystem
  2. Ensure that you are able to run it directly on the executionSystem
  3. Describe the application using an Agave app description
  4. Create a shell template for running the app
  5. Upload the application directory to a storageSystem
  6. Post the app description to the Agave apps service
  7. Debug your app by running jobs and updating the app until it works as intended
  8. (Optional) Share the app with some friends to let them test it

Build a samtools application bundle

# Log into Stampede
ssh stampede.tacc.utexas.edu

# Unload the system's samtools module if it happens to be loaded by default
module unload samtools

# All TACC systems have a directory that can be accessed as $WORK
cd $WORK

# Set up a project directory
mkdir iPlant
mkdir iPlant/src
mkdir -p iPlant/samtools-0.1.19/stampede/bin
mkdir -p iPlant/samtools-0.1.19/stampede/test

# Build samtools using the Intel C Compiler
# If you don't have icc, gcc will work, but icc usually gives more efficient binaries
cd iPlant/src
wget "http://downloads.sourceforge.net/project/samtools/samtools/0.1.19/samtools-0.1.19.tar.bz2"
tar -jxvf samtools-0.1.19.tar.bz2
cd samtools-0.1.19
make CC=icc

# Copy the samtools binary and support scripts to the project bin directory
cp -R samtools bcftools misc ../../samtools-0.1.19/stampede/bin/
cd ../../samtools-0.1.19/stampede

# Test that samtools will launch
bin/samtools

  Program: samtools (Tools for alignments in the SAM format)
  Version: 0.1.19-44428cd

  Usage:   samtools <command> [options]

  Command: view        SAM<->BAM conversion
           sort        sort alignment file
           mpileup     multi-way pileup...

# Package up the bin directory as a compressed archive
# and remove the original. This preserves the execute bit
# and other permissions and consolidates movement of all
# bundled dependencies in bin to a single operation. You
# can adopt a similar approach with lib and include.
tar -czf bin.tgz bin && rm -rf bin
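The main reason for bin.tgz is to preserve the execute bit, so it may be worth confirming that before uploading the bundle. The hypothetical `check_bin_archive` helper below extracts the archive into a scratch directory, leaving your working copy alone. It then checks that bin/samtools comes back executable.

```shell
#!/bin/bash
# check_bin_archive [ARCHIVE]
# Extracts ARCHIVE (default: bin.tgz) into a scratch directory and verifies
# that bin/samtools is present and executable after extraction.
check_bin_archive() {
  local archive="${1:-bin.tgz}" scratch status=0
  scratch="$(mktemp -d)" || return 1
  if ! tar -xzf "$archive" -C "$scratch"; then
    status=1
  elif [ -x "$scratch/bin/samtools" ]; then
    echo "OK: bin/samtools is executable in $archive"
  else
    echo "ERROR: bin/samtools is missing or not executable in $archive" >&2
    status=1
  fi
  rm -rf "$scratch"
  return "$status"
}

# Example, run from $WORK/iPlant/samtools-0.1.19/stampede:
# check_bin_archive bin.tgz
```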

Run samtools sort locally

Your first objective is to create a script that you know will run to completion under the Stampede scheduler and environment (or whatever executionSystem you’re working on). It will serve as a model for the template file you create later. In our case, we need to write a script that can be submitted to the Slurm scheduler. The standard is to use Bash for such scripts. You have five main objectives in your script:

  • Unpack binaries from bin.tgz
  • Extend your PATH to contain bin
  • Craft some option-handling logic to accept parameters from Agave
  • Craft a command line invocation of the application you will run
  • Clean up when you’re done

First, you will need some test data in your current directory (i.e., $WORK/iPlant/samtools-0.1.19/stampede/). You can use this test file:

files-get -S data.agaveapi.co /shared/datasets/example_data/Samtools_mpileup/ex1.bam

or you can use any other BAM file for your testing purposes. If you use another file, make sure to change the filename in your test script accordingly!

Now, author your script. You can paste the following code into a file called test-sort.sh or you can copy it from $IPLANT_SDK_HOME/examples/samtools-0.1.19/stampede/test-sort.sh

#!/bin/bash

# Agave automatically writes these scheduler
# directives when you submit a job but we have to
# do it by hand when writing our test

#SBATCH -p development
#SBATCH -t 00:30:00
#SBATCH -n 16
#SBATCH -A iPlant-Collabs
#SBATCH -J test-samtools
#SBATCH -o test-samtools.o%j

# Set up inputs and parameters
# We're emulating passing these in from Agave
# inputBam is the name of the file to be sorted
inputBam="ex1.bam"
# outputPrefix is a parameter that establishes
# the prefix for the final sorted file
outputPrefix="sorted"
# Parameter for memory used in sort operation, in bytes
maxMemSort=500000000
# Boolean: Sort by name instead of coordinate
nameSort=0

# Unpack the bin.tgz file containing samtools binaries
# If you are relying entirely on system-supplied binaries
# you don't need this bit
tar -xvf bin.tgz
# Extend PATH to include binaries in bin
# If you need to extend lib, include, etc
# the same approach is applicable
export PATH=$PATH:"$PWD/bin"

# Dynamically construct a command line
# by building an ARGS string then
# adding the command, file specifications, etc
#
# We're doing this in a way familiar to Agave V1 users
# first. Later, we'll illustrate how to make use of
# Agave V2's new parameter passing functions
#
# Start with empty ARGS...
ARGS=""
# Add -m flag if maxMemSort was specified
# You might want to add a constraint for how large maxMemSort
# can be based on the available memory on your executionSystem
if [ ${maxMemSort} -gt 0 ]; then ARGS="${ARGS} -m $maxMemSort"; fi

# Boolean handler for -n (sort by name instead of coordinate)
if [ ${nameSort} -eq 1 ]; then ARGS="${ARGS} -n "; fi

# Run the actual program
samtools sort ${ARGS} ${inputBam} ${outputPrefix}

# Now, delete the bin/ directory
rm -rf bin

Submit the job to the queue on Stampede…

chmod 700 test-sort.sh
sbatch test-sort.sh

You can monitor your jobs in the queue using

showq -u your_tacc_username

Assuming all goes according to plan, you’ll end up with a sorted BAM called sorted.bam, and your bin directory (but not the bin.tgz file) should be erased. Congratulations, you’re in the home stretch: it’s time to turn the test script into an Agave app.

Craft an Agave app description

In order for Agave to know how to run an instance of the application, we need to provide quite a bit of metadata about the application. This includes a unique name and version, the location of the application bundle, the identities of the execution system and destination system for results, whether it’s an HPC or other kind of job, the default number of processors and memory it needs to run, and of course, all the inputs and parameters for the actual program. It seems a bit over-complicated, but only because you’re comfortable with the command line already. Your goal here is to allow your applications to be portable across systems and present a web-enabled, rationalized interface for your code to consumers.

Rather than have you write a description for “samtools sort” from scratch, let’s systematically dissect an existing file provided with the SDK. Go ahead and copy the file into place and open it in your text editor of choice. If you don’t have the SDK installed, you can grab it here.

cd $WORK/iPlant/samtools-0.1.19/stampede/
cp $IPLANT_SDK_HOME/examples/samtools-0.1.19/stampede/samtools-sort.json .

Open up samtools-sort.json in a text editor or in your web browser and follow along below.

Overview

Your file samtools-sort.json is written in JSON, and conforms to an Agave-specific data model. You can find fully fleshed out details about all fields under Parameters -> Data Type -> Model at the Agave API live docs on the /apps service. We will dive into key elements here:

To make this file work for you, you will, at a minimum, be editing:

  1. Its executionSystem to match your private instance of Stampede.
  2. Its deploymentPath to match your iPlant applications path.
  3. The name of the app to something besides “samtools-sort”. We recommend “$IPLANTUSERNAME-samtools-sort”.

Instructions for making these changes will follow.

All Agave application descriptions have the following structure:

{
  "application_metadata":"value",
  "inputs":[],
  "parameters":[],
  "outputs":[]
}

There is a defined list of application metadata fields, some of which are mandatory. Inputs, parameters, and outputs are specified as an array of simple data structures, which will be described below.

Application metadata

Field Mandatory Type Description
checkpointable X boolean Application supports checkpointing
defaultMemoryPerNode integer Default RAM (GB) to request per compute node
defaultProcessorsPerNode integer Default processor count to request per compute node
defaultMaxRunTime string Default maximum run time (hours:minutes:seconds) to request per job
defaultNodeCount integer Default number of compute nodes per job
defaultQueue string On HPC systems, default batch queue for jobs
deploymentPath X string Path relative to homeDir on deploymentSystem where application bundle will reside
deploymentSystem X string The Agave-registered STORAGE system upon which you have write permissions where the app bundle resides
executionSystem X string An Agave-registered EXECUTION system upon which you have execute and app registration permissions where jobs will run
helpURI X string A URL pointing to help or description for the app you are deploying
label X string Human-readable title for the app
longDescription string A short paragraph describing the functionality of the app
modules array[string] Ordered list of modules on systems that use lmod or modules
name X string unique, URL-compatible (no special chars or spaces) name for the app
ontology X array[string] List of ontology terms (or URIs pointing to ontology terms) associated with the app
parallelism X string Is your application capable of using more than a single compute node? (SERIAL or PARALLEL)
shortDescription X string Brief description of the app
storageSystem X string The Agave-registered STORAGE system upon which you have write permissions. Default source of and destination for data consumed and emitted by the app
tags array[string] List of human-readable tags for the app
templatePath X string Path to the shell template file, relative to deploymentPath
testPath X string Path to the shell test file, relative to deploymentPath
version X string Preferred format: Major.minor.point integer values for app

Inputs

To tell Agave what files to stage into place before job execution, you need to define the app’s inputs in a JSON array. To implement the SAMtools sort app, you need to tell Agave that a BAM file is needed to act as the subject of our sort:

{  
  "id":"inputBam",
  "value":{  
    "default":"",
    "order":0,
    "required":true,
    "validator":"",
    "visible":true
  },
  "semantics":{  
    "ontology":[  
      "http://sswapmeet.sswap.info/mime/application/X-bam"
    ],
    "minCardinality":1,
    "fileTypes":[  
      "raw-0"
    ]
  },
  "details":{  
    "description":"",
    "label":"The BAM file to sort",
    "argument":null,
    "showArgument":false
  }
}

Here’s a walkthrough of what these fields mean:

Field Mandatory Type Description
id X string This is the “name” of the file. You will use this in your wrapper script later whenever you need to refer to the BAM file being sorted
value.default string The path, relative to X, of the default value for the input
value.order integer Ignore for now
value.required X boolean Is specification of this input mandatory to run a job?
value.validator string Perl-format regular expression to restrict valid values
value.visible boolean When automatically generating a UI, should this field be visible to end users?
semantics.ontology array[string] List of ontology terms (or URIs pointing to ontology terms) applicable to the input format
semantics.minCardinality integer Minimum number of values accepted for this input
semantics.maxCardinality integer Maximum number of values accepted for this input
semantics.fileTypes X array[string] List of Agave file types accepted. Always use “raw-0” for the time being
details.description string Human-readable description of the input. Often implemented as contextual help in automatically generated UI
details.label string Human-readable label for the input. Often implemented as text label next to the field in automatically generated UI
details.argument string The command-line argument associated with specifying this input at run time
details.showArgument boolean Include the argument in the substitution done by Agave when a run script is generated

A note on paths: In this iPlant-oriented tutorial, we assume you will stage data to and from “data.agaveapi.co”, the default storage system for iPlant users. In this case, you can use paths relative to homeDir on that system (i.e. vaughn/analyses/foobar). To add portability, marshal data from other storageSystems, or import from public servers, you can also specify fully qualified URIs as follows:

  • storageSystem namespace: agave://storage-system-name/path/to/file
  • public URI namespace: https://www.cnn.com/index.html

Parameters

Parameters are specified in a JSON array, and are broadly similar to inputs. Here’s an example of the parameter we will define allowing users to specify how much RAM to use in a “samtools sort” operation.

{  
  "id":"maxMemSort",
  "value":{  
    "default":"500000000",
    "order":1,
    "required":true,
    "type":"number",
    "validator":"",
    "visible":true
  },
  "semantics":{  
    "ontology":[  
      "xs:integer"
    ]
  },
  "details":{
    "description":null,
    "label":"Maximum memory in bytes, used for sorting",
    "argument":"-m",
    "showArgument":false
  }
}
Field Mandatory Type Description
id X string This is the “name” of the parameter. At runtime, it will be replaced in your script template based on the value passed as part of the job specification
value.default string Default value for this parameter
value.order integer Ignore for now. Supports automatic generation of command lines.
value.required boolean Is specification of this parameter mandatory to run a job?
value.type string JSON type for this parameter (used to generate and validate UI). Valid values: “string”, “number”, “enumeration”, “bool”, “flag”
value.validator string Perl-formatted regular expression to restrict valid values
value.visible boolean When automatically generating a UI, should this field be visible to end users?
semantics.ontology array[string] List of ontology terms (or URIs pointing to ontology terms) applicable to the parameter. We recommend at least specifying an XML Schema (XSD) simple type, such as xs:integer.
details.description string Human-readable description of the parameter. Often used to create contextual help in automatically generated UI
details.label string Human-readable label for the parameter. Often implemented as text label next to the field in automatically generated UI
details.argument string The command-line argument associated with specifying this parameter at run time
details.showArgument boolean Include the argument in the substitution done by Agave when a run script is generated

Outputs

While outputs are not yet fully supported, Agave apps are designed to participate in workflows. Thus, just as we define the list of valid and required inputs to an app, we also must (when we know them) define a list of its outputs. This allows the app to “advertise” to consumers of Agave services what it expects to emit, allowing apps to be chained together. Note that, unlike inputs and parameters, output ids are NOT passed to the template file. If you must specify an output filename in the application JSON, do it as a parameter! Outputs are defined in basically the same way as inputs:

{  
  "id":"bam",
  "value":{  
    "default":"sorted.bam",
    "order":0,
    "required":false,
    "validator":"",
    "visible":true
  },
  "semantics":{  
    "ontology":[  
      "http://sswapmeet.sswap.info/mime/application/X-bam"
    ],
    "minCardinality":1,
    "fileTypes":[  
      "raw-0"
    ]
  },
  "details":{  
    "description":"",
    "label":"Sorted BAM file",
    "argument":null,
    "showArgument":false
  }
}

Obligatory field walk-through:

Field Mandatory Type Description
id X string This is the “name” of the output. It is not currently used by the wrapper script but may be in the future
value.default string If your app has a fixed-name output, specify it here
value.order integer Ignore for now
value.required X boolean Is specification of this input mandatory to run a job?
value.validator string Perl-format regular expression used to match output files
value.visible boolean When automatically generating a UI, should this field be visible to end users?
semantics.ontology array[string] List of ontology terms (or URIs pointing to ontology terms) applicable to the output format
semantics.minCardinality integer Minimum number of values expected for this output
semantics.maxCardinality integer Maximum number of values expected for this output
semantics.fileTypes X array[string] List of Agave file types that may apply to the output. Always use “raw-0” for the time being
details.description string Human-readable description of the output
details.label string Human-readable label for the output
details.argument string The command-line argument associated with specifying this output at run time (not currently used)
details.showArgument boolean Include the argument in the substitution done by Agave when a run script is generated (not currently used)

Note: If the app you are working on doesn’t natively produce output with a predictable name, one thing you can do is add extra logic to your script to take the existing output and rename it to something you can control or predict.
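A minimal sketch of that renaming step follows. The `rename_single_output` helper and the `run_*.bam` pattern are hypothetical. The helper refuses to rename unless exactly one file matches, so an ambiguous result fails loudly instead of silently picking one.

```shell
#!/bin/bash
# rename_single_output TARGET CANDIDATE...
# Renames the single matching output file to TARGET. Fails if zero or
# several candidates are given, so an ambiguous result is never renamed.
rename_single_output() {
  local target="$1"
  shift
  if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
    echo "ERROR: expected exactly one output file, got $#: $*" >&2
    return 1
  fi
  mv -- "$1" "$target"
}

# Example (hypothetical filenames): expose run_*.bam as sorted.bam.
# nullglob makes an unmatched pattern expand to nothing instead of itself.
# shopt -s nullglob
# rename_single_output sorted.bam run_*.bam
```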

Tools and Utilities

  1. Stumped for ontology terms to apply to your Agave app inputs, outputs, and parameters? SSWAPmeet has many URI-format terms for MIME types, and BioPortal can provide links to EDAM.
  2. Need to validate JSON files? Try JSONlint or JSONparser
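If you prefer to check files from the command line, Python's standard json.tool module can serve as a quick validator. The sketch assumes python3 is on your PATH, and `validate_json` is a hypothetical helper name.

```shell
#!/bin/bash
# validate_json FILE...
# Reports whether each FILE parses as JSON; returns 1 if any file is invalid.
validate_json() {
  local status=0 f
  for f in "$@"; do
    if python3 -m json.tool "$f" > /dev/null; then
      echo "valid:   $f"
    else
      echo "INVALID: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Example: validate_json samtools-sort.json
```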

Craft a shell script template

Create sort.template using your test-sort.sh script as the starting point.

cp test-sort.sh sort.template

Now, open sort.template in the text editor of your choice. Delete the bash shebang line and the SLURM pragmas. Replace the hard-coded values for inputs and parameters with variables defined by your app description.

# Set up inputs...
# Since we don't check these when constructing the
# command line later, these will be marked as required
inputBam=${inputBam}
# and parameters
outputPrefix=${outputPrefix}
# Maximum memory for sort, in bytes
# Be careful: neither Agave nor the scheduler will
# check that this is a reasonable value. In production
# you might want to code min/max for this value
maxMemSort=${maxMemSort}
# Boolean: Sort by name instead of coordinate
nameSort=${nameSort}

# Unpack the bin.tgz file containing samtools binaries
tar -xvf bin.tgz
# Set the PATH to include binaries in bin
export PATH=$PATH:"$PWD/bin"

# Build up an ARGS string for the program
# Start with empty ARGS...
ARGS=""
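# Add -m flag if maxMemSort was specified (guard against an empty value)
if [ -n "${maxMemSort}" ] && [ "${maxMemSort}" -gt 0 ]; then ARGS="${ARGS} -m ${maxMemSort}"; fi

# Boolean handler for -n (sort by name). Agave injects true as 1 and
# false as an empty string, so compare as a string rather than with -eq
if [ "${nameSort}" = "1" ]; then ARGS="${ARGS} -n "; fi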

# Run the actual program
samtools sort ${ARGS} $inputBam ${outputPrefix}

# Now, delete the bin/ directory
rm -rf bin
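You can also script the mechanical part of that edit (dropping the shebang line and the #SBATCH directives) instead of deleting the lines by hand. The sketch below uses sed and a hypothetical `make_template` helper. Replacing the hard-coded values with ${id} variables still has to be done manually.

```shell
#!/bin/bash
# make_template SOURCE DEST
# Copies SOURCE to DEST, dropping a leading shebang line and every #SBATCH
# directive, since Agave writes the scheduler directives itself.
make_template() {
  sed -e '1{/^#!/d;}' -e '/^#SBATCH/d' "$1" > "$2"
}

# Example, run next to your test script:
# make_template test-sort.sh sort.template
```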

Permissions

Apps have fine-grained permissions similar to those found in the Jobs and Files services. Using these, you can share your app with other Agave users. Apps are private by default, so when you first POST your app to the Apps service, you are the only one who can see it. You may share your app with other users by granting them varying degrees of permissions. The full list of app permission values is given in the following table.

Permission Description
READ Gives the ability to view the app description.
WRITE Gives the ability to update the app.
EXECUTE Gives the ability to submit jobs using the app.
ALL Gives full READ, WRITE, and EXECUTE permissions to the user.
READ_WRITE Gives full READ and WRITE permissions to the user
READ_EXECUTE Gives full READ and EXECUTE permissions to the user
WRITE_EXECUTE Gives full WRITE and EXECUTE permissions to the user

App permissions are distinct from all other roles and permissions and do not have implications outside the Apps service. This means that if you want to allow someone to run a job using your app, it is not sufficient to grant them READ_EXECUTE permissions on your app. They must also have an appropriate user role on the execution system on which the app will run. Similarly, if you do not have the right to publish on the executionSystem or access the deploymentPath on the deploymentSystem in your app description, you will not be able to publish your app.

Listing permissions

App permissions are managed through a set of URLs consistent with the permission operations elsewhere in the API. To query for a user’s permission for an app, perform a GET on the user’s unique app permissions url.

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     https://public.tenants.agaveapi.co/apps/v2/$APP_ID/pems/$USERNAME
apps-pems-list -v -u $USERNAME $APP_ID

The response from the service will be a JSON object representing the user permission. If the user does not have a permission for that app, the permission value will be NONE. By default, only you have permission to your private apps. Public apps will return a single permission for the public meta user rather than a permission for every user.

{
    "_links": {
        "app": {
            "href": "https://public.tenants.agaveapi.co/apps/v2/$APP_ID"
        },
        "profile": {
            "href": "https://public.tenants.agaveapi.co/profiles/v2/systest"
        },
        "self": {
            "href": "https://public.tenants.agaveapi.co/apps/v2/$APP_ID/pems/systest"
        }
    },
    "permission": {
        "execute": true,
        "read": true,
        "write": true
    },
    "username": "systest"
}

You can also query for all permissions granted on a specific app by making a GET request on the app’s permission collection.

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://public.tenants.agaveapi.co/apps/v2/$APP_ID/pems
apps-pems-list -v $APP_ID

This time the service will respond with a JSON array of permission objects.

[
   {
      "_links":{
         "app":{
            "href":"https://public.tenants.agaveapi.co/apps/v2/$APP_ID"
         },
         "profile":{
            "href":"https://public.tenants.agaveapi.co/profiles/v2/systest"
         },
         "self":{
            "href":"https://public.tenants.agaveapi.co/apps/v2/$APP_ID/pems/systest"
         }
      },
      "permission":{
         "execute":true,
         "read":true,
         "write":true
      },
      "username":"systest"
   }
]

Adding and updating permissions

Setting permissions is done by POSTing a permission and username to the app’s permissions collection. Alternatively, you can POST just the permission and append the username to the URL.

# Standard syntax to grant permissions to a specific user
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d "username=bgibson&permission=READ" https://public.tenants.agaveapi.co/apps/v2/$APP_ID/pems

# Abbreviated POST data to grant permission to a single user
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d "permission=READ" https://public.tenants.agaveapi.co/apps/v2/$APP_ID/pems/bgibson
apps-pems-update -v -u bgibson -p READ $APP_ID
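To grant the same permission to several users, you could loop over the per-user URL shown above. The `grant_app_permission` helper below is a hypothetical convenience wrapper, not part of the Agave CLI. It assumes ACCESS_TOKEN is already set, and the API_BASE override is an assumption added for illustration. It keeps the -sk flags used throughout these examples; drop -k if your tenant has a valid TLS certificate.

```shell
#!/bin/bash
# grant_app_permission APP_ID PERMISSION USERNAME...
# POSTs PERMISSION for each USERNAME to /apps/v2/APP_ID/pems/USERNAME.
# Assumes ACCESS_TOKEN holds a valid token. API_BASE is an illustrative
# override that defaults to the tenant URL used in these examples.
grant_app_permission() {
  local app_id="$1" permission="$2"
  shift 2
  local base="${API_BASE:-https://public.tenants.agaveapi.co}"
  local user
  for user in "$@"; do
    curl -sk -H "Authorization: Bearer ${ACCESS_TOKEN}" \
         -X POST -d "permission=${permission}" \
         "${base}/apps/v2/${app_id}/pems/${user}" || return 1
  done
}

# Example: give two collaborators READ access to an app
# grant_app_permission "$APP_ID" READ bgibson nryan
```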

The response will contain a JSON object representing the permission that was just created.

{
    "_links": {
        "app": {
            "href": "https://public.tenants.agaveapi.co/apps/v2/$APP_ID"
        },
        "profile": {
            "href": "https://public.tenants.agaveapi.co/profiles/v2/bgibson"
        },
        "self": {
            "href": "https://public.tenants.agaveapi.co/apps/v2/$APP_ID/pems/bgibson"
        }
    },
    "permission": {
        "execute": false,
        "read": true,
        "write": false
    },
    "username": "bgibson"
}

Deleting permissions

Permissions can be deleted on a user-by-user basis, or all at once. To delete an individual user permission, make a DELETE request on the user’s app permission URL.

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X DELETE https://public.tenants.agaveapi.co/apps/v2/$APP_ID/pems/bgibson
apps-pems-delete -u bgibson $APP_ID

The response will be an empty result object.

You can accomplish the same thing by updating the user permission to an empty value.

# Delete permission for a single user by updating with a NONE permission value
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN"  \
     -X POST -d "username=bgibson" -d "permission=NONE" \
     https://public.tenants.agaveapi.co/apps/v2/$APP_ID/pems

# Delete permission for a single user by updating with an empty permission value
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -X POST -d "permission=" \
     https://public.tenants.agaveapi.co/apps/v2/$APP_ID/pems/bgibson
apps-pems-update -v -u bgibson $APP_ID

Since this is an update operation, the resulting JSON permission object will be returned showing the user has no permissions to the app anymore.

{
    "_links": {
        "app": {
            "href": "https://public.tenants.agaveapi.co/apps/v2/$APP_ID"
        },
        "profile": {
            "href": "https://public.tenants.agaveapi.co/profiles/v2/bgibson"
        },
        "self": {
            "href": "https://public.tenants.agaveapi.co/apps/v2/$APP_ID/pems/bgibson"
        }
    },
    "permission": {
        "execute": false,
        "read": false,
        "write": false
    },
    "username": "bgibson"
}

To delete all permissions for an app, make a DELETE request on the app’s permissions collection.

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -X DELETE \
     https://public.tenants.agaveapi.co/apps/v2/$APP_ID/pems
apps-pems-delete $APP_ID

The response will be an empty result object.

App scope

In addition to traditional permissions, apps also have a concept of scope. Unless otherwise configured, apps are private to the owner and the users they grant permission. Applications can, however, move from the private space into the public space for use by anyone. Moving an app into the public space is called publishing. Publishing an app gives it much greater exposure and can result in increased usage by the user community. It also comes with increased responsibilities for the original owner as well as the API administrators. Several of these are listed below:

  • Public apps must run on public systems. This makes the app available to everyone.
  • Public apps must be vetted for performance, reliability, and security by the API administrators.
  • The original app author must remain available via email for ongoing support.
  • Public apps must be copied into a public repository and checksummed.
  • Updates to public apps must result in a snapshot of the original app being created and stored with its resulting checksum in a separate location.
  • API administrators must maintain and support the app throughout its lifetime.

Publishing an app

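To publish an app, make a PUT request on the app’s URL with the publish action and the public execution system on which the published app should run.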

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     -X PUT \
     --data-binary '{"action":"publish","executionSystem":"condor.opensciencegrid.org"}' \
     https://public.tenants.agaveapi.co/apps/v2/wc-osg-1.00
apps-publish -e condor.opensciencegrid.org wc-osg-1.00

The response from the service will resemble the following:

{
  "id": "wc-osg-1.00u1",
  "name": "wc-osg",
  "icon": null,
  "uuid": "8734854070765284890-242ac116-0001-005",
  "parallelism": "SERIAL",
  "defaultProcessorsPerNode": 1,
  "defaultMemoryPerNode": 1,
  "defaultNodeCount": 1,
  "defaultMaxRunTime": null,
  "defaultQueue": null,
  "version": "1.00",
  "revision": 1,
  "isPublic": false,
  "helpURI": "http://www.gnu.org/s/coreutils/manual/html_node/wc-invocation.html",
  "label": "wc condor",
  "shortDescription": "Count words in a file",
  "longDescription": "",
  "tags": [
    "gnu",
    "textutils"
  ],
  "ontology": [
    "http://sswapmeet.sswap.info/algorithms/wc"
  ],
  "executionType": "CONDOR",
  "executionSystem": "condor.opensciencegrid.org",
  "deploymentPath": "/agave/apps/wc-1.00",
  "deploymentSystem": "public.storage.agave",
  "templatePath": "/wrapper.sh",
  "testPath": "/wrapper.sh",
  "checkpointable": true,
  "lastModified": "2016-09-15T04:48:17.000-05:00",
  "modules": [
    "load TACC",
    "purge"
  ],
  "available": true,
  "inputs": [
    {
      "id": "query1",
      "value": {
        "validator": "",
        "visible": true,
        "required": false,
        "order": 0,
        "enquote": false,
        "default": [
          "read1.fq"
        ]
      },
      "details": {
        "label": "File to count words in: ",
        "description": "",
        "argument": null,
        "showArgument": false,
        "repeatArgument": false
      },
      "semantics": {
        "minCardinality": 1,
        "maxCardinality": -1,
        "ontology": [
          "http://sswapmeet.sswap.info/util/TextDocument"
        ],
        "fileTypes": [
          "text-0"
        ]
      }
    }
  ],
  "parameters": [],
  "outputs": [
    {
      "id": "outputWC",
      "value": {
        "validator": "",
        "order": 0,
        "default": "wc_out.txt"
      },
      "details": {
        "label": "Text file",
        "description": "Results of WC"
      },
      "semantics": {
        "minCardinality": 1,
        "maxCardinality": 1,
        "ontology": [
          "http://sswapmeet.sswap.info/util/TextDocument"
        ],
        "fileTypes": []
      }
    }
  ],
  "_links": {
    "self": {
      "href": "https://public.agaveapi.co/apps/v2/wc-osg-1.00u1"
    },
    "executionSystem": {
      "href": "https://public.agaveapi.co/systems/v2/condor.opensciencegrid.org"
    },
    "storageSystem": {
      "href": "https://public.agaveapi.co/systems/v2/public.storage.agave"
    },
    "history": {
      "href": "https://public.agaveapi.co/apps/v2/wc-osg-1.00u1/history"
    },
    "metadata": {
      "href": "https://public.agaveapi.co/meta/v2/data/?q=%7B%22associationIds%22%3A%228734854070765284890-242ac116-0001-005%22%7D"
    },
    "owner": {
      "href": "https://public.agaveapi.co/profiles/v2/nryan"
    },
    "permissions": {
      "href": "https://public.agaveapi.co/apps/v2/wc-osg-1.00u1/pems"
    }
  }
}

To publish an app, make a PUT request on the app resource. In this example, we publish the wc-osg-1.00 app. Notice a few things about the response.

  1. Both the executionSystem and deploymentSystem have changed. Public apps must run and store their assets on public systems.
  2. We did not specify the deploymentSystem where the public app assets should be stored, so Agave placed them on the default public storage system, public.storage.agave.
  3. We did not specify the deploymentPath where the public app assets should be stored, so Agave placed them in the publicAppsDir of the deploymentPath.
  4. The deploymentPath is now a zip archive rather than a folder. Agave does this because, once published, the app can no longer be updated, so its assets are frozen and stored in a separate location, removed from user access.
  5. The id of the app has changed. It now has a u1 appended to the original app id. This indicates that it is a public app and that it has been updated a single time. If we were to publish the app again, the resulting id would be wc-osg-1.00u2. This differs from unpublished apps whose revision number increments without impacting the app id. Every time you publish an app, the id of the resulting public app will change.
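The naming rule in item 5 can be sketched as a small helper. This only illustrates the pattern described above and is not code from the Apps service. `public_app_id` is a hypothetical name.

```python
def public_app_id(app_id, publish_count):
    """Return the public id produced after the app has been published publish_count times.

    Follows the pattern described above: the original id, then "u",
    then the number of times the app has been published.
    """
    if publish_count < 1:
        raise ValueError("an app must be published at least once to get a public id")
    return f"{app_id}u{publish_count}"


print(public_app_id("wc-osg-1.00", 1))  # wc-osg-1.00u1
print(public_app_id("wc-osg-1.00", 2))  # wc-osg-1.00u2
```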

Unpublishing an app

Unpublishing a public app is equivalent to disabling it.

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     -X PUT \
     --data-binary '{"action":"disable"}' \
     https://public.tenants.agaveapi.co/apps/v2/wc-osg-1.00u1
apps-disable -v wc-osg-1.00u1

The response will look identical to the one above, but with available set to false.

Unlike systems, apps cannot be unpublished. Once an app is published, a deep copy of it is stored in an external location with its own provenance trail. If you would like to remove a published app from further use, simply disable it.

Cloning an app

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -X POST -d "action=clone" \
     -d "name=my-pyplot-demo" \
     -d "version=2.2" \
     -d "executionSystem=sftp.storage.example.com" \
     -d "deploymentSystem=$PUBLIC_STORAGE_SYSTEM" \
     -d "deploymentPath=/apps/" \
     https://public.tenants.agaveapi.co/apps/v2/demo-pyplot-demo-advanced-0.1.0?pretty=true
apps-clone -N my-pyplot-demo -V 2.2 demo-pyplot-demo-advanced-0.1.0

Often you will want to copy an existing app for use on another system, or simply to obtain a private copy of the app for your own use. You can do this with the clone action of the Apps service, using either the unix curl command or the Agave CLI as shown above.
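The clone request is an ordinary form-encoded POST in which each `-d` flag adds one field. As a sketch, the form body can be assembled with `urllib.parse.urlencode`. This section does not state which fields are optional, so the hypothetical helper `build_clone_form` simply omits any field you do not pass. The example values are the name and version from the CLI example.

```python
from urllib.parse import urlencode


def build_clone_form(name, version=None, execution_system=None,
                     deployment_system=None, deployment_path=None):
    """Return the form-encoded body for an app clone request.

    Fields left as None are omitted from the body, so only the
    values you supply are sent.
    """
    fields = {
        "action": "clone",
        "name": name,
        "version": version,
        "executionSystem": execution_system,
        "deploymentSystem": deployment_system,
        "deploymentPath": deployment_path,
    }
    return urlencode({k: v for k, v in fields.items() if v is not None})


body = build_clone_form("my-pyplot-demo", version="2.2")
print(body)  # action=clone&name=my-pyplot-demo&version=2.2
```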

Jobs

    /$$$$$         /$$
   |__  $$        | $$
      | $$ /$$$$$$| $$$$$$$  /$$$$$$$
      | $$/$$__  $| $$__  $$/$$_____/
 /$$  | $| $$  \ $| $$  \ $|  $$$$$$
| $$  | $| $$  | $| $$  | $$\____  $$
|  $$$$$$|  $$$$$$| $$$$$$$//$$$$$$$/
 \______/ \______/|_______/|_______/

The Jobs service is a basic execution service that allows you to run applications registered with the Apps service across multiple, distributed, heterogeneous systems through a common REST interface. The service manages all aspects of execution and job management, including data staging, job submission, monitoring, output archiving, event logging, sharing, and notifications. The Jobs service also provides a persistent reference to your job’s output data and a mechanism for sharing all aspects of your job with others. Each feature is described in more detail below.

Job submission

Job submission is a term recycled from shared batch computing environments, where a user would submit a request for a unit of computational work (called a Job) to the batch scheduler, then head home for dinner while waiting for the computer to complete it.

Originally the batch scheduler was a person, and the term batch came from their ability to process several submissions together. Later, as human schedulers were replaced by software, the term stuck, and the process it describes is essentially unchanged today:

A user submits a request for a unit of work to be done. The primary difference is that today the wait between submission and execution is often considerably shorter. On shared systems, such as many of the HPC systems originally targeted by Agave, waiting for your job to start is the price you pay for the incredible performance you get once it starts.

Agave, too, adopts the concept of job submission, though it is not in and of itself a scheduler. In the context of Agave’s Jobs service, running an application registered with the Apps service is referred to as submitting a job.

Unlike in the batch scheduling world, where each scheduler has its own job submission syntax and its own idiosyncrasies, the mechanism for submitting a job to Agave is consistent regardless of the application or system on which you run. An HTML form or JSON object is posted to the Jobs service, the submission is validated, and the job is forwarded to the scheduling and execution services for processing.

Because Agave takes an app-centric view of science, execution does not require knowing about the underlying systems on which an application runs. Knowing the parameters and inputs you want to use when running an app is sufficient to define a job; Agave handles the rest.

As mentioned previously, jobs are submitted by making an HTTP POST request containing either an HTML form or a JSON object to the Jobs service. All job submissions must include a few mandatory values that define a basic unit of work. Table 1 lists the optional and required attributes of all job submissions.

Name Value(s) Description
name string Descriptive name of the job. This will be slugified and used as one component of directory names in certain situations.
appId string The unique name of the application being run by this job. This must be a valid application that the calling user has permission to run.
batchQueue string The batch queue on the execution system to which this job is submitted. Defaults to the app’s defaultQueue property if specified. Otherwise a best-fit algorithm is used to match the job parameters to a queue on the execution system with sufficient capabilities to run the job.
nodeCount integer The number of nodes to use when running this job. Defaults to the app’s defaultNodeCount property or 1 if no default is specified.
processorsPerNode integer The number of processors this application should utilize while running. Defaults to the app’s defaultProcessorsPerNode property or 1 if no default is specified. If the application’s parallelism is not PARALLEL, this should be 1.
memoryPerNode string The maximum amount of memory needed per node for this application to run given in ####.#[E|P|T|G]B format. Defaults to the app’s defaultMemoryPerNode property if it exists. GB are assumed if no magnitude is specified.
maxRunTime string The estimated compute time needed for this application to complete given in hh:mm:ss format. This value must be less than or equal to the max run time of the queue to which this job is assigned.
notifications* JSON array An array of one or more JSON objects describing an event and url which the service will POST to when the given event occurs. For more on Notifications, see the section on webhooks below.
archive* boolean Whether the output from this job should be archived. If true, all new files created by this application’s execution will be archived to the archivePath in the user’s default storage system.
archiveSystem* string System to which the job output should be archived. Defaults to the user’s default storage system if not specified.
archivePath* string Location where the job output should be archived. A relative path or absolute path may be specified. If not specified, a unique folder will be created in the user’s home directory of the archiveSystem at ‘archive/jobs/job-$JOB_ID’

Table 1. The optional and required attributes common to all job submissions. Optional fields are marked with an asterisk.

In addition to the standard fields for all jobs, the application you specify in the appId field will also have its own set of inputs and parameters, defined during registration, that are unique to that app. (For more information about app registration and descriptions, see the App Management Tutorial.)
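The memoryPerNode and maxRunTime formats in Table 1 can be checked on the client before a job is submitted. The following is one way to sketch it. The regular expressions are our reading of the ####.#[E|P|T|G]B and hh:mm:ss formats, not the service's own validator. The binary (1024) multipliers between magnitudes are also an assumption. `memory_in_gb` and `runtime_in_seconds` are hypothetical helper names.

```python
import re

# Multipliers relative to GB for the E|P|T|G magnitudes in Table 1.
# ASSUMPTION: binary (1024) steps between magnitudes.
_MAGNITUDE_TO_GB = {"G": 1, "T": 1024, "P": 1024 ** 2, "E": 1024 ** 3}

_MEMORY_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:([EPTG])B)?\s*$", re.IGNORECASE)
_RUNTIME_RE = re.compile(r"^(\d{2,}):([0-5]\d):([0-5]\d)$")


def memory_in_gb(value):
    """Convert a memoryPerNode string such as '1.5TB' or '4' to GB.

    A value with no magnitude is treated as GB, as Table 1 describes.
    """
    match = _MEMORY_RE.match(value)
    if not match:
        raise ValueError(f"invalid memoryPerNode value: {value!r}")
    amount, magnitude = match.groups()
    return float(amount) * _MAGNITUDE_TO_GB[(magnitude or "G").upper()]


def runtime_in_seconds(value):
    """Convert a maxRunTime string in hh:mm:ss format to seconds."""
    match = _RUNTIME_RE.match(value)
    if not match:
        raise ValueError(f"invalid maxRunTime value: {value!r}")
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


print(memory_in_gb("4"))               # 4.0
print(memory_in_gb("1.5TB"))           # 1536.0
print(runtime_in_seconds("01:00:00"))  # 3600
```

The service still has the final say, so a check like this only catches obviously malformed values early. For example, it cannot check a run time against the limit of the queue the job ends up in.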

The following snippet shows a sample JSON job request that could be submitted to the Jobs service to run the pyplot-0.1.0 app from the Advanced App Example tutorial.

{
  "name":"pyplot-demo test",
  "appId":"demo-pyplot-demo-advanced-0.1.0",
  "inputs":{
    "dataset":[
      "agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv",
      "agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata2.csv"
    ]
  },
  "archive":false,
  "parameters":{
    "unpackInputs":false,
    "chartType":[
      "bar",
      "line"
    ],
    "width":1024,
    "height":512,
    "background":"#d96727",
    "showYLabel":true,
    "ylabel":"The Y Axis Label",
    "showXLabel":true,
    "xlabel":"The X Axis Label",
    "showLegend":true,
    "separateCharts":false
  },
  "notifications":[
    {
      "url":"$API_EMAIL",
      "event":"RUNNING"
    },
    {
      "url":"$API_EMAIL",
      "event":"FINISHED"
    },
    {
      "url":"http://requestbin.agaveapi.co/o1aiawo1?job_id=${JOB_ID}&status=${JOB_STATUS}",
      "event":"*",
      "persistent":true
    }
  ]
}

Notice that this example specifies a single input attribute, dataset. The pyplot-0.1.0 app definition specified that the dataset input attribute could accept more than one value (maxCardinality = 2). In the job request object, that translates to an array of string values. Each string represents a piece of data that Agave will transfer into the job work directory prior to job execution. Any value accepted by the Files service when importing data is accepted here. Some examples of valid values are given in the following table.

Name Description
inputs/pyplot/testdata.csv A relative path on the user’s default storage system.
/home/apiuser/inputs/pyplot/testdata.csv An absolute path on the user’s default storage system.
agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv An Agave URL explicitly specifying a source system and relative path.
agave://$PUBLIC_STORAGE_SYSTEM//home/apiuser/$API_USERNAME/inputs/pyplot/testdata.csv An Agave URL explicitly specifying a source system and absolute path.
http://example.com/inputs/pyplot/testdata.csv A standard URL using any supported transfer protocol.

Table 2. Examples of the different syntaxes in which input values can be specified in the job request object. Here we assume that the input field’s validator accepts all of them.

The example job request also specifies a parameters object with the parameters defined in the pyplot-0.1.0 app description. Notice that the parameter types specified in the app description are reflected here: numbers are given as numbers, not strings, and boolean and flag attributes are given as boolean true and false values. As with the inputs section, there is also a parameter, chartType, that accepts multiple values. In this case that translates to an array of string values. Had the parameter required another primitive type, an array of that type would be used instead.

Finally, the notifications array specifies that we want Agave to send three notifications related to this job. The first is a one-time email when the job starts running. The second is a one-time email when the job reaches the FINISHED state. The third is a persistent webhook to the URL we specified, which is called for every event ("*"). More on notifications in the section on monitoring below.
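If you generate job requests programmatically, the structure above maps directly onto Python dictionaries. This sketch rebuilds a trimmed version of the example request. `build_job_request` is a hypothetical helper. The input URLs and `$API_EMAIL` are the same placeholders used in the example.

```python
import json


def build_job_request(name, app_id, inputs, parameters, notifications=(), archive=False):
    """Assemble a Jobs service request body as a JSON string.

    Multi-valued inputs and parameters are passed as lists; scalars keep their
    native JSON types (numbers as numbers, flags as booleans).
    """
    request = {
        "name": name,
        "appId": app_id,
        "inputs": inputs,
        "parameters": parameters,
        "archive": archive,
        "notifications": list(notifications),
    }
    return json.dumps(request, indent=2)


body = build_job_request(
    name="pyplot-demo test",
    app_id="demo-pyplot-demo-advanced-0.1.0",
    inputs={"dataset": [
        "agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv",
        "agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata2.csv",
    ]},
    parameters={"chartType": ["bar", "line"], "width": 1024, "height": 512,
                "showLegend": True},
    notifications=[
        {"url": "$API_EMAIL", "event": "RUNNING"},
        {"url": "$API_EMAIL", "event": "FINISHED"},
    ],
)
print(body)
```

Because `json.dumps` serializes Python `int` and `bool` values as JSON numbers and booleans, the typing rules described above are preserved automatically.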

Job submission validation

If everything went well, you will receive a response that looks something like the following JSON object.

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -F "fileToUpload=@job.json" https://public.tenants.agaveapi.co/jobs/v2?pretty=true
jobs-submit -F job.json
{
  "status" : "success",
  "message" : null,
  "version" : "2.1.0-r6d11c",
  "result" : {
    "id" : "0001414144065563-5056a550b8-0001-007",
    "name" : "demo-pyplot-demo-advanced test-1414139896",
    "owner" : "$API_USERNAME",
    "appId" : "demo-pyplot-demo-advanced-0.1.0",
    "executionSystem" : "$PUBLIC_EXECUTION_SYSTEM",
    "batchQueue" : "debug",
    "nodeCount" : 1,
    "processorsPerNode" : 1,
    "memoryPerNode" : 1.0,
    "maxRunTime" : "01:00:00",
    "archive" : false,
    "retries" : 0,
    "localId" : "10321",
    "outputPath" : null,
    "status" : "FINISHED",
    "submitTime" : "2014-10-24T04:48:11.000-05:00",
    "startTime" : "2014-10-24T04:48:08.000-05:00",
    "endTime" : "2014-10-24T04:48:15.000-05:00",
    "inputs" : {
      "dataset" : "agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv"
    },
    "parameters" : {
      "chartType" : "bar",
      "height" : "512",
      "showLegend" : "false",
      "xlabel" : "Time",
      "background" : "#FFF",
      "width" : "1024",
      "showXLabel" : "true",
      "separateCharts" : "false",
      "unpackInputs" : "false",
      "ylabel" : "Magnitude",
      "showYLabel" : "true"
    },
    "_links" : {
      "self" : {
        "href" : "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007"
      },
      "app" : {
        "href" : "https://public.tenants.agaveapi.co/apps/v2/demo-pyplot-demo-advanced-0.1.0"
      },
      "executionSystem" : {
        "href" : "https://public.tenants.agaveapi.co/systems/v2/$PUBLIC_EXECUTION_SYSTEM"
      },
      "archiveData" : {
        "href" : "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/listings"
      },
      "owner" : {
        "href" : "https://public.tenants.agaveapi.co/profiles/v2/$API_USERNAME"
      },
      "permissions" : {
        "href" : "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/pems"
      },
      "history" : {
        "href" : "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/history"
      },
      "metadata" : {
        "href" : "https://public.tenants.agaveapi.co/meta/v2/data/?q=%7B%22associationIds%22%3A%220001414144065563-5056a550b8-0001-007%22%7D"
      },
      "notifications" : {
        "href" : "https://public.tenants.agaveapi.co/notifications/v2/?associatedUuid=0001414144065563-5056a550b8-0001-007"
      }
    }
  }
}
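The Jobs service also accepts a JSON object in the POST body, so a client can submit without the multipart file upload used in the curl example. The sketch below only constructs that request and reads the job id and status out of a response envelope shaped like the one above. Treating any envelope status other than "success" as a failure is our assumption. `build_submit_request` and `job_id_and_status` are hypothetical names.

```python
import json
import urllib.request

BASE_URL = "https://public.tenants.agaveapi.co"


def build_submit_request(job_json, token):
    """Build the POST request that submits a job description (a JSON string)."""
    return urllib.request.Request(
        url=f"{BASE_URL}/jobs/v2",
        data=job_json.encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )


def job_id_and_status(response_text):
    """Return (id, status) from a Jobs service response envelope."""
    envelope = json.loads(response_text)
    # ASSUMPTION: anything other than "success" means the submission failed.
    if envelope.get("status") != "success":
        raise RuntimeError(envelope.get("message") or "job submission failed")
    result = envelope["result"]
    return result["id"], result["status"]


sample = ('{"status": "success", "message": null, "result": '
          '{"id": "0001414144065563-5056a550b8-0001-007", "status": "FINISHED"}}')
print(job_id_and_status(sample))
```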

Job monitoring

Once you submit your job request, the job will be handed off to Agave’s back-end execution service. Your job may run right away, or it may wait in a batch queue on the execution system until the required resources are available. Either way, execution happens asynchronously to the submission process. To monitor the status of your job, Agave supports two different mechanisms: polling and webhooks.

Polling

If you have ever taken a long road trip with children, you are probably painfully aware of how polling works. Starting several minutes from the time you leave the house, a child asks, “Are we there yet?” You reply, “No.” Several minutes later the child again asks, “Are we there yet?” You again reply, “No.” This process continues until you finally arrive at your destination. This is called polling, and polling is bad.

Polling for your job status works the same way. After submitting your job, you start a while loop that queries the Jobs service for your job status until it detects that the job is in a terminal state. The following three URLs all return the status of your job. The first returns a list of abbreviated job descriptions. The second returns a full description of the job with the given $JOB_ID, exactly like the one returned when submitting the job. The third returns a much smaller response object containing only the $JOB_ID and status.

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://public.tenants.agaveapi.co/jobs/v2
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/status

Sample response snippet

{
  "status" : "success",
  "message" : null,
  "version" : "2.1.0-r6d11c",
  "result" : {
    "id" : "0001414144065563-5056a550b8-0001-007",
    "status" : "FINISHED",
    "_links" : {
      "self" : {
        "href" : "$API_BASE_URL/jobs/v2/0001414144065563-5056a550b8-0001-007"
      }
    }
  }
}

The list of all possible job statuses and events is given in Table 2.

Event Description
CREATED The job was created
UPDATED The job was updated
DELETED The job was deleted
PERMISSION_GRANT User permission was granted
PERMISSION_REVOKE Permission was removed for a user on this job
PENDING Job accepted and queued for submission.
STAGING_INPUTS Transferring job input data to execution system
CLEANING_UP Job completed execution
ARCHIVING Transferring job output to archive system
STAGING_JOB Job inputs staged to execution system
FINISHED Job complete
KILLED Job execution killed at user request
FAILED Job failed
STOPPED Job execution intentionally stopped
RUNNING Job started running
PAUSED Job execution paused by user
QUEUED Job successfully placed into queue
SUBMITTING Preparing job for execution and staging binaries to execution system
STAGED Job inputs staged to execution system
PROCESSING_INPUTS Identifying input files for staging
ARCHIVING_FINISHED Job archiving complete
ARCHIVING_FAILED Job archiving failed
HEARTBEAT Job heartbeat received

Table 2. Job statuses and events reported by the Jobs service.

Polling is an incredibly effective approach, but it is bad practice for two reasons. First, it does not scale well. Querying for one job status every few seconds does not take much effort, but querying for 100 takes quite a bit of time and puts unnecessary load on Agave’s servers. Second, polling provides what is effectively a binary response: it tells you whether a job is done or not, but it does not give you any information on what is actually going on with the job or where it is in the overall execution process.

The job history URL provides much more detailed information on the various state changes, system messages, and progress information associated with data staging. The syntax of the job history URL is as follows:

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/history

Sample response snippet

{
  "status":"success",
  "message":null,
  "version":"2.1.0-r6d11c",
  "result":[
    {
      "created":"2014-10-24T04:47:45.000-05:00",
      "status":"PENDING",
      "description":"Job accepted and queued for submission."
    },
    {
      "created":"2014-10-24T04:47:47.000-05:00",
      "status":"PROCESSING_INPUTS",
      "description":"Attempt 1 to stage job inputs"
    },
    {
      "created":"2014-10-24T04:47:47.000-05:00",
      "status":"PROCESSING_INPUTS",
      "description":"Identifying input files for staging"
    },
    {
      "created":"2014-10-24T04:47:48.000-05:00",
      "status":"STAGING_INPUTS",
      "description":"Staging agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv to remote job directory"
    },
    {
      "progress":{
        "averageRate":0,
        "totalFiles":1,
        "source":"agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv",
        "totalActiveTransfers":0,
        "totalBytes":3212,
        "totalBytesTransferred":3212
      },
      "created":"2014-10-24T04:47:48.000-05:00",
      "status":"STAGING_INPUTS",
      "description":"Copy in progress"
    },
    {
      "created":"2014-10-24T04:47:50.000-05:00",
      "status":"STAGED",
      "description":"Job inputs staged to execution system"
    },
    {
      "created":"2014-10-24T04:47:55.000-05:00",
      "status":"SUBMITTING",
      "description":"Preparing job for submission."
    },
    {
      "created":"2014-10-24T04:47:55.000-05:00",
      "status":"SUBMITTING",
      "description":"Attempt 1 to submit job"
    },
    {
      "created":"2014-10-24T04:48:08.000-05:00",
      "status":"RUNNING",
      "description":"Job started running"
    },
    {
      "created":"2014-10-24T04:48:12.000-05:00",
      "status":"CLEANING_UP"
    },
    {
      "created":"2014-10-24T04:48:15.000-05:00",
      "status":"FINISHED",
      "description":"Job completed. Skipping archiving at user request."
    }
  ]
}

Depending on the nature of your job and the reliability of the underlying systems, the response from this service can grow rather large, so it is important to be aware that this query can be an expensive call for your client application to make. Everything we said before about polling job status applies to polling job history with the additional caveat that you can chew through quite a bit of bandwidth polling this service, so keep that in mind if your application is bandwidth starved.
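If you do poll the history, it helps to reduce each response to what your client actually needs. This sketch assumes the `result` array shown above. It collapses consecutive entries with the same status and reports the latest transfer progress. The sample list is a trimmed copy of that response, and the function names are hypothetical.

```python
def status_transitions(history):
    """Return the sequence of distinct statuses, collapsing consecutive repeats."""
    transitions = []
    for event in history:
        status = event["status"]
        if not transitions or transitions[-1] != status:
            transitions.append(status)
    return transitions


def latest_transfer_percent(history):
    """Return the percent complete of the most recent transfer, or None."""
    for event in reversed(history):
        progress = event.get("progress")
        if progress and progress.get("totalBytes"):
            return 100.0 * progress["totalBytesTransferred"] / progress["totalBytes"]
    return None


history = [
    {"status": "PENDING"},
    {"status": "PROCESSING_INPUTS"},
    {"status": "PROCESSING_INPUTS"},
    {"status": "STAGING_INPUTS"},
    {"status": "STAGING_INPUTS",
     "progress": {"totalBytes": 3212, "totalBytesTransferred": 3212}},
    {"status": "STAGED"},
    {"status": "RUNNING"},
    {"status": "FINISHED"},
]
print(status_transitions(history))
print(latest_transfer_percent(history))  # 100.0
```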

Often, however, polling is unavoidable. In these situations, we recommend using an exponential backoff to check job status. An exponential backoff is an algorithm that increases the time between successive checks as the number of attempts increases, multiplying the wait by a constant factor (often 2) each time.
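A minimal sketch of polling with exponential backoff follows. It assumes the `/jobs/v2/$JOB_ID/status` endpoint shown earlier. It treats FINISHED, FAILED, KILLED, STOPPED, and ARCHIVING_FAILED as terminal, which is our reading of Table 2 rather than a documented list. The HTTP call is passed in as a function, so the loop can be exercised without network access. `fetch_status` and `wait_for_job` are hypothetical names. The delay is doubled without jitter for simplicity.

```python
import json
import time
import urllib.request

BASE_URL = "https://public.tenants.agaveapi.co"

# ASSUMPTION: terminal states inferred from the status table above.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED", "STOPPED", "ARCHIVING_FAILED"}


def fetch_status(job_id, token):
    """GET /jobs/v2/{job_id}/status and return the status string."""
    req = urllib.request.Request(
        f"{BASE_URL}/jobs/v2/{job_id}/status",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["result"]["status"]


def wait_for_job(get_status, initial_delay=5.0, factor=2.0, max_delay=300.0,
                 max_attempts=50, sleep=time.sleep):
    """Poll get_status() until a terminal state, growing the wait each time.

    The wait starts at initial_delay seconds, is multiplied by factor after
    every non-terminal answer, and never exceeds max_delay.
    """
    delay = initial_delay
    for attempt in range(max_attempts):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if attempt < max_attempts - 1:  # no pointless sleep after the last poll
            sleep(delay)
            delay = min(delay * factor, max_delay)
    raise TimeoutError(f"job not in a terminal state after {max_attempts} polls")


# Real use: wait_for_job(lambda: fetch_status(job_id, token))
```

Capping the delay keeps a long-queued job from being checked only once an hour. Passing `sleep` in makes the loop easy to test with a fake clock.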

Webhooks

Webhooks are the alternative, preferred way for your application to monitor the status of asynchronous actions in Agave. If you are a Gang of Four disciple, webhooks are a mechanism for implementing the Observer Pattern. They are widely used across the web, and chances are that something you’re using right now is leveraging them. In the context of Agave, a webhook is a URL that you give to Agave in advance of an event, and to which Agave later sends a POST request when that event occurs. A webhook can be any web-accessible URL.
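On the receiving side, a webhook is just an HTTP endpoint that accepts POST requests. The sketch below uses Python's built-in `http.server` to record the path and body of each callback. This section does not describe the body format Agave sends, so the handler stores it as raw text. `WebhookHandler`, `start_receiver`, and the simulated callback (job id 42) are illustrative only. A real receiver would need a public address that Agave can reach.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # (path, body) pairs, one per callback


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8", errors="replace")
        # The path includes any query string, e.g. ?job_id=...&status=...
        received.append((self.path, body))
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_receiver(port=0):
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Local smoke test: simulate one callback and show what was recorded.
server = start_receiver()
port = server.server_address[1]
fake = urllib.request.Request(
    f"http://127.0.0.1:{port}/hook?job_id=42&status=RUNNING",
    data=b"{}", method="POST")
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass proxies
opener.open(fake).read()
server.shutdown()
server.server_close()
print(received)
```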

The Jobs service provides several template variables for constructing dynamic URLs. Template variables can be included anywhere in your URL by surrounding the variable name in the following manner: ${VARIABLE_NAME}. When an event of interest occurs, the variables will be resolved and the resulting URL called. Several example URLs are given below.

http://example.com/?job_id=${JOB_ID}&job_status=${EVENT}

http://example.com/trigger/job/${JOB_NAME}/${EVENT}

http://example.com/webhooks/?nonce=sdfkajerouiwe234289fahlkqr&id=${JOB_ID}&status=${EVENT}&start=${JOB_START_TIME}&end=${JOB_END_TIME}&url=${JOB_ARCHIVE_URL}
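Agave resolves these variables itself before calling your URL. To preview what a template will expand to, you can mimic the substitution locally. This sketch uses Python's `string.Template`, whose `${NAME}` syntax happens to match. It percent-encodes each value so that spaces in, for example, a job name do not break the URL. That encoding is our own precaution, and whether Agave encodes values is not stated here. `preview_webhook_url` is a hypothetical name.

```python
from string import Template
from urllib.parse import quote


def preview_webhook_url(template, values):
    """Fill ${VARIABLE} placeholders in a webhook URL template.

    Values are percent-encoded; variables missing from `values` are left
    untouched so you can see which ones were not supplied.
    """
    encoded = {name: quote(str(value), safe="") for name, value in values.items()}
    return Template(template).safe_substitute(encoded)


url = preview_webhook_url(
    "http://example.com/?job_id=${JOB_ID}&job_status=${EVENT}",
    {"JOB_ID": "0001414144065563-5056a550b8-0001-007", "EVENT": "FINISHED"},
)
print(url)
# http://example.com/?job_id=0001414144065563-5056a550b8-0001-007&job_status=FINISHED
```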

The full list of template variables is given in the following table.

Table 3. Template variables available for use when defining webhooks for your job.

Email

In situations where you do not have a persistent web address or access to a backend service, you may find it more convenient to subscribe to email notifications rather than providing a webhook. Agave supports email notifications as well: simply specify a valid email address in the url field of your job submission’s notification object, and an email will be sent to that address when a relevant event occurs. A sample email message is given below.

The status of job 0001414144065563-5056a550b8-0001-007, "demo-pyplot-demo-advanced test-1414139896," has changed to FINISHED.

Name: demo-pyplot-demo-advanced test-1414139896
URL: https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007
Message: Job completed successfully.
Submit Time: 2014-10-24T04:48:11.000-05:00
Start Time: 2014-10-24T04:48:08.000-05:00
End Time: 2014-10-24T04:48:15.000-05:00
Output Path: $API_USERNAME/archive/jobs/job-0001414144065563-5056a550b8-0001-007
Output URL: https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs

Websockets

Websockets are a real-time approach to monitoring in which your client application listens on a dedicated channel for notification messages from Agave. Simply subscribe to Agave’s websocket server (https://realtime.agaveapi.co) and listen on the channel matching your job ID:

/agave.prod/$API_USERNAME/$JOB_ID

Stopping

Once your job is submitted, you have the ability to stop the job. This will kill the job on the system on which it is running.

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d "action=kill" https://public.tenants.agaveapi.co/jobs/v2/$JOB_ID
jobs-stop $JOB_ID
{
  "id" : "$JOB_ID",
  "name" : "demo-pyplot-demo-advanced test-1414139896",
  "owner" : "$API_USERNAME",
  "appId" : "demo-pyplot-demo-advanced-0.1.0",
  "executionSystem" : "$PUBLIC_EXECUTION_SYSTEM",
  "batchQueue" : "debug",
  "nodeCount" : 1,
  "processorsPerNode" : 1,
  "memoryPerNode" : 1.0,
  "maxRunTime" : "01:00:00",
  "archive" : false,
  "retries" : 0,
  "localId" : "10321",
  "outputPath" : null,
  "status" : "STOPPED",
  "submitTime" : "2014-10-24T04:48:11.000-05:00",
  "startTime" : "2014-10-24T04:48:08.000-05:00",
  "endTime" : null,
  "inputs" : {
    "dataset" : "agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv"
  },
  "parameters" : {
    "chartType" : "bar",
    "height" : "512",
    "showLegend" : "false",
    "xlabel" : "Time",
    "background" : "#FFF",
    "width" : "1024",
    "showXLabel" : "true",
    "separateCharts" : "false",
    "unpackInputs" : "false",
    "ylabel" : "Magnitude",
    "showYLabel" : "true"
  },
  "_links" : {
    "self" : {
      "href" : "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007"
    },
    "app" : {
      "href" : "https://public.tenants.agaveapi.co/apps/v2/demo-pyplot-demo-advanced-0.1.0"
    },
    "executionSystem" : {
      "href" : "https://public.tenants.agaveapi.co/systems/v2/$PUBLIC_EXECUTION_SYSTEM"
    },
    "archiveData" : {
      "href" : "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/listings"
    },
    "owner" : {
      "href" : "https://public.tenants.agaveapi.co/profiles/v2/$API_USERNAME"
    },
    "permissions" : {
      "href" : "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/pems"
    },
    "history" : {
      "href" : "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/history"
    },
    "metadata" : {
      "href" : "https://public.tenants.agaveapi.co/meta/v2/data/?q=%7B%22associationIds%22%3A%220001414144065563-5056a550b8-0001-007%22%7D"
    },
    "notifications" : {
      "href" : "https://public.tenants.agaveapi.co/notifications/v2/?associatedUuid=0001414144065563-5056a550b8-0001-007"
    }
  }
}

Deleting

Deleting a job

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X DELETE https://public.tenants.agaveapi.co/jobs/v2/$JOB_ID
jobs-delete $JOB_ID

Over time the number of jobs you have run can grow rather large. You can delete jobs to remove them from your listing results.

Resubmitting

Resubmitting a job

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d "action=resubmit" https://public.tenants.agaveapi.co/jobs/v2/$JOB_ID
jobs-resubmit $JOB_ID

Often you will want to rerun a previous job as part of a pipeline or automation, or to verify that its results can be reproduced. In this situation, it is convenient to use the resubmit feature of the Jobs service.

Resubmission lets you enforce as much or as little rigor as you desire with respect to reproducibility in the job submission process. The following options are available for configuring a resubmission according to your requirements.

Field Type Description
ignoreInputConflicts boolean Whether to ignore discrepancies in the previous app inputs for the resubmitted job. If true, the resubmitted job will make a best-fit attempt at migrating the inputs.
ignoreParameterConflicts boolean Whether to ignore discrepancies in the previous app parameters for the resubmitted job. If true, the resubmitted job will make a best-fit attempt at migrating the parameters.
preserveNotifications boolean Whether to recreate the notification of the original job for the resubmitted job.
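The resubmit curl example above sends only the action field, and this section does not show how the options in the table are passed. Assuming they travel as extra form fields alongside action=resubmit, in the same way as the action itself, a request body could be built as below. That assumption is not confirmed here, and the lowercase "true"/"false" encoding of booleans is also a guess. `build_resubmit_form` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_resubmit_form(ignore_input_conflicts=False,
                        ignore_parameter_conflicts=False,
                        preserve_notifications=False):
    """Form body for a resubmit request.

    ASSUMPTION: the options are sent as form fields next to action=resubmit,
    with booleans written as lowercase "true"/"false" strings.
    """
    fields = {
        "action": "resubmit",
        "ignoreInputConflicts": ignore_input_conflicts,
        "ignoreParameterConflicts": ignore_parameter_conflicts,
        "preserveNotifications": preserve_notifications,
    }
    return urlencode({k: str(v).lower() if isinstance(v, bool) else v
                      for k, v in fields.items()})


print(build_resubmit_form(preserve_notifications=True))
# action=resubmit&ignoreInputConflicts=false&ignoreParameterConflicts=false&preserveNotifications=true
```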

Outputs

Throughout the lifecycle of a job, your inputs, application assets, and outputs are copied from and shuffled between several different locations. Though it is possible in many instances to explicitly locate and view all the moving pieces of your job through the Files service, resolving where those pieces are given the status, execution system, storage systems, data protocols, login protocols, and execution mechanisms of your job at a given time is…challenging. It is important, however, that you have the ability to monitor your job’s output throughout the lifetime of the job.

To make tracking the output of a specific job easier, the Jobs service provides a special URL for referencing individual job outputs:

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://public.tenants.agaveapi.co/jobs/v2/$JOB_ID/outputs/listings/$PATH
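When scripting against job outputs, it can help to build these URLs in one place. In the sample response that follows, each entry's self link points at an `outputs/media/...` path, while listings use `outputs/listings/...`. This sketch builds both forms, assuming that pattern holds in general. `job_output_url` is a hypothetical helper.

```python
from urllib.parse import quote

BASE_URL = "https://public.tenants.agaveapi.co"


def job_output_url(job_id, path="", download=False):
    """Return the listings URL (or the media URL if download=True) for a job output path."""
    kind = "media" if download else "listings"
    clean = quote(path.strip("/"))  # keeps "/" separators, encodes spaces etc.
    url = f"{BASE_URL}/jobs/v2/{job_id}/outputs/{kind}"
    return f"{url}/{clean}" if clean else url


print(job_output_url("0001414144065563-5056a550b8-0001-007", "output"))
```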

The syntax of this service is consistent with the Files service syntax, as is the JSON response from the service. The response would be similar to the following:

{
  "status" : "success",
  "message" : null,
  "version" : "2.1.0-r6d11c",
  "result" : [ {
    "name" : "output",
    "path" : "/output",
    "lastModified" : "2014-11-06T13:34:35.000-06:00",
    "length" : 0,
    "permission" : "NONE",
    "mimeType" : "text/directory",
    "format" : "folder",
    "type" : "dir",
    "_links" : {
      "self" : {
        "href" : "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/media/output"
      },
      "system" : {
        "href" : "https://public.tenants.agaveapi.co/systems/v2/data.agaveapi.co"
      },
      "parent" : {
        "href" : "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007"
      }
    }
  }, {
    "name" : "demo-pyplot-demo-advanced-test-1414139896.err",
    "path" : "/demo-pyplot-demo-advanced-test-1414139896.err",
    "lastModified" : "2014-11-06T13:34:27.000-06:00",
    "length" : 442,
    "permission" : "NONE",
    "mimeType" : "application/octet-stream",
    "format" : "unknown",
    "type" : "file",
    "_links" : {
      "self" : {
        "href" : "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/media/demo-pyplot-demo-advanced-test-1414139896.err"
      },
      "system" : {
        "href" : "https://public.tenants.agaveapi.co/systems/v2/data.agaveapi.co"
      },
      "parent" : {
        "href" : "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007"
      }
    }
  }, {
    "name" : "demo-pyplot-demo-advanced-test-1414139896.out",
    "path" : "/demo-pyplot-demo-advanced-test-1414139896.out",
    "lastModified" : "2014-11-06T13:34:30.000-06:00",
    "length" : 1396,
    "permission" : "NONE",
    "mimeType" : "application/octet-stream",
    "format" : "unknown",
    "type" : "file",
    "_links" : {
      "self" : {
        "href" : "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/media/demo-pyplot-demo-advanced-test-1414139896.out"
      },
      "system" : {
        "href" : "https://public.tenants.agaveapi.co/systems/v2/data.agaveapi.co"
      },
      "parent" : {
        "href" : "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007"
      }
    }
  }, {
    "name" : "demo-pyplot-demo-advanced-test-1414139896.pid",
    "path" : "/demo-pyplot-demo-advanced-test-1414139896.pid",
    "lastModified" : "2014-11-06T13:34:33.000-06:00",
    "length" : 6,
    "permission" : "NONE",
    "mimeType" : "application/octet-stream",
    "format" : "unknown",
    "type" : "file",
    "_links" : {
      "self" : {
        "href" : "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/media/demo-pyplot-demo-advanced-test-1414139896.pid"
      },
      "system" : {
        "href" : "https://public.tenants.agaveapi.co/systems/v2/data.agaveapi.co"
      },
      "parent" : {
        "href" : "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007"
      }
    }
  }, {
    "name" : "testdata.csv",
    "path" : "/testdata.csv",
    "lastModified" : "2014-11-06T13:34:42.000-06:00",
    "length" : 3212,
    "permission" : "NONE",
    "mimeType" : "application/octet-stream",
    "format" : "unknown",
    "type" : "file",
    "_links" : {
      "self" : {
        "href" : "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/media/testdata.csv"
      },
      "system" : {
        "href" : "https://public.tenants.agaveapi.co/systems/v2/data.agaveapi.co"
      },
      "parent" : {
        "href" : "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007"
      }
    }
  } ]
}
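The listing uses the same format as the Files service, so a client can walk the `result` array directly. The minimal sketch below assumes the response body has already been fetched; an abbreviated copy of the sample above is inlined. It separates files from directories and collects each file's `_links.self.href` download URL. The helper name `split_listing` is illustrative.

```python
import json

# Abbreviated copy of the sample listing response shown above.
SAMPLE = """{
  "status": "success",
  "result": [
    {"name": "output", "path": "/output", "length": 0, "type": "dir",
     "_links": {"self": {"href": "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/media/output"}}},
    {"name": "testdata.csv", "path": "/testdata.csv", "length": 3212, "type": "file",
     "_links": {"self": {"href": "https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/media/testdata.csv"}}}
  ]
}"""


def split_listing(response_text):
    """Return (files, dirs); files maps name -> (length, download URL)."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("message") or "listing failed")
    files, dirs = {}, []
    for entry in payload["result"]:
        if entry["type"] == "dir":
            dirs.append(entry["path"])
        else:
            files[entry["name"]] = (entry["length"], entry["_links"]["self"]["href"])
    return files, dirs


files, dirs = split_listing(SAMPLE)
for name, (size, url) in sorted(files.items()):
    print(f"{name}\t{size} bytes\t{url}")
```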

To download a file, use the following syntax:

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://public.tenants.agaveapi.co/jobs/v2/$JOB_ID/outputs/media/$FILE_PATH
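In a script, you can do the same download without shelling out to curl. The sketch below assumes the `public.tenants.agaveapi.co` host from the example. `output_media_request` and `download` are hypothetical helper names. The sketch percent-encodes the output path so that file names containing spaces still resolve.

```python
import urllib.parse
import urllib.request

# Assumed tenant base URL, matching the curl example above.
JOBS_URL = "https://public.tenants.agaveapi.co/jobs/v2"


def output_media_request(job_id, path, token):
    """Build a GET request for one job output file (not sent here)."""
    # Quote the path so spaces and special characters survive; keep slashes.
    quoted = urllib.parse.quote(path.lstrip("/"), safe="/")
    return urllib.request.Request(
        f"{JOBS_URL}/{job_id}/outputs/media/{quoted}",
        headers={"Authorization": f"Bearer {token}"},
    )


def download(req, dest):
    """Stream the response body to a local file in 64 KiB chunks."""
    with urllib.request.urlopen(req) as resp, open(dest, "wb") as out:
        while chunk := resp.read(65536):
            out.write(chunk)
```

For example, `download(output_media_request(job_id, "/testdata.csv", token), "testdata.csv")` saves the CSV from the listing above.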

Regardless of job status, the above services always point to the most recent location of the job data. If you choose for the Jobs service to archive your job after completion, the URL will point to the archive folder of the job. If you do not choose to archive your data, or if archiving fails, the URL will point to the execution folder created for your job at runtime. Because Agave does not own any of the underlying hardware, it cannot guarantee that those locations will always exist. If, for example, the execution system enforces a purge policy, the system administrators may delete the output data. Agave will let you know if the data is no longer present; however, it cannot prevent the data from being deleted. This is another reason why you should archive any data you will need in the future.

Job Lifecycle Management

Agave handles all of the end-to-end details involved with managing a job lifecycle for you. This can seem like black magic at times, so here we detail the overall lifecycle process every job goes through.

  1. Job request is made, validated, and saved.
  2. The job is queued for execution. It stays in a pending state until there are resources to run it, meaning that the target execution system is online, the storage system with the app assets is online, and neither the user nor the system is over quota.
    1. If resources do not become available within 7 days, the job is killed.
    2. If resources are available, the job moves on.
  3. When resources are available to run the job on the execution system, a work directory is created on the execution system. The job work directory is chosen by the following logic:

        if (executionSystem.scratchDir exists) then
            $jobDir = executionSystem.scratchDir
        else if (executionSystem.workDir exists) then
            $jobDir = executionSystem.workDir
        else
            $jobDir = executionSystem.storage.homeDir
        endif
        $jobDir = $jobDir + "/" + job.owner + "/job-" + job.uuid

  4. The job inputs are staged to the job work directory, and the job status is updated to “INPUTS_STAGING”.
    1. If all inputs transfer successfully, the job is updated to “STAGED”.
    2. If one or more inputs fail to transfer, the job status is set back to “PENDING” and staging will be attempted up to 2 more times.
    3. If the user does not have permission to access one or more inputs, the job is set to “FAILED” and exits.

  5. The job again waits until resources are available to run it. Usually this is immediately after the inputs finish staging.
    1. If resources do not become available within 7 days, the job is killed.
    2. If resources are available, the job moves on.
  6. The app deploymentPath is copied from the app.deploymentSystem to a temp directory on the API server. The Jobs API then processes the app.deploymentPath + “/” + app.templatePath file to create the .ipcexe file. The process goes as follows:
    1. Script headers are written. These include scheduler directives for a batch system, or a shebang line for a forked app.
    2. Additional executionSystem[job.batchQueue].customDirectives are written
    3. “RUNNING” callback written
    4. Module commands are written
    5. executionSystem.environment is written
    6. wrapper script is filtered
      1. blacklisted commands are removed
      2. app parameter template variables are resolved against job parameter values.
      3. app input template variables are resolved against job input values
      4. blacklisted commands are removed again
    7. “CLEANING_UP” callback written
    8. All template macros are resolved.
    9. job.name.slugify + “.ipcexe” file written to temp directory
  7. App assets with wrapper template are copied to remote job work directory.
  8. Directory listing of job work directory is written to a .agave.archive manifest file in the remote job work directory.
  9. Command line is generated to invoke the *.ipcexe file by the appropriate method for the execution system.
  10. The command line is run on the remote system.
    1. If the command succeeds, the scheduler/process/job ID is captured and stored with the job record.
    2. If the command fails, the job is returned to “STAGED” status and submission is retried up to 2 more times.
  11. The job is updated to “QUEUED”.
  12. The job waits for a “RUNNING” callback, and a background process is added to monitor the job in case the callback never comes.
  13. The monitoring process checks the job status according to the following schedule:
    1. every 30 seconds for the first 5 minutes
    2. every minute for the next 30 minutes
    3. every 5 minutes for the next hour
    4. every 15 minutes for the next 12 hours
    5. every 30 minutes for the next 24 hours
    6. every hour for the next 14 days

  14. Either the job calls back with a “CLEANING_UP” status update, or the monitoring process discovers that the job no longer exists on the remote system.

  15. If job.archive is true, the job is sent to the archiving queue to stage its outputs to job.archiveSystem.
    1. If resources do not become available within 7 days, the job is killed.
    2. If resources are available, the job moves on.
      1. Read the .agave.archive manifest file from the job work directory
      2. Begin a breadth first directory traversal of the job work directory
      3. If a file/folder is not in the .agave.archive manifest, copy it to the job.archivePath on the job.archiveSystem
      4. Delete the job work directory
  16. Update job status to “FINISHED”
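Two of the steps above contain concrete rules: the work-directory selection in step 3 and the polling schedule in step 13. The Python sketch below restates both as plain functions so the rules are easy to check. It is an illustration, not Agave's implementation. Three behaviors are assumptions: an empty or missing directory setting counts as "does not exist", a trailing slash on the base directory is dropped, and polling stops after the final phase.

```python
# Phases of the monitoring schedule in step 13:
# (phase length in seconds, check interval in seconds).
MONITOR_PHASES = [
    (5 * 60, 30),          # every 30 seconds for the first 5 minutes
    (30 * 60, 60),         # every minute for the next 30 minutes
    (60 * 60, 5 * 60),     # every 5 minutes for the next hour
    (12 * 3600, 15 * 60),  # every 15 minutes for the next 12 hours
    (24 * 3600, 30 * 60),  # every 30 minutes for the next 24 hours
    (14 * 86400, 3600),    # every hour for the next 14 days
]


def resolve_job_dir(scratch_dir, work_dir, home_dir, owner, job_uuid):
    """Mirror step 3: the first configured base directory wins."""
    base = scratch_dir or work_dir or home_dir
    return f"{base.rstrip('/')}/{owner}/job-{job_uuid}"


def check_interval(elapsed_seconds):
    """Seconds between status checks at this point, or None once the schedule ends."""
    boundary = 0
    for length, interval in MONITOR_PHASES:
        boundary += length
        if elapsed_seconds < boundary:
            return interval
    return None  # assumption: monitoring stops after the final phase


print(resolve_job_dir(None, "/work", "/home/nryan", "nryan", "abc"))
# -> /work/nryan/job-abc
print(check_interval(10 * 60))
# -> 60
```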

Permissions and sharing

As with the Systems, Apps, and Files services, your jobs have their own set of access controls. Using these, you can share your job and its data with other Agave users. Job permissions are private by default. The permissions you give a job apply to the job itself, its outputs, its metadata, and its permissions. Thus, by sharing a job with another user, you share all aspects of that job.

Job permissions are managed through a set of URLs consistent with the permissions URLs elsewhere in the API.

Granting

# General grant
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     -X POST --data-binary '{"permission":"READ","username":"'"$USERNAME"'"}' \
     https://public.tenants.agaveapi.co/jobs/v2/$JOB_ID/pems

# Custom url grant
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     -X POST --data-binary '{"permission":"READ"}' \
     https://public.tenants.agaveapi.co/jobs/v2/$JOB_ID/pems/$USERNAME
jobs-pems-update -u $USERNAME -p READ $JOB_ID
{
  "username": "$USERNAME",
  "internalUsername": null,
  "permission": {
    "read": true,
    "write": false
  },
  "_links": {
    "self": {
      "href": "https://public.agaveapi.co/jobs/v2/$JOB_ID/pems/$USERNAME"
    },
    "parent": {
      "href": "https://public.agaveapi.co/jobs/v2/$JOB_ID"
    },
    "profile": {
      "href": "https://public.agaveapi.co/profiles/v2/$USERNAME"
    }
  }
}

Granting permissions is simply a matter of issuing a POST with the desired permission object to the job’s pems collection.

The available permission values are listed in Table 2.

Permission Description
READ Gives the ability to view the job status and output data.
WRITE Gives the ability to perform actions, manage metadata, and set permissions.
ALL Gives full READ and WRITE permissions to the user.
READ_WRITE Synonymous with ALL. Gives full READ and WRITE permissions to the user.

Table 2. Supported job permission values.
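Client code can check a requested value against Table 2 before calling the service. The following is a minimal Python sketch of the general grant. It assumes the same tenant host as the curl example. `grant_request` is a hypothetical helper that only builds the request; you would send it with `urllib.request.urlopen`.

```python
import json
import urllib.request

# Permission values from Table 2 (READ_WRITE is a synonym for ALL).
VALID_PERMISSIONS = {"READ", "WRITE", "ALL", "READ_WRITE"}
JOBS_URL = "https://public.tenants.agaveapi.co/jobs/v2"


def grant_request(job_id, username, permission, token):
    """Build a POST to the job's pems collection granting `permission`."""
    permission = permission.upper()
    if permission not in VALID_PERMISSIONS:
        raise ValueError(f"unsupported job permission: {permission}")
    body = {"permission": permission, "username": username}
    return urllib.request.Request(
        f"{JOBS_URL}/{job_id}/pems",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Validating locally turns a typo such as `RAED` into an immediate `ValueError` instead of a failed round trip to the API.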

Job permissions are distinct from file permissions. In many instances, your job output will be accessible through the Files and Jobs services simultaneously. Granting a user permission on a job output file through the Files service does not alter the accessibility of that file through the Jobs service. Before you grant permissions, consider who you are granting them to and how that decision affects every area of your application.

Listing

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    "https://public.agaveapi.co/jobs/v2/$JOB_ID/pems/"
jobs-pems-list -V $JOB_ID
[
  {
    "username": "$API_USERNAME",
    "internalUsername": null,
    "permission": {
      "read": true,
      "write": true
    },
    "_links": {
      "self": {
        "href": "https://public.agaveapi.co/jobs/v2/6608339759546166810-242ac114-0001-007/pems/$API_USERNAME"
      },
      "parent": {
        "href": "https://public.agaveapi.co/jobs/v2/6608339759546166810-242ac114-0001-007"
      },
      "profile": {
        "href": "https://public.agaveapi.co/profiles/v2/$API_USERNAME"
      }
    }
  },
  {
    "username": "$USERNAME",
    "internalUsername": null,
    "permission": {
      "read": true,
      "write": false
    },
    "_links": {
      "self": {
        "href": "https://public.agaveapi.co/jobs/v2/$JOB_ID/pems/$USERNAME"
      },
      "parent": {
        "href": "https://public.agaveapi.co/jobs/v2/$JOB_ID"
      },
      "profile": {
        "href": "https://public.agaveapi.co/profiles/v2/$USERNAME"
      }
    }
  }
]

To find the permissions for a given job, make a GET on the job’s pems collection. Here we see that both the job owner and the user we just granted permission to appear in the response.
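Each entry reports access as a pair of `read`/`write` booleans rather than as one of the Table 2 names. The sketch below assumes the listing body has already been fetched, and it folds each pair back into a name. The placeholder usernames are replaced with `nryan` (the example user from the Conventions table) and `jdoe`, a hypothetical second user. The `NONE` label for an entry with neither flag is our own choice.

```python
import json

# Trimmed copy of the permission listing shown above.
SAMPLE = """[
  {"username": "nryan", "permission": {"read": true, "write": true}},
  {"username": "jdoe", "permission": {"read": true, "write": false}}
]"""

LABELS = {
    (True, True): "READ_WRITE",
    (True, False): "READ",
    (False, True): "WRITE",
    (False, False): "NONE",  # our own label for "no access"
}


def summarize_pems(listing_json):
    """Map each username to a Table 2 style permission name."""
    summary = {}
    for entry in json.loads(listing_json):
        flags = entry["permission"]
        key = (bool(flags.get("read")), bool(flags.get("write")))
        summary[entry["username"]] = LABELS[key]
    return summary


print(summarize_pems(SAMPLE))
# -> {'nryan': 'READ_WRITE', 'jdoe': 'READ'}
```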

Updating

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     -X POST --data-binary '{"permission":"READ_WRITE"}' \
     https://public.tenants.agaveapi.co/jobs/v2/$JOB_ID/pems/$USERNAME
jobs-pems-update -u $USERNAME -p READ_WRITE $JOB_ID
{
  "username": "$USERNAME",
  "internalUsername": null,
  "permission": {
    "read": true,
    "write": true
  },
  "_links": {
    "self": {
      "href": "https://public.agaveapi.co/jobs/v2/$JOB_ID/pems/$USERNAME"
    },
    "parent": {
      "href": "https://public.agaveapi.co/jobs/v2/$JOB_ID"
    },
    "profile": {
      "href": "https://public.agaveapi.co/profiles/v2/$USERNAME"
    }
  }
}

Updating is exactly like granting permissions. Just POST to the same job’s pems collection.

Deleting

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -X DELETE \
     https://public.tenants.agaveapi.co/jobs/v2/$JOB_ID/pems/$USERNAME
jobs-pems-update -u $USERNAME -p '' $JOB_ID

To delete a permission, you can issue a DELETE request on the user permission resource we’ve been using, or update with an empty permission value.

Notifications

 /$$$$$$$          /$$       /$$$$$$          /$$
| $$__  $$        | $$      /$$__  $$        | $$
| $$  \ $$/$$   /$| $$$$$$$| $$  \__//$$   /$| $$$$$$$
| $$$$$$$| $$  | $| $$__  $|  $$$$$$| $$  | $| $$__  $$
| $$____/| $$  | $| $$  \ $$\____  $| $$  | $| $$  \ $$
| $$     | $$  | $| $$  | $$/$$  \ $| $$  | $| $$  | $$
| $$     |  $$$$$$| $$$$$$$|  $$$$$$|  $$$$$$| $$$$$$$/
|__/      \______/|_______/ \______/ \______/|_______/

Under the covers, the Agave API is an event-driven distributed system implemented on top of a reliable, cloud-based messaging system. This means that every action either observed or taken by Agave is tied to an event. The changing of a job from one status to another is an event. The granting of permissions on a file is an event. Editing a piece of metadata is an event, and to be sure, the moment you created an account with Agave was an event. You get the idea.

Having such a fine-grained event system is helpful for the same reason that having a fine-grained permission model is helpful. It affords you the highest degree of flexibility and control possible to achieve the behavior you desire. With Agave’s event system, you have the ability to alert your users (or yourself) the instant something occurs. You can be proactive rather than reactive, and you can begin orchestrating your complex tasks in a loosely coupled, asynchronous way.

Subscriptions

Example notification subscription request

{
  "associatedUuid": "0001409758089943-5056a550b8-0001-002",
  "event": "OVERWRITTEN",
  "persistent": true,
  "url": "nryan@rangers.mlb.com"
}

As consumers of Agave, you have the ability to subscribe to events occurring on any resource to which you have access. By that we mean, for example, you could subscribe to events on your job and a job that someone shared with you, but you could not subscribe to events on a job submitted by someone else who has not shared the job with you. Basically, if you can see a resource, you can subscribe to its events.

The Notifications service is the primary mechanism by which you create and manage your event subscriptions. A typical use case is a user subscribing to an email alert when her job completes or when one of her files changes. The JSON object above represents a request for the second kind of notification.

The associatedUuid value is the UUID of the resource she wants to watch. Here, we have given the UUID of the picsumipsum.txt file we uploaded in the Files API guide. The event value is the name of the event she wants to be notified about. This example asks for an email to be sent whenever the file is overwritten. She could just as easily have specified an event of DELETED or RENAME to be notified when the file was deleted or renamed.

The persistent value specifies whether the notification should fire more than once. By default, all event subscriptions are transient. This is because the events themselves are transient: an event occurs, then it is over. There are, however, many situations where events could occur over and over again. Permission events, changes to metadata and data, application registrations on a system, job submissions to a system or queue, and so on can all occur many, many times. In these cases it is either not possible or highly undesirable to constantly resubscribe for the same event. The persistent attribute tells the Notifications service to keep a subscription alive until it is explicitly deleted.

Continuing to work through the example, the url value specifies where the notification should be sent. In this example, our example user specified that she would like to be notified via email. Agave supports both email and webhook notifications. If you are unfamiliar with webhooks, take a moment to glance at the webhooks.org page for a brief overview. If you are a Gang of Four disciple, webhooks are a mechanism for implementing the Observer Pattern. Webhooks are widely used across the web and chances are that something you’re using right now is leveraging them.
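As the paragraph above explains, a single `url` field carries either an email address or a webhook. The sketch below builds the example subscription object and sanity-checks the target first. The rule it uses to tell the two apart (http/https means webhook; a bare address containing `@` means email) is our own heuristic, not a documented Agave rule. `delivery_channel` and `build_subscription` are hypothetical helpers.

```python
from urllib.parse import urlparse


def delivery_channel(target):
    """Heuristic (our assumption): http(s) URLs are webhooks, bare addresses are email."""
    scheme = urlparse(target).scheme
    if scheme in ("http", "https"):
        return "webhook"
    if "@" in target and not scheme:
        return "email"
    raise ValueError(f"not an email address or http(s) URL: {target}")


def build_subscription(associated_uuid, event, url, persistent=False):
    """Return a subscription object shaped like the example request above."""
    delivery_channel(url)  # fail early on an obviously malformed target
    return {
        "associatedUuid": associated_uuid,
        "event": event,
        "persistent": persistent,
        "url": url,
    }


sub = build_subscription("0001409758089943-5056a550b8-0001-002", "OVERWRITTEN",
                         "nryan@rangers.mlb.com", persistent=True)
```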

URL Macros

Receive a callback when a new user is created that includes the new user’s information

https://example.com/sendWelcome.php?username=${USERNAME}&email=${EMAIL}&firstName=${FIRST_NAME}&lastName=${LAST_NAME}&src=agaveapi.co&nonce=1234567

Receive self-describing job status updates

http://example.com/job/${JOB_ID}?status=${STATUS}&lastUpdated=${LAST_UPDATED}

Get notified on all jobs going into and out of queues

http://example.com/system/${EXECUTION_SYSTEM}/queue/${QUEUE}?action=add
http://example.com/system/${EXECUTION_SYSTEM}/queue/${QUEUE}?action=subtract

Rerun an analysis when a file finishes staging

https://$AGAVE_BASE_URL/jobs/v2/a32487q98wasdfa9-09090b0b-007?action=resubmit

Use plus mailing to route job notifications to different folders

nryan+${EXECUTION_SYSTEM}+${JOB_ID}@gmail.com

In the context of Agave, a webhook is a URL to which Agave will send a POST request when the event occurs. A webhook can be any web-accessible URL. You cannot customize the POST content that Agave sends, because it is unique to the event. You can, however, use the many template variables that Agave provides to customize the URL at run time. Each resource type has its own set of webhook template variables; the table below shows the variables available for apps.

Variable Description
UUID The UUID of the app.
EVENT The event that occurred.
APP_ID The application ID (e.g., sabermetrics-2.1).

Webhook template variables let you build custom callbacks from the values of the resource at run time. Several commonly used webhook URLs are shown in the examples above.
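Before you register a subscription, it can help to preview what a templated URL will look like once filled in. Python's `string.Template` happens to use the same `${NAME}` syntax, so the sketch below can fill in sample values locally. This is only a preview. Agave performs the real substitution server-side when the event fires, and its exact rules may differ. The `preview_webhook` helper is hypothetical.

```python
from string import Template


def preview_webhook(url_template, **values):
    """Fill ${NAME} macros locally to preview a callback URL.

    safe_substitute leaves any macro without a supplied value untouched,
    which makes missing values easy to spot.
    """
    return Template(url_template).safe_substitute(values)


print(preview_webhook(
    "http://example.com/job/${JOB_ID}?status=${STATUS}&lastUpdated=${LAST_UPDATED}",
    JOB_ID="0001414144065563-5056a550b8-0001-007",
    STATUS="FINISHED",
))
# -> http://example.com/job/0001414144065563-5056a550b8-0001-007?status=FINISHED&lastUpdated=${LAST_UPDATED}
```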

Creating

Create a new notification subscription

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
    -H "Content-Type: application/json" \
    --data-binary '{"associatedUuid": "7554973644402463206-242ac114-0001-007", "event": "FINISHED", "url": "http://requestbin.agaveapi.co/zyiomxzy?path=${PATH}&system=${SYSTEM}&event=${EVENT}" }' \
    https://public.tenants.agaveapi.co/notifications/v2?pretty=true
notifications-addupdate -F notification.json

Which will result in output similar to this

{
  "id": "7612526206168863206-242ac114-0001-011",
  "owner": "nryan",
  "url": "http://requestbin.agaveapi.co/zyiomxzy?path=${PATH}&system=${SYSTEM}&event=${EVENT}",
  "associatedUuid": "7554973644402463206-242ac114-0001-007",
  "event": "FINISHED",
  "responseCode": null,
  "attempts": 0,
  "lastSent": null,
  "success": false,
  "persistent": false,
  "status": "ACTIVE",
  "lastUpdated": "2016-08-24T10:07:03.000-05:00",
  "created": "2016-08-24T10:07:03.000-05:00",
  "policy": {
    "retryLimit": 5,
    "retryRate": 5,
    "retryDelay": 0,
    "saveOnFailure": true,
    "retryStrategy": "NONE"
  },
  "_links": {
    "self": {
      "href": "https://public.agaveapi.co/notifications/v2/7612526206168863206-242ac114-0001-011"
    },
    "history": {
      "href": "https://public.agaveapi.co/notifications/v2/7612526206168863206-242ac114-0001-011/history"
    },
    "attempts": {
      "href": "https://public.agaveapi.co/notifications/v2/7612526206168863206-242ac114-0001-011/attempts"
    },
    "owner": {
      "href": "https://public.agaveapi.co/profiles/v2/nryan"
    },
    "job": {
      "href": "https://public.agaveapi.co/jobs/v2/7554973644402463206-242ac114-0001-007"
    }
  }
}

To subscribe to an event, post a form or JSON object to the Notifications service. An example using curl and the CLI is shown above.
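If you want to receive callbacks on your own server rather than on a request bin, a small listener is enough. The sketch below uses only Python's standard library and reads the query parameters produced by the `path`, `system`, and `event` macros in the subscription URL above. The port and the handler name are our own choices. We also assume that any 2xx reply counts as a successful delivery, consistent with the 500 response recorded as a failed attempt under "Failed deliveries".

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def parse_callback(request_path):
    """Extract single-valued query parameters, such as the values Agave
    substituted for ${PATH}, ${SYSTEM}, and ${EVENT}."""
    query = parse_qs(urlparse(request_path).query)
    return {key: values[0] for key, values in query.items()}


class AgaveCallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        params = parse_callback(self.path)
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)  # event payload; its content depends on the event
        print(f"event={params.get('event')} system={params.get('system')} "
              f"bytes={len(body)}")
        # Reply promptly with a 2xx (assumed to mark the delivery as successful).
        self.send_response(200)
        self.end_headers()


def run(port=8080):
    """Serve callbacks until interrupted."""
    HTTPServer(("", port), AgaveCallbackHandler).serve_forever()
```

Call `run()` on a publicly reachable host, then use `http://<your-host>:8080/...?path=${PATH}&system=${SYSTEM}&event=${EVENT}` as the subscription `url`.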

Updating

The updated notification subscription object

{
    "associatedUuid": "7554973644402463206-242ac114-0001-007",
    "event": "*",
    "url": "http://requestbin.agaveapi.co/zyiomxzy?path=${PATH}&system=${SYSTEM}&event=${EVENT}"
}

The JSON used to update the subscription is shown above.

Updating a subscription is done identically to creation except that the form or JSON is POSTed to the existing subscription URL. An example of doing this using curl as well as the CLI is given below.

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
    -H "Content-Type: application/json" \
    --data-binary @notification.json \
    https://public.tenants.agaveapi.co/notifications/v2/7612526206168863206-242ac114-0001-011
notifications-addupdate -F notification.json 7612526206168863206-242ac114-0001-011

Which will result in output similar to this

{
  "id": "7612526206168863206-242ac114-0001-011",
  "owner": "nryan",
  "url": "http://requestbin.agaveapi.co/zyiomxzy?path=${PATH}&system=${SYSTEM}&event=${EVENT}",
  "associatedUuid": "7554973644402463206-242ac114-0001-007",
  "event": "*",
  "responseCode": null,
  "attempts": 0,
  "lastSent": null,
  "success": false,
  "persistent": false,
  "status": "ACTIVE",
  "lastUpdated": "2016-08-24T10:07:03.000-05:00",
  "created": "2016-08-24T10:07:03.000-05:00",
  "policy": {
    "retryLimit": 5,
    "retryRate": 5,
    "retryDelay": 0,
    "saveOnFailure": true,
    "retryStrategy": "NONE"
  },
  "_links": {
    "self": {
      "href": "https://public.agaveapi.co/notifications/v2/7612526206168863206-242ac114-0001-011"
    },
    "history": {
      "href": "https://public.agaveapi.co/notifications/v2/7612526206168863206-242ac114-0001-011/history"
    },
    "attempts": {
      "href": "https://public.agaveapi.co/notifications/v2/7612526206168863206-242ac114-0001-011/attempts"
    },
    "owner": {
      "href": "https://public.agaveapi.co/profiles/v2/nryan"
    },
    "job": {
      "href": "https://public.agaveapi.co/jobs/v2/7554973644402463206-242ac114-0001-007"
    }
  }
}

Listing

Listing notification subscriptions

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    https://public.tenants.agaveapi.co/notifications/v2
notifications-list -V

Which will result in output similar to this

[
  {
    "id": "7612526206168863206-242ac114-0001-011",
    "url": "http://requestbin.agaveapi.co/zyiomxzy?path=${PATH}&system=${SYSTEM}&event=${EVENT}",
    "associatedUuid": "7554973644402463206-242ac114-0001-007",
    "event": "*",
    "_links": {
      "self": {
        "href": "https://public.agaveapi.co/notifications/v2/7612526206168863206-242ac114-0001-011"
      },
      "profile": {
        "href": "https://public.agaveapi.co/profiles/v2/nryan"
      },
      "job": {
        "href": "https://public.agaveapi.co/jobs/v2/7554973644402463206-242ac114-0001-007"
      }
    }
  },
  {
    "id": "7404907487080223206-242ac114-0001-011",
    "url": "nryan@rangers.texas.mlb.com",
    "associatedUuid": "6904887394479903206-242ac114-0001-007",
    "event": "FINISHED",
    "_links": {
      "self": {
        "href": "https://public.agaveapi.co/notifications/v2/7404907487080223206-242ac114-0001-011"
      },
      "profile": {
        "href": "https://public.agaveapi.co/profiles/v2/nryan"
      },
      "job": {
        "href": "https://public.agaveapi.co/jobs/v2/6904887394479903206-242ac114-0001-007"
      }
    }
  },
  {
    "id": "3676815741209931290-242ac114-0001-011",
    "url": "nryan@rangers.texas.mlb.com",
    "associatedUuid": "3717016635100491290-242ac114-0001-007",
    "event": "FINISHED",
    "_links": {
      "self": {
        "href": "https://public.agaveapi.co/notifications/v2/3676815741209931290-242ac114-0001-011"
      },
      "profile": {
        "href": "https://public.agaveapi.co/profiles/v2/nryan"
      },
      "job": {
        "href": "https://public.agaveapi.co/jobs/v2/3717016635100491290-242ac114-0001-007"
      }
    }
  }
]

You can get a list of your current notification subscriptions by performing a GET operation on the base /notifications collection. Adding the UUID of a notification to the URL returns just that notification. You can also query for all notifications assigned to a specific resource by adding the query parameter associatedUuid=$uuid. An example of querying all notifications using curl and the CLI is shown above.
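The three query forms described above differ only in the URL. The sketch below builds each of them, assuming the tenant host from the curl examples. `notifications_url` is a hypothetical helper, and `urlencode` takes care of escaping the UUID value.

```python
from urllib.parse import urlencode

# Assumed tenant host, matching the curl examples in this section.
NOTIFICATIONS_URL = "https://public.tenants.agaveapi.co/notifications/v2"


def notifications_url(notification_id=None, associated_uuid=None):
    """All subscriptions, one subscription by ID, or those tied to one resource."""
    if notification_id and associated_uuid:
        raise ValueError("pass a notification ID or an associated UUID, not both")
    if notification_id:
        return f"{NOTIFICATIONS_URL}/{notification_id}"
    if associated_uuid:
        return f"{NOTIFICATIONS_URL}?" + urlencode({"associatedUuid": associated_uuid})
    return NOTIFICATIONS_URL


print(notifications_url(associated_uuid="7554973644402463206-242ac114-0001-007"))
# -> https://public.tenants.agaveapi.co/notifications/v2?associatedUuid=7554973644402463206-242ac114-0001-007
```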

Unsubscribing

Unsubscribing from a notification subscription

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     -X DELETE \
     https://public.tenants.agaveapi.co/notifications/v2/2699130208276770330-242ac114-0001-011
notifications-delete -V 2699130208276770330-242ac114-0001-011

A standard Agave response with an empty result will be returned.

To unsubscribe from an event, perform a DELETE on the notification URL. Once deleted, a subscription cannot be restored. You can, however, create a new one. Keep in mind that if you do, the UUID of the new notification will be different from that of the deleted one. An example of deleting a notification using curl and the CLI is shown above.

Retry Policies

Sample notification subscription object with custom retry policy.

{
  "url" : "$REQUEST_BIN?path=${PATH}&system=${SYSTEM}&event=${EVENT}",
  "event" : "*",
  "persistent": true,
  "policy": {
      "retryStrategy": "IMMEDIATE",
      "retryLimit": 20,
      "retryRate": 5,
      "retryDelay": 0,
      "saveOnFailure": true
    }
}

In some situations, Agave may be unable to publish a specific notification. When this happens, Agave will immediately retry the notification 5 times in an attempt to deliver it successfully. When delivery fails for a 5th time, the notification is abandoned. If your application requires a more tenacious or methodical approach to retry delivery, you may provide a notification policy.

Name Type Description
retryStrategy NONE, IMMEDIATE, DELAYED, EXPONENTIAL The retry strategy to employ. Default is IMMEDIATE.
retryRate int; 0:86400 The frequency with which attempts should be made to deliver the message.
retryLimit int; 0:1440 The maximum number of attempts that should be made to deliver the message.
retryDelay int; 0:86400 The delay between the initial delivery attempt and the first retry.
saveOnFailure boolean Whether the failed message should be persisted if it cannot be delivered within the retryLimit.

Notification retry policies describe the strategy, frequency, delay, limit, and persistence to be applied when publishing an individual event for a given notification. The example above is our previous example with a notification policy included.
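A malformed policy is easiest to catch before the subscription is submitted. The sketch below checks a policy object against the types and ranges in the table above. It assumes the `0:N` ranges are inclusive and that omitted fields fall back to the service defaults. `validate_policy` is a hypothetical helper, not part of the Agave API.

```python
# Allowed values, taken from the retry policy table above.
STRATEGIES = {"NONE", "IMMEDIATE", "DELAYED", "EXPONENTIAL"}
RANGES = {"retryRate": (0, 86400), "retryLimit": (0, 1440), "retryDelay": (0, 86400)}


def validate_policy(policy):
    """Return a list of problems with a notification retry policy (empty if valid)."""
    problems = []
    strategy = policy.get("retryStrategy", "IMMEDIATE")
    if strategy not in STRATEGIES:
        problems.append(f"retryStrategy {strategy!r} is not one of {sorted(STRATEGIES)}")
    for field, (low, high) in RANGES.items():
        value = policy.get(field)
        if value is None:
            continue  # omitted: assume the service default applies
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            problems.append(f"{field} must be an integer in [{low}, {high}], got {value!r}")
    if not isinstance(policy.get("saveOnFailure", False), bool):
        problems.append("saveOnFailure must be a boolean")
    return problems


example = {"retryStrategy": "IMMEDIATE", "retryLimit": 20, "retryRate": 5,
           "retryDelay": 0, "saveOnFailure": True}
print(validate_policy(example))
# -> []
```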

Failed deliveries

Query failed attempts for a specific notification

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
     https://$API_BASE_URL/notifications/$API_VERSION/229123105859441126-8e1831906a8e-0001-011/attempts
notifications-list-failures 229123105859441126-8e1831906a8e-0001-011

A list of notification attempts will be returned.

[
  {
    "id" : "229681451607921126-8e1831906a8e-0001-042",
    "url" : "https://httpbin.org/status/500",
    "event" : "SENT",
    "associatedUuid" : "5833036796741676570-b0b0b0bb0b-0001-011",
    "startTime" : "2016-06-19T22:21:02.266-05:00",
    "endTime" : "2016-06-19T22:21:03.268-05:00",
    "response" : {
      "code" : 500,
      "message" : ""
    },
    "_links" : {
      "self" : {
        "href" : "https://$API_BASE_URL/notifications/$API_VERSION/229123105859441126-8e1831906a8e-0001-011/attempts/229681451607921126-8e1831906a8e-0001-042"
      },
      "notification" : {
        "href" : "https://$API_BASE_URL/notifications/$API_VERSION/5833036796741676570-b0b0b0bb0b-0001-011"
      },
      "profile" : {
        "href" : "https://$API_BASE_URL/profiles/$API_VERSION/ipcservices"
      }
    }
  }
]

If you provide a retry policy where saveOnFailure is true, failed messages are persisted and made available for querying at a later time. This is a great way to handle work missed because of a server failure, maintenance downtime, and so on. To query for failed messages, make a GET request on the notification's attempts collection.

Note: There is no way to save successful notification deliveries.
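When many deliveries fail, a quick summary of the attempts list shows whether the receiver is down or simply slow. The sketch below assumes the attempts response has already been fetched; an abbreviated copy of the sample above is inlined. It counts attempts by HTTP response code and computes how long each attempt took from `startTime` and `endTime`. `summarize_attempts` is a hypothetical helper.

```python
import json
from collections import Counter
from datetime import datetime

# Abbreviated copy of the sample attempts response shown above.
SAMPLE = """[{"id": "229681451607921126-8e1831906a8e-0001-042",
  "url": "https://httpbin.org/status/500", "event": "SENT",
  "startTime": "2016-06-19T22:21:02.266-05:00",
  "endTime": "2016-06-19T22:21:03.268-05:00",
  "response": {"code": 500, "message": ""}}]"""


def summarize_attempts(attempts_json):
    """Count attempts per HTTP status and report each attempt's duration in seconds."""
    attempts = json.loads(attempts_json)
    codes = Counter(a["response"]["code"] for a in attempts)
    durations = {
        a["id"]: (datetime.fromisoformat(a["endTime"])
                  - datetime.fromisoformat(a["startTime"])).total_seconds()
        for a in attempts
    }
    return codes, durations


codes, durations = summarize_attempts(SAMPLE)
```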

PostIts

 /$$$$$$$                       /$$    /$$$$$$ /$$
| $$__  $$                     | $$   |_  $$_/| $$
| $$  \ $$ /$$$$$$   /$$$$$$$ /$$$$$$   | $$ /$$$$$$
| $$$$$$$//$$__  $$ /$$_____/|_  $$_/   | $$|_  $$_/
| $$____/| $$  \ $$|  $$$$$$   | $$     | $$  | $$
| $$     | $$  | $$ \____  $$  | $$ /$$ | $$  | $$ /$$
| $$     |  $$$$$$/ /$$$$$$$/  |  $$$$//$$$$$$|  $$$$/
|__/      \______/ |_______/    \___/ |______/ \___/

The PostIts service is a URL shortening service similar to bit.ly, goo.gl, and t.co. It allows you to create pre-authenticated, disposable URLs to any resource in the Agave Platform. You have control over the lifetime and number of times the URL can be redeemed, and you can expire a PostIt at any time. As with all Science API resources, a full set of events is available for you to track usage and integrate the lifecycle of a PostIt into external applications as needed.

The most common use of PostIts is to create URLs to files and folders you can share with others without having to upload them to a third-party service. For example, using the PostIts service, you can share the output(s) of an experimental run, distribute materials for a class, submit data to a third-party service, and serve up assets for a static website like Agave ToGo.

Other use cases for the PostIts service include creating “drop” folders to which anyone with the link can upload data, allowing a job to be reproducibly rerun for peer review, publishing metadata for public consumption, and publishing a canonical reference to your user profile. The possibilities go on and on. Anytime you need to share your science with the world, PostIts can help you.

Creating PostIts

Creating a PostIt

curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
    -X POST \
    -d "lifetime=3600" \
    -d "maxUses=10" \
    -d "method=GET" \
    -d "url=https://public.agaveapi.co/files/v2/media/system/data.agaveapi.co/nryan/picksumipsum.txt" \
    'https://public.agaveapi.co/postits/v2/?pretty=true'
postits-create \
    -m 10 \
    -l 3600 \
    https://public.agaveapi.co/files/v2/media/system/data.agaveapi.co/nryan/picksumipsum.txt

Should result in something similar to the following:

{
  "creator":"nryan",
  "internalUsername":null,
  "authenticated":true,
  "created":"2016-09-30T21:51:31-05:00",
  "expires":"2016-10-01T00:14:51-05:00",
  "remainingUses":10,
  "postit":"f61256c53bf3744185de4ac6c0c839b4",
  "noauth":false,
  "url":"https://public.agaveapi.co/files/v2/media/system/data.agaveapi.co//home/nryan/picksumipsum.txt",
  "method":"GET",
  "_links":{
    "self":{
      "href":"https://public.agaveapi.co/postits/v2/f61256c53bf3744185de4ac6c0c839b4"
    },
    "profile":{
      "href":"https://public.agaveapi.co/profiles/v2/nryan"
    },
    "file":{
      "href":"https://public.agaveapi.co/files/v2/media/system/data.agaveapi.co//home/nryan/picksumipsum.txt"
    }
  }
}

To create a PostIt, send a POST request to the PostIts service with the target url you want to share. In this example, we are sharing a file we have in Agave’s cloud storage account.

In the response you see standard fields such as the created timestamp and the postit token. You also see several fields that lead into another aspect of PostIts: the ability to restrict usage and expire them on demand.
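The same request can also be assembled from a script. The following is a minimal Python sketch using only the standard library. It builds, but does not send, the POST from the curl example, using the same form fields (url, lifetime, maxUses, method). The base URL and the function name build_create_postit_request are illustrative assumptions, not part of an official Agave SDK.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed tenant base URL, taken from the examples in this section.
API_BASE = "https://public.agaveapi.co"


def build_create_postit_request(token, target_url, lifetime=None, max_uses=None, method=None):
    """Build (but do not send) a POST to the PostIts collection."""
    fields = {"url": target_url}
    # Only send the restrictions that were given; the service applies its own defaults otherwise.
    if lifetime is not None:
        fields["lifetime"] = str(lifetime)  # seconds
    if max_uses is not None:
        fields["maxUses"] = str(max_uses)
    if method is not None:
        fields["method"] = method
    return Request(
        API_BASE + "/postits/v2/",
        data=urlencode(fields).encode("ascii"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


req = build_create_postit_request(
    "YOUR_TOKEN",
    "https://public.agaveapi.co/files/v2/media/system/data.agaveapi.co/nryan/picksumipsum.txt",
    lifetime=3600,
    max_uses=10,
    method="GET",
)
```

To actually create the PostIt, pass the request to urllib.request.urlopen and decode the JSON body. The postit field of that response holds the new token.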

Restricting PostIt usage

When creating a PostIt, you have the ability to limit the lifespan, number of uses, and HTTP method used to connect to the target resource. The following table shows the fields available for this purpose. Not specifying any of these fields results in a single-use PostIt that remains valid for 1 calendar month.

Attribute Type Description
maxUses integer The maximum number of times the PostIt may be redeemed. Defaults to 1.
lifetime integer The maximum lifetime, in seconds, over which the PostIt may be redeemed. Defaults to 1 month.
method string (GET, POST, PUT, or DELETE) The HTTP method used to request the target resource when redeeming the PostIt. Defaults to GET.
noauth boolean If true, the request to the target resource is made without authentication. Defaults to false.

Listing Active PostIts

Listing active PostIts

curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
    'https://public.agaveapi.co/postits/v2/?pretty=true'
postits-list -v 

Should result in something similar to the following:

[
  {
    "creator":"nryan",
    "internalUsername":null,
    "authenticated":true,
    "created":"2016-09-30T21:51:31-05:00",
    "expires":"2016-10-01T00:14:51-05:00",
    "remainingUses":10,
    "postit":"f61256c53bf3744185de4ac6c0c839b4",
    "noauth":false,
    "url":"https://public.agaveapi.co/files/v2/media/system/data.agaveapi.co//home/nryan/picksumipsum.txt",
    "method":"GET",
    "_links":{
      "self":{
        "href":"https://public.agaveapi.co/postits/v2/f61256c53bf3744185de4ac6c0c839b4"
      },
      "profile":{
        "href":"https://public.agaveapi.co/profiles/v2/nryan"
      },
      "file":{
        "href":"https://public.agaveapi.co/files/v2/media/system/data.agaveapi.co//home/nryan/picksumipsum.txt"
      }
    }
  }
]

Redeeming PostIts

Redeeming a PostIt

curl -s -o picksumipsum.txt 'https://public.agaveapi.co/postits/v2/f61256c53bf3744185de4ac6c0c839b4'

Which would download the picksumipsum.txt file from your storage system.

You redeem a PostIt by making a non-authenticated HTTP request on the PostIt URL. In the above example, that would be https://public.agaveapi.co/postits/v2/f61256c53bf3744185de4ac6c0c839b4. Every time you redeem the PostIt, its remainingUses field decrements by 1. This continues until remainingUses reaches 0 or the current time passes the PostIt's expires timestamp.
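The sketch below is a client-side illustration of the redemption rule just described. It checks whether a PostIt record, as returned by the create or list calls, should still be usable. The helper name is hypothetical, and the service remains the final authority. The check only avoids handing out links that have obviously run out.

```python
from datetime import datetime, timezone


def is_redeemable(postit, now=None):
    """Client-side check: True while remainingUses > 0 and `now` is before `expires`."""
    if now is None:
        now = datetime.now(timezone.utc)
    # fromisoformat (Python 3.7+) parses offsets such as -05:00 into aware datetimes.
    expires = datetime.fromisoformat(postit["expires"])
    return postit["remainingUses"] > 0 and now < expires


sample = {"expires": "2016-10-01T00:14:51-05:00", "remainingUses": 10}
print(is_redeemable(sample, datetime.fromisoformat("2016-09-30T22:00:00-05:00")))  # True
```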

Forcing PostIt browser downloads

If you are using PostIts in a browser environment, you can force a file download by adding force=true to the PostIt URL query. If the target URL is a file item, the name of the file item will be included in the Content-Disposition header so the downloaded file has the correct file name. You may also add the same query parameter to any target file item to force the Content-Disposition header from the Files API.
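The following is a minimal sketch of appending force=true to a PostIt (or Files API) URL. It preserves any existing query parameters, such as pretty=true. The helper name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_force_download(url):
    """Return `url` with force=true in its query string, keeping other parameters."""
    parts = urlsplit(url)
    # Drop any existing force parameter so the result carries exactly one.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "force"]
    query.append(("force", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_force_download("https://public.agaveapi.co/postits/v2/f61256c53bf3744185de4ac6c0c839b4"))
# https://public.agaveapi.co/postits/v2/f61256c53bf3744185de4ac6c0c839b4?force=true
```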

Expiring PostIts

Manually expiring a PostIt

curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
    -X DELETE \
    'https://public.agaveapi.co/postits/v2/f61256c53bf3744185de4ac6c0c839b4?pretty=true'
postits-delete f61256c53bf3744185de4ac6c0c839b4

Which will result in an empty response from the server.

In addition to setting expiration parameters when you create a PostIt, you can manually expire a PostIt at any time by making an authenticated DELETE request on the PostIt URL. This will instantly expire the PostIt from further use and remove it from your listing results.
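For completeness, here is a sketch of the same DELETE built with Python's standard library. As with the other examples, the base URL and helper name are assumptions. The request is only constructed here, not sent.

```python
from urllib.request import Request

# Assumed tenant base URL, as used in the examples above.
API_BASE = "https://public.agaveapi.co"


def build_expire_postit_request(token, postit_id):
    """Build (but do not send) the authenticated DELETE that expires a PostIt."""
    return Request(
        f"{API_BASE}/postits/v2/{postit_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )


req = build_expire_postit_request("YOUR_TOKEN", "f61256c53bf3744185de4ac6c0c839b4")
```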

Metadata

 /$$      /$$             /$$
| $$$    /$$$            | $$
| $$$$  /$$$$  /$$$$$$  /$$$$$$    /$$$$$$
| $$ $$/$$ $$ /$$__  $$|_  $$_/   |____  $$
| $$  $$$| $$| $$$$$$$$  | $$      /$$$$$$$
| $$\  $ | $$| $$_____/  | $$ /$$ /$$__  $$
| $$ \/  | $$|  $$$$$$$  |  $$$$/|  $$$$$$$
|__/     |__/ \_______/   \___/   \_______/

The Agave Metadata service allows you to manage metadata and associate it with Agave entities via associated UUIDs. It supports JSON Schema for structured JSON metadata; it also accepts any valid JSON-formatted metadata or a plain-text string when no schema is specified. As with other Agave services, a full access control layer is available, enabling you to keep your metadata private or share it with your colleagues.

Metadata Structure

Key-value metadata item

{
  "name": "some metadata",
  "value": "A model organism..."
}

Structured metadata item, metadata.json

{
  "name":"some metadata",
  "value":{
    "title":"Example Metadata",
    "properties":{
      "species":"arabidopsis",
      "description":"A model organism..."
    }
  }
}

Every metadata item has four fields shown in the following table.

Field name Type Description
name string; 1-256; required A non-unique key you can use to reference and group your metadata.
value JSON or string; 0-5M The metadata itself: a plain-text string, or any valid JSON object or array.
associationIds array A JSON array of zero or more UUIDs to which this metadata item should be associated.
schemaId string The id of a valid Agave metadata schema object representing the JSON Schema definition used to validate this metadata item.

The name field is just that: a user-defined name you give to your metadata item. There is no uniqueness constraint on the name field, so it is up to you or your application to enforce whatever naming policy you see fit.

Depending on your application needs, you may use the Metadata service as a key-value store, a document store, or both. When using it as a key-value store, you provide text for the value field. When fetching data, you can search by exact value or use full-text search as needed.

When using the Metadata service as a document store, you provide a JSON object or array for the value field. In this use case you can leverage additional functionality such as structured queries, atomic updates, etc.

Either use case is acceptable and fully supported. Your application needs will determine the best approach for you to take.

Associations

Each metadata item also has an optional associationIds field. This field contains a JSON array of Agave UUIDs to which this metadata item applies. This provides a convenient grouping mechanism by which to organize logically related resources. One common example is creating a metadata item to represent a “data collection” and associating with it files and folders that may be geographically distributed. Another is creating a metadata item to represent a “project”, then sharing the “project” with other users involved in it.

Metadata items can also be associated with other metadata items to create hierarchical relationships. Building on the “project” example, additional metadata items could be created for “links”, “videos”, and “experiments” to hold references for categorized groups of postits, video file items, and jobs respectively. Such a model translates well to a user interface layer and eliminates a large amount of boilerplate code in your application.

Creating metadata

Create a new metadata item

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
    -H 'Content-Type: application/json' \
    --data-binary '{"value": {"title": "Example Metadata", "properties": {"species": "arabidopsis", "description": "A model organism..."}}, "name": "some metadata"}' \
    https://public.tenants.agaveapi.co/meta/v2/data
metadata-addupdate -v -F - <<<'{"value": {"title": "Example Metadata", "properties": {"species": "arabidopsis", "description": "A model organism..."}}, "name": "some metadata"}'

The response will look something like the following:

{
  "uuid": "7341557475441971686-242ac11f-0001-012",
  "owner": "nryan",
  "schemaId": null,
  "internalUsername": null,
  "associationIds": [],
  "lastUpdated": "2016-08-29T04:49:34.532-05:00",
  "name": "some metadata",
  "value": {
    "title": "Example Metadata",
    "properties": {
      "species": "arabidopsis",
      "description": "A model organism..."
    }
  },
  "created": "2016-08-29T04:49:34.532-05:00",
  "_links": {
    "self": {
      "href": "https://public.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012"
    },
    "permissions": {
      "href": "https://public.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012/pems"
    },
    "owner": {
      "href": "https://public.agaveapi.co/profiles/v2/nryan"
    }
  }
}

New metadata items are created in the repository via a POST to the collection URL. As mentioned before, there is no uniqueness constraint placed on metadata items. Thus, repeatedly POSTing the same metadata item to the service will create duplicate entries, each with its own unique UUID assigned by the service.
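The following is a minimal Python sketch of the create call above. It assumes the public.agaveapi.co base URL that appears in the response links, and the helper name is hypothetical. It serializes the item as JSON and omits associationIds and schemaId unless they are supplied. Names are not unique, so building and sending two such requests with the same arguments would create two items.

```python
import json
from urllib.request import Request

# Assumed base URL, matching the hypermedia links in the responses above.
API_BASE = "https://public.agaveapi.co"


def build_create_metadata_request(token, name, value, association_ids=None, schema_id=None):
    """Build (but do not send) a POST that creates a new metadata item."""
    item = {"name": name, "value": value}
    if association_ids:
        item["associationIds"] = list(association_ids)
    if schema_id:
        item["schemaId"] = schema_id
    return Request(
        API_BASE + "/meta/v2/data",
        data=json.dumps(item).encode("utf-8"),
        headers={"Authorization": "Bearer " + token, "Content-Type": "application/json"},
        method="POST",
    )


req = build_create_metadata_request(
    "YOUR_TOKEN",
    "some metadata",
    {"title": "Example Metadata",
     "properties": {"species": "arabidopsis", "description": "A model organism..."}},
)
```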

Updating metadata

Update a metadata item

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
    -H 'Content-Type: application/json' \
    --data-binary '{"value": {"title": "Example Metadata", "properties": {"species": "arabidopsis", "description": "A model plant organism..."}}, "name": "some metadata", "associationIds":["179338873096442342-242ac113-0001-002","6608339759546166810-242ac114-0001-007"]}' \
    https://public.tenants.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012
metadata-addupdate -v -F - 7341557475441971686-242ac11f-0001-012 <<<'{"value": {"title": "Example Metadata", "properties": {"species": "arabidopsis", "description": "A model plant organism..."}}, "name": "some metadata", "associationIds":["179338873096442342-242ac113-0001-002","6608339759546166810-242ac114-0001-007"]}'

The response will look something like the following:

{
  "uuid": "7341557475441971686-242ac11f-0001-012",
  "schemaId": null,
  "internalUsername": null,
  "associationIds": [
    "179338873096442342-242ac113-0001-002",
    "6608339759546166810-242ac114-0001-007"
  ],
  "lastUpdated": "2016-08-29T05:51:39.908-05:00",
  "name": "some metadata",
  "value": {
    "title": "Example Metadata",
    "properties": {
      "species": "arabidopsis",
      "description": "A model plant organism..."
    }
  },
  "created": "2016-08-29T05:43:18.618-05:00",
  "owner": "nryan",
  "_links": {
    "self": {
      "href": "https://public.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012"
    },
    "permissions": {
      "href": "https://public.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012/pems"
    },
    "owner": {
      "href": "https://public.agaveapi.co/profiles/v2/nryan"
    },
    "associationIds": [
      {
        "rel": "179338873096442342-242ac113-0001-002",
        "href": "https://public.agaveapi.co/files/v2/media/system/storage.example.com//",
        "title": "file"
      },
      {
        "rel": "6608339759546166810-242ac114-0001-007",
        "href": "https://public.agaveapi.co/jobs/v2/6608339759546166810-242ac114-0001-007",
        "title": "job"
      }
    ]
  }
}

Updating metadata is done by POSTing an updated metadata object to the existing resource. When updating, it is important to note that it is not possible to change the metadata uuid, owner, lastUpdated or created fields. Those fields are managed by the service.
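The service manages uuid, owner, lastUpdated, and created, so one convenient pattern is to fetch an item, strip those fields, edit the rest, and POST it back. The sketch below also drops the hypermedia _links object, on the assumption that it is response-only. The helper name is hypothetical.

```python
# Fields the service manages itself (per the text above), plus the response-only
# _links object, which is assumed not to belong in an update.
SERVICE_MANAGED = {"uuid", "owner", "lastUpdated", "created", "_links"}


def to_update_payload(item):
    """Return a shallow copy of a fetched metadata item without service-managed fields."""
    return {key: val for key, val in item.items() if key not in SERVICE_MANAGED}


fetched = {
    "uuid": "7341557475441971686-242ac11f-0001-012",
    "owner": "nryan",
    "created": "2016-08-29T05:43:18.618-05:00",
    "lastUpdated": "2016-08-29T05:51:39.908-05:00",
    "name": "some metadata",
    "value": {"title": "Example Metadata"},
    "associationIds": [],
    "_links": {},
}
payload = to_update_payload(fetched)
# Replace the value in the copy; the fetched dict keeps its original value.
payload["value"] = {"title": "Example Metadata",
                    "properties": {"description": "A model plant organism..."}}
```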

Deleting metadata

Delete a metadata item

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X DELETE \
    https://public.tenants.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012
metadata-delete 7341557475441971686-242ac11f-0001-012

An empty response will be returned from the service.

To delete a metadata item, simply make a DELETE request on the metadata resource.

Metadata details

Fetching a metadata item

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    https://public.tenants.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012
metadata-list -v 7341557475441971686-242ac11f-0001-012

The response will look something like the following:

{
  "uuid":"7341557475441971686-242ac11f-0001-012",
  "schemaId":null,
  "internalUsername":null,
  "associationIds":[
    "179338873096442342-242ac113-0001-002",
    "6608339759546166810-242ac114-0001-007"
  ],
  "lastUpdated":"2016-08-29T05:51:39.908-05:00",
  "name":"some metadata",
  "value":{
    "title":"Example Metadata",
    "properties":{
      "species":"arabidopsis",
      "description":"A model plant organism..."
    }
  },
  "created":"2016-08-29T05:43:18.618-05:00",
  "owner":"nryan",
  "_links":{
    "self":{
      "href":"https://public.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012"
    },
    "permissions":{
      "href":"https://public.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012/pems"
    },
    "owner":{
      "href":"https://public.agaveapi.co/profiles/v2/nryan"
    },
    "associationIds":[
      {
        "rel":"179338873096442342-242ac113-0001-002",
        "href":"https://public.agaveapi.co/files/v2/media/system/storage.example.com//",
        "title":"file"
      },
      {
        "rel":"6608339759546166810-242ac114-0001-007",
        "href":"https://public.agaveapi.co/jobs/v2/6608339759546166810-242ac114-0001-007",
        "title":"job"
      }
    ]
  }
}

To fetch a detailed description of a metadata item, make a GET request on the resource URL. The response is the full metadata item representation. There are two points of interest in the example response. First, the response has no id field; instead, its uuid field serves as its ID. This is retained for backward compatibility with legacy consumers and will be changed in the next major release.

The second point of interest is the _links.associationIds array in the hypermedia response. It contains an expanded representation of the associationIds field in the body. The objects in this array are similar to the information you would receive by calling the UUID API to resolve each value in the associationIds array. By using this hypermedia information, you can save several round trips when resolving basic information about the associated resources.
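The hypothetical helper below shows how to save those round trips. It reads the expanded _links.associationIds array from a fetched item and indexes each associated UUID by its resource type and URL.

```python
def associations_by_uuid(item):
    """Index the expanded _links.associationIds entries of a metadata item by UUID."""
    links = item.get("_links", {}).get("associationIds", [])
    return {link["rel"]: {"type": link["title"], "href": link["href"]} for link in links}


item = {
    "associationIds": ["6608339759546166810-242ac114-0001-007"],
    "_links": {
        "associationIds": [
            {
                "rel": "6608339759546166810-242ac114-0001-007",
                "href": "https://public.agaveapi.co/jobs/v2/6608339759546166810-242ac114-0001-007",
                "title": "job",
            }
        ]
    },
}
print(associations_by_uuid(item)["6608339759546166810-242ac114-0001-007"]["type"])  # job
```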

Metadata browsing

Listing your metadata

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    'https://public.tenants.agaveapi.co/meta/v2/data?limit=1'
metadata-list -v -l 1

The response will look something like the following:

[
  {
    "uuid": "7341557475441971686-242ac11f-0001-012",
    "schemaId": null,
    "internalUsername": null,
    "associationIds": [
      "179338873096442342-242ac113-0001-002",
      "6608339759546166810-242ac114-0001-007"
    ],
    "lastUpdated": "2016-08-29T05:51:39.908-05:00",
    "name": "some metadata",
    "value": {
      "title": "Example Metadata",
      "properties": {
        "species": "arabidopsis",
        "description": "A model plant organism..."
      }
    },
    "created": "2016-08-29T05:43:18.618-05:00",
    "owner": "nryan",
    "_links": {
      "self": {
        "href": "https://public.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012"
      },
      "permissions": {
        "href": "https://public.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012/pems"
      },
      "owner": {
        "href": "https://public.agaveapi.co/profiles/v2/nryan"
      },
      "associationIds": [
        {
          "rel": "179338873096442342-242ac113-0001-002",
          "href": "https://public.agaveapi.co/files/v2/media/system/storage.example.com//",
          "title": "file"
        },
        {
          "rel": "6608339759546166810-242ac114-0001-007",
          "href": "https://public.agaveapi.co/jobs/v2/6608339759546166810-242ac114-0001-007",
          "title": "job"
        }
      ]
    }
  }
]

To browse your Metadata, make a GET request against the /meta/v2/data collection. This will return all the metadata you created and to which you have been granted READ access. This includes any metadata items that have been shared with the public or world users. In practice, users will have many metadata items created and shared with them as part of normal use of the platform, so pagination and search become important aspects of interacting with the service.

For admins, who have implicit access to all metadata, the default listing response is a paginated list of every metadata item in the tenant. To avoid this, admin users can append privileged=false to the query to bypass implicit permissions and return only the metadata items they own or to which they have been granted explicit access.
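The following sketch walks the whole collection page by page. The limit parameter appears in the example above. Paging with an offset parameter is an assumption here. The function takes a caller-supplied (hypothetical) fetch_page callable so the paging logic can be shown without any network code. Pass privileged=False to apply the admin filter described above.

```python
from urllib.parse import urlencode

# Assumed base URL, as used in the examples above.
API_BASE = "https://public.agaveapi.co"


def iter_metadata(fetch_page, page_size=100, privileged=None):
    """Yield every metadata item visible to the caller, one page at a time.

    fetch_page(url) must perform the authenticated GET and return the decoded
    JSON list. Assumes the collection honors `limit` and `offset`.
    """
    offset = 0
    while True:
        params = {"limit": page_size, "offset": offset}
        if privileged is not None:
            params["privileged"] = str(privileged).lower()  # "true" / "false"
        page = fetch_page(f"{API_BASE}/meta/v2/data?{urlencode(params)}")
        yield from page
        if len(page) < page_size:
            return  # a short (or empty) page means there is nothing left
        offset += page_size
```

In practice, fetch_page would wrap urllib.request.urlopen with the Authorization header and json.load the response body.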

Metadata Validation

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
    -H 'Content-Type: application/json' \
    --data-binary '{"schemaId": "4736020169528054246-242ac11f-0001-013", "value": {"title": "Example Metadata", "properties": {"description": "A model organism..."}}, "name": "some metadata"}' \
    https://public.tenants.agaveapi.co/meta/v2/data
metadata-addupdate -v -F - <<<'{"schemaId": "4736020169528054246-242ac11f-0001-013", "value": {"title": "Example Metadata", "properties": {"description": "A model organism..."}}, "name": "some metadata"}'

The response will look something like the following:

{
  "status" : "error",
  "message" : "Metadata value does not conform to schema.",
  "version" : "2.1.8-r8bb7e86"
}

Often it is necessary to validate metadata for format or simple quality control. The Metadata service is capable of validating the value of a metadata item against a predefined JSON Schema definition. To use this feature, you must first register your JSON Schema definition with the Metadata Schemata service, then reference the UUID of that metadata schema resource in the schemaId field.

Given the example metadata schema shown in the Metadata Schemata section, which requires a “species” field, the request above fails because the metadata item's value field is missing the required “species” value.
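The sketch below shows why the request fails. It performs a local pre-check of only the JSON Schema required keyword against a metadata value. It is a simplified, hypothetical helper for catching obvious mistakes before a round trip. It does not replace the service's full validation.

```python
def missing_required(schema, value):
    """Return the schema's required top-level properties that `value` lacks.

    Only the `required` keyword is checked; this is a local pre-check, not full
    JSON Schema validation.
    """
    required = schema.get("required", [])
    if not isinstance(value, dict):
        return list(required)
    return [name for name in required if name not in value]


# The example schema from the Metadata Schemata section.
schema = {
    "title": "Example Schema",
    "type": "object",
    "properties": {"species": {"type": "string"}},
    "required": ["species"],
}
value = {"title": "Example Metadata", "properties": {"description": "A model organism..."}}
print(missing_required(schema, value))  # ['species']
```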

Metadata Searching

Searching metadata for all items named “mustard plant”

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -G \
    --data-urlencode 'q={"name": "mustard plant"}' \
    https://public.tenants.agaveapi.co/meta/v2/data
metadata-list -v -Q '{"name":"mustard+plant"}'

The response will look something like the following:

[
  {
    "uuid": "7341557475441971686-242ac11f-0001-012",
    "schemaId": null,
    "internalUsername": null,
    "associationIds": [
      "179338873096442342-242ac113-0001-002",
      "6608339759546166810-242ac114-0001-007"
    ],
    "lastUpdated": "2016-08-29T05:51:39.908-05:00",
    "name": "some metadata",
    "value": {
      "title": "Example Metadata",
      "properties": {
        "species": "arabidopsis",
        "description": "A model plant organism..."
      }
    },
    "created": "2016-08-29T05:43:18.618-05:00",
    "owner": "nryan",
    "_links": {
      "self": {
        "href": "https://public.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012"
      },
      "permissions": {
        "href": "https://public.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012/pems"
      },
      "owner": {
        "href": "https://public.agaveapi.co/profiles/v2/nryan"
      },
      "associationIds": [
        {
          "rel": "179338873096442342-242ac113-0001-002",
          "href": "https://public.agaveapi.co/files/v2/media/system/storage.example.com//",
          "title": "file"
        },
        {
          "rel": "6608339759546166810-242ac114-0001-007",
          "href": "https://public.agaveapi.co/jobs/v2/6608339759546166810-242ac114-0001-007",
          "title": "job"
        }
      ]
    }
  }
]

In addition to retrieving Metadata via its UUID, the Metadata service supports MongoDB query syntax. Just add q=<value> to the query string of your GET request on the metadata collection. This differs from other Agave APIs, but provides a richer syntax to query and filter responses.

If you wanted to look up Metadata corresponding to a specific value within its JSON Metadata value, you can specify this using a JSON object such as {"name": "mustard plant"}. Remember that, in order to send JSON in a URL query string, it must first be URL encoded. Luckily this is easily handled for us by curl and the Agave CLI.

The given query returns all metadata items named “mustard plant” that you have permission to access.
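The encoding step that curl and the CLI perform can be sketched in Python as follows. The helper is hypothetical. It serializes the query with json.dumps and percent-encodes it into the q parameter, encoding spaces as %20.

```python
import json
from urllib.parse import quote, urlencode

# Assumed base URL; substitute your tenant's host.
API_BASE = "https://public.agaveapi.co"


def metadata_search_url(query):
    """Return a GET URL for /meta/v2/data with `query` JSON-encoded into q."""
    # quote (rather than the default quote_plus) encodes spaces as %20.
    return f"{API_BASE}/meta/v2/data?" + urlencode({"q": json.dumps(query)}, quote_via=quote)


print(metadata_search_url({"name": "mustard plant"}))
# https://public.agaveapi.co/meta/v2/data?q=%7B%22name%22%3A%20%22mustard%20plant%22%7D
```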

Search Examples

metadata search by exact name

{"name": "mustard plant"}

metadata search by field in value

{"value.type": "a plant"}

metadata search for values with any field matching an item in the given array

{ "value.profile.status": { "$in": [ "active", "paused" ] } }

metadata search for items with a name matching a case-insensitive regex

{ "name": { "$regex": "^Cactus.*", "$options": "i"}}

metadata search for value by regex matched against each line of a value

{ "value.description": { "$regex": ".*monocots.*", "$options": "m"}}

metadata search for value by conditional queries

{
   "$or":[
      {
         "value.description":{
            "$regex":".*(prickly pear|agave|century).*",
            "$options":"i"
         }
      },
      {
         "value.title":{
            "$regex":".*Cactus$"
         },
         "value.order":{
            "$regex":"Agavoideae"
         }
      }
   ]
}

Some common search syntax examples. Consult the MongoDB Query Documentation for more examples and full syntax documentation.

Metadata Permissions

The Metadata service supports permissions for both Metadata and Schemata consistent with that of a number of other Agave services. If no permissions are explicitly set, only the owner of the Metadata and tenant administrators can access it.

The permissions available for Metadata and Metadata Schemata are listed in the following table. Please note that a user must have WRITE permissions to grant or revoke permissions on a metadata or schema item.

Name Description
READ User can view the resource
WRITE User can edit, but not view the resource
READ_WRITE User can view and edit the resource
ALL User can manage the resource
NONE User has no access to the resource

Listing all permissions

List all permissions on a metadata item

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    https://public.tenants.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012/pems
metadata-pems-list 7341557475441971686-242ac11f-0001-012

The response will look something like the following:

[
  {
    "username": "nryan",
    "permission": {
      "read": true,
      "write": true
    },
    "_links": {
      "self": {
        "href": "https://public.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012/pems/nryan"
      },
      "parent": {
        "href": "https://public.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012"
      },
      "profile": {
        "href": "https://public.agaveapi.co/profiles/v2/nryan"
      }
    }
  }
]

To list all permissions for a metadata item, make a GET request on the metadata item’s permission collection.

List permissions for a specific user

List the permissions on Metadata for a given user

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    https://public.tenants.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012/pems/nryan
metadata-pems-list -u nryan \
    7341557475441971686-242ac11f-0001-012

The response will look something like the following:

{
  "username":"nryan",
  "permission":{
    "read":true,
    "write":true
  },
  "_links":{
    "self":{
      "href":"https://public.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012/pems/nryan"
    },
    "parent":{
      "href":"https://public.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012"
    },
    "profile":{
      "href":"https://public.agaveapi.co/profiles/v2/nryan"
    }
  }
}

Checking permissions for a single user is simply a matter of adding the username of the user in question to the end of the metadata permission collection.

Grant permissions

Grant read access to a metadata item

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
    -H 'Content-Type: application/json' \
    --data '{"permission":"READ"}' \
    https://public.tenants.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012/pems/rclemens
metadata-pems-addupdate -u rclemens \
    -p READ 7341557475441971686-242ac11f-0001-012

Grant read and write access to a metadata item

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
    -H 'Content-Type: application/json' \
    --data '{"permission":"READ_WRITE"}' \
    https://public.tenants.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012/pems/rclemens
metadata-pems-addupdate -u rclemens \
    -p READ_WRITE 7341557475441971686-242ac11f-0001-012

The response will look something like the following:

{
  "username": "rclemens",
  "permission": {
    "read": true,
    "write": true
  },
  "_links": {
    "self": {
      "href": "https://public.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012/pems/rclemens"
    },
    "parent": {
      "href": "https://public.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012"
    },
    "profile": {
      "href": "https://public.agaveapi.co/profiles/v2/rclemens"
    }
  }
}

To grant another user read access to your metadata item, assign them READ permission. To enable another user to update a metadata item, grant them READ_WRITE or ALL access.
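The following is a minimal sketch of the grant request in Python, mirroring the curl example. It rejects permission names outside the table above before building the request. The base URL and helper name are assumptions.

```python
import json
from urllib.request import Request

# Assumed base URL; permission names come from the permissions table above.
API_BASE = "https://public.agaveapi.co"
PERMISSIONS = {"READ", "WRITE", "READ_WRITE", "ALL", "NONE"}


def build_grant_request(token, uuid, username, permission):
    """Build (but do not send) a POST that sets `username`'s permission on a metadata item."""
    if permission not in PERMISSIONS:
        raise ValueError(f"unknown permission: {permission}")
    return Request(
        f"{API_BASE}/meta/v2/data/{uuid}/pems/{username}",
        data=json.dumps({"permission": permission}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_grant_request("YOUR_TOKEN", "7341557475441971686-242ac11f-0001-012", "rclemens", "READ_WRITE")
```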

Delete single user permissions

Delete permission for single user on a Metadata item

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X DELETE \
    https://public.tenants.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012/pems/rclemens
metadata-pems-delete -u rclemens 7341557475441971686-242ac11f-0001-012

An empty response will come back from the API.

Permissions may be deleted for a single user by making a DELETE request on the metadata user permission resource. This will immediately revoke all permissions to the metadata item for that user.

Deleting all permissions

Delete all permissions on a Metadata item

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X DELETE \
    https://public.tenants.agaveapi.co/meta/v2/data/7341557475441971686-242ac11f-0001-012/pems
metadata-pems-delete 7341557475441971686-242ac11f-0001-012

An empty response will be returned from the service.

Permissions may be deleted for all users at once by making a DELETE request on the metadata resource permission collection.

Metadata Schemata

  /$$$$$$          /$$
 /$$__  $$        | $$
| $$  \__/ /$$$$$$| $$$$$$$  /$$$$$$ /$$$$$$/$$$$  /$$$$$$
|  $$$$$$ /$$_____| $$__  $$/$$__  $| $$_  $$_  $$|____  $$
 \____  $| $$     | $$  \ $| $$$$$$$| $$ \ $$ \ $$ /$$$$$$$
 /$$  \ $| $$     | $$  | $| $$_____| $$ | $$ | $$/$$__  $$
|  $$$$$$|  $$$$$$| $$  | $|  $$$$$$| $$ | $$ | $|  $$$$$$$
 \______/ \_______|__/  |__/\_______|__/ |__/ |__/\_______/

Schemata are provided as JSON Schema documents. The service validates that the schema is valid JSON and stores it. To validate Metadata against it, give the schema UUID in the schemaId field when uploading Metadata. If no schemaId is provided, the Metadata service will accept any JSON object or plain-text string and store it accordingly. This approach gives Agave a high degree of flexibility in handling structured and unstructured metadata alike.

For more on JSON Schema please see http://json-schema.org/

To add a metadata schema to the repository:

Creating schemata

Example JSON Schema document, schema.json

{
  "title": "Example Schema",
  "type": "object",
  "properties": {
    "species": {
      "type": "string"
    }
  },
  "required": [
    "species"
  ]
}

Creating a new metadata schema

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X POST -H "Content-Type: application/json" \
    --data-binary '{ "title": "Example Schema", "type": "object", "properties": { "species": { "type": "string" } },"required": ["species"] }' \
    https://public.tenants.agaveapi.co/meta/v2/schemas/
metadata-schema-addupdate -v -F schema.json

The response will look something like the following:

{
  "uuid": "4736020169528054246-242ac11f-0001-013",
  "internalUsername": null,
  "lastUpdated": "2016-08-29T04:52:11.474-05:00",
  "schema": {
    "title": "Example Schema",
    "type": "object",
    "properties": {
      "species": {
        "type": "string"
      }
    },
    "required": [
      "species"
    ]
  },
  "created": "2016-08-29T04:52:11.474-05:00",
  "owner": "nryan",
  "_links": {
    "self": {
      "href": "https://public.agaveapi.co/meta/v2/schemas/4736020169528054246-242ac11f-0001-013"
    },
    "permissions": {
      "href": "https://public.agaveapi.co/meta/v2/schemas/4736020169528054246-242ac11f-0001-013/pems"
    },
    "owner": {
      "href": "https://public.agaveapi.co/profiles/v2/nryan"
    }
  }
}

To create a new metadata schema that can be used to validate metadata items upon addition or updating, POST a JSON Schema document to the service.

More JSON Schema examples can be found in the Agave Samples project.

Updating schema

Update a metadata schema

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
    -H 'Content-Type: application/json' \
    --data-binary '{ "title": "Example Schema", "type": "object", "properties": { "species": { "type": "string" }, "description": {"type":"string"} },"required": ["species"] }' \
    https://public.tenants.agaveapi.co/meta/v2/schemas/4736020169528054246-242ac11f-0001-013
metadata-schema-addupdate -v -F - 4736020169528054246-242ac11f-0001-013 <<< '{ "title": "Example Schema", "type": "object", "properties": { "species": { "type": "string" }, "description": {"type":"string"} },"required": ["species"] }'

The response will look something like the following:

{
  "uuid": "4736020169528054246-242ac11f-0001-013",
  "internalUsername": null,
  "lastUpdated": "2016-08-29T04:52:11.474-05:00",
  "schema": {
    "title": "Example Schema",
    "type": "object",
    "properties": {
      "species": {
        "type": "string"
      },
      "description": {
        "type": "string"
      }
    },
    "required": [
      "species"
    ]
  },
  "created": "2016-08-29T04:52:11.474-05:00",
  "owner": "nryan",
  "_links": {
    "self": {
      "href": "https://public.agaveapi.co/meta/v2/schemas/4736020169528054246-242ac11f-0001-013"
    },
    "permissions": {
      "href": "https://public.agaveapi.co/meta/v2/schemas/4736020169528054246-242ac11f-0001-013/pems"
    },
    "owner": {
      "href": "https://public.agaveapi.co/profiles/v2/nryan"
    }
  }
}

Updating metadata schema is done by POSTing an updated schema object to the existing resource. When updating, it is important to note that it is not possible to change the schema uuid, owner, lastUpdated or created fields. Those fields are managed by the service.

Deleting schema

Delete a metadata schema

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X DELETE \
    https://public.tenants.agaveapi.co/meta/v2/schemas/4736020169528054246-242ac11f-0001-013
metadata-schema-delete 4736020169528054246-242ac11f-0001-013

An empty response will be returned from the service.

To delete a metadata schema, simply make a DELETE request on the metadata schema resource.

Specifying schemata as $ref

When building new JSON Schema definitions, it is often helpful to break each object out into its own definition and use $ref fields to reference them. The Metadata service supports such references between metadata schema resources. Simply provide the fully qualified URL of another valid metadata schema resource as the value of a $ref field, and Agave will resolve the reference internally, applying the requesting user's authentication and authorization when fetching the referenced resource.
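The sketch below builds a schema whose organism property is defined by reference to the example schema created earlier. The title and property names are hypothetical. Only the pattern of a fully qualified /meta/v2/schemas/ URL in a $ref field comes from the description above.

```python
import json

# Assumed base URL; the referenced UUID is the example schema created earlier.
API_BASE = "https://public.agaveapi.co"


def schema_with_ref(ref_schema_uuid):
    """Build a JSON Schema whose `organism` property is defined by another stored schema."""
    return {
        "title": "Experiment",  # hypothetical title
        "type": "object",
        "properties": {
            # A fully qualified metadata schema URL, resolved by Agave for the requesting user.
            "organism": {"$ref": f"{API_BASE}/meta/v2/schemas/{ref_schema_uuid}"},
            "notes": {"type": "string"},
        },
        "required": ["organism"],
    }


print(json.dumps(schema_with_ref("4736020169528054246-242ac11f-0001-013"), indent=2))
```

The resulting document would then be POSTed to the schemas collection, as in the Creating schemata example.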

Monitors

 /$$      /$$                  /$$  /$$
| $$$    /$$$                 |__/ | $$
| $$$$  /$$$$ /$$$$$$ /$$$$$$$ /$$/$$$$$$   /$$$$$$  /$$$$$$
| $$ $$/$$ $$/$$__  $| $$__  $| $|_  $$_/  /$$__  $$/$$__  $$
| $$  $$$| $| $$  \ $| $$  \ $| $$ | $$   | $$  \ $| $$  \__/
| $$\  $ | $| $$  | $| $$  | $| $$ | $$ /$| $$  | $| $$
| $$ \/  | $|  $$$$$$| $$  | $| $$ |  $$$$|  $$$$$$| $$
|__/     |__/\______/|__/  |__|__/  \___/  \______/|__/

The Agave Monitors API provides a familiar paradigm for monitoring the usability and accessibility of storage and execution systems you have registered with Agave. Like services such as Pingdom, PagerDuty, and WebCron, the Monitors API lets you create regular health checks on a registered system. Unlike standard uptime services, Agave checks that your system is responsive and accessible by performing proactive tests of availability (ping), accessibility (authentication), and functionality (listing or echo). Each check result is persisted, and the check history of a given monitor is queryable through the API. As with all resources in the Agave Platform, a full event model is available, so you can subscribe to the events you care about, such as failed checks, restored system availability, and system disablement.

Creating Monitors

Create a new default monitor

curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
     -H "Content-Type: application/json" \
     -X POST --data-binary '{"target": "storage.example.com"}' \
     https://public.agaveapi.co/monitors/v2/
monitors-addupdate -S storage.example.com  

The response will look something like the following:

{
    "active": true,
    "created": "2016-06-03T17:22:59.000-05:00",
    "frequency": 60,
    "id": "5024717285821443610-242ac11f-0001-014",
    "internalUsername": null,
    "lastCheck": null,
    "lastSuccess": null,
    "lastUpdated": "2016-06-03T17:22:59.000-05:00",
    "nextUpdate": "2016-06-03T18:22:59.000-05:00",
    "owner": "nryan",
    "target": "storage.example.com",
    "updateSystemStatus": false,
    "_links": {
        "checks": {
            "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014/checks"
        },
        "notifications": {
            "href": "https://public.agaveapi.co/notifications/v2/?associatedUuid=5024717285821443610-242ac11f-0001-014"
        },
        "owner": {
            "href": "https://public.agaveapi.co/profiles/v2/nryan"
        },
        "self": {
            "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014"
        },
        "system": {
            "href": "https://public.agaveapi.co/systems/v2/storage.example.com"
        }
    }
}

The only piece of information needed to monitor a system is its system ID. Send a POST request to the Monitors API with a monitor definition containing just the target field, set to a valid system ID or UUID. This creates a monitor that runs hourly health checks, starting an hour after you sent the request.
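For scripted use, here is a minimal Python sketch that builds the same request as the curl example above using only the standard library. It assumes the token is available in an AUTH_TOKEN environment variable, mirroring the shell variable. It only constructs the request; passing it to urllib.request.urlopen would send it.

```python
import json
import os
import urllib.request

MONITORS_URL = "https://public.agaveapi.co/monitors/v2/"


def build_create_monitor_request(target, token, **options):
    """Build, but do not send, a POST request that creates a monitor.

    Only ``target`` is required. Extra keyword arguments such as
    ``frequency=15`` or ``updateSystemStatus=True`` are copied into the
    monitor definition as-is.
    """
    definition = {"target": target, **options}
    return urllib.request.Request(
        MONITORS_URL,
        data=json.dumps(definition).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# AUTH_TOKEN mirrors the shell variable used in the curl examples.
request = build_create_monitor_request(
    "storage.example.com", os.environ.get("AUTH_TOKEN", "placeholder-token")
)
# urllib.request.urlopen(request) would send it and return the new monitor.
```

Extra keyword arguments are copied into the definition. For example, `build_create_monitor_request("storage.example.com", token, frequency=15, updateSystemStatus=True)` produces the body used in the later examples. Because json.dumps serializes the body, booleans come out as JSON true/false with the required colon separator.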

Custom frequency and start time

Create a monitor with a custom frequency

curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
     -H "Content-Type: application/json" \
     -X POST --data-binary '{"target": "storage.example.com","frequency":15}' \
     https://public.agaveapi.co/monitors/v2/
monitors-addupdate -S storage.example.com -I 15

The response will look something like the following:

{
    "_links": {
        "checks": {
            "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014/checks"
        },
        "notifications": {
            "href": "https://public.agaveapi.co/notifications/v2/?associatedUuid=5024717285821443610-242ac11f-0001-014"
        },
        "owner": {
            "href": "https://public.agaveapi.co/profiles/v2/nryan"
        },
        "self": {
            "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014"
        },
        "system": {
            "href": "https://public.agaveapi.co/systems/v2/storage.example.com"
        }
    },
    "active": true,
    "created": "2016-06-03T17:22:59.000-05:00",
    "frequency": 15,
    "id": "5024717285821443610-242ac11f-0001-014",
    "internalUsername": null,
    "lastCheck": null,
    "lastSuccess": null,
    "lastUpdated": "2016-06-03T17:22:59.000-05:00",
    "nextUpdate": "2016-06-03T17:37:59.000-05:00",
    "owner": "nryan",
    "target": "storage.example.com",
    "updateSystemStatus": false
}

If you need the monitor to run more frequently, you can control how often and when it runs by including the frequency and startTime fields in your monitor definition. The frequency field sets how often the monitor runs. In the examples above its value is a number of minutes: a frequency of 15 produces a nextUpdate 15 minutes after the created time. The longest interval you can set for a monitor is one month. The shortest varies from tenant to tenant, but is generally no less than 5 minutes.

The startTime field lets you schedule when Agave should start the monitor on your system. Any date or time expression representing a moment between the current time and one month from then is acceptable. If you do not specify a startTime, Agave adds the value of frequency to the current time and uses the result as the start time. Setting stop times or “off hours” is not currently supported.
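The sample responses suggest that, for a monitor created without a startTime, the first check lands frequency minutes after the created time. The sketch below reproduces that arithmetic on the client, for example to display an expected first-check time. The bounds are assumptions drawn from the text: 5 minutes as the minimum, and 31 days standing in for "one month". The service's own nextUpdate value is authoritative.

```python
from datetime import datetime, timedelta

# Assumed bounds taken from the text: the minimum is tenant-specific but
# generally no less than 5 minutes, and "one month" is approximated as 31 days.
MIN_FREQUENCY_MINUTES = 5
MAX_FREQUENCY_MINUTES = 31 * 24 * 60


def default_next_update(created, frequency_minutes):
    """Estimate the first check time of a monitor created without a startTime."""
    if not MIN_FREQUENCY_MINUTES <= frequency_minutes <= MAX_FREQUENCY_MINUTES:
        raise ValueError(f"frequency {frequency_minutes} is outside the assumed bounds")
    start = datetime.fromisoformat(created)
    return (start + timedelta(minutes=frequency_minutes)).isoformat(timespec="milliseconds")


# Prints 2016-06-03T17:37:59.000-05:00, the nextUpdate in the response above.
print(default_next_update("2016-06-03T17:22:59.000-05:00", 15))
```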

Automating system status updates

Create a monitor that updates system status on change

curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
     -H "Content-Type: application/json" \
     -X POST \
     --data-binary '{"target": "storage.example.com","frequency":15,"updateSystemStatus":true}' \
     https://public.agaveapi.co/monitors/v2/
monitors-addupdate -S storage.example.com -I 15 -U true

The response will look something like the following:

{
    "active": true,
    "created": "2016-06-03T17:22:59.000-05:00",
    "frequency": 15,
    "id": "5024717285821443610-242ac11f-0001-014",
    "internalUsername": null,
    "lastCheck": null,
    "lastSuccess": null,
    "lastUpdated": "2016-06-03T17:22:59.000-05:00",
    "nextUpdate": "2016-06-03T17:37:59.000-05:00",
    "owner": "nryan",
    "target": "storage.example.com",
    "updateSystemStatus": true,
    "_links": {
        "checks": {
            "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014/checks"
        },
        "notifications": {
            "href": "https://public.agaveapi.co/notifications/v2/?associatedUuid=5024717285821443610-242ac11f-0001-014"
        },
        "owner": {
            "href": "https://public.agaveapi.co/profiles/v2/nryan"
        },
        "self": {
            "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014"
        },
        "system": {
            "href": "https://public.agaveapi.co/systems/v2/storage.example.com"
        }
    }
}

The section on Events and notifications covers how you can be alerted about events pertaining to a monitor. Here we simply point out that all monitors include a convenience field, updateSystemStatus. Setting this field to true authorizes Agave to update the status of the monitored system based on the results of the monitor checks. This keeps the status value in your system description in line with the actual operational status of the system.

Updating an existing monitor

Update an existing monitor

curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
     -H "Content-Type: application/json" \
     -X POST \
     --data-binary '{"target": "storage.example.com","frequency":5,"updateSystemStatus":false}' \
     https://public.agaveapi.co/monitors/v2/5024717285821443610-242ac11f-0001-014
monitors-addupdate -S storage.example.com -I 5 -U false 5024717285821443610-242ac11f-0001-014

The response will look something like the following:

{
    "active": true,
    "created": "2016-06-03T17:22:59.000-05:00",
    "frequency": 5,
    "id": "5024717285821443610-242ac11f-0001-014",
    "internalUsername": null,
    "lastCheck": null,
    "lastSuccess": null,
    "lastUpdated": "2016-06-03T17:24:59.000-05:00",
    "nextUpdate": "2016-06-03T17:29:59.000-05:00",
    "owner": "nryan",
    "target": "storage.example.com",
    "updateSystemStatus": false,
    "_links": {
        "checks": {
            "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014/checks"
        },
        "notifications": {
            "href": "https://public.agaveapi.co/notifications/v2/?associatedUuid=5024717285821443610-242ac11f-0001-014"
        },
        "owner": {
            "href": "https://public.agaveapi.co/profiles/v2/nryan"
        },
        "self": {
            "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014"
        },
        "system": {
            "href": "https://public.agaveapi.co/systems/v2/storage.example.com"
        }
    }
}

Monitors can be managed with standard GET, POST, PUT, and DELETE operations. When updating a monitor, pay attention to the response, because the time of the next check will change. In fact, any change to a monitor recalculates the time when the next health check will run.

Disabling an existing monitor

Disable an existing monitor

curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
    -H "Content-Type: application/json" \
    -X PUT --data-binary '{"action": "disable"}' \
    https://public.agaveapi.co/monitors/v2/5024717285821443610-242ac11f-0001-014
monitors-disable 5024717285821443610-242ac11f-0001-014

The response will look something like the following:

{
    "active": false,
    "created": "2016-06-03T17:22:59.000-05:00",
    "frequency": 5,
    "id": "5024717285821443610-242ac11f-0001-014",
    "internalUsername": null,
    "lastCheck": null,
    "lastSuccess": null,
    "lastUpdated": "2016-06-03T17:24:59.000-05:00",
    "nextUpdate": "2016-06-03T17:29:59.000-05:00",
    "owner": "nryan",
    "target": "storage.example.com",
    "updateSystemStatus": false,
    "_links": {
        "checks": {
            "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014/checks"
        },
        "notifications": {
            "href": "https://public.agaveapi.co/notifications/v2/?associatedUuid=5024717285821443610-242ac11f-0001-014"
        },
        "owner": {
            "href": "https://public.agaveapi.co/profiles/v2/nryan"
        },
        "self": {
            "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014"
        },
        "system": {
            "href": "https://public.agaveapi.co/systems/v2/storage.example.com"
        }
    }
}

There may be times when you need to pause a monitor. If your system has scheduled maintenance periods, you may want to disable the monitor until the maintenance period ends. To do this, make a PUT request on the monitor with a field named action set to “disable”. While a monitor is disabled, all health checks are skipped.

Enabling an existing monitor

Enable an existing monitor

curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
    -H "Content-Type: application/json" \
    -X PUT --data-binary '{"action": "enable"}' \
    https://public.agaveapi.co/monitors/v2/5024717285821443610-242ac11f-0001-014
monitors-enable 5024717285821443610-242ac11f-0001-014

The response will look something like the following:

{
    "active": true,
    "created": "2016-06-03T17:22:59.000-05:00",
    "frequency": 5,
    "id": "5024717285821443610-242ac11f-0001-014",
    "internalUsername": null,
    "lastCheck": null,
    "lastSuccess": null,
    "lastUpdated": "2016-06-03T17:24:59.000-05:00",
    "nextUpdate": "2016-06-03T17:29:59.000-05:00",
    "owner": "nryan",
    "target": "storage.example.com",
    "updateSystemStatus": false,
    "_links": {
        "checks": {
            "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014/checks"
        },
        "notifications": {
            "href": "https://public.agaveapi.co/notifications/v2/?associatedUuid=5024717285821443610-242ac11f-0001-014"
        },
        "owner": {
            "href": "https://public.agaveapi.co/profiles/v2/nryan"
        },
        "self": {
            "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014"
        },
        "system": {
            "href": "https://public.agaveapi.co/systems/v2/storage.example.com"
        }
    }
}

Similarly, to enable a monitor, make a PUT request with a field named action set to “enable”. Once re-enabled, the monitor resumes its previous check schedule as specified in the nextUpdate field, or runs immediately if that time has already passed.
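Both state changes are PUT requests that differ only in the action value. The hedged Python sketch below builds either request and rejects any other spelling of the action. The token is a placeholder.

```python
import json
import urllib.request

MONITORS_URL = "https://public.agaveapi.co/monitors/v2"
VALID_ACTIONS = ("enable", "disable")


def build_monitor_action_request(monitor_id, action, token):
    """Build, but do not send, a PUT request that enables or disables a monitor."""
    if action not in VALID_ACTIONS:
        # Catches values such as "enabled"/"disabled", which the examples do not use.
        raise ValueError(f"action must be one of {VALID_ACTIONS}, got {action!r}")
    return urllib.request.Request(
        f"{MONITORS_URL}/{monitor_id}",
        data=json.dumps({"action": action}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="PUT",
    )


pause = build_monitor_action_request("5024717285821443610-242ac11f-0001-014", "disable", "placeholder-token")
resume = build_monitor_action_request("5024717285821443610-242ac11f-0001-014", "enable", "placeholder-token")
```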

Deleting a monitor

Deleting an existing monitor

curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
    -X DELETE \
    https://public.agaveapi.co/monitors/v2/5024717285821443610-242ac11f-0001-014
monitors-delete 5024717285821443610-242ac11f-0001-014

An empty response will be returned from the service.

To delete a monitor, simply make a DELETE request on the monitor.

Monitor Checks

Listing past monitor checks

curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
    'https://public.agaveapi.co/monitors/v2/5024717285821443610-242ac11f-0001-014/checks?limit=1'
monitors-checks-list -v -l 1 \
    -M 5024717285821443610-242ac11f-0001-014

The response will look something like the following:

[
    {
        "created": "2016-06-03T17:29:59.000-05:00",
        "id": "4035070921477123610-242ac11f-0001-015",
        "message": null,
        "result": "PASSED",
        "type": "STORAGE",
        "_links": {
            "monitor": {
                "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014"
            },
            "self": {
                "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014/checks/4035070921477123610-242ac11f-0001-015"
            },
            "system": {
                "href": "https://public.agaveapi.co/systems/v2/storage.example.com"
            }
        }
    }
]

Each instance of a monitor testing a system is called a check. Monitor checks are persisted over time and are queryable as a collection under the monitor resource. Checks can be queried by result, timeframe, and type. By default, the most recent check is included in the monitor description as the lastCheck field.

Each monitor check has a unique ID and represents a formal, addressable resource in the API. The example above shows a typical successful monitor check. Checks have one of two results: PASSED or FAILED. Successful checks have a result of PASSED and a null message. Unsuccessful checks have a result of FAILED and a message describing why they failed.
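Every check carries a result and, on failure, a message, so a client can summarize a page of check history in a few lines of Python. The sketch assumes the list-shaped response shown above. The FAILED entry in the sample data is hypothetical.

```python
from collections import Counter


def summarize_checks(checks):
    """Count check results and collect (created, message) pairs for failed checks."""
    counts = Counter(check["result"] for check in checks)
    failures = [
        (check["created"], check["message"])
        for check in checks
        if check["result"] == "FAILED"
    ]
    return {"counts": dict(counts), "failures": failures}


# The PASSED entry mirrors the sample response above; the FAILED entry and its
# message are hypothetical, included only to exercise the failure branch.
history = [
    {"created": "2016-06-03T17:29:59.000-05:00", "result": "PASSED", "message": None, "type": "STORAGE"},
    {"created": "2016-06-03T17:44:59.000-05:00", "result": "FAILED", "message": "hypothetical failure message", "type": "STORAGE"},
]
print(summarize_checks(history))
```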

Searching check history

Searching check history for a monitor

curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
    'https://public.agaveapi.co/monitors/v2/5024717285821443610-242ac11f-0001-014/checks?limit=1&result.eq=PASSED'
monitors-checks-search -v -l 1 \
    -M 5024717285821443610-242ac11f-0001-014 \
    result.eq=PASSED

The response will look something like the following:

[
    {
        "created": "2016-06-03T17:29:59.000-05:00",
        "id": "4035070921477123610-242ac11f-0001-015",
        "message": null,
        "result": "PASSED",
        "type": "STORAGE",
        "_links": {
            "monitor": {
                "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014"
            },
            "self": {
                "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014/checks/4035070921477123610-242ac11f-0001-015"
            },
            "system": {
                "href": "https://public.agaveapi.co/systems/v2/storage.example.com"
            }
        }
    }
]

Long-running monitor checks can build up a large history which can become prohibitive to page through. When generating graphs and looking for specific incidents, you can search for specific checks based on result, startTime, endTime, type, and id. The standard JSON SQL search syntax used across the rest of the Science APIs is supported for monitor checks as well.
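Search criteria are passed as query parameters in field.operator=value form, as in result.eq=PASSED above. The helper below is a sketch that URL-encodes such criteria. The double-underscore naming convention is invented for this example. Operators other than result.eq are assumed to follow the same pattern.

```python
from urllib.parse import urlencode

MONITORS_URL = "https://public.agaveapi.co/monitors/v2"


def check_search_url(monitor_id, limit=None, **criteria):
    """Build a check-history search URL from field.operator=value criteria.

    Python keyword names cannot contain dots, so a double underscore stands in
    for the dot: result__eq="PASSED" becomes result.eq=PASSED.
    """
    params = {} if limit is None else {"limit": limit}
    for key, value in criteria.items():
        params[key.replace("__", ".")] = value
    query = urlencode(params)
    return f"{MONITORS_URL}/{monitor_id}/checks" + (f"?{query}" if query else "")


print(check_search_url("5024717285821443610-242ac11f-0001-014", limit=1, result__eq="PASSED"))
```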

Manually running a check

Forcing a monitor check to run

curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
     -H "Content-Type: application/json" \
     -X POST --data-binary '{}' \
     https://public.agaveapi.co/monitors/v2/5024717285821443610-242ac11f-0001-014/checks
monitors-fire -v 5024717285821443610-242ac11f-0001-014

The response will look something like the following:

{
    "created": "2016-06-10T11:30:58.920-05:00",
    "id": "5314048891498786330-242ac11f-0001-015",
    "message": null,
    "result": "PASSED",
    "type": "STORAGE",
    "_links": {
        "monitor": {
            "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014"
        },
        "self": {
            "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014/checks/5314048891498786330-242ac11f-0001-015"
        },
        "system": {
            "href": "https://public.agaveapi.co/systems/v2/storage.example.com"
        }
    }
}

If you need to verify the accessibility of your system or the behavior of your monitor, you can force an existing monitor to run on demand by sending a POST request to the monitor's checks collection. When doing this, you are still subject to the same minimum check interval configured for your tenant.
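To avoid requests that would violate the minimum interval, a client could compare the time of the most recent check with the current time before POSTing the empty {} body. The sketch below assumes a 5-minute minimum; replace it with your tenant's actual setting. It takes the created timestamp of the latest check, such as the first entry returned by the checks collection.

```python
from datetime import datetime, timedelta

# Assumed tenant minimum: the text says it is generally no less than 5 minutes,
# but the real value is configured per tenant.
MIN_CHECK_INTERVAL = timedelta(minutes=5)


def can_force_check(last_check_created, now, min_interval=MIN_CHECK_INTERVAL):
    """Return True if forcing a check now would respect the minimum interval.

    ``last_check_created`` is the ISO 8601 "created" value of the latest check;
    ``now`` must be a timezone-aware datetime so the two can be compared.
    """
    last = datetime.fromisoformat(last_check_created)
    return now - last >= min_interval
```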

Permissions

At this time, monitors do not have permissions associated with them.

History

List the change history of a monitor

curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
     https://public.agaveapi.co/monitors/v2/5024717285821443610-242ac11f-0001-014/history
monitors-history -v 5024717285821443610-242ac11f-0001-014

The response will look something like the following:

[
  {
    "createdBy": "nryan",
    "created": "2016-06-12T19:10:22Z",
    "status": "CREATED",
    "description": "This monitor was created by nryan",
    "id": "5705275956568068582-242ac11f-0001-035",
    "_links": {
      "self": {
        "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014/history/5705275956568068582-242ac11f-0001-035"
      },
      "monitor_event": {
        "href": "https://public.agaveapi.co/monitor/v2/5024717285821443610-242ac11f-0001-014"
      }
    }
  }
]

A full history of the lifecycle of a monitor is available via the monitor history collection. Here you can list events that have occurred during the life of the monitor.

Events

The following events will be thrown by the Monitors API.

Event Description
CREATED The monitor was created
UPDATED The monitor was updated
DELETED The monitor was deleted
ENABLED The monitor was enabled
DISABLED The monitor was disabled
PERMISSION_GRANT A new user permission was granted on this monitor
PERMISSION_REVOKE A user permission was revoked on this monitor
FORCED_CHECK_REQUESTED A status check was requested by the user outside of the existing monitor schedule.
CHECK_PASSED The status check passed
CHECK_FAILED The status check failed
CHECK_UNKNOWN The status check finished in an unknown state
STATUS_CHANGE The status condition of the monitored resource changed since the last check
RESULT_CHANGE The cumulative result of all checks performed on the monitored resource changed since the last suite of checks
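Subscriptions to these events go through the Notifications API, whose link appears in every monitor response. The following Python sketch builds one subscription per event and validates the event names against the table above. The request field names (associatedUuid, event, url, persistent) and the callback URL are assumptions for illustration. Consult the Notifications section for the authoritative format.

```python
import json

# Event names copied from the table above.
MONITOR_EVENTS = {
    "CREATED", "UPDATED", "DELETED", "ENABLED", "DISABLED",
    "PERMISSION_GRANT", "PERMISSION_REVOKE", "FORCED_CHECK_REQUESTED",
    "CHECK_PASSED", "CHECK_FAILED", "CHECK_UNKNOWN",
    "STATUS_CHANGE", "RESULT_CHANGE",
}


def subscription_bodies(monitor_id, callback_url, events):
    """Build one notification definition per monitor event.

    The field names (associatedUuid, event, url, persistent) are assumptions
    about the Notifications API request format.
    """
    unknown = sorted(set(events) - MONITOR_EVENTS)
    if unknown:
        raise ValueError(f"not Monitors API events: {unknown}")
    return [
        json.dumps({
            "associatedUuid": monitor_id,
            "event": event,
            "url": callback_url,
            "persistent": True,
        })
        for event in events
    ]


bodies = subscription_bodies(
    "5024717285821443610-242ac11f-0001-014",
    "https://example.com/hooks/agave",  # hypothetical callback endpoint
    ["CHECK_FAILED", "STATUS_CHANGE"],
)
```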

User Profiles

 /$$$$$$$                   /$$$$$$ /$$/$$
| $$__  $$                 /$$__  $|__| $$
| $$  \ $$/$$$$$$  /$$$$$$| $$  \__//$| $$ /$$$$$$  /$$$$$$$
| $$$$$$$/$$__  $$/$$__  $| $$$$   | $| $$/$$__  $$/$$_____/
| $$____| $$  \__| $$  \ $| $$_/   | $| $| $$$$$$$|  $$$$$$
| $$    | $$     | $$  | $| $$     | $| $| $$_____/\____  $$
| $$    | $$     |  $$$$$$| $$     | $| $|  $$$$$$$/$$$$$$$/
|__/    |__/      \______/|__/     |__|__/\_______|_______/

The Agave hosted identity service (profiles service) is a RESTful web service that gives organizations a way to create and manage the user accounts within their Agave tenant. The service is backed by a redundant LDAP instance hosted in multiple datacenters, making it highly available. Additionally, passwords are stored using the OpenLDAP md5crypt algorithm.

Tenant administrators can manage only a basic set of fields on each user account within LDAP itself. For more complex profiles, we recommend combining the profiles service with the metadata service. See the Extending with Metadata section below.

The service uses OAuth2 for authentication, and users must have special privileges to create and update user accounts within the tenant. Please work with the Agave development team to make sure your admins have the user-account-manager role.

In addition to the web service, there is also a basic front-end web application providing user sign up. The web application will suffice for basic user profiles and can be used as a starting point for more advanced use cases.

Creating

Create a user account by sending a POST request to the profiles service, providing an access token of a user with the user-account-manager role. The fields username, password and email are required to create a new user.

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X POST \
    -d "username=testuser" \
    -d "password=abcd123" \
    -d "email=testuser@test.com" \
    https://public.tenants.agaveapi.co/profiles/v2
profiles-create -u testuser -p abcd123 -e testuser@test.com

The response to this call for our example user looks like this:

{
  "message":"User created successfully.",
  "result":{
    "email":"testuser@test.com",
    "first_name":"",
    "full_name":"testuser",
    "last_name":"testuser",
    "mobile_phone":"",
    "phone":"",
    "status":"Active",
    "uid":null,
    "username":"testuser"
  },
  "status":"success",
  "version":"2.0.0-SNAPSHOT-rc3fad"
}

The complete list of available fields and their descriptions is provided in the table below.

Field Name Description Required?
username The username for the user; must be unique across the tenant Yes
email The email address for the user. Yes
password The password for the user. Yes
first_name First name of the user No
last_name Last name of the user No
phone User’s phone number No
mobile_phone User’s mobile phone number. No

Note that the service does not do any password strength enforcement or other password management policies. We leave it to each organization to implement the policies best suited for their use case.
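Given the field table above, a client can validate and encode a new account before sending it. This Python sketch produces the same form-encoded body as the -d flags in the curl example. Like the service, it performs no password strength check. The helper name is hypothetical.

```python
from urllib.parse import urlencode

# Field names from the table above.
REQUIRED_FIELDS = ("username", "password", "email")
OPTIONAL_FIELDS = ("first_name", "last_name", "phone", "mobile_phone")


def profile_form(**fields):
    """Check fields against the table and URL-encode them as a form body."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    unknown = sorted(set(fields) - set(REQUIRED_FIELDS) - set(OPTIONAL_FIELDS))
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    return urlencode(fields)


body = profile_form(username="testuser", password="abcd123", email="testuser@test.com")
print(body)  # username=testuser&password=abcd123&email=testuser%40test.com
```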

Extending with Metadata

Sample metadata object extending a user profile

{
  "name":"user_profile",
  "value":{
    "firstName":"Test",
    "lastName":"User",
    "email":"testuser@test.com",
    "city":"Springfield",
    "state":"IL",
    "country":"USA",
    "phone":"636-555-3226",
    "gravatar":"http://www.gravatar.com/avatar/ed53e691ee322e24d8cc843fff68ebc6"
  }
}

Save the extended profile document to the metadata service

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X POST \
    -F "fileToUpload=@profile_ex" \
    https://public.tenants.agaveapi.co/meta/v2/data/?pretty=true
metadata-addupdate -v -F profile_ex

The response will look something like the following:

{
  "status" : "success",
  "message" : null,
  "version" : "2.1.0-rc0c5a",
  "result" : {
    "uuid" : "0001429724043699-5056a550b8-0001-012",
    "owner" : "jstubbs",
    "schemaId" : null,
    "internalUsername" : null,
    "associationIds" : [ ],
    "lastUpdated" : "2015-04-22T12:34:03.698-05:00",
    "name" : "user_profile",
    "value" : {
      "firstName" : "Test",
      "lastName" : "User",
      "email" : "testuser@test.com",
      "city" : "Springfield",
      "state" : "IL",
      "country" : "USA",
      "phone" : "636-555-3226",
      "gravatar" : "http://www.gravatar.com/avatar/ed53e691ee322e24d8cc843fff68ebc6"
    },
    "created" : "2015-04-22T12:34:03.698-05:00",
    "_links" : {
      "self" : {
        "href" : "https://public.tenants.agaveapi.co/meta/v2/data/0001429724043699-5056a550b8-0001-012"
      }
    }
  }
}

The fields above support only the most basic profiles. The recommended strategy is to use the profiles service in combination with the metadata service (see the Metadata Guide for more details) to store additional information. The metadata service allows you to create custom types using JSON Schema, which makes it more flexible than standard LDAP in a self-service model. Additionally, the metadata service includes a rich query interface for retrieving users based on arbitrary JSON queries.

The general approach used by existing tenants has been to create a single entry per user that contains all additional profile data for that user. Every metadata item representing a user profile can be identified by a fixed string in the “name” attribute (e.g., “user_profile”). The value of the metadata item contains a unique identifier for the user (e.g., username or email address) along with all the additional fields you wish to track on the profile. One benefit of this approach is that it cleanly delineates multiple classes of profiles, for example “admin_profile”, “developer_profile”, “mathematician_profile”, etc. When consuming this information in a web interface, grouping by user type makes presentation significantly easier.
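With one metadata item per user, looking up a profile becomes a query on the name attribute plus the identifier stored in the value. The Python sketch below assumes the metadata service accepts a JSON query in a q parameter. It also assumes, as in the sample document above, that the identifier is the email address stored at value.email. Check both assumptions against the Metadata Guide.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

META_URL = "https://public.tenants.agaveapi.co/meta/v2/data"


def profile_query_url(id_value, id_field="email", profile_name="user_profile"):
    """Build a metadata search URL for one user's extended profile.

    Assumes the service accepts a JSON query in the ``q`` parameter and that
    the identifier is stored under ``value.<id_field>``.
    """
    query = {"name": profile_name, f"value.{id_field}": id_value}
    return f"{META_URL}?{urlencode({'q': json.dumps(query)})}"


url = profile_query_url("testuser@test.com")
# Decode the q parameter again to show the JSON the service would receive.
print(json.loads(parse_qs(urlsplit(url).query)["q"][0]))
```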

Another issue to consider when extending user profile information through the Metadata service is ownership. If you create the user’s account and then prompt them to log in before entering their extended data, you can create the user’s metadata record under their own account. This gives the user full ownership over the information. However, it also means that the user, or a third-party application, could modify or delete the record.

A better approach is to use a service account to create all extended profile metadata records and grant each user READ access on their record. Third-party applications can still access the user’s information at the user’s request, but neither the user nor those applications can modify or delete the record.

The sample metadata object at the start of this section shows one possible JSON document for storing an extended user profile.
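With the service-account approach, the record owner grants the user READ access after creating the record. The sketch below builds that request in Python. It assumes the metadata record exposes a pems/{username} sub-resource, like the pems link on metadata schemas, that accepts a form-encoded permission field. The UUID comes from the sample response above and the token is a placeholder.

```python
import urllib.request
from urllib.parse import urlencode

META_URL = "https://public.tenants.agaveapi.co/meta/v2/data"


def grant_read_request(metadata_uuid, username, service_token):
    """Build, but do not send, a request giving ``username`` READ on a record.

    Assumes the record's pems/<username> sub-resource accepts a form-encoded
    ``permission`` field; the token belongs to the service account that owns
    the record.
    """
    return urllib.request.Request(
        f"{META_URL}/{metadata_uuid}/pems/{username}",
        data=urlencode({"permission": "READ"}).encode("ascii"),
        headers={"Authorization": f"Bearer {service_token}"},
        method="POST",
    )


grant = grant_read_request("0001429724043699-5056a550b8-0001-012", "testuser", "placeholder-service-token")
```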

Updating

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    -X PUT \
    -d "password=abcd123&email=testuser@test.com&first_name=Test&last_name=User" \
    https://public.tenants.agaveapi.co/profiles/v2/testuser
profiles-addupdate -v -p abcd123 -e "testuser@test.com" -f Test -l User testuser

The response to this call looks like this:

{
  "message":"User updated successfully.",
  "result":{
    "create_time":"20150421153504Z",
    "email":"testuser@test.com",
    "first_name":"Test",
    "full_name":"Test User",
    "last_name":"User",
    "mobile_phone":"",
    "phone":"",
    "status":"Active",
    "uid":0,
    "username":"testuser"
  },
  "status":"success",
  "version":"2.0.0-SNAPSHOT-rc3fad"
}

To update an existing user, send a PUT request to the user’s profile resource with the fields to update. For our example user, that resource is https://public.tenants.agaveapi.co/profiles/v2/testuser. For example, the request above sets the first and last name of the testuser account we created earlier.

Deleting

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X DELETE https://public.tenants.agaveapi.co/profiles/v2/testuser
profiles-delete -v testuser

The response to this call looks like this:

{
  "message": "User deleted successfully.",
  "result": {},
  "status": "success",
  "version": "2.0.0-SNAPSHOT-rc3fad"
}

To delete an existing user, make a DELETE request on their profile resource.

Registration Web Application

The account creation web app provides a simple form that lets users sign themselves up. Here is a screenshot of the sign-up form:

Account creation web app form

The web application also provides an email loop for verifying new accounts. The code is open source and freely available from Bitbucket: Account Creation Web Application

Most likely you will want to customize the branding and other aspects of the application, but for simple use cases, the Agave team can deploy a stock instance of the application in your tenant. Work with the Agave developer team if this is of interest to your organization.

UUID

 /$$   /$$ /$$   /$$ /$$$$$$ /$$$$$$$
| $$  | $$| $$  | $$|_  $$_/| $$__  $$
| $$  | $$| $$  | $$  | $$  | $$  \ $$
| $$  | $$| $$  | $$  | $$  | $$  | $$
| $$  | $$| $$  | $$  | $$  | $$  | $$
| $$  | $$| $$  | $$  | $$  | $$  | $$
|  $$$$$$/|  $$$$$$/ /$$$$$$| $$$$$$$/
 \______/  \______/ |______/|_______/

The Agave UUID service resolves the type and representation of one or more Agave UUIDs. This is helpful when you need to expand the hypermedia response of another resource, get the URL corresponding to a UUID, or fetch the representations of multiple resources in a single request.

Resolving a single UUID

Resolving a UUID

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    https://public.tenants.agaveapi.co/uuid/v2/0001409758089943-5056a550b8-0001-002
uuid-lookup -v 0001409758089943-5056a550b8-0001-002

The response will look something like this:

{
  "uuid":"0001409758089943-5056a550b8-0001-002",
  "type":"FILE",
  "_links":{
    "file":{
      "href":"https://public.tenants.agaveapi.co/files/v2/history/system/data.agaveapi.co/nryan/picksumipsum.txt"
    }
  }
}

To resolve a single UUID, make a GET request on the UUID resource. The response includes the UUID and the type of the resource it is associated with. The canonical resource URL is available in the hypermedia response. All calls to the UUID API are authenticated, but no permission checks are made when doing basic resolving.

Expanding a UUID query

Resolving a UUID to a full resource representation

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    'https://public.tenants.agaveapi.co/uuid/v2/0001409758089943-5056a550b8-0001-002?expand=true&pretty=true'
uuid-lookup -v -e 0001409758089943-5056a550b8-0001-002

The response will include the entire representation of the resource just as if you queried the Files API.

{
  "internalUsername":null,
  "lastModified":"2014-09-03T10:28:09.943-05:00",
  "name":"picksumipsum.txt",
  "nativeFormat":"raw",
  "owner":"nryan",
  "path":"/home/nryan/picksumipsum.txt",
  "source":"http://127.0.0.1/picksumipsum.txt",
  "status":"STAGING_QUEUED",
  "systemId":"data.agaveapi.co",
  "uuid":"0001409758089943-5056a550b8-0001-002",
  "_links":{
    "history":{
      "href":"https://public.tenants.agaveapi.co/files/v2/history/system/data.agaveapi.co/nryan/picksumipsum.txt"
    },
    "self":{
      "href":"https://public.tenants.agaveapi.co/files/v2/media/system/data.agaveapi.co/nryan/picksumipsum.txt"
    },
    "system":{
      "href":"https://public.tenants.agaveapi.co/systems/v2/data.agaveapi.co"
    }
  }
}

Often you need more information about the resource associated with a UUID. You can save yourself an API request by adding expand=true to the URL query. If successful, the response includes the full representation of the resource associated with the UUID, just as if you had called its URL directly. Filtering is also supported, so you can specify just the fields you want returned in the response.
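A small Python helper that assembles lookup URLs keeps the query string well formed. It is a sketch. The expand=true parameter comes from the example above, while the filter parameter name and its comma-separated format are assumptions.

```python
from urllib.parse import urlencode

UUID_URL = "https://public.tenants.agaveapi.co/uuid/v2"


def uuid_lookup_url(uuid, expand=False, fields=None):
    """Build a UUID lookup URL, optionally expanded and filtered."""
    params = {}
    if expand:
        params["expand"] = "true"
    if fields:
        # Assumption: the filter parameter takes a comma-separated field list.
        params["filter"] = ",".join(fields)
    query = urlencode(params, safe=",")
    return f"{UUID_URL}/{uuid}" + (f"?{query}" if query else "")


print(uuid_lookup_url("0001409758089943-5056a550b8-0001-002", expand=True, fields=["name", "status"]))
```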

Resolving multiple UUIDs

Resolving multiple UUIDs

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    'https://public.tenants.agaveapi.co/uuid/v2/?uuids.eq=0001409758089943-5056a550b8-0001-002,0001414144065563-5056a550b8-0001-007&pretty=true'
uuid-lookup -v -E 0001409758089943-5056a550b8-0001-002 0001414144065563-5056a550b8-0001-007

The response will be similar to the following.

[
  {
    "uuid":"0001409758089943-5056a550b8-0001-002",
    "type":"FILE",
    "url":"https://public.tenants.agaveapi.co/files/v2/history/system/data.agaveapi.co/nryan/picksumipsum.txt",
    "_links":{
      "file":{
        "href":"https://public.tenants.agaveapi.co/files/v2/history/system/data.agaveapi.co/nryan/picksumipsum.txt"
      }
    }
  },
  {
    "uuid":"0001414144065563-5056a550b8-0001-007",
    "type":"JOB",
    "url":"https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007",
    "_links":{
      "file":{
        "href":"https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007"
      }
    }
  }
]

To resolve multiple UUIDs, make a GET request on the UUID collection and pass the UUIDs as a comma-separated list in the uuids query parameter. The response will contain a list of resolved resources in the same order that you requested them.
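For scripted use, the lookup URL can be built programmatically instead of by hand. The following Python sketch reuses the base URL and the uuids.eq, expand, and pretty parameters from the examples in this section; the helper name bulk_uuid_url is hypothetical. It joins every parameter after the first with & (a second ? would simply become part of the previous value) and leaves the commas between UUIDs unencoded.

```python
from urllib.parse import urlencode

# Base URL of the UUID collection, as used in the curl examples in this section.
UUID_API = "https://public.tenants.agaveapi.co/uuid/v2/"


def bulk_uuid_url(uuids, expand=False, pretty=False):
    """Build a bulk UUID lookup URL (hypothetical helper).

    The UUIDs are sent as one comma-separated value; every other
    parameter is appended with '&', never with a second '?'.
    """
    params = {"uuids.eq": ",".join(uuids)}
    if expand:
        params["expand"] = "true"
    if pretty:
        params["pretty"] = "true"
    # safe="," keeps the separators readable instead of encoding them as %2C.
    return UUID_API + "?" + urlencode(params, safe=",")


url = bulk_uuid_url(
    ["0001409758089943-5056a550b8-0001-002", "0001414144065563-5056a550b8-0001-007"],
    pretty=True,
)
print(url)
```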

Expanding multiple UUIDs

Resolving multiple UUIDs to their resource representations

curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
    "https://public.tenants.agaveapi.co/uuid/v2/?uuids.eq=0001414144065563-5056a550b8-0001-007,0001409758089943-5056a550b8-0001-002&expand=true&pretty=true"
uuid-lookup -v -e 0001414144065563-5056a550b8-0001-007 0001409758089943-5056a550b8-0001-002

The response will include an array of the expanded representations in the order they were requested in the URL query.

[
  {
    "id":"$JOB_ID",
    "name":"demo-pyplot-demo-advanced test-1414139896",
    "owner":"$API_USERNAME",
    "appId":"demo-pyplot-demo-advanced-0.1.0",
    "executionSystem":"$PUBLIC_EXECUTION_SYSTEM",
    "batchQueue":"debug",
    "nodeCount":1,
    "processorsPerNode":1,
    "memoryPerNode":1.0,
    "maxRunTime":"01:00:00",
    "archive":false,
    "retries":0,
    "localId":"10321",
    "outputPath":null,
    "status":"STOPPED",
    "submitTime":"2014-10-24T04:48:11.000-05:00",
    "startTime":"2014-10-24T04:48:08.000-05:00",
    "endTime":null,
    "inputs":{
      "dataset":"agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv"
    },
    "parameters":{
      "chartType":"bar",
      "height":"512",
      "showLegend":"false",
      "xlabel":"Time",
      "background":"#FFF",
      "width":"1024",
      "showXLabel":"true",
      "separateCharts":"false",
      "unpackInputs":"false",
      "ylabel":"Magnitude",
      "showYLabel":"true"
    },
    "_links":{
      "self":{
        "href":"https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007"
      },
      "app":{
        "href":"https://public.tenants.agaveapi.co/apps/v2/demo-pyplot-demo-advanced-0.1.0"
      },
      "executionSystem":{
        "href":"https://public.tenants.agaveapi.co/systems/v2/$PUBLIC_EXECUTION_SYSTEM"
      },
      "archiveData":{
        "href":"https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/listings"
      },
      "owner":{
        "href":"https://public.tenants.agaveapi.co/profiles/v2/$API_USERNAME"
      },
      "permissions":{
        "href":"https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/pems"
      },
      "history":{
        "href":"https://public.tenants.agaveapi.co/jobs/v2/0001414144065563-5056a550b8-0001-007/history"
      },
      "metadata":{
        "href":"https://public.tenants.agaveapi.co/meta/v2/data/?q=%7b%22associationIds%22%3a%220001414144065563-5056a550b8-0001-007%22%7d"
      },
      "notifications":{
        "href":"https://public.tenants.agaveapi.co/notifications/v2/?associatedUuid=0001414144065563-5056a550b8-0001-007"
      }
    }
  },
  {
    "internalUsername":null,
    "lastModified":"2014-09-03T10:28:09.943-05:00",
    "name":"picksumipsum.txt",
    "nativeFormat":"raw",
    "owner":"nryan",
    "path":"/home/nryan/picksumipsum.txt",
    "source":"http://127.0.0.1/picksumipsum.txt",
    "status":"STAGING_QUEUED",
    "systemId":"data.agaveapi.co",
    "uuid":"0001409758089943-5056a550b8-0001-002",
    "_links":{
      "history":{
        "href":"https://public.tenants.agaveapi.co/files/v2/history/system/data.agaveapi.co/nryan/picksumipsum.txt"
      },
      "self":{
        "href":"https://public.tenants.agaveapi.co/files/v2/media/system/data.agaveapi.co/nryan/picksumipsum.txt"
      },
      "system":{
        "href":"https://public.tenants.agaveapi.co/systems/v2/data.agaveapi.co"
      }
    }
  }
]

Expansion also works when resolving UUIDs in bulk. Add expand=true to the URL query and the full resource representation of each UUID will be returned in an array, preserving the order of the original request. If any resolution fails due to a permission violation or a server error, the corresponding error response object is returned in place of that resource representation.
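Because the expanded array preserves request order, a client can match results to UUIDs by position. The Python sketch below does that and separates failed lookups from successful ones. The shape of an error object is not shown in this section, so the is_error check (a "status" field equal to "error") is an assumption to adapt to the responses you actually receive; pair_expanded and the trimmed sample body are hypothetical.

```python
import json


def is_error(entry):
    """Assumption: an error entry carries "status": "error".

    The resource representations above use other status values
    (for example "STOPPED" or "STAGING_QUEUED"), so they are not misclassified.
    """
    return isinstance(entry, dict) and entry.get("status") == "error"


def pair_expanded(requested_uuids, response_array):
    """Match each requested UUID with its expanded representation by position.

    Relies on the response preserving the request order.
    Returns two dicts: resolved entries and error entries, keyed by UUID.
    """
    if len(requested_uuids) != len(response_array):
        raise ValueError("response length does not match the number of requested UUIDs")
    resolved, failed = {}, {}
    for uuid, entry in zip(requested_uuids, response_array):
        if is_error(entry):
            failed[uuid] = entry
        else:
            resolved[uuid] = entry
    return resolved, failed


# Trimmed, hypothetical response: the job resolved, the file lookup failed.
requested = ["0001414144065563-5056a550b8-0001-007", "0001409758089943-5056a550b8-0001-002"]
body = json.loads('[{"id": "0001414144065563-5056a550b8-0001-007", "status": "STOPPED"}, {"status": "error"}]')
resolved, failed = pair_expanded(requested, body)
print(list(resolved), list(failed))
```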

Events

 /$$$$$$$$                          /$$            
| $$_____/                         | $$            
| $$   /$$    /$$/$$$$$$ /$$$$$$$ /$$$$$$  /$$$$$$$
| $$$$|  $$  /$$/$$__  $| $$__  $|_  $$_/ /$$_____/
| $$__/\  $$/$$| $$$$$$$| $$  \ $$ | $$  |  $$$$$$ 
| $$    \  $$$/| $$_____| $$  | $$ | $$ /$\____  $$
| $$$$$$$\  $/ |  $$$$$$| $$  | $$ |  $$$$/$$$$$$$/
|________/\_/   \_______|__/  |__/  \___/|_______/ 

Events underpin everything in the Agave Platform. This section covers the events available to each resource.

Event Reference

Apps

Event Description
UPDATED The app was updated
DELETED The app was deleted
PUBLISHED The app was made available for public use.
CLONED The app was cloned as another app
PERMISSION_GRANT A user permission was granted
PERMISSION_REVOKE A user permission was deleted
RESTORED App was restored from disabled status
UNPUBLISHED App was unpublished. It will no longer be available for public use
PUBLISHING_FAILED The app failed to complete publishing. The id given in the original request is not valid and the app will not be publicly available.
DISABLED App was disabled and is not currently available for use.
CLONING_FAILED The app failed to complete cloning. The id given in the original request is not valid and the app will not be available for use.
REGISTERED A new app was registered

Files

Event Description
CREATED File or directory was created
DELETED The file was deleted
INDEX_START Indexing of file/folder started
INDEX_COMPLETE Indexing of file/folder completed
INDEX_FAILED Indexing of file/folder failed
RENAME The file was renamed
MOVED The file was moved to another path
OVERWRITTEN The file was overwritten
PERMISSION_GRANT A user permission was added
PERMISSION_REVOKE A user permission was deleted
STAGING_QUEUED File/folder queued for staging
STAGING File or directory is currently in flight
STAGING_FAILED Staging failed
STAGING_COMPLETED Staging completed successfully
PREPROCESSING Preparing file for processing
TRANSFORMING_QUEUED File/folder queued for transform
TRANSFORMING Transforming file/folder
TRANSFORMING_FAILED Transform failed
TRANSFORMING_COMPLETED Transform completed successfully
UPLOAD New content was uploaded to the file.
CONTENT_CHANGED Content changed within this file/folder. If a folder, this event will be thrown whenever content changes in any file within this folder at most one level deep.
DOWNLOAD The file item was downloaded.

Internal Users

Event Description
CREATED The internal user was created
DELETED The internal user was deleted
UPDATED The internal user was updated

Jobs

Event Description
CREATED The job was created
UPDATED The job was updated
DELETED The job was deleted
PERMISSION_GRANT User permission was granted
PERMISSION_REVOKE Permission was removed for a user on this job
PENDING Job accepted and queued for submission.
STAGING_INPUTS Transferring job input data to execution system
CLEANING_UP Job completed execution
ARCHIVING Transferring job output to archive system
STAGING_JOB Job inputs staged to execution system
FINISHED Job complete
KILLED Job execution killed at user request
FAILED Job failed
STOPPED Job execution intentionally stopped
RUNNING Job started running
PAUSED Job execution paused by user
QUEUED Job successfully placed into queue
SUBMITTING Preparing job for execution and staging binaries to execution system
STAGED Job inputs staged to execution system
PROCESSING_INPUTS Identifying input files for staging
ARCHIVING_FINISHED Job archiving complete
ARCHIVING_FAILED Job archiving failed
HEARTBEAT Job heartbeat received
JOB_RUNTIME_CALLBACK_EVENT This is the default event thrown when a job pushes out runtime information using the AGAVE_JOB_CALLBACK_NOTIFICATION macro.
EMPTY_STATUS_RESPONSE An empty response was received from the remote execution system when querying for job status
REMOTE_STATUS_CHANGE The status of the job on the remote system was changed by an external process. The change does not reflect a change in Agave’s understanding of the job’s status.
UNKNOWN_TERMINATION The job experienced an unknown termination event and is no longer running on the remote system. The job will be failed by Agave momentarily.
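Clients that track jobs through these events usually need to know when to stop waiting. The Python sketch below assumes that FINISHED, FAILED, KILLED, and STOPPED are the only terminal job events; that set is an interpretation of the table above, not an official list. The function name job_outcome and the event stream in the example are hypothetical, and the ordering of the stream is only illustrative.

```python
# Assumption: these four events end a job's lifecycle. This is a reading of the
# job event table (FINISHED, KILLED, FAILED, STOPPED), not an official list.
TERMINAL_JOB_EVENTS = frozenset({"FINISHED", "FAILED", "KILLED", "STOPPED"})


def job_outcome(events):
    """Return the first terminal event in `events`, or None if the job may still be active.

    Non-terminal events such as HEARTBEAT are skipped. Per the table,
    UNKNOWN_TERMINATION is not itself terminal; the job is failed afterwards.
    """
    for event in events:
        if event in TERMINAL_JOB_EVENTS:
            return event
    return None


# Hypothetical event stream for a job that ran to completion.
events = ["PENDING", "PROCESSING_INPUTS", "STAGING_INPUTS", "STAGED", "SUBMITTING",
          "QUEUED", "RUNNING", "HEARTBEAT", "CLEANING_UP", "FINISHED"]
print(job_outcome(events))
```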

Metadata

Event Description
CREATED The metadata was created
UPDATED The metadata was updated
DELETED The metadata was deleted
PERMISSION_GRANT User permission was granted
PERMISSION_REVOKE User permission was revoked

Monitors

Event Description
CREATED The monitor was created
UPDATED The monitor was updated
DELETED The monitor was deleted
ENABLED The monitor was enabled
DISABLED The monitor was disabled
PERMISSION_GRANT A new user permission was granted on this monitor
PERMISSION_REVOKE A user permission was revoked on this monitor
FORCED_CHECK_REQUESTED A status check was requested by the user outside of the existing monitor schedule.
CHECK_PASSED The status check passed
CHECK_FAILED The status check failed
CHECK_UNKNOWN The status check finished in an unknown state
STATUS_CHANGE The status condition of the monitored resource changed since the last check
RESULT_CHANGE The cumulative result of all checks performed on the monitored resource changed since the last suite of checks

Schema

Event Description
CREATED The schema was created
UPDATED The schema was updated
DELETED The schema was deleted
PERMISSION_GRANT User permission was granted
PERMISSION_REVOKE User permission was revoked

PostIts

Event Description
CREATED The PostIt was created
UPDATED The PostIt was updated
REFRESHED PostIt was refreshed back to its original quotas or extended for another day
DELETED The PostIt was deleted
REDEEMED The PostIt was redeemed

Profiles

Event Description
CREATED A new user account was created.
DELETED The user account was deleted.
UPDATED The user account was updated.
ACCOUNT_ACTIVATED The user’s account was activated.
ACCOUNT_DEACTIVATED The user’s account was deactivated.
ROLE_GRANTED The user had a role added.
ROLE_REVOKED The user had a role revoked.
QUOTA_EXCEEDED The user has exceeded one or more quotas.

Systems

Event Description
CREATED The system was created
UPDATED The system was updated
DELETED The system was deleted
ROLES_GRANT User role was granted on the system
ROLES_REVOKE User role was removed from the system
STATUS_CHANGE The system status changed

Tags

Event Description
CREATED Tag was registered
UPDATED Tag was updated
DELETED Tag was deleted from active use
RESOURCE_ADDED A resource was added to the tag
RESOURCE_REMOVED A resource was removed from the tag
PUBLISHED Tag was published for public use
UNPUBLISHED Tag was unpublished. It will no longer be available for public use
PERMISSION_REVOKE One or more user permissions were revoked on this tag
PERMISSION_GRANT One or more user permissions were granted on this tag

Transfers

Event Description
CREATED A new transfer was created
CANCELLED The transfer was cancelled
QUEUED Transfer queued and waiting to start
COMPLETED Transfer completed successfully
FAILED Transfer failed while transferring
PAUSED Transfer paused
RETRYING Transfer failed, beginning to retry
TRANSFERRING Transfer has started

Search

  /$$$$$$                                     /$$
 /$$__  $$                                   | $$
| $$  \__/ /$$$$$$  /$$$$$$  /$$$$$$  /$$$$$$| $$$$$$$
|  $$$$$$ /$$__  $$|____  $$/$$__  $$/$$_____| $$__  $$
 \____  $| $$$$$$$$ /$$$$$$| $$  \__| $$     | $$  \ $$
 /$$  \ $| $$_____//$$__  $| $$     | $$     | $$  | $$
|  $$$$$$|  $$$$$$|  $$$$$$| $$     |  $$$$$$| $$  | $$
 \______/ \_______/\_______|__/      \_______|__/  |__/

Search is a fundamental feature of the Agave Platform. Most of the core science APIs support a mature, URL-based query mechanism that lets you search using an SQL-inspired query syntax. The two exceptions are the Files and Metadata APIs. The Files service does not index the directory or file contents of registered systems, so it has no way to search the file system performantly. The Metadata service supports MongoDB query syntax, which allows more flexible, though slightly more complex, queries.

Query syntax

http://public.agaveapi.co/jobs/v2?name=test%20job

You can include multiple search expressions to build a more restrictive query.

http://public.agaveapi.co/jobs/v2?name=test%20job&executionSystem=aws-demo&status=FAILED

By default, search is enabled on each collection endpoint allowing you to trim the response down to the results you care about most. The list of available search terms is identical to the attributes included in the JSON returned when requesting the full resource description.

To search for a specific attribute, simply append a search expression to the URL query of your request, for example name=test%20job.
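A minimal Python sketch of building such a URL, assuming only the attribute names shown in the examples above; search_url is a hypothetical helper. Passing quote_via=quote makes urlencode encode spaces as %20, as in the examples, rather than the + produced by its default quote_plus.

```python
from urllib.parse import quote, urlencode


def search_url(base, expressions):
    """Append search expressions to a collection URL (hypothetical helper).

    Each key/value pair becomes one search expression; pairs are joined with '&'.
    """
    return base + "?" + urlencode(expressions, quote_via=quote)


url = search_url(
    "http://public.agaveapi.co/jobs/v2",
    {"name": "test job", "executionSystem": "aws-demo", "status": "FAILED"},
)
print(url)
```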

Search operators

# systems with cloud in their name
systems/v2?name.like=*cloud*

# apps modified between October 1 and October 30 of this year
apps/v2?lastModified.between=10/1,10/30

# jobs with status equal to PENDING or ARCHIVING
jobs/v2?status.in=PENDING,ARCHIVING

# systems with cloud in their name
systems-search 'name.like=*cloud*'

# apps modified between October 1 and October 30 of this year
apps-search 'lastModified.between=10/1,10/30'

# jobs with status equal to PENDING or ARCHIVING
jobs-search 'status.in=PENDING,ARCHIVING'

By default, all search expressions are evaluated for equality. To perform more complex queries, append a search operator to the attribute in your search expression, as the examples above show.

For resources with nested collections, you may use JSON dot notation to query the subresources in the collection.

# systems using Amazon S3 as the storage protocol
systems/v2?storage.protocol.eq=S3

# systems with a batch queue allowing more than 10 concurrent user jobs
systems/v2?queues.maxUserJobs.gt=10

# systems using Amazon S3 as the storage protocol
systems-search 'storage.protocol.eq=S3'

# systems with a batch queue allowing more than 10 concurrent user jobs
systems-search 'queues.maxUserJobs.gt=10'

Multiple operators

# jobs whose app has hadoop in the name, ran on a system with id aws-demo, and started
# any time during the last business week
jobs/v2?appId.like=*hadoop*&executionSystem.eq=aws-demo&startTime.between=last%20monday,last%20friday

# users whose profile has a last name ending in ross and an email address ending in texas.edu
profiles/v2?lastname.like=*ross&email.like=*texas.edu

# failed login monitor checks on systems with "ec2" in the name
monitors/v2/?target.like=*ec2*&result.eq=FAILED&type=LOGIN

# jobs whose app has hadoop in the name, ran on a system with id aws-demo, and started
# any time during the last business week
jobs-search 'appId.like=*hadoop*' \
            'executionSystem.eq=aws-demo' \
            'startTime.between=last%20monday,last%20friday'

# users whose profile has a last name ending in ross and an email address ending in texas.edu
profiles-search 'lastname.like=*ross' 'email.like=*texas.edu'

# failed login monitor checks on systems with "ec2" in the name
monitors-checks-search -M target.like=*ec2* \
                      'result.eq=FAILED' \
                      'type=LOGIN'

As before, you can include multiple search expressions to narrow your results.

The full list of search operators is given in the following table.

Operator Values Description
eq mixed Matches values equal to the given search value. All comparisons are case sensitive. This cannot be used for complex object comparison.
on datestring Matches dates falling on the given datestring. Regardless of the precision given in the datestring, the search will look for matches from midnight to midnight on the resolved date.
neq mixed Matches values not equal to the given search value. All comparisons are case sensitive. This cannot be used for complex object comparison.
lt mixed Matches values less than the given search value.
before datestring Matches dates falling before the given datestring. Single second precision is supported.
lte mixed Matches values less than or equal to the given search value.
gt mixed Matches values greater than the given search value.
after datestring Matches dates falling after the given datestring.
gte mixed Matches values greater than or equal to the given search value.
in comma-separated list Matches values in the given comma-separated list. This is equivalent to applying the like operator to each comma-separated value.
nin comma-separated list Matches values not in the given comma-separated list. This is equivalent to applying the nlike operator to each comma-separated value.
like string Matches values similar to the given search term. Wildcards (*) may be used to perform partial matches.
nlike string Matches values different from the given search term. Wildcards (*) may be used to perform partial matches.
between comma-separated datestring Matches dates falling within the given range. Single second precision is supported at either end of the range.
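The operator syntax is regular enough to generate. The following Python sketch builds attribute.operator=value pairs, rejects operators not listed in the table above, and joins list values for in, nin, and between with commas. The helper name expression and its handling of list values are assumptions for illustration; the attribute names come from the earlier examples.

```python
from urllib.parse import quote, urlencode

# Operators from the search operator table.
OPERATORS = frozenset({
    "eq", "on", "neq", "lt", "before", "lte", "gt", "after",
    "gte", "in", "nin", "like", "nlike", "between",
})
# Operators whose value is a comma-separated list.
LIST_OPERATORS = frozenset({"in", "nin", "between"})


def expression(attribute, operator, value):
    """Return one (key, value) search pair, e.g. ("status.in", "PENDING,ARCHIVING").

    `attribute` may use dot notation for nested fields, e.g. "queues.maxUserJobs".
    """
    if operator not in OPERATORS:
        raise ValueError(f"unsupported search operator: {operator}")
    if operator in LIST_OPERATORS and not isinstance(value, str):
        values = [str(v) for v in value]
        if operator == "between" and len(values) != 2:
            raise ValueError("between expects exactly two values")
        value = ",".join(values)
    return f"{attribute}.{operator}", str(value)


# Keep '*' wildcards and ',' separators literal; spaces would become %20.
query = urlencode(
    [expression("name", "like", "*cloud*"), expression("queues.maxUserJobs", "gt", 10)],
    safe=",*",
    quote_via=quote,
)
print("systems/v2?" + query)
```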

Date support

Dates returned from the Agave core science API are always formatted as ISO8601 dates. When searching, however, a much more flexible date syntax is supported. The following table lists supported expressions by example.

Expression Equivalent Expression
08:00:00.000
4pm or 04:00pm or 16:00 16:00:00.000
430pm or 04:30pm or 16:30 16:30:00.000
5pm 17:00:00.000
+1 second|minute|hour|day|week|month|year now +1 second|minute|hour|day|week|month|year
-1 second|minute|hour|day|week|month|year now -1 second|minute|hour|day|week|month|year
next Tuesday
last Tuesday
now new Date()
today 00:00:00.000
midnight 00:00:00.000 +24 hours
morning or this morning 07:00:00.000
noon 12:00:00.000
afternoon or this afternoon 13:00:00.000
evening or this evening 17:00:00.000
tonight 20:00:00.000
tomorrow now +24 hours
tomorrow morning morning +24 hours
noon tomorrow or tomorrow noon noon +24 hours
tomorrow afternoon afternoon +24 hours
yesterday now -24 hours
all the permutations of yesterday and morning, noon, afternoon, and evening
2004
October or Oct 10/1
Tuesday or Tue Calendar date of the next Tuesday
October 26, 1981 or Oct 26, 1981 10/26/1981
October 26 or Oct 26 10/26
26 October 1981 10/26/1981
26 Oct 1981 10/26/1981
26 Oct 81 10/26/1981
10/26/1981 or 10-26-1981
10/26/81 or 10-26-81
1981/10/26 or 1981-10-26 10/26/1981
10/26 or 10-26
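Rather than relying on relative expressions such as last monday, a client can compute a range itself and send dates in the 1981-10-26 form listed above. This Python sketch, using the hypothetical helpers between and previous_business_week, builds a startTime.between expression covering the previous Monday through Friday.

```python
from datetime import date, timedelta


def between(attribute, start, end):
    """Build an attribute.between=start,end pair using the YYYY-MM-DD form."""
    if end < start:
        raise ValueError("end date precedes start date")
    return f"{attribute}.between", f"{start.isoformat()},{end.isoformat()}"


def previous_business_week(today):
    """Return (monday, friday) of the week before the one containing `today`."""
    this_monday = today - timedelta(days=today.weekday())  # weekday(): Monday == 0
    last_monday = this_monday - timedelta(days=7)
    return last_monday, last_monday + timedelta(days=4)


monday, friday = previous_business_week(date(2016, 11, 3))
print(between("startTime", monday, friday))
```

If the service resolves a bare date to midnight at the start of that day, as the today row suggests, this range may stop at the beginning of Friday; extend the end date by one day to cover all of Friday.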

Custom search result

Search with multiple operators and return a custom response

# jobs whose app has cloud in the name, ran on a system with docker in its id, and
# started after January 1, 2016; return selected fields for at most 3 jobs
jobs/v2?appId.like=*cloud*&executionSystem.like=*docker*&startTime.after=2016-01-01&filter=id,appId,executionSystem,status,created&naked=true&limit=3

# jobs whose app has cloud in the name, ran on a system with docker in its id, and
# started after January 1, 2016; return selected fields for at most 3 jobs
jobs-search -v --limit=3 \
            --filter=id,appId,executionSystem,status,created \
            'appId.like=*cloud*' \
            'executionSystem.like=*docker*' \
            'startTime.after=2016-01-01' \
            'naked=true'

The response will be a JSON array of custom objects containing only the fields you specified in the filter query parameter.

[
  {
    "id":"2974032102330798566-242ac115-0001-007",
    "appId":"cloud-runner-0.1.0u1",
    "executionSystem":"docker.tacc.utexas.edu",
    "status":"FINISHED",
    "created":"2016-11-03T16:04:53.000-05:00"
  },
  {
    "id":"8643408718823550490-242ac115-0001-007",
    "appId":"cloud-runner-0.1.0u1",
    "executionSystem":"docker.tacc.utexas.edu",
    "status":"FINISHED",
    "created":"2016-11-03T15:17:24.000-05:00"
  },
  {
    "id":"9049010248689521126-242ac115-0001-007",
    "appId":"cloud-runner-0.1.0u1",
    "executionSystem":"docker.tacc.utexas.edu",
    "status":"FINISHED",
    "created":"2016-11-03T15:17:07.000-05:00"
  }
]

By combining the search, filtering, and naked query parameters, you can query the API and return just the information you care about. The example search will return a JSON array of job objects with just the id, appId, executionSystem, status, and created fields from the full job object in the response. This combination of search, filtering, and pagination provides a powerful mechanism for generating custom views of the data.
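The same request can be assembled in code. This Python sketch appends the filter, naked, and limit parameters used above to a list of search expressions; custom_view_url is a hypothetical helper, and the jobs base URL is taken from the job links earlier in this section.

```python
from urllib.parse import quote, urlencode

JOBS_API = "https://public.tenants.agaveapi.co/jobs/v2/"


def custom_view_url(base, expressions, fields, limit=None):
    """Combine search expressions with filter, naked and limit (hypothetical helper)."""
    params = list(expressions)
    params.append(("filter", ",".join(fields)))  # fields to keep in each object
    params.append(("naked", "true"))             # return the bare JSON array
    if limit is not None:
        params.append(("limit", str(limit)))     # cap the number of results
    # Keep '*' wildcards and ',' separators literal in the query string.
    return base + "?" + urlencode(params, safe="*,", quote_via=quote)


url = custom_view_url(
    JOBS_API,
    [("appId.like", "*cloud*"), ("executionSystem.like", "*docker*"), ("startTime.after", "2016-01-01")],
    ["id", "appId", "executionSystem", "status", "created"],
    limit=3,
)
print(url)
```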

Tooling

Sometimes the hardest part of a new project is taking the first step. Agave Tooling helps make that first step a little easier through reference web applications, boilerplate integration scripts, and integrations with popular CMSs and frameworks through native plugins and modules.

Agave ToGo

Get a head start on your next development sprint by leveraging the open source Agave ToGo project. This AngularJS webapp can be reused in your existing project or used as-is for a clean, responsive, client-side web application that brings the full power of Agave to your browser.

JupyterHub

Jupyter notebooks (formerly IPython notebooks) provide users with interactive computing documents that contain both computer code and a mix of rich text elements such as data visualizations, text paragraphs, hyperlinks, formatted equations, etc. The code cells in notebooks can be executed interactively, cell by cell, and the results of the executions are displayed in subsequent cells in the notebook. The notebooks can also be exported to a serialized JSON formatted file and executed like a traditional program.

JupyterHub is an open source project to provide multi-user hosted notebook servers as a service. When a user signs in to JupyterHub, a notebook server with pre-configured software is automatically launched for them. The Agave team integrated JupyterHub into its identity and access management stack and made several additional enhancements and customizations so that users can work with Agave’s language SDKs such as agavepy, the CLI, persistent storage, and multiple kernels directly from their notebooks with minimal setup. Agave’s deployment of JupyterHub, which runs each user’s notebook server in a Docker container to further enhance reproducibility, is freely available for use in Agave’s Public Tenant.

You can get started with JupyterHub today at https://jupyter.agaveapi.co.

Command Line Interface

The Agave command-line interface (CLI) is a complete interface to the Agave REST API. The scripts include support for creating persistent authentication sessions, creating and renaming apps, registering and sharing systems, uploading and managing data, creating PostIts, and more. Whether you want to leverage Agave for back-end processing in an existing project, integrate Agave into your existing scripted solutions, or are new to Agave and just want to kick the tires, the Agave CLI is a powerful tool. The Agave CLI can be checked out from the Agave Git repository.

git clone https://bitbucket.org/agaveapi/cli

For more information on using the Agave CLI in common tasks, consult the Tutorials section, which references it in all of its examples, or check out the Agave Samples project for sample data and examples of how to use the CLI to populate and interact with your tenant.