ADES (Processing)
ZOO-Project DRU
Note
With EOEPCA release 1.4, the ADES implementation has been significantly reworked and fully aligned with the upstream ZOO-Project (GitHub). This zoo-project-dru version deprecates the previous proc-ades implementation.
With this transition, there are some functional changes to be aware of…
- Service Endpoint
  With zoo-project-dru the OGC API Processes endpoint is at the path /<username>/ogc-api/processes, compared to the previous /<username>/wps3/processes.
- Deployed Application Endpoint
  The endpoint for a deployed Application no longer appends the version of the Application Package.
  For example, previously the application convert-url at version 0.1.2 would result in the endpoint /<username>/wps3/processes/convert-url_0_1_2.
  With the new zoo-project-dru this same Application Package deployment results in the endpoint /<username>/ogc-api/processes/convert-url.
- Deployed Application Version
  The version of the deployed application is obtained from the Application Package CWL (ref. s:softwareVersion: 0.1.2), and is maintained within the metadata for the deployed process that is returned from the API's Get Process Details request.
  If multiple versions of the same Application Package must be deployed simultaneously, this has to be handled with different CWL documents in which the version is embedded in the workflow id (or some other technique that establishes uniqueness of id between variants).
The ADES provides a platform-hosted execution engine through which users can initiate parameterised processing jobs using applications made available within the platform - supporting the efficient execution of the processing ‘close to the data’. Users can deploy specific ‘applications’ to the ADES, which may be their own applications, or those published by other platform users.
The ADES provides an implementation of the OGC API Processes - Part 1: Core and Part 2: Deploy, Replace, Undeploy (draft).
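As a minimal sketch of the Part 1 execute operation against the per-user endpoint, the following Python builds an asynchronous execution request with only the standard library, without sending it. The host is the example domain used throughout this guide. The process id and its inputs (fn, url, size) are illustrative assumptions and must match the inputs declared by your deployed Application Package.

```python
import json
import urllib.request


def build_execute_request(base_url: str, username: str, process_id: str,
                          inputs: dict, asynchronous: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST to .../processes/{id}/execution."""
    url = f"{base_url.rstrip('/')}/{username}/ogc-api/processes/{process_id}/execution"
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if asynchronous:
        # Ask the server to create a job and return immediately
        headers["Prefer"] = "respond-async"
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_execute_request(
        "https://zoo.192-168-49-2.nip.io", "eric", "convert-url",
        {"fn": "resize", "url": "https://example.com/image.tif", "size": "50%"},
    )
    print(req.get_method(), req.full_url)
    # To submit: urllib.request.urlopen(req), adding an Authorization header if protected
```

For an asynchronous request, an OGC API Processes server responds with a job status document and a Location header for polling the job.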
Helm Chart
The EOEPCA deployment is aligned with the upstream implementation and so relies upon the upstream helm chart that is hosted at the ZOO-Project Helm Chart Repository - in particular the zoo-project-dru chart variant.
The chart is configured via values that are fully documented in the README for the zoo-project-dru chart.
helm install --version 0.2.6 --values ades-values.yaml \
--repo https://zoo-project.github.io/charts/ \
zoo-project-dru zoo-project-dru
Values
The deployment must be configured for your environment. Some significant configuration values are elaborated here…
Cookie-cutter Template
The zoo-project-dru implementation provides the core capabilities for OGC API Processes Parts 1 & 2. The deployment of this core must be completed by integration with the ‘runner’ that executes the processes as Application Packages, and that integrates as necessary with other platform services - such as Catalogue, Workspace, etc.
Thus, zoo-project-dru is extensible by design via a ‘cookie-cutter’ that provides the template ‘runner’ for each Application Package process as it is deployed to the service.
For the purposes of our EOEPCA ‘release’ as covered by this guide, we provide eoepca-proc-service-template as a cookie-cutter implementation that provides:
- Integration with Kubernetes to run process Application Packages, via the Calrissian CWL runner
- Stage-in of inputs as STAC items, integrated as required with S3 object storage
- Stage-out of outputs as a STAC Collection, integrated with S3 object storage and (optionally) user Workspace integration
The cookie-cutter template is identified in the helm values…
cookiecutter:
  templateUrl: https://github.com/EOEPCA/eoepca-proc-service-template.git
  templateBranch: master
The function of the cookie-cutter template is supported by some other aspects, elaborated below, which must be configured in accordance with the expectations of the template.
In particular…
- Template parameterisation that is passed through the core zoo-project-dru configuration (see ZOO-Project DRU custom configuration)
- CWL ‘wrapper’ files that prepend and append the process Application Package CWL to perform stage-in and stage-out functions (see Stage-in / Stage-out)
ZOO-Project DRU custom configuration
In order to support our eoepca-proc-service-template cookie-cutter template, there is a custom zoo-project-dru container image that includes the python dependencies required by this template. Thus, the deployment must identify the custom container image via helm values…
zoofpm:
  image:
    tag: eoepca-092ea7a2c6823dba9c6d52c383a73f5ff92d0762
zookernel:
  image:
    tag: eoepca-092ea7a2c6823dba9c6d52c383a73f5ff92d0762
In addition, we can add values to the ZOO-Project DRU main.cfg configuration file via helm values. In this case we add some eoepca-specific values that match those expected by our eoepca-proc-service-template cookie-cutter template. In this way we can effectively use helm values to pass parameters through to the template.
This is manifested in zoo’s main.cfg in INI file configuration syntax…
The presence or absence of the workspace_prefix parameter dictates whether or not the stage-out step integrates with the user’s Workspace for persistence of the processing results, and registration within the Workspace services.
If workspace_prefix is not set, then the object storage specification in the helm values is relied upon…
workflow:
  inputs:
    STAGEOUT_AWS_SERVICEURL: https://minio.192-168-49-2.nip.io
    STAGEOUT_AWS_ACCESS_KEY_ID: eoepca
    STAGEOUT_AWS_SECRET_ACCESS_KEY: changeme
    STAGEOUT_AWS_REGION: RegionOne
    STAGEOUT_OUTPUT: eoepca
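The decision rule just described can be sketched in Python. This is hypothetical logic, not the template's code. The [eoepca] section name and the '<prefix>-<username>' workspace naming are assumptions for illustration only. The object storage fallback reads the STAGEOUT_* values shown above.

```python
import configparser
from typing import Optional


def stageout_target(main_cfg: str, workflow_inputs: dict, username: str) -> dict:
    """Decide where stage-out would send results (hypothetical sketch).

    main_cfg is the text of zoo's main.cfg. The [eoepca] section name and the
    '<prefix>-<username>' workspace naming are assumptions.
    """
    cfg = configparser.ConfigParser()
    cfg.read_string(main_cfg)
    # fallback avoids NoSectionError / NoOptionError when the value is absent
    prefix: Optional[str] = cfg.get("eoepca", "workspace_prefix", fallback=None)
    if prefix:
        # Workspace integration: results persisted to and registered in the user's Workspace
        return {"mode": "workspace", "workspace": f"{prefix}-{username}"}
    # No workspace_prefix: fall back to the object storage given in the helm values
    return {
        "mode": "s3",
        "endpoint": workflow_inputs["STAGEOUT_AWS_SERVICEURL"],
        "bucket": workflow_inputs["STAGEOUT_OUTPUT"],
        "region": workflow_inputs["STAGEOUT_AWS_REGION"],
    }
```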
Stage-in / Stage-out
The ADES hosts applications that are deployed and invoked in accordance with the OGC Best Practice for Application Package. Thus, the ADES provides a conformant environment within which the application is integrated for execution. A key part of the ADES’s role in this is to facilitate the provision of input data to the application (stage-in), and the handling of the results output at the conclusion of application execution (stage-out).
The zoo-project-dru helm chart provides a default implementation via the included files - main.yaml, rules.yaml, stagein.yaml and stageout.yaml.
The helm values provide a means through which each of these files can be overridden for integration with your platform environment…
files:
  # Directory 'files/cwlwrapper-assets' - assets for ConfigMap 'XXX-cwlwrapper-config'
  cwlwrapperAssets:
    main.yaml: |-
      <override file content here>
    rules.yaml: |-
      <override file content here>
    stagein.yaml: |-
      <override file content here>
    stageout.yaml: |-
      <override file content here>
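You may prefer to keep an override file as a standalone CWL document rather than inlining it in the values file. One option is helm's --set-file flag. The Python sketch below assembles the helm install command from the Helm Chart section with such overrides. It assumes that helm's backslash escaping of dots in --set keys also applies to --set-file; confirm this against your helm version.

```python
import shlex
from typing import Dict, List, Optional

CHART_REPO = "https://zoo-project.github.io/charts/"


def helm_install_args(values_file: str,
                      wrapper_overrides: Optional[Dict[str, str]] = None) -> List[str]:
    """Build the helm install command, plus optional --set-file overrides for
    entries of files.cwlwrapperAssets (e.g. 'stageout.yaml' -> local path)."""
    args = ["helm", "install", "--version", "0.2.6", "--values", values_file,
            "--repo", CHART_REPO, "zoo-project-dru", "zoo-project-dru"]
    for name, path in (wrapper_overrides or {}).items():
        # Escape dots so helm treats 'stageout.yaml' as one key, not a nested path
        key = "files.cwlwrapperAssets." + name.replace(".", "\\.")
        args += ["--set-file", f"{key}={path}"]
    return args


if __name__ == "__main__":
    cmd = helm_install_args("ades-values.yaml", {"stageout.yaml": "./stageout.yaml"})
    # Shell-quoted for copy/paste; alternatively run with subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```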
For the most part the default CWL wrapper files provided with the helm chart are sufficient. In particular, the stagein.yaml implements the stage-in of STAC items that are specified as inputs of type Directory in the Application Package CWL.
E.g.
inputs:
  stac:
    label: the image to convert as a STAC item
    doc: the image to convert as a STAC item
    type: Directory
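Conceptually, the stage-in step resolves each such STAC item reference to the asset files that must be fetched into the input Directory. The sketch below illustrates only that resolution step, on an item already loaded as JSON. It is not the chart's stagein.yaml, and the optional role-based filter is an assumption added for illustration.

```python
import json
from typing import Dict, Iterable, Optional


def assets_to_stage_in(item_json: str, roles: Optional[Iterable[str]] = None) -> Dict[str, str]:
    """Return {asset_key: href} for the assets of a STAC Item.

    If roles is given, keep only assets carrying at least one of those roles.
    """
    item = json.loads(item_json)
    if item.get("type") != "Feature":
        raise ValueError("input is not a STAC Item (expected type 'Feature')")
    wanted = set(roles) if roles is not None else None
    selected = {}
    for key, asset in item.get("assets", {}).items():
        if wanted is None or wanted & set(asset.get("roles", [])):
            selected[key] = asset["href"]
    return selected
```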
Nevertheless, in this guide we provide an override of the stageout.yaml in order to organise the processing outputs into a STAC Collection that is then pushed to the designated S3 object storage, including support for the user’s workspace storage and resource management services.
The custom stage-out embeds, within the CWL document, the python code required to implement the desired stage-out functionality. This should be regarded as an example that could be adapted for alternative behaviour.
cwlVersion: v1.0
class: CommandLineTool
id: stage-out
doc: "Stage-out the results to S3"
inputs:
  # Output directory of the application workflow (contains its STAC catalog.json)
  wf_outputs:
    type: Directory
  process:
    type: string
  collection_id:
    type: string
  STAGEOUT_OUTPUT:
    type: string
  STAGEOUT_AWS_ACCESS_KEY_ID:
    type: string
  STAGEOUT_AWS_SECRET_ACCESS_KEY:
    type: string
  STAGEOUT_AWS_REGION:
    type: string
  STAGEOUT_AWS_SERVICEURL:
    type: string
outputs:
  StacCatalogUri:
    outputBinding:
      outputEval: ${ return "s3://" + inputs.STAGEOUT_OUTPUT + "/" + inputs.process + "/catalog.json"; }
    type: string
baseCommand:
  - python
  - stageout.py
arguments:
  - $( inputs.wf_outputs.path )
  - $( inputs.STAGEOUT_OUTPUT )
  - $( inputs.process )
  - $( inputs.collection_id )
requirements:
  DockerRequirement:
    dockerPull: ghcr.io/terradue/ogc-eo-application-package-hands-on/stage:1.3.2
  InlineJavascriptRequirement: {}
  EnvVarRequirement:
    envDef:
      AWS_ACCESS_KEY_ID: $( inputs.STAGEOUT_AWS_ACCESS_KEY_ID )
      AWS_SECRET_ACCESS_KEY: $( inputs.STAGEOUT_AWS_SECRET_ACCESS_KEY )
      AWS_REGION: $( inputs.STAGEOUT_AWS_REGION )
      AWS_S3_ENDPOINT: $( inputs.STAGEOUT_AWS_SERVICEURL )
  InitialWorkDirRequirement:
    listing:
      - entryname: stageout.py
        entry: |-
          import sys
          import shutil
          import os
          import pystac
          cat_url = sys.argv[1]
          shutil.copytree(cat_url, "/tmp/catalog")
          cat = pystac.read_file(os.path.join("/tmp/catalog", "catalog.json"))
          ...
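The outputEval above implies a layout in which the STAC catalog ends up under s3://<STAGEOUT_OUTPUT>/<process>/. The embedded stageout.py is truncated here, so the following sketch shows only a key mapping that such an upload could use. It computes the destination of each file without contacting S3. The actual upload, for example with boto3, is deliberately left out, and 'convert-url-1234' is a made-up process value.

```python
import os
import tempfile
from typing import List, Tuple


def catalog_uri(bucket: str, process: str) -> str:
    """Same value as the CWL outputEval for StacCatalogUri."""
    return f"s3://{bucket}/{process}/catalog.json"


def stageout_plan(local_catalog_dir: str, bucket: str, process: str) -> List[Tuple[str, str]]:
    """List (local_file, s3_uri) pairs, preserving the catalog's relative layout
    under s3://<bucket>/<process>/ so that catalog.json lands at catalog_uri()."""
    plan = []
    for root, _dirs, files in os.walk(local_catalog_dir):
        for name in files:
            local = os.path.join(root, name)
            rel = os.path.relpath(local, local_catalog_dir).replace(os.sep, "/")
            plan.append((local, f"s3://{bucket}/{process}/{rel}"))
    return sorted(plan, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Throwaway catalog layout, just to show the resulting destinations
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "catalog.json"), "w").close()
        os.makedirs(os.path.join(d, "item-1"))
        open(os.path.join(d, "item-1", "item-1.json"), "w").close()
        for _local, uri in stageout_plan(d, "eoepca", "convert-url-1234"):
            print(uri)
```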
The helm chart values provide the opportunity to pass through additional inputs, to satisfy the inputs declared in the cwlwrapperAssets files…
workflow:
  inputs:
    STAGEIN_AWS_SERVICEURL: http://data.cloudferro.com
    STAGEIN_AWS_ACCESS_KEY_ID: test
    STAGEIN_AWS_SECRET_ACCESS_KEY: test
    STAGEIN_AWS_REGION: RegionOne
    STAGEOUT_AWS_SERVICEURL: https://minio.192-168-49-2.nip.io
    STAGEOUT_AWS_ACCESS_KEY_ID: eoepca
    STAGEOUT_AWS_SECRET_ACCESS_KEY: changeme
    STAGEOUT_AWS_REGION: RegionOne
    STAGEOUT_OUTPUT: eoepca
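Each STAGEOUT_* value above reaches the stage-out step as a CWL input. It is then exposed to stageout.py as a standard AWS environment variable via the EnvVarRequirement in the stage-out CWL. The sketch below reproduces that mapping. Applying the same mapping to the STAGEIN_* values is an assumption, since the default stagein.yaml is not reproduced in this guide.

```python
from typing import Dict

# environment variable -> workflow-input suffix, as in the stage-out CWL's EnvVarRequirement
ENV_FROM_INPUT = {
    "AWS_ACCESS_KEY_ID": "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY": "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION": "AWS_REGION",
    "AWS_S3_ENDPOINT": "AWS_SERVICEURL",
}


def step_environment(workflow_inputs: Dict[str, str], prefix: str) -> Dict[str, str]:
    """Resolve the AWS environment a wrapper step would see, for prefix
    'STAGEIN' or 'STAGEOUT'. Raises KeyError listing any missing inputs."""
    missing = [f"{prefix}_{suffix}" for suffix in ENV_FROM_INPUT.values()
               if f"{prefix}_{suffix}" not in workflow_inputs]
    if missing:
        raise KeyError(f"missing workflow inputs: {', '.join(missing)}")
    return {env: workflow_inputs[f"{prefix}_{suffix}"] for env, suffix in ENV_FROM_INPUT.items()}
```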
Node Selection
The zoo-project-dru service uses a Node Selector to determine the node(s) upon which the processing execution is run. This is configured as a matching rule in the helm values, and must be tailored to your cluster.
For example, for minikube…
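The selector values must be taken from the labels of your own nodes, which kubectl get nodes --show-labels lists. As a rough model of the rule Kubernetes applies, a node is eligible only if its labels contain every key/value pair of the nodeSelector. The Python below expresses that rule. The node names and labels used with it are hypothetical.

```python
from typing import Dict, List


def node_matches(node_labels: Dict[str, str], node_selector: Dict[str, str]) -> bool:
    """True if the node carries every selector key with exactly the same value."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def eligible_nodes(nodes: Dict[str, Dict[str, str]], node_selector: Dict[str, str]) -> List[str]:
    """Names of nodes whose labels satisfy the nodeSelector. Taints, affinity and
    resource requests also affect real scheduling and are ignored here."""
    return sorted(name for name, labels in nodes.items() if node_matches(labels, node_selector))
```

Label values are strings, so a value such as true should be quoted in the helm values.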
Ingress
Ingress can be enabled and configured to establish (reverse-proxy) external access to the zoo-project-dru services.
Hosturl
If protection is enabled - e.g. via Resource Guard - then ingress should probably be disabled here, since the ingress will instead be handled by the protection.
In this case, the hosturl parameter should be set to reflect the public URL through which the service will be accessed.
If ingress is enabled, it is not necessary to specify the hosturl, since it will be taken from the ingress.hosts[0].host value.
Ingress disabled…
Ingress enabled…
ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: nginx
    ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: letsencrypt-production
  hosts:
    - host: zoo-open.192-168-49-2.nip.io
      paths:
        - path: /
          pathType: ImplementationSpecific
  tls:
    - hosts:
        - zoo-open.192-168-49-2.nip.io
      secretName: zoo-open-tls
The above example assumes that TLS should be enabled via Letsencrypt as certificate provider - see section Letsencrypt Certificates.
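The host resolution rule described under Hosturl can be summarised in a few lines of Python. This is a sketch of the documented behaviour only. It simply ignores hosturl when ingress is enabled, and it does not settle whether an explicit hosturl would override the ingress host.

```python
from typing import Optional


def effective_host(ingress: dict, hosturl: Optional[str] = None) -> str:
    """Host under which zoo-project-dru is reached, per the Hosturl rule."""
    if ingress.get("enabled"):
        hosts = ingress.get("hosts") or []
        if not hosts:
            raise ValueError("ingress.enabled is true but ingress.hosts is empty")
        return hosts[0]["host"]  # taken from ingress.hosts[0].host
    if not hosturl:
        raise ValueError("ingress is disabled, so hosturl must give the public URL")
    return hosturl
```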
Persistence
Several of the services deployed as part of zoo-project-dru rely upon dynamic provisioning of persistent storage volumes.
A number of helm values are impacted by this setting, which must be configured with the Storage Class appropriate to your cluster. For example, using the minikube standard storage class…
workflow:
  storageClass: standard
persistence:
  procServicesStorageClass: standard
  storageClass: standard
  tmpStorageClass: standard
postgresql:
  primary:
    persistence:
      storageClass: standard
  readReplicas:
    persistence:
      storageClass: standard
rabbitmq:
  persistence:
    storageClass: standard
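The storage class is repeated in several places, so it is easy to miss one when adapting the values to another cluster. The helper below lists every key ending in storageClass so that they can be checked together. It is a sketch that assumes the values file has already been parsed into a Python dict, for example with a YAML library (not shown).

```python
from typing import Any, Dict


def storage_class_settings(values: Any, path: str = "") -> Dict[str, Any]:
    """Return {dotted.path: value} for every key whose name ends in
    'storageClass' (case-insensitive) in a nested helm values mapping."""
    found: Dict[str, Any] = {}
    if isinstance(values, dict):
        for key, value in values.items():
            dotted = f"{path}.{key}" if path else str(key)
            if str(key).lower().endswith("storageclass"):
                found[dotted] = value
            else:
                found.update(storage_class_settings(value, dotted))
    return found
```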
Built-in IAM
ZOO-Project DRU has a built-in capability for Identity & Access Management (IAM), in which the zoo-project-dru service is configured as an OIDC client of an OIDC Identity Provider service.
This capability is disabled by the default deployment offered by this guide (iam.enabled: false), which instead (optionally) applies resource protection using the EOEPCA IAM solution. Nevertheless, the built-in IAM can be enabled and configured through helm values.
For example…
iam:
  enabled: true
  openIdConnectUrl: https://keycloak.192-168-49-2.nip.io/realms/master/.well-known/openid-configuration
  type: openIdConnect
  name: OpenIDAuth
  realm: Secured section
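The openIdConnectUrl is the OpenID Connect discovery document of the Keycloak realm, i.e. the realm (issuer) URL with /.well-known/openid-configuration appended. The small Python helper below converts between the two forms. This can help keep the value consistent with settings that expect the bare realm URL, such as the Gatekeeper discovery-url in the Protection section. The helper names are illustrative.

```python
WELL_KNOWN_SUFFIX = "/.well-known/openid-configuration"


def discovery_url(issuer: str) -> str:
    """OpenID Connect Discovery: provider metadata is served at the issuer URL
    followed by /.well-known/openid-configuration."""
    return issuer.rstrip("/") + WELL_KNOWN_SUFFIX


def issuer_from_discovery(url: str) -> str:
    """Inverse of discovery_url(), e.g. to derive a realm URL from openIdConnectUrl."""
    if not url.endswith(WELL_KNOWN_SUFFIX):
        raise ValueError(f"not an OIDC discovery URL: {url}")
    return url[: -len(WELL_KNOWN_SUFFIX)]
```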
Protection
As described in section Resource Protection (Keycloak), the identity-gatekeeper component can be inserted into the request path of the zoo-project-dru service to provide access authorization decisions.
Gatekeeper
Gatekeeper is deployed using its helm chart…
helm install zoo-project-dru-protection identity-gatekeeper -f zoo-protection-values.yaml \
--repo https://eoepca.github.io/helm-charts \
--namespace "zoo" --create-namespace \
--version 1.0.11
The identity-gatekeeper must be configured with the values applicable to the zoo-project-dru - in particular the specific ingress requirements for the zoo-project-dru-service…
Example zoo-protection-values.yaml…
fullnameOverride: zoo-project-dru-protection
config:
  client-id: ades
  discovery-url: https://keycloak.192-168-49-2.nip.io/realms/master
  cookie-domain: 192-168-49-2.nip.io
targetService:
  host: zoo.192-168-49-2.nip.io
  name: zoo-project-dru-service
  port:
    number: 80
secrets:
  # Values for secret 'zoo-project-dru-protection'
  # Note - if omitted, these can instead be set by creating the secret independently.
  clientSecret: "changeme"
  encryptionKey: "changemechangeme"
ingress:
  enabled: true
  className: nginx
  annotations:
    ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: letsencrypt-production
  serverSnippets:
    custom: |-
      # Open access to some endpoints, including Swagger UI
      location ~ /(ogc-api/api|swagger-ui) {
        proxy_pass {{ include "identity-gatekeeper.targetUrl" . }}$request_uri;
      }
Keycloak Client
The Gatekeeper instance relies upon an associated client configured within Keycloak - ref. client-id: ades above.
This can be created with the create-client helper script, as described in section Client Registration.
For example, with path protection for test users…
../bin/create-client \
-a https://keycloak.192-168-49-2.nip.io \
-i https://identity-api.192-168-49-2.nip.io \
-r "master" \
-u "admin" \
-p "changeme" \
-c "admin-cli" \
--id=ades \
--name="ADES Gatekeeper" \
--secret="changeme" \
--description="Client to be used by ADES Gatekeeper" \
--resource="eric" --uris='/eric/*' --scopes=view --users="eric" \
--resource="bob" --uris='/bob/*' --scopes=view --users="bob" \
--resource="alice" --uris='/alice/*' --scopes=view --users="alice"
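The --resource/--uris/--users options above are intended to make each user's path prefix reachable only by that user. The following Python is a deliberately simplified model of that intent using shell-style wildcards. It is not Keycloak's policy evaluation. Treating unmatched paths as open is an assumption of the model; in the actual deployment, access to paths such as swagger-ui is governed by the Gatekeeper configuration.

```python
from fnmatch import fnmatchcase
from typing import Dict, Set

# Resource URI patterns and permitted users, mirroring the create-client call above
PROTECTED: Dict[str, Set[str]] = {
    "/eric/*": {"eric"},
    "/bob/*": {"bob"},
    "/alice/*": {"alice"},
}


def is_allowed(path: str, user: str, protected: Dict[str, Set[str]] = PROTECTED) -> bool:
    """Simplified model: a path matching any protected pattern is allowed only for
    users listed against a matching pattern; unmatched paths are treated as open."""
    matches = [users for pattern, users in protected.items() if fnmatchcase(path, pattern)]
    if not matches:
        return True
    return any(user in users for users in matches)
```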
Service URLs
The zoo-project-dru service provides a multi-user aware set of service interfaces at…
- OGC API Processes: https://zoo.192-168-49-2.nip.io/<username>/ogc-api/
- Swagger UI: https://zoo.192-168-49-2.nip.io/swagger-ui/oapip/
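Because the interfaces are per user, the username forms part of every OGC API Processes path. The sketch below builds the process list URL and summarises a process list response by id and version. This is a quick way to confirm that a deployed application reports the softwareVersion from its CWL. The sample document in the usage block is hand-written, not real server output. Fetching the list, e.g. with urllib.request.urlopen plus any required token, is left out.

```python
import json
from typing import Dict, Optional


def processes_url(base_url: str, username: str) -> str:
    """Per-user OGC API Processes process list endpoint."""
    return f"{base_url.rstrip('/')}/{username}/ogc-api/processes"


def process_versions(process_list_json: str) -> Dict[str, Optional[str]]:
    """Map process id -> version from a process list document
    of the form {"processes": [...], "links": [...]}."""
    doc = json.loads(process_list_json)
    return {p["id"]: p.get("version") for p in doc.get("processes", [])}


if __name__ == "__main__":
    sample = '{"processes": [{"id": "convert-url", "version": "0.1.2"}], "links": []}'
    print(processes_url("https://zoo.192-168-49-2.nip.io", "eric"))
    print(process_versions(sample))
```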
Usage Samples
See the Example Requests in the Processing Deployment for sample requests that can be used to test your deployment, and to learn usage of the OGC API Processes.
Debugging Tips
This section includes some tips that may be useful in debugging errors with deployed application packages.
For debugging, establish a shell session with the zoofpm pod…
Execution Logs
The logs are in the directory /tmp/zTmp…
In the log directory, each execution is characterised by a set of files/directories…
- <appname>_<jobid>_error.log - The main log output of the job (START HERE)
- <appname>_<jobid>.json - The output (results) of the job
- <jobid>_status.json - The overall status of the job
- <jobid>_logs.cfg - Index of logs for job workflow steps
- convert-url-c6637d4a-d561-11ee-bf3b-0242ac11000e (directory) - Subdirectory with a dedicated log file for each step of the CWL workflow, including the stage-in and stage-out steps
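When several jobs have run, it helps to gather the files for one job id in one go. The sketch below groups the entries listed above by glob pattern. It is intended to be run inside the zoofpm pod and assumes a Python 3 interpreter is available there. The <appname>-<jobid> naming of the step directory is inferred from the single example above.

```python
import sys
import tempfile
from pathlib import Path
from typing import Dict, List


def job_log_files(job_id: str, log_dir: str = "/tmp/zTmp") -> Dict[str, List[Path]]:
    """Group the files/directories of one job, following the naming listed above."""
    base = Path(log_dir)
    return {
        "error_log": sorted(base.glob(f"*_{job_id}_error.log")),  # start here
        "result": sorted(base.glob(f"*_{job_id}.json")),
        "status": sorted(base.glob(f"{job_id}_status.json")),
        "logs_index": sorted(base.glob(f"{job_id}_logs.cfg")),
        # Step-log directory, assumed to be named <appname>-<jobid>
        "step_dirs": sorted(p for p in base.glob(f"*-{job_id}") if p.is_dir()),
    }


def _demo() -> None:
    """Build a throwaway directory mimicking the layout and show the grouping."""
    job = "c6637d4a-d561-11ee-bf3b-0242ac11000e"
    with tempfile.TemporaryDirectory() as d:
        for name in (f"convert-url_{job}_error.log", f"convert-url_{job}.json",
                     f"{job}_status.json", f"{job}_logs.cfg"):
            (Path(d) / name).touch()
        (Path(d) / f"convert-url-{job}").mkdir()
        for kind, paths in job_log_files(job, d).items():
            print(kind, [p.name for p in paths])


if __name__ == "__main__":
    if len(sys.argv) > 1:
        for kind, paths in job_log_files(sys.argv[1]).items():
            print(kind, [str(p) for p in paths])
    else:
        _demo()
```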
Deployed Process ‘Executables’
When the process is deployed from its Application Package, a representation is created using the configured cookiecutter.templateUrl.
It may be useful to debug the resulting process files, which are located under the path /opt/zooservices_user/<username>, with a dedicated subdirectory for each deployed process - i.e. /opt/zooservices_user/<username>/<appname>/.
For example…
$ cd /opt/zooservices_user/eric/convert-url
$ ls -l
total 28
-rw-rw-r-- 1 www-data www-data 0 Feb 27 11:17 __init__.py
drwxrwxr-x 2 www-data www-data 4096 Feb 27 11:17 __pycache__
-rw-rw-r-- 1 www-data www-data 1408 Feb 27 11:17 app-package.cwl
-rw-rw-r-- 1 www-data www-data 17840 Feb 27 11:17 service.py
Note
In the case that the cookie-cutter template is updated, then the process can be re-deployed to force a refresh against the updated template.
Swagger UI (OpenAPI)
The zoo-project-dru service includes a Swagger UI interactive representation of its OpenAPI REST interface - available at the URL https://zoo.192-168-49-2.nip.io/swagger-ui/oapip/.
Application Package Example
For a (trivial) example application package see Example Application Package, which provides a description and illustration of the basics of creating an application that integrates with the expectations of the ADES stage-in and stage-out.
For further reference see…
- Application Packages
- Common Workflow Language (CWL)
Additional Information
Additional information regarding the ADES can be found at:
- ZOO-Project DRU…
- Git Repositories…
  - ZOO-Project - Core OGC API Processes capability
  - eoepca-proc-service-template - Cookie-cutter template for Application Package execution in Kubernetes
  - zoo-calrissian-runner - Python library used by the eoepca-proc-service-template to aid orchestration of CWL application packages running in Kubernetes via Calrissian
  - pycalrissian - Python library used by zoo-calrissian-runner to aid interfacing with Calrissian and Kubernetes
- ZOO-Project