Common Workflow Language (CWL) Workflow Description, v1.2.1 §
This version:
Latest stable version:
Authors:
- Peter Amstutz peter.amstutz@curii.com, Curii Inc. / Arvados; https://orcid.org/0000-0003-3566-7705
- Michael R. Crusoe mrc@commonwl.org, CWL Project Lead; https://orcid.org/0000-0002-2961-9670
- Kaushik Ghose kaushik.ghose@sbgenomics.com, Seven Bridges Genomics, Inc; https://orcid.org/0000-0003-2933-1260
Contributors to v1.2:
- John Chilton jmchilton@gmail.com, Galaxy Project, Pennsylvania State University; https://orcid.org/0000-0002-6794-0756
- Michael Franklin michael.franklin@petermac.org, Peter MacCallum Cancer Centre and University of Melbourne; https://orcid.org/0000-0001-9292-1533
- Bogdan Gavrilovic bogdan.gavrilovic@sbgenomics.com, Seven Bridges Genomics; https://orcid.org/0000-0003-1550-1716
- Stian Soiland-Reyes, University of Manchester; https://orcid.org/0000-0001-9842-9718
Incorporates the work of past authors and contributors to CWL v1.0 and CWL v1.1.
This standard was approved on 2020-08-07 by the CWL leadership team consisting of:
- Peter Amstutz, Curii Inc. / Arvados; https://orcid.org/0000-0003-3566-7705
- John Chilton, Pennsylvania State University / Galaxy Project; https://orcid.org/0000-0002-6794-0756
- Michael R. Crusoe, CWL Project Lead; https://orcid.org/0000-0002-2961-9670
- Brandi Davis Dusenbery, Seven Bridges Genomics, Inc.; https://orcid.org/0000-0001-7811-8613
- Jeff Gentry, Foundation Medicine; https://orcid.org/0000-0001-5351-8442
- Hervé Ménager, Institut Pasteur; https://orcid.org/0000-0002-7552-1009
- Stian Soiland-Reyes, University of Manchester; https://orcid.org/0000-0001-9842-9718
Publisher: Common Workflow Language project, a member project of Software Freedom Conservancy
Abstract §
This specification defines the Common Workflow Language (CWL) Workflow description, a vendor-neutral standard for representing analysis tasks where a sequence of operations is described using a directed graph of operations to transform input to output. CWL is portable across a variety of computing platforms.
Status of this document §
This document is the product of the Common Workflow Language working group. The source for the latest version of this document is available at
https://github.com/common-workflow-language/cwl-v1.2/
The products of the CWL working group (including this document) are made available under the terms of the Apache License, version 2.0.
1. Introduction §
The Common Workflow Language (CWL) working group is an informal, multi-vendor working group consisting of various organizations and individuals that have an interest in portability of data analysis workflows. The goal is to create specifications like this one that enable data scientists to describe analysis tools and workflows that are powerful, easy to use, portable, and support reproducibility.
1.1 Introduction to the CWL Workflow standard v1.2.1 §
There are no new features nor behavior changes in CWL v1.2.1 as compared to CWL v1.2.0. v1.2.1 fixes only typos, adds clarifications, and adds additional conformance tests. Some changes to the schema defining CWL have been made to aid the auto-generation of libraries for the reading and writing of CWL documents.
Documents should continue to specify cwlVersion: v1.2. However, when reporting results from running the CWL conformance tests, please do report all three components; for example "99% of CWL v1.2.0 required tests" or "100% of CWL v1.2.1 required tests".
See also the CommandLineTool v1.2.1 changelog and the Schema-Salad v1.2.1 changelog.
1.2 Changelog for v1.2.1 §
- CWL has been assigned an official IANA Media Type of application/cwl for either JSON or YAML format. For JSON formatted CWL documents, application/cwl+json has also been assigned and can be used. For specifying a YAML formatted CWL document, one can use application/cwl+yaml. The above has been documented in the Syntax section.
- There is now an unofficial JSON Schema for CWL documents, donated by Francis Charette-Migneault. This schema captures much, but not all, of the potential complexity of CWL documents. It was created for the draft OGC API - Processes - Part 2: Deploy, Replace, Undeploy standard. To support the testing of this unofficial JSON Schema for CWL, some of the should_fail: true tests have had the label json_schema_invalid added.
- For consistency, all references to URIs have been replaced with IRIs (Internationalized Resource Identifiers).
- The WorkflowStep.run field description now explicitly states that it can be either a string referencing an external document or an embedded Process. This was previously only stated indirectly.
- The outputSource field of WorkflowOutputParameter now explicitly states that workflow inputs can be referenced. The mandatory conformance test output_reference_workflow_input has been added to confirm this.
- The example list of process requirements that can be inherited from a parent Workflow by a CommandLineTool was incomplete in CWL v1.2; LoadListingRequirement, WorkReuse, NetworkAccess, InplaceUpdateRequirement, ToolTimeLimit are also valid.
- The BNF grammar description of CWL Parameter References has been reformatted so that symbols get code formatting.
- In CWL v1.2, the outputs of ExpressionTools are never type-checked due to a long-standing bug in the CWL reference implementation. This has been made explicit along with the plan to fix this oversight in CWL v1.3.
- The purpose and valid circumstances for using Workflow.id, ExpressionTool.id, or Operation.id have been made more explicit: it is a unique identifier for that Process, and it is only useful when those are in a $graph. This id value should not be exposed to users in graphical or terminal user interfaces.
1.2.1 Clarifications to the schema in CWL v1.2.1 to aid autogenerated libraries §
Many CWL parsing/generating libraries are autogenerated from the official schema for various programming languages by using schema-salad --codegen.
In CWL v1.2.1 we made many clarifications to the schema to enable faster parsing; or to produce better results for end users. These changes do not change the CWL syntax or its meaning; we are just now modeling it better.
- The schema for Requirements has changed to enable faster parsing by autogenerated libraries. The class field is now a static enum with a single permissible value instead of a generic string (for example: class: SubworkflowFeatureRequirement for a SubworkflowFeatureRequirement hint or requirement). This allows autogenerated CWL parsers to recognize any requirement immediately instead of having to check for matching field names and valid values, as was done previously.
- Likewise, the schema for Workflow, ExpressionTool, and Operation has also been changed to enable faster parsing; the class field is now a static enum with a single permissible value (class: Workflow, class: ExpressionTool, class: Operation) instead of a generic string.
- The schema for the hints field of Workflow, ExpressionTool, and Operation has been expanded from: Any[]? to ["null", { type: array, items: [ ProcessRequirement, Any] } ]. This allows autogenerated CWL parsers to deserialize any of the standard CWL hints instead of forcing the users of those parsers to convert the unserialized hints to normal objects themselves.
- The schema for WorkflowOutputParameter.outputSource had the wrong refScope of 0 instead of 1; the corrected value removes the id of the workflow itself when searching for the source of this output.
- Everywhere the schema allows a value of type long we also explicitly allow a value of type int: File.size, ToolTimeLimit.timelimit. By JSON rules this is implicit, but by making it explicit we aid autogenerated CWL libraries especially in languages such as Java.
- The schema for the default field of WorkflowInputParameter, WorkflowStepInput, and OperationInputParameter has been expanded from Any? to ["null", File, Directory, Any] so that autogenerated CWL libraries will deserialize any File or Directory objects automatically for the user.
1.2.2 Updated Conformance Tests for v1.2.1 §
- Conformance tests are now referred to by their textual identifiers (id). Previously this was the label field. Tests without a label/id have been given one.
- direct_required, direct_required_nojs, conditionals_nested_cross_scatter, conditionals_nested_cross_scatter_nojs: Marked the workflow outputs as optional to remove ambiguity for these conditional when tests; allowing conformant CWL runners to be more strict in their interpretation of the typing rules, if they choose so.
- timelimit_basic_wf: The timeout has been increased from three seconds to eight seconds to accommodate some runners who count container startup time in the total.
- timelimit_invalid_wf: The timing on this test was updated from shorter values to accommodate the startup time of certain container runners. The previous timelimit of 5 seconds was too short, which is why it is now 20 seconds.
- The file tests/wc-tool.cwl was adapted to produce the same results on BSD systems (like macOS) as GNU/Linux systems. This improved compatibility for the following tests: nested_workflow_noexp, wf_wc_parseInt, nested_workflow, embedded_subworkflow, step_input_default_value_overriden_2nd_step_noexp, step_input_default_value_overriden_2nd_step, step_input_default_value_overriden_2nd_step_null_noexp, step_input_default_value_overriden_2nd_step_null, step_input_default_value_overriden_noexp, step_input_default_value_nosource, step_input_default_value_nullsource, step_input_default_value_overriden, scatter_multi_input_embedded_subworkflow, workflow_embedded_subworkflow_embedded_subsubworkflow, workflow_embedded_subworkflow_with_tool_and_subsubworkflow, workflow_embedded_subworkflow_with_subsubworkflow_and_tool, scatter_embedded_subworkflow, step_input_default_value_noexp, step_input_default_value, valuefrom_wf_step.
1.2.3 New Mandatory Conformance tests for v1.2.1 §
- output_reference_workflow_input: Test direct use of Workflow level input fields in the outputs.
1.2.4 New Optional Conformance Tests for v1.2.1 §
1.2.4.1 SchemaDefRequirement §
- schemadef_types_with_import: Test SchemaDefRequirement with a workflow, with the $import under types. It is similar to schemadef-wf, but the $import is different.
1.2.4.2 ScatterFeatureRequirement §
- simple_simple_scatter: Two level nested scatter.
- dotproduct_simple_scatter: Two level nested scatter: external dotproduct and internal simple.
- simple_dotproduct_scatter: Two level nested scatter: external simple and internal dotproduct.
- dotproduct_dotproduct_scatter: Two level nested scatter: external dotproduct and internal dotproduct.
- flat_crossproduct_simple_scatter: Two level nested scatter: external flat_crossproduct and internal simple.
- simple_flat_crossproduct_scatter: Two level nested scatter: external simple and internal flat_crossproduct.
- flat_crossproduct_flat_crossproduct_scatter: Two level nested scatter: external flat_crossproduct and internal flat_crossproduct.
- nested_crossproduct_simple_scatter: Two level nested scatter: external nested_crossproduct and internal simple.
- simple_nested_crossproduct_scatter: Two level nested scatter: external simple and internal nested_crossproduct.
- nested_crossproduct_nested_crossproduct_scatter: Two level nested scatter: external nested_crossproduct and internal nested_crossproduct.
1.2.4.3 StepInputExpressionRequirement §
- default_with_falsey_value: Confirms that "false"-like (but not 'null') values override any default.
1.3 Introduction to CWL Workflow standard v1.2 §
This specification represents the latest stable release from the CWL group. Since the v1.1 release, v1.2 introduces the following updates to the CWL Workflow standard.
Documents should use cwlVersion: v1.2 to make use of new syntax and features introduced in v1.2. Existing v1.1 documents should be trivially updatable by changing cwlVersion; however, CWL documents that relied on previously undefined or underspecified behavior may have slightly different behavior in v1.2. See the note about cwl-upgrader in the changelog.
1.4 Changelog §
- Adds when field to WorkflowStep for conditional execution
- Adds pickValue field to WorkflowStepInput and WorkflowOutputParameter for selecting among null and non-null source values
- Add abstract Operation that can be used as a no-op stand-in to describe abstract workflows.
- Workflow, ExpressionTool and Operation can now express intent with an identifier for the type of computational operation.
- Clarify there are no limits on the size of file literal contents.
- When using loadContents it now must fail when attempting to load a file greater than 64 KiB instead of silently truncating the data.
- Note that only enum and record types can be typedef-ed
- Escaping in string interpolation has been added to the specification along with conformance tests.
- Added discussion of packed documents.
- Specify behavior when source is a single-item list and no linkMerge is set.
- Added discussion about handling different document versions.
- Added definition of data link
See also the CWL Command Line Tool Description, v1.2 changelog. For other changes since CWL v1.0, see the CWL Workflow Description, v1.1 changelog.
cwl-upgrader can be used for upgrading CWL documents from version draft-3, v1.0, and v1.1 to v1.2.
1.5 Purpose §
The Common Workflow Language Workflow Description expresses workflows for data-intensive science, such as bioinformatics, physics, astronomy, geoscience, and machine learning. This specification is intended to define a data and execution model for Workflows that can be implemented on top of a variety of computing platforms, ranging from an individual workstation to cluster, grid, cloud, and high performance computing systems. Details related to execution of these workflows that are not laid out in this specification are open to interpretation by the computing platform implementing this specification.
1.6 References to other specifications §
Javascript Object Notation (JSON): http://json.org
JSON Linked Data (JSON-LD): http://json-ld.org
YAML: http://yaml.org
Avro: https://avro.apache.org/docs/1.8.1/spec.html
Internationalized Resource Identifiers (IRIs): https://tools.ietf.org/html/rfc3987
Portable Operating System Interface (POSIX.1-2008): http://pubs.opengroup.org/onlinepubs/9699919799/
Resource Description Framework (RDF): http://www.w3.org/RDF/
XDG Base Directory Specification: https://specifications.freedesktop.org/basedir-spec/basedir-spec-0.6.html
1.7 Scope §
This document describes CWL syntax, execution, and object model. It is not intended to document a specific CWL implementation; however, it may serve as a reference for the behavior of conforming implementations.
1.8 Terminology §
The terminology used to describe CWL documents is defined in the Concepts section of the specification. The terms defined in the following list are used in building those definitions and in describing the actions of a CWL implementation:
may: Conforming CWL documents and CWL implementations are permitted but not required to behave as described.
must: Conforming CWL documents and CWL implementations are required to behave as described; otherwise they are in error.
error: A violation of the rules of this specification; results are undefined. Conforming implementations may detect and report an error and may recover from it.
fatal error: A violation of the rules of this specification; results are undefined. Conforming implementations must not continue to execute the current process and may report an error.
at user option: Conforming software may or must (depending on the modal verb in the sentence) behave as described; if it does, it must provide users a means to enable or disable the behavior described.
deprecated: Conforming software may implement a behavior for backwards compatibility. Portable CWL documents should not rely on deprecated behavior. Behavior marked as deprecated may be removed entirely from future revisions of the CWL specification.
1.9 Glossary §
Opaque strings: Opaque strings (or opaque identifiers, opaque values) are nonsensical values that are swapped out with a real value later in the evaluation process. Workflow and tool expressions should not rely on them nor try to parse them.
2. Data model §
2.1 Data concepts §
An object is a data structure equivalent to the "object" type in JSON, consisting of an unordered set of name/value pairs (referred to here as fields) and where the name is a string and the value is a string, number, boolean, array, or object.
A document is a file containing a serialized object, or an array of objects.
A process is a basic unit of computation which accepts input data, performs some computation, and produces output data. Examples include CommandLineTools, Workflows, and ExpressionTools.
An input object is an object describing the inputs to an invocation of a process. The fields of the input object are referred to as "input parameters".
An output object is an object describing the output resulting from an invocation of a process. The fields of the output object are referred to as "output parameters".
An input schema describes the valid format (required fields, data types) for an input object.
An output schema describes the valid format for an output object.
Metadata is information about workflows, tools, or input items.
2.2 Syntax §
CWL documents must consist of an object or array of objects represented using JSON or YAML syntax. Upon loading, a CWL implementation must apply the preprocessing steps described in the Semantic Annotations for Linked Avro Data (SALAD) Specification. An implementation may formally validate the structure of a CWL document using SALAD schemas located at https://github.com/common-workflow-language/cwl-v1.2/
The official IANA media-type for CWL documents is application/cwl for either JSON or YAML format. For JSON formatted CWL documents, application/cwl+json can be used. For specifying a YAML formatted CWL document, one can use application/cwl+yaml.
CWL documents commonly reference other CWL documents. Each document must declare the cwlVersion of that document. Implementations must validate against the document's declared version. Implementations should allow workflows to reference documents of both newer and older CWL versions (up to the highest version of CWL supported by that implementation). Where the runtime environment or runtime behavior has changed between versions, for that portion of the execution an implementation must provide runtime environment and behavior consistent with the document's declared version. An implementation must not expose a newer feature when executing a document that specifies an older version that does not include that feature.
2.2.1 map §
Note: This section is non-normative.
type: array<ComplexType> | map<key_field, ComplexType>
The above syntax in the CWL specifications means there are two or more ways to write the given value.
Option one is an array and is the most verbose option.
Option one generic example:
some_cwl_field:
  - key_field: a_complex_type1
    field2: foo
    field3: bar
  - key_field: a_complex_type2
    field2: foo2
    field3: bar2
  - key_field: a_complex_type3
Option one specific example using Workflow.inputs:
array<InputParameter> | map<id, type | InputParameter>
inputs:
  - id: workflow_input01
    type: string
  - id: workflow_input02
    type: File
    format: http://edamontology.org/format_2572
Option two is enabled by the map<…> syntax. Instead of an array of entries we use a mapping, where one field of the ComplexType (here named key_field) becomes the key in the map, and its value is the rest of the ComplexType without the key field. If all of the other fields of the ComplexType are optional and unneeded, then we can indicate this with an empty mapping as the value: a_complex_type3: {}
Option two generic example:
some_cwl_field:
  a_complex_type1:  # this was the "key_field" from above
    field2: foo
    field3: bar
  a_complex_type2:
    field2: foo2
    field3: bar2
  a_complex_type3: {}  # we accept the default values for "field2" and "field3"
Option two specific example using Workflow.inputs:
array<InputParameter> | map<id, type | InputParameter>
inputs:
  workflow_input01:
    type: string
  workflow_input02:
    type: File
    format: http://edamontology.org/format_2572
Option two specific example using SoftwareRequirement.packages:
array<SoftwarePackage> | map<package, specs | SoftwarePackage>
hints:
  SoftwareRequirement:
    packages:
      sourmash:
        specs: [ https://doi.org/10.21105/joss.00027 ]
      screed:
        version: [ "1.0" ]
      python: {}
Sometimes we have a third and even more compact option denoted like this:
type: array<ComplexType> | map<key_field, field2 | ComplexType>
For this example, if we only need the key_field and field2 when specifying our ComplexTypes (because the other fields are optional and we are fine with their default values) then we can abbreviate.
Option three generic example:
some_cwl_field:
  a_complex_type1: foo   # we accept the default value for field3
  a_complex_type2: foo2  # we accept the default value for field3
  a_complex_type3: {}    # we accept the default values for "field2" and "field3"
Option three specific example using Workflow.inputs:
array<InputParameter> | map<id, type | InputParameter>
inputs:
  workflow_input01: string
  workflow_input02: File  # we accept the default of no File format
Option three specific example using SoftwareRequirement.packages:
array<SoftwarePackage> | map<package, specs | SoftwarePackage>
hints:
  SoftwareRequirement:
    packages:
      sourmash: [ https://doi.org/10.21105/joss.00027 ]
      python: {}
What if we want to mix options 2 and 3 for different entries? You can!
Mixed option 2 and 3 generic example:
some_cwl_field:
  my_complex_type1: foo  # we accept the default value for field3
  my_complex_type2:
    field2: foo2
    field3: bar2         # we did not accept the default value for field3
                         # so we had to use the slightly expanded syntax
  my_complex_type3: {}   # as before, we accept the default values for both
                         # "field2" and "field3"
Mixed option 2 and 3 specific example using Workflow.inputs:
array<InputParameter> | map<id, type | InputParameter>
inputs:
  workflow_input01: string
  workflow_input02:  # we use the longer way
    type: File       # because we want to specify the "format" too
    format: http://edamontology.org/format_2572
Mixed option 2 and 3 specific example using SoftwareRequirement.packages:
array<SoftwarePackage> | map<package, specs | SoftwarePackage>
hints:
  SoftwareRequirement:
    packages:
      sourmash: [ https://doi.org/10.21105/joss.00027 ]
      screed:
        specs: [ https://github.com/dib-lab/screed ]
        version: [ "1.0" ]
      python: {}
Note: The map<…> (compact) versions are optional for users; the verbose option #1 is always allowed, but for presentation reasons options 3 and 2 may be preferred by human readers. Consumers of CWL must support all three options.
The normative explanation for these variations, aimed at implementers, is in the Schema Salad specification.
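As a non-normative illustration, the three options above can be converted to the verbose array form with a few lines of Python. This is a minimal sketch, not the Schema Salad algorithm: the function name normalize_map and its parameters key_field and predicate are hypothetical, and the sketch assumes that any mapping value is an option two body, which ignores corner cases handled by the Schema Salad specification.

```python
def normalize_map(value, key_field, predicate=None):
    """Rewrite the compact map forms (options two and three) as the
    verbose array form (option one). Arrays are returned unchanged.

    key_field and predicate are hypothetical parameter names standing in
    for the two names inside map<key_field, predicate | ComplexType>.
    """
    if isinstance(value, list):
        return value  # already option one
    entries = []
    for key, body in value.items():
        if isinstance(body, dict):
            # Option two: the mapping holds the remaining fields
            # (an empty mapping means "accept all defaults").
            entry = {key_field: key, **body}
        elif predicate is not None:
            # Option three: the bare value belongs to the predicate field.
            entry = {key_field: key, predicate: body}
        else:
            raise ValueError(f"{key!r}: compact value needs a predicate field")
        entries.append(entry)
    return entries


# Mixed option 2 and 3 example from the text, using Workflow.inputs.
inputs = {
    "workflow_input01": "string",
    "workflow_input02": {
        "type": "File",
        "format": "http://edamontology.org/format_2572",
    },
}
verbose_inputs = normalize_map(inputs, key_field="id", predicate="type")
```

With the Workflow.inputs example, verbose_inputs holds the same two entries as the option one listing, each carrying an explicit id field.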
2.3 Identifiers §
If an object contains an id field, that is used to uniquely identify the object in that document. The value of the id field must be unique over the entire document. Identifiers may be resolved relative to either the document base and/or other identifiers following the rules described in the Schema Salad specification.
An implementation may choose to only honor references to object types for which the id field is explicitly listed in this specification.
2.4 Document preprocessing §
An implementation must resolve $import and $include directives as described in the Schema Salad specification.
Another transformation defined in Schema salad is simplification of data type definitions.
Type <T> ending with ? should be transformed to [<T>, "null"].
Type <T> ending with [] should be transformed to {"type": "array", "items": <T>}.
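The two rewriting rules can be sketched in Python as follows. The function name expand_type is hypothetical, and applying the rules recursively to combined suffixes such as string[]? is an assumption of this sketch; the Schema Salad specification remains the normative reference.

```python
def expand_type(t):
    """Expand the '?' and '[]' type shorthands into their long forms.

    Non-string types (already expanded lists or mappings) are returned
    unchanged. Recursion on the remaining prefix is an assumption made
    here so that combined suffixes such as 'string[]?' are handled.
    """
    if not isinstance(t, str):
        return t
    if t.endswith("?"):
        # <T>? becomes [<T>, "null"]
        return [expand_type(t[:-1]), "null"]
    if t.endswith("[]"):
        # <T>[] becomes {"type": "array", "items": <T>}
        return {"type": "array", "items": expand_type(t[:-2])}
    return t


optional_file = expand_type("File?")
string_array = expand_type("string[]")
```

Here optional_file is ["File", "null"] and string_array is {"type": "array", "items": "string"}.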
2.5 Extensions and metadata §
Input metadata (for example, a sample identifier) may be represented within a tool or workflow using input parameters which are explicitly propagated to output. Future versions of this specification may define additional facilities for working with input/output metadata.
Implementation extensions not required for correct execution (for example, fields related to GUI presentation) and metadata about the tool or workflow itself (for example, authorship for use in citations) may be provided as additional fields on any object. Such extension fields must use a namespace prefix listed in the $namespaces section of the document as described in the Schema Salad specification.
It is recommended that concepts from schema.org are used whenever possible. For the $schemas field we recommend their RDF encoding: https://schema.org/version/latest/schemaorg-current-https.rdf
Implementation extensions which modify execution semantics must be listed in the requirements field.
2.6 Packed documents §
A "packed" CWL document is one that contains multiple process objects. This makes it possible to store and transmit a Workflow together with the processes of each of its steps in a single file.
There are two methods to create packed documents: embedding and $graph. Both can appear in the same document.
"Embedding" is where the entire process object is copied into the run field of a workflow step. If the step process is a subworkflow, it can be processed recursively to embed the processes of the subworkflow steps, and so on. Embedded process objects may optionally include id fields.
A "$graph" document does not have a process object at the root. Instead, there is a $graph field which consists of a list of process objects. Each process object must have an id field. Workflow run fields cross-reference other processes in the document $graph using the id of the process object.
All process objects in a packed document must validate and execute as the cwlVersion appearing at the top level. A cwlVersion field appearing anywhere other than the top level must be ignored.
When executing a packed document, the reference to the document may include a fragment identifier. If present, the fragment identifier specifies the id of the process to execute.
If the reference to the packed document does not include a fragment identifier, the runner must choose the top-level process object as the entry point. If there is no top-level process object (as in the case of $graph) then the runner must choose the process object with an id of #main. If there is no #main object, the runner must return an error.
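The entry point rules above can be sketched as a small Python function. This is an illustration under simplifying assumptions, not how any particular runner does it: the name select_entry_point is hypothetical, identifiers are compared after stripping a leading "#" instead of being resolved to full IRIs, and a fragment is looked up only among the $graph entries or the top-level process, not among embedded processes.

```python
def select_entry_point(document, fragment=None):
    """Pick the process object to execute from a loaded CWL document.

    document is the parsed top-level object; fragment is the optional
    fragment identifier from the reference to the document.
    """
    def short_id(process):
        # Simplification: compare ids without the leading '#'.
        return str(process.get("id", "")).lstrip("#")

    graph = document.get("$graph")
    if fragment is not None:
        candidates = graph if graph is not None else [document]
        wanted = fragment.lstrip("#")
        for process in candidates:
            if short_id(process) == wanted:
                return process
        raise ValueError(f"no process with id {fragment!r}")
    if graph is None:
        return document  # the top-level process object is the entry point
    for process in graph:
        if short_id(process) == "main":
            return process
    raise ValueError("packed document has no #main process")


packed = {
    "cwlVersion": "v1.2",
    "$graph": [
        {"id": "#tool", "class": "CommandLineTool"},
        {"id": "#main", "class": "Workflow"},
    ],
}
default_entry = select_entry_point(packed)
named_entry = select_entry_point(packed, fragment="#tool")
```

In this example default_entry is the Workflow with id #main and named_entry is the CommandLineTool selected by the fragment.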
3. Execution model §
3.1 Execution concepts §
A parameter is a named symbolic input or output of a process, with an associated datatype or schema. During execution, values are assigned to parameters to make the input object or output object used for concrete process invocation.
A CommandLineTool is a process characterized by the execution of a standalone, non-interactive program which is invoked on some input, produces output, and then terminates.
A workflow is a process characterized by multiple subprocess steps, where step outputs are connected to the inputs of downstream steps to form a directed acyclic graph, and independent steps may run concurrently.
A runtime environment is the actual hardware and software environment when executing a command line tool. It includes, but is not limited to, the hardware architecture, hardware resources, operating system, software runtime (if applicable, such as the specific Python interpreter or the specific Java virtual machine), libraries, modules, packages, utilities, and data files required to run the tool.
A workflow platform is a specific hardware and software implementation capable of interpreting CWL documents and executing the processes specified by the document. The responsibilities of the workflow platform may include scheduling process invocation, setting up the necessary runtime environment, making input data available, invoking the tool process, and collecting output.
A data link is a connection from a "Source" parameter to a "Sink" parameter. A data link expresses that when a value becomes available for the source parameter, that value should be copied to the "sink" parameter. Reflecting the direction of data flow, a data link is described as "outgoing" from the source and "inbound" to the sink.
A workflow platform may choose to only implement the Command Line Tool Description part of the CWL specification.
It is intended that the workflow platform has broad leeway outside of this specification to optimize use of computing resources and enforce policies not covered by this specification. Some areas that are currently out of scope for CWL specification but may be handled by a specific workflow platform include:
- Data security and permissions
- Scheduling tool invocations on remote cluster or cloud compute nodes.
- Using virtual machines or operating system containers to manage the runtime (except as described in DockerRequirement).
- Using remote or distributed file systems to manage input and output files.
- Transforming file paths.
- Pausing, resuming or checkpointing processes or workflows.
Conforming CWL processes must not assume anything about the runtime environment or workflow platform unless explicitly declared through the use of process requirements.
3.2 Generic execution process §
The generic execution sequence of a CWL process (including workflows and command line tools) is as follows. Processes are modeled as functions that consume an input object and produce an output object.
- Load input object.
- Load, process and validate a CWL document, yielding one or more process objects. The $namespaces present in the CWL document are also used when validating and processing the input object.
- If there are multiple process objects (due to $graph) and which process object to start with is not specified in the input object (via a cwl:tool entry) or by any other means (like a URL fragment) then choose the process with the id of "#main" or "main".
- Validate the input object against the inputs schema for the process.
- Validate process requirements are met.
- Perform any further setup required by the specific process type.
- Execute the process.
- Capture results of process execution into the output object.
- Validate the output object against the outputs schema for the process (with the exception of ExpressionTool outputs, which are always considered valid).
- Report the output object to the process caller.
3.3 Requirements and hints §
A process requirement modifies the semantics or runtime environment of a process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option.
A hint is similar to a requirement; however, it is not an error if an implementation cannot satisfy all hints. The implementation may report a warning if a hint cannot be satisfied.
Optionally, implementations may allow requirements to be specified in the input
object document as an array of requirements under the field name
cwl:requirements
. If implementations allow this, then such requirements
should be combined with any requirements present in the corresponding Process
as if they were specified there.
Requirements specified in a parent Workflow are inherited by step processes if they are valid for that step. If the substep is a CommandLineTool, only the InlineJavascriptRequirement, SchemaDefRequirement, DockerRequirement, SoftwareRequirement, InitialWorkDirRequirement, EnvVarRequirement, ShellCommandRequirement, ResourceRequirement, LoadListingRequirement, WorkReuse, NetworkAccess, InplaceUpdateRequirement, and ToolTimeLimit requirements are valid.
As good practice, it is best to have process requirements be self-contained, such that each process can run successfully by itself.
If the same process requirement appears at different levels of the workflow, the most specific instance of the requirement is used, that is, an entry in requirements on a process implementation such as CommandLineTool will take precedence over an entry in requirements specified in a workflow step, and an entry in requirements on a workflow step takes precedence over the workflow. Entries in hints are resolved the same way.
Requirements override hints. If a process implementation provides a process requirement in hints which is also provided in requirements by an enclosing workflow or workflow step, the enclosing requirements takes precedence.
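The precedence rules above can be sketched in Python. This is a minimal illustration, not a normative algorithm: it assumes that the tool, the workflow step, and the workflow are plain dictionaries whose requirements and hints fields are lists of objects with a class key, and the names find_entry and effective_entry are hypothetical.

```python
def find_entry(entries, class_name):
    """Return the first entry whose "class" matches, or None."""
    for entry in entries:
        if entry.get("class") == class_name:
            return entry
    return None


def effective_entry(class_name, tool, step, workflow):
    """Resolve one requirement class across tool, step, and workflow.

    Returns a (field, entry) pair, where field is "requirements" or
    "hints", or None when the class is not declared anywhere.
    """
    levels = [tool, step, workflow]  # most specific level first
    # All requirements are consulted before any hint, so an enclosing
    # requirement overrides a hint on the process implementation.
    for field in ("requirements", "hints"):
        for level in levels:
            entry = find_entry(level.get(field, []), class_name)
            if entry is not None:
                return field, entry
    return None
```

With this ordering, a ResourceRequirement listed in the hints of a tool is superseded by a ResourceRequirement listed in the requirements of the enclosing workflow.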
3.4 Parameter references §
Parameter references are denoted by the syntax $(...) and may be used in any field permitting the pseudo-type Expression, as specified by this document. Conforming implementations must support parameter references. Parameter references use the following subset of Javascript/ECMAScript 5.1 syntax, but they are designed to not require a Javascript engine for evaluation.
In the following BNF grammar, character classes and grammar rules are denoted in {}, - denotes exclusion from a character class, (()) denotes grouping, | denotes alternates, trailing * denotes zero or more repeats, + denotes one or more repeats, and all other characters are literal values.
symbol ::= {Unicode alphanumeric}+
singleq ::= [' (( {character - { | \ ' \} } ))* ']
doubleq ::= [" (( {character - { | \ " \} } ))* "]
index ::= [ {decimal digit}+ ]
segment ::= . {symbol} | {singleq} | {doubleq} | {index}
parameter reference ::= ( {symbol} {segment}*)
Use the following algorithm to resolve a parameter reference:
1. Match the leading symbol as the key
2. If the key is the special value 'null' then the value of the parameter reference is 'null'. If the key is 'null' it must be the only symbol in the parameter reference.
3. Look up the key in the parameter context (described below) to get the current value. It is an error if the key is not found in the parameter context.
4. If there are no subsequent segments, terminate and return current value
5. Else, match the next segment
6. Extract the symbol, string, or index from the segment as the key
7. Look up the key in current value and assign as new current value.
   - If the key is a symbol or string, the current value must be an object.
   - If the key is an index, the current value must be an array or string.
   - If the next key is the last key and it has the special value 'length' and the current value is an array, the value of the parameter reference is the length of the array. If the value 'length' is encountered in other contexts, normal evaluation rules apply.
   - It is an error if the key does not match the required type, or the key is not found or out of range.
8. Repeat steps 3-8
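The resolution algorithm above can be sketched without a Javascript engine. The following Python is an illustration under stated assumptions, not a reference implementation: resolve is a hypothetical name, its argument is the text between "$(" and ")", the parameter context is a plain dictionary, and \w is used as an approximation of {Unicode alphanumeric} (it also admits an underscore).

```python
import re

LEADING = re.compile(r"\w+")
SEGMENT = re.compile(
    r"""\.(?P<symbol>\w+)
      | \['(?P<single>(?:[^'\\]|\\.)*)'\]
      | \["(?P<double>(?:[^"\\]|\\.)*)"\]
      | \[(?P<index>\d+)\]
    """,
    re.VERBOSE,
)


def resolve(reference, context):
    """Resolve the text between "$(" and ")" against a parameter context."""
    head = LEADING.match(reference)
    if head is None:
        raise ValueError("a parameter reference must start with a symbol")
    key, pos = head.group(), head.end()
    if key == "null":
        if pos != len(reference):
            raise ValueError("'null' must be the only symbol")
        return None
    if key not in context:
        raise KeyError(key)
    current = context[key]
    while pos < len(reference):
        seg = SEGMENT.match(reference, pos)
        if seg is None:
            raise ValueError("malformed segment at offset %d" % pos)
        pos = seg.end()
        if seg.group("index") is not None:
            index = int(seg.group("index"))
            # An index requires an array or string, and must be in range.
            if not isinstance(current, (list, str)) or index >= len(current):
                raise LookupError("index %d is not valid here" % index)
            current = current[index]
            continue
        raw = seg.group("symbol") or seg.group("single") or seg.group("double") or ""
        name = re.sub(r"\\(.)", r"\1", raw)  # drop backslash escapes in quoted keys
        # 'length' is special only as the last key applied to an array.
        if name == "length" and pos == len(reference) and isinstance(current, list):
            return len(current)
        if not isinstance(current, dict) or name not in current:
            raise LookupError("key %r not found" % name)
        current = current[name]
    return current
```

For example, with a context whose inputs object holds an array named reads, resolve("inputs.reads.length", context) yields the number of array elements, while resolve("inputs.reads[0]", context) yields the first element.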
The root namespace is the parameter context. The following parameters must be provided:
- inputs: The input object to the current Process.
- self: A context-specific value. The contextual values for 'self' are documented for specific fields elsewhere in this specification. If a contextual value of 'self' is not documented for a field, it must be 'null'.
- runtime: An object containing configuration details. Specific to the process type. An implementation may provide opaque strings for any or all fields of runtime. These must be filled in by the platform after processing the Tool but before actual execution. Parameter references and expressions may only use the literal string value of the field and must not perform computation on the contents, except where noted otherwise.
If the value of a field has no leading or trailing non-whitespace characters around a parameter reference, the effective value of the field becomes the value of the referenced parameter, preserving the return type.
3.4.1 String interpolation §
If the value of a field has non-whitespace leading or trailing characters around a parameter reference, it is subject to string interpolation. The effective value of the field is a string containing the leading characters, followed by the string value of the parameter reference, followed by the trailing characters. The string value of the parameter reference is its textual JSON representation with the following rules:
- Strings are replaced by the literal text of the string: any escaped characters are replaced by the literal characters they represent, and there are no leading or trailing quotes.
- Object entries are sorted by key.
Multiple parameter references may appear in a single field. This case must be treated as a string interpolation. After interpolating the first parameter reference, interpolation must be recursively applied to the trailing characters to yield the final string value.
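The distinction between a whole-field reference and string interpolation can be sketched as follows. This Python is a simplified illustration: evaluate_field and to_text are hypothetical names, lookup stands for any callable that resolves the text inside $(...), the regular expression assumes no parentheses occur inside a reference, backslash escaping is not handled, and the exact whitespace produced by json.dumps is an implementation choice rather than something this document prescribes.

```python
import json
import re

# Simplification: a reference is "$(" followed by text without parentheses.
REF = re.compile(r"\$\(([^()]*)\)")


def to_text(value):
    """String value of a resolved reference for interpolation."""
    if isinstance(value, str):
        return value  # literal text, no surrounding quotes
    return json.dumps(value, sort_keys=True)  # object entries sorted by key


def evaluate_field(field, lookup):
    """Return the effective value of a field containing references."""
    whole = REF.fullmatch(field.strip())
    if whole:
        # Only whitespace surrounds the reference: preserve the value's type.
        return lookup(whole.group(1))
    # Otherwise every reference is replaced by its string value.
    return REF.sub(lambda m: to_text(lookup(m.group(1))), field)
```

Under these assumptions a field of "$(inputs.n)" keeps the numeric type of inputs.n, whereas "n=$(inputs.n)" always produces a string.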
When text embedded in a CWL file represents code for another programming language, the use of $(...) (and ${...} in the case of expressions) may conflict with the syntax of that language. For example, when writing shell scripts, $(...) is used to execute a command in a subshell and replace a portion of the command line with the standard output of that command.
The following escaping rules apply. The scanner makes a single pass from start to end with 3-character lookahead. After performing a replacement scanning resumes at the next character following the replaced substring.
- The substrings \$( and \${ are replaced by $( and ${ respectively. No parameter or expression evaluation interpolation occurs.
- A double backslash \\ is replaced by a single backslash \.
- A substring starting with a backslash that does not match one of the previous rules is left unchanged.
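The escaping rules above can be sketched as a single left-to-right pass. This Python covers only the backslash handling; unescape is a hypothetical name, and a complete scanner would interleave this pass with parameter reference and expression evaluation so that an unescaped "$(" produced here is never evaluated.

```python
def unescape(text):
    """Apply the backslash escaping rules in one pass from start to end."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith(("\\$(", "\\${"), i):
            # Drop the backslash and keep "$(" or "${" as literal text.
            out.append(text[i + 1 : i + 3])
            i += 3
        elif text.startswith("\\\\", i):
            # A double backslash becomes a single backslash.
            out.append("\\")
            i += 2
        else:
            # Any other character, including a lone backslash, is unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Scanning resumes after each replaced substring, so the backslash emitted for a double backslash is never re-examined as the start of another escape.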
3.5 Expressions (Optional) §
An expression is a fragment of Javascript/ECMAScript 5.1 code evaluated by the workflow platform to affect the inputs, outputs, or behavior of a process. In the generic execution sequence, expressions may be evaluated during step 5 (process setup), step 6 (execute process), and/or step 7 (capture output). Expressions are distinct from regular processes in that they are intended to modify the behavior of the workflow itself rather than perform the primary work of the workflow.
Expressions in CWL are an optional feature and are not required to be implemented by all consumers of CWL documents. They should be used sparingly, when there is no other way to achieve the desired outcome. Excessive use of expressions may be a signal that other refactoring of the tools or workflows would benefit the author, runtime, and users of the CWL document in question.
To declare the use of expressions, the document must include the process requirement InlineJavascriptRequirement. Expressions may be used in any field permitting the pseudo-type Expression, as specified by this document.
Expressions are denoted by the syntax $(...) or ${...}. A code fragment wrapped in the $(...) syntax must be evaluated as an ECMAScript expression. A code fragment wrapped in the ${...} syntax must be evaluated as an ECMAScript function body for an anonymous, zero-argument function. This means the code will be evaluated as (function() { ... })().
Expressions must return a valid JSON data type: one of null, string, number, boolean, array, object. Other return values must result in a permanentFailure. Implementations must permit any syntactically valid Javascript and, when scanning for expressions, must account for nesting of parentheses or braces and for strings that may contain parentheses or braces.
The runtime must include any code defined in the "expressionLib" field of InlineJavascriptRequirement prior to executing the actual expression.
Before executing the expression, the runtime must initialize as global variables the fields of the parameter context described above.
The effective value of the field after expression evaluation follows the same rules as parameter references discussed above. Multiple expressions may appear in a single field.
Expressions must be evaluated in an isolated context (a "sandbox") which permits no side effects to leak outside the context. Expressions also must be evaluated in Javascript strict mode.
The order in which expressions are evaluated is undefined except where otherwise noted in this document.
An implementation may choose to implement parameter references by evaluating as a Javascript expression. The results of evaluating parameter references must be identical whether implemented by Javascript evaluation or some other means.
Implementations may apply other limits, such as process isolation, timeouts, and operating system containers/jails to minimize the security risks associated with running untrusted code embedded in a CWL document.
Javascript exceptions thrown from a CWL expression must result in a permanentFailure of the CWL process.
3.6 Executing CWL documents as scripts §
By convention, a CWL document may begin with #!/usr/bin/env cwl-runner and be marked as executable (the POSIX "+x" permission bits) to enable it to be executed directly. A workflow platform may support this mode of operation; if so, it must provide cwl-runner as an alias for the platform's CWL implementation.
A CWL input object document may similarly begin with #!/usr/bin/env cwl-runner and be marked as executable. In this case, the input object must include the field cwl:tool supplying an IRI to the default CWL document that should be executed using the fields of the input object as input parameters.
The cwl-runner interface is required for conformance testing and is documented in cwl-runner.cwl.
3.7 Discovering CWL documents on a local filesystem §
To discover CWL documents look in the following locations:
For each value in the XDG_DATA_DIRS environment variable (which is a colon-separated list), check the ./commonwl subdirectory. If XDG_DATA_DIRS is unset or empty, then check using the default value for XDG_DATA_DIRS: /usr/local/share/:/usr/share/ (that is to say, check /usr/share/commonwl/ and /usr/local/share/commonwl/).
Then check $XDG_DATA_HOME/commonwl/.
If the XDG_DATA_HOME environment variable is unset, its default value is $HOME/.local/share (that is to say, check $HOME/.local/share/commonwl).
$XDG_DATA_HOME and $XDG_DATA_DIRS are from the XDG Base Directory Specification.
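The lookup order described above can be sketched in Python. This is an illustration only: cwl_search_dirs is a hypothetical name, the env parameter exists so that the function can be exercised without touching the real environment, and treating an empty XDG_DATA_HOME the same as an unset one is an assumption.

```python
import os
from pathlib import Path


def cwl_search_dirs(env=None):
    """Return the commonwl directories to check, in lookup order."""
    env = os.environ if env is None else env
    # Unset or empty XDG_DATA_DIRS falls back to the documented default.
    data_dirs = env.get("XDG_DATA_DIRS") or "/usr/local/share/:/usr/share/"
    dirs = [Path(d) / "commonwl" for d in data_dirs.split(":") if d]
    # Assumption: an empty XDG_DATA_HOME is treated like an unset one.
    home = env.get("HOME") or str(Path.home())
    data_home = env.get("XDG_DATA_HOME") or str(Path(home) / ".local" / "share")
    dirs.append(Path(data_home) / "commonwl")
    return dirs
```

A caller would then check each returned directory, in order, for CWL documents.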
4. Workflow §
A workflow describes a set of steps and the dependencies between those steps. When a step produces output that will be consumed by a second step, the first step is a dependency of the second step.
When there is a dependency, the workflow engine must execute the preceding step and wait for it to successfully produce output before executing the dependent step. If two steps are defined in the workflow graph that are not directly or indirectly dependent, these steps are independent, and may execute in any order or execute concurrently. A workflow is complete when all steps have been executed.
Dependencies between parameters are expressed using the source field on workflow step input parameters and the outputSource field on workflow output parameters.
The source field on each workflow step input parameter expresses the data links that contribute to the value of the step input parameter (the "sink"). A workflow step can only begin execution when every data link connected to a step has been fulfilled.
The outputSource field on each workflow output parameter expresses the data links that contribute to the value of the workflow output parameter (the "sink"). Workflow execution cannot complete successfully until every data link connected to an output parameter has been fulfilled.
Workflow success and failure §
A completed step must result in one of success, temporaryFailure or permanentFailure states. An implementation may choose to retry a step execution which resulted in temporaryFailure. An implementation may choose to either continue running other steps of a workflow, or terminate immediately upon permanentFailure.
- If any step of a workflow execution results in permanentFailure, then the workflow status is permanentFailure.
- If one or more steps result in temporaryFailure and all other steps complete success or are not executed, then the workflow status is temporaryFailure.
- If all workflow steps are executed and complete with success, then the workflow status is success.
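The three status rules above can be sketched as a small Python function. The name workflow_status is hypothetical, and representing a step that was not executed as None is an assumption made for this illustration.

```python
def workflow_status(step_states):
    """Combine final step states into a workflow status.

    Each state is "success", "temporaryFailure", "permanentFailure",
    or None for a step that was not executed.
    """
    states = list(step_states)
    if "permanentFailure" in states:
        return "permanentFailure"
    if "temporaryFailure" in states:
        return "temporaryFailure"
    if all(state == "success" for state in states):
        return "success"
    # Not covered by the rules above: some steps have not been executed
    # and no failure has been reported.
    return None
```

Checking for permanentFailure first reflects that a single permanent failure determines the workflow status regardless of the other steps.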
Extensions §
ScatterFeatureRequirement and SubworkflowFeatureRequirement are available as standard extensions to core workflow semantics.
Fields
inputs
Defines the input parameters of the process. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object.
When accepting an input object, all input parameters must have a value. If an input parameter is missing from the input object, it must be assigned a value of null (or the value of default for that parameter, if provided) for the purposes of validation and evaluation of expressions.
outputs
Defines the parameters representing the output of the process. May be used to generate and/or validate the output object.
class
Workflow
steps
The individual steps that make up the workflow. Each step is executed when all of its input data links are fulfilled. An implementation may choose to execute the steps in a different order than listed and/or execute steps concurrently, provided that dependencies between steps are met.
id
The unique identifier for this object.
Only useful for $graph at Process level. Should not be exposed to users in graphical or terminal user interfaces.
doc
A documentation string for this object, or an array of strings which should be concatenated.
requirements
map<class, InlineJavascriptRequirement | SchemaDefRequirement | LoadListingRequirement | DockerRequirement | SoftwareRequirement | InitialWorkDirRequirement | EnvVarRequirement | ShellCommandRequirement | ResourceRequirement | WorkReuse | NetworkAccess | InplaceUpdateRequirement | ToolTimeLimit | SubworkflowFeatureRequirement | ScatterFeatureRequirement | MultipleInputFeatureRequirement | StepInputExpressionRequirement>
Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option.
hints
map<class, InlineJavascriptRequirement | SchemaDefRequirement | LoadListingRequirement | DockerRequirement | SoftwareRequirement | InitialWorkDirRequirement | EnvVarRequirement | ShellCommandRequirement | ResourceRequirement | WorkReuse | NetworkAccess | InplaceUpdateRequirement | ToolTimeLimit | SubworkflowFeatureRequirement | ScatterFeatureRequirement | MultipleInputFeatureRequirement | StepInputExpressionRequirement | Any>
Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this process. It is not an error if an implementation cannot satisfy all hints, however the implementation may report a warning.
cwlVersion
CWL document version. Always required at the document root. Not required for a Process embedded inside another Process.
intent
An identifier for the type of computational operation of this Process. Especially useful for Operation, but can also be used for CommandLineTool, Workflow, or ExpressionTool.
If provided, then this must be an IRI of a concept node that represents the type of operation, preferably defined within an ontology.
For example, in the domain of bioinformatics, one can use an IRI from the EDAM Ontology's Operation concept nodes, like Alignment, or Clustering; or a more specific Operation concept like Split read mapping.
4.1 WorkflowInputParameter §
Fields
type
Specify valid types of data that may be assigned to this parameter.
secondaryFiles
Only valid when type: File or is an array of items: File.
Provides a pattern or expression specifying files or directories that should be included alongside the primary file. Secondary files may be required or optional. When not explicitly specified, secondary files specified for inputs are required and outputs are optional. An implementation must include matching Files and Directories in the secondaryFiles property of the primary file. These Files and Directories must be transferred and staged alongside the primary file. An implementation may fail workflow execution if a required secondary file does not exist.
If the value is an expression, the value of self in the expression must be the primary input or output File object to which this binding applies. The basename, nameroot and nameext fields must be present in self. For CommandLineTool outputs the path field must also be present. The expression must return a filename string relative to the path to the primary File, a File or Directory object with either path or location and basename fields set, or an array consisting of strings or File or Directory objects. It is legal to reference an unchanged File or Directory object taken from input as a secondaryFile. The expression may return "null" in which case there is no secondaryFile from that expression.
To work on non-filename-preserving storage systems, portable tool descriptions should avoid constructing new values from location, but should construct relative references using basename or nameroot instead.
If a value in secondaryFiles is a string that is not an expression, it specifies that the following pattern should be applied to the path of the primary file to yield a filename relative to the primary File:
- If string ends with ? character, remove the last ? and mark the resulting secondary file as optional.
- If string begins with one or more caret ^ characters, for each caret, remove the last file extension from the path (the last period . and all following characters). If there are no file extensions, the path is unchanged.
- Append the remainder of the string to the end of the file path.
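The pattern rules above can be sketched in Python. This is an illustration under assumptions, not normative behavior: apply_pattern is a hypothetical name, and the search for the last period is restricted to the final path component, which the text above does not state explicitly.

```python
def apply_pattern(primary_path, pattern):
    """Apply a secondaryFiles pattern string to the primary file path.

    Returns (secondary_path, optional).
    """
    optional = pattern.endswith("?")
    if optional:
        pattern = pattern[:-1]  # remove the trailing "?"
    path = primary_path
    while pattern.startswith("^"):
        pattern = pattern[1:]
        # Assumption: only the final path component is searched for a period.
        directory, slash, name = path.rpartition("/")
        dot = name.rfind(".")
        if dot != -1:
            name = name[:dot]  # strip the last extension
        path = directory + slash + name
    return path + pattern, optional
```

For instance, applying "^.bai?" to "/data/reads.bam" strips ".bam", appends ".bai", and marks the result as optional.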
streamable
Only valid when type: File or is an array of items: File.
A value of true indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: false.
doc
A documentation string for this object, or an array of strings which should be concatenated.
format
Only valid when type: File or is an array of items: File.
This must be one or more IRIs of concept nodes that represents file formats which are allowed as input to this parameter, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.
loadContents
Only valid when type: File or is an array of items: File.
If true, the file (or each file in the array) must be a UTF-8 text file 64 KiB or smaller, and the implementation must read the entire contents of the file (or file array) and place it in the contents field of the File object for use by expressions. If the size of the file is greater than 64 KiB, the implementation must raise a fatal error.
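The size limit above can be sketched in Python. This is an illustration only: load_contents is a hypothetical name, it reads from an already opened binary stream rather than resolving a location, and raising ValueError stands in for whatever fatal error mechanism an implementation uses.

```python
import io

MAX_CONTENTS = 64 * 1024  # 64 KiB


def load_contents(stream):
    """Read a binary stream into text, enforcing the 64 KiB limit."""
    # Reading one byte past the limit is enough to detect an oversized file.
    data = stream.read(MAX_CONTENTS + 1)
    if len(data) > MAX_CONTENTS:
        raise ValueError("file is larger than 64 KiB")
    return data.decode("utf-8")  # raises UnicodeDecodeError if not UTF-8


text = load_contents(io.BytesIO(b"chr1\t100\n"))
```

Reading at most one byte beyond the limit avoids loading an arbitrarily large file into memory before rejecting it.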
loadListing
Only valid when type: Directory or is an array of items: Directory.
Specify the desired behavior for loading the listing field of a Directory object for use by expressions.
The order of precedence for loadListing is:
- loadListing on an individual parameter
- Inherited from LoadListingRequirement
- By default: no_listing
default
The default value to use for this parameter if the parameter is missing from the input object, or if the value of the parameter in the input object is null. Default values are applied before evaluating expressions (e.g. dependent valueFrom fields).
inputBinding
Deprecated. Preserved for v1.0 backwards compatibility. Will be removed in CWL v2.0. Use WorkflowInputParameter.loadContents instead.
4.1.1 SecondaryFileSchema §
Secondary files are specified using the following micro-DSL for secondary files:
- If the value is a string, it is transformed to an object with two fields pattern and required
- By default, the value of required is null (this indicates default behavior, which may be based on the context)
- If the value ends with a question mark ? the question mark is stripped off and the value of the field required is set to False
- The remaining value is assigned to the field pattern
For implementation details and examples, please see this section in the Schema Salad specification.
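The expansion of the string shorthand described above can be sketched in Python. The function name expand_secondary_file is hypothetical, and values that are already objects are assumed to pass through unchanged.

```python
def expand_secondary_file(value):
    """Expand the string shorthand into a pattern/required object."""
    if not isinstance(value, str):
        return value  # assumed to be in object form already
    required = None  # null: the default depends on the context
    if value.endswith("?"):
        value = value[:-1]  # strip the question mark
        required = False
    return {"pattern": value, "required": required}
```

Under this sketch ".bai?" expands to a pattern of ".bai" with required set to False, while ".bai" keeps required as null.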
Fields
pattern
Provides a pattern or expression specifying files or directories that should be included alongside the primary file.
If the value is an expression, the value of self in the expression must be the primary input or output File object to which this binding applies. The basename, nameroot and nameext fields must be present in self. For CommandLineTool inputs the location field must also be present. For CommandLineTool outputs the path field must also be present. If secondary files were included on an input File object as part of the Process invocation, they must also be present in secondaryFiles on self.
The expression must return either: a filename string relative to the path to the primary File, a File or Directory object (class: File or class: Directory) with either location (for inputs) or path (for outputs) and basename fields set, or an array consisting of strings or File or Directory objects as previously described.
It is legal to use location from a File or Directory object passed in as input, including location from secondary files on self. If an expression returns a File object with the same location but a different basename as a secondary file that was passed in, the expression result takes precedence. Setting the basename with an expression this way affects the path where the secondary file will be staged to in the CommandLineTool.
The expression may return "null" in which case there is no secondary file from that expression.
To work on non-filename-preserving storage systems, portable tool descriptions should treat location as an opaque identifier and avoid constructing new values from location, but should construct relative references using basename or nameroot instead, or propagate location from defined inputs.
If a value in secondaryFiles is a string that is not an expression, it specifies that the following pattern should be applied to the path of the primary file to yield a filename relative to the primary File:
- If string ends with ? character, remove the last ? and mark the resulting secondary file as optional.
- If string begins with one or more caret ^ characters, for each caret, remove the last file extension from the path (the last period . and all following characters). If there are no file extensions, the path is unchanged.
- Append the remainder of the string to the end of the file path.
required
An implementation must not fail workflow execution if required is set to false and the expected secondary file does not exist. Default value for required field is true for secondary files on input and false for secondary files on output.
4.1.2 Expression §
'Expression' is not a real type. It indicates that a field must allow runtime parameter references. If InlineJavascriptRequirement is declared and supported by the platform, the field must also allow Javascript expressions.
Symbols
symbol | description |
---|---|
ExpressionPlaceholder |
4.1.3 LoadListingEnum §
Specify the desired behavior for loading the listing field of a Directory object for use by expressions.
Symbols
symbol | description |
---|---|
no_listing | Do not load the directory listing. |
shallow_listing | Only load the top level listing, do not recurse into subdirectories. |
deep_listing | Load the directory listing and recursively load all subdirectories as well. |
4.1.4 File §
Represents a file (or group of files when secondaryFiles is provided) that will be accessible by tools using standard POSIX file system call API such as open(2) and read(2).
Files are represented as objects with class of File. File objects have a number of properties that provide metadata about the file.
The location property of a File is an IRI that uniquely identifies the file. Implementations must support the file:// IRI scheme and may support other schemes such as http:// and https://. The value of location may also be a relative reference, in which case it must be resolved relative to the IRI of the document it appears in. As an alternative to location, implementations must also accept the path property on File, which must be a filesystem path available on the same host as the CWL runner (for inputs) or the runtime environment of a command line tool execution (for command line tool outputs).
If no location or path is specified, a file object must specify contents with the UTF-8 text content of the file. This is a "file literal". File literals do not correspond to external resources, but are created on disk with contents when needed for executing a tool. Where appropriate, expressions can return file literals to define new files on a runtime. The maximum size of contents is 64 kilobytes.
The basename property defines the filename on disk where the file is staged. This may differ from the resource name. If not provided, basename must be computed from the last path part of location and made available to expressions.
The secondaryFiles property is a list of File or Directory objects that must be staged in the same directory as the primary file. It is an error for file names to be duplicated in secondaryFiles.
The size property is the size in bytes of the File. It must be computed from the resource and made available to expressions. The checksum field contains a cryptographic hash of the file content for use in verifying file contents. Implementations may, at user option, enable or disable computation of the checksum field for performance or other reasons. However, the ability to compute output checksums is required to pass the CWL conformance test suite.
When executing a CommandLineTool, the files and secondary files may be staged to an arbitrary directory, but must use the value of basename for the filename. The path property must be a file path in the context of the tool execution runtime (local to the compute node, or within the executing container). All computed properties should be available to expressions. File literals also must be staged and path must be set.
When collecting CommandLineTool outputs, glob matching returns file paths (with the path property) and the derived properties. This can all be modified by outputEval. Alternately, if the file cwl.output.json is present in the output, outputBinding is ignored.
File objects in the output must provide either a location IRI or a path property in the context of the tool execution runtime (local to the compute node, or within the executing container).
When evaluating an ExpressionTool, file objects must be referenced via location (the expression tool does not have access to files on disk so path is meaningless) or as file literals. It is legal to return a file object with an existing location but a different basename. The loadContents field of ExpressionTool inputs behaves the same as on CommandLineTool inputs, however it is not meaningful on the outputs.
An ExpressionTool may forward file references from input to output by using the same value for location.
Fields
class
File
Must be File to indicate this object describes a file.
location
An IRI that identifies the file resource. This may be a relative reference, in which case it must be resolved using the base IRI of the document. The location may refer to a local or remote resource; the implementation must use the IRI to retrieve file content. If an implementation is unable to retrieve the file content stored at a remote resource (due to unsupported protocol, access denied, or other issue) it must signal an error.
If the location field is not provided, the contents field must be provided. The implementation must assign a unique identifier for the location field.
If the path field is provided but the location field is not, an implementation may assign the value of the path field to location, then follow the rules above.
path
The local host path where the File is available when a CommandLineTool is executed. This field must be set by the implementation. The final path component must match the value of basename. This field must not be used in any other context. The command line tool being executed must be able to access the file at path using the POSIX open(2) syscall.
As a special case, if the path field is provided but the location field is not, an implementation may assign the value of the path field to location, and remove the path field.
If the path contains POSIX shell metacharacters (|, &, ;, <, >, (, ), $, `, \, ", ', <space>, <tab>, and <newline>) or characters not allowed for Internationalized Domain Names for Applications then implementations may terminate the process with a permanentFailure.
basename
The base name of the file, that is, the name of the file without any leading directory path. The base name must not contain a slash /.
If not provided, the implementation must set this field based on the location field by taking the final path component after parsing location as an IRI. If basename is provided, it is not required to match the value from location.
When this file is made available to a CommandLineTool, it must be named with basename, i.e. the final component of the path field must match basename.
dirname
The name of the directory containing the file, that is, the path leading up to the final slash in the path such that dirname + '/' + basename == path.
The implementation must set this field based on the value of path prior to evaluating parameter references or expressions in a CommandLineTool document. This field must not be used in any other context.
nameroot
The basename root such that nameroot + nameext == basename, and nameext is empty or begins with a period and contains at most one period. For the purposes of path splitting leading periods on the basename are ignored; a basename of .cshrc will have a nameroot of .cshrc.
The implementation must set this field automatically based on the value of basename prior to evaluating parameter references or expressions.
nameext
The basename extension such that nameroot + nameext == basename, and nameext is empty or begins with a period and contains at most one period. Leading periods on the basename are ignored; a basename of .cshrc will have an empty nameext.
The implementation must set this field automatically based on the value of basename prior to evaluating parameter references or expressions.
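The split into nameroot and nameext can be sketched in Python. The standard library function os.path.splitext is documented to ignore leading periods and to split at the last period, which matches the .cshrc example above; split_basename is a hypothetical wrapper name, and using this particular function is a convenience rather than something this document prescribes.

```python
import os.path


def split_basename(basename):
    """Return (nameroot, nameext) such that nameroot + nameext == basename."""
    # splitext splits at the last period and ignores leading periods.
    return os.path.splitext(basename)
```

Because the split happens at the last period only, a basename of reads.fastq.gz has a nameroot of reads.fastq and a nameext of .gz.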
checksum
Optional hash code for validating file integrity. Currently, must be in the form "sha1$ + hexadecimal string" using the SHA-1 algorithm.
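A checksum in the form described above can be computed with the Python standard library. This is a minimal sketch: cwl_checksum is a hypothetical name, and it reads from an already opened binary stream in chunks so that large files need not be held in memory.

```python
import hashlib
import io


def cwl_checksum(stream, chunk_size=1 << 20):
    """Return "sha1$" followed by the hexadecimal SHA-1 digest of a stream."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return "sha1$" + digest.hexdigest()


# Usage with an in-memory stream; a real caller would pass open(path, "rb").
empty_checksum = cwl_checksum(io.BytesIO(b""))
```

For an empty file this yields sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709, the SHA-1 digest of zero bytes.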
secondaryFiles
A list of additional files or directories that are associated with the primary file and must be transferred alongside the primary file. Examples include indexes of the primary file, or external references which must be included when loading the primary document. A file object listed in secondaryFiles may itself include secondaryFiles for which the same rules apply.
format
The format of the file: this must be an IRI of a concept node that represents the file format, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.
Reasoning about format compatibility must be done by checking that an
input file format is the same, owl:equivalentClass
or
rdfs:subClassOf
the format required by the input parameter.
owl:equivalentClass
is transitive with rdfs:subClassOf
, e.g. if
<B> owl:equivalentClass <C>
and <B> rdfs:subClassOf <A>
then infer
<C> rdfs:subClassOf <A>
.
File format ontologies may be provided in the "$schemas" metadata at the
root of the document. If no ontologies are specified in $schemas
, the
runtime may perform exact file format matches.
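The format compatibility check described above can be sketched as a graph search. This is an illustration only: it assumes the ontology has already been reduced to two lists of IRI pairs (the parameters equivalent and subclass_of are hypothetical), whereas a real implementation would typically query an RDF library.

```python
from collections import deque


def is_format_compatible(file_format, required_format, equivalent=(), subclass_of=()):
    """Return True if file_format is the same as, equivalent to, or a subclass of required_format."""
    # Build "may be treated as" edges: equivalence works in both directions,
    # subclass relations only from child to parent.
    edges = {}
    for a, b in equivalent:
        edges.setdefault(a, set()).add(b)
        edges.setdefault(b, set()).add(a)
    for child, parent in subclass_of:
        edges.setdefault(child, set()).add(parent)

    # Breadth-first search from the file's format towards the required format.
    seen = {file_format}
    queue = deque([file_format])
    while queue:
        current = queue.popleft()
        if current == required_format:
            return True
        for neighbour in edges.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False
```

With no ontology pairs supplied, the search degenerates to an exact match, which mirrors the fallback behaviour permitted when $schemas is absent.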
contents
File contents literal.
If neither location
nor path
is provided, contents
must be
non-null. The implementation must assign a unique identifier for the
location
field. When the file is staged as input to CommandLineTool,
the value of contents
must be written to a file.
If contents
is set as a result of a Javascript expression,
an entry
in InitialWorkDirRequirement
, or read in from
cwl.output.json
, there is no specified upper limit on the
size of contents
. Implementations may have practical limits
on the size of contents
based on memory and storage
available to the workflow runner or other factors.
If the loadContents
field of an InputParameter
or
OutputParameter
is true, and the input or output File object
location
is valid, the file must be a UTF-8 text file 64 KiB
or smaller, and the implementation must read the entire
contents of the file and place it in the contents
field. If
the size of the file is greater than 64 KiB, the
implementation must raise a fatal error.
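The loadContents size rule can be sketched as follows. This is an illustrative sketch, not a reference implementation: the function name load_contents is hypothetical, and a ValueError stands in for whatever fatal error an implementation reports.

```python
MAX_LOAD_CONTENTS_BYTES = 64 * 1024  # 64 KiB


def load_contents(path):
    """Read a UTF-8 text file of at most 64 KiB and return its contents."""
    with open(path, "rb") as handle:
        # Read one byte past the limit so an oversized file can be detected
        # without loading all of it into memory.
        data = handle.read(MAX_LOAD_CONTENTS_BYTES + 1)
    if len(data) > MAX_LOAD_CONTENTS_BYTES:
        raise ValueError("file exceeds the 64 KiB loadContents limit: " + path)
    # Decoding raises UnicodeDecodeError if the file is not valid UTF-8.
    return data.decode("utf-8")
```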
4.1.4.1 Directory §
Represents a directory to present to a command line tool.
Directories are represented as objects with class
of Directory
. Directory objects have
a number of properties that provide metadata about the directory.
The location
property of a Directory is an IRI that uniquely identifies
the directory. Implementations must support the file:// IRI scheme and may
support other schemes such as http://. As an alternative to location
,
implementations must also accept the path
property on Directory, which
must be a filesystem path available on the same host as the CWL runner (for
inputs) or the runtime environment of a command line tool execution (for
command line tool outputs).
A Directory object may have a listing
field. This is a list of File and
Directory objects that are contained in the Directory. For each entry in
listing
, the basename
property defines the name of the File or
subdirectory when staged to disk. If listing
is not provided, the
implementation must have some way of fetching the Directory listing at
runtime based on the location
field.
If a Directory does not have location
, it is a Directory literal. A
Directory literal must provide listing
. Directory literals must be
created on disk at runtime as needed.
The resources in a Directory literal do not need to have any implied
relationship in their location
. For example, a Directory listing may
contain two files located on different hosts. It is the responsibility of
the runtime to ensure that those files are staged to disk appropriately.
Secondary files associated with files in listing
must also be staged to
the same Directory.
When executing a CommandLineTool, Directories must be recursively staged
first and have local values of path
assigned.
Directory objects in CommandLineTool output must provide either a
location
IRI or a path
property in the context of the tool execution
runtime (local to the compute node, or within the executing container).
An ExpressionTool may forward file references from input to output by using
the same value for location
.
Name conflicts (the same basename
appearing multiple times in listing
or in any entry in secondaryFiles
in the listing) are a fatal error.
Fields
class
Directory
Must be Directory
to indicate this object describes a Directory.
location
An IRI that identifies the directory resource. This may be a relative
reference, in which case it must be resolved using the base IRI of the
document. The location may refer to a local or remote resource. If
the listing
field is not set, the implementation must use the
location IRI to retrieve the directory listing. If an implementation is
unable to retrieve the directory listing stored at a remote resource (due to
unsupported protocol, access denied, or other issue) it must signal an
error.
If the location
field is not provided, the listing
field must be
provided. The implementation must assign a unique identifier for
the location
field.
If the path
field is provided but the location
field is not, an
implementation may assign the value of the path
field to location
,
then follow the rules above.
path
The local path where the Directory is made available prior to executing a
CommandLineTool. This must be set by the implementation. This field
must not be used in any other context. The command line tool being
executed must be able to access the directory at path
using the POSIX
opendir(2)
syscall.
If the path contains POSIX shell metacharacters (|, &, ;, <, >, (, ), $, `, \, ", ', <space>, <tab>, and <newline>) or characters not allowed for Internationalized Domain Names for Applications then implementations may terminate the process with a permanentFailure.
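A check for the shell metacharacters listed above might look like the following sketch. The names SHELL_METACHARACTERS and has_shell_metacharacters are hypothetical, and the sketch deliberately does not cover the Internationalized Domain Names for Applications restriction.

```python
# The POSIX shell metacharacters named in the text:
# | & ; < > ( ) $ ` \ " ' plus space, tab and newline.
SHELL_METACHARACTERS = set("|&;<>()$`\\\"' \t\n")


def has_shell_metacharacters(path):
    """Return True if the path contains any of the listed metacharacters."""
    return any(character in SHELL_METACHARACTERS for character in path)
```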
basename
The base name of the directory, that is, the name of the directory without any
leading directory path. The base name must not contain a slash /
.
If not provided, the implementation must set this field based on the
location
field by taking the final path component after parsing
location
as an IRI. If basename
is provided, it is not required to
match the value from location
.
When this directory is made available to a CommandLineTool, it must be named
with basename
, i.e. the final component of the path
field must match
basename
.
listing
List of files or subdirectories contained in this directory. The name
of each file or subdirectory is determined by the basename
field of
each File
or Directory
object. It is an error if a File
shares a
basename
with any other entry in listing
. If two or more
Directory
objects share the same basename
, this must be treated as
equivalent to a single subdirectory with the listings recursively
merged.
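The listing rules above (a File must not share a basename with any other entry, while Directory entries with the same basename are merged recursively) can be sketched as follows. This is an illustration under the assumption that entries are plain dictionaries with class and basename already set; merge_listing is a hypothetical helper and ValueError stands in for the error the implementation would report.

```python
def merge_listing(listing):
    """Merge same-named Directory entries; reject any File basename conflict."""
    merged = {}
    for entry in listing:
        name = entry["basename"]
        existing = merged.get(name)
        if existing is None:
            # Shallow copy so the caller's objects are not modified.
            merged[name] = dict(entry)
        elif existing["class"] == "Directory" and entry["class"] == "Directory":
            # Two directories with the same basename: combine their listings.
            existing["listing"] = existing.get("listing", []) + entry.get("listing", [])
        else:
            raise ValueError("basename conflict in listing: " + name)
    # Apply the same rules to every nested listing.
    for entry in merged.values():
        if entry["class"] == "Directory" and "listing" in entry:
            entry["listing"] = merge_listing(entry["listing"])
    return list(merged.values())
```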
4.1.5 Any §
The Any type validates for any non-null value.
Symbols
symbol | description |
---|---|
Any |
4.1.6 CWLType §
Extends primitive types with the concept of a file and directory as a builtin type.
Symbols
symbol | description |
---|---|
null | no value |
boolean | a binary value |
int | 32-bit signed integer |
long | 64-bit signed integer |
float | single precision (32-bit) IEEE 754 floating-point number |
double | double precision (64-bit) IEEE 754 floating-point number |
string | Unicode character sequence |
File | A File object |
Directory | A Directory object |
4.1.7 InputRecordSchema §
Fields
type
record
Must be record
fields
Defines the fields of the record.
doc
A documentation string for this object, or an array of strings which should be concatenated.
4.1.8 InputRecordField §
Fields
type
The field type
doc
A documentation string for this object, or an array of strings which should be concatenated.
secondaryFiles
Only valid when type: File
or is an array of items: File
.
Provides a pattern or expression specifying files or
directories that should be included alongside the primary
file. Secondary files may be required or optional. When not
explicitly specified, secondary files specified for inputs
are required and outputs
are optional. An implementation
must include matching Files and Directories in the
secondaryFiles
property of the primary file. These Files
and Directories must be transferred and staged alongside the
primary file. An implementation may fail workflow execution
if a required secondary file does not exist.
If the value is an expression, the value of self
in the expression
must be the primary input or output File object to which this binding
applies. The basename
, nameroot
and nameext
fields must be
present in self
. For CommandLineTool
outputs the path
field must
also be present. The expression must return a filename string relative
to the path to the primary File, a File or Directory object with either
path
or location
and basename
fields set, or an array consisting
of strings or File or Directory objects. It is legal to reference an
unchanged File or Directory object taken from input as a secondaryFile.
The expression may return "null" in which case there is no secondaryFile
from that expression.
To work on non-filename-preserving storage systems, portable tool
descriptions should avoid constructing new values from location
, but
should construct relative references using basename
or nameroot
instead.
If a value in secondaryFiles
is a string that is not an expression,
it specifies that the following pattern should be applied to the path
of the primary file to yield a filename relative to the primary File:
- If string ends with ? character, remove the last ? and mark the resulting secondary file as optional.
- If string begins with one or more caret ^ characters, for each caret, remove the last file extension from the path (the last period . and all following characters). If there are no file extensions, the path is unchanged.
- Append the remainder of the string to the end of the file path.
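The pattern rules above can be sketched in Python. This is an illustrative sketch rather than the behaviour of any particular runner: the function name apply_secondary_pattern is hypothetical, and treating a leading period as not starting an extension is an assumption made here for consistency with nameext.

```python
def apply_secondary_pattern(primary_path, pattern):
    """Return (secondary_path, required) for a non-expression secondaryFiles pattern."""
    required = True
    # A trailing "?" marks the secondary file as optional.
    if pattern.endswith("?"):
        pattern = pattern[:-1]
        required = False
    path = primary_path
    # Each leading caret removes one file extension from the path.
    while pattern.startswith("^"):
        pattern = pattern[1:]
        head, separator, name = path.rpartition("/")
        dot = name.rfind(".")
        if dot > 0:  # assumption: a leading period does not start an extension
            name = name[:dot]
        path = head + separator + name
    # Append whatever remains of the pattern.
    return path + pattern, required
```

For example, the pattern ^.bai applied to /data/sample.bam gives /data/sample.bai, while .bai gives /data/sample.bam.bai.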
streamable
Only valid when type: File
or is an array of items: File
.
A value of true
indicates that the file is read or written
sequentially without seeking. An implementation may use this flag to
indicate whether it is valid to stream file contents using a named
pipe. Default: false
.
format
Only valid when type: File
or is an array of items: File
.
This must be one or more IRIs of concept nodes that represents file formats which are allowed as input to this parameter, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.
loadContents
Only valid when type: File
or is an array of items: File
.
If true, the file (or each file in the array) must be a UTF-8
text file 64 KiB or smaller, and the implementation must read
the entire contents of the file (or file array) and place it
in the contents
field of the File object for use by
expressions. If the size of the file is greater than 64 KiB,
the implementation must raise a fatal error.
loadListing
Only valid when type: Directory
or is an array of items: Directory
.
Specify the desired behavior for loading the listing
field of
a Directory object for use by expressions.
The order of precedence for loadListing is:
- loadListing on an individual parameter
- Inherited from LoadListingRequirement
- By default: no_listing
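The precedence order can be expressed as a small lookup. This is a sketch only; the function and argument names are hypothetical, and None is used here to mean "not specified".

```python
def effective_load_listing(parameter_value=None, requirement_value=None):
    """Resolve loadListing: parameter first, then LoadListingRequirement, then the default."""
    if parameter_value is not None:
        return parameter_value
    if requirement_value is not None:
        return requirement_value
    return "no_listing"
```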
4.1.8.1 InputEnumSchema §
Fields
type
enum
Must be enum
4.1.8.2 InputArraySchema §
Fields
items
Defines the type of the array elements.
type
array
Must be array
doc
A documentation string for this object, or an array of strings which should be concatenated.
4.1.9 InputBinding §
Fields
loadContents
Use of loadContents
in InputBinding
is deprecated.
Preserved for v1.0 backwards compatibility. Will be removed in
CWL v2.0. Use InputParameter.loadContents
instead.
4.2 WorkflowOutputParameter §
Describe an output parameter of a workflow. The parameter must be connected to one or more parameters defined in the workflow that will provide the value of the output parameter. It is legal to connect a WorkflowInputParameter to a WorkflowOutputParameter.
See WorkflowStepInput for discussion of
linkMerge
and pickValue
.
Fields
type
Specify valid types of data that may be assigned to this parameter.
secondaryFiles
Only valid when type: File
or is an array of items: File
.
Provides a pattern or expression specifying files or
directories that should be included alongside the primary
file. Secondary files may be required or optional. When not
explicitly specified, secondary files specified for inputs
are required and outputs
are optional. An implementation
must include matching Files and Directories in the
secondaryFiles
property of the primary file. These Files
and Directories must be transferred and staged alongside the
primary file. An implementation may fail workflow execution
if a required secondary file does not exist.
If the value is an expression, the value of self
in the expression
must be the primary input or output File object to which this binding
applies. The basename
, nameroot
and nameext
fields must be
present in self
. For CommandLineTool
outputs the path
field must
also be present. The expression must return a filename string relative
to the path to the primary File, a File or Directory object with either
path
or location
and basename
fields set, or an array consisting
of strings or File or Directory objects. It is legal to reference an
unchanged File or Directory object taken from input as a secondaryFile.
The expression may return "null" in which case there is no secondaryFile
from that expression.
To work on non-filename-preserving storage systems, portable tool
descriptions should avoid constructing new values from location
, but
should construct relative references using basename
or nameroot
instead.
If a value in secondaryFiles
is a string that is not an expression,
it specifies that the following pattern should be applied to the path
of the primary file to yield a filename relative to the primary File:
- If string ends with ? character, remove the last ? and mark the resulting secondary file as optional.
- If string begins with one or more caret ^ characters, for each caret, remove the last file extension from the path (the last period . and all following characters). If there are no file extensions, the path is unchanged.
- Append the remainder of the string to the end of the file path.
streamable
Only valid when type: File
or is an array of items: File
.
A value of true
indicates that the file is read or written
sequentially without seeking. An implementation may use this flag to
indicate whether it is valid to stream file contents using a named
pipe. Default: false
.
doc
A documentation string for this object, or an array of strings which should be concatenated.
format
Only valid when type: File
or is an array of items: File
.
This is the file format that will be assigned to the output File object.
outputSource
Specifies one or more names of an output from a workflow step (in the form
step_name/output_name
with a /
separator), or a workflow input name,
that supply their value(s) to the output parameter. It is valid to reference
workflow level inputs here.
linkMerge
The method to use to merge multiple sources into a single array. If not specified, the default method is "merge_nested".
pickValue
The method to use to choose non-null elements among multiple sources.
4.2.1 LinkMergeMethod §
The input link merge method, described in WorkflowStepInput.
Symbols
symbol | description |
---|---|
merge_nested | |
merge_flattened |
4.2.2 PickValueMethod §
Picking non-null values among inbound data links, described in WorkflowStepInput.
Symbols
symbol | description |
---|---|
first_non_null | |
the_only_non_null | |
all_non_null |
4.2.3 OutputRecordSchema §
Fields
type
record
Must be record
fields
Defines the fields of the record.
doc
A documentation string for this object, or an array of strings which should be concatenated.
4.2.4 OutputRecordField §
Fields
type
The field type
doc
A documentation string for this object, or an array of strings which should be concatenated.
secondaryFiles
Only valid when type: File
or is an array of items: File
.
Provides a pattern or expression specifying files or
directories that should be included alongside the primary
file. Secondary files may be required or optional. When not
explicitly specified, secondary files specified for inputs
are required and outputs
are optional. An implementation
must include matching Files and Directories in the
secondaryFiles
property of the primary file. These Files
and Directories must be transferred and staged alongside the
primary file. An implementation may fail workflow execution
if a required secondary file does not exist.
If the value is an expression, the value of self
in the expression
must be the primary input or output File object to which this binding
applies. The basename
, nameroot
and nameext
fields must be
present in self
. For CommandLineTool
outputs the path
field must
also be present. The expression must return a filename string relative
to the path to the primary File, a File or Directory object with either
path
or location
and basename
fields set, or an array consisting
of strings or File or Directory objects. It is legal to reference an
unchanged File or Directory object taken from input as a secondaryFile.
The expression may return "null" in which case there is no secondaryFile
from that expression.
To work on non-filename-preserving storage systems, portable tool
descriptions should avoid constructing new values from location
, but
should construct relative references using basename
or nameroot
instead.
If a value in secondaryFiles
is a string that is not an expression,
it specifies that the following pattern should be applied to the path
of the primary file to yield a filename relative to the primary File:
- If string ends with ? character, remove the last ? and mark the resulting secondary file as optional.
- If string begins with one or more caret ^ characters, for each caret, remove the last file extension from the path (the last period . and all following characters). If there are no file extensions, the path is unchanged.
- Append the remainder of the string to the end of the file path.
streamable
Only valid when type: File
or is an array of items: File
.
A value of true
indicates that the file is read or written
sequentially without seeking. An implementation may use this flag to
indicate whether it is valid to stream file contents using a named
pipe. Default: false
.
format
Only valid when type: File
or is an array of items: File
.
This is the file format that will be assigned to the output File object.
4.2.4.1 OutputEnumSchema §
Fields
type
enum
Must be enum
4.2.4.2 OutputArraySchema §
Fields
items
Defines the type of the array elements.
type
array
Must be array
doc
A documentation string for this object, or an array of strings which should be concatenated.
4.3 WorkflowStep §
A workflow step is an executable element of a workflow. It specifies the
underlying process implementation (such as CommandLineTool
or another
Workflow
) in the run
field and connects the input and output parameters
of the underlying process to workflow parameters.
Scatter/gather §
To use scatter/gather, ScatterFeatureRequirement must be specified in the workflow or workflow step requirements.
A "scatter" operation specifies that the associated workflow step or subworkflow should execute separately over a list of input elements. Each job making up a scatter operation is independent and may be executed concurrently.
The scatter
field specifies one or more input parameters which will be
scattered. An input parameter may be listed more than once. The declared
type of each input parameter implicitly becomes an array of items of the
input parameter type. If a parameter is listed more than once, it becomes
a nested array. As a result, upstream parameters which are connected to
scattered parameters must be arrays.
All output parameter types are also implicitly wrapped in arrays. Each job in the scatter results in an entry in the output array.
If any scattered parameter runtime value is an empty array, all outputs are set to empty arrays and no work is done for the step, according to applicable scattering rules.
If scatter
declares more than one input parameter, scatterMethod
describes how to decompose the input into a discrete set of jobs.
- dotproduct specifies that the input arrays are aligned and one element is taken from each array to construct each job. It is an error if the input arrays are not all the same length.
- nested_crossproduct specifies the Cartesian product of the inputs, producing a job for every combination of the scattered inputs. The output must be nested arrays for each level of scattering, in the order that the input arrays are listed in the scatter field.
- flat_crossproduct specifies the Cartesian product of the inputs, producing a job for every combination of the scattered inputs. The output arrays must be flattened to a single level, but otherwise listed in the order that the input arrays are listed in the scatter field.
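The three scatter methods can be sketched as functions that decompose the scattered inputs into per-job input objects. This is an illustrative sketch: the function names mirror the method names, arrays is assumed to be a dict mapping each scattered parameter to its list of values in the order given in the scatter field, and the nesting of the returned jobs mirrors the nesting required of the outputs.

```python
from itertools import product


def dotproduct(arrays):
    """One job per index; all scattered arrays must have the same length."""
    if len({len(values) for values in arrays.values()}) > 1:
        raise ValueError("dotproduct requires input arrays of equal length")
    names = list(arrays)
    return [dict(zip(names, values)) for values in zip(*arrays.values())]


def flat_crossproduct(arrays):
    """One job per combination, returned as a single flat list."""
    names = list(arrays)
    return [dict(zip(names, values)) for values in product(*arrays.values())]


def nested_crossproduct(arrays):
    """One job per combination, nested once per scattered parameter."""
    names = list(arrays)

    def build(index, job):
        if index == len(names):
            return dict(job)
        name = names[index]
        return [build(index + 1, {**job, name: value}) for value in arrays[name]]

    return build(0, {})
```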
Conditional execution (Optional) §
Conditional execution makes execution of a step conditional on an
expression. A step that is not executed is "skipped". A skipped
step produces null
for all output parameters.
The condition is evaluated after scatter
, using the input object
of each individual scatter job. This means over a set of scatter
jobs, some may be executed and some may be skipped. When the
results are gathered, skipped steps must be null
in the output
arrays.
The when
field controls conditional execution. This is an
expression that must be evaluated with inputs
bound to the step
input object (or individual scatter job), and returns a boolean
value. It is an error if this expression returns a value other
than true
or false
.
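The interaction between scatter and conditional execution can be sketched as follows. This is an illustration only: gather_with_condition is a hypothetical helper, the when and run arguments are plain Python callables standing in for the step's expression and process, and TypeError stands in for the error raised on a non-boolean result.

```python
def gather_with_condition(jobs, when, run):
    """Evaluate the condition per scatter job; skipped jobs contribute None (null)."""
    results = []
    for inputs in jobs:
        decision = when(inputs)
        if decision is True:
            results.append(run(inputs))
        elif decision is False:
            results.append(None)  # a skipped job produces null in the gathered array
        else:
            raise TypeError("when must evaluate to true or false")
    return results
```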
Conditionals in CWL are an optional feature and are not required to be implemented by all consumers of CWL documents. An implementation that does not support conditionals must return a fatal error when attempting to execute a workflow that uses conditional constructs the implementation does not support.
Subworkflows §
To specify a nested workflow as part of a workflow step, SubworkflowFeatureRequirement must be specified in the workflow or workflow step requirements.
It is a fatal error if a workflow directly or indirectly invokes itself as a subworkflow (recursive workflows are not allowed).
Fields
in
Defines the input parameters of the workflow step. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object.
out
Defines the parameters representing the output of the process. May be used to generate and/or validate the output object.
run
Specifies the process to run. If run
is a string, it must be an absolute IRI
or a relative path from the primary document.
doc
A documentation string for this object, or an array of strings which should be concatenated.
requirements
map<class, InlineJavascriptRequirement | SchemaDefRequirement | LoadListingRequirement | DockerRequirement | SoftwareRequirement | InitialWorkDirRequirement | EnvVarRequirement | ShellCommandRequirement | ResourceRequirement | WorkReuse | NetworkAccess | InplaceUpdateRequirement | ToolTimeLimit | SubworkflowFeatureRequirement | ScatterFeatureRequirement | MultipleInputFeatureRequirement | StepInputExpressionRequirement>
Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this workflow step. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option.
hints
Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this workflow step. It is not an error if an implementation cannot satisfy all hints, however the implementation may report a warning.
when
If defined, only run the step when the expression evaluates to
true
. If false
the step is skipped. A skipped step
produces a null
on each output.
4.3.1 WorkflowStepInput §
The input of a workflow step connects an upstream parameter (from the
workflow inputs, or the outputs of other workflow steps) with the input
parameters of the process specified by the run
field. Only input parameters
declared by the target process will be passed through at runtime to the process,
though additional parameters may be specified (for use within valueFrom
expressions, for instance). Unconnected or unused parameters do not represent an
error condition.
Input object §
A WorkflowStepInput object must contain an id
field in the form
#fieldname
or #prefix/fieldname
. When the id
field contains a slash
/
the field name consists of the characters following the final slash
(the prefix portion may contain one or more slashes to indicate scope).
This defines a field of the workflow step input object with the value of
the source
parameter(s).
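Deriving the field name from an id can be sketched as follows. The helper name step_input_field_name is hypothetical; the sketch simply drops a leading # and keeps the characters after the final slash.

```python
def step_input_field_name(identifier):
    """Return the input object field name for a WorkflowStepInput id."""
    if identifier.startswith("#"):
        identifier = identifier[1:]
    # The field name is everything after the final slash, if there is one.
    return identifier.rsplit("/", 1)[-1]
```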
Merging multiple inbound data links §
To merge multiple inbound data links, MultipleInputFeatureRequirement must be specified in the workflow or workflow step requirements.
If the sink parameter is an array, or named in a workflow
scatter operation, there may be multiple inbound
data links listed in the source
field. The values from the
input links are merged depending on the method specified in the
linkMerge
field. If both linkMerge
and pickValue
are null
or not specified, and there is more than one element in the
source
array, the default method is "merge_nested".
If both linkMerge
and pickValue
are null or not specified, and
there is only a single element in the source
, then the input
parameter takes the scalar value from the single input link (it is
not wrapped in a single-item list).
merge_nested
The input must be an array consisting of exactly one entry for each input link. If "merge_nested" is specified with a single link, the value from the link must be wrapped in a single-item list.
merge_flattened
- The source and sink parameters must be compatible types, or the source type must be compatible with a single element from the "items" type of the destination array parameter.
- Source parameters which are arrays are concatenated. Source parameters which are single element types are appended as single elements.
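The two link merge methods can be sketched as operations on the list of values arriving from the inbound links, in source order. This is an illustrative sketch that ignores type checking; the function names mirror the method names.

```python
def merge_nested(values):
    """Exactly one entry per input link; a single link yields a single-item list."""
    return list(values)


def merge_flattened(values):
    """Concatenate array sources and append non-array sources as single elements."""
    merged = []
    for value in values:
        if isinstance(value, list):
            merged.extend(value)
        else:
            merged.append(value)
    return merged
```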
Picking non-null values among inbound data links §
If present, pickValue
specifies how to pick non-null values among inbound data links.
pickValue is evaluated
- Once all source values from upstream step or parameters are available.
- After linkMerge.
- Before scatter or valueFrom.
This is specifically intended to be useful in combination with
conditional execution, where several upstream
steps may be connected to a single input (source
is a list), and
skipped steps produce null values.
Static type checkers should check for type consistency after inferring what the type
will be after pickValue
is applied, just as they do currently for linkMerge
.
first_non_null
For the first level of a list input, pick the first non-null element. The result is a scalar. It is an error if there is no non-null element. Examples:
[null, x, null, y] -> x
[null, [null], null, y] -> [null]
[null, null, null] -> Runtime Error
Intended use case: If-else pattern where the value comes either from a conditional step or from a default or fallback value. The conditional step(s) should be placed first in the list.
the_only_non_null
For the first level of a list input, pick the single non-null element. The result is a scalar. It is an error if there is more than one non-null element. Examples:
[null, x, null] -> x
[null, x, null, y] -> Runtime Error
[null, [null], null] -> [null]
[null, null, null] -> Runtime Error
Intended use case: Switch type patterns where the developer considers more than one active code path as a workflow error (possibly indicating an error in writing when condition expressions).
all_non_null
For the first level of a list input, pick all non-null values. The result is a list, which may be empty. Examples:
[null, x, null] -> [x]
[x, null, y] -> [x, y]
[null, [x], [null]] -> [[x], [null]]
[null, null, null] -> []
Intended use case: It is valid to have more than one source, but sources are conditional, so null sources (from skipped steps) should be filtered out.
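The three pickValue methods can be sketched directly from the examples above, using None for null. This is an illustrative sketch; ValueError stands in for the runtime error, and the function names mirror the method names.

```python
def first_non_null(values):
    """Return the first non-null element; error if there is none."""
    for value in values:
        if value is not None:
            return value
    raise ValueError("first_non_null: all elements are null")


def the_only_non_null(values):
    """Return the single non-null element; error if there are zero or several."""
    found = [value for value in values if value is not None]
    if len(found) != 1:
        raise ValueError("the_only_non_null: expected exactly one non-null element")
    return found[0]


def all_non_null(values):
    """Return every non-null element of the first level; the result may be empty."""
    return [value for value in values if value is not None]
```

Only the first level of the list is inspected, so a nested [None] counts as a non-null element.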
Fields
source
Specifies one or more workflow parameters that will provide input to the underlying step parameter.
linkMerge
The method to use to merge multiple inbound links into a single array. If not specified, the default method is "merge_nested".
pickValue
The method to use to choose non-null elements among multiple sources.
loadContents
Only valid when type: File
or is an array of items: File
.
If true, the file (or each file in the array) must be a UTF-8
text file 64 KiB or smaller, and the implementation must read
the entire contents of the file (or file array) and place it
in the contents
field of the File object for use by
expressions. If the size of the file is greater than 64 KiB,
the implementation must raise a fatal error.
loadListing
Only valid when type: Directory
or is an array of items: Directory
.
Specify the desired behavior for loading the listing
field of
a Directory object for use by expressions.
The order of precedence for loadListing is:
- loadListing on an individual parameter
- Inherited from LoadListingRequirement
- By default: no_listing
default
The default value for this parameter to use if either there is no
source
field, or the value produced by the source
is null
. The
default must be applied prior to scattering or evaluating valueFrom
.
valueFrom
To use valueFrom, StepInputExpressionRequirement must be specified in the workflow or workflow step requirements.
If valueFrom
is a constant string value, use this as the value for
this input parameter.
If valueFrom
is a parameter reference or expression, it must be
evaluated to yield the actual value to be assigned to the input field.
The self value in the parameter reference or expression must be
- null if there is no source field
- the value of the parameter(s) specified in the source field when this workflow input parameter is not specified in this workflow step's scatter field.
- an element of the parameter specified in the source field when this workflow input parameter is specified in this workflow step's scatter field.
The value of inputs
in the parameter reference or expression must be
the input object to the workflow step after assigning the source
values, applying default
, and then scattering. The order of
evaluating valueFrom
among step input parameters is undefined and the
result of evaluating valueFrom
on a parameter must not be visible to
evaluation of valueFrom
on other parameters.
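The rule that valueFrom results must not be visible to one another can be sketched by evaluating every expression against the same snapshot of the input object. This is an illustration only: apply_value_from is a hypothetical helper, expressions are modelled as Python callables taking (inputs, self), and the input object is assumed to have already had source values assigned, defaults applied, and scattering performed.

```python
def apply_value_from(inputs, value_from):
    """Evaluate each valueFrom against the original inputs, never against other results."""
    snapshot = dict(inputs)  # what every expression sees as `inputs`
    result = dict(inputs)
    for name, expression in value_from.items():
        # `self` is the value already assigned to this parameter, or None (null).
        result[name] = expression(snapshot, snapshot.get(name))
    return result
```

Because each expression reads only the snapshot, the outcome does not depend on the order in which the parameters are processed.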
4.3.2 WorkflowStepOutput §
Associate an output parameter of the underlying process with a workflow
parameter. The workflow parameter (given in the id
field) may be used
as a source
to connect with input parameters of other workflow steps, or
with an output parameter of the process.
Fields
id
A unique identifier for this workflow output parameter. This is
the identifier to use in the source
field of WorkflowStepInput
to connect the output value to downstream parameters.
4.3.3 ScatterMethod §
The scatter method, as described in workflow step scatter.
Symbols
symbol | description |
---|---|
dotproduct | |
nested_crossproduct | |
flat_crossproduct |
4.3.4 InlineJavascriptRequirement §
Indicates that the workflow platform must support inline Javascript expressions. If this requirement is not present, the workflow platform must not perform expression interpolation.
Fields
class
InlineJavascriptRequirement
Always 'InlineJavascriptRequirement'
expressionLib
Additional code fragments that will also be inserted before executing the expression code. Allows for function definitions that may be called from CWL expressions.
4.3.5 SchemaDefRequirement §
This field consists of an array of type definitions which must be used when
interpreting the inputs
and outputs
fields. When a type
field
contains an IRI, the implementation must check if the type is defined in
schemaDefs
and use that definition. If the type is not found in
schemaDefs
, it is an error. The entries in schemaDefs
must be
processed in the order listed such that later schema definitions may refer
to earlier schema definitions.
- Type definitions are allowed for enum and record types only.
- Type definitions may be shared by defining them in a file and then $include-ing them in the types field.
- A file can contain a list of type definitions.
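The ordering rule (later definitions may refer to earlier ones, and an unknown type is an error) can be sketched as a single pass over the list. This is only an illustration: the function name index_schema_defs is hypothetical, record fields are assumed to be given in list form, and a type reference is assumed to be any string of the form #Name, which simplifies real IRI resolution.

```python
def index_schema_defs(types):
    """Index type definitions by reference, processing them in listed order.

    Hypothetical helper: a reference "#Name" is assumed to point at the
    definition whose name is "Name".
    """
    known = {}
    for definition in types:
        if definition["type"] not in ("enum", "record"):
            raise ValueError("only enum and record types may be defined")
        # Each record field that references a named type must find it among
        # the definitions already processed.
        for field in definition.get("fields", []):
            ref = field["type"]
            if isinstance(ref, str) and ref.startswith("#") and ref not in known:
                raise ValueError(f"{ref} is not defined earlier in the list")
        known["#" + definition["name"]] = definition
    return known
```

Listing a record before the enum it refers to raises an error in this sketch, while the reverse order is accepted.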
Fields
class
SchemaDefRequirement
Always 'SchemaDefRequirement'
types
The list of type definitions.
4.3.5.1 CommandInputRecordSchema §
Fields
type
record
Must be record
fields
Defines the fields of the record.
doc
A documentation string for this object, or an array of strings which should be concatenated.
inputBinding
Describes how to turn this object into command line arguments.
4.3.5.1.1 CommandInputRecordField §
Fields
type
The field type
doc
A documentation string for this object, or an array of strings which should be concatenated.
secondaryFiles
Only valid when type: File
or is an array of items: File
.
Provides a pattern or expression specifying files or
directories that should be included alongside the primary
file. Secondary files may be required or optional. When not
explicitly specified, secondary files specified for inputs
are required and outputs
are optional. An implementation
must include matching Files and Directories in the
secondaryFiles
property of the primary file. These Files
and Directories must be transferred and staged alongside the
primary file. An implementation may fail workflow execution
if a required secondary file does not exist.
If the value is an expression, the value of self
in the expression
must be the primary input or output File object to which this binding
applies. The basename
, nameroot
and nameext
fields must be
present in self
. For CommandLineTool
outputs the path
field must
also be present. The expression must return a filename string relative
to the path to the primary File, a File or Directory object with either
path
or location
and basename
fields set, or an array consisting
of strings or File or Directory objects. It is legal to reference an
unchanged File or Directory object taken from input as a secondaryFile.
The expression may return "null" in which case there is no secondaryFile
from that expression.
To work on non-filename-preserving storage systems, portable tool
descriptions should avoid constructing new values from location
, but
should construct relative references using basename
or nameroot
instead.
If a value in secondaryFiles
is a string that is not an expression,
it specifies that the following pattern should be applied to the path
of the primary file to yield a filename relative to the primary File:
- If string ends with ? character, remove the last ? and mark the resulting secondary file as optional.
- If string begins with one or more caret ^ characters, for each caret, remove the last file extension from the path (the last period . and all following characters). If there are no file extensions, the path is unchanged.
- Append the remainder of the string to the end of the file path.
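The pattern rules above can be sketched as a small function. This is a minimal illustration, not a reference implementation: the name apply_secondary_pattern is hypothetical, paths are assumed to use / as separator, and the extension is looked for only in the final path component, which is an assumption the text does not state.

```python
def apply_secondary_pattern(primary_path, pattern):
    """Return (secondary_path, required) for a non-expression pattern string.

    Hypothetical helper illustrating the three pattern rules.
    """
    required = True
    if pattern.endswith("?"):
        # A trailing '?' marks the secondary file as optional.
        pattern = pattern[:-1]
        required = False
    path = primary_path
    while pattern.startswith("^"):
        # Each leading caret strips the last extension, if there is one.
        pattern = pattern[1:]
        dirname, sep, basename = path.rpartition("/")
        dot = basename.rfind(".")
        if dot != -1:
            path = dirname + sep + basename[:dot]
    # Whatever remains of the pattern is appended to the (possibly shortened) path.
    return path + pattern, required
```

For example, the pattern .bai applied to /data/reads.bam gives /data/reads.bam.bai as a required file, while ^.bai? gives /data/reads.bai as an optional one.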
streamable
Only valid when type: File
or is an array of items: File
.
A value of true
indicates that the file is read or written
sequentially without seeking. An implementation may use this flag to
indicate whether it is valid to stream file contents using a named
pipe. Default: false
.
format
Only valid when type: File
or is an array of items: File
.
This must be one or more IRIs of concept nodes that represent file formats which are allowed as input to this parameter, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.
loadContents
Only valid when type: File
or is an array of items: File
.
If true, the file (or each file in the array) must be a UTF-8
text file 64 KiB or smaller, and the implementation must read
the entire contents of the file (or file array) and place it
in the contents
field of the File object for use by
expressions. If the size of the file is greater than 64 KiB,
the implementation must raise a fatal error.
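A minimal sketch of the size check follows, assuming a local file path; the function name load_contents and the use of ValueError to stand in for a fatal error are illustrative choices, not part of the specification.

```python
MAX_LOAD_CONTENTS = 64 * 1024  # 64 KiB


def load_contents(path):
    """Read a file for the `contents` field, enforcing the 64 KiB limit.

    Hypothetical helper; ValueError stands in for the fatal error.
    """
    with open(path, "rb") as handle:
        # Read one byte past the limit so oversized files are detected
        # without loading them entirely.
        data = handle.read(MAX_LOAD_CONTENTS + 1)
    if len(data) > MAX_LOAD_CONTENTS:
        raise ValueError(f"{path} is larger than 64 KiB")
    # The file is expected to be UTF-8 text; invalid bytes raise UnicodeDecodeError.
    return data.decode("utf-8")
```

Checking the byte count before decoding keeps the limit independent of how many characters the text contains.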
loadListing
Only valid when type: Directory
or is an array of items: Directory
.
Specify the desired behavior for loading the listing
field of
a Directory object for use by expressions.
The order of precedence for loadListing is:
- loadListing on an individual parameter
- Inherited from LoadListingRequirement
- By default: no_listing
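The precedence order can be expressed as a simple fallback chain. In this sketch the function name effective_load_listing is hypothetical, and None is assumed to mean that a level does not specify a value.

```python
def effective_load_listing(parameter_value=None, requirement_value=None):
    """Pick the loadListing behavior following the order of precedence.

    Hypothetical helper; None means "not specified at this level".
    """
    if parameter_value is not None:
        # A value on the individual parameter wins.
        return parameter_value
    if requirement_value is not None:
        # Otherwise inherit from LoadListingRequirement.
        return requirement_value
    # Nothing specified anywhere: fall back to the default.
    return "no_listing"
```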
inputBinding
Describes how to turn this object into command line arguments.
4.3.5.1.1.1 CommandInputEnumSchema §
Fields
type
enum
Must be enum
doc
A documentation string for this object, or an array of strings which should be concatenated.
inputBinding
Describes how to turn this object into command line arguments.
4.3.5.1.1.2 CommandLineBinding §
When listed under inputBinding
in the input schema, the term
"value" refers to the corresponding value in the input object. For
binding objects listed in CommandLineTool.arguments
, the term "value"
refers to the effective value after evaluating valueFrom
.
The binding behavior when building the command line depends on the data type of the value. If there is a mismatch between the type described by the input schema and the effective value, such as resulting from an expression evaluation, an implementation must use the data type of the effective value.
- string: Add prefix and the string to the command line.
- number: Add prefix and decimal representation to command line.
- boolean: If true, add prefix to the command line. If false, add nothing.
- File: Add prefix and the value of File.path to the command line.
- Directory: Add prefix and the value of Directory.path to the command line.
- array: If itemSeparator is specified, add prefix and join the array into a single string with itemSeparator separating the items. Otherwise, first add prefix, then recursively process individual elements. If the array is empty, it does not add anything to command line.
- object: Add prefix only, and recursively add object fields for which inputBinding is specified.
- null: Add nothing.
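The per-type rules can be sketched as one dispatch function. This is a simplified illustration under stated assumptions: the name bind and its keyword arguments are hypothetical, File and Directory values are assumed to be dictionaries with class and path keys, array elements are processed without bindings of their own, and record objects are left out.

```python
def bind(value, prefix=None, separate=True, item_separator=None):
    """Return the command line arguments contributed by one bound value.

    Hypothetical helper; record objects are not handled.
    """
    def is_file_or_dir(item):
        return isinstance(item, dict) and item.get("class") in ("File", "Directory")

    def as_text(item):
        # File and Directory objects contribute their path; other values use
        # str(), which is adequate for simple numbers.
        return item["path"] if is_file_or_dir(item) else str(item)

    def with_prefix(text):
        if prefix is None:
            return [text]
        # `separate` decides between two arguments and one concatenated argument.
        return [prefix, text] if separate else [prefix + text]

    if value is None:
        return []
    if isinstance(value, bool):  # must be tested before int
        return [prefix] if value and prefix is not None else []
    if isinstance(value, list):
        if not value:
            return []
        if item_separator is not None:
            return with_prefix(item_separator.join(as_text(v) for v in value))
        args = [] if prefix is None else [prefix]
        for item in value:
            args.extend(bind(item))
        return args
    if isinstance(value, (str, int, float)) or is_file_or_dir(value):
        return with_prefix(as_text(value))
    raise TypeError("record objects are not handled in this sketch")
```

Note that the boolean check comes first because Python treats bool as a subclass of int.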
Fields
loadContents
Use of loadContents
in InputBinding
is deprecated.
Preserved for v1.0 backwards compatibility. Will be removed in
CWL v2.0. Use InputParameter.loadContents
instead.
position
The sorting key. Default position is 0. If a CWL Parameter Reference
or CWL Expression is used and if the
inputBinding is associated with an input parameter, then the value of
self
will be the value of the input parameter. Input parameter
defaults (as specified by the InputParameter.default
field) must be
applied before evaluating the expression. Expressions must return a
single value of type int or a null.
separate
If true (default), then the prefix and value must be added as separate command line arguments; if false, prefix and value must be concatenated into a single command line argument.
itemSeparator
Join the array elements into a single string with the elements
separated by itemSeparator
.
valueFrom
If valueFrom
is a constant string value, use this as the value and
apply the binding rules above.
If valueFrom
is an expression, evaluate the expression to yield the
actual value to use to build the command line and apply the binding
rules above. If the inputBinding is associated with an input
parameter, the value of self
in the expression will be the value of
the input parameter. Input parameter defaults (as specified by the
InputParameter.default
field) must be applied before evaluating the
expression.
If the value of the associated input parameter is null
, valueFrom
is
not evaluated and nothing is added to the command line.
When a binding is part of the CommandLineTool.arguments
field,
the valueFrom
field is required.
shellQuote
If ShellCommandRequirement
is in the requirements for the current command,
this controls whether the value is quoted on the command line (default is true).
Use shellQuote: false
to inject metacharacters for operations such as pipes.
If shellQuote
is true or not provided, the implementation must not
permit interpretation of any shell metacharacters or directives.
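One way an implementation might honor shellQuote when assembling a shell command string is sketched below, using Python's standard shlex.quote. The function name render_command and the (value, shell_quote) pair layout are hypothetical.

```python
import shlex


def render_command(arguments):
    """Join (value, shell_quote) pairs into a single shell command string.

    Hypothetical helper: quoted values cannot be interpreted by the shell,
    unquoted values are passed through verbatim.
    """
    rendered = []
    for value, shell_quote in arguments:
        # shlex.quote wraps the value in single quotes when it contains
        # characters the shell would otherwise interpret.
        rendered.append(shlex.quote(value) if shell_quote else value)
    return " ".join(rendered)
```

Here a pipe character passed with shellQuote false is emitted as is, while a value such as "a b; ls" passed with shellQuote true is emitted as a single quoted word.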
4.3.5.1.1.3 CommandInputArraySchema §
Fields
items
Defines the type of the array elements.
type
array
Must be array
doc
A documentation string for this object, or an array of strings which should be concatenated.
inputBinding
Describes how to turn this object into command line arguments.
4.3.6 LoadListingRequirement §
Specify the desired behavior for loading the listing
field of
a Directory object for use by expressions.
Fields
class
LoadListingRequirement
Always 'LoadListingRequirement'
4.3.7 SoftwareRequirement §
A list of software packages that should be configured in the environment of the defined process.
Fields
class
SoftwareRequirement
Always 'SoftwareRequirement'
packages
The list of software to be configured.
4.3.8 SoftwarePackage §
Fields
package
The name of the software to be made available. If the name is
common, inconsistent, or otherwise ambiguous it should be combined with
one or more identifiers in the specs
field.
version
The (optional) versions of the software that are known to be compatible.
specs
One or more IRIs
identifying resources for installing or enabling the software named in
the package
field. Implementations may provide resolvers which map
these software identifier IRIs to some configuration action; or they can
use only the name from the package
field on a best effort basis.
For example, the IRI https://packages.debian.org/bowtie could
be resolved with apt-get install bowtie
. The IRI
https://anaconda.org/bioconda/bowtie could be resolved with conda install -c bioconda bowtie
.
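A resolver for the two example IRIs above might look like the following sketch. The function name resolve_spec is hypothetical, and the mapping covers only the two hosts named in the examples; a real resolver would be site specific.

```python
from urllib.parse import urlparse


def resolve_spec(iri):
    """Map a software identifier IRI to an install command, or None.

    Hypothetical resolver covering only the two example hosts.
    """
    parts = urlparse(iri)
    segments = [segment for segment in parts.path.split("/") if segment]
    if parts.netloc == "packages.debian.org" and len(segments) == 1:
        return ["apt-get", "install", segments[0]]
    if parts.netloc == "anaconda.org" and len(segments) == 2:
        channel, name = segments
        return ["conda", "install", "-c", channel, name]
    # Unknown IRI: the caller may fall back to the package field.
    return None
```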
IRIs can also be system independent and used to map to a specific software installation or selection mechanism. Using RRID as an example: https://identifiers.org/rrid/RRID:SCR_005476 could be fulfilled using the above-mentioned Debian or bioconda package, a local installation managed by Environment Modules, or any other mechanism the platform chooses. IRIs can also be from identifier sources that are discipline specific yet still system independent. As an example, the equivalent ELIXIR Tools and Data Service Registry IRI to the previous RRID example is https://bio.tools/tool/bowtie2/version/2.2.8. If supported by a given registry, implementations are encouraged to query these system independent software identifier IRIs directly for links to packaging systems.
A site specific IRI can be listed as well. For example, an academic
computing cluster using Environment Modules could list the IRI
https://hpc.example.edu/modules/bowtie-tbb/1.1.2
to indicate that
module load bowtie-tbb/1.1.2
should be executed to make available
bowtie
version 1.1.2 compiled with the TBB library prior to running
the accompanying Workflow or CommandLineTool. Note that the example IRI
is specific to a particular institution and computing environment as
the Environment Modules system does not have a common namespace or
standardized naming convention.
This last example is the least portable and should only be used if
mechanisms based on the package
field or more generic IRIs are
unavailable or unsuitable. While harmless to other sites, site specific
software IRIs should be left out of shared CWL descriptions to avoid
clutter.
4.3.9 InitialWorkDirRequirement §
Define a list of files and subdirectories that must be staged by the workflow platform prior to executing the command line tool.
Normally files are staged within the designated output directory. However, when running inside containers, files may be staged at arbitrary locations, see discussion for Dirent.entryname
. Together with DockerRequirement.dockerOutputDirectory
it is possible to control the locations of both input and output files when running in containers.
Fields
class
InitialWorkDirRequirement
Always 'InitialWorkDirRequirement'
listing
The list of files or subdirectories that must be staged prior to executing the command line tool.
Return type of each expression must validate as ["null", File, Directory, Dirent, {type: array, items: [File, Directory]}]
.
Each File
or Directory
that is returned by an Expression
must be added to the designated output directory prior to
executing the tool.
Each Dirent
record that is listed or returned by an
expression specifies a file to be created or staged in the
designated output directory prior to executing the tool.
Expressions may return null, in which case they have no effect.
Files or Directories which are listed in the input parameters
and appear in the InitialWorkDirRequirement
listing must
have their path
set to their staged location. If the same
File or Directory appears more than once in the
InitialWorkDirRequirement
listing, the implementation must
choose exactly one value for path
; how this value is chosen
is undefined.
4.3.9.1 Dirent §
Define a file or subdirectory that must be staged to a particular place prior to executing the command line tool. May be the result of executing an expression, such as building a configuration file from a template.
Usually files are staged within the designated output directory.
However, under certain circumstances, files may be staged at
arbitrary locations, see discussion for entryname
.
Fields
entry
If the value is a string literal or an expression which evaluates to a string, a new text file must be created with the string as the file contents.
If the value is an expression that evaluates to a File
or
Directory
object, or an array of File
or Directory
objects, this indicates the referenced file or directory
should be added to the designated output directory prior to
executing the tool.
If the value is an expression that evaluates to null
,
nothing is added to the designated output directory, the entry
has no effect.
If the value is an expression that evaluates to some other
array, number, or object not consisting of File
or
Directory
objects, a new file must be created with the value
serialized to JSON text as the file contents. The JSON
serialization behavior should match the behavior of string
interpolation of Parameter
references.
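The four cases for an evaluated entry value can be sketched as a classifier. The function name interpret_entry and the returned action labels are hypothetical; File and Directory objects are assumed to be dictionaries with a class key, and an empty array is treated as plain data, which is an assumption. The exact JSON spacing produced by json.dumps may differ from an implementation's string interpolation.

```python
import json


def interpret_entry(value):
    """Classify an evaluated `entry` value.

    Returns ("skip", None), ("stage", objects) or ("create", text).
    Hypothetical helper; the action names are illustrative.
    """
    def is_file_or_dir(item):
        return isinstance(item, dict) and item.get("class") in ("File", "Directory")

    if value is None:
        # null: the entry has no effect.
        return ("skip", None)
    if isinstance(value, str):
        # A string becomes the contents of a new text file.
        return ("create", value)
    if is_file_or_dir(value):
        return ("stage", [value])
    if isinstance(value, list) and value and all(is_file_or_dir(v) for v in value):
        return ("stage", value)
    # Any other array, number, or object is serialized to JSON text.
    return ("create", json.dumps(value))
```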
entryname
The "target" name of the file or subdirectory. If entry
is
a File or Directory, the entryname
field overrides the value
of basename
of the File or Directory object.
- Required when entry evaluates to file contents only
- Optional when entry evaluates to a File or Directory object with a basename
- Invalid when entry evaluates to an array of File or Directory objects.
If entryname
is a relative path, it specifies a name within
the designated output directory. A relative path starting
with ../
or that resolves to a location above the designated output directory is an error.
If entryname
is an absolute path (starts with a slash /
)
it is an error unless the following conditions are met:
- DockerRequirement is present in requirements
- The program will run inside a software container where, from the perspective of the program, the root filesystem is not shared with any other user or running program.
If the above conditions are met, then entryname may specify the absolute path within the container where the file or directory must be placed.
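The relative and absolute path rules for entryname can be sketched as a validation step. The function name check_entryname is hypothetical, and the single flag isolated_container is assumed to summarize both conditions for absolute paths (DockerRequirement in requirements and an unshared root filesystem).

```python
import posixpath


def check_entryname(entryname, isolated_container=False):
    """Return the normalized entryname, or raise ValueError if it is invalid.

    Hypothetical helper; `isolated_container` stands for both conditions
    that permit an absolute path.
    """
    if entryname.startswith("/"):
        if not isolated_container:
            raise ValueError("absolute entryname is not permitted here")
        return posixpath.normpath(entryname)
    # Collapse "a/../b" style segments so escapes are visible at the front.
    normalized = posixpath.normpath(entryname)
    if normalized == ".." or normalized.startswith("../"):
        raise ValueError("entryname escapes the designated output directory")
    return normalized
```

Normalizing before the check catches paths such as a/../../x, which do not start with ../ but still resolve above the output directory.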
writable
If true, the File or Directory (or array of Files or
Directories) declared in entry
must be writable by the tool.
Changes to the file or directory must be isolated and not visible by any other CommandLineTool process. This may be implemented by making a copy of the original file or directory.
Disruptive changes to the referenced file or directory must not
be allowed unless InplaceUpdateRequirement.inplaceUpdate
is true.
Default false (files and directories read-only by default).
A directory marked as writable: true
implies that all files and
subdirectories are recursively writable as well.
If writable
is false, the file may be made available using a
bind mount or file system link to avoid unnecessary copying of
the input file. Command line tools may receive an error on
attempting to rename or delete files or directories that are
not explicitly marked as writable.
4.3.10 WorkReuse §
For implementations that support reusing output from past work (on the assumption that same code and same input produce same results), control whether to enable or disable the reuse behavior for a particular tool or step (to accommodate situations where that assumption is incorrect). A reused step is not executed but instead returns the same output as the original execution.
If WorkReuse
is not specified, correct tools should assume it
is enabled by default.
Fields
class
WorkReuse
Always 'WorkReuse'
4.3.11 NetworkAccess §
Indicate whether a process requires outgoing IPv4/IPv6 network access. Choice of IPv4 or IPv6 is implementation and site specific; correct tools must support both.
If networkAccess
is false or not specified, tools must not
assume network access, except for localhost (the loopback device).
If networkAccess
is true, the tool must be able to make outgoing
connections to network resources. Resources may be on a private
subnet or the public Internet. However, implementations and sites
may apply their own security policies to restrict what is
accessible by the tool.
Enabling network access does not imply a publicly routable IP address or the ability to accept inbound connections.
Fields
class
NetworkAccess
Always 'NetworkAccess'
4.3.12 InplaceUpdateRequirement §
If inplaceUpdate
is true, then an implementation supporting this
feature may permit tools to directly update files with writable: true
in InitialWorkDirRequirement. That is, as an optimization,
files may be destructively modified in place as opposed to copied
and updated.
An implementation must ensure that only one workflow step may access a writable file at a time. It is an error if a file which is writable by one workflow step is accessed (for reading or writing) by any other workflow step running independently. However, a file which has been updated in a previous completed step may be used as input to multiple steps, provided it is read-only in every step.
Workflow steps which modify a file must produce the modified file as output. Downstream steps which further process the file must use the output of previous steps, and not refer to a common input (this is necessary for both ordering and correctness).
Workflow authors should provide this in the hints
section. The
intent of this feature is that workflows produce the same results
whether or not InplaceUpdateRequirement is supported by the
implementation, and this feature is primarily available as an
optimization for particular environments.
Users and implementers should be aware that workflows that destructively modify inputs may not be repeatable or reproducible. In particular, enabling this feature implies that WorkReuse should not be enabled.
Fields
class
InplaceUpdateRequirement
Always 'InplaceUpdateRequirement'
4.3.13 ToolTimeLimit §
Set an upper limit on the execution time of a CommandLineTool. A CommandLineTool whose execution duration exceeds the time limit may be preemptively terminated and considered failed. May also be used by batch systems to make scheduling decisions. The execution duration excludes external operations, such as staging of files or pulling a Docker image, and only counts wall-time for the execution of the command line itself.
Fields
class
ToolTimeLimit
Always 'ToolTimeLimit'
timelimit
The time limit, in seconds. A time limit of zero means no time limit. Negative time limits are an error.
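A minimal sketch of enforcing the limit around a local process follows, using the timeout parameter of Python's subprocess.run, which kills the child and raises TimeoutExpired when the limit passes. The function name run_with_time_limit and the boolean result are hypothetical; staging and image pulls are assumed to have happened before this call, so only the command itself is timed.

```python
import subprocess


def run_with_time_limit(argv, timelimit):
    """Run a command under a ToolTimeLimit; return True only on success.

    Hypothetical helper: zero means no limit, negative values are rejected.
    """
    if timelimit < 0:
        raise ValueError("negative time limits are an error")
    # subprocess.run treats timeout=None as "wait indefinitely".
    timeout = timelimit if timelimit > 0 else None
    try:
        completed = subprocess.run(argv, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child has been killed; the tool is considered failed.
        return False
    return completed.returncode == 0
```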
4.3.14 SubworkflowFeatureRequirement §
Indicates that the workflow platform must support nested workflows in
the run
field of WorkflowStep.
Fields
class
SubworkflowFeatureRequirement
Always 'SubworkflowFeatureRequirement'
4.3.15 ScatterFeatureRequirement §
Indicates that the workflow platform must support the scatter
and
scatterMethod
fields of WorkflowStep.
Fields
class
ScatterFeatureRequirement
Always 'ScatterFeatureRequirement'
4.3.16 MultipleInputFeatureRequirement §
Indicates that the workflow platform must support multiple inbound data links
listed in the source
field of WorkflowStepInput.
Fields
class
MultipleInputFeatureRequirement
Always 'MultipleInputFeatureRequirement'
4.3.17 StepInputExpressionRequirement §
Indicates that the workflow platform must support the valueFrom
field
of WorkflowStepInput.
Fields
class
StepInputExpressionRequirement
Always 'StepInputExpressionRequirement'
4.3.18 ExpressionTool §
An ExpressionTool is a type of Process object that can be run by itself or as a Workflow step. It executes a pure Javascript expression that has access to the same input parameters as a workflow. It is meant to be used sparingly as a way to isolate complex Javascript expressions that need to operate on input data and produce some result; perhaps just a rearrangement of the inputs. No Docker software container is required or allowed.
Fields
inputs
Defines the input parameters of the process. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object.
When accepting an input object, all input parameters must have a value.
If an input parameter is missing from the input object, it must be
assigned a value of null
(or the value of default
for that
parameter, if provided) for the purposes of validation and evaluation
of expressions.
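The rule for missing input parameters can be sketched as follows. The function name fill_input_object is hypothetical, and parameters are assumed to be given as a dictionary from parameter name to its declaration.

```python
def fill_input_object(parameters, input_object):
    """Give every declared input parameter a value.

    Hypothetical helper: a parameter missing from the input object receives
    its declared default, or None (null) when no default is provided.
    """
    filled = {}
    for name, declaration in parameters.items():
        if name in input_object:
            filled[name] = input_object[name]
        else:
            filled[name] = declaration.get("default")
    return filled
```

After this step, validation and expression evaluation can rely on every declared parameter being present in the input object.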
outputs
Defines the parameters representing the output of the process. May be used to generate and/or validate the output object.
class
ExpressionTool
expression
The expression to execute. The expression must return a plain Javascript object which matches the output parameters of the ExpressionTool.
id
The unique identifier for this object.
Only useful for $graph
at Process
level. Should not be exposed
to users in graphical or terminal user interfaces.
doc
A documentation string for this object, or an array of strings which should be concatenated.
requirements
map<class, InlineJavascriptRequirement | SchemaDefRequirement | LoadListingRequirement | DockerRequirement | SoftwareRequirement | InitialWorkDirRequirement | EnvVarRequirement | ShellCommandRequirement | ResourceRequirement | WorkReuse | NetworkAccess | InplaceUpdateRequirement | ToolTimeLimit | SubworkflowFeatureRequirement | ScatterFeatureRequirement | MultipleInputFeatureRequirement | StepInputExpressionRequirement>
Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option.
hints
map<class, InlineJavascriptRequirement | SchemaDefRequirement | LoadListingRequirement | DockerRequirement | SoftwareRequirement | InitialWorkDirRequirement | EnvVarRequirement | ShellCommandRequirement | ResourceRequirement | WorkReuse | NetworkAccess | InplaceUpdateRequirement | ToolTimeLimit | SubworkflowFeatureRequirement | ScatterFeatureRequirement | MultipleInputFeatureRequirement | StepInputExpressionRequirement | Any>
Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this process. It is not an error if an implementation cannot satisfy all hints; however, the implementation may report a warning.
cwlVersion
CWL document version. Always required at the document root. Not required for a Process embedded inside another Process.
intent
An identifier for the type of computational operation of this Process. Especially useful for Operation, but can also be used for CommandLineTool, Workflow, or ExpressionTool.
If provided, then this must be an IRI of a concept node that represents the type of operation, preferably defined within an ontology.
For example, in the domain of bioinformatics, one can use an IRI from the EDAM Ontology's Operation concept nodes, like Alignment, or Clustering; or a more specific Operation concept like Split read mapping.
4.3.18.1 ExpressionToolOutputParameter §
Fields
type
Specify valid types of data that may be assigned to this parameter. Note that this field just acts as a hint, as the outputs of an ExpressionTool process are always considered valid.
secondaryFiles
Only valid when type: File
or is an array of items: File
.
Provides a pattern or expression specifying files or
directories that should be included alongside the primary
file. Secondary files may be required or optional. When not
explicitly specified, secondary files specified for inputs
are required and outputs
are optional. An implementation
must include matching Files and Directories in the
secondaryFiles
property of the primary file. These Files
and Directories must be transferred and staged alongside the
primary file. An implementation may fail workflow execution
if a required secondary file does not exist.
If the value is an expression, the value of self
in the expression
must be the primary input or output File object to which this binding
applies. The basename
, nameroot
and nameext
fields must be
present in self
. For CommandLineTool
outputs the path
field must
also be present. The expression must return a filename string relative
to the path to the primary File, a File or Directory object with either
path
or location
and basename
fields set, or an array consisting
of strings or File or Directory objects. It is legal to reference an
unchanged File or Directory object taken from input as a secondaryFile.
The expression may return "null" in which case there is no secondaryFile
from that expression.
To work on non-filename-preserving storage systems, portable tool
descriptions should avoid constructing new values from location
, but
should construct relative references using basename
or nameroot
instead.
If a value in secondaryFiles
is a string that is not an expression,
it specifies that the following pattern should be applied to the path
of the primary file to yield a filename relative to the primary File:
- If string ends with ? character, remove the last ? and mark the resulting secondary file as optional.
- If string begins with one or more caret ^ characters, for each caret, remove the last file extension from the path (the last period . and all following characters). If there are no file extensions, the path is unchanged.
- Append the remainder of the string to the end of the file path.
streamable
Only valid when type: File
or is an array of items: File
.
A value of true
indicates that the file is read or written
sequentially without seeking. An implementation may use this flag to
indicate whether it is valid to stream file contents using a named
pipe. Default: false
.
doc
A documentation string for this object, or an array of strings which should be concatenated.
format
Only valid when type: File
or is an array of items: File
.
This is the file format that will be assigned to the output File object.
4.3.18.2 CWLVersion §
Version symbols for published CWL document versions.
Symbols
symbol | description |
---|---|
draft-2 | |
draft-3.dev1 | |
draft-3.dev2 | |
draft-3.dev3 | |
draft-3.dev4 | |
draft-3.dev5 | |
draft-3 | |
draft-4.dev1 | |
draft-4.dev2 | |
draft-4.dev3 | |
v1.0.dev4 | |
v1.0 | |
v1.1.0-dev1 | |
v1.1 | |
v1.2.0-dev1 | |
v1.2.0-dev2 | |
v1.2.0-dev3 | |
v1.2.0-dev4 | |
v1.2.0-dev5 | |
v1.2 |
4.3.19 Operation §
This record describes an abstract operation. It is a potential step of a workflow that has not yet been bound to a concrete implementation. It specifies an input and output signature, but does not provide enough information to be executed. An implementation (or other tooling) may provide a means of binding an Operation to a concrete process (such as Workflow, CommandLineTool, or ExpressionTool) with a compatible signature.
Fields
inputs
Defines the input parameters of the process. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object.
When accepting an input object, all input parameters must have a value.
If an input parameter is missing from the input object, it must be
assigned a value of null
(or the value of default
for that
parameter, if provided) for the purposes of validation and evaluation
of expressions.
outputs
Defines the parameters representing the output of the process. May be used to generate and/or validate the output object.
class
Operation
id
The unique identifier for this object.
Only useful for $graph
at Process
level. Should not be exposed
to users in graphical or terminal user interfaces.
doc
A documentation string for this object, or an array of strings which should be concatenated.
requirements
map<class, InlineJavascriptRequirement | SchemaDefRequirement | LoadListingRequirement | DockerRequirement | SoftwareRequirement | InitialWorkDirRequirement | EnvVarRequirement | ShellCommandRequirement | ResourceRequirement | WorkReuse | NetworkAccess | InplaceUpdateRequirement | ToolTimeLimit | SubworkflowFeatureRequirement | ScatterFeatureRequirement | MultipleInputFeatureRequirement | StepInputExpressionRequirement>
Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option.
hints
map<class, InlineJavascriptRequirement | SchemaDefRequirement | LoadListingRequirement | DockerRequirement | SoftwareRequirement | InitialWorkDirRequirement | EnvVarRequirement | ShellCommandRequirement | ResourceRequirement | WorkReuse | NetworkAccess | InplaceUpdateRequirement | ToolTimeLimit | SubworkflowFeatureRequirement | ScatterFeatureRequirement | MultipleInputFeatureRequirement | StepInputExpressionRequirement | Any>
Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this process. It is not an error if an implementation cannot satisfy all hints; however, the implementation may report a warning.
cwlVersion
CWL document version. Always required at the document root. Not required for a Process embedded inside another Process.
intent
An identifier for the type of computational operation of this Process. Especially useful for Operation, but can also be used for CommandLineTool, Workflow, or ExpressionTool.
If provided, then this must be an IRI of a concept node that represents the type of operation, preferably defined within an ontology.
For example, in the domain of bioinformatics, one can use an IRI from the EDAM Ontology's Operation concept nodes, like Alignment, or Clustering; or a more specific Operation concept like Split read mapping.
4.3.19.1 OperationInputParameter §
Describe an input parameter of an operation.
Fields
type
Specify valid types of data that may be assigned to this parameter.
secondaryFiles
Only valid when type: File
or is an array of items: File
.
Provides a pattern or expression specifying files or
directories that should be included alongside the primary
file. Secondary files may be required or optional. When not
explicitly specified, secondary files specified for inputs
are required and outputs
are optional. An implementation
must include matching Files and Directories in the
secondaryFiles
property of the primary file. These Files
and Directories must be transferred and staged alongside the
primary file. An implementation may fail workflow execution
if a required secondary file does not exist.
If the value is an expression, the value of self in the expression
must be the primary input or output File object to which this binding
applies. The basename, nameroot and nameext fields must be present in
self. For CommandLineTool outputs the path field must also be present.
The expression must return a filename string relative to the path to
the primary File, a File or Directory object with either path or
location and basename fields set, or an array consisting of strings or
File or Directory objects. It is legal to reference an unchanged File
or Directory object taken from input as a secondaryFile. The expression
may return "null" in which case there is no secondaryFile from that
expression.
To work on non-filename-preserving storage systems, portable tool
descriptions should avoid constructing new values from location, but
should construct relative references using basename or nameroot
instead.
If a value in secondaryFiles is a string that is not an expression,
it specifies that the following pattern should be applied to the path
of the primary file to yield a filename relative to the primary File:
- If string ends with ? character, remove the last ? and mark the resulting secondary file as optional.
- If string begins with one or more caret ^ characters, for each caret, remove the last file extension from the path (the last period . and all following characters). If there are no file extensions, the path is unchanged.
- Append the remainder of the string to the end of the file path.
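The pattern rules above can be sketched in Python as follows. This is an illustrative sketch only, not part of the specification: the function name apply_secondary_file_pattern is hypothetical, and applying the extension removal to the last path component (rather than to the whole path string) is an assumption made so that a period in a directory name is not mistaken for a file extension.

```python
import posixpath
from typing import Tuple


def apply_secondary_file_pattern(primary_path: str, pattern: str) -> Tuple[str, bool]:
    """Return (secondary_path, optional) for a non-expression pattern string.

    Hypothetical helper illustrating the three pattern rules.
    """
    # Rule 1: a trailing "?" marks the secondary file as optional.
    optional = pattern.endswith("?")
    if optional:
        pattern = pattern[:-1]

    # Assumption: only the file name, not its directory, carries extensions.
    directory, name = posixpath.split(primary_path)

    # Rule 2: each leading caret removes the last extension, if there is one.
    while pattern.startswith("^"):
        pattern = pattern[1:]
        dot = name.rfind(".")
        if dot != -1:
            name = name[:dot]

    # Rule 3: append the remainder of the pattern to the file path.
    return posixpath.join(directory, name + pattern), optional
```

For example, under these assumptions the pattern ".bai" applied to "reads.bam" yields "reads.bam.bai", while "^.bai" yields "reads.bai".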
streamable
Only valid when type: File or is an array of items: File.
A value of true indicates that the file is read or written
sequentially without seeking. An implementation may use this flag to
indicate whether it is valid to stream file contents using a named
pipe. Default: false.
doc
A documentation string for this object, or an array of strings which should be concatenated.
format
Only valid when type: File or is an array of items: File.
This must be one or more IRIs of concept nodes that represent file formats which are allowed as input to this parameter, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.
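The exact-match fallback mentioned above could look like the following Python sketch. The function name format_is_allowed and the example.org IRIs are hypothetical placeholders; an implementation with access to an ontology might additionally accept formats that the ontology declares to be compatible, which this sketch does not attempt.

```python
from typing import List, Union


def format_is_allowed(file_format: str, allowed: Union[str, List[str]]) -> bool:
    """Test a File's format IRI against the parameter's format field by exact match."""
    # The format field may hold a single IRI or a list of IRIs.
    allowed_iris = [allowed] if isinstance(allowed, str) else list(allowed)
    return file_format in allowed_iris


# Hypothetical IRIs, used only for illustration.
print(format_is_allowed("https://example.org/formats/bam",
                        ["https://example.org/formats/bam",
                         "https://example.org/formats/sam"]))
```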
loadContents
Only valid when type: File or is an array of items: File.
If true, the file (or each file in the array) must be a UTF-8
text file 64 KiB or smaller, and the implementation must read
the entire contents of the file (or file array) and place it
in the contents field of the File object for use by
expressions. If the size of the file is greater than 64 KiB,
the implementation must raise a fatal error.
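As an illustration of the size limit, the Python sketch below reads a local file and fails when it exceeds 64 KiB. The function name load_contents and the choice of ValueError as the fatal error are assumptions made for this example; a real implementation would read from whatever storage backs the File object and report the error through its own mechanism.

```python
MAX_CONTENT_BYTES = 64 * 1024  # 64 KiB


def load_contents(path: str) -> str:
    """Return the text to place in the contents field of a File object."""
    with open(path, "rb") as handle:
        # Read one byte past the limit so an oversized file is detected
        # without loading all of it into memory.
        data = handle.read(MAX_CONTENT_BYTES + 1)
    if len(data) > MAX_CONTENT_BYTES:
        raise ValueError(f"{path} is larger than 64 KiB; loadContents cannot be applied")
    # Strict decoding raises UnicodeDecodeError for input that is not UTF-8.
    return data.decode("utf-8")
```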
loadListing
Only valid when type: Directory or is an array of items: Directory.
Specify the desired behavior for loading the listing field of
a Directory object for use by expressions.
The order of precedence for loadListing is:
- loadListing on an individual parameter
- Inherited from LoadListingRequirement
- By default: no_listing
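The order of precedence can be sketched in Python as below. The function name effective_load_listing is hypothetical, and None is used here to stand for "not specified"; the example value deep_listing is one of the loadListing values alongside no_listing and shallow_listing.

```python
from typing import Optional


def effective_load_listing(parameter_value: Optional[str],
                           requirement_value: Optional[str]) -> str:
    """Resolve which loadListing value applies to a Directory parameter."""
    # 1. loadListing set on the individual parameter takes precedence.
    if parameter_value is not None:
        return parameter_value
    # 2. Otherwise the value is inherited from LoadListingRequirement.
    if requirement_value is not None:
        return requirement_value
    # 3. Otherwise the default applies.
    return "no_listing"
```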
4.3.19.2 OperationOutputParameter §
Describe an output parameter of an operation.
Fields
type
Specify valid types of data that may be assigned to this parameter.
secondaryFiles
Only valid when type: File or is an array of items: File.
Provides a pattern or expression specifying files or
directories that should be included alongside the primary
file. Secondary files may be required or optional. When not
explicitly specified, secondary files specified for inputs
are required and those specified for outputs are optional.
An implementation must include matching Files and Directories
in the secondaryFiles property of the primary file. These Files
and Directories must be transferred and staged alongside the
primary file. An implementation may fail workflow execution
if a required secondary file does not exist.
If the value is an expression, the value of self in the expression
must be the primary input or output File object to which this binding
applies. The basename, nameroot and nameext fields must be present in
self. For CommandLineTool outputs the path field must also be present.
The expression must return a filename string relative to the path to
the primary File, a File or Directory object with either path or
location and basename fields set, or an array consisting of strings or
File or Directory objects. It is legal to reference an unchanged File
or Directory object taken from input as a secondaryFile. The expression
may return "null" in which case there is no secondaryFile from that
expression.
To work on non-filename-preserving storage systems, portable tool
descriptions should avoid constructing new values from location, but
should construct relative references using basename or nameroot
instead.
If a value in secondaryFiles is a string that is not an expression,
it specifies that the following pattern should be applied to the path
of the primary file to yield a filename relative to the primary File:
- If string ends with ? character, remove the last ? and mark the resulting secondary file as optional.
- If string begins with one or more caret ^ characters, for each caret, remove the last file extension from the path (the last period . and all following characters). If there are no file extensions, the path is unchanged.
- Append the remainder of the string to the end of the file path.
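The required-or-optional default for a plain pattern string might be resolved as in the Python sketch below. The function name secondary_file_is_required is hypothetical, and the sketch assumes a pattern given as a bare string: a trailing ? marks the file optional, and otherwise the default depends on whether the parameter is an input or an output.

```python
def secondary_file_is_required(pattern: str, is_input: bool) -> bool:
    """Decide whether a secondary file named by a pattern string is required."""
    # A trailing "?" explicitly marks the secondary file as optional.
    if pattern.endswith("?"):
        return False
    # Not explicitly specified: required for inputs, optional for outputs.
    return is_input
```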
streamable
Only valid when type: File or is an array of items: File.
A value of true indicates that the file is read or written
sequentially without seeking. An implementation may use this flag to
indicate whether it is valid to stream file contents using a named
pipe. Default: false.
doc
A documentation string for this object, or an array of strings which should be concatenated.
format
Only valid when type: File or is an array of items: File.
This is the file format that will be assigned to the output File object.