Common Workflow Language (CWL) Workflow Description, draft 3

This version:

Current version:

Authors:

Contributors:

Abstract

A Workflow is an analysis task represented by a directed graph describing a sequence of operations that transform an input data set to output. This specification defines the Common Workflow Language (CWL) Workflow description, a vendor-neutral standard for representing workflows intended to be portable across a variety of computing platforms.

Status of This Document

This document is the product of the Common Workflow Language working group. The latest version of this document is available in the "draft-3" directory at

https://github.com/common-workflow-language/common-workflow-language

The products of the CWL working group (including this document) are made available under the terms of the Apache License, version 2.0.


1. Introduction

The Common Workflow Language (CWL) working group is an informal, multi-vendor working group consisting of various organizations and individuals that have an interest in portability of data analysis workflows. The goal is to create specifications like this one that enable data scientists to describe analysis tools and workflows that are powerful, easy to use, portable, and support reproducibility.

1.1 Introduction to draft 3

This specification represents the third milestone of the CWL group. Since draft-2, this draft introduces the following changes and additions:

  • Greatly simplified naming within a document with scoped identifiers, as described in the Schema Salad specification.
  • The draft-2 concept of pluggable expression engines has been replaced by a streamlined expression syntax (see section 3.4, Parameter references) and standardization on Javascript.
  • File objects can now include a format field to indicate the file type.
  • The addition of MultipleInputFeatureRequirement.
  • The addition of StepInputExpressionRequirement.
  • The separation of Workflow and CommandLineTool components into separate specifications.

1.2 Purpose

The Common Workflow Language Workflow Description expresses workflows for data-intensive science, such as bioinformatics, chemistry, physics, and astronomy. This specification is intended to define a data and execution model for Workflows that can be implemented on top of a variety of computing platforms, ranging from an individual workstation to cluster, grid, cloud, and high performance computing systems.

1.3 References to Other Specifications

Javascript Object Notation (JSON): http://json.org

JSON Linked Data (JSON-LD): http://json-ld.org

YAML: http://yaml.org

Avro: https://avro.apache.org/docs/current/spec.html

Uniform Resource Identifier (URI) Generic Syntax: https://tools.ietf.org/html/rfc3986

Portable Operating System Interface (POSIX.1-2008): http://pubs.opengroup.org/onlinepubs/9699919799/

Resource Description Framework (RDF): http://www.w3.org/RDF/

1.4 Scope

This document describes CWL syntax, execution, and object model. It is not intended to document a specific CWL implementation; however, it may serve as a reference for the behavior of conforming implementations.

1.5 Terminology

The terminology used to describe CWL documents is defined in the Concepts section of the specification. The terms defined in the following list are used in building those definitions and in describing the actions of a CWL implementation:

may: Conforming CWL documents and CWL implementations are permitted but not required to behave as described.

must: Conforming CWL documents and CWL implementations are required to behave as described; otherwise they are in error.

error: A violation of the rules of this specification; results are undefined. Conforming implementations may detect and report an error and may recover from it.

fatal error: A violation of the rules of this specification; results are undefined. Conforming implementations must not continue to execute the current process and may report an error.

at user option: Conforming software may or must (depending on the modal verb in the sentence) behave as described; if it does, it must provide users a means to enable or disable the behavior described.

deprecated: Conforming software may implement a behavior for backwards compatibility. Portable CWL documents should not rely on deprecated behavior. Behavior marked as deprecated may be removed entirely from future revisions of the CWL specification.

2. Data model

2.1 Data concepts

An object is a data structure equivalent to the "object" type in JSON, consisting of an unordered set of name/value pairs (referred to here as fields), where the name is a string and the value is a string, number, boolean, array, or object.

A document is a file containing a serialized object, or an array of objects.

A process is a basic unit of computation which accepts input data, performs some computation, and produces output data.

An input object is an object describing the inputs to an invocation of a process.

An output object is an object describing the output of an invocation of a process.

An input schema describes the valid format (required fields, data types) for an input object.

An output schema describes the valid format for an output object.

Metadata is information about workflows, tools, or input items that is not used directly in the computation.

2.2 Syntax

CWL documents must consist of an object or array of objects represented using JSON or YAML syntax. Upon loading, a CWL implementation must apply the preprocessing steps described in the Semantic Annotations for Linked Avro Data (SALAD) Specification. An implementation may formally validate the structure of a CWL document using SALAD schemas located at https://github.com/common-workflow-language/common-workflow-language/tree/master/draft-3

2.3 Identifiers

If an object contains an id field, that field is used to uniquely identify the object in that document. The value of the id field must be unique over the entire document. Identifiers may be resolved relative to the document base and/or other identifiers, following the rules described in the Schema Salad specification.

An implementation may choose to only honor references to object types for which the id field is explicitly listed in this specification.

2.4 Document preprocessing

An implementation must resolve $import and $include directives as described in the Schema Salad specification.

2.5 Extensions and Metadata

Input metadata (for example, a lab sample identifier) may be represented within a tool or workflow using input parameters which are explicitly propagated to output. Future versions of this specification may define additional facilities for working with input/output metadata.

Implementation extensions not required for correct execution (for example, fields related to GUI presentation) and metadata about the tool or workflow itself (for example, authorship for use in citations) may be provided as additional fields on any object. Such extension fields must use a namespace prefix listed in the $namespaces section of the document as described in the Schema Salad specification.

Implementation extensions which modify execution semantics must be listed in the requirements field.

3. Execution model

3.1 Execution concepts

A parameter is a named symbolic input or output of a process, with an associated datatype or schema. During execution, values are assigned to parameters to make the input object or output object used for concrete process invocation.

A command line tool is a process characterized by the execution of a standalone, non-interactive program which is invoked on some input, produces output, and then terminates.

A workflow is a process characterized by multiple subprocess steps, where step outputs are connected to the inputs of other downstream steps to form a directed graph, and independent steps may run concurrently.

A runtime environment is the actual hardware and software environment when executing a command line tool. It includes, but is not limited to, the hardware architecture, hardware resources, operating system, software runtime (if applicable, such as the Python interpreter or the JVM), libraries, modules, packages, utilities, and data files required to run the tool.

A workflow platform is a specific hardware and software implementation capable of interpreting CWL documents and executing the processes specified by the document. The responsibilities of the workflow platform may include scheduling process invocation, setting up the necessary runtime environment, making input data available, invoking the tool process, and collecting output.

A workflow platform may choose to only implement the Command Line Tool Description part of the CWL specification.

It is intended that the workflow platform has broad leeway outside of this specification to optimize use of computing resources and enforce policies not covered by this specification. Some areas that are currently out of scope for the CWL specification but may be handled by a specific workflow platform include:

  • Data security and permissions.
  • Scheduling tool invocations on remote cluster or cloud compute nodes.
  • Using virtual machines or operating system containers to manage the runtime (except as described in DockerRequirement).
  • Using remote or distributed file systems to manage input and output files.
  • Transforming file paths.
  • Determining if a process has previously been executed, skipping it and reusing previous results.
  • Pausing, resuming or checkpointing processes or workflows.

Conforming CWL processes must not assume anything about the runtime environment or workflow platform unless explicitly declared through the use of process requirements.

3.2 Generic execution process

The generic execution sequence of a CWL process (including workflows and command line tools) is as follows.

  1. Load, process and validate a CWL document, yielding a process object.
  2. Load input object.
  3. Validate the input object against the inputs schema for the process.
  4. Validate that process requirements are met.
  5. Perform any further setup required by the specific process type.
  6. Execute the process.
  7. Capture results of process execution into the output object.
  8. Validate the output object against the outputs schema for the process.
  9. Report the output object to the process caller.

3.3 Requirements and hints

A process requirement modifies the semantics or runtime environment of a process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option.

A hint is similar to a requirement, however it is not an error if an implementation cannot satisfy all hints. The implementation may report a warning if a hint cannot be satisfied.

Requirements are inherited. A requirement specified in a Workflow applies to all workflow steps; a requirement specified on a workflow step will apply to the process implementation.

If the same process requirement appears at different levels of the workflow, the most specific instance of the requirement is used, that is, an entry in requirements on a process implementation such as CommandLineTool will take precedence over an entry in requirements specified in a workflow step, and an entry in requirements on a workflow step takes precedence over the workflow. Entries in hints are resolved the same way.

Requirements override hints. If a process implementation provides a process requirement in hints which is also provided in requirements by an enclosing workflow or workflow step, the enclosing requirement takes precedence.
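The precedence rules in this section can be illustrated with a short Python sketch. It assumes each level (workflow, workflow step, and process implementation) is a dict whose optional requirements and hints lists hold objects identified by their class field. The function names are hypothetical and do not come from any CWL implementation.

```python
from typing import Any, Optional

Entry = dict[str, Any]  # a Workflow, WorkflowStep, or process implementation object


def _find(entries: list[Entry], cls: str) -> Optional[Entry]:
    """Return the first entry whose class field equals cls, or None."""
    for entry in entries:
        if entry.get("class") == cls:
            return entry
    return None


def effective_requirement(cls: str, workflow: Entry, step: Entry,
                          tool: Entry) -> tuple[Optional[str], Optional[Entry]]:
    """Resolve one requirement class across the three levels.

    Returns ("requirement", entry), ("hint", entry), or (None, None).
    """
    levels = [tool, step, workflow]  # most specific level first
    # A requirement at any level overrides a hint at any level.
    for level in levels:
        found = _find(level.get("requirements", []), cls)
        if found is not None:
            return "requirement", found
    # Hints are resolved the same way: the most specific entry wins.
    for level in levels:
        found = _find(level.get("hints", []), cls)
        if found is not None:
            return "hint", found
    return None, None
```

For example, if a workflow lists DockerRequirement under requirements and the tool it runs lists DockerRequirement under hints, the sketch returns the workflow's entry as a requirement.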

3.4 Parameter references

Parameter references are denoted by the syntax $(...) and may be used in any field permitting the pseudo-type Expression, as specified by this document. Conforming implementations must support parameter references. Parameter references use the following subset of Javascript/ECMAScript 5.1 syntax.

In the following BNF grammar, character classes and grammar rules are denoted in '{}', '-' denotes exclusion from a character class, '(())' denotes grouping, '|' denotes alternates, trailing '*' denotes zero or more repeats, '+' denotes one or more repeats, and all other characters are literal values.

symbol::     {Unicode alphanumeric}+
singleq::    [' (( {character - '} | \' ))* ']
doubleq::    [" (( {character - "} | \" ))* "]
index::      [ {decimal digit}+ ]
segment::    . {symbol} | {singleq} | {doubleq} | {index}
parameter::  $( {symbol} {segment}*)

Use the following algorithm to resolve a parameter reference:

  1. Match the leading symbol as key
  2. Look up the key in the parameter context (described below) to get the current value. It is an error if the key is not found in the parameter context.
  3. If there are no subsequent segments, terminate and return current value
  4. Else, match the next segment
  5. Extract the symbol, string, or index from the segment as key
  6. Look up the key in current value and assign as new current value. If the key is a symbol or string, the current value must be an object. If the key is an index, the current value must be an array or string. It is an error if the key does not match the required type, or the key is not found or out of range.
  7. Repeat steps 3-6
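The grammar and resolution algorithm above can be sketched in Python as follows. This is a non-normative sketch. The names parse_reference, resolve, and ParamRefError are hypothetical, and the context argument stands for the parameter context described below. Symbols are matched with str.isalnum(), which follows the grammar literally, so an underscore does not count as part of a symbol.

```python
from typing import Any


class ParamRefError(Exception):
    """Raised for a malformed or unresolvable reference (an error in spec terms)."""


DIGITS = "0123456789"


def _symbol(body: str, start: int) -> tuple[str, int]:
    """Match {Unicode alphanumeric}+ starting at body[start]."""
    end = start
    while end < len(body) and body[end].isalnum():
        end += 1
    if end == start:
        raise ParamRefError(f"expected a symbol at offset {start}")
    return body[start:end], end


def _quoted(body: str, start: int) -> tuple[str, int]:
    """Match ['...'] or ["..."] starting at body[start] == '['."""
    quote = body[start + 1]
    i = start + 2
    chars: list[str] = []
    while i < len(body):
        c = body[i]
        if c == "\\" and i + 1 < len(body) and body[i + 1] == quote:
            chars.append(quote)  # \' or \" stands for a literal quote
            i += 2
        elif c == quote:
            if i + 1 < len(body) and body[i + 1] == "]":
                return "".join(chars), i + 2
            raise ParamRefError("expected ']' after closing quote")
        else:
            chars.append(c)
            i += 1
    raise ParamRefError("unterminated string segment")


def parse_reference(ref: str) -> list[Any]:
    """Split '$(symbol segment*)' into keys; index segments become ints."""
    if not (ref.startswith("$(") and ref.endswith(")")):
        raise ParamRefError(f"not a parameter reference: {ref!r}")
    body = ref[2:-1]
    key, i = _symbol(body, 0)
    keys: list[Any] = [key]
    while i < len(body):
        if body[i] == ".":
            key, i = _symbol(body, i + 1)
        elif body[i] == "[" and i + 1 < len(body) and body[i + 1] in "'\"":
            key, i = _quoted(body, i)
        elif body[i] == "[":
            j = i + 1
            while j < len(body) and body[j] in DIGITS:
                j += 1
            if j == i + 1 or j >= len(body) or body[j] != "]":
                raise ParamRefError(f"malformed index at offset {i}")
            key, i = int(body[i + 1:j]), j + 1
        else:
            raise ParamRefError(f"unexpected character {body[i]!r}")
        keys.append(key)
    return keys


def resolve(ref: str, context: dict[str, Any]) -> Any:
    """Steps 1-7: walk the parameter context one segment at a time."""
    keys = parse_reference(ref)
    if keys[0] not in context:
        raise ParamRefError(f"{keys[0]!r} is not in the parameter context")
    value = context[keys[0]]
    for key in keys[1:]:
        if isinstance(key, int):
            if not isinstance(value, (list, str)):
                raise ParamRefError("index applied to a value that is not an array or string")
            if key >= len(value):
                raise ParamRefError(f"index {key} out of range")
        elif not isinstance(value, dict):
            raise ParamRefError(f"key {key!r} applied to a value that is not an object")
        elif key not in value:
            raise ParamRefError(f"key {key!r} not found")
        value = value[key]
    return value
```

With inputs bound to {"reads": [{"class": "File", "path": "/data/a.fq"}]}, both $(inputs.reads[0].path) and $(inputs['reads'][0]['path']) resolve to "/data/a.fq".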

The root namespace is the parameter context. The following parameters must be provided:

  • inputs: The input object to the current Process.
  • self: A context-specific value. The contextual values for 'self' are documented for specific fields elsewhere in this specification. If a contextual value of 'self' is not documented for a field, it must be 'null'.
  • runtime: An object containing configuration details. Specific to the process type. An implementation may provide opaque strings for any or all fields of runtime. These must be filled in by the platform after processing the Tool but before actual execution. Parameter references and expressions may only use the literal string value of the field and must not perform computation on the contents.

If the value of a field has no leading or trailing non-whitespace characters around a parameter reference, the effective value of the field becomes the value of the referenced parameter, preserving the return type.

If the value of a field has non-whitespace leading or trailing characters around a parameter reference, it is subject to string interpolation. The effective value of the field is a string containing the leading characters, followed by the string value of the parameter reference, followed by the trailing characters. The string value of the parameter reference is its textual JSON representation with the following rules:

  • Leading and trailing quotes are stripped from strings
  • Object entries are sorted by key

Multiple parameter references may appear in a single field. This case must be treated as a string interpolation. After interpolating the first parameter reference, interpolation must be recursively applied to the trailing characters to yield the final string value.
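The type-preserving and interpolation rules can be sketched in Python. This sketch assumes a simplified reference syntax with dotted segments only. The REF pattern and the helper names are hypothetical, and REF uses \w, which is looser than the grammar's symbol rule. A complete implementation would follow the full grammar and resolution algorithm of this section. For string values, the sketch inserts the raw string, which matches the stripped JSON text whenever the string contains no characters that JSON would escape.

```python
import json
import re
from typing import Any

# Simplified reference syntax (assumption): dotted segments only, e.g. $(inputs.sample.id).
# \w also admits underscores, so this is looser than the grammar's {Unicode alphanumeric}+.
REF = re.compile(r"\$\((\w+(?:\.\w+)*)\)")


def lookup(path: str, context: dict[str, Any]) -> Any:
    """Follow a dotted path through nested objects."""
    value: Any = context
    for key in path.split("."):
        value = value[key]  # KeyError or TypeError marks an unresolvable reference
    return value


def to_text(value: Any) -> str:
    """String value of a reference: raw strings, otherwise JSON with object keys sorted."""
    if isinstance(value, str):
        return value
    return json.dumps(value, sort_keys=True)


def interpolate(text: str, context: dict[str, Any]) -> str:
    """Replace the first reference, then recurse on the trailing characters."""
    m = REF.search(text)
    if m is None:
        return text
    leading, trailing = text[:m.start()], text[m.end():]
    return leading + to_text(lookup(m.group(1), context)) + interpolate(trailing, context)


def evaluate_field(field: str, context: dict[str, Any]) -> Any:
    """Effective value of a field that may contain parameter references."""
    whole = REF.fullmatch(field.strip())
    if whole:
        # Only whitespace surrounds a single reference: keep the value's type.
        return lookup(whole.group(1), context)
    return interpolate(field, context)
```

For example, with inputs.n bound to 3, the field "$(inputs.n)" evaluates to the number 3, while "-n $(inputs.n)" evaluates to the string "-n 3".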

3.5 Expressions

An expression is a fragment of Javascript/ECMAScript 5.1 code which is evaluated by the workflow platform to affect the inputs, outputs, or behavior of a process. In the generic execution sequence, expressions may be evaluated during step 5 (process setup), step 6 (execute process), and/or step 7 (capture output). Expressions are distinct from regular processes in that they are intended to modify the behavior of the workflow itself rather than perform the primary work of the workflow.

To declare the use of expressions, the document must include the process requirement InlineJavascriptRequirement. Expressions may be used in any field permitting the pseudo-type Expression, as specified by this document.

Expressions are denoted by the syntax $(...) or ${...}. A code fragment wrapped in the $(...) syntax must be evaluated as an ECMAScript expression. A code fragment wrapped in the ${...} syntax must be evaluated as an ECMAScript function body for an anonymous, zero-argument function. Expressions must return a valid JSON data type: one of null, string, number, boolean, array, object. Implementations must permit any syntactically valid Javascript. When scanning for expressions, implementations must account for nested parentheses or braces, and for strings that may contain parentheses or braces.
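The scanning requirement can be sketched as a small bracket-matching scanner. This is a simplified sketch and not a full ECMAScript tokenizer. It tracks only quoted strings and bracket nesting, so brackets inside comments or regular-expression literals are not handled. The function names are hypothetical.

```python
from typing import Iterator

PAIRS = {"(": ")", "{": "}", "[": "]"}


def _match_end(text: str, open_at: int) -> int:
    """Return the index just past the bracket that closes text[open_at]."""
    stack = [PAIRS[text[open_at]]]
    quote = None  # the active quote character while inside a string literal
    i = open_at + 1
    while i < len(text):
        c = text[i]
        if quote:
            if c == "\\":
                i += 1  # skip the escaped character
            elif c == quote:
                quote = None
        elif c in "'\"":
            quote = c
        elif c in PAIRS:
            stack.append(PAIRS[c])
        elif c in ")}]":
            if c != stack.pop():
                raise ValueError(f"mismatched {c!r} at offset {i}")
            if not stack:
                return i + 1
        i += 1
    raise ValueError("unterminated expression")


def scan_expressions(text: str) -> Iterator[tuple[int, int, str]]:
    """Yield (start, end, kind) for each $(...) or ${...} in text.

    kind is "(" for an ECMAScript expression and "{" for a function body.
    """
    i = 0
    while i < len(text) - 1:
        if text[i] == "$" and text[i + 1] in "({":
            end = _match_end(text, i + 1)
            yield i, end, text[i + 1]
            i = end
        else:
            i += 1
```

In the field "prefix $(inputs.a + ')') mid ${ return {x: 1}; } end", the scanner finds exactly two expressions. It ignores the parenthesis inside the string literal and the nested braces of the object literal.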

The runtime must include any code defined in the "expressionLib" field of InlineJavascriptRequirement prior to executing the actual expression.

Before executing the expression, the runtime must initialize as global variables the fields of the parameter context described above.

The effective value of the field after expression evaluation follows the same rules as parameter references discussed above. Multiple expressions may appear in a single field.

Expressions must be evaluated in an isolated context (a "sandbox") which permits no side effects to leak outside the context. Expressions also must be evaluated in Javascript strict mode.

The order in which expressions are evaluated is undefined except where otherwise noted in this document.

An implementation may choose to implement parameter references by evaluating as a Javascript expression. The results of evaluating parameter references must be identical whether implemented by Javascript evaluation or some other means.

Implementations may apply other limits, such as process isolation, timeouts, and operating system containers/jails to minimize the security risks associated with running untrusted code embedded in a CWL document.

3.6 Success and failure

A completed process must result in one of success, temporaryFailure or permanentFailure states. An implementation may choose to retry a process execution which resulted in temporaryFailure. An implementation may choose to either continue running other steps of a workflow, or terminate immediately upon permanentFailure.

  • If any step of a workflow execution results in permanentFailure, then the workflow status is permanentFailure.

  • If one or more steps result in temporaryFailure and all other steps complete with success or are not executed, then the workflow status is temporaryFailure.

  • If all workflow steps are executed and complete with success, then the workflow status is success.
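The three rules above can be combined into a single function. In this sketch, the Status enum and workflow_status are hypothetical names, and None stands for a step that was not executed. Combinations the rules do not cover, such as some steps succeeding while others never ran, also return None.

```python
from enum import Enum
from typing import Iterable, Optional


class Status(Enum):
    SUCCESS = "success"
    TEMPORARY_FAILURE = "temporaryFailure"
    PERMANENT_FAILURE = "permanentFailure"


def workflow_status(step_results: Iterable[Optional[Status]]) -> Optional[Status]:
    """Combine per-step results; None marks a step that was not executed."""
    results = list(step_results)
    if Status.PERMANENT_FAILURE in results:
        return Status.PERMANENT_FAILURE
    # With no permanent failure, every other step either succeeded or did not run.
    if Status.TEMPORARY_FAILURE in results:
        return Status.TEMPORARY_FAILURE
    if all(r is Status.SUCCESS for r in results):
        return Status.SUCCESS
    return None  # not covered by the rules above
```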

3.7 Executing CWL documents as scripts

By convention, a CWL document may begin with #!/usr/bin/env cwl-runner and be marked as executable (the POSIX "+x" permission bits) to enable it to be executed directly. A workflow platform may support this mode of operation; if so, it must provide cwl-runner as an alias for the platform's CWL implementation.

A CWL input object document may similarly begin with #!/usr/bin/env cwl-runner and be marked as executable. In this case, the input object must include the field cwl:tool supplying a URI to the default CWL document that should be executed using the fields of the input object as input parameters.

4. Workflow

A workflow describes a set of steps and the dependencies between those steps. When a process produces output that will be consumed by a second process, the first process is a dependency of the second process.

When there is a dependency, the workflow engine must execute the preceding process and wait for it to successfully produce output before executing the dependent process. If two processes are defined in the workflow graph that are not directly or indirectly dependent, these processes are independent, and may execute in any order or execute concurrently. A workflow is complete when all steps have been executed.

Dependencies between parameters are expressed using the source field on workflow step input parameters and workflow output parameters.

The source field expresses the dependency of one parameter on another such that when a value is associated with the parameter specified by source, that value is propagated to the destination parameter. When all data links inbound to a given step are fulfilled, the step is ready to execute.
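The readiness rule can be sketched as grouping steps into successive waves. Each wave contains the steps whose inbound links are all fulfilled by earlier waves, so the steps within one wave are independent and may run concurrently. This sketch assumes the source fields have already been reduced to a mapping from each step to the set of upstream steps it depends on, with links from workflow inputs omitted. It does not model failures. execution_waves is a hypothetical name.

```python
def execution_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into waves; a step joins the first wave in which every step it
    depends on has already completed. Steps in the same wave are independent."""
    remaining = {step: set(deps) for step, deps in depends_on.items()}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        ready = sorted(step for step, deps in remaining.items() if deps <= done)
        if not ready:
            # No step can become ready: a cycle, or a dependency on an unknown step.
            raise ValueError("unsatisfiable dependencies: " + ", ".join(sorted(remaining)))
        waves.append(ready)
        done.update(ready)
        for step in ready:
            del remaining[step]
    return waves
```

For a hypothetical workflow where "call" consumes the outputs of "align" and "index", and "report" consumes the output of "call", the waves are [["align", "index"], ["call"], ["report"]].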


Extensions

ScatterFeatureRequirement and SubworkflowFeatureRequirement are available as standard extensions to core workflow semantics.

Fields

field: type (required or optional)
inputs: array<InputParameter> (required)

Defines the input parameters of the process. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object.

outputs: array<WorkflowOutputParameter> (required)

Defines the parameters representing the output of the process. May be used to generate and/or validate the output object.

class: string (required)

steps: array<WorkflowStep> (required)

The individual steps that make up the workflow. Each step is executed when all of its input data links are fulfilled. An implementation may choose to execute the steps in a different order than listed and/or execute steps concurrently, provided that dependencies between steps are met.

id: string (optional)

The unique identifier for this process object.

requirements: array<InlineJavascriptRequirement | SchemaDefRequirement | DockerRequirement | CreateFileRequirement | EnvVarRequirement | ShellCommandRequirement | ResourceRequirement | SubworkflowFeatureRequirement | ScatterFeatureRequirement | MultipleInputFeatureRequirement | StepInputExpressionRequirement> (optional)

Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option.

hints: array<Any> (optional)

Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this process. It is not an error if an implementation cannot satisfy all hints, however the implementation may report a warning.

label: string (optional)

A short, human-readable label of this process object.

description: string (optional)

A long, human-readable description of this process object.

cwlVersion: CWLVersions (optional)

CWL document version

4.1 WorkflowOutputParameter

Describes an output parameter of a workflow. The parameter must be connected to one or more parameters defined in the workflow that will provide the value of the output parameter.

Fields

field: type (required or optional)
id: string (required)

The unique identifier for this parameter object.

secondaryFiles: string | Expression | array<string | Expression> (optional)

Only valid when type: File or is an array of items: File.

Describes files that must be included alongside the primary file(s).

If the value is an expression, the value of self in the expression must be the primary input or output File to which this binding applies.

If the value is a string, it specifies that the following pattern should be applied to the primary file:

  1. If string begins with one or more caret ^ characters, for each caret, remove the last file extension from the path (the last period . and all following characters). If there are no file extensions, the path is unchanged.
  2. Append the remainder of the string to the end of the file path.
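The string pattern rule can be sketched in Python. The sketch uses posixpath.splitext to remove an extension. That function only examines the final path component and ignores a leading dot, which is one reading of "file extension" that this section does not spell out. secondary_file_path is a hypothetical name.

```python
import posixpath


def secondary_file_path(primary: str, pattern: str) -> str:
    """Apply a string secondaryFiles pattern to the path of the primary file."""
    path = primary
    while pattern.startswith("^"):
        pattern = pattern[1:]
        # Each caret removes one extension; a path with no extension is unchanged.
        path = posixpath.splitext(path)[0]
    return path + pattern
```

For example, the pattern "^.bai" turns /data/reads.bam into /data/reads.bai, the pattern ".bai" yields /data/reads.bam.bai, and "^^.dict" turns /ref/genome.fa.gz into /ref/genome.dict.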

format: string | array<string> | Expression (optional)

Only valid when type: File or is an array of items: File.

For input parameters, this must be one or more URIs of concept nodes that represent file formats which are allowed as input to this parameter, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.

For output parameters, this is the file format that will be assigned to the output parameter.

streamable: boolean (optional)

Only valid when type: File or is an array of items: File.

A value of true indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: false.

type: CWLType | OutputRecordSchema | OutputEnumSchema | OutputArraySchema | string | array<CWLType | OutputRecordSchema | OutputEnumSchema | OutputArraySchema | string> (optional)

Specify valid types of data that may be assigned to this parameter.

label: string (optional)

A short, human-readable label of this parameter object.

description: string (optional)

A long, human-readable description of this parameter object.

outputBinding: CommandOutputBinding (optional)

Describes how to handle the outputs of a process.

source: string | array<string> (optional)

Specifies one or more workflow parameters that will provide input to the underlying process parameter.

linkMerge: LinkMergeMethod (optional)

The method to use to merge multiple inbound links into a single array. If not specified, the default method is "merge_nested".
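The two LinkMergeMethod values are defined elsewhere in the CWL specification. With merge_nested, the result holds exactly one entry per inbound link, and a single link is still wrapped in a one-item list. With merge_flattened, array sources are concatenated and single-element sources are appended. The following Python sketch of those two behaviors assumes the inbound values arrive as a list in link order. merge_links is a hypothetical name, and the sketch does not check type compatibility between sources and the destination.

```python
from typing import Any


def merge_links(values: list[Any], method: str = "merge_nested") -> list[Any]:
    """Merge the values arriving on several inbound links into one array.

    values holds one entry per inbound link, in the order the links are listed.
    """
    if method == "merge_nested":
        # One array element per link; a single link is still wrapped in a list.
        return list(values)
    if method == "merge_flattened":
        merged: list[Any] = []
        for value in values:
            if isinstance(value, list):
                merged.extend(value)  # array sources are concatenated
            else:
                merged.append(value)  # single-element sources are appended
        return merged
    raise ValueError("unknown linkMerge method: " + method)
```

For example, with two inbound links carrying [1, 2] and 3, merge_nested yields [[1, 2], 3] and merge_flattened yields [1, 2, 3].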

4.1.1 Expression

Not a real type. Indicates that a field must allow runtime parameter references. If InlineJavascriptRequirement is declared and supported by the platform, the field must also allow Javascript expressions.

Symbols

symbol: description
ExpressionPlaceholder

4.1.2 CWLType

Extends primitive types with the concept of a file as a first class type.

Symbols

symbol: description
null: no value
boolean: a binary value
int: 32-bit signed integer
long: 64-bit signed integer
float: single precision (32-bit) IEEE 754 floating-point number
double: double precision (64-bit) IEEE 754 floating-point number
string: Unicode character sequence
File: a File object
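As an illustration of the symbols above, the following sketch checks a JSON-decoded value against a single CWLType. Several of its choices are assumptions rather than rules stated here. Integers are accepted for float and double, single-precision range is not checked, and a File is recognized only by its class and path fields. matches_cwl_type is a hypothetical name.

```python
from typing import Any

INT32_RANGE = (-2**31, 2**31 - 1)
INT64_RANGE = (-2**63, 2**63 - 1)


def matches_cwl_type(value: Any, cwl_type: str) -> bool:
    """Return True if a JSON-decoded value fits the given CWLType symbol."""
    if cwl_type == "null":
        return value is None
    if cwl_type == "boolean":
        return isinstance(value, bool)
    # bool is a subclass of int in Python, so it is excluded from the numeric types.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if cwl_type in ("int", "long"):
        low, high = INT32_RANGE if cwl_type == "int" else INT64_RANGE
        return is_number and isinstance(value, int) and low <= value <= high
    if cwl_type in ("float", "double"):
        return is_number  # assumption: integers accepted; precision not checked
    if cwl_type == "string":
        return isinstance(value, str)
    if cwl_type == "File":
        # assumption: a File is recognized by class == "File" and a string path
        return (isinstance(value, dict) and value.get("class") == "File"
                and isinstance(value.get("path"), str))
    raise ValueError(f"not a CWLType symbol: {cwl_type!r}")
```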

4.1.3 File

Represents a file (or group of files if secondaryFiles is specified) that must be accessible by tools using standard POSIX file system call API such as open(2) and read(2).

Fields

field: type (required or optional)
class: File_class (required)

Must be File to indicate this object describes a file.

path: string (required)

The path to the file.

checksum: string (optional)

Optional hash code for validating file integrity. Currently this must be in the form "sha1$" followed by a hexadecimal string, computed using the SHA-1 algorithm.
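A checksum in this form can be computed with Python's hashlib module. This is a sketch, and the function names are hypothetical. verify_checksum compares digests exactly, because this section does not say whether upper-case hexadecimal digits are permitted. hashlib produces lower-case digits.

```python
import hashlib


def file_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the checksum string in the form 'sha1$<hex digest>'."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return "sha1$" + digest.hexdigest()


def verify_checksum(path: str, checksum: str) -> bool:
    """Check a File object's checksum field against the file on disk."""
    algorithm, _, _ = checksum.partition("$")
    if algorithm != "sha1":
        raise ValueError("only sha1 checksums are described in this version")
    return file_checksum(path) == checksum
```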

size: long (optional)

Optional file size.

secondaryFiles: array<File> (optional)

A list of additional files that are associated with the primary file and must be transferred alongside the primary file. Examples include indexes of the primary file, or external references which must be included when loading the primary document. A file object listed in secondaryFiles may itself include secondaryFiles for which the same rules apply.

format: string (optional)

The format of the file. This must be a URI of a concept node that represents the file format, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.

Reasoning about format compatibility must be done by checking that an input file format is the same as, owl:equivalentClass to, or rdfs:subClassOf the format required by the input parameter. owl:equivalentClass is transitive with rdfs:subClassOf; for example, if <B> owl:equivalentClass <C> and <B> rdfs:subClassOf <A>, then infer <C> rdfs:subClassOf <A>.

File format ontologies may be provided in the "$schema" metadata at the root of the document. If no ontologies are specified in $schema, the runtime may perform exact file format matches.
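The compatibility check can be sketched as a graph search over ontology triples. This sketch assumes the ontology has already been loaded into (subject, predicate, object) tuples that use the prefixed names shown. It treats owl:equivalentClass as symmetric and follows rdfs:subClassOf only toward the superclass, transitively. format_compatible is a hypothetical name.

```python
from collections import deque

SUBCLASS = "rdfs:subClassOf"
EQUIVALENT = "owl:equivalentClass"


def format_compatible(actual: str, required: str,
                      triples: list[tuple[str, str, str]]) -> bool:
    """True if actual == required, or actual reaches required through
    owl:equivalentClass (either direction) and rdfs:subClassOf (upward) edges."""
    edges: dict[str, set[str]] = {}
    for subject, predicate, obj in triples:
        if predicate == SUBCLASS:
            edges.setdefault(subject, set()).add(obj)
        elif predicate == EQUIVALENT:
            edges.setdefault(subject, set()).add(obj)
            edges.setdefault(obj, set()).add(subject)
    seen = {actual}
    queue = deque([actual])
    while queue:  # breadth-first search from the actual format
        node = queue.popleft()
        if node == required:
            return True
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

With the triples from the example above, <B> owl:equivalentClass <C> and <B> rdfs:subClassOf <A>, a file with format C is accepted where format A is required. A file with format A is not accepted where B is required.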

4.1.4 OutputRecordSchema

Fields

field: type (required or optional)
type: Record_symbol (required)

Must be record

fields: array<OutputRecordField> (optional)

Defines the fields of the record.

secondaryFiles: string | Expression | array<string | Expression> (optional)

Only valid when type: File or is an array of items: File.

Describes files that must be included alongside the primary file(s).

If the value is an expression, the value of self in the expression must be the primary input or output File to which this binding applies.

If the value is a string, it specifies that the following pattern should be applied to the primary file:

  1. If string begins with one or more caret ^ characters, for each caret, remove the last file extension from the path (the last period . and all following characters). If there are no file extensions, the path is unchanged.
  2. Append the remainder of the string to the end of the file path.

format: string | array<string> | Expression (optional)

Only valid when type: File or is an array of items: File.

For input parameters, this must be one or more URIs of concept nodes that represent file formats which are allowed as input to this parameter, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.

For output parameters, this is the file format that will be assigned to the output parameter.

streamable: boolean (optional)

Only valid when type: File or is an array of items: File.

A value of true indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: false.

4.1.4.1 OutputRecordField

Fields

field: type (required or optional)
name: string (required)

The name of the field

type: PrimitiveType | OutputRecordSchema | OutputEnumSchema | OutputArraySchema | string | array<PrimitiveType | OutputRecordSchema | OutputEnumSchema | OutputArraySchema | string> (required)

The field type

doc: string (optional)

A documentation string for this field

outputBinding: CommandOutputBinding (optional)

4.1.4.1.1 PrimitiveType

Salad data types are based on Avro schema declarations. Refer to the Avro schema declaration documentation for detailed information.

Symbols

symbol
null
boolean
int
long
float
double
string

4.1.4.1.2 OutputEnumSchema

Fields

field: type (required or optional)
type: Enum_symbol (required)

Must be enum

symbols: array<string> (required)

Defines the set of valid symbols.

secondaryFiles: string | Expression | array<string | Expression> (optional)

Only valid when type: File or is an array of items: File.

Describes files that must be included alongside the primary file(s).

If the value is an expression, the value of self in the expression must be the primary input or output File to which this binding applies.

If the value is a string, it specifies that the following pattern should be applied to the primary file:

  1. If string begins with one or more caret ^ characters, for each caret, remove the last file extension from the path (the last period . and all following characters). If there are no file extensions, the path is unchanged.
  2. Append the remainder of the string to the end of the file path.

format: string | array<string> | Expression (optional)

Only valid when type: File or is an array of items: File.

For input parameters, this must be one or more URIs of concept nodes that represent file formats which are allowed as input to this parameter, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.

For output parameters, this is the file format that will be assigned to the output parameter.

streamable: boolean (optional)

Only valid when type: File or is an array of items: File.

A value of true indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: false.

outputBinding: CommandOutputBinding (optional)

4.1.4.1.2.1 CommandOutputBinding

Describes how to generate an output parameter based on the files produced by a CommandLineTool.

The output parameter is generated by applying these operations in the following order:

  • glob
  • loadContents
  • outputEval

Fields

field: type (required or optional)
glob: string | Expression | array<string> (optional)

Find files relative to the output directory, using POSIX glob(3) pathname matching. If provided an array, find files that match any pattern in the array. If provided an expression, the expression must return a string or an array of strings, which will then be evaluated as one or more glob patterns. Must only match and return files which actually exist.

loadContents: boolean (optional)

For each file matched in glob, read up to the first 64 KiB of text from the file and place it in the contents field of the file object for manipulation by outputEval.

outputEval: string | Expression (optional)

Evaluate an expression to generate the output value. If glob was specified, the value of self must be an array containing file objects that were matched. If no files were matched, self must be a zero-length array; if a single file was matched, the value of self is an array of a single element. Additionally, if loadContents is true, the File objects must include up to the first 64 KiB of file contents in the contents field.
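A rough Python sketch of the glob, loadContents, outputEval ordering for a single output parameter follows. The helper collect_output is hypothetical; outputEval is modeled as a Python callable instead of a CWL expression, the de-duplication and per-pattern sorting of matches are assumptions the text above does not prescribe, and when no outputEval is given the sketch simply returns the list of matched File objects.

```python
import glob
import os
from typing import Any, Callable, Dict, List, Optional, Union

CONTENTS_LIMIT = 64 * 1024  # loadContents reads at most the first 64 KiB


def collect_output(
    outdir: str,
    patterns: Union[str, List[str]],
    load_contents: bool = False,
    output_eval: Optional[Callable[[List[Dict[str, Any]]], Any]] = None,
) -> Any:
    """Hypothetical helper: glob, then loadContents, then outputEval."""
    if isinstance(patterns, str):
        patterns = [patterns]

    # 1. glob: match relative to the output directory and keep only files
    #    that exist; a file matched by several patterns is listed once.
    matched: List[str] = []
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(outdir, pattern))):
            if os.path.isfile(path) and path not in matched:
                matched.append(path)
    files = [{"class": "File", "path": path} for path in matched]

    # 2. loadContents: attach up to the first 64 KiB of each file as text.
    if load_contents:
        for f in files:
            with open(f["path"], "rb") as handle:
                data = handle.read(CONTENTS_LIMIT)
            f["contents"] = data.decode("utf-8", errors="replace")

    # 3. outputEval: `self` is always a list of File objects, possibly empty.
    if output_eval is not None:
        return output_eval(files)
    return files
```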

4.1.4.1.3 OutputArraySchema

Fields

field: type (required or optional)
type: Array_symbol (required)

Must be array

items: PrimitiveType | OutputRecordSchema | OutputEnumSchema | OutputArraySchema | string | array<PrimitiveType | OutputRecordSchema | OutputEnumSchema | OutputArraySchema | string> (required)

Defines the type of the array elements.

secondaryFiles: string | Expression | array<string | Expression> (optional)

Only valid when type: File or is an array of items: File.

Describes files that must be included alongside the primary file(s).

If the value is an expression, the value of self in the expression must be the primary input or output File to which this binding applies.

If the value is a string, it specifies that the following pattern should be applied to the primary file:

  1. If the string begins with one or more caret (^) characters, then for each caret, remove the last file extension from the path (the last period (.) and all following characters). If there are no file extensions, the path is unchanged.
  2. Append the remainder of the string to the end of the file path.

format: string | array<string> | Expression (optional)

Only valid when type: File or is an array of items: File.

For input parameters, this must be one or more URIs of concept nodes that represent file formats which are allowed as input to this parameter, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.

For output parameters, this is the file format that will be assigned to the output parameter.

streamable: boolean (optional)

Only valid when type: File or is an array of items: File.

A value of true indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: false.

outputBinding: CommandOutputBinding (optional)

4.1.5 LinkMergeMethod

The input link merge method, described in WorkflowStepInput.

Symbols

  • merge_nested
  • merge_flattened

4.2 WorkflowStep

A workflow step is an executable element of a workflow. It specifies the underlying process implementation (such as CommandLineTool) in the run field and connects the input and output parameters of the underlying process to workflow parameters.

Scatter/gather

To use scatter/gather, ScatterFeatureRequirement must be specified in the workflow or workflow step requirements.

A "scatter" operation specifies that the associated workflow step or subworkflow should execute separately over a list of input elements. Each job making up a scatter operation is independent and may be executed concurrently.

The scatter field specifies one or more input parameters which will be scattered. An input parameter may be listed more than once. The declared type of each input parameter is implicitly wrapped in an array for each time it appears in the scatter field. As a result, upstream parameters which are connected to scattered parameters may be arrays.

All output parameter types are also implicitly wrapped in arrays. Each job in the scatter results in an entry in the output array.

If scatter declares more than one input parameter, scatterMethod describes how to decompose the input into a discrete set of jobs.

  • dotproduct specifies that the input arrays are aligned, and one element is taken from each array to construct each job. It is an error if the input arrays are not all the same length.

  • nested_crossproduct specifies the Cartesian product of the inputs, producing a job for every combination of the scattered inputs. The output must be nested arrays for each level of scattering, in the order that the input arrays are listed in the scatter field.

  • flat_crossproduct specifies the Cartesian product of the inputs, producing a job for every combination of the scattered inputs. The output arrays must be flattened to a single level, but otherwise listed in the order that the input arrays are listed in the scatter field.
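To make the three scatter methods concrete, here is a hedged Python sketch. The function names are hypothetical, each parameter is assumed to appear only once in scatter, and lengths is assumed to hold the lengths of the scattered arrays in scatter order. Both crossproduct methods run the same jobs; they differ only in the shape of the gathered output.

```python
from itertools import product
from typing import Any, Dict, List


def scatter_jobs(inputs: Dict[str, Any], scatter: List[str], method: str) -> List[Dict[str, Any]]:
    """Expand a step's input object into one input object per scatter job."""
    arrays = [inputs[name] for name in scatter]
    if method == "dotproduct":
        if len({len(a) for a in arrays}) > 1:
            raise ValueError("dotproduct requires input arrays of the same length")
        combos = list(zip(*arrays))
    elif method in ("nested_crossproduct", "flat_crossproduct"):
        # Cartesian product; the first parameter in `scatter` varies slowest.
        combos = list(product(*arrays))
    else:
        raise ValueError(f"unknown scatterMethod: {method}")
    jobs = []
    for combo in combos:
        job = dict(inputs)               # non-scattered inputs go to every job
        job.update(zip(scatter, combo))  # scattered inputs get one element each
        jobs.append(job)
    return jobs


def gather(results: List[Any], lengths: List[int], method: str) -> List[Any]:
    """Arrange per-job results into the output array for the given method."""
    if method != "nested_crossproduct" or len(lengths) <= 1:
        return list(results)  # dotproduct and flat_crossproduct stay flat
    # nested_crossproduct: one nesting level per scattered input, in order.
    inner = 1
    for n in lengths[1:]:
        inner *= n
    return [gather(results[i * inner:(i + 1) * inner], lengths[1:], method)
            for i in range(lengths[0])]
```

With scatter: [sample, ref] over two samples and three references, nested_crossproduct yields a 2 × 3 nested output array, while flat_crossproduct yields the same six results in one flat array.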

Subworkflows

To specify a nested workflow as part of a workflow step, SubworkflowFeatureRequirement must be specified in the workflow or workflow step requirements.

Fields

field: type (required or optional)
id: string (required)

The unique identifier for this workflow step.

inputs: array<WorkflowStepInput> (required)

Defines the input parameters of the workflow step. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object.

outputs: array<WorkflowStepOutput> (required)

Defines the parameters representing the output of the process. May be used to generate and/or validate the output object.

run: string | CommandLineTool | ExpressionTool | Workflow (required)

Specifies the process to run.

requirements: array<InlineJavascriptRequirement | SchemaDefRequirement | DockerRequirement | CreateFileRequirement | EnvVarRequirement | ShellCommandRequirement | ResourceRequirement | SubworkflowFeatureRequirement | ScatterFeatureRequirement | MultipleInputFeatureRequirement | StepInputExpressionRequirement> (optional)

Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this workflow step. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option.

hints: array<Any> (optional)

Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this workflow step. It is not an error if an implementation cannot satisfy all hints; however, the implementation may report a warning.

label: string (optional)

A short, human-readable label of this process object.

description: string (optional)

A long, human-readable description of this process object.

scatter: string | array<string> (optional)
scatterMethod: ScatterMethod (optional)

Required if scatter is an array of more than one element.

4.2.1 WorkflowStepInput

The input of a workflow step connects an upstream parameter (from the workflow inputs, or the outputs of other workflow steps) with the input parameters of the underlying process.

Input object

A WorkflowStepInput object must contain an id field in the form #fieldname or #stepname.fieldname. When the id field contains a period (.), the field name consists of the characters following the final period. This defines a field of the workflow step input object with the value of the source parameter(s).
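The field-naming rule above amounts to taking the text after the final period. A minimal Python sketch follows; the function name is hypothetical, and ids are assumed to use exactly the #fieldname or #stepname.fieldname forms described here.

```python
def step_input_field_name(input_id: str) -> str:
    """Field name defined by a WorkflowStepInput id such as "#step.field"."""
    name = input_id[1:] if input_id.startswith("#") else input_id
    # With a period, the field name is everything after the final period;
    # without one, rsplit returns the whole name unchanged.
    return name.rsplit(".", 1)[-1]
```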

Merging

To merge multiple inbound data links, MultipleInputFeatureRequirement must be specified in the workflow or workflow step requirements.

If the sink parameter is an array, or named in a workflow scatter operation, there may be multiple inbound data links listed in the source field. The values from the input links are merged depending on the method specified in the linkMerge field. If not specified, the default method is "merge_nested".

  • merge_nested

    The input must be an array consisting of exactly one entry for each input link. If "merge_nested" is specified with a single link, the value from the link must be wrapped in a single-item list.

  • merge_flattened

    1. The source and sink parameters must be compatible types, or the source type must be compatible with a single element from the "items" type of the destination array parameter.
    2. Source parameters which are arrays are concatenated. Source parameters which are single element types are appended as single elements.
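The two link merge methods can be sketched in Python as below. The function merge_links is hypothetical; it receives one value per inbound link in the order the links appear in source, and it does not perform the type-compatibility check from rule 1 of merge_flattened.

```python
from typing import Any, List


def merge_links(values: List[Any], method: str = "merge_nested") -> List[Any]:
    """Combine the values arriving on several inbound links (one per link)."""
    if method == "merge_nested":
        # Exactly one entry per link; a single link still yields a one-item list.
        return list(values)
    if method == "merge_flattened":
        merged: List[Any] = []
        for value in values:
            if isinstance(value, list):
                merged.extend(value)  # array sources are concatenated
            else:
                merged.append(value)  # single-element sources are appended
        return merged
    raise ValueError(f"unknown linkMerge method: {method}")
```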

Fields

field: type (required or optional)
id: string (required)

A unique identifier for this workflow input parameter.

source: string | array<string> (optional)

Specifies one or more workflow parameters that will provide input to the underlying process parameter.

linkMerge: LinkMergeMethod (optional)

The method to use to merge multiple inbound links into a single array. If not specified, the default method is "merge_nested".

default: Any (optional)

The default value for this parameter if there is no source field.

valueFrom: string | Expression (optional)

To use valueFrom, StepInputExpressionRequirement must be specified in the workflow or workflow step requirements.

If valueFrom is a constant string value, use this as the value for this input parameter.

If valueFrom is a parameter reference or expression, it must be evaluated to yield the actual value to be assigned to the input field.

The value of self in the parameter reference or expression must be the value of the parameter(s) specified in the source field, or null if there is no source field.

The value of inputs in the parameter reference or expression is the input object to the workflow step after assigning the source values, but before evaluating any step with valueFrom. The order of evaluating valueFrom among step input parameters is undefined.
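The following Python sketch illustrates one reading of these rules. It is a hypothetical model, not an implementation of CWL expressions: a constant valueFrom is a str, and a parameter reference or expression is stood in for by a Python callable taking (self, inputs). It also assumes that a default value is part of the input object seen by valueFrom when there is no source, and it passes every valueFrom the same snapshot of inputs, so the undefined evaluation order cannot influence the result.

```python
from typing import Any, Callable, Dict, Union

# Hypothetical model: a constant valueFrom is a str; a parameter reference or
# expression is represented by a Python callable taking (self, inputs).
ValueFrom = Union[str, Callable[[Any, Dict[str, Any]], Any]]


def evaluate_step_inputs(
    sources: Dict[str, Any],
    defaults: Dict[str, Any],
    value_from: Dict[str, ValueFrom],
) -> Dict[str, Any]:
    """Build a step's input object from source values, defaults and valueFrom."""
    # Assign source values; `default` applies only where there is no source.
    inputs = dict(defaults)
    inputs.update(sources)

    # Every valueFrom sees the same `inputs` snapshot, taken before any
    # valueFrom runs, so the evaluation order does not matter here.
    snapshot = dict(inputs)
    result = dict(inputs)
    for field, vf in value_from.items():
        if isinstance(vf, str):
            result[field] = vf  # constant string value
        else:
            # `self` is the source value, or None (null) when there is no source.
            result[field] = vf(sources.get(field), snapshot)
    return result
```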

4.2.1.1 Any

The Any type validates for any non-null value.

Symbols

  • Any

4.2.2 WorkflowStepOutput

Associate an output parameter of the underlying process with a workflow parameter. The workflow parameter (given in the id field) may be used as a source to connect with input parameters of other workflow steps, or with an output parameter of the process.

Fields

field: type (required or optional)
id: string (required)

A unique identifier for this workflow output parameter. This is the identifier to use in the source field of WorkflowStepInput to connect the output value to downstream parameters.

4.2.3 ScatterMethod

The scatter method, as described in workflow step scatter.

Symbols

  • dotproduct
  • nested_crossproduct
  • flat_crossproduct

4.2.4 InlineJavascriptRequirement

Indicates that the workflow platform must support inline Javascript expressions. If this requirement is not present, the workflow platform must not perform expression interpolation.

Fields

field: type (required or optional)
class: string (required)

The specific requirement type.

expressionLib: array<string> (optional)

Additional code fragments that will also be inserted before executing the expression code. Allows for function definitions that may be called from CWL expressions.

4.2.5 SchemaDefRequirement

This field consists of an array of type definitions which must be used when interpreting the inputs and outputs fields. When a type field contains a URI, the implementation must check if the type is defined in schemaDefs and use that definition. If the type is not found in schemaDefs, it is an error. The entries in schemaDefs must be processed in the order listed such that later schema definitions may refer to earlier schema definitions.
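A minimal Python sketch of in-order processing follows. It assumes, for illustration only, that each definition in types carries a name key used as its URI, and that the names in BUILTIN_TYPES need no definition; referencing a type that has not yet been defined raises an error, mirroring the rule above.

```python
from typing import Any, Dict, List

# Names that need no definition (an assumption for this sketch).
BUILTIN_TYPES = {"null", "boolean", "int", "long", "float", "double", "string", "File", "Any"}


def referenced_names(t: Any) -> List[str]:
    """Collect the type names that a type expression refers to."""
    if isinstance(t, str):
        return [t]
    if isinstance(t, list):  # a union of alternatives
        return [name for member in t for name in referenced_names(member)]
    if isinstance(t, dict):
        if t.get("type") == "record":
            return [name for field in t.get("fields", [])
                    for name in referenced_names(field["type"])]
        if t.get("type") == "array":
            return referenced_names(t["items"])
    return []  # enums list symbols, not types


def load_schema_defs(types: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Process type definitions in order; each may use only earlier ones."""
    defined: Dict[str, Dict[str, Any]] = {}
    for schema in types:
        for name in referenced_names(schema):
            if name not in BUILTIN_TYPES and name not in defined:
                raise ValueError(f"type {name!r} used before it is defined")
        defined[schema["name"]] = schema
    return defined
```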

Fields

field: type (required or optional)
class: string (required)

The specific requirement type.

types: array<InputRecordSchema | InputEnumSchema | InputArraySchema> (required)

The list of type definitions.

4.2.5.1 InputRecordSchema

Fields

field: type (required or optional)
type: Record_symbol (required)

Must be record

fields: array<InputRecordField> (optional)

Defines the fields of the record.

secondaryFiles: string | Expression | array<string | Expression> (optional)

Only valid when type: File or is an array of items: File.

Describes files that must be included alongside the primary file(s).

If the value is an expression, the value of self in the expression must be the primary input or output File to which this binding applies.

If the value is a string, it specifies that the following pattern should be applied to the primary file:

  1. If the string begins with one or more caret (^) characters, then for each caret, remove the last file extension from the path (the last period (.) and all following characters). If there are no file extensions, the path is unchanged.
  2. Append the remainder of the string to the end of the file path.

format: string | array<string> | Expression (optional)

Only valid when type: File or is an array of items: File.

For input parameters, this must be one or more URIs of concept nodes that represent file formats which are allowed as input to this parameter, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.

For output parameters, this is the file format that will be assigned to the output parameter.

streamable: boolean (optional)

Only valid when type: File or is an array of items: File.

A value of true indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: false.

4.2.5.1.1 InputRecordField

Fields

field: type (required or optional)
name: string (required)

The name of the field

type: PrimitiveType | InputRecordSchema | InputEnumSchema | InputArraySchema | string | array<PrimitiveType | InputRecordSchema | InputEnumSchema | InputArraySchema | string> (required)

The field type

doc: string (optional)

A documentation string for this field

inputBinding: CommandLineBinding (optional)

4.2.5.1.1.1 InputEnumSchema

Fields

field: type (required or optional)
type: Enum_symbol (required)

Must be enum

symbols: array<string> (required)

Defines the set of valid symbols.

secondaryFiles: string | Expression | array<string | Expression> (optional)

Only valid when type: File or is an array of items: File.

Describes files that must be included alongside the primary file(s).

If the value is an expression, the value of self in the expression must be the primary input or output File to which this binding applies.

If the value is a string, it specifies that the following pattern should be applied to the primary file:

  1. If the string begins with one or more caret (^) characters, then for each caret, remove the last file extension from the path (the last period (.) and all following characters). If there are no file extensions, the path is unchanged.
  2. Append the remainder of the string to the end of the file path.

format: string | array<string> | Expression (optional)

Only valid when type: File or is an array of items: File.

For input parameters, this must be one or more URIs of concept nodes that represent file formats which are allowed as input to this parameter, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.

For output parameters, this is the file format that will be assigned to the output parameter.

streamable: boolean (optional)

Only valid when type: File or is an array of items: File.

A value of true indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: false.

inputBinding: CommandLineBinding (optional)

4.2.5.1.1.1.1 CommandLineBinding

When listed under inputBinding in the input schema, the term "value" refers to the corresponding value in the input object. For binding objects listed in CommandLineTool.arguments, the term "value" refers to the effective value after evaluating valueFrom.

The binding behavior when building the command line depends on the data type of the value. If there is a mismatch between the type described by the input schema and the effective value, such as resulting from an expression evaluation, an implementation must use the data type of the effective value.

  • string: Add prefix and the string to the command line.

  • number: Add prefix and decimal representation to command line.

  • boolean: If true, add prefix to the command line. If false, add nothing.

  • File: Add prefix and the value of File.path to the command line.

  • array: If itemSeparator is specified, add prefix and join the array into a single string with itemSeparator separating the items. Otherwise, first add prefix, then recursively process individual elements.

  • object: Add prefix only, and recursively add object fields for which inputBinding is specified.

  • null: Add nothing.
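The per-type rules can be sketched in Python roughly as follows. The function bind_value is hypothetical and simplified: array elements are bound recursively without a prefix of their own, record (object) fields are not expanded because that step needs the record's schema, and numbers are rendered with Python's str(), which switches to exponent notation for very large or small floats.

```python
from typing import Any, List, Optional


def _as_text(v: Any) -> str:
    # Files contribute their path; other scalars use Python's str().
    if isinstance(v, dict) and v.get("class") == "File":
        return v["path"]
    return str(v)


def bind_value(value: Any, prefix: Optional[str] = None, separate: bool = True,
               item_separator: Optional[str] = None) -> List[str]:
    """Turn one effective value into command-line arguments, by data type."""
    def with_prefix(text: str) -> List[str]:
        if not prefix:
            return [text]
        return [prefix, text] if separate else [prefix + text]

    if value is None:                        # null: add nothing
        return []
    if isinstance(value, bool):              # test bool before int: bool subclasses int
        return [prefix] if value and prefix else []
    if isinstance(value, (int, float, str)):
        return with_prefix(_as_text(value))  # number or string
    if isinstance(value, list):
        if item_separator is not None:
            return with_prefix(item_separator.join(_as_text(v) for v in value))
        args = [prefix] if prefix else []
        for item in value:
            args.extend(bind_value(item))    # elements bound here without a prefix
        return args
    if isinstance(value, dict) and value.get("class") == "File":
        return with_prefix(value["path"])
    if isinstance(value, dict):
        # object: prefix only; binding the record's fields needs its schema.
        return [prefix] if prefix else []
    raise TypeError(f"unsupported value: {value!r}")
```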

Fields

field: type (required or optional)
loadContents: boolean (optional)

Only valid when type: File or is an array of items: File.

Read up to the first 64 KiB of text from the file and place it in the "contents" field of the file object for use by expressions.

position: int (optional)

The sorting key. Default position is 0.

prefix: string (optional)

Command line prefix to add before the value.

separate: boolean (optional)

If true (default), then the prefix and value must be added as separate command line arguments; if false, prefix and value must be concatenated into a single command line argument.

itemSeparator: string (optional)

Join the array elements into a single string with the elements separated by itemSeparator.

valueFrom: string | Expression (optional)

If valueFrom is a constant string value, use this as the value and apply the binding rules above.

If valueFrom is an expression, evaluate the expression to yield the actual value to use to build the command line and apply the binding rules above. If the inputBinding is associated with an input parameter, the value of self in the expression will be the value of the input parameter.

When a binding is part of the CommandLineTool.arguments field, the valueFrom field is required.

shellQuote: boolean (optional)

If ShellCommandRequirement is in the requirements for the current command, this controls whether the value is quoted on the command line (default is true). Use shellQuote: false to inject metacharacters for operations such as pipes.
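Sorting by position and applying shellQuote could look like the sketch below. BoundArgs and assemble are hypothetical names; the sketch assumes each binding's arguments have already been produced, keeps bindings with equal positions in their listed order (a tie-breaking rule this section does not specify), and uses Python's shlex.quote for quoting.

```python
import shlex
from typing import List, NamedTuple, Union


class BoundArgs(NamedTuple):
    position: int             # CommandLineBinding.position (0 when omitted)
    args: List[str]           # arguments already produced for this binding
    shell_quote: bool = True  # CommandLineBinding.shellQuote


def assemble(base_command: List[str], bound: List[BoundArgs],
             shell_command_requirement: bool = False) -> Union[List[str], str]:
    """Order bindings by position and build the final command (sketch)."""
    # sorted() is stable, so bindings with equal positions keep their listed order.
    ordered = sorted(bound, key=lambda b: b.position)
    if not shell_command_requirement:
        # Without ShellCommandRequirement, arguments form a plain argv list.
        argv = list(base_command)
        for b in ordered:
            argv.extend(b.args)
        return argv
    # With ShellCommandRequirement, build one shell string, quoting unless shellQuote is false.
    parts = [shlex.quote(a) for a in base_command]
    for b in ordered:
        parts.extend(shlex.quote(a) if b.shell_quote else a for a in b.args)
    return " ".join(parts)
```

Marking the "|" and "uniq" arguments with shell_quote=False lets them act as a real pipe in the generated shell string, while 'in file.txt' stays safely quoted.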

4.2.5.1.1.2 InputArraySchema

Fields

field: type (required or optional)
type: Array_symbol (required)

Must be array

items: PrimitiveType | InputRecordSchema | InputEnumSchema | InputArraySchema | string | array<PrimitiveType | InputRecordSchema | InputEnumSchema | InputArraySchema | string> (required)

Defines the type of the array elements.

secondaryFiles: string | Expression | array<string | Expression> (optional)

Only valid when type: File or is an array of items: File.

Describes files that must be included alongside the primary file(s).

If the value is an expression, the value of self in the expression must be the primary input or output File to which this binding applies.

If the value is a string, it specifies that the following pattern should be applied to the primary file:

  1. If the string begins with one or more caret (^) characters, then for each caret, remove the last file extension from the path (the last period (.) and all following characters). If there are no file extensions, the path is unchanged.
  2. Append the remainder of the string to the end of the file path.

format: string | array<string> | Expression (optional)

Only valid when type: File or is an array of items: File.

For input parameters, this must be one or more URIs of concept nodes that represent file formats which are allowed as input to this parameter, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.

For output parameters, this is the file format that will be assigned to the output parameter.

streamable: boolean (optional)

Only valid when type: File or is an array of items: File.

A value of true indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: false.

inputBinding: CommandLineBinding (optional)

4.2.6 SubworkflowFeatureRequirement

Indicates that the workflow platform must support nested workflows in the run field of WorkflowStep.

Fields

field: type (required or optional)
class: string (required)

The specific requirement type.

4.2.7 ScatterFeatureRequirement

Indicates that the workflow platform must support the scatter and scatterMethod fields of WorkflowStep.

Fields

field: type (required or optional)
class: string (required)

The specific requirement type.

4.2.8 MultipleInputFeatureRequirement

Indicates that the workflow platform must support multiple inbound data links listed in the source field of WorkflowStepInput.

Fields

field: type (required or optional)
class: string (required)

The specific requirement type.

4.2.9 StepInputExpressionRequirement

Indicates that the workflow platform must support the valueFrom field of WorkflowStepInput.

Fields

field: type (required or optional)
class: string (required)

The specific requirement type.

4.2.10 ExpressionTool

Execute an expression as a process step.

Fields

field: type (required or optional)
inputs: array<InputParameter> (required)

Defines the input parameters of the process. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object.

outputs: array<OutputParameter> (required)

Defines the parameters representing the output of the process. May be used to generate and/or validate the output object.

class: string (required)
expression: string | Expression (required)

The expression to execute. The expression must return a JSON object which matches the output parameters of the ExpressionTool.
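As an illustration, the check on an ExpressionTool result might look like this in Python. The function run_expression_tool is hypothetical, the expression is modeled as a Python callable rather than Javascript, and treating every declared output as mandatory and discarding undeclared keys are simplifying assumptions.

```python
import json
from typing import Any, Callable, Dict, List


def run_expression_tool(expression: Callable[[Dict[str, Any]], Any],
                        inputs: Dict[str, Any], output_ids: List[str]) -> Dict[str, Any]:
    """Evaluate an ExpressionTool's expression and check its result."""
    result = expression(inputs)
    if not isinstance(result, dict):
        raise TypeError("the expression must return a JSON object")
    json.dumps(result)  # raises TypeError if the object is not JSON-serializable
    missing = [o for o in output_ids if o not in result]
    if missing:
        raise ValueError(f"result lacks declared outputs: {missing}")
    return {o: result[o] for o in output_ids}  # keep only declared outputs
```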

id: string (optional)

The unique identifier for this process object.

requirements: array<InlineJavascriptRequirement | SchemaDefRequirement | DockerRequirement | CreateFileRequirement | EnvVarRequirement | ShellCommandRequirement | ResourceRequirement | SubworkflowFeatureRequirement | ScatterFeatureRequirement | MultipleInputFeatureRequirement | StepInputExpressionRequirement> (optional)

Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option.

hints: array<Any> (optional)

Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this process. It is not an error if an implementation cannot satisfy all hints; however, the implementation may report a warning.

label: string (optional)

A short, human-readable label of this process object.

description: string (optional)

A long, human-readable description of this process object.

cwlVersion: CWLVersions (optional)

CWL document version

4.2.10.1 InputParameter

Fields

field: type (required or optional)
id: string (required)

The unique identifier for this parameter object.

secondaryFiles: string | Expression | array<string | Expression> (optional)

Only valid when type: File or is an array of items: File.

Describes files that must be included alongside the primary file(s).

If the value is an expression, the value of self in the expression must be the primary input or output File to which this binding applies.

If the value is a string, it specifies that the following pattern should be applied to the primary file:

  1. If the string begins with one or more caret (^) characters, then for each caret, remove the last file extension from the path (the last period (.) and all following characters). If there are no file extensions, the path is unchanged.
  2. Append the remainder of the string to the end of the file path.

format: string | array<string> | Expression (optional)

Only valid when type: File or is an array of items: File.

For input parameters, this must be one or more URIs of concept nodes that represent file formats which are allowed as input to this parameter, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.

For output parameters, this is the file format that will be assigned to the output parameter.

streamable: boolean (optional)

Only valid when type: File or is an array of items: File.

A value of true indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: false.

type: CWLType | InputRecordSchema | InputEnumSchema | InputArraySchema | string | array<CWLType | InputRecordSchema | InputEnumSchema | InputArraySchema | string> (optional)

Specify valid types of data that may be assigned to this parameter.

label: string (optional)

A short, human-readable label of this parameter object.

description: string (optional)

A long, human-readable description of this parameter object.

inputBinding: CommandLineBinding (optional)

Describes how to handle the inputs of a process and convert them into a concrete form for execution, such as command line parameters.

default: Any (optional)

The default value for this parameter if not provided in the input object.

4.2.10.2 OutputParameter

Fields

field: type (required or optional)
id: string (required)

The unique identifier for this parameter object.

secondaryFiles: string | Expression | array<string | Expression> (optional)

Only valid when type: File or is an array of items: File.

Describes files that must be included alongside the primary file(s).

If the value is an expression, the value of self in the expression must be the primary input or output File to which this binding applies.

If the value is a string, it specifies that the following pattern should be applied to the primary file:

  1. If the string begins with one or more caret (^) characters, then for each caret, remove the last file extension from the path (the last period (.) and all following characters). If there are no file extensions, the path is unchanged.
  2. Append the remainder of the string to the end of the file path.

format: string | array<string> | Expression (optional)

Only valid when type: File or is an array of items: File.

For input parameters, this must be one or more URIs of concept nodes that represent file formats which are allowed as input to this parameter, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.

For output parameters, this is the file format that will be assigned to the output parameter.

streamable: boolean (optional)

Only valid when type: File or is an array of items: File.

A value of true indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: false.

type: CWLType | OutputRecordSchema | OutputEnumSchema | OutputArraySchema | string | array<CWLType | OutputRecordSchema | OutputEnumSchema | OutputArraySchema | string> (optional)

Specify valid types of data that may be assigned to this parameter.

label: string (optional)

A short, human-readable label of this parameter object.

description: string (optional)

A long, human-readable description of this parameter object.

outputBinding: CommandOutputBinding (optional)

Describes how to handle the outputs of a process.

4.2.10.3 CWLVersions

Version symbols for published CWL document versions.

Symbols

  • draft-2
  • draft-3.dev1
  • draft-3.dev2
  • draft-3.dev3
  • draft-3.dev4
  • draft-3.dev5
  • draft-3