# Workflows

A workflow is a CWL processing unit that executes command-line tools,
expression tools, or workflows (sub-workflows) as steps. It must have
`inputs`, `outputs`, and `steps` defined in the CWL document.

% TODO: Fix the missing link the graph below. We cannot have
%       it here as this file is included in two other files.
%       Sphinx prohibits it for the case where this could lead
%       to duplicate anchors in a page (e.g. single-html).
% :name: workflow-graph

```{graphviz}
:caption: CWL workflow.
:align: center

digraph G {
    compound=true;
    rankdir="LR";
    fontname="Verdana";
    fontsize="10";
    graph [splines=ortho];

    node [fontname="Verdana", fontsize="10", shape=box];
    edge [fontname="Verdana", fontsize="10"];

    subgraph cluster_0 {
      node [width = 1.75];
      steps_0[style="filled" label="Command-line tools"];
      steps_1[style="filled" label="Expression tools"];
      steps_2[style="filled" label="Sub-workflows"];
      label="steps";
      fill=gray;
    }

    inputs -> steps_1 [lhead=cluster_0];
    steps_1 -> outputs [ltail=cluster_0];
}
```

The CWL document `echo-uppercase.cwl` defines a workflow that runs the command-line tool
and the expression tool shown in the earlier examples.

% TODO: Fix the missing link the code below. We cannot have
%       it here as this file is included in two other files.
%       Sphinx prohibits it for the case where this could lead
%       to duplicate anchors in a page (e.g. single-html).
% :name: echo-uppercase.cwl

```{code-block} cwl
:caption: "`echo-uppercase.cwl`"

cwlVersion: v1.2
class: Workflow

requirements:
  InlineJavascriptRequirement: {}

inputs:
  message: string

outputs:
  out:
    type: string
    outputSource: uppercase/uppercase_message

steps:
  echo:
    run: echo.cwl
    in:
      message: message
    out: [out]
  uppercase:
    run: uppercase.cwl
    in:
      message:
        source: echo/out
    out: [uppercase_message]
```

A command-line tool or expression tool can also be written directly in the same CWL document as the workflow.
For example, we can rewrite the `echo-uppercase.cwl` workflow as a single file:

% TODO: Fix the missing link the code below. We cannot have
%       it here as this file is included in two other files.
%       Sphinx prohibits it for the case where this could lead
%       to duplicate anchors in a page (e.g. single-html).
% :name: echo-uppercase-single-file.cwl

```{code-block} cwl
:caption: "`echo-uppercase-single-file.cwl`"

cwlVersion: v1.2
class: Workflow

requirements:
  InlineJavascriptRequirement: {}

inputs:
  message: string

outputs:
  out:
    type: string
    outputSource: uppercase/uppercase_message

steps:
  echo:
    run:
      class: CommandLineTool
      baseCommand: echo
      stdout: output.txt
      inputs:
        message:
          type: string
          inputBinding: {}
      outputs:
        out:
          type: string
          outputBinding:
            glob: output.txt
            loadContents: true
            outputEval: $(self[0].contents)
    in:
      message: message
    out: [out]
  uppercase:
    run:
      class: ExpressionTool
      requirements:
        InlineJavascriptRequirement: {}
      inputs:
        message: string
      outputs:
        uppercase_message: string
      expression: |
        ${ return {"uppercase_message": inputs.message.toUpperCase()}; }
    in:
      message:
        source: echo/out
    out: [uppercase_message]
```

Having separate files helps with modularity and code organization. However, it can be
helpful to write everything in a single file during development. There are other ways to
combine multiple files into a single file (e.g. `cwltool --pack`), discussed further in
other sections of this user guide.

% TODO: add a link to the page about SubworkflowFeatureRequirement

```{note}
For sub-workflows you need to enable the requirement `SubworkflowFeatureRequirement`.
It is covered in more detail in another section of this user guide.
```

## Writing Workflows

This workflow extracts a Java source file from a tar file and then
compiles it.

```{literalinclude} /_includes/cwl/workflows/1st-workflow.cwl
:language: cwl
:caption: "`1st-workflow.cwl`"
:name: 1st-workflow.cwl
```

% TODO: The link below is for a previous commit with the workflow above.
% Ideally, I think we should either use `cwltool`
% or Python to add the graph here. Or, maybe re-create the graph for the latest main version?

```{admonition} Visualization of 1st-workflow.cwl

[](https://view.commonwl.org/graph/png/github.com/common-workflow-language/user_guide/blob/a29e7eae0006660946fc705a310b37a21a7e1edc/_includes/cwl/21-1st-workflow/1st-workflow.cwl)
```

Use a YAML or a JSON object in a separate file to describe the input of a run:

```{literalinclude} /_includes/cwl/workflows/1st-workflow-job.yml
:language: yaml
:caption: "`1st-workflow-job.yml`"
:name: 1st-workflow-job.yml
```

Next, create a sample Java file and add it to a tar file to use with the command-line tool.

```{code-block} console
$ echo "public class Hello {}" > Hello.java && tar -cvf hello.tar Hello.java
Hello.java
```

Now invoke `cwltool` with the workflow description and the input object on the command line:

```{runcmd} cwltool 1st-workflow.cwl 1st-workflow-job.yml
:working-directory: src/_includes/cwl/workflows/
```

What's going on here? Let's break it down:

```cwl
cwlVersion: v1.0
class: Workflow
```

The `cwlVersion` field indicates the version of the CWL spec used by the
document. The `class` field indicates this document describes a workflow.

```cwl
inputs:
  tarball: File
  name_of_file_to_extract: string
```

The `inputs` section describes the inputs of the workflow. This is a
list of input parameters where each parameter consists of an identifier
and a data type. These parameters can be used as sources for input to
specific workflow steps.

```cwl
outputs:
  compiled_class:
    type: File
    outputSource: compile/classfile
```

The `outputs` section describes the outputs of the workflow. This is a
list of output parameters where each parameter consists of an identifier
and a data type. The `outputSource` connects the output parameter `classfile`
of the `compile` step to the workflow output parameter `compiled_class`.
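
A workflow can declare more than one output, and an output does not have to come from the last step. As a minimal sketch, assuming we also wanted to keep the extracted Java source file, the `outputs` section could list a second parameter. The name `extracted_source` is hypothetical and is not part of `1st-workflow.cwl`:

```cwl
outputs:
  compiled_class:
    type: File
    outputSource: compile/classfile
  # Hypothetical extra output: exposes the file produced by the `untar` step.
  extracted_source:
    type: File
    outputSource: untar/extracted_file
```

Each `outputSource` uses the same `step/output` form, so any output listed in a step's `out` can be surfaced this way.
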
```cwl
steps:
  untar:
    run: tar-param.cwl
    in:
      tarfile: tarball
      extractfile: name_of_file_to_extract
    out: [extracted_file]
```

The `steps` section describes the actual steps of the workflow. In this
example, the first step extracts a file from a tar file, and the second
step compiles the file from the first step using the Java compiler.
Workflow steps are not necessarily run in the order they are listed;
instead, the order is determined by the dependencies between steps (using
`source`). In addition, workflow steps which do not depend on one
another may run in parallel.

The first step, `untar`, runs `tar-param.cwl` (described previously in
[Parameter References](parameter-references.md)).
This tool has two input parameters, `tarfile` and `extractfile`, and one output
parameter, `extracted_file`.

The `in` section of the workflow step connects these two input parameters to
the inputs of the workflow, `tarball` and `name_of_file_to_extract`, using
`source`. This means that when the workflow step is executed, the values
assigned to `tarball` and `name_of_file_to_extract` will be used for the
parameters `tarfile` and `extractfile` in order to run the tool.

The `out` section of the workflow step lists the output parameters that are
expected from the tool.

```cwl
  compile:
    run: arguments.cwl
    in:
      src: untar/extracted_file
    out: [classfile]
```

The second step, `compile`, depends on the results from the first step by
connecting the input parameter `src` to the output parameter of `untar` using
`untar/extracted_file`. It runs `arguments.cwl` (described previously in
[Additional Arguments and Parameters](additional-arguments-and-parameters.md)).
The output of this step, `classfile`, is connected to the
`outputs` section for the Workflow, described above.

## Nested Workflows

Workflows are ways to combine multiple tools to perform a larger operation.
We can also think of a workflow as being a tool itself; a CWL workflow can be
used as a step in another CWL workflow, if the workflow engine supports the
`SubworkflowFeatureRequirement`:

```cwl
requirements:
  SubworkflowFeatureRequirement: {}
```

Here's an example workflow that uses our `1st-workflow.cwl` as a nested
workflow:

```{literalinclude} /_includes/cwl/workflows/nestedworkflows.cwl
:language: cwl
:caption: "`nestedworkflows.cwl`"
:name: nestedworkflows.cwl
```

```{note}
Visualization of the workflow and the inner workflow from its `compile` step:

This two-step workflow starts with the `create-tar` step which is connected to
the `compile` step in orange; `compile` is another workflow, diagrammed on the
right. In purple we see the fixed string `"Hello.java"` being supplied as the
`name_of_file_to_extract`.
```

A CWL `Workflow` can be used as a `step` just like a `CommandLineTool`; its CWL
file is included with `run`. The workflow inputs (`tarball` and `name_of_file_to_extract`) and outputs
(`compiled_class`) can then be mapped to the step's inputs and outputs.

```cwl
  compile:
    run: 1st-workflow.cwl
    in:
      tarball: create-tar/tar_compressed_java_file
      name_of_file_to_extract:
        default: "Hello.java"
    out: [compiled_class]
```

Our `1st-workflow.cwl` was parameterized with workflow inputs, so when running
it we had to provide a job file to specify the tar file and the `*.java` filename.
This is generally best practice, as it means the workflow can be reused in multiple
parent workflows, or even in multiple steps within the same workflow.

Here we use `default:` to hard-code `"Hello.java"` as the `name_of_file_to_extract`
input. However, our workflow also requires a tar file at `tarball`, which we will
prepare in the `create-tar` step. At this point it is probably a good idea to refactor
`1st-workflow.cwl` to have more specific input/output names, as those also
appear in its usage as a tool.

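
To illustrate the reuse mentioned above, here is a minimal sketch of a parent workflow that runs `1st-workflow.cwl` in two separate steps. The workflow itself and the names `first_tarball`, `second_tarball`, `compile_first`, `compile_second`, `first_class` and `second_class` are hypothetical; the sketch assumes both tar files contain a `Hello.java`:

```cwl
cwlVersion: v1.2
class: Workflow

requirements:
  SubworkflowFeatureRequirement: {}

inputs:
  # Hypothetical inputs: two tar files, each containing Hello.java.
  first_tarball: File
  second_tarball: File

outputs:
  first_class:
    type: File
    outputSource: compile_first/compiled_class
  second_class:
    type: File
    outputSource: compile_second/compiled_class

steps:
  # The same inner workflow is used twice, each time with a different tarball.
  compile_first:
    run: 1st-workflow.cwl
    in:
      tarball: first_tarball
      name_of_file_to_extract:
        default: "Hello.java"
    out: [compiled_class]
  compile_second:
    run: 1st-workflow.cwl
    in:
      tarball: second_tarball
      name_of_file_to_extract:
        default: "Hello.java"
    out: [compiled_class]
```

Because the inner workflow takes its files as inputs rather than hard-coding them, neither step needs a modified copy of `1st-workflow.cwl`.
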
It is also possible to take a less generic approach and avoid external
dependencies in the job file. In this workflow we can generate a hard-coded
`Hello.java` file using the previously mentioned `InitialWorkDirRequirement`,
before adding it to a tar file.

```cwl
  create-tar:
    requirements:
      InitialWorkDirRequirement:
        listing:
          - entryname: Hello.java
            entry: |
              public class Hello {
                public static void main(String[] argv) {
                    System.out.println("Hello from Java");
                }
              }
```
In this case our step can assume `Hello.java` rather than be parameterized, so
we can use hardcoded values `hello.tar` and `Hello.java` in a `baseCommand` and
the resulting `outputs`:
```cwl
    run:
      class: CommandLineTool
      inputs: []
      baseCommand: [tar, --create, --file=hello.tar, Hello.java]
      outputs:
        tar_compressed_java_file:
          type: File
          streamable: true
          outputBinding:
            glob: "hello.tar"
```

Did you notice that we didn't split out the `tar --create` tool to a separate file,
but rather embedded it within the CWL Workflow file? This is generally not best
practice, as the tool then can't be reused. We do it here because the command
line is hard-coded with filenames that only make sense within this workflow.

In this example we had to prepare a tar file outside, but only because our inner
workflow was designed to take that as an input. A better refactoring of the
inner workflow would be to take a list of Java files to compile, which would
simplify its usage as a tool step in other workflows.
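
As a rough sketch of that refactoring, a compile tool could accept an array of Java source files instead of a single file extracted from a tar archive. The tool below is hypothetical (it is not one of the files in this guide), and the names `sources` and `classfiles` are assumptions made for illustration:

```cwl
cwlVersion: v1.2
class: CommandLineTool
# Hypothetical tool: compiles every Java file it is given.
baseCommand: javac
# Write the compiled classes into the job's output directory.
arguments: ["-d", $(runtime.outdir)]
inputs:
  sources:
    type: File[]
    inputBinding:
      position: 1
outputs:
  classfiles:
    type: File[]
    outputBinding:
      glob: "*.class"
```

An inner workflow built around such a tool could take `File[]` as its input, so a parent workflow would not need to create a tar file first.
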

Nested workflows can be a powerful way to build higher-level, reusable
workflow units. But just as when writing a CWL tool description,
care must be taken to make them usable in multiple workflows.

## Scattering Steps

Now that we know how to write workflows, we can start utilizing the `ScatterFeatureRequirement`.
This feature tells the runner that you wish to run a tool or workflow multiple times over a list
of inputs. The workflow then takes the input(s) as an array and runs the specified step(s)
on each element of the array as if it were a single input. This allows you to run the same workflow
on multiple inputs without having to generate many different commands or input YAML files.

```cwl
requirements:
  ScatterFeatureRequirement: {}
```
The most common reason a new user might want to use scatter is to perform the same analysis on
different samples. Let's start with a simple workflow that calls our first example
(`hello_world.cwl`) and takes an array of strings as input to the workflow:
```{literalinclude} /_includes/cwl/workflows/scatter-workflow.cwl
:language: cwl
:caption: "`scatter-workflow.cwl`"
:name: scatter-workflow.cwl
```
Aside from the `requirements` section including `ScatterFeatureRequirement`, what is
going on here?
```cwl
inputs:
  message_array: string[]
```

First of all, notice that the workflow-level input here is an array of strings.

```cwl
steps:
  echo:
    run: hello_world.cwl
    scatter: message
    in:
      message: message_array
    out: []
```

Here we've added a new field to the step `echo` called `scatter`. This field tells the
runner that we'd like to scatter over this input for this particular step. Note that
the name listed after `scatter` is the name of one of the step's inputs, not of a workflow-level input.

For our first scatter, it's as simple as that! Since our tool doesn't collect any outputs, we
still use `outputs: []` in our workflow. If you expect a scattered step to contribute to the
final output of your workflow, be sure to change the corresponding output to an array type,
because there will now be one value per scattered element.

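
As a minimal sketch of that last point, assume a variant of the tool, `hello_world_to_stdout.cwl`, that captures its standard output in an output named `echo_out`. The workflow below and its output name `echo_files` are hypothetical:

```cwl
cwlVersion: v1.2
class: Workflow

requirements:
  ScatterFeatureRequirement: {}

inputs:
  message_array: string[]

outputs:
  # One file per element of message_array, hence an array type.
  echo_files:
    type: File[]
    outputSource: echo/echo_out

steps:
  echo:
    run: hello_world_to_stdout.cwl
    scatter: message
    in:
      message: message_array
    out: [echo_out]
```

The tool itself still declares a single `File` output; it is the scattered step that turns it into `File[]` at the workflow level.
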
Using the following input file:
```{literalinclude} /_includes/cwl/workflows/scatter-job.yml
:language: yaml
:caption: "`scatter-job.yml`"
:name: scatter-job.yml
```
As a reminder, [`hello_world.cwl`](../introduction/quick-start.md) simply calls the command
`echo` on a message. If we invoke `cwltool scatter-workflow.cwl scatter-job.yml` on the
command line:
```{runcmd} cwltool scatter-workflow.cwl scatter-job.yml
:working-directory: src/_includes/cwl/workflows/
```

You can see that the workflow calls `echo` once for each element of our
`message_array`. Ok, so how about if we want to scatter over two steps in a workflow?

Let's perform a simple echo like above, but capture `stdout` by adding the following
lines instead of `outputs: []`:

```{code-block} cwl
:caption: "`hello_world_to_stdout.cwl`"
:name: hello_world_to_stdout.cwl

outputs:
  echo_out:
    type: stdout
```
And add a second step that uses `wc` to count the characters in each file. See the tool
below:
```{literalinclude} /_includes/cwl/workflows/wc-tool.cwl
:language: cwl
:caption: "`wc-tool.cwl`"
:name: wc-tool.cwl
```
Now, how do we incorporate scatter? Remember the scatter field is under each step:
```{literalinclude} /_includes/cwl/workflows/scatter-two-steps.cwl
:language: cwl
:caption: "`scatter-two-steps.cwl`"
:name: scatter-two-steps.cwl
```

Here we have placed the `scatter` field under each step. This is fine for this example since
it runs quickly, but if you're running many samples through a more complex workflow, you may
wish to consider an alternative. Here we are running scatter on each step independently. The
second step expects an array as input from the first step, so it will wait until every job in
step one has finished before doing anything, even though each `wc` job only needs the output
of its own `echo` job. This means we aren't using the scatter functionality efficiently.

Pretend that `echo Hello World!` takes 1 minute to perform, `wc -c` on the output takes 3 minutes
and that `echo Hallo welt!` takes 5 minutes to perform, and `wc` on that output takes 3 minutes.
Even though the `Hello World!` sample could be done in 4 minutes, it will actually finish after
8 minutes because the second step must wait for `echo Hallo welt!` to complete. You can see how
this might not scale well.

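
The arithmetic behind these numbers can be written out as follows, assuming the jobs inside one scattered step run in parallel and using the pretend durations above (in minutes):

```{math}
\begin{aligned}
T_{\text{scatter per step}} &= \max(1, 5) + \max(3, 3) = 8 \\
T_{\text{Hello World! alone}} &= 1 + 3 = 4 \\
T_{\text{Hallo welt! alone}} &= 5 + 3 = 8
\end{aligned}
```

With a scatter on each step, every sample is held back to the slowest `echo` job, so both finish after 8 minutes. If each sample could proceed on its own, `Hello World!` would be done after 4 minutes.
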
Ok, so how do we scatter on steps that can proceed independently of other samples? Remember from
[Nested Workflows](#nested-workflows) that we can make an entire workflow a single step in another workflow! Convert our
two-step workflow to a single-step subworkflow:

```{literalinclude} /_includes/cwl/workflows/scatter-nested-workflow.cwl
:language: cwl
:caption: "`scatter-nested-workflow.cwl`"
:name: scatter-nested-workflow.cwl
```

Now the scatter acts on a single step, but that step consists of two steps, so each sample
runs through both steps independently of, and in parallel with, the other samples.

## Conditional Workflows

This workflow contains conditional steps, which are executed or skipped based on the input.
This allows workflows to skip steps based on input parameters given at the start of the run or on the outputs of previous steps.

```{code-block} cwl
:caption: "`conditional-workflow.cwl`"
:name: conditional-workflow.cwl

class: Workflow
cwlVersion: v1.2
inputs:
  val: int

steps:
  step1:
    in:
      in1: val
      a_new_var: val
    run: foo.cwl
    when: $(inputs.in1 < 1)
    out: [out1]

  step2:
    in:
      in1: val
      a_new_var: val
    run: foo.cwl
    when: $(inputs.a_new_var > 2)
    out: [out1]

outputs:
  out1:
    type: string
    outputSource:
      - step1/out1
      - step2/out1
    pickValue: first_non_null

requirements:
  InlineJavascriptRequirement: {}
  MultipleInputFeatureRequirement: {}
```

The first thing you'll notice is that this workflow is only compatible with version 1.2 or greater of the CWL standards.

```cwl
class: Workflow
cwlVersion: v1.2
```

The first step of the workflow (`step1`) has two input properties and will execute `foo.cwl` only when its condition is met. The new property `when` is where the condition is evaluated. In this case, the step is executed only when `in1`, which receives the workflow input `val`, has a value less than 1.

```cwl
steps:
  step1:
    in:
      in1: val
      a_new_var: val
    run: foo.cwl
    when: $(inputs.in1 < 1)
    out: [out1]
```
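
A `when` condition does not have to be a comparison. As a hedged sketch, a step could also be switched on or off by a boolean workflow input. The names `run_step1` and `enabled` are hypothetical, and the sketch assumes `foo.cwl` has no input called `enabled`, so that value is only used by the `when` expression:

```cwl
inputs:
  val: int
  run_step1: boolean   # hypothetical workflow input

steps:
  step1:
    in:
      in1: val
      enabled: run_step1   # hypothetical, only consumed by `when`
    run: foo.cwl
    # The step runs only if the boolean is true.
    when: $(inputs.enabled)
    out: [out1]
```

When the step is skipped its outputs are null, so anything that consumes `step1/out1` has to be prepared for a missing value.
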
Running `cwltool cond-wf-003.1.cwl --val 0` satisfies the condition of the first step, which is therefore executed, as shown in the log by `INFO [step step1] start`. The second step is skipped, as indicated by `INFO [step step2] will be skipped`.

```{code-block} console
INFO [workflow ] start
INFO [workflow ] starting step step1
INFO [step step1] start
INFO [job step1] /private/tmp/docker_tmpdcyoto2d$ echo
INFO [job step1] completed success
INFO [step step1] completed success
INFO [workflow ] starting step step2
INFO [step step2] will be skipped
INFO [step step2] completed skipped
INFO [workflow ] completed success
{
    "out1": "foo 0"
}
INFO Final process status is success
```

When a value of 3 is given, for example with `cwltool cond-wf-003.1.cwl --val 3`, the first conditional step is skipped and the second step is executed.

```{code-block} console
INFO [workflow ] start
INFO [workflow ] starting step step1
INFO [step step1] will be skipped
INFO [step step1] completed skipped
INFO [workflow ] starting step step2
INFO [step step2] start
INFO [job step2] /private/tmp/docker_tmpqwr93mxx$ echo
INFO [job step2] completed success
INFO [step step2] completed success
INFO [workflow ] completed success
{
    "out1": "foo 3"
}
INFO Final process status is success
```

If no condition is met, for example when using `--val 2`, the workflow fails with a `permanentFail` status because neither step produces a value for `out1`.

```{code-block} console
$ cwltool cond-wf-003.1.cwl --val 2
INFO [workflow ] start
INFO [workflow ] starting step step1
INFO [step step1] will be skipped
INFO [step step1] completed skipped
INFO [workflow ] starting step step2
INFO [step step2] will be skipped
INFO [step step2] completed skipped
ERROR [workflow ] Cannot collect workflow output: All sources for 'out1' are null
INFO [workflow ] completed permanentFail
WARNING Final process status is permanentFail
```
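
If a run in which every step is skipped should not be treated as a failure, one option is to collect whatever values exist instead of requiring at least one. The following is a sketch of an alternative `outputs` section, not part of the workflow above; it relies on the `all_non_null` value of `pickValue` from CWL v1.2 and changes the output type to an array:

```cwl
outputs:
  out1:
    # An array, because zero, one or both steps may produce a value.
    type: string[]
    outputSource:
      - step1/out1
      - step2/out1
    # Drop null values from skipped steps instead of requiring one result.
    pickValue: all_non_null
```

With this variant the output is expected to be a list of the non-null results, which may be empty, so downstream consumers need to handle a list rather than a single string.
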
% TODO
% - Scatter
% - ScatterMethod https://github.com/common-workflow-language/user_guide/issues/29
% - Also in the **episode 23** of the current User Guide - https://www.commonwl.org/user_guide/workflows/index.html
% - Subworkflows/nested workflows
% - Covered in the **episode 22** from the current User Guide - https://www.commonwl.org/user_guide/workflows/index.html
% - Conditionals https://github.com/common-workflow-language/user_guide/issues/191 & https://github.com/common-workflow-language/user_guide/issues/188
% - Also in the **episode 24** of the current User Guide - https://www.commonwl.org/user_guide/24_conditional-workflow/index.html