Common Workflow Language User Guide: Recommended Practices

Below is a set of recommended practices to keep in mind when writing a Common Workflow Language description for a tool or workflow. They are offered for consideration on a scale of usefulness: the more you follow, the better, but not all are required.

☐ Do not use type: string parameters for the names of input or reference files/directories; use type: File or type: Directory as appropriate.
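As a minimal sketch of the File-versus-string point, the description below wraps `wc -l`; the tool choice and the identifiers `text_file` and `line_count` are illustrative assumptions, not part of the guide.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [wc, -l]
inputs:
  # Declared as File, not string, so the runner knows this is a file
  # it has to make available to the tool (for example inside a container).
  text_file:
    type: File
    inputBinding:
      position: 1
outputs:
  line_count:
    type: stdout
stdout: line_count.txt
```

With `type: string`, the same path would be passed through as opaque text and the runner would have no way to know that a file is involved.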

☐ Include a license that allows for re-use by anyone, e.g. Apache 2.0. Example of license inclusion.

☐ Include attribution information for the author(s) of the CWL tool or workflow description. Use unambiguous identifiers like ORCID.
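One way the license and attribution items might look in practice is sketched below. The use of schema.org properties under an `s:` prefix is an assumption based on common CWL usage, and the author shown is the fictitious sample record that ORCID publishes for demonstrations, standing in for a real author.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: echo
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs: []

$namespaces:
  s: https://schema.org/

# License of this description, given as an SPDX URL.
s:license: https://spdx.org/licenses/Apache-2.0

# Placeholder author: replace with the real name and ORCID.
s:author:
  - class: s:Person
    s:identifier: https://orcid.org/0000-0002-1825-0097
    s:name: Josiah Carberry
```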

☐ In tool descriptions, list dependencies using short name(s) under SoftwareRequirement.

☐ Include SciCrunch identifiers for dependencies in format.

☐ All input and output identifiers should reflect their conceptual identity. Use informative names like unaligned_sequences, reference_genome, phylogeny, or aligned_sequences instead of foo_input, foo_file, result, input, output, and so forth.

☐ In tool descriptions, include a list of version(s) of the tool that are known to work with this description under SoftwareRequirement.
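The SoftwareRequirement items above could be combined roughly as follows. The package name `example_tool`, the version numbers, and the `SCR_NNNNNN` identifier are hypothetical placeholders; substitute the real short name, tested versions, and registered identifier of the dependency.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [example_tool, --version]
hints:
  SoftwareRequirement:
    packages:
      # Short name of the dependency (hypothetical).
      example_tool:
        # Versions known to work with this description (placeholders).
        version: [ "1.2.0", "1.3.1" ]
        # Identifier for the dependency (placeholder, not a real record).
        specs: [ "https://identifiers.org/rrid/RRID:SCR_NNNNNN" ]
inputs: []
outputs: []
```

Listing the requirement under `hints` rather than `requirements` is a design choice in this sketch: it lets a runner that cannot resolve software packages still attempt to run the tool.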

☐ format should be specified for all input and output Files. Bioinformatics tools should use format identifiers from EDAM. See also iana:text/plain, iana:text/tab-separated-values with $namespaces: { iana: "" }. Full IANA media type list (also known as MIME types). For non-bioinformatics tools use or build an appropriate ontology/controlled vocabulary in the same way. Please edit this page to let us know about it.
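A minimal sketch of format annotation using EDAM follows. The `cat` wrapper, the file name `copied.fasta`, and the identifiers are assumptions for illustration; `format_1929` is EDAM's identifier for FASTA.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: cat
inputs:
  unaligned_sequences:
    type: File
    format: edam:format_1929  # FASTA
    inputBinding:
      position: 1
outputs:
  copied_sequences:
    type: File
    format: edam:format_1929  # FASTA
    outputBinding:
      glob: copied.fasta
stdout: copied.fasta

$namespaces:
  edam: http://edamontology.org/
```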

☐ Mark all input and output Files that are read from or written to in a streaming compatible way (only once, no random-access), as streamable: true.
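For illustration, `gzip -c` reads its input once from start to finish and writes its output sequentially, so both sides can plausibly be marked as streamable. The identifiers and output file name below are assumptions.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [gzip, -c]
inputs:
  uncompressed_text:
    type: File
    # Read once, front to back, with no seeking.
    streamable: true
    inputBinding:
      position: 1
outputs:
  compressed_text:
    type: File
    # Written sequentially to standard output.
    streamable: true
    outputBinding:
      glob: compressed.gz
stdout: compressed.gz
```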

☐ Each CommandLineTool description should focus on a single operation only, even if the (sub)command is capable of more. Don’t overcomplicate your tool descriptions with options that you don’t need or use.

☐ Custom types should be defined with one external YAML file per type definition for re-use.
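A sketch of an external type definition, assuming a hypothetical file named `alignment_mode.yaml` that sits next to the tool description:

```yaml
# alignment_mode.yaml (hypothetical file holding exactly one type)
type: enum
name: alignment_mode
symbols:
  - global
  - local
```

A tool description could then import and reference it as shown here; `example_aligner` and its `--mode` option are likewise hypothetical.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: example_aligner
requirements:
  SchemaDefRequirement:
    types:
      - $import: alignment_mode.yaml
inputs:
  mode:
    # Reference the imported type as <file>#<name>.
    type: alignment_mode.yaml#alignment_mode
    inputBinding:
      prefix: --mode
outputs: []
```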

☐ Include a top level short label summarising the tool/workflow.

☐ If useful, include a top level doc as well. This should provide a longer, more detailed description than was provided in the top level label (see above).
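The difference between the two fields might look like this in practice: a short label of a few words, and a doc that says more. The wording and the `wc -l` wrapper are illustrative assumptions.

```yaml
cwlVersion: v1.2
class: CommandLineTool
label: Count lines in a text file
doc: |
  Runs `wc -l` on one text file and captures standard output.
  The output file holds the line count followed by the input path.
baseCommand: [wc, -l]
inputs:
  text_file:
    type: File
    label: Text file to count
    inputBinding:
      position: 1
outputs:
  line_count:
    type: stdout
stdout: line_count.txt
```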

☐ Use type: enum instead of type: string for elements with a fixed list of valid values.
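As a sketch, an option that accepts only a fixed set of values can be declared with an inline enum. The tool `example_aligner`, its `--mode` option, and the symbol names are hypothetical.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: example_aligner
inputs:
  mode:
    # Only these values are accepted; anything else fails validation
    # before the tool is ever started.
    type:
      type: enum
      symbols:
        - global
        - local
    inputBinding:
      prefix: --mode
outputs: []
```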

☐ Evaluate all use of JavaScript for possible elimination or replacement. A common example is manipulating File names and paths: consider whether one of the built-in File properties such as basename, nameroot, or nameext could be used instead.
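A minimal sketch of replacing JavaScript with a built-in File property: the output name is derived from `basename` through a plain parameter reference, so no InlineJavascriptRequirement is needed. The `gzip -c` wrapper and the identifiers are assumptions.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [gzip, -c]
inputs:
  uncompressed_text:
    type: File
    inputBinding:
      position: 1
outputs:
  compressed_text:
    type: File
    outputBinding:
      glob: $(inputs.uncompressed_text.basename).gz
# A JavaScript version such as
#   $(inputs.uncompressed_text.path.split('/').pop()).gz
# would need InlineJavascriptRequirement; basename avoids that.
stdout: $(inputs.uncompressed_text.basename).gz
```

For an input named `notes.txt`, this names the output `notes.txt.gz`.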

☐ Give the tool description to a colleague (preferably at a different institution) to test and provide feedback.