Below is a set of recommended good practices to keep in mind when writing a Common Workflow Language description for a tool or workflow. The guidelines are presented on a scale of usefulness: the more of them a description follows, the better, but not all are required.
☐ No type: string parameters for names of input or reference files/directories; use type: File or type: Directory as appropriate.
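A minimal sketch of the difference follows. The tool and input names (reference_genome, genome_index) are hypothetical. A string input carries only a name. The runner does not stage that path into the container and does not check that it exists. A File or Directory input is staged and validated.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: cat
inputs:
  # Discouraged: a plain string is just text; the file is not staged
  # and its existence is never checked.
  # reference_genome:
  #   type: string
  # Preferred: the runner stages the File and checks that it exists.
  reference_genome:
    type: File
    inputBinding:
      position: 1
  # Hypothetical folder input (e.g. a prebuilt index): use Directory.
  genome_index:
    type: Directory
outputs: []
```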
☐ Include a license that allows re-use by anyone, e.g. Apache 2.0. If possible, specify the license with its corresponding SPDX identifier. Construct the metadata field for the license by providing a URL of the form https://spdx.org/licenses/[SPDX-ID], where SPDX-ID is taken from the SPDX license list. See the example snippet below for guidance. For non-standard licenses without an SPDX identifier, provide a URL to the license.
Example of metadata field for license with SPDX identifier:
```yaml
$namespaces:
  s: https://schema.org/

s:license: https://spdx.org/licenses/Apache-2.0
# other s: declarations
```
For more examples of providing metadata within CWL descriptions, see the Metadata and Authorship section of this User Guide.
☐ Include attribution information for the author(s) of the CWL tool or workflow description. Use unambiguous identifiers like ORCID.
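A minimal sketch of author attribution using schema.org terms. The ORCID iD and the name are placeholders and should be replaced with the real author's details.

```yaml
$namespaces:
  s: https://schema.org/

s:author:
  - class: s:Person
    # Placeholder ORCID iD: replace with the author's real identifier.
    s:identifier: https://orcid.org/0000-0000-0000-0000
    s:name: Jane Doe  # hypothetical author name
```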
☐ In tool descriptions, list dependencies using short name(s) under SoftwareRequirement.
☐ Include SciCrunch identifiers for dependencies in https://identifiers.org/rrid/RRID:SCR_NNNNNN format.
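A sketch of a dependency listed by its short name with a SciCrunch RRID under specs. The package (samtools) is an example only, and SCR_NNNNNN is a placeholder; look up the actual RRID on SciCrunch.

```yaml
hints:
  SoftwareRequirement:
    packages:
      # Short, commonly used package name as the key.
      samtools:
        # identifiers.org RRID URL; SCR_NNNNNN is a placeholder.
        specs: [ "https://identifiers.org/rrid/RRID:SCR_NNNNNN" ]
```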
☐ All input and output identifiers should reflect their conceptual identity. Use informative names like unaligned_sequences, reference_genome, phylogeny, or aligned_sequences instead of foo_input, foo_file, result, input, output, and so forth.
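A sketch of an aligner's interface using informative identifiers. The glob pattern aligned.sam is a hypothetical output file name.

```yaml
inputs:
  unaligned_sequences:   # rather than "input" or "foo_file"
    type: File
  reference_genome:      # rather than "foo_input"
    type: File
outputs:
  aligned_sequences:     # rather than "output" or "result"
    type: File
    outputBinding:
      glob: aligned.sam  # hypothetical output file name
```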
☐ In tool descriptions, include a list of version(s) of the tool that are known to work with this description under SoftwareRequirement.
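A sketch of recording tested versions. The package name and the version numbers are illustrative placeholders, not a statement of which releases actually work.

```yaml
hints:
  SoftwareRequirement:
    packages:
      samtools:
        # Versions this description has been tested with (placeholders).
        version: [ "1.15", "1.16" ]
```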
☐ format should be specified for all input and output Files. Bioinformatics tools should use format identifiers from EDAM. See also iana:text/plain and iana:text/tab-separated-values, with $namespaces: { iana: "https://www.iana.org/assignments/media-types/" }, and the full IANA media type list (also known as MIME types). For non-bioinformatics tools, use or build an appropriate ontology/controlled vocabulary in the same way. Please edit this page to let us know about it.
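A sketch combining an EDAM format for an input with an IANA media type for an output. edam:format_1929 is the EDAM term for FASTA. The identifier summary and the file name summary.tsv are hypothetical.

```yaml
$namespaces:
  edam: http://edamontology.org/
  iana: https://www.iana.org/assignments/media-types/

inputs:
  unaligned_sequences:
    type: File
    format: edam:format_1929   # FASTA
outputs:
  summary:
    type: File
    format: iana:text/tab-separated-values
    outputBinding:
      glob: summary.tsv        # hypothetical output file name
```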
☐ Mark all input and output Files that are read from or written to in a streaming-compatible way (only once, no random access) as streamable: true.
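A sketch that marks a sequentially read input and a sequentially written output as streamable. It assumes the wrapped tool really does access both files front to back without seeking; the identifiers and glob are hypothetical.

```yaml
inputs:
  unaligned_sequences:
    type: File
    streamable: true   # read once, front to back, no seeking
outputs:
  aligned_sequences:
    type: File
    streamable: true   # written sequentially, never revisited
    outputBinding:
      glob: aligned.sam
```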
☐ Each CommandLineTool description should focus on a single operation only, even if the (sub)command is capable of more. Don't overcomplicate your tool descriptions with options that you don't need or use.
☐ Custom types should be defined in external YAML files, one file per type definition, so that they can be re-used.
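A sketch of one custom type per file, shown as two YAML documents separated by ---. The file name sample_metadata.yml and the record fields are hypothetical. The first document is the entire type file. The second is a tool fragment that imports the file and refers to the type as file#name.

```yaml
# sample_metadata.yml: a single type definition (hypothetical)
type: record
name: sample_metadata
fields:
  - name: sample_id
    type: string
  - name: reads
    type: File
---
# Fragment of a tool description that re-uses the type above
requirements:
  SchemaDefRequirement:
    types:
      - $import: sample_metadata.yml
inputs:
  sample:
    type: sample_metadata.yml#sample_metadata
```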
☐ Include a short top-level label summarising the tool/workflow.
☐ If useful, include a top-level doc as well. This should provide a longer, more detailed description than the top-level label (see above).
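A sketch of the two top-level fields for a hypothetical alignment tool. The label is a one-line summary, and doc expands on it as a YAML block scalar.

```yaml
label: Align unaligned sequences to a reference genome
doc: |
  Aligns the reads in unaligned_sequences against reference_genome
  and writes the result to aligned_sequences. Use this field for
  assumptions, expected input formats, and notable parameters.
```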
☐ Use type: enum instead of type: string for elements with a fixed list of valid values.
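A sketch of an enum input. The runner rejects any value outside symbols before the tool runs, which a free-form string would not do. The identifier alphabet and the --alphabet flag are hypothetical.

```yaml
inputs:
  alphabet:
    type:
      type: enum
      symbols: [ dna, rna, protein ]
    inputBinding:
      prefix: --alphabet   # hypothetical command-line flag
```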
☐ Evaluate all uses of JavaScript for possible elimination or replacement. One common example is manipulating File names and paths: consider whether one of the built-in File properties, such as basename, nameroot, or nameext, could be used instead.
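A sketch that derives an output name from nameroot using plain parameter references, so no InlineJavascriptRequirement is needed. The input and output identifiers are hypothetical. For an input named sample1.bam, the command becomes `samtools sort -o sample1.sorted.bam sample1.bam`.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [samtools, sort]
inputs:
  alignments:
    type: File
    inputBinding:
      position: 1
arguments:
  # Parameter reference, not JavaScript: nameroot drops the last extension.
  - prefix: -o
    valueFrom: $(inputs.alignments.nameroot).sorted.bam
outputs:
  sorted_alignments:
    type: File
    outputBinding:
      glob: $(inputs.alignments.nameroot).sorted.bam
```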
☐ Give the tool description to a colleague (preferably at a different institution) to test and provide feedback.
☐ Complex workflows with individual components that can be abstracted should use the SubworkflowFeatureRequirement to make the workflow modular and allow sections of it to be reused easily.
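A sketch of a parent workflow that runs another workflow as a single step. align.cwl is a hypothetical file that is itself a Workflow; SubworkflowFeatureRequirement is what permits this.

```yaml
cwlVersion: v1.2
class: Workflow
requirements:
  SubworkflowFeatureRequirement: {}
inputs:
  unaligned_sequences: File
  reference_genome: File
outputs:
  aligned_sequences:
    type: File
    outputSource: align/aligned_sequences
steps:
  align:
    run: align.cwl   # hypothetical sub-workflow, re-usable on its own
    in:
      unaligned_sequences: unaligned_sequences
      reference_genome: reference_genome
    out: [aligned_sequences]
```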
☐ Software containers should be made to be conformant to the “Recommendations for the packaging and containerizing of bioinformatics software” (also useful to other disciplines).