We use a standardized workflow structure based on Nextflow, which simplifies developing methods and running experiments across different environments.
Create a new workflow structure
The system will ask you a few questions and then create the structure for you. Then, create a repository on GitHub with the name of the directory just created.
├── .github
│   └── workflows
│       └── ci.yml
├── bin
│   ├── fit.py
│   └── plots.py
├── conf
│   └── base.config
├── containers
│   ├── Dockerfile
│   └── environment.yml
├── testdata
│   └── mydata.txt
├── .bumpversion.cfg
├── .devcontainer.json
├── .gitignore
├── main.nf
├── nextflow.config
└── readme.md
The workflow file
The main.nf file contains the entry point for the workflow, and it uses
Nextflow DSL2 by default. The workflow parameters are stored in the
nextflow.config file, which in turn includes other files in the conf
directory; usually, you only have to define the parameters of your specific
pipeline, since the conf/base.config file includes profiles to run your
workflow in different computing environments, e.g. Slurm or GitHub Actions.
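As a sketch, a pipeline-specific parameter block in nextflow.config could look like the following (the parameter names are illustrative, not part of the template):

```groovy
// nextflow.config (sketch; parameter names are illustrative)
params {
    input  = "testdata/mydata.txt"
    outdir = "results"
}

// pull in the profiles shipped with the template
includeConfig 'conf/base.config'
```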
Please refer to the Nextflow documentation for an overview of the framework.
Custom scripts management
Custom code (i.e. your scripts and classes) needed by the pipeline should be
added to the bin directory; the code in this directory is automatically added
to $PATH when running the pipeline, which makes custom scripts easily portable
and accessible. If you are using Python, you should have a file for each class
of operations, e.g. a file plots.py for all the plots, and use
docopt to provide a standard Unix command-line
interface. See the auto-generated pipeline for an example.
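A minimal sketch of such a per-operation script is shown below. The file and function names are hypothetical; the template recommends docopt, but this self-contained example uses argparse from the standard library so it runs without extra dependencies.

```python
#!/usr/bin/env python3
# bin/plots.py -- illustrative sketch of a per-operation script with a
# standard Unix command-line interface. The template suggests docopt;
# argparse is used here only to keep the example dependency-free.
import argparse


def histogram(infile: str, bins: int) -> str:
    # Placeholder for real plotting logic; returns a summary string.
    return f"histogram of {infile} with {bins} bins"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Plotting utilities.")
    sub = parser.add_subparsers(dest="command")
    hist = sub.add_parser("histogram", help="Plot a histogram of a data file.")
    hist.add_argument("infile")
    hist.add_argument("--bins", type=int, default=30)
    return parser


if __name__ == "__main__":
    parser = build_parser()
    args = parser.parse_args()
    if args.command == "histogram":
        print(histogram(args.infile, args.bins))
    else:
        parser.print_help()
```

Because the script lives in bin, the pipeline can call it directly, e.g. `plots.py histogram mydata.txt --bins 50`.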
Third-party software is managed by
micromamba and specified in an
environment.yml file; keep the
yml file updated and pin the version of
each package you use to ensure reproducibility.
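A pinned environment.yml might look like the following (the packages and versions below are illustrative, not part of the template):

```yaml
# containers/environment.yml (sketch; packages and versions are illustrative)
name: my-workflow
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.10
  - numpy=1.26
  - matplotlib=3.8
  - docopt=0.6.2
```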
To ensure reproducibility when running experiments on local machines and HPC
clusters, it is strongly recommended to build a Docker image. The bundled
Dockerfile can be used to build an image with the software specified in your
environment.yml file. To do that, run:
docker build . -t ghcr.io/stracquadaniolab/<my-workflow>:<version> -f containers/Dockerfile
where <my-workflow> is the name of your workflow and
<version> is the current version of your workflow.
The template comes with an auto-generated
.devcontainer.json file, which
allows you to develop your scripts inside a container with all the required software installed.
Sometimes you will want to pull a Docker image from the GitHub container registry:
In order to successfully pull an image, you first need to authenticate with your personal access token; see: Authenticating with the container registry
docker pull ghcr.io/stracquadaniolab/<workflow_name>:<version>
It is important to build workflows that can be automatically tested; thus, you
will have to add small test data to the
testdata directory, and modify the
test profile in the
conf/base.config configuration file to specify any parameters
needed for your workflow to run. See the auto-generated pipeline for an example.
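As a sketch, the test profile in conf/base.config could point the pipeline at the bundled test data (the parameter names are hypothetical):

```groovy
// conf/base.config (sketch; parameter names are illustrative)
profiles {
    test {
        params.input  = "${projectDir}/testdata/mydata.txt"
        params.outdir = "test-results"
    }
}
```

You can then check the pipeline end to end with `nextflow run . -profile test`.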
All projects must follow a semantic versioning scheme. The format adopted is MAJOR.MINOR.PATCH, where:
- MAJOR: breaking changes that are incompatible with the previous release.
- MINOR: new functionality that remains backward compatible within the same MAJOR version.
- PATCH: bug fixes or settings updates.
To update the version of your workflow, run one of the following commands:
bump2version major   # for a major release
bump2version minor   # for a minor release
bump2version patch   # for a patch release
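These commands read the current version from the bundled .bumpversion.cfg, which, as a sketch, looks something like this (the auto-generated file and the initial version may differ):

```ini
# .bumpversion.cfg (sketch; the auto-generated file may differ)
[bumpversion]
current_version = 0.1.0
commit = True
tag = True
```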
Push your code to GitHub
As the project is version controlled using Git, you can push your code to GitHub as follows:
git add .
git commit -am "new: added super cool feature"
git push -u origin master
Importantly, after running
bump2version, you also have to push the tag just created as follows:
git push --tags
Each pipeline comes with a pre-configured GitHub workflow to automatically test
the code and build a Docker image; the workflow is stored in
.github/workflows/ci.yml. Please note that a Docker image is only released
when you push a tag.
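The tag-triggered release pattern in a GitHub workflow typically looks like the following sketch of the trigger section; the generated ci.yml will have more jobs and steps, and the tag pattern shown here is illustrative:

```yaml
# .github/workflows/ci.yml (sketch of the trigger section only)
on:
  push:
    branches: [master]
    tags: ['v*.*.*']   # the image release job runs only for tag pushes
```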
Each workflow must have an updated
readme.md file, describing:
- what the workflow does
- how to configure the workflow
- how to run the workflow
- a description of the output generated
A readme.md file with the required sections is automatically generated by this template.
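A readme.md skeleton covering those sections could look like this (the section titles are illustrative):

```markdown
# my-workflow

## What it does
...

## Configuration
...

## Running the workflow
...

## Output
...
```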