We use a standardized directory structure, called a workspace, in which we run and store all the experiments for a given project.
Create a workspace
The system will ask you a few questions and then create the structure for you.

    .
    ├── conf
    ├── data
    ├── logs
    ├── resources
    ├── results
    └── readme.md

- The conf directory contains the Nextflow config files used to run a
  pipeline; you must define the parameters of each experiment in a config
  file rather than passing them on the command line.
- The data directory contains the data to be processed by a pipeline. This
  directory usually holds the raw data (e.g. data from sequencing
  experiments). You should take some time to organize it in a meaningful and
  consistent way.
- The resources directory contains data retrieved from external
  sources/repositories, such as annotation files (e.g. genome GFF) or gene
  set GMT files.
- The logs directory contains the log of each pipeline run.
- The results directory contains the results of experiments. You should take
  some time to organize it in a meaningful and consistent way. It is strongly
  recommended to store results in a directory named like
- The readme.md file contains a description of the project and of how the
  results folder is organized.
The team MUST use the Google naming guidelines, specifically:
- Make file and directory names lowercase.
- Separate words with hyphens, not underscores.
- Use only standard ASCII alphanumeric characters in file and directory names.
IMPORTANT: Raw data and external resources may keep their original names if that makes them easier to identify.