OptimOTU pipeline, ITS2
This example data analysis follows the OptimOTU workflow as implemented in PipeCraft2’s pre-compiled pipelines panel.
Note
The OptimOTU pipeline offers few customizable settings; the most important parameters to check or specify are the primers (CUT PRIMERS AND TRIM READS panel)
and the model file (AMPLICON MODEL SETTINGS panel).
Starting point
This example dataset consists of ITS2 rRNA gene amplicon sequences; targeting fungi:
paired-end Illumina MiSeq data;
demultiplexed set (per-sample fastq files);
primers are not removed;
sequences in this set are 5’-3’ (fwd) oriented.
my_dir/
└── sequences/                      # SELECT THIS FOLDER AS WORKING DIRECTORY (name here can be anything)
    └── 01_raw/
        ├── Run1/                   # name here can be anything (without spaces)
        │   ├── sample1_R1.fastq.gz
        │   ├── sample1_R2.fastq.gz
        │   ├── sample2_R1.fastq.gz
        │   └── sample2_R2.fastq.gz
        ├── Run2/                   # name here can be anything (without spaces)
        │   ├── sample3_R1.fastq.gz
        │   ├── sample3_R2.fastq.gz
        │   ├── sample4_R1.fastq.gz
        │   └── sample4_R2.fastq.gz
        └── Run3/                   # name here can be anything (without spaces)
            ├── sample5_R1.fastq.gz
            └── sample5_R2.fastq.gz
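Before selecting the working directory, it can help to confirm that every run folder holds complete R1/R2 pairs. The following is a minimal sketch, not part of PipeCraft2; the function name `find_unpaired_samples` is hypothetical, and it assumes the `_R1`/`_R2` naming and the `.fastq.gz` extension shown in the layout above.

```python
from collections import defaultdict
from pathlib import Path


def find_unpaired_samples(raw_dir, extension=".fastq.gz"):
    """Return {run_name: [sample names that lack an R1 or an R2 file]}.

    Assumes files are named <sample>_R1<extension> and <sample>_R2<extension>,
    as in the example layout; other naming schemes are not handled.
    """
    unpaired = {}
    run_dirs = sorted(p for p in Path(raw_dir).iterdir() if p.is_dir())
    for run_dir in run_dirs:
        mates = defaultdict(set)
        for fastq in run_dir.glob(f"*{extension}"):
            stem = fastq.name[: -len(extension)]
            for tag in ("_R1", "_R2"):
                if stem.endswith(tag):
                    mates[stem[: -len(tag)]].add(tag)
        missing = sorted(
            sample for sample, tags in mates.items() if tags != {"_R1", "_R2"}
        )
        if missing:
            unpaired[run_dir.name] = missing
    return unpaired
```

Calling `find_unpaired_samples("my_dir/sequences/01_raw")` on the example layout would be expected to return an empty dictionary, since every sample has both mates.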
SELECT PIPELINE –> OptimOTU.
SELECT WORKDIR and specify
sequence files extension as *.fastq.gz;
sequencing read types as paired-end.
Target taxa and sequence orientation
Here we specify that the target taxon is fungi and that the sequence orientation is fwd.
Control sequences
Control sequences are sequences that do not belong to the target taxa but are used to estimate the sequencing error rate.
Cut primers and trim reads
The example dataset contains primer sequences. Generally, we need to remove these so that the analyses proceed only with the variable metabarcode of interest. If there are additional sequence fragments, e.g. from sequencing adapters or poly-G tails, then clipping the primers will remove those fragments as well.
For the example data, the forward primer is ITS3 GCATCGATGAAGAACGCAGC and reverse primer is ITS4 TCCTCCGCTTATTGATATGC.
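When inspecting raw reads by hand, a primer may also show up as its reverse complement (for example, at the 3' end of a read that runs through a short amplicon). The sketch below is only an illustration of how to derive those reverse complements for the two primers named above; it is not how PipeCraft2 removes primers, and the helper name `reverse_complement` is hypothetical.

```python
# IUPAC-aware complement table (degenerate bases included).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (IUPAC codes allowed)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


ITS3 = "GCATCGATGAAGAACGCAGC"  # forward primer from the example data
ITS4 = "TCCTCCGCTTATTGATATGC"  # reverse primer from the example data

if __name__ == "__main__":
    for name, primer in (("ITS3", ITS3), ("ITS4", ITS4)):
        print(name, primer, reverse_complement(primer))
```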
Quality filtering
Quality filtering here removes sequences that do not meet the threshold for the maximum allowed number of expected errors. See here for more information about sequence quality and here for additional information about expected errors.
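To make the expected-errors criterion concrete: each Phred score Q corresponds to an error probability of 10^(-Q/10), and the expected number of errors in a read is the sum of these probabilities over all bases. The sketch below computes it from a FASTQ quality string, assuming the standard Phred+33 encoding; the function names and the example threshold are illustrative, not PipeCraft2 settings.

```python
def expected_errors(quality_string, phred_offset=33):
    """Sum of per-base error probabilities for a FASTQ quality string."""
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string)


def passes_max_ee(quality_string, max_ee):
    """True if the read has no more than max_ee expected errors."""
    return expected_errors(quality_string) <= max_ee


if __name__ == "__main__":
    # 'I' encodes Q40 and '+' encodes Q10 under Phred+33.
    for quals in ("IIIIIIIIII", "++++++++++"):
        print(quals, round(expected_errors(quals), 4), passes_max_ee(quals, max_ee=0.5))
```

A read made entirely of high-quality bases accumulates very few expected errors, whereas a short stretch of low-quality bases can push a read over the threshold on its own.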
Denoising and merging paired-end reads
The denoising step is performed using the DADA2 package (Callahan et al. 2016) with default parameters optimized for Illumina amplicon data. Error profiles are learned separately for each sequencing run. Denoising is performed using the dada() function, and read pairs are merged using the mergePairs() function. There are no adjustable settings here.
Chimera filtering
The chimera filtering step is performed using the DADA2 package (Callahan et al. 2016) with default parameters (consensus method). There are no adjustable settings here.
Filter tag-jumps
Tag-jump events are evaluated with the UNCROSS2 algorithm (Edgar 2018) and removed. The expected tag-jump rate (f-value) and the severity of the removal (p-value) can be specified. Here, for dual indexes and a combinational indexing strategy (e.g. indexFwd_1-indexRev_1 and indexFwd_1-indexRev_2), we are using the default values:
- f-value of 0.03
- p-value of 1
For single indexes, use an f-value of >=0.05.
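To give a feel for how the f-value and p-value interact, here is a minimal sketch of an UNCROSS2-style score. It assumes the commonly used formulation in which the expected cross-talk level is z = f * N / n (N = total abundance of the sequence across samples, n = number of samples) and the score is 2 / (1 + exp(x / z)^p) for an observed count x. Whether PipeCraft2 uses exactly this form, and the cut-off `t_min = 0.1`, are assumptions made for illustration only.

```python
import math


def uncross_score(x, total, n_samples, f=0.03, p=1.0):
    """UNCROSS2-style score in [0, 1]; values near 1 suggest a tag-jump.

    x         -- read count of the sequence in one sample
    total     -- total read count of the sequence across all samples (> 0)
    n_samples -- number of samples in the run
    f, p      -- expected tag-jump rate and exponent (assumed meaning)
    """
    z = f * total / n_samples          # expected cross-talk level
    exponent = p * x / z
    if exponent > 700:                 # exp() would overflow; score is ~0
        return 0.0
    return 2.0 / (1.0 + math.exp(exponent))


def is_tag_jump(x, total, n_samples, f=0.03, p=1.0, t_min=0.1):
    """Flag an occurrence as a putative tag-jump (t_min is hypothetical)."""
    return uncross_score(x, total, n_samples, f, p) >= t_min


if __name__ == "__main__":
    for count in (1, 5, 20, 500):
        print(count, round(uncross_score(count, total=1000, n_samples=10), 4))
```

Under this formulation, a larger f raises the expected cross-talk level, so more low-count occurrences are flagged; this is consistent with the recommendation to use a higher f-value for single indexes.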
Amplicon model setting
This example dataset has been generated using primers ITS3-ITS4 for fungi, thus we are using the model_file = ITS3_ITS4.cm.
And the model_type = CM.
Protax classification
For fungi, a built-in database for protax classification can be used.
location = protaxFungi [Or specify a directory where protax is located.]
with_outgroup = UNITE_SHs: this is an additional database that also contains outgroup (non-target) sequences from the same locus. For fungi, the default is UNITE_SHs, which consists of the sh_matching_data_0_5_v9 sequences (included in the PipeCraft2 container).
Clustering
cluster thresholds = Fungi_GSSP: these are the default pre-calculated thresholds for fungi.
Save workflow
Once we have decided on the settings in our workflow, we can save the configuration file by pressing the save workflow button on the right ribbon.

If you forget to save, no worries: a pipecraft2_last_run_configuration.json file will be generated for you upon starting the workflow.
As the file name says, it is the workflow configuration file for your last PipeCraft run in this working directory.
This JSON file can be loaded into PipeCraft2 to automatically configure your next runs exactly the same way.
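Because the configuration is plain JSON, it can also be read outside PipeCraft2, for example to check that two working directories were processed with identical settings. The sketch below makes no assumption about the internal structure of the file beyond it being valid JSON; the helper names are hypothetical.

```python
import json
from pathlib import Path

CONFIG_NAME = "pipecraft2_last_run_configuration.json"


def load_last_run_config(workdir, filename=CONFIG_NAME):
    """Parse the last-run configuration file found in a working directory."""
    path = Path(workdir) / filename
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def same_configuration(workdir_a, workdir_b):
    """True if both working directories hold identical parsed configurations."""
    return load_last_run_config(workdir_a) == load_last_run_config(workdir_b)
```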
Start the workflow
Press START on the left ribbon to start the analyses.
When running the module for the first time, a Docker image will first be pulled to start the process.
Examine the outputs
Several process-specific output folders are generated.
| Output file | Description |
|---|---|
| asv_table | ASV table as a sparse matrix (long format) with five columns: sample, seqrun, seq_id, seq_idx, and nread |
| asv2tax_<conf> | Taxonomic assignments for each ASV at the 50% (plausible) and 90% (reliable) probability thresholds <conf> |
| otu_taxonomy_<conf> | Taxonomy for each OTU at the 50% (plausible) and 90% (reliable) probability thresholds <conf> |
| otu_table_sparse_* | OTU table as a sparse matrix (long format) with five columns: sample, seqrun, seq_id, seq_idx, and nread |
| otu_table_<conf> | OTU table as a dense matrix (wide format) with columns as samples and rows as OTUs |
| otu_<conf>.fasta | Representative OTU sequences for the 50% (plausible) and 90% (reliable) probability thresholds <conf> |
| read_counts_<conf>.tsv | The number of reads in each sample present after each stage of the pipeline |
| | R log file about the OptimOTU pipeline |
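The sparse (long-format) tables list one row per sequence and sample, whereas the dense tables have one column per sample. The following sketch shows how the five long-format columns could be pivoted into a wide layout. It assumes the table has been exported as tab-separated text with a header row, which the description above does not state; the function name `long_to_wide` is hypothetical.

```python
import csv
import io
from collections import defaultdict


def long_to_wide(handle, id_column="seq_id"):
    """Pivot a long-format table into (sample_names, {seq_id: [counts]}).

    Expects tab-separated text with the columns sample, seqrun, seq_id,
    seq_idx, and nread. Counts for the same sequence and sample are summed.
    """
    counts = defaultdict(lambda: defaultdict(int))
    samples = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        counts[row[id_column]][row["sample"]] += int(row["nread"])
        samples.add(row["sample"])
    ordered = sorted(samples)
    wide = {
        seq: [by_sample.get(sample, 0) for sample in ordered]
        for seq, by_sample in counts.items()
    }
    return ordered, wide


if __name__ == "__main__":
    # Made-up rows purely to demonstrate the reshaping.
    example = (
        "sample\tseqrun\tseq_id\tseq_idx\tnread\n"
        "sample1\tRun1\tASV1\t1\t10\n"
        "sample2\tRun1\tASV1\t1\t5\n"
        "sample1\tRun1\tASV2\t2\t3\n"
    )
    names, table = long_to_wide(io.StringIO(example))
    print(names)
    for seq_id, row in table.items():
        print(seq_id, row)
```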


