Post-processing tools
Note
All post-processing tools are accessible under ADD STEP -> POSTPROCESSING
LULU
LULU description from the LULU repository: the purpose of LULU is to reduce the number of erroneous OTUs in OTU tables to achieve more realistic biodiversity metrics. By evaluating the co-occurrence patterns of OTUs among samples, LULU identifies OTUs that consistently satisfy some user-selected criteria for being errors of more abundant OTUs and merges these. It has been shown that curation with LULU consistently results in more realistic diversity metrics.
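The co-occurrence idea can be illustrated with a small base R sketch for a single pair of OTUs. This is a simplified illustration under stated assumptions, not the LULU implementation: the function name `is_error_of`, the argument names, and the default thresholds are hypothetical, and the sequence-similarity matching that the real tool also relies on is left out.

```r
## Simplified check: does a 'daughter' OTU behave like an error of a 'parent' OTU?
## daughter, parent: read counts of the two OTUs across the same samples
is_error_of <- function(daughter, parent,
                        min_cooccurrence = 0.95,  # assumed threshold
                        min_ratio = 1) {          # assumed threshold
  present <- daughter > 0
  if (!any(present)) return(FALSE)

  ## Fraction of the daughter's samples in which the parent is also present
  cooccurrence <- mean(parent[present] > 0)
  if (cooccurrence < min_cooccurrence) return(FALSE)

  ## Wherever both occur, the parent must be at least min_ratio times as abundant
  both <- present & parent > 0
  if (!any(both)) return(FALSE)
  min(parent[both] / daughter[both]) >= min_ratio
}

## Toy counts across four samples
parent   <- c(106, 271, 584, 20)
daughter <- c(  3,   4,   3,  0)
is_error_of(daughter, parent)

## A pair that rarely co-occurs is not flagged
is_error_of(daughter = c(0, 5, 0, 9), parent = c(10, 0, 3, 20))
```

In this sketch a flagged daughter would then be merged into its parent; how LULU orders the candidates and performs the merge is not shown here.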
DEICODE
DEICODE (Martino et al., 2019) is used to perform beta diversity analysis by applying robust Aitchison PCA to the OTU/ASV table. To account for the compositional nature of the data, it first preprocesses the data with the rCLR transformation (centered log-ratio on only the non-zero values, without adding a pseudocount). As a second step, it performs dimensionality reduction using robust PCA (also applied only to the non-zero values of the data), where sparse data are handled through matrix completion.
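The rCLR step can be sketched in base R as follows. This is a minimal illustration rather than DEICODE's own code: the function name `rclr` and the toy matrix are made up for the example, and the matrix completion step of the robust PCA is not covered.

```r
## Robust CLR: log-transform only the non-zero counts of each sample and
## centre them on the mean of those logs; zeros stay missing (NA)
rclr <- function(counts) {
  # counts: numeric matrix with OTUs/ASVs in rows and samples in columns
  out <- apply(counts, 2, function(x) {
    logx <- rep(NA_real_, length(x))
    nz <- x > 0
    logx[nz] <- log(x[nz])
    logx - mean(logx, na.rm = TRUE)
  })
  matrix(out, nrow = nrow(counts), dimnames = dimnames(counts))
}

## Toy table with zeros (OTUs in rows, samples in columns)
otu <- matrix(
  c(106, 271, 584, 20,
      3,   4,   3,  0,
      1,   0,   2,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("otu1", "otu2", "otu3"), paste0("sample", 1:4)))

otu_rclr <- rclr(otu)
otu_rclr
```

Zero counts remain `NA` instead of being replaced by a pseudocount, and the observed values of each sample are centred on zero.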
- Additional information:
Note
To START, specify the working directory under SELECT WORKDIR, but the file formats do not matter here (just click ‘Next’).
DEICODE_out directory:

| Setting | Tooltip |
|---|---|
| | select OTU/ASV table. If no file is selected, then PipeCraft will look for OTU_table.txt or ASV_table.txt in the working directory. See the OTU table example below |
| | select list of OTU/ASV IDs for analysing a subset from the full table. See the subset_IDs file example below |
| | cutoff for reads per OTU/ASV. OTUs/ASVs with fewer reads than the specified cutoff will be excluded from the analysis |
| | cutoff for reads per sample. Samples with fewer reads than the specified cutoff will be excluded from the analysis |

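The effect of the two read cutoffs can be sketched in base R. This is only an illustration of the idea: the variable names `min_otu_reads` and `min_sample_reads`, the cutoff values, and the order of the two filters (OTUs first, then samples) are assumptions made for the example.

```r
## Toy OTU table: OTUs in rows, samples in columns
otu <- data.frame(
  sample1 = c(106, 3, 1),
  sample2 = c(271, 4, 0),
  sample3 = c(584, 3, 2),
  sample4 = c(20, 0, 0),
  row.names = c("otuA", "otuB", "otuC"))

min_otu_reads    <- 5    # hypothetical cutoff for reads per OTU/ASV
min_sample_reads <- 25   # hypothetical cutoff for reads per sample

## Drop OTUs whose total read count is below the cutoff
keep_otus <- rowSums(otu) >= min_otu_reads
filtered  <- otu[keep_otus, , drop = FALSE]

## Then drop samples whose remaining read count is below the cutoff
keep_samples <- colSums(filtered) >= min_sample_reads
filtered     <- filtered[, keep_samples, drop = FALSE]

filtered
```

With these toy values, `otuC` (3 reads in total) and `sample4` (20 reads) fall below the cutoffs and are excluded.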
Example of input table (tab delimited text file):

OTU_id | sample1 | sample2 | sample3 | sample4 |
---|---|---|---|---|
00fc1569196587dde | 106 | 271 | 584 | 20 |
02d84ed0175c2c79e | 81 | 44 | 88 | 14 |
0407ee3bd15ca7206 | 3 | 4 | 3 | 0 |
042e5f0b5e38dff09 | 20 | 83 | 131 | 4 |
07411b848fcda497f | 1 | 0 | 2 | 0 |
07e7806a732c67ef0 | 18 | 22 | 83 | 7 |
0836d270877aed22c | 1 | 1 | 0 | 0 |
0aa6e7da5819c1197 | 1 | 4 | 5 | 0 |
0c1c219a4756bb729 | 18 | 17 | 40 | 7 |

Example of input subset_IDs:
07411b848fcda497f
042e5f0b5e38dff09
0836d270877aed22c
0c1c219a4756bb729
...
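How such an ID list selects rows from the full table can be sketched in base R. This is a hedged illustration: the table is read from inline text instead of a file, only a few rows of the example table are used, and keeping the rows in table order is an assumption.

```r
## Read a tab-delimited OTU table (inline text stands in for a file here)
otu <- read.table(
  text = "OTU_id\tsample1\tsample2\tsample3\tsample4
042e5f0b5e38dff09\t20\t83\t131\t4
07411b848fcda497f\t1\t0\t2\t0
07e7806a732c67ef0\t18\t22\t83\t7
0836d270877aed22c\t1\t1\t0\t0",
  header = TRUE, sep = "\t", row.names = 1,
  colClasses = c("character", rep("numeric", 4)))

## IDs to keep; with a real file these could be read with readLines()
subset_ids <- c("07411b848fcda497f", "042e5f0b5e38dff09", "0836d270877aed22c")

## Report IDs that are listed but absent from the table
missing_ids <- setdiff(subset_ids, rownames(otu))

## Keep only the listed OTUs (rows stay in table order)
otu_subset <- otu[rownames(otu) %in% subset_ids, , drop = FALSE]
otu_subset
```

Checking `missing_ids` before the analysis helps catch typos or mismatched ID formats between the list and the table.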
PERMANOVA and PERMDISP example using the robust Aitchison distance
library(vegan)
## Load distance matrix
dd <- read.table(file = "distance-matrix.tsv")
## You will also need to load the sample metadata
## However, for this example we will create a dummy data
meta <- data.frame(
SampleID = rownames(dd),
TestData = rep(c("A", "B", "C"), each = ceiling(nrow(dd)/3))[1:nrow(dd)])
## NB! Ensure that samples in distance matrix and metadata are in the same order
meta <- meta[ match(x = rownames(dd), table = meta$SampleID), ]
## Convert distance matrix into 'dist' class
dd <- as.dist(dd)
## Run PERMANOVA
adon <- adonis2(formula = dd ~ TestData, data = meta, permutations = 1000)
adon
## Run PERMDISP
permdisp <- betadisper(dd, meta$TestData)
plot(permdisp)
Example of plotting the ordination scores
library(ggplot2)
## Load ordination scores
ord <- readLines("ordination.txt")
## Skip PCA summary
ord <- ord[ 8:length(ord) ]
## Break the data into sample and species scores
breaks <- which(! nzchar(ord))
ord <- ord[1:(breaks[2]-1)] # Skip biplot scores
ord_sp <- ord[1:(breaks[1]-1)] # species scores
ord_sm <- ord[(breaks[1]+2):length(ord)] # sample scores
## Convert scores to data.frames
ord_sp <- as.data.frame( do.call(rbind, strsplit(x = ord_sp, split = "\t")) )
colnames(ord_sp) <- c("OTU_ID", paste0("PC", 1:(ncol(ord_sp)-1)))
ord_sm <- as.data.frame( do.call(rbind, strsplit(x = ord_sm, split = "\t")) )
colnames(ord_sm) <- c("Sample_ID", paste0("PC", 1:(ncol(ord_sm)-1)))
## Convert PCA to numbers
ord_sp[colnames(ord_sp)[-1]] <- sapply(ord_sp[colnames(ord_sp)[-1]], as.numeric)
ord_sm[colnames(ord_sm)[-1]] <- sapply(ord_sm[colnames(ord_sm)[-1]], as.numeric)
## At this step, sample and OTU metadata could be added to the data.frame
## Example plot
ggplot(data = ord_sm, aes(x = PC1, y = PC2)) + geom_point()