marinemicrobes

Marine Microbes Data Tools


Sequence variants, denoising and the large-scale Microbiome Initiatives: how important is a single base pair for addressing fundamental questions in ecology?

Large-scale sequencing projects seek to describe the biodiversity of (micro)organisms across a broad range of environmental and host-associated systems. They often involve DNA extracted from a wide range of samples, using different procedures and different operators, with sequencing data combined across multiple sequencing runs, potentially produced by different sequencing centres, using different primers for marker gene surveys and different post-sequencing analysis pipelines. All of these factors can affect the reproducibility of the results and the interpretation of the data. In this study we examine the impact of different filtering and denoising methods on the reported Sequence Variant Tables by comparing a version of the Unoise3 workflow* with an ‘unfiltered’ dataset that has not been denoised or screened for chimeras, and with the output of a DADA2 workflow.

* N.B. This link appears to point to an old version of the BASE workflow implemented by the AMI data team, because it makes no mention of primer removal or Unoise.
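As a rough illustration of this kind of comparison, the Python sketch below contrasts two Sequence Variant Tables and flags variants that occur in only one table but sit a single base pair away from a variant in the other. It is not the code in this repository. The table layout (a dictionary mapping each sequence to its total read count), the function names and the toy sequences are all assumptions made for the example.

```python
# Minimal sketch: compare two sequence variant tables.
# Assumption: each table is a dict of {sequence: total read count}.
# All names and data below are hypothetical, for illustration only.

def compare_variant_tables(table_a, table_b):
    """Return (shared, only_a, only_b) as sets of sequences."""
    seqs_a, seqs_b = set(table_a), set(table_b)
    return seqs_a & seqs_b, seqs_a - seqs_b, seqs_b - seqs_a


def differs_by_one_base(seq_1, seq_2):
    """True if two equal-length sequences differ at exactly one position."""
    if len(seq_1) != len(seq_2):
        return False
    return sum(x != y for x, y in zip(seq_1, seq_2)) == 1


def single_base_neighbours(candidates, reference_seqs):
    """Pair each candidate with reference sequences one substitution away."""
    return sorted(
        (cand, ref)
        for cand in candidates
        for ref in reference_seqs
        if differs_by_one_base(cand, ref)
    )


# Toy data: an 'unfiltered' table and a denoised table.
unfiltered = {"ACGTACGT": 120, "ACGTACGA": 3, "TTGGCCAA": 45, "GGGGCCCC": 1}
denoised = {"ACGTACGT": 123, "TTGGCCAA": 45}

shared, only_unfiltered, only_denoised = compare_variant_tables(unfiltered, denoised)
neighbours = single_base_neighbours(only_unfiltered, denoised)

print("shared variants:", sorted(shared))
print("only in unfiltered:", sorted(only_unfiltered))
print("single-base neighbours:", neighbours)
```

This sketch considers substitutions only. Insertions and deletions would require an edit-distance comparison rather than a position-by-position check.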

This repository contains code for a DADA2 reanalysis of the AMMBI and Marine Microbes amplicon datasets. The code is largely based on DADA2 tutorials and the work of Dr Anna Bramucci to implement the DADA2 workflow on the UTS HPCC.

The key advantages of DADA2 over other approaches (e.g. accuracy, interoperability, scaling and open source) are highlighted on the DADA2 GitHub page, along with many useful and up-to-date resources for the statistical analysis of sequencing data after processing.

By comparing the outputs of these three analysis workflows we should be able to determine:

  1. 18S Coastal Primary Analysis pipeline
  2. 16S Primary Analysis pipeline
  3. 18S Pelagic Primary Analysis pipeline
  4. 18S Pelagic Primary Analysis pipeline