Presented by: Jacob T. Nearing
Amplicon sequencing, a common strategy for taxonomically profiling microbial communities, is relatively low cost and high throughput, but it is subject to several biases. These include primer incompatibility within specific taxonomic groups and the inability to distinguish certain microbes because of low sequence variability. Which taxa can be identified often depends on the variable region sequenced, so downstream taxonomic assignments differ between regions and are often challenging. To help address these issues, we developed Parathaa (Preserving and Assimilating Region-specific Ambiguities in Taxonomic Hierarchical Assignments for Amplicons). Parathaa directly models the sequence ambiguities (similarities) associated with specific amplicon regions and allows a sequence to be assigned multiple ambiguous taxonomic labels. It does this by using full-length amplicon sequence databases to build primer-specific phylogenies. Using those phylogenies, it identifies optimal taxonomic distance thresholds and assigns taxonomy to new representative sequences by placing them into the tree with pplacer. Parathaa thereby captures the biological ambiguities specific to the sequenced variable region of interest. Parathaa outperformed DADA2’s taxonomic classifier on a synthetic dataset spanning the bacterial kingdom, and it identified a higher proportion of species in a mock community dataset. Overall, Parathaa’s approach lets users retain more information and understand potential sources of bias (i.e., sequence ambiguity) when classifying amplicon reads.
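The idea of reporting several labels when a variable region cannot tell taxa apart can be illustrated with a toy sketch. It assumes the query has already been placed in a reference tree (the abstract names pplacer for this step) and that a tree distance from the query to each reference sequence is available. The function name `ambiguous_labels`, the use of one distance threshold per rank, and all values below are hypothetical; this is not Parathaa's actual interface, parameters, or threshold-selection procedure.

```python
"""Toy sketch of threshold-based ambiguous labelling (hypothetical; not Parathaa's code)."""

from typing import Dict, List


def ambiguous_labels(
    distances: Dict[str, float],
    taxonomy: Dict[str, Dict[str, str]],
    thresholds: Dict[str, float],
) -> Dict[str, List[str]]:
    """For each rank, return every label carried by a reference within that rank's threshold.

    distances:  reference ID -> tree distance from the placed query (assumed precomputed)
    taxonomy:   reference ID -> {rank: label}
    thresholds: rank -> maximum distance (hypothetical values)
    """
    result: Dict[str, List[str]] = {}
    for rank, cutoff in thresholds.items():
        # Distinct labels at this rank among references close enough to the query.
        labels = {
            taxonomy[ref][rank]
            for ref, dist in distances.items()
            if dist <= cutoff and rank in taxonomy[ref]
        }
        # An empty list means no reference is close enough: the rank stays unassigned.
        result[rank] = sorted(labels)
    return result


if __name__ == "__main__":
    # Made-up toy distances and thresholds, purely for illustration.
    distances = {"ref1": 0.01, "ref2": 0.02, "ref3": 0.20}
    taxonomy = {
        "ref1": {"genus": "Escherichia", "species": "Escherichia coli"},
        "ref2": {"genus": "Shigella", "species": "Shigella flexneri"},
        "ref3": {"genus": "Salmonella", "species": "Salmonella enterica"},
    }
    thresholds = {"genus": 0.05, "species": 0.03}
    print(ambiguous_labels(distances, taxonomy, thresholds))
    # prints {'genus': ['Escherichia', 'Shigella'], 'species': ['Escherichia coli', 'Shigella flexneri']}
```

Escherichia and Shigella are used here because they are a familiar case of taxa that are hard to separate by 16S sequence alone; returning both labels, rather than forcing a single call, preserves that ambiguity.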