Integrating reference- and assembly-based methods for improved viral identification from metagenomes, metatranscriptomes, and viromes

Presented by: Jordan Jensen


Capturing an accurate representation of the viral members of a microbial community presents significant experimental and computational challenges. Sample preparation approaches for virus-like particle (VLP) enrichment vary greatly in efficiency across protocols and environments, and viral sequences from any technology (metagenomic, metatranscriptomic, or VLP-enriched) can be difficult to identify computationally. Limitations include small viral genome size and, consequently, a small proportion of viral genetic content in samples; the lack of universal marker genes; multiple nucleic acid backbone types; rapid evolution, recombination, and sequence divergence; and, most prominently, a lack of well-characterized viral reference databases.
To address these limitations, we developed BAQLaVa (Bioinformatic Application for Quantification and Labeling of Viral taxonomy), which integrates tiered reference-based profiling to produce viral profiles from shotgun DNA or RNA sequencing data (with or without VLP enrichment). Reads are compared against both nucleotide and protein (translated) databases that are pre-screened for viral identification using a modification of the MetaPhlAn algorithm and reconciled with the most recent International Committee on Taxonomy of Viruses (ICTV) taxonomic rankings. We hope these methods will unlock previously inaccessible information on viral community members from thousands of existing metagenomes and metatranscriptomes, and will enable more accurate characterization of VLP enrichments from a variety of microbial environments in the future.
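The general idea of tiered profiling can be illustrated with a minimal sketch. This is not BAQLaVa's implementation: it assumes alignment results have already been reduced to simple read-to-taxon dictionaries, and the function names, the `ictv_names` renaming table, and the taxon labels are all hypothetical. Each read is assigned at the first tier that yields a hit (nucleotide first, then translated protein), taxon names are optionally mapped to current ICTV names, and counts are converted to relative abundances. Real alignment and MetaPhlAn-style database screening are not shown.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tiered_profile(
    reads: Iterable[str],
    nucleotide_hits: Dict[str, str],  # hypothetical: read ID -> taxon from a nucleotide search
    protein_hits: Dict[str, str],     # hypothetical: read ID -> taxon from a translated search
    ictv_names: Dict[str, str],       # hypothetical: legacy taxon name -> current ICTV name
) -> Tuple[Counter, Counter]:
    """Assign each read at the first tier with a hit; return (taxon counts, tier counts)."""
    taxa: Counter = Counter()
    tiers: Counter = Counter()
    for read in reads:
        if read in nucleotide_hits:
            # Tier 1: nucleotide hits take precedence.
            taxon, tier = nucleotide_hits[read], "nucleotide"
        elif read in protein_hits:
            # Tier 2: translated search, consulted only for reads unassigned by tier 1.
            taxon, tier = protein_hits[read], "protein"
        else:
            tiers["unassigned"] += 1
            continue
        # Reconcile with the (hypothetical) current taxonomy when a mapping exists.
        taxa[ictv_names.get(taxon, taxon)] += 1
        tiers[tier] += 1
    return taxa, tiers


def relative_abundance(taxa: Counter) -> Dict[str, float]:
    """Convert assigned-read counts into fractions of all assigned reads."""
    total = sum(taxa.values())
    return {name: n / total for name, n in taxa.items()} if total else {}


if __name__ == "__main__":
    reads = ["r1", "r2", "r3", "r4"]
    nuc = {"r1": "virus_A", "r2": "virus_B_legacy"}
    prot = {"r2": "virus_C", "r3": "virus_B_legacy"}
    ictv = {"virus_B_legacy": "virus_B"}
    taxa, tiers = tiered_profile(reads, nuc, prot, ictv)
    print(dict(taxa), dict(tiers))
    print(relative_abundance(taxa))
```

In this toy input, read `r2` matches both tiers but is counted only once, at the nucleotide tier. That is the point of a tiered design: the more specific search claims a read before the more permissive translated search is consulted.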
