Presented by: Jordan Jensen
Capturing an accurate representation of the viral members of a microbial community presents significant experimental and computational challenges. Current metagenomic approaches include assembly-based reconstruction and detection of virus-like sequences (which is limited to highly abundant or enriched viruses) and detection by homology (which is limited by the rapid evolutionary rate and great diversity of viruses, and thus by their representation in reference databases). Additionally, RNA viruses within microbial communities are both underrepresented and understudied. To address these limitations, we generated synthetic community sequencing data containing mixtures of viral and bacterial DNA and RNA. We used these synthetic sample sets, along with shotgun metagenomes, metatranscriptomes, and virus-like particle (VLP)-enriched viromes from the IBDMDB cohort of the Integrative Human Microbiome Project, to systematically evaluate a series of assembly- and reference-based approaches to viral profiling. Using the synthetic and real samples, we evaluated nucleotide and translated mapping to gold-standard reference sets (RefSeq) and to novel viral Metagenome-Assembled Genome (vMAG) databases. Assembly of metagenomic reads recovered only a small fraction of likely viral sequences, although the sequences it did recover were highly accurate. Conversely, reference-based methods were accurate for detection (although generally not for quantification) of viruses represented in all three data types, particularly when using translated protein mapping; however, RNA viral references remain extremely sparse. Mapping to well-characterized reference sets such as RefSeq maintained high specificity across a wide range of bacterial contamination but failed to capture highly novel viral content. vMAG reference sets varied in their mapping rates of viral reads but expanded the mapping of simulated novel microbial sequences.
This work is ongoing and includes integrating machine learning approaches to fill the gaps that remain after reference-based mapping. Ultimately, we expect this study to provide a single, unified set of best practices for virome profiling across diverse microbial community sequencing assays.