Harvard T.H. Chan Microbiome Analysis Core



The mission of the Microbiome Analysis Core is to provide formalized support, at the highest quality standards, for human microbiome-related studies and to foster collaborative initiatives in microbiome research.


The provision of Microbiome Analysis Core services is concentrated around the following aims:


The Microbiome Analysis Core is sustained by a fee-for-service model and grant-incorporated effort levels. The first five hours of consultation are free of charge, and subsequent services are invoiced at $150/hour of person-time for academic institutions (not limited to Harvard) and $250/hour for industry. This rate supports advanced consultation, analysis, administrative tasks, FASRC compute cluster cycles, and data storage.
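As a rough illustration of the rates above, the following Python sketch estimates an invoice from total person-hours. It assumes the five free hours are simply deducted once from the total; the function name `estimate_invoice` and the institution labels are hypothetical, and the Core's actual invoicing may differ.

```python
from decimal import Decimal

# Figures taken from the text above; how the free hours are applied
# (deducted once from total person-time) is an assumption of this sketch.
FREE_CONSULTATION_HOURS = Decimal("5")
HOURLY_RATE_USD = {
    "academic": Decimal("150"),
    "industry": Decimal("250"),
}


def estimate_invoice(hours, institution):
    """Return the estimated charge in USD for the given person-hours."""
    if institution not in HOURLY_RATE_USD:
        raise ValueError(f"unknown institution type: {institution!r}")
    hours = Decimal(str(hours))
    if hours < 0:
        raise ValueError("hours must be non-negative")
    # Only time beyond the free consultation allowance is billable.
    billable = max(hours - FREE_CONSULTATION_HOURS, Decimal("0"))
    return billable * HOURLY_RATE_USD[institution]
```

Under these assumptions, 12 hours of academic work would leave 7 billable hours, and any engagement of five hours or fewer would cost nothing.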

Contact us for consultations, service requests, letters of support, or to discuss a collaboration.

Summary of common workflows of data analysis

Common Workflows

Amplicon (16S rRNA / ITS) or shotgun (SG) metagenomic and metatranscriptomic sequencing data are passed through a quality control pipeline using the bioBakery workflows.

16S / ITS: The amplicon sequence data pipelines take two approaches, USEARCH / VSEARCH and DADA2, to identify operational taxonomic units (OTUs) and amplicon sequence variants (ASVs), respectively. These taxonomic profiles are then passed to PICRUSt, which infers the gene content and abundance of taxa to predict the metagenome composition of the 16S-resolved community. PICRUSt-predicted metagenomes are amenable to the same kinds of downstream analysis as metagenomes obtained from shotgun sequencing data, but with taxonomic resolution limited by 16S. In tiered-design studies, microPITA takes results from 16S surveys as input to inform the selection of a sample subset for SG follow-up work, governed by user-specified features of interest (clinical/environmental metadata, diversity measures, etc.).

SG: Microbiome composition (bacteria, archaea, viruses and eukaryotic microbes) is gleaned from SG sequencing data using MetaPhlAn2, which resolves taxonomic diversity and abundance at the subspecies level.

Metagenomes, both PICRUSt-predicted and SG-sequenced, can further be passed through the HUMAnN2 pipeline. HUMAnN2 determines conservation and abundance of gene modules (sets of genes related by sequence and function) and biochemical pathways to reveal the metabolic potential of the microbial community.
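The routing described above can be summarized as a small lookup, sketched here in Python. The step strings are descriptive labels rather than command-line invocations, and `plan_workflow` is a hypothetical helper, not part of bioBakery. ITS is left out because the text ties the PICRUSt step specifically to 16S.

```python
# Descriptive labels only; none of these strings is a real command.
QC_STEP = "bioBakery workflows: quality control"

WORKFLOW_STEPS = {
    "16S": [
        QC_STEP,
        "USEARCH / VSEARCH (OTUs) or DADA2 (ASVs): taxonomic profile",
        "PICRUSt: predicted metagenome",
        "HUMAnN2: gene modules and pathways",
    ],
    "SG": [
        QC_STEP,
        "MetaPhlAn2: taxonomic profile",
        "HUMAnN2: gene modules and pathways",
    ],
}


def plan_workflow(data_type):
    """Return the ordered analysis steps for a sequencing data type."""
    try:
        # Copy the list so callers cannot mutate the shared table.
        return list(WORKFLOW_STEPS[data_type])
    except KeyError:
        raise ValueError(f"no workflow sketched for {data_type!r}") from None
```

Both branches end at the same HUMAnN2 step, which mirrors the statement that PICRUSt-predicted and SG-sequenced metagenomes can share downstream functional analysis.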

Data features derived with these algorithms, including gene/pathway presence and abundance, gene expression, microbiome composition, OTUs, ASVs, or peptide identifications from metaproteomics and compound tables from meta-metabolomics, can be integrated with clinical and environmental metadata using LEfSe and MaAsLin2. LEfSe identifies the data features that differ between two classes of a metadata variable (e.g. two sampling sites, two clinical outcomes, two biochemical markers, two modalities, etc.). MaAsLin2 extends the functionality of LEfSe to identify associations between data features and multiple metadata factors, which can be discrete and/or continuous and can include time series data.
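The division of labor between the two tools, as described above, can be sketched as a simple decision rule in Python. This is a simplification of the text and not a statement of either tool's full capabilities; `choose_association_tool` and its input format (factor name mapped to per-sample values) are hypothetical.

```python
from numbers import Number


def is_continuous(values):
    """Treat a factor as continuous if every value is numeric (bools excluded)."""
    return all(isinstance(v, Number) and not isinstance(v, bool) for v in values)


def choose_association_tool(metadata):
    """Pick a tool name from a dict of {factor name: per-sample values}.

    Rule of thumb from the text: a single discrete factor with exactly two
    classes suits LEfSe; anything else (several factors, continuous values)
    is routed to MaAsLin2.
    """
    if not metadata:
        raise ValueError("at least one metadata factor is required")
    if len(metadata) == 1:
        (values,) = metadata.values()
        if not is_continuous(values) and len(set(values)) == 2:
            return "LEfSe"
    return "MaAsLin2"
```

For example, a single `site` factor with the classes `stool` and `oral` would be routed to LEfSe, while adding a numeric `age` factor would route the analysis to MaAsLin2. Note that a two-class factor coded as numbers is treated as continuous in this sketch.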

Data Handling

For computing infrastructure, the Microbiome Analysis Core uses the FAS Research Computing cluster. Your meta’omic data is housed on dedicated and regularly backed-up network storage drives. With written consent of the Investigator, data will be removed from our storage after six months of inactivity following completion of the collaboration.


Authorship

Co-authorship is not required, but we ask Investigators to adhere to the standard practice of acknowledging Core services in peer-reviewed publications or grant proposals. The Microbiome Analysis Core follows accepted scientific criteria for authorship for statisticians in medical papers (see article), and authorship is discussed with the Investigator at the start of collaboration. Examples of Core contributions that warrant authorship:

      • Major role in study conception and design
      • Development of custom analysis methods, tailored specifically for the project
      • Biological interpretation of analyzed results
      • Contribution of intellectual content to manuscript (not only description of methods used)

Intellectual Property

When authorship is not shared, all data prepared and compiled by the Microbiome Analysis Core shall be the property of the Investigator and deemed works made for hire, upon settlement of invoices for services rendered. Source code for custom-built analysis software is not the property of the Investigator, and is reused, distributed, and modified under a Harvard University-approved open source license. For provision of services, the Microbiome Analysis Core requires that payment arrangements be made before the start of collaboration, and that payment be remitted before results are released.