Presented by: Meghan Short
Tests of differential feature abundance are a cornerstone of microbiome analysis, and a wide variety of such tests is available. Reliable methods for estimating their power and required sample size, however, are lacking. Traditional parametric or rank-based sample size formulae do not account for the unique challenges posed by microbial feature data, including an abundance of biological and technical zeros, compositionality, and the potential for clinical or environmental variables to be associated with feature abundance, prevalence, or both. To benchmark existing power formulae, we use a rich simulation framework previously implemented in SparseDOSSA2 to fit zero-inflated log-linear models to microbial read counts and generate realistic synthetic feature tables. By simulating many feature tables from the same underlying distributions, we estimate power under a range of scenarios. We identified strong relationships between power and feature prevalence, which standard formulae for parametric tests do not account for. Using an “effective” sample size that accounts for feature prevalence improved the accuracy of power calculations (i.e., their agreement with simulated power) in these cases. We plan to consolidate the best-performing approaches in a new software package, Sample Sizes for Microbiome Research (SSMoRe), which we will validate by applying resampling techniques to previously published data from 16S and metagenomic studies.
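The comparison between formula-based and simulated power can be illustrated with a small, self-contained sketch. It is not the SparseDOSSA2 framework or the planned SSMoRe software. Everything in it is an assumption made for illustration: the effective sample size is taken to be the per-group sample size multiplied by feature prevalence, the data are a toy zero-inflated log-normal model, the test is a Welch-type z-test applied only to samples in which the feature is present, and the function names (`formula_power`, `effective_n`, `simulate_power`) are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance

_STD_NORMAL = NormalDist()


def formula_power(n_per_group, effect_size, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test.

    effect_size is the standardized mean difference (Cohen's d).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n_per_group / 2)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def effective_n(n_per_group, prevalence):
    """Assumed definition: expected number of samples carrying the feature."""
    return n_per_group * prevalence


def simulate_power(n_per_group, effect_size, prevalence,
                   alpha=0.05, n_sims=2000, seed=0):
    """Estimate power by repeatedly simulating toy zero-inflated data.

    Each sample carries the feature with probability `prevalence`; when
    present, its log abundance is normal with unit variance. The test
    compares log abundances of the non-zero samples only (an assumption).
    """
    rng = random.Random(seed)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)
                   if rng.random() < prevalence]
        group_b = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)
                   if rng.random() < prevalence]
        if len(group_a) < 2 or len(group_b) < 2:
            continue  # too few non-zero values to test; counts as no rejection
        std_err = math.sqrt(variance(group_a) / len(group_a)
                            + variance(group_b) / len(group_b))
        z_stat = (mean(group_b) - mean(group_a)) / std_err
        if abs(z_stat) > z_crit:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    n, d, prev = 100, 0.5, 0.5
    print("naive formula power:    ", formula_power(n, d))
    print("effective-n formula:    ", formula_power(effective_n(n, prev), d))
    print("simulated power (toy):  ", simulate_power(n, d, prev))
```

In this toy setting the formula evaluated at the nominal sample size ignores that roughly half of the samples carry no abundance information, while the formula evaluated at the assumed effective sample size sits much closer to the simulated estimate. Real feature tables add compositionality and covariate effects on prevalence that this sketch deliberately omits.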