Presented by: Ivan Duran
The gut microbiome is intrinsically dynamic, and a growing number of studies collect longitudinal microbiome data to assess how the gut microbiota changes during disease development or progression, or after a therapeutic intervention. However, efficient computational tools that harness longitudinal multi-omics microbiome data to predict clinical outcomes remain underdeveloped. In this project, we aim to develop new machine learning (ML) tools that predict clinical outcomes from time-series multi-omics microbiome data. As a case study, we used longitudinal metagenomic and metabolomic data from a prospective birth cohort study of children at high risk of celiac disease (CD) and sought to predict CD development in these subjects using pre-onset data. To this end, we trained Random Forest classifiers combined with an efficient feature selection scheme, using as features (predictors) several pieces of clinical metadata along with species, strain, pathway, and metabolite abundance data collected before disease onset. Our analyses revealed that clinical metadata alone do not accurately predict disease development (F1-score = 68.67%, 10-fold cross-validation). However, we achieved an F1-score of 93% (10-fold cross-validation) using the abundance of only one pathway at 9 months of age, and an F1-score of 100% (10-fold cross-validation) using the abundance of only seven microbial strains at 15 months of age. This pilot study demonstrates the utility of ML for inferring key temporal microbiome signatures that are highly predictive of host clinical status. It also lays the foundation for building early predictive tools that would enable physicians to plan preventive strategies before the clinical manifestation of disease.
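As a rough illustration of the kind of workflow described above, the following minimal sketch combines a feature selection step with a Random Forest classifier and scores it by F1 under 10-fold cross-validation using scikit-learn. Everything specific here is an assumption made for illustration: the data are synthetic (generated with `make_classification`, not the cohort data), the selection step (keeping the seven features ranked highest by a preliminary Random Forest) is a stand-in rather than the feature selection scheme used in the study, and the hyperparameters are arbitrary. The selection step sits inside the pipeline so that it is refit on each training fold, which keeps the held-out fold from influencing which features are chosen.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in for an abundance table at one time point:
# 60 subjects x 200 features, binary outcome. NOT the study data.
X, y = make_classification(
    n_samples=60,
    n_features=200,
    n_informative=5,
    n_redundant=0,
    random_state=0,
)

# Hypothetical selection scheme: keep the 7 features with the highest
# Random Forest importance (threshold=-inf makes max_features the only cut).
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    max_features=7,
    threshold=-np.inf,
)

# Selection and classification are chained so both are refit per fold.
pipeline = Pipeline([
    ("select", selector),
    ("classify", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# Stratified 10-fold cross-validation, scored with the F1-score.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="f1")

print(f"Mean F1 over {len(scores)} folds: {scores.mean():.3f}")
```

On real longitudinal data, one such model could be fit per time point (for example, 9 months and 15 months) and per feature type, which would allow the resulting cross-validated scores to be compared across ages and data layers.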
If you have any questions regarding the poster, feel free to reach out here.