Interpretable and Accurate Prediction Models for Metagenomics Data
Abstract
Biomarker discovery from metagenomic data is increasingly used for patient diagnosis, prognosis and risk evaluation. Selected groups of microbial features provide signatures that characterize host disease states such as cancer or cardio-metabolic diseases. Yet current predictive models stemming from machine learning still behave as black boxes, and they seldom generalize well when learned on small datasets. Here, we introduce an original approach built on three models inspired by microbial ecosystem interactions: the addition, subtraction, and ratio of microbial taxon abundances. Although extremely simple, these models perform surprisingly well, matching or outperforming Random Forest, SVM or Elastic Net. Besides being interpretable, such models allow biological information to be distilled from the core predictive variables. Collectively, this approach supports reliable and trustworthy diagnostic decisions while meeting the societal and legal pressure for explainable AI models in the medical domain.
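To make the three model families concrete, the following minimal Python sketch shows one way an addition, subtraction, or ratio score could be computed from a sample's taxon abundances and turned into a class call. It is an illustration under stated assumptions, not the authors' implementation: the function names, the use of two taxon index sets (`pos` and `neg`), the small `eps` guard in the ratio, and the decision threshold are all hypothetical choices made for this example.

```python
from typing import Sequence


def addition_score(abund: Sequence[float], pos: Sequence[int]) -> float:
    """Sum the abundances of the selected taxa (assumed form of the 'addition' model)."""
    return sum(abund[i] for i in pos)


def subtraction_score(abund: Sequence[float], pos: Sequence[int],
                      neg: Sequence[int]) -> float:
    """Summed abundance of one taxon group minus that of another (assumed form)."""
    return sum(abund[i] for i in pos) - sum(abund[i] for i in neg)


def ratio_score(abund: Sequence[float], pos: Sequence[int],
                neg: Sequence[int], eps: float = 1e-9) -> float:
    """Ratio of two summed taxon groups; eps is a hypothetical guard against division by zero."""
    return sum(abund[i] for i in pos) / (sum(abund[i] for i in neg) + eps)


def predict(score: float, threshold: float) -> int:
    """Call class 1 when the score exceeds the threshold (hypothetical decision rule)."""
    return 1 if score > threshold else 0


# Toy relative abundances for four taxa in one sample (made-up values).
sample = [0.5, 0.25, 0.125, 0.125]
pos, neg = [0, 1], [2]

add = addition_score(sample, pos)
sub = subtraction_score(sample, pos, neg)
rat = ratio_score(sample, pos, neg)
label = predict(sub, threshold=0.5)
```

Because each score is a plain sum, difference, or quotient of a few named taxa, the selected taxa and the threshold can be read directly from the fitted model, which is the sense in which such models are interpretable.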