From microbial diversity to function; evaluating dimensionality reduction methods
Abstract
Artificial Intelligence (AI), and more specifically Machine Learning (ML), has become an increasingly prevalent tool in microbial oceanography. The high dimensionality of microbial diversity data from ‘omics observations is well suited to ML analysis, and many recent studies have showcased its utility for exploratory ecological feature finding and process prediction. Here, we apply three well-documented dimensionality reduction methods, Principal Coordinate Analysis (PCoA), Self-Organizing Maps (SOM), and Weighted Gene Correlation Network Analysis (WGCNA), to near-daily 16S rRNA gene amplicon sequencing data from the 2019-2020 MOSAiC International Arctic Drift Expedition. We compare the k-means clustering outputs from these methods to extract functionally distinct seasonal microbial ecotypes in the surface Arctic Ocean. Our results indicate that the SOM method outperforms the more traditional PCoA ordination, identifying a greater number of metabolically distinct functional groups. We then investigate the importance of including biological context in dimensionality reduction by comparing functional outputs to a taxa clustering approach using a k-means adapted WGCNA correlation network. Regardless of data input, all three methods identified three to four recurrent ecotypes with distinct taxonomic and functional cut-offs driven by seasonality, water mass, and substrate turnover. Ultimately, these results reinforce the value of such methodologies as a meaningful translator for mining historical amplicon datasets to address modern mechanistic questions and for incorporating greater ecotype diversity into mechanistic biogeochemical models.
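To make the general workflow concrete, the following is a minimal sketch of one of the three routes described above: a PCoA ordination of an abundance table followed by k-means clustering of the sample coordinates. It is an illustration under stated assumptions, not the pipeline used in this study. The Bray-Curtis dissimilarity, the number of retained axes, the value of k, the farthest-point initialization, and the toy abundance table are all assumptions chosen for illustration, and the function names (`bray_curtis`, `pcoa`, `kmeans`) are hypothetical helpers written for this example.

```python
import numpy as np

def bray_curtis(X):
    """Pairwise Bray-Curtis dissimilarity between rows (samples) of X.
    Assumes every sample has at least one non-zero count."""
    X = np.asarray(X, dtype=float)
    num = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    den = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    return num / den

def pcoa(D, n_axes=2):
    """Classical PCoA: double-center the squared distances, then eigendecompose."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # Gower-centered matrix
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_axes]    # keep the largest n_axes
    vals, vecs = vals[order], vecs[:, order]
    # Bray-Curtis is not Euclidean, so small negative eigenvalues are clipped.
    return vecs * np.sqrt(np.clip(vals, 0.0, None))

def kmeans(P, k, n_iter=100):
    """Plain Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [P[0]]
    for _ in range(k - 1):
        d = ((P[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2)
        centers.append(P[d.min(axis=1).argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([
            P[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Toy abundance table (rows = samples, columns = taxa); values are made up.
# The first four samples are dominated by taxa 1-2, the last four by taxa 3-4.
abundance = np.array([
    [50, 40, 5, 5], [48, 42, 6, 4], [52, 38, 4, 6], [49, 41, 5, 5],
    [5, 5, 50, 40], [6, 4, 48, 42], [4, 6, 52, 38], [5, 5, 49, 41],
])

D = bray_curtis(abundance)
coords = pcoa(D, n_axes=2)
labels = kmeans(coords, k=2)
print(labels)
```

In this toy case the two clusters correspond to the two groups of samples with contrasting dominant taxa, which is the kind of grouping that would then be interpreted as candidate ecotypes. Swapping the ordination step for a SOM or a correlation network changes what is clustered (map units or taxa rather than sample coordinates), which is the comparison the study describes.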
Importance
Connecting microbial community structure to ecosystem function is an important step in accurately modeling climate-relevant biogeochemical processes, yet it remains a major challenge in microbial oceanography. This manuscript demonstrates how emerging machine learning approaches can establish this connection by uncovering recurrent ecological patterns in Arctic Ocean microbial communities. Using near-daily 16S rRNA gene and supplementary metagenome data from the MOSAiC drift expedition, we identified distinct “ecotypes,” or groups of microbes that perform differentiable functional roles within the ecosystem. Importantly, our methods reveal new connections between microbial identity and function that traditional analyses may overlook. Such techniques could potentially be applied to historical amplicon datasets, allowing scientists to revisit and reinterpret existing data to better understand how polar ecosystems are responding to environmental change and to improve future predictive climate models.