Fast Optimization of Robust Transcriptomics Embeddings using Probabilistic Inference Autoencoder Networks for multi-Omics
Abstract
Advances in single-cell genomics technologies enable the routine acquisition of atlases with millions of cells. These datasets often include multiple covariates, such as donors, sequencing platforms, developmental timepoints, and species, which create both new opportunities for discovery and new challenges. Because these covariates introduce unwanted sources of variation, dataset integration is the starting point for most analyses. However, existing methods struggle to integrate large, complex datasets. To address these limitations, we developed PIANO, a variational autoencoder framework that uses a negative binomial generalized linear model for stronger batch correction and code compilation to train ten times faster than existing tools. We first benchmark PIANO against commonly used methods on single-species datasets and demonstrate performant integration. We then show that PIANO enables superior analyses of multiple atlases, solving challenging integration tasks across sequencing platforms, developmental stages, and species while preserving desired biological signals. Our contributions include a novel, high-performance integration method and recommendations for integration applications.