Fast Optimization of Robust Transcriptomics Embeddings using Probabilistic Inference Autoencoder Networks for multi-Omics

Ning Wang
David Turner
Hannah Feinberg
Victor Eduardo Nieto Caballero
Dan Yuan
Nathaniel Scott
Christopher Cardenas
Michael DeBerardine
Shu Dan
Lakme Caceres
Jessica Schembri
Zizhen Yao
Changkyu Lee
Jonathan W. Pillow
Fenna M. Krienen

Published on Nov 27, 2025

Abstract

Advances in single-cell genomics technologies enable the routine acquisition of atlases with millions of cells. These datasets often include multiple covariates, such as donors, sequencing platforms, developmental timepoints, and species, which create both new opportunities for discovery and new challenges. Because these covariates introduce unwanted sources of variation, dataset integration is the starting point for most analyses. However, existing methods struggle to integrate large, complex datasets. To address these limitations, we developed PIANO, a variational autoencoder framework that uses a negative binomial generalized linear model for stronger batch correction and code compilation for tenfold faster training than existing tools. We first benchmark PIANO against commonly used methods on single-species datasets and demonstrate performant integration. We then show that PIANO enables superior analyses of multiple atlases, solving challenging integration tasks across sequencing platforms, developmental stages, and species while preserving desired biological signals. Our contributions include a novel, high-performance integration method and recommendations for integration applications.
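For readers unfamiliar with the negative binomial generalized linear model mentioned in the abstract, the following is a minimal, generic sketch of how such a model scores a single gene count. It is not the PIANO implementation: the function names `nb_log_pmf` and `glm_mean`, the mean/inverse-dispersion parameterization, the log link, and all numeric values are assumptions chosen for illustration only.

```python
import math

def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with
    mean mu and inverse dispersion theta (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )

def glm_mean(library_size, intercept, batch_effects, batch_onehot):
    """Log-link GLM mean (hypothetical helper):
    log(mu) = log(library_size) + intercept + sum_b effect_b * indicator_b."""
    eta = intercept + sum(b * z for b, z in zip(batch_effects, batch_onehot))
    return library_size * math.exp(eta)

# Illustrative values only: one gene in one cell that belongs to the second of two batches.
mu = glm_mean(library_size=1000.0, intercept=-6.0,
              batch_effects=[0.0, 0.5], batch_onehot=[0, 1])
log_lik = nb_log_pmf(x=3, mu=mu, theta=2.0)
print(f"expected count: {mu:.3f}, log-likelihood of observing 3: {log_lik:.3f}")
```

In this kind of model, batch covariates shift the expected count through additive terms on the log scale, so the batch contribution can be separated from the remaining signal. How PIANO combines such a likelihood with its autoencoder is described in the full article, not here.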
