Information Geometry Reconciles Discrete and Continuous Variation in Single-Cell and Spatial Transcriptomic Analysis
Abstract
Single-cell and spatial transcriptomics provide high-resolution cellular characterization, yet standard analytical approaches remain theoretically misaligned with the probabilistic nature of the data. After UMI normalization, current pipelines measure similarity with Euclidean distance on either normalized or log-transformed values. Both are ill-suited to multinomial count data. Euclidean distance in normalized space overemphasizes high-variance genes, while log transformation inverts this bias at the cost of distorting subtle, continuous expression modulations. Neither approach naturally captures the dual nature of gene expression, which combines discrete presence/absence transitions with continuous quantitative variation. To overcome these limitations, we introduce GAIA (Geometric Analysis from an Information Aspect), an information-geometric framework for cell representation learning and inter-cell similarity measurement. By anchoring the analysis in the underlying probabilistic model, treating cells as multinomial distributions over genes and projecting them onto a statistical manifold, GAIA reconciles the presence/absence effect with the more continuous expression modulations. Mathematically, GAIA exploits the equivalence between Fisher-Rao distance in multinomial space and geodesic distance on the unit hypersphere, a property that enables both theoretical guarantees and computational efficiency. Experiments on synthetic and real scRNA-seq and spatial transcriptomic datasets demonstrate that GAIA preserves robust and consistent cell-to-cell relationships, delineates biologically nuanced sub-types, mitigates batch effects arising from sequencing depth variation, and eliminates the dependence on knowledge-restricted gene selection for learning meaningful cell representations.
Overall, GAIA offers a knowledge-lean, variance-stabilizing framework for analyzing single-cell and spatial transcriptomic data, enhancing discrimination between nuanced cell sub-types and cell states.
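The equivalence the abstract relies on can be illustrated with a short sketch. For two multinomial parameter vectors p and q, the square-root map sends each to a point on the unit hypersphere, and the Fisher-Rao distance is, under the usual convention, twice the great-circle angle between those points: d(p, q) = 2 arccos(sum_i sqrt(p_i q_i)). The code below is a minimal illustration of that standard identity, not GAIA's implementation; the function name `fisher_rao_distance` and the toy count vectors are assumptions made for this example.

```python
import numpy as np

def fisher_rao_distance(counts_a, counts_b):
    """Fisher-Rao distance between two cells treated as multinomial
    distributions over genes (hypothetical helper, not part of GAIA)."""
    p = np.asarray(counts_a, dtype=float)
    q = np.asarray(counts_b, dtype=float)
    # Normalize raw counts to gene proportions (points on the simplex).
    p = p / p.sum()
    q = q / q.sum()
    # sqrt(p) and sqrt(q) lie on the unit hypersphere; their dot product
    # (the Bhattacharyya coefficient) is the cosine of the angle between them.
    cos_angle = np.sum(np.sqrt(p * q))
    # Guard against floating-point values slightly outside [0, 1].
    cos_angle = np.clip(cos_angle, 0.0, 1.0)
    # Factor 2 follows the common Fisher-Rao convention; the unit-sphere
    # geodesic distance is the same quantity without the factor.
    return 2.0 * np.arccos(cos_angle)

# Toy cells: same gene proportions at different sequencing depths,
# and a third cell expressing a disjoint set of genes.
cell_shallow = [10, 0, 5, 0]
cell_deep = [20, 0, 10, 0]
cell_other = [0, 7, 0, 3]

d_same = fisher_rao_distance(cell_shallow, cell_deep)
d_disjoint = fisher_rao_distance(cell_shallow, cell_other)
```

In this toy setting, the two cells with identical proportions are at distance (numerically) zero regardless of depth, while cells with disjoint expressed genes sit at the maximum distance of pi. A gene switching on or off therefore moves the distance toward its bound, and graded changes in proportion move it smoothly, which is one way to read the abstract's claim about discrete and continuous variation.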