Towards a Cytometry Foundation Model: Interpretable Sample-level Predictive Modelling via Pretrained Transformers
Abstract
Foundation models have transformed scientific data modelling across domains, yet flow cytometry has lacked one. Despite the abundance of high-dimensional cellular data, automated analysis remains bottlenecked by marker variability: architectural constraints confine prior studies to fixed marker panels and homogeneous data, limiting scalability and generalisation. We present the Generalised Pretrained Cytometry Transformer (GPCT), an interpretable framework designed to learn from heterogeneous marker panels for sample-level predictive modelling. Through a novel cytometry-specific pretraining regime, GPCT learns transferable cellular representations that achieve high classification accuracy across diverse datasets. Notably, pretraining significantly boosts performance on data-scarce downstream tasks, marking a pivotal step towards a cytometry foundation model. Furthermore, GPCT remains interpretable, identifying the specific cell subsets most influential to its predictions. This enables direct biological validation of learned patterns and provides a data-driven basis for refining traditional gating strategies.