Normalization of single-cell RNA-seq counts by log(x+1)* or log(1+x)*
Abstract
Single-cell RNA-seq technologies have been successfully employed over the past decade to generate many high-resolution cell atlases. These have proved invaluable in recent efforts aimed at understanding the cell type specificity of host genes involved in SARS-CoV-2 infections. While single-cell atlases are based on well-sampled, highly expressed genes, many of the genes of interest for understanding SARS-CoV-2 can be expressed at very low levels. Common assumptions underlying standard single-cell analyses do not hold when examining lowly expressed genes, with the result that standard workflows can produce misleading results.
Key Points
- Lowly expressed genes in single-cell RNA-seq can be easily misanalyzed.
- log(1+x) count normalization introduces errors for lowly expressed genes.
- The average log(1+x) expression differs considerably from log(x) when x is small.
- An alternative approach is to use the fraction of cells with non-zero expression.
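
The last two key points can be illustrated with a small numerical sketch. The counts below are made up for illustration (a hypothetical gene detected in 10% of 1,000 cells) and are not data from the article; the helper name `summarize` is likewise an assumption. The sketch compares the average of log(1+x) across cells with the log of the average count, and reports the fraction of cells with non-zero expression.

```python
import math

def summarize(counts):
    """Return summary statistics for one gene's per-cell counts (hypothetical helper)."""
    n = len(counts)
    mean_count = sum(counts) / n
    return {
        # Average of log(1+x) over cells, as produced by log1p normalization.
        "mean_log1p": sum(math.log1p(x) for x in counts) / n,
        # Log of the average count; undefined (-inf) if the gene is never detected.
        "log_mean": math.log(mean_count) if mean_count > 0 else float("-inf"),
        # log(1+x) applied to the average count, for comparison.
        "log1p_mean": math.log1p(mean_count),
        # Fraction of cells in which the gene has at least one count.
        "fraction_nonzero": sum(1 for x in counts if x > 0) / n,
    }

# Illustrative, made-up counts: 900 cells with 0, 90 cells with 1, 10 cells with 2.
counts = [0] * 900 + [1] * 90 + [2] * 10
stats = summarize(counts)

for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

For a gene this sparse, the average of log(1+x) is a small positive number while the log of the average count is strongly negative, so the two summaries are not interchangeable. The fraction of non-zero cells is a direct count that does not depend on the choice of pseudocount.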