Optimizing bioinformatic workflows to extract clinically usable gene expression data from targeted RNA sequencing panels: comparison with total RNAseq
Abstract
Targeted RNA sequencing (RNAseq) is widely used to detect gene fusions in tumors, but clinical diagnostic use of expression data from these panels in fusion-negative cases has been limited. To facilitate this application, we evaluated methods for sequence read counting and gene normalization and optimized them for smaller gene sets. We derived differential gene expression (DGE) data from clinically validated ~200-gene RNAseq fusion panels and compared the results with parallel total RNAseq. We compared five methods for read counting and found featureCounts to be the most rapid and robust. For normalization prior to DGE analysis with DESeq2, we compared five strategies and found that normalization using the five most stably expressed genes provided optimal centralization for these smaller gene sets. DGE output was assessed by principal component analysis (PCA), t-SNE, and heatmap clustering. The final pipeline was validated using PCA and pathway analysis against total RNAseq performed separately on a common set of challenging tumors; the targeted gene panel gave comparable results. Overall, we show that, with an optimized bioinformatic pipeline, usable gene expression data can be obtained from smaller targeted RNAseq panels, maximizing the clinical utility of these assays.
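To make the normalization idea concrete, the following is a minimal sketch of scaling samples by a small set of stably expressed genes. It is not the authors' pipeline: the abstract does not say how stability was measured or how the factors were passed to DESeq2. The function names `stable_gene_size_factors` and `normalize`, the stability criterion (lowest standard deviation of log2 counts per million), and the pseudocount are all assumptions made for illustration.

```python
import numpy as np


def stable_gene_size_factors(counts, k=5, pseudocount=1.0):
    """Return (indices of the k most stable genes, per-sample size factors).

    counts: array of shape (genes, samples) holding raw read counts.
    The stability criterion used here is an assumption, not the paper's.
    """
    counts = np.asarray(counts, dtype=float)

    # First-pass library-size scaling so that sequencing depth alone
    # does not make every gene look unstable.
    library_sizes = counts.sum(axis=0)
    log_cpm = np.log2(counts / library_sizes * 1e6 + pseudocount)

    # Rank genes by variability of log-CPM across samples; genes with a
    # zero count in any sample are excluded from the candidates.
    variability = log_cpm.std(axis=1)
    expressed = (counts > 0).all(axis=1)
    if expressed.sum() < k:
        raise ValueError("fewer than k genes are expressed in every sample")
    variability[~expressed] = np.inf
    stable_idx = np.argsort(variability)[:k]

    # Size factor per sample: geometric mean of the stable genes,
    # centered so that the factors have a geometric mean of 1.
    log_ref = np.log(counts[stable_idx]).mean(axis=0)
    size_factors = np.exp(log_ref - log_ref.mean())
    return stable_idx, size_factors


def normalize(counts, size_factors):
    """Divide each sample (column) by its size factor."""
    return np.asarray(counts, dtype=float) / size_factors


# Toy data (invented): 5 depth-proportional genes plus 3 variable genes.
depth = np.array([1.0, 2.0, 4.0])
stable_rows = np.outer([100, 200, 300, 400, 500], depth)
variable_rows = np.array([[10, 400, 40], [50, 20, 800], [5, 100, 4]], dtype=float)
toy_counts = np.vstack([stable_rows, variable_rows])

idx, factors = stable_gene_size_factors(toy_counts, k=5)
normalized = normalize(toy_counts, factors)
```

In this toy matrix the selected genes are the five depth-proportional rows, and after scaling their values are equal across samples. On a ~200-gene panel, a reference set of this kind avoids assuming that most genes are unchanged between samples, an assumption that can be weak for small, tumor-focused gene sets.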