Accurate identification of medulloblastoma subtypes from diverse data sources with severe batch effects by RaMBat


Abstract

As the most common malignant pediatric brain cancer, medulloblastoma (MB) accounts for around 20% of all pediatric central nervous system (CNS) neoplasms. MB comprises several distinct molecular subtypes, principally SHH, WNT, Group 3 and Group 4. Accurate identification of MB subtypes enables improved downstream risk stratification and the design of tailored therapies. Existing methods have demonstrated the feasibility of using transcriptomics data to identify MB subtypes. However, their performance can be poor because of limited cohort sizes and the severe batch effects that arise when integrating multiple MB data sources. To overcome these limitations, we propose RaMBat, a novel approach for accurate MB subtype identification from diverse transcriptomics data sources with severe batch effects. Specifically, RaMBat uses intra-sample gene expression rankings instead of absolute gene expression levels, which allows it to handle batch effects across diverse transcriptomics cohorts efficiently. Through intra-sample gene rank analysis, reversal ratio analysis, and feature selection, RaMBat selects MB subtype-specific gene features and uses them to identify MB subtypes accurately. Benchmarking tests on 13 datasets with severe batch effects suggested that RaMBat achieved a median accuracy of 99%, significantly outperforming other state-of-the-art MB subtyping approaches and conventional machine learning classifiers. In addition, for visualization, RaMBat efficiently removed batch effects and clearly separated samples from diverse data sources by MB subtype, whereas conventional visualization methods such as t-SNE suffered from severe batch effects. We believe that RaMBat is a promising MB subtyping tool that would directly benefit downstream MB risk stratification and tailored treatment design.
To facilitate the use of RaMBat, we have developed an R package, which is freely available at https://github.com/wan-mlab/RaMBat.
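The abstract does not spell out how the rank and reversal ratio steps are computed, so the following Python sketch is only an illustration under stated assumptions, not the implementation in the RaMBat R package. It assumes that genes are ranked within each sample (ties get their average rank), that the "reversal ratio" of a gene pair is the fraction of samples in which the first gene ranks above the second, and that a pair is a useful subtype feature when that ratio differs strongly between one subtype and the rest. The function names `intra_sample_ranks`, `reversal_ratio`, and `pair_contrast` and the toy data are hypothetical.

```python
from typing import List, Sequence

# Hypothetical helpers for illustration only; they are NOT the RaMBat R package API.


def intra_sample_ranks(expr: Sequence[float]) -> List[float]:
    """Rank the genes of ONE sample by expression (1 = lowest).

    Tied values receive the average of their positions (an assumed tie rule).
    """
    order = sorted(range(len(expr)), key=lambda g: expr[g])
    ranks = [0.0] * len(expr)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of genes with identical expression.
        while j + 1 < len(order) and expr[order[j + 1]] == expr[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def reversal_ratio(samples: List[Sequence[float]], gene_a: int, gene_b: int) -> float:
    """ASSUMED definition: fraction of samples in which gene_a ranks strictly above gene_b."""
    above = 0
    for expr in samples:
        ranks = intra_sample_ranks(expr)
        if ranks[gene_a] > ranks[gene_b]:  # ties do not count as "above"
            above += 1
    return above / len(samples)


def pair_contrast(subtype: List[Sequence[float]], others: List[Sequence[float]],
                  gene_a: int, gene_b: int) -> float:
    """HYPOTHETICAL selection score: gap between a pair's reversal ratio in one subtype and in the rest."""
    return abs(reversal_ratio(subtype, gene_a, gene_b) - reversal_ratio(others, gene_a, gene_b))


if __name__ == "__main__":
    # Toy expression profiles (rows = samples, columns = genes); not real data.
    cohort = [[5.0, 1.0, 3.0], [4.0, 2.0, 6.0], [0.5, 9.0, 1.0]]
    # Simulate a batch effect as a strictly increasing linear distortion of every value.
    batch_shifted = [[2.0 * x + 10.0 for x in s] for s in cohort]

    print(intra_sample_ranks(cohort[0]))  # [3.0, 1.0, 2.0]
    print(intra_sample_ranks(batch_shifted[0]) == intra_sample_ranks(cohort[0]))  # True
    print(reversal_ratio(batch_shifted, 0, 1) == reversal_ratio(cohort, 0, 1))  # True
    print(pair_contrast(cohort[:2], cohort[2:], 0, 1))  # 1.0
```

Because the ranks depend only on the ordering of values inside each sample, any strictly increasing transform of a sample (scaling, shifting, log-transforming) leaves them, and therefore the reversal ratios, unchanged. Batch effects that reorder genes within a sample would not be removed by ranking alone.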
