nanoDoc: RNA modification detection using Nanopore raw reads with Deep One-Class Classification
Abstract
Advances in Nanopore single-molecule direct RNA sequencing (DRS) have presented the possibility of detecting comprehensive post-transcriptional modifications (PTMs) as an alternative to experimental approaches combined with high-throughput sequencing. It has been shown that the DRS method can detect the change in the raw electric current signal of a PTM; however, the accuracy and reliability still require improvement. Here, I present a new software program, named as nanoDoc, for detecting PTMs from DRS data using a deep neural network. Current signal deviations caused by PTMs are analyzed via Deep One-Class Classification with a convolutional neural network. Using a ribosomal RNA dataset, the software archive displayed an area under the curve (AUC) accuracy of 0.96 for detecting 23 different types of modifications inEscherichia coliandSaccharomyces cerevisiae. Furthermore, I demonstrated a tentative classification of PTMs using unsupervised clustering. Finally, I applied this software to severe acute respiratory syndrome coronavirus 2 data and identified commonly modified sites among three groups. nanoDoc is an open source software (GPLv3) available at<ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://github.com/uedaLabR/nanoDoc">https://github.com/uedaLabR/nanoDoc</ext-link>
Author Summary
RNA post-transcriptional modifications (PTMs) is regulate multiple aspects of RNA function, including alternative splicing, export, stability, and translation, and the method to identify multiple types of PTMs is required for further advancement of this fields called ‘epitranscriptomics’. Nanopore singlemolecule direct RNA sequencing (DRS) can detect such PTMs, however the accuracy of the method needs to be improved. Detecting PTMs can be solved as a One-Class Classification problem, which is widely used in machine learning fields. Thus, a novel software named ‘nanoDoc’ for detecting PTMs was developed. The nanoDoc use convolutional neural network to extract the feature signal from nanopore sequencer and Deep One-Class Classification to detect PTMs as an anomaly. The software archive displayed an area under the curve (AUC) accuracy of 0.96 for detecting 23 different types of modifications inEscherichia coliandSaccharomyces cerevisiae.This software is applicable to different samples, and tested on severe acute respiratory syndrome coronavirus 2, and human transcript data as well.
Related articles
Related articles are currently not available for this article.