Optimizing the molecular diagnosis of Covid-19 by combining RT-PCR and a pseudo-convolutional machine learning approach to characterize virus DNA sequences

Juliana Carneiro Gomes
Aras Ismael Masood
Leandro Honorato de S. Silva
Janderson Ferreira
Agostinho A. F. Júnior
Allana Lais dos Santos Rocha
Letícia Castro
Nathália R. C. da Silva
Bruno J. T. Fernandes
Wellington Pinheiro dos Santos

Published on Sep 28, 2020


Abstract

The worldwide spread of the SARS-Cov-2 virus has caused more than 250,000 deaths and over 4 million confirmed cases. The severity of Covid-19, the exponential rate at which the virus proliferates, and the rapid exhaustion of public health resources are critical factors. RT-PCR with virus DNA identification remains the benchmark method for Covid-19 diagnosis. In this work we propose a new technique for representing DNA sequences: they are divided into smaller, overlapping sub-sequences in a pseudo-convolutional approach and represented by co-occurrence matrices. The technique analyzes the DNA sequences obtained by the RT-PCR method and eliminates the need for sequence alignment. With the proposed method, it is possible to identify virus sequences in a large database: 347,363 virus DNA sequences from 24 virus families and SARS-Cov-2. Experiments with all 24 virus families and SARS-Cov-2 (multi-class scenario) yielded a sensitivity of 0.822222 ± 0.05613 and a specificity of 0.99974 ± 0.00001 using Random Forests with 100 trees and 30% overlap. When we compared SARS-Cov-2 with virus families that cause similar symptoms, we obtained a sensitivity of 0.97059 ± 0.03387 and a specificity of 0.99187 ± 0.00046 with an MLP classifier and 30% overlap. In the real test scenario, in which SARS-Cov-2 is compared to Coronaviridae and healthy human DNA sequences, we obtained a sensitivity of 0.98824 ± 0.01198 and a specificity of 0.99860 ± 0.00020 with an MLP and 50% overlap. Therefore, the molecular diagnosis of Covid-19 can be optimized by combining RT-PCR with our pseudo-convolutional method to identify SARS-Cov-2 DNA sequences faster and with higher specificity and sensitivity.
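
The representation described in the abstract (overlapping sub-sequences summarized by co-occurrence matrices) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the window size, the definition of co-occurrence, or the normalization, so the function names `windows`, `cooccurrence`, and `features`, the window size of 10, the use of adjacent-base pairs, and the final normalization are all assumptions made for illustration only.

```python
import numpy as np

BASES = "ACGT"
INDEX = {base: i for i, base in enumerate(BASES)}


def windows(seq, size, overlap):
    """Yield sub-sequences of length `size`; consecutive windows share
    roughly `overlap` (a fraction in [0, 1)) of their bases.
    Hypothetical helper: the window size is an assumption."""
    step = max(1, int(round(size * (1.0 - overlap))))
    last_start = max(len(seq) - size, 0)
    for start in range(0, last_start + 1, step):
        yield seq[start:start + size]


def cooccurrence(sub):
    """Count adjacent base pairs in `sub` as a 4x4 matrix
    (rows: first base, columns: following base).
    Assumption: co-occurrence means directly adjacent bases."""
    matrix = np.zeros((4, 4), dtype=np.int64)
    for a, b in zip(sub, sub[1:]):
        if a in INDEX and b in INDEX:  # skip ambiguous symbols such as N
            matrix[INDEX[a], INDEX[b]] += 1
    return matrix


def features(seq, size=10, overlap=0.3):
    """Sum the per-window matrices and return a normalized,
    flattened 16-element feature vector (assumed normalization)."""
    total = np.zeros((4, 4), dtype=np.float64)
    for sub in windows(seq.upper(), size, overlap):
        total += cooccurrence(sub)
    count = total.sum()
    return (total / count if count else total).ravel()
```

Under these assumptions, every sequence maps to a fixed-length vector regardless of its length, which is why no alignment step is needed before the vector is passed to a classifier such as a Random Forest or an MLP.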
