Visual and Quantitative Analyses of Virus Genomic Sequences using a Metric-based Algorithm
Abstract
This work studies virus RNAs using a novel algorithm for the accelerated search of genomic fragments of any length in sequences, based on the Hamming distance between the binary-expressed characters of an RNA and the query patterns. The repetitive genomic sub-sequences found at different lengths were placed on one plot as genomic trajectories (walks) to increase the effectiveness of geometrical multi-scale genomic studies. Primary attention was paid to building and analysing the atg-triplet walks that compose the schemes, or skeletons, of the viral RNAs. The 1-D distributions of these codon-starting atg-triplets were built together with the single-symbol walks for full-scale analyses. The visual examination was followed by the calculation of statistical parameters of the genomic sequences, including estimates of the geometry deviation and fractal properties of the inter-atg distances. This approach was applied to the SARS-CoV-2, MERS-CoV, Dengue and Ebola viruses, whose complete genomic sequences were taken from the GenBank and GISAID databases. These distributions were found to be relatively stable for the SARS-CoV-2 and MERS-CoV viruses, whereas the Dengue and Ebola viruses showed an increased deviation in the geometrical and fractal characteristics of their atg-distributions. The results of this work may find use in the classification of virus families and in the study of their mutations.
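The pattern search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-bit encoding in `ENCODING`, the function names, and the `max_distance` parameter are assumptions made only for illustration, since the abstract does not specify the binary representation or the acceleration scheme. The sketch slides a query over the sequence, compares the binary codes by Hamming distance, and returns the inter-atg distances whose statistics the abstract refers to.

```python
# Hypothetical 2-bit code per nucleotide (an assumption; the abstract
# does not state which binary representation is used).
ENCODING = {"a": (0, 0), "c": (0, 1), "g": (1, 0), "t": (1, 1), "u": (1, 1)}


def to_bits(seq):
    """Convert a nucleotide string into a flat list of bits."""
    bits = []
    for ch in seq.lower():
        bits.extend(ENCODING[ch])
    return bits


def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_pattern(seq, pattern, max_distance=0):
    """Return start positions (in nucleotides) where the binary code of
    the window is within max_distance bits of the binary code of pattern."""
    seq_bits = to_bits(seq)
    pat_bits = to_bits(pattern)
    width = len(pat_bits)
    positions = []
    for i in range(len(seq) - len(pattern) + 1):
        window = seq_bits[2 * i : 2 * i + width]  # 2 bits per nucleotide
        if hamming(window, pat_bits) <= max_distance:
            positions.append(i)
    return positions


def inter_distances(positions):
    """Distances between consecutive occurrences of the pattern."""
    return [b - a for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    toy = "ccatggatgaaatgc"  # toy sequence, not real viral data
    atg_positions = find_pattern(toy, "atg")
    print(atg_positions)
    print(inter_distances(atg_positions))
```

With `max_distance=0` only exact matches are reported; a larger value admits near matches, although how many bits separate two nucleotides depends entirely on the assumed encoding.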