Learning torus PCA based classification for multiscale RNA backbone structure correction with application to SARS-CoV-2
Abstract
Motivation
Reconstructions of biomolecular structure, for instance via X-ray crystallography or cryo-EM, frequently contain clashes between atomic centers. Correction methods are usually based on simulations that approximate biophysical chemistry, which makes them computationally expensive, and they often fail to correct all clashes.
Results
We propose a computationally fast, data-driven statistical method that yields suites free from within-suite clashes. From a clash-free training data set, we learn RNA suite shapes by mode hunting after torus PCA on adaptive cutting average linkage tree clustering (MINTAGE). Using classification based on multiscale structure enhancement (CLEAN), we determine for a given clash suite its neighborhood on a mesoscopic scale involving several suites. As the corrected suite we propose the Fréchet mean, on a torus, of the largest classes in this neighborhood. We validate CLEAN MINTAGE on a benchmark data set, compare it to a state-of-the-art correction method and apply it, as a proof of concept, to two exemplary suites adjacent to helical pieces of the frameshift stimulation element of SARS-CoV-2 which are difficult to reconstruct. In contrast to a recent reconstruction proposing several different structure models, CLEAN MINTAGE unanimously proposes structure corrections within the same clash-free class for all suites.
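To illustrate the Fréchet mean on a torus mentioned above, the following is a minimal sketch and not the CLEAN MINTAGE implementation. It assumes that each suite is encoded as a fixed-length vector of dihedral angles in radians and that the torus carries the unweighted product metric of arc-length distances, so that the Fréchet mean can be computed angle by angle. The function names are hypothetical. For a single circle with n data angles, every minimizer of the sum of squared arc-length distances lies among the n candidates obtained by shifting the arithmetic mean by multiples of 2π/n, so the sketch simply evaluates the objective at each candidate.

```python
import numpy as np

TWO_PI = 2.0 * np.pi


def circular_distance(a, b):
    """Arc-length distance between angles in radians, with values in [0, pi]."""
    d = np.abs(a - b) % TWO_PI
    return np.minimum(d, TWO_PI - d)


def frechet_mean_circle(angles):
    """Minimizer of the sum of squared arc-length distances on the circle.

    Candidates are the arithmetic mean of the angles (taken in [0, 2*pi))
    shifted by 2*pi*k/n for k = 0, ..., n-1; the best candidate is returned.
    """
    angles = np.mod(np.asarray(angles, dtype=float), TWO_PI)
    n = angles.size
    candidates = np.mod(angles.mean() + TWO_PI * np.arange(n) / n, TWO_PI)
    # costs[k] = sum_i d(candidate_k, angle_i)^2
    costs = (circular_distance(candidates[:, None], angles[None, :]) ** 2).sum(axis=1)
    return candidates[np.argmin(costs)]


def frechet_mean_torus(points):
    """Coordinate-wise Fréchet mean of rows of `points` (shape: n x d, radians).

    Assumes the unweighted product metric on the torus (an assumption of
    this sketch), under which the objective separates across the d angles.
    """
    points = np.asarray(points, dtype=float)
    return np.array([frechet_mean_circle(points[:, j]) for j in range(points.shape[1])])
```

As a usage example, two points with first angles 350° and 10° average to an angle at 0° on the circle, whereas a naive arithmetic mean would give 180°; this wrap-around behavior is the reason for working on the torus rather than in Euclidean space.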
Code Availability
https://gitlab.gwdg.de/henrik.wiechers1/clean-mintage-code