Modeling the trajectory of SARS-CoV-2 spike protein evolution in continuous latent space using a neural network and Gaussian process
Abstract
Viral vaccines can lose their efficacy as the genomes of targeted viruses rapidly evolve, resulting in new variants that may evade vaccine-induced immunity. This process is apparent in the emergence of new SARS-CoV-2 variants which have the potential to undermine vaccination efforts and cause further outbreaks. Predictive vaccinology points to a future of pandemic preparedness in which vaccines can be developed preemptively based in part on predictive models of viral evolution. Thus, modeling the trajectory of SARS-CoV-2 spike protein evolution could have value for mRNA vaccine development. Traditionally, in silico sequence evolution has been modeled discretely, while there has been limited investigation into continuous models. Here we present the Viral Predictor for mRNA Evolution (VPRE), an open-source software tool which learns from mutational patterns in viral proteins and models their most statistically likely evolutionary trajectories. We trained a variational autoencoder with real-time and simulated SARS-CoV-2 genome data from Australia to encode discrete spike protein sequences into continuous numerical variables. To simulate evolution along a phylogenetic path, we trained a Gaussian process model with the numerical variables to project spike protein evolution up to five months in advance. Our predictions mapped primarily to a sequence that differed by a single amino acid from the most reported spike protein in Australia within the prediction timeframe, indicating the utility of deep learning and continuous latent spaces for modeling viral protein evolution. VPRE can be readily adapted to investigate and predict the evolution of viruses other than SARS-CoV-2 in temporal, geographic, and lineage-specific pathways.
Related articles
Related articles are currently not available for this article.