Effective sequence-to-expression prediction for membrane proteins using machine learning and computational protein design

Abstract

The recombinant expression of integral membrane proteins is notoriously challenging. One way to address this challenge is through computational genotype-to-phenotype models that determine how particular sequence features correlate with protein expression levels. The potential of such approaches is yet to be fully realised, at least partly because so few expression datasets are available. Here, we study the sequence-to-expression relationships of a library of 12,248 membrane proteins derived from combinatorial computational design. The expression phenotype of the entire library was assessed in the widely used recombinant host Escherichia coli. We used selected phenotypic data to train a sequence-to-expression predictor with supervised machine learning, which achieved high classification accuracy on held-out test sequences. This model was then used to infer the expression of >10,000 unmeasured sequences, and validation of the top predictions for both high and low expressers achieved a 100% success rate. Using tools from explainable AI, we identified the specific sequence positions and substitutions that are most important in dictating cellular expression levels. This analysis was validated by model-guided protein engineering that achieved an 8-fold increase in the purification yield of a poorly expressing variant. We find that cells accumulate elevated levels of mRNA for high-expressing proteins, and speculate that this arises from efficient translation-coupled membrane insertion, which minimises transcript degradation. Our results show that computational protein design in tandem with supervised learning yields effective models for discovering protein variants with improved expression phenotypes, and can decode the molecular basis of membrane protein expression.
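The abstract does not specify the model architecture or sequence encoding, so the following is only a minimal sketch of the general workflow it describes: encode fixed-length variants from a combinatorial library, train a binary high/low expression classifier, predict held-out sequences, and rank (position, residue) features by importance. It assumes one-hot encoding and a plain logistic-regression classifier, both swapped in for illustration and not necessarily the authors' choices. Ranking linear-model weights by magnitude is used here as a simple stand-in for the explainable-AI tools mentioned. The toy library, its labelling rule, and all function names (`one_hot`, `train_logistic`, `predict`, `top_features`) are hypothetical.

```python
import itertools
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(seq):
    """Flatten a fixed-length sequence into a len(seq) * 20 one-hot vector."""
    vec = []
    for aa in seq:
        vec.extend(1.0 if aa == letter else 0.0 for letter in ALPHABET)
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by full-batch gradient descent on the log loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, seq):
    """Predicted probability that a sequence is a high expresser."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, one_hot(seq))) + b)


def top_features(w, k=3):
    """Rank (position, residue, weight) triples by absolute weight."""
    ranked = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    return [(j // len(ALPHABET), ALPHABET[j % len(ALPHABET)], w[j]) for j in ranked[:k]]


# Hypothetical combinatorial library: 2 choices at each of 4 designed positions.
library = ["".join(p) for p in itertools.product("AG", "LK", "VE", "ST")]
# Synthetic labelling rule (not real data): Leu at position 1 means "high expresser".
labels = {s: 1 if s[1] == "L" else 0 for s in library}

# Hold out every variant combining G at position 0 with E at position 2.
held_out = [s for s in library if s[0] == "G" and s[2] == "E"]
train = [s for s in library if s not in held_out]

w, b = train_logistic([one_hot(s) for s in train], [labels[s] for s in train])
for s in held_out:
    print(s, round(predict(w, b, s), 3))
print(top_features(w, k=2))
```

On this synthetic library the classifier should assign the held-out Leu variants probabilities above 0.5 and the Lys variants probabilities below 0.5, and the two highest-magnitude weights should both fall at position 1. With real data, a non-linear model and a dedicated attribution method would likely be needed to capture interactions between positions.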