Discovery of Expression-Governing Residues in Proteins
Abstract
Understanding how amino acids influence protein expression is crucial for advances in biotechnology and synthetic biology. In this study, we introduce Venus-TIGER, a deep learning model designed to accurately identify amino acids critical for expression. By constructing a two-dimensional matrix that links model representations to experimental fitness, Venus-TIGER achieves improved predictive accuracy and enhanced extrapolation capability. We validated our approach on both public deep mutational scanning datasets and low-throughput experimental datasets, where it performed favorably compared with traditional methods. Venus-TIGER exhibits robust transferability in zero-shot prediction scenarios and enhanced predictive performance in few-shot learning, even with limited experimental data. This capability is particularly valuable for protein design aimed at enhancing expression, where generating large datasets can be costly and time-consuming. Additionally, we conducted a statistical analysis to identify expression-associated features, such as sequence and structural preferences, distinguishing those linked to high expression from those linked to low expression. Our investigation also revealed a correlation among stability, activity, and expression, providing insight into their interconnected roles and underlying mechanisms.