Design and experimental characterization of specificity-switching mutational paths of WW domains
Abstract
Specific interactions between proteins and other biomolecules are ubiquitous in cellular processes. How specificity is encoded in the protein sequence, and how it can be modified through a minimal set of concerted mutations, is a complex issue. In this work, we focus on the WW protein domain, whose variants specifically bind to different classes of proline-rich peptides. Combining unsupervised learning of homologous WW sequence data with Restricted Boltzmann Machines (RBMs) and path-sampling methods, we design mutational paths of putative WW domains interpolating between two natural WW domains with either distinct or similar specificities. Sequences along the designed paths are then experimentally validated with high-throughput in vitro binding assays against three peptides of different classes. The vast majority (93%) of intermediate sequences along the designed paths are responsive to the initial and/or final peptides. In contrast, domains along scrambled paths, in which the same mutations are introduced in random order, are not functional, emphasizing how successful design crucially depends on the ability to model epistatic interactions. Interestingly, the switch in specificity between classes I and IV, whose representative peptides bind to different pockets on the WW domain, appears to be smooth, with intermediates displaying some level of binding cross-reactivity with all tested peptides. Finally, we show that the RBM paths share a high identity with internal nodes obtained from ancestral sequence reconstruction based on the seed WW domains.
Significance Statement
Generative machine-learning models are now used to design new protein sequences with desired functions. Here, we address a more demanding task: designing a full mutational path connecting two natural proteins with different binding specificities. We illustrate this problem with the WW domain, a small protein unit capable of recognizing distinct classes of proline-rich peptides. We experimentally verify that most of the intermediate sequences along the designed path are functional and respond to the initial and/or final peptides. The designed sequences share high sequence identity with the sequences obtained as internal nodes of phylogenetic trees through ancestral sequence reconstruction.
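To make the notion of a mutational path and of its scrambled control concrete, the following is a minimal sketch in Python. It only covers the simplest case, in which each step replaces one residue of the start sequence with the corresponding residue of the end sequence. The RBM-based path sampling used in this work is not reproduced here, and the designed paths need not be restricted to such direct substitutions. The two sequences are toy strings rather than real WW domains, and the function names `differing_sites` and `mutational_path` are hypothetical.

```python
import random


def differing_sites(start, end):
    """Return the aligned positions at which the two sequences differ."""
    if len(start) != len(end):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(start, end)) if a != b]


def mutational_path(start, end, order):
    """Apply single-site substitutions in the given order, from start to end.

    Returns the list of sequences visited, including both endpoints.
    """
    seq = list(start)
    path = [start]
    for i in order:
        seq[i] = end[i]  # introduce the end-sequence residue at site i
        path.append("".join(seq))
    return path


# Toy aligned sequences (assumption: purely illustrative, not WW domains).
start = "ACDEFGHIK"
end = "ACDQFGYIR"

sites = differing_sites(start, end)

# One possible ordering of the mutations (here simply left to right).
ordered_path = mutational_path(start, end, sites)

# Scrambled control: the same mutations, introduced in random order.
rng = random.Random(0)  # fixed seed so the example is reproducible
shuffled_sites = sites[:]
rng.shuffle(shuffled_sites)
scrambled_path = mutational_path(start, end, shuffled_sites)

for step, seq in enumerate(ordered_path):
    print(step, seq)
```

Both paths share the same endpoints and the same set of mutations; only the intermediates differ. This is the sense in which the scrambled paths serve as a control for the order of mutations, and hence for epistatic interactions.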