Toward De Novo Protein Design from Natural Language

This article has 3 evaluations on Sciety.

Abstract

Programming biological function, that is, designing bespoke proteins that execute specific tasks on demand, is a foundational goal of molecular engineering. Yet current protein design paradigms remain fundamentally limited: they typically require either an existing protein to evolve from or deep, family-specific expertise to guide the design process. Here we introduce Pinal, a generative model that overcomes these barriers by translating functional descriptions written in natural language directly into diverse, active proteins. This capability rests on a 16-billion-parameter foundation model trained on an unprecedented synthetic corpus of 1.7 billion protein-text pairs, which enables it to ground functional language in the biophysical principles of protein structure. To provide definitive experimental validation, we tasked Pinal with designing four proteins from distinct functional classes: a fluorescent protein, a polyethylene terephthalate (PET) hydrolase, an alcohol dehydrogenase, and a metabolic H-protein. Remarkably, all four designs were functionally active, and both Pinal-designed enzymes achieved catalytic turnover for their respective reactions. Notably, the Pinal-designed H-protein even surpassed its natural counterpart, exhibiting 1.7-fold higher performance. Our results establish that natural language can serve as a programmable instruction set for biology, democratizing protein design and shifting the paradigm from the incremental modification of existing molecules to the direct creation of function from a conceptual description.
