Enabling the prediction of phage receptor specificity from genome data
Abstract
Predicting which receptor a phage binds to from genome sequence alone has remained an intractable challenge, principally because the experimental phenotypic data required to train and validate predictive models have not been available at sufficient scale. Here we address this by conducting 1,050 genome-wide genetic screens across 255 taxonomically diverse Escherichia coli dsDNA phages, assigning host receptors to 193 phages across 19 receptor classes. Comparative genomics and AlphaFold3 structural modelling resolved the sequence determinants of specificity to defined receptor-binding protein domains and individual residues. Machine learning models trained on this dataset predicted host receptor identity from phage genome sequence alone without prior annotation of receptor-binding genes, achieving perfect precision and greater than 80% recall on 49 independently validated phages, and yielding predictions for 1,060 of 1,875 E. coli phage genomes in NCBI. Domain swaps redirected receptor specificity as predicted, and a single amino acid substitution proved both necessary and sufficient to switch recognition between two distinct porins. These results demonstrate that systematic phenotyping at scale makes sequence-based prediction of molecular interaction specificity tractable, with direct implications for phage-based medicine, microbiome engineering and the broader challenge of inferring host-pathogen interaction outcomes from sequence.