Enabling the prediction of phage receptor specificity from genome data

Lucas Moriniere
Avery J. C. Noonan
Alexey Kazakov
Melina Pena
Madeline Svab
Edwin O. Rivera-Lopez
Flavien Maucourt
Milo S. Johnson
Simon Roux
Britt Koskella
Adam M. Deutschbauer
Edward G. Dudley
Vivek K. Mutalik
Adam P. Arkin

2 evaluations · Published on Apr 19, 2026

Abstract

Predicting which receptor a phage binds to from genome sequence alone has remained an intractable challenge, principally because the experimental phenotypic data required to train and validate predictive models have not been available at sufficient scale. Here we address this by conducting 1,050 genome-wide genetic screens across 255 taxonomically diverse Escherichia coli dsDNA phages, assigning host receptors to 193 phages across 19 receptor classes. Comparative genomics and AlphaFold3 structural modelling resolved the sequence determinants of specificity to defined receptor-binding protein domains and individual residues. Machine learning models trained on this dataset predicted host receptor identity from phage genome sequence alone without prior annotation of receptor-binding genes, achieving perfect precision and greater than 80% recall on 49 independently validated phages, and yielding predictions for 1,060 of 1,875 E. coli phage genomes in NCBI. Domain swaps redirected receptor specificity as predicted, and a single amino acid substitution proved both necessary and sufficient to switch recognition between two distinct porins. These results demonstrate that systematic phenotyping at scale makes sequence-based prediction of molecular interaction specificity tractable, with direct implications for phage-based medicine, microbiome engineering and the broader challenge of inferring host-pathogen interaction outcomes from sequence.
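The abstract reports "perfect precision" alongside recall above 80%, and predictions for only 1,060 of 1,875 NCBI genomes. Both are consistent with a model that abstains when it cannot make a confident call. Below is a minimal sketch under the assumption of micro-averaged metrics: an abstention counts against recall and coverage but not against precision. The exact evaluation protocol is not given in this abstract. The function name `abstaining_metrics` and the six-phage toy labels are hypothetical illustrations, not data from this study.

```python
from typing import Dict, Optional, Sequence


def abstaining_metrics(
    true_receptors: Sequence[str],
    predicted: Sequence[Optional[str]],
) -> Dict[str, float]:
    """Micro-averaged precision, recall and coverage for a classifier that may abstain.

    Hypothetical helper: `None` in `predicted` means no receptor call was made.
    """
    if len(true_receptors) != len(predicted):
        raise ValueError("label and prediction lists must have equal length")
    total = len(true_receptors)
    # Keep only the phages for which the model actually made a call.
    made = [(t, p) for t, p in zip(true_receptors, predicted) if p is not None]
    correct = sum(1 for t, p in made if t == p)
    nan = float("nan")
    return {
        # Fraction of calls that were right; abstentions do not lower it.
        "precision": correct / len(made) if made else nan,
        # Fraction of all phages called correctly; abstentions count as misses.
        "recall": correct / total if total else nan,
        # Fraction of phages that received any prediction at all.
        "coverage": len(made) / total if total else nan,
    }


if __name__ == "__main__":
    # Toy labels only (hypothetical): five correct calls and one abstention.
    truth = ["OmpC", "OmpF", "LamB", "FhuA", "Tsx", "OmpC"]
    preds = ["OmpC", "OmpF", "LamB", "FhuA", "Tsx", None]
    m = abstaining_metrics(truth, preds)
    print({k: round(v, 3) for k, v in m.items()})
    # -> {'precision': 1.0, 'recall': 0.833, 'coverage': 0.833}

    # Coverage of the NCBI set quoted in the abstract.
    print(round(1060 / 1875, 3))  # -> 0.565
```

Under these assumptions, perfect precision means that every call the model made was correct. The gap between recall and 100% comes entirely from phages it declined to classify.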