Annotating Metagenomically Assembled Bacteriophage from a Unique Ecological System using Protein Structure Prediction and Structure Homology Search
Abstract
Emergent long read sequencing technologies such as Oxford’s Nanopore platform are invaluable in constructing high quality and complete genomes from a metagenome, and are needed investigate unique ecosystems on a genetic level. However, generating informative functional annotations from sequences which are highly divergent to existing nucleotide and protein sequence databases is a major challenge. In this study, we present wet and dry lab techniques which allowed us to generate 5432 high quality sub-genomic sized metagenomic circular contigs from 10 samples of microbial communities. This unique ecological system exists in an environment enriched with naphthenic acid (NA), which is a major toxic byproduct in crude oil refining and the major carbon source to this community. Annotation by sequence homology alone was insufficient to characterize the community, so as proof of principle we took a subset of 227 putative bacteriophage and greatly improved our existing annotations by predicting the structures of hypothetical proteins with ColabFold and using structural homology searching with Foldseek. The proportion of proteins for each bacteriophage that were highly similar to known proteins increased from approximately 10% to about 50%, while the number of annotations with KEGG or GO terms increased from essentially 0% to 15%. Therefore, protein structure prediction and homology searches can produce more informative annotations for microbes in unique ecological systems. The characterization of novel microbial ecosystems involved in the bioremediation of crude oil-process-affected wastewater can be greatly improved and this method opens the door to the discovery of novel NA degrading pathways.
IMPORTANCE
Functional annotation of metagenomic assembled sequences from novel or unique microbial communities is challenging when the sequences are highly dissimilar to organisms or proteins in the known databases. This is a major obstacle for researchers attempting to characterize the functional capabilities of unique ecosystems. In this study, we demonstrate that including protein structure prediction and homology search based methods vastly improves the annotation of predicted genes identified in novel putative bacteriophage in a bacterial community that degrades naphthenic acids the major toxic component of oil refinery wastewater. This method can be extended to similar genomics studies of unique, uncharacterized ecosystems, to improve their annotations.
Please read the<ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://journals.asm.org/journal/msystems/submission-review-process">Instructions to Authors</ext-link>carefully, or browse the<ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://journals.asm.org/journal/msystems/faq">FAQs</ext-link>for further details.
Related articles
Related articles are currently not available for this article.