Characterising Protein Search Drift using exhaustive protein search and Alphafold2
Abstract
In this paper we present the first exhaustive analysis of iterative protein search drift and show how such drift may affect downstream modelling. Assembling families of related proteins and extracting evolutionary information from them is a core challenge in the study of molecular evolution. Iterative protein search is a common first step in a wide variety of bioinformatics tools and pipelines, and the outputs of such searches often form the inputs to modelling tools such as Alphafold2. Here we characterise profile drift: the tendency for some searches to become contaminated with sequences from outside the intended evolutionary family. We observe drift in nearly 15% of searches and find that it has measurable impacts on downstream predictive tasks such as structure prediction.
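As a concrete illustration of profile drift, the following sketch tracks the fraction of accumulated hits in a hypothetical iterative search whose family label differs from the query's family. It flags the search as drifted once that fraction exceeds a threshold. This is a minimal sketch under stated assumptions, not the paper's method. The `Hit` record, the per-iteration input format, the family labels, the `drift_threshold` default of 0.1 and all function names are hypothetical. The paper's actual drift criterion may differ.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One sequence accepted by a search at some iteration (hypothetical record)."""
    seq_id: str
    family: str  # assumed ground-truth family label, e.g. from a curated family database


def contamination_by_iteration(iterations: list[list[Hit]], query_family: str) -> list[float]:
    """Fraction of accumulated hits outside the query's family after each iteration.

    `iterations[i]` holds the hits accepted at iteration i (an assumed input format).
    Hits are accumulated, on the assumption that earlier hits keep contributing to the profile.
    """
    fractions = []
    seen: dict[str, Hit] = {}
    for new_hits in iterations:
        for hit in new_hits:
            seen[hit.seq_id] = hit  # deduplicate sequences re-found in later rounds
        if not seen:
            fractions.append(0.0)
            continue
        off_family = sum(1 for h in seen.values() if h.family != query_family)
        fractions.append(off_family / len(seen))
    return fractions


def has_drifted(iterations: list[list[Hit]], query_family: str, drift_threshold: float = 0.1) -> bool:
    """Flag a search as drifted if contamination ever exceeds the (hypothetical) threshold."""
    return any(f > drift_threshold for f in contamination_by_iteration(iterations, query_family))


if __name__ == "__main__":
    search = [
        [Hit("a", "family_A"), Hit("b", "family_A")],
        [Hit("c", "family_A"), Hit("d", "family_B")],
        [Hit("e", "family_B"), Hit("f", "family_B")],
    ]
    print(contamination_by_iteration(search, "family_A"))  # [0.0, 0.25, 0.5]
    print(has_drifted(search, "family_A"))  # True
```

In this toy search, contamination grows with each round as off-family sequences enter the hit set. This growth is the pattern the term "drift" describes. To measure only the sequences newly admitted in each round, count `new_hits` per round instead of the accumulated set.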