Identification of residue inversions in large phylogenies of duplicated proteins

Abstract

Connecting protein sequence to function is becoming increasingly relevant as high-throughput sequencing studies accumulate large amounts of genomic data. To go beyond existing database annotations, it is fundamental to understand the mechanisms underlying functional inheritance and divergence. If the homology relationship between proteins is known, can we determine whether their function has diverged?

In this work, we analyze the possible paths of protein sequence evolution after gene duplication and identify “residue inversions”, i.e., sites where the relationship between ancestry and functional signal is decoupled. Residues at these sites are masked from recognition by other prediction tools. Still, they play a role in functional divergence and could indicate a shift in protein function. We develop a method that specifically recognizes residue inversions in a phylogeny and test it on real and simulated datasets. In a dataset built from the Epidermal Growth Factor Receptor (EGFR) sequences found in 88 fish species, we identify 19 positions that underwent inversion after gene duplication, mostly located in the ligand-binding extracellular domain.

Our work uncovers a rare event of protein divergence that has direct implications for protein functional annotation and for sequence evolution as a whole. The method is optimized for large protein datasets and can be readily included in a targeted protein analysis pipeline.
