ProtSpace: a tool for visualizing protein space

Abstract

Protein language models (pLMs) generate high-dimensional representations of proteins, so-called embeddings, that capture complex information stored in the set of evolved sequences. Interpreting these embeddings remains an important challenge. ProtSpace provides one solution through an open-source Python package that visualizes protein embeddings interactively in 2D and 3D. Combining the embedding space with a view of protein 3D structure aids in discovering functional patterns readily missed by traditional sequence analysis.
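To make the idea of viewing embeddings in 2D and 3D concrete, the following is a minimal sketch of one possible projection step. It is not ProtSpace's API: the function name `project_pca` is hypothetical, the embeddings are random synthetic stand-ins rather than real pLM output, and PCA is assumed here only for simplicity, since the abstract does not state which dimensionality reduction methods the package uses.

```python
import numpy as np


def project_pca(embeddings: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project row-vector embeddings onto their top principal components.

    Hypothetical helper for illustration; not part of ProtSpace.
    """
    # Center each dimension so the principal axes describe variance, not offset.
    centered = embeddings - embeddings.mean(axis=0)
    # Thin SVD of the centered matrix; the rows of vt are the principal axes,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Synthetic stand-in for pLM embeddings (assumption): two groups of 50
# "proteins" in 1024 dimensions, offset from each other in every dimension.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 1024))
group_b = rng.normal(loc=1.0, scale=1.0, size=(50, 1024))
embeddings = np.vstack([group_a, group_b])

coords_2d = project_pca(embeddings, n_components=2)  # one (x, y) per protein
coords_3d = project_pca(embeddings, n_components=3)  # one (x, y, z) per protein
print(coords_2d.shape, coords_3d.shape)
```

The resulting coordinates could be passed to any scatter-plot library, with points colored by a per-protein annotation such as functional group, so that clusters and mixed regions become visible.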

We present two examples to showcase ProtSpace. First, investigations of phage data sets showed distinct clusters of major functional groups and a mixed region, possibly suggesting bias in today’s protein sequences used to train pLMs. Second, the analysis of venom proteins revealed unexpected convergent evolution between scorpion and snake toxins; this challenges existing toxin family classifications and adds evidence refuting the aculeatoxin family hypothesis.

ProtSpace is freely available as a pip-installable Python package (source code & documentation) with examples on GitHub (https://github.com/tsenoner/protspace) and as a web interface (https://protspace.rostlab.org). The platform enables seamless collaboration through portable JSON session files.
