Affinity Map: Few-Shot Protein Family Classification via Prototypical Networks: Benchmarking Sequence Encoders and Episodic ESM-2 Fine-Tuning

Abstract

Protein family annotation is a cornerstone of computational biology, yet the acquisition of large, curated per-family corpora is laborious and often infeasible for rare families. We present Affinity Map, a meta-learning pipeline that frames protein family classification as a few-shot learning problem: given only K labelled examples from a previously unseen family, the model must correctly assign new sequences to that family. We systematically benchmark encoder quality under this episodic framework, ranging from a lightweight 1D-CNN trained from scratch through compositional k-mer baselines to a frozen ESM-2 protein language model and episodic LoRA fine-tuning, all evaluated under Prototypical Networks with N-way K-shot tasks sampled from the Pfam database. Evaluation on 24 held-out test families shows that: (1) a CNN ProtoNet trained from scratch reaches 71.0% at K=5; (2) a k-mer ProtoNet using 3-mer frequencies reaches 86.2%; (3) a frozen ESM-2 encoder reaches 88.7% at K=5; and (4) episodic LoRA fine-tuning of ESM-2 shows a K-dependent interaction: LoRA gains +2.5 pp over frozen ESM-2 at K=1 (p < 0.001) but underperforms frozen ESM-2 at K >= 2, indicating that episodic adaptation improves single-shot retrieval at the cost of multi-shot prototype quality. All pairwise CNN vs. baseline differences are statistically significant (paired Wilcoxon, p < 0.001). Per-epoch learning curves, a named confusion matrix, PCA/UMAP embedding visualisations, and comprehensive baseline comparisons provide biologically interpretable diagnostics throughout. All code and results are publicly available.
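
The episodic setup described in the abstract can be illustrated with a minimal sketch of one N-way K-shot episode that uses a 3-mer frequency encoder and nearest-prototype classification. This is not the authors' implementation. The squared Euclidean distance, the 20-letter amino acid alphabet, the frequency normalisation, the function names (kmer_embed, build_prototypes, classify) and the toy sequences are all assumptions made for illustration.

```python
from itertools import product

import numpy as np

# Assumption: the 20 standard amino acids; 3-mers containing any other symbol are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
K_MER = 3
KMER_INDEX = {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=K_MER))}


def kmer_embed(seq):
    """Map a sequence to a normalised 3-mer frequency vector (20**3 = 8000 dims)."""
    vec = np.zeros(len(KMER_INDEX))
    for i in range(len(seq) - K_MER + 1):
        idx = KMER_INDEX.get(seq[i:i + K_MER])
        if idx is not None:
            vec[idx] += 1.0
    total = vec.sum()
    return vec / total if total > 0 else vec


def build_prototypes(support_emb, support_labels, n_way):
    """One prototype per class: the mean of that class's K support embeddings."""
    return np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in range(n_way)]
    )


def classify(query_emb, protos):
    """Assign each query to the nearest prototype (squared Euclidean, an assumption)."""
    diffs = query_emb[:, None, :] - protos[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


# Toy 2-way 2-shot episode with made-up sequences (not Pfam data).
support_seqs = ["AAAAAA", "AAAAAC", "WWWWWW", "WWWWWY"]
support_labels = np.array([0, 0, 1, 1])
query_seqs = ["AAAAA", "WWWWW"]

support_emb = np.stack([kmer_embed(s) for s in support_seqs])
query_emb = np.stack([kmer_embed(s) for s in query_seqs])

protos = build_prototypes(support_emb, support_labels, n_way=2)
preds = classify(query_emb, protos)
print(preds.tolist())
```

Replacing kmer_embed with a learned encoder (for example a CNN or a protein language model) leaves the prototype and classification steps unchanged, which is what makes the encoders comparable under a single episodic protocol.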
