Dimensionality Reduction of Genetic Data using Contrastive Learning

Abstract

We introduce a framework that uses contrastive learning for dimensionality reduction of genetic datasets, producing PCA-like population visualizations. Contrastive learning is a self-supervised deep learning method that uses similarities between samples to train a neural network to discriminate between them. Many of the advances in these models have been made for computer vision, but many of the heuristics developed there do not translate well from image data to genetic data. We define a loss function that, in our experiments, outperforms other basic loss functions used in contrastive learning, and a data augmentation scheme tailored specifically to SNP genotype datasets.
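To make the two ingredients named above concrete, the following is a minimal sketch of a generic contrastive setup for genotype data. It is not the loss function or augmentation scheme defined in the article, which the abstract does not specify. It assumes genotypes coded as 0/1/2 allele counts, a hypothetical missing-value code of -1, random masking as the augmentation, and the standard NT-Xent loss from the contrastive learning literature as the objective. All function and parameter names are illustrative.

```python
import numpy as np

MISSING = -1  # hypothetical code for a missing genotype call (assumption)


def augment_genotypes(genotypes, mask_rate=0.1, rng=None):
    """Return a copy of a (samples x SNPs) genotype matrix in which a random
    fraction of calls is replaced by MISSING. Illustrative augmentation only."""
    rng = np.random.default_rng() if rng is None else rng
    augmented = genotypes.copy()
    mask = rng.random(genotypes.shape) < mask_rate
    augmented[mask] = MISSING
    return augmented


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Standard NT-Xent contrastive loss (a stand-in, not the article's loss).

    z_a[i] and z_b[i] are embeddings of two augmented views of sample i.
    Each view is pulled towards its partner and pushed away from all others.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a view is never its own positive

    # Row i's positive is the other view of the same sample.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), positives].mean()
```

In a training loop, each batch of genotypes would be augmented twice, both views passed through the encoder network, and the resulting embeddings given to the loss. The encoder itself is omitted here because the abstract does not describe its architecture.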

In our experiments, our methods outperform PCA in terms of population classification. They are on par with t-SNE, while also generalizing better to unseen and missing data. A strength of the deep learning framework is that new samples can be projected using a trained model, and that more domain-specific information can be incorporated into the model. We show examples of population classification on two datasets of dog and human genotypes.
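One common way to score population classification on a low-dimensional embedding is leave-one-out k-nearest-neighbour accuracy. The abstract does not state which metric the article uses, so the sketch below is an assumption rather than the authors' evaluation protocol. It computes a PCA baseline via SVD and scores any 2D embedding against population labels. The helper `simulate_two_populations` is hypothetical and generates toy data purely for illustration.

```python
import numpy as np


def pca_project(genotypes, n_components=2):
    """Project a (samples x SNPs) matrix onto its top principal components."""
    centered = genotypes - genotypes.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def knn_accuracy(embedding, labels, k=3):
    """Leave-one-out k-NN accuracy: each sample is classified by a majority
    vote of its k nearest neighbours in the embedding, excluding itself."""
    diffs = embedding[:, None, :] - embedding[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # exclude the sample itself
    neighbors = np.argsort(dists, axis=1)[:, :k]
    correct = 0
    for i, idx in enumerate(neighbors):
        values, counts = np.unique(labels[idx], return_counts=True)
        correct += int(values[np.argmax(counts)] == labels[i])
    return correct / len(labels)


def simulate_two_populations(n_per_pop=30, n_snps=200, seed=0):
    """Hypothetical toy data: two populations with diverged allele
    frequencies, genotypes drawn as 0/1/2 allele counts."""
    rng = np.random.default_rng(seed)
    freq_a = rng.uniform(0.1, 0.9, n_snps)
    freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, n_snps), 0.05, 0.95)
    pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))
    pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))
    genotypes = np.vstack([pop_a, pop_b]).astype(float)
    labels = np.array([0] * n_per_pop + [1] * n_per_pop)
    return genotypes, labels
```

The same `knn_accuracy` call can be applied to embeddings from any method, for example the output of a trained contrastive encoder, so that PCA and the learned projection are compared on equal terms.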
