Vision Transformer Autoencoders for Unsupervised Representation Learning: Capturing Local and Non-Local Features in Brain Imaging to Reveal Genetic Associations


Abstract

The discovery of genetic loci associated with brain architecture can provide deeper insights into neuroscience and improve personalized medicine outcomes. Previously, we designed the Unsupervised Deep learning-derived Imaging Phenotypes (UDIPs) approach to extract endophenotypes from brain imaging using a convolutional neural network (CNN) autoencoder, and conducted a brain imaging GWAS on the UK Biobank (UKBB). In this work, we leverage a vision transformer (ViT) model because of its different inductive bias and its potential to capture unique patterns through its pairwise attention mechanism. Our approach, based on 128 endophenotypes derived from average pooling, discovered 10 loci not reported by the CNN-based UDIP model, 3 of which had no previously reported association with brain structure in the GWAS Catalog. Our interpretation results demonstrate the ViT's ability to capture non-local patterns, such as left-right hemisphere symmetry, in brain MRI data by leveraging its attention mechanism and positional embeddings. Our results highlight the advantages of transformer-based architectures in feature extraction and representation for genetic discovery.
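To make the encoding pipeline described above concrete, the sketch below shows a toy ViT-style autoencoder in PyTorch whose encoder tokens are average-pooled and projected to a 128-dimensional latent vector serving as the endophenotypes. This is a minimal illustration, not the authors' implementation: the 2D patching (the paper works with 3D brain MRI volumes), all layer sizes, and the class and variable names are assumptions made for readability.

```python
# Minimal sketch (not the authors' released code): a ViT-style autoencoder
# whose encoder output is average-pooled into a 128-dim endophenotype vector.
# 2D patching and all sizes are illustrative assumptions; the paper uses
# 3D brain MRI volumes.
import torch
import torch.nn as nn


class ViTAutoencoder(nn.Module):
    def __init__(self, img_size=128, patch_size=16, dim=256, depth=4,
                 heads=8, latent_dim=128):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # Patch embedding: non-overlapping patches projected to `dim`.
        self.patchify = nn.Conv2d(1, dim, kernel_size=patch_size,
                                  stride=patch_size)
        # Learned positional embeddings let attention reason about location,
        # e.g. left-right hemisphere correspondences.
        self.pos_emb = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=depth)
        # Average-pooled tokens are projected to the 128 endophenotypes.
        self.to_latent = nn.Linear(dim, latent_dim)
        # Decoder: expand the latent back to patch tokens, then to pixels.
        self.from_latent = nn.Linear(latent_dim, dim * self.num_patches)
        self.unpatchify = nn.ConvTranspose2d(dim, 1, kernel_size=patch_size,
                                             stride=patch_size)
        self.dim = dim
        self.grid = img_size // patch_size

    def encode(self, x):
        tokens = self.patchify(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.encoder(tokens + self.pos_emb)
        # Average pooling over patch tokens -> one 128-dim vector per scan.
        return self.to_latent(tokens.mean(dim=1))

    def forward(self, x):
        z = self.encode(x)
        tokens = self.from_latent(z).view(-1, self.num_patches, self.dim)
        grid = tokens.transpose(1, 2).reshape(-1, self.dim,
                                              self.grid, self.grid)
        return self.unpatchify(grid), z


model = ViTAutoencoder()
mri_slice = torch.randn(2, 1, 128, 128)          # toy batch of 2D slices
recon, endophenotypes = model(mri_slice)
assert endophenotypes.shape == (2, 128)          # one 128-dim vector per scan
loss = nn.functional.mse_loss(recon, mri_slice)  # reconstruction objective
```

Under this reading, the 128 latent dimensions extracted per scan would then serve as quantitative traits for the GWAS, with each dimension tested for association against genetic variants.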
