Ali-U-Net: A Convolutional Transformer Neural Net for Multiple Sequence Alignment of DNA Sequences. A proof of concept

Petar Arsic
Christoph Mayer

Published on Mar 2, 2025

Abstract

We report a convolutional transformer neural network capable of aligning multiple nucleotide sequences. The network is based on the U-Net architecture commonly used in image segmentation, which we employ to transform unaligned sequences into aligned sequences. For the alignment scenarios on which it has been trained, our Ali-U-Net is in most cases more accurate than programs such as MAFFT, T-Coffee, MUSCLE, and Clustal Omega, while being considerably faster than similarly accurate programs on a single CPU core. Its limitations are that the network is still trained for specific alignment problems and can perform poorly on gap distributions it has not seen before. Furthermore, the algorithm currently works with fixed-size alignment windows of 48×48 or 96×96 nucleotides. At this stage, we view our study as a proof of concept and are confident that the present findings can be extended to larger alignments and more complex alignment scenarios in the near future.
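To illustrate what a fixed-size alignment window might look like as input to an image-style network, the following minimal sketch packs unaligned sequences into a 48×48 grid and one-hot encodes it. The abstract does not specify the encoding, so everything here is an assumption made for illustration: the five-symbol alphabet, the reading of 48×48 as 48 sequences by 48 columns, the right-padding with gap characters, and the hypothetical helper names `encode_window` and `decode_window`.

```python
# Illustrative sketch only; not the authors' implementation.
ALPHABET = "ACGT-"  # assumed alphabet: four nucleotides plus a gap symbol
WINDOW = 48         # assumed: 48 sequences (rows) by 48 columns


def encode_window(seqs, window=WINDOW):
    """One-hot encode sequences into a window x window x 5 nested list.

    Short sequences are right-padded with gaps, and unused rows are
    filled entirely with gaps (an assumed padding scheme).
    """
    if len(seqs) > window:
        raise ValueError(f"at most {window} sequences fit in one window")
    rows = []
    for seq in seqs:
        seq = seq.upper()
        if len(seq) > window:
            raise ValueError(f"sequence longer than {window} columns")
        unknown = set(seq) - set(ALPHABET)
        if unknown:
            raise ValueError(f"unsupported symbols: {sorted(unknown)}")
        rows.append(seq.ljust(window, "-"))
    # Fill the remaining rows of the window with all-gap sequences.
    rows.extend("-" * window for _ in range(window - len(seqs)))
    return [
        [[1.0 if ch == sym else 0.0 for sym in ALPHABET] for ch in row]
        for row in rows
    ]


def decode_window(onehot):
    """Map a window x window x 5 array back to strings via per-cell argmax."""
    def best(cell):
        return max(range(len(ALPHABET)), key=lambda i: cell[i])
    return ["".join(ALPHABET[best(cell)] for cell in row) for row in onehot]


window = encode_window(["ACGT", "AGT"])
print(len(window), len(window[0]), len(window[0][0]))
print(decode_window(window)[1][:6])
```

Under these assumptions, a network of this kind would map one such grid (unaligned, gaps only at the right) to another grid of the same shape in which the gaps have been moved into aligned positions, and `decode_window` would turn the per-cell predictions back into sequences.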
