SqueezeCall: Nanopore basecalling using a Squeezeformer network

This article has 2 evaluations on Sciety.

Abstract

Nanopore sequencing, a novel third-generation sequencing technique, offers significant advantages over other sequencing approaches, notably direct RNA sequencing, real-time analysis, and long read lengths. During nanopore sequencing, the sequencer measures changes in electrical current as nucleotides pass through a nanopore, and a basecaller infers the base sequence from these raw current measurements. However, accurate basecalling remains challenging because of variation among DNA and RNA molecules, noise introduced during sequencing, and limitations of existing methods. In this paper, we introduce SqueezeCall, an end-to-end Squeezeformer-based model for accurate nanopore basecalling. In SqueezeCall, convolution layers downsample the raw signal and model local dependencies, a Squeezeformer network captures global context, and a connectionist temporal classification (CTC) decoder generates the DNA sequence using beam search. Inspired by the Wav2vec2.0 model, we mask a proportion of the time steps of the convolution outputs before feeding them to the Squeezeformer network, replacing them with a trained feature vector shared by all masked time steps. Experimental results show that this masking improves the model's robustness to noise and its basecalling accuracy. We train SqueezeCall with a combination of three losses: CTC-CRF loss, intermediate CTC-CRF loss, and Kullback-Leibler (KL) divergence loss. Ablation experiments show that each of the three losses contributes to basecalling accuracy. Experiments on multiple species further demonstrate the potential of the Squeezeformer-based model to improve basecalling accuracy and its superiority over a recurrent neural network (RNN)-based model and Transformer-based models.
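The abstract says that convolution layers downsample the raw signal before the Squeezeformer sees it. The sketch below illustrates only how strided 1-D convolutions shorten the sequence, using the standard output-length formula for a 1-D convolution (the same one given in PyTorch's `Conv1d` documentation). The layer stack `HYPOTHETICAL_LAYERS` (kernel sizes, strides, padding) is an illustrative assumption and is not taken from SqueezeCall.

```python
from typing import Sequence, Tuple


def conv1d_output_length(length: int, kernel_size: int, stride: int = 1,
                         padding: int = 0, dilation: int = 1) -> int:
    """Number of output steps of a 1-D convolution over `length` input samples."""
    if length < 0 or kernel_size < 1 or stride < 1 or dilation < 1:
        raise ValueError("invalid convolution parameters")
    out = (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
    # Clamp at zero: an input shorter than the receptive field yields no output steps.
    return max(0, out)


def downsampled_length(num_samples: int,
                       layers: Sequence[Tuple[int, int, int]]) -> int:
    """Apply the formula layer by layer; each layer is (kernel_size, stride, padding)."""
    length = num_samples
    for kernel_size, stride, padding in layers:
        length = conv1d_output_length(length, kernel_size, stride, padding)
    return length


# Hypothetical stack: two stride-1 layers keep the length ("same" padding) and
# model local context; the final stride-5 layer downsamples the signal 5x.
HYPOTHETICAL_LAYERS = [(5, 1, 2), (5, 1, 2), (19, 5, 9)]

if __name__ == "__main__":
    print(downsampled_length(4000, HYPOTHETICAL_LAYERS))  # 800
```

Under this assumed stack, a 4,000-sample signal chunk becomes 800 time steps, which shrinks the sequence the global-context network must attend over by a factor of 5.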
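The masking step described in the abstract can be sketched as follows. It assumes wav2vec 2.0-style span masking: a proportion `mask_prob` of time steps is sampled without replacement as span starts, the next `span_length` steps are masked, and every masked step is replaced by one shared vector. The span scheme and the default values are illustrative assumptions, not SqueezeCall's settings. In a real model, `mask_embedding` would be a trainable parameter; here it is simply passed in as a list.

```python
import random
from typing import List, Optional, Tuple


def mask_time_steps(
    features: List[List[float]],
    mask_embedding: List[float],
    mask_prob: float = 0.065,
    span_length: int = 10,
    rng: Optional[random.Random] = None,
) -> Tuple[List[List[float]], List[bool]]:
    """Replace randomly chosen spans of time steps with one shared mask vector.

    Returns the masked feature sequence (a new list; the input is not modified)
    and a boolean mask marking which time steps were replaced.
    """
    if rng is None:
        rng = random.Random()
    num_steps = len(features)
    dim = len(mask_embedding)
    if any(len(step) != dim for step in features):
        raise ValueError("every time step must have the same size as mask_embedding")

    # Sample a proportion of time steps (without replacement) as span starts.
    num_starts = int(round(mask_prob * num_steps))
    starts = rng.sample(range(num_steps), num_starts)

    # Mask `span_length` consecutive steps from each start; spans may overlap
    # and are clipped at the end of the sequence.
    mask = [False] * num_steps
    for start in starts:
        for t in range(start, min(start + span_length, num_steps)):
            mask[t] = True

    # All masked steps share the same vector; unmasked steps are copied unchanged.
    masked = [list(mask_embedding) if m else list(step)
              for step, m in zip(features, mask)]
    return masked, mask
```

The masked sequence would then go to the context network. Sharing a single replacement vector means the model cannot recover a masked step from its own value and must infer it from the surrounding signal, which is consistent with the noise-robustness motivation given in the abstract.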
