Streamlining remote nanopore data access withslow5curl
Abstract
As adoption of nanopore sequencing technology continues to advance, the need to maintain large volumes of raw current signal data for reanalysis with updated algorithms is a growing challenge. Here we introduceslow5curl, a software package designed to streamline nanopore data sharing, accessibility and reanalysis.Slow5curlallows a user to fetch a specified read or group of reads from a raw nanopore dataset stored on a remote server, such as a public data repository, without downloading the entire file.Slow5curluses an index to quickly fetch specific reads from a large dataset in SLOW5/BLOW5 format and highly parallelised data access requests to maximise download speeds. Using all public nanopore data from the Human Pangenome Reference Consortium (>22 TB), we demonstrate howslow5curlcan be used to quickly fetch and reanalyse signal reads corresponding to a set of target genes from each individual in large cohort dataset (n= 91), minimising the time, egress costs, and local storage requirements for their reanalysis. We provideslow5curlas a free, open-source package that will reduce frictions in data sharing for the nanopore community:<ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://github.com/BonsonW/slow5curl">https://github.com/BonsonW/slow5curl</ext-link>
Related articles
Related articles are currently not available for this article.