MBGC: Multiple Bacteria Genome Compressor

This article has 3 evaluations Published on
Read the full article Related papers
This article on Sciety

Abstract

Summary

Genomes within the same species reveal large similarity, exploited by specialized multiple genome compressors. The existing algorithms and tools are however targeted at large, e.g., mammalian, genomes, and their performance on bacteria strains is mediocre. In this work, we propose MBGC, a specialized genome compressor making use of specific redundancy of bacterial genomes. Our tool is not only compression efficient, but also fast. On a collection of 168,311 bacterial genomes, totalling 587 GB, we achieve the compression ratio around the factor of 730, and the compression (resp. decompression) speed around 1070 MB/s (resp. 740 MB/s) using 8 hardware threads, on a computer with a 6-core / 12-thread CPU and a fast SSD, being about 4 times more succinct and more than an order of magnitude faster in the compression than our main competitors.

Availability and implementation

MBGC is freely available at github.com/kowallus/mbgc.

Related articles

Related articles are currently not available for this article.