A practical DNA data storage using expanded alphabet introducing 5-methylcytosine
Abstract
DNA is a promising next-generation data storage medium. Recently, it has been proposed in theory that non-natural or modified bases can serve as extra molecular letters to increase information density. However, the strategy remains difficult to realize in practice because non-natural DNA sequences are hard to synthesize and structurally complex. Here, we describe a practical DNA data storage transcoding scheme named R+, based on a molecular alphabet expanded by introducing 5-methylcytosine (5mC). We also validate the scheme experimentally by encoding one representative file into several 1.3 to 1.6 kbp in vitro DNA fragments for nanopore sequencing. The results show an average data recovery rate of 98.97% with a reference and 86.91% without a reference. This work validates the practicability of 5mC in DNA storage systems, with a potentially wide range of applications.
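As a rough illustration of why an extra molecular letter raises information density, the sketch below computes the theoretical upper bound on bits per base for an unconstrained alphabet. It assumes a five-letter alphabet (A, C, G, T plus 5mC); the function name `bits_per_base` is hypothetical, and the figures are an idealized bound, not the density achieved by R+.

```python
import math


def bits_per_base(alphabet_size: int) -> float:
    """Upper bound on information per base for an unconstrained alphabet."""
    if alphabet_size < 2:
        raise ValueError("alphabet must have at least two letters")
    return math.log2(alphabet_size)


# Natural alphabet: A, C, G, T.
natural = bits_per_base(4)

# Assumed expanded alphabet: A, C, G, T plus 5mC.
expanded = bits_per_base(5)

# Relative gain of the expanded alphabet over the natural one.
gain = expanded / natural - 1

print(f"{natural:.2f} vs {expanded:.2f} bits/base, relative gain {gain:.1%}")
```

Any practical scheme falls below this bound once synthesis and sequencing constraints, as well as error-correction redundancy, are taken into account.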
Availability & Implementation
R+ is implemented in Python and the code is available under the MIT license at https://github.com/Incpink-Liu/DNA-storage-R_plus