SeroBA(v2.0) and SeroBAnk: a robust genome-based serotyping scheme and comprehensive atlas of capsular diversity in Streptococcus pneumoniae

Abstract

The unprecedented number of Streptococcus pneumoniae (the pneumococcus) genomes sequenced in recent years has accelerated the discovery of novel serotypes and highlighted the genetic diversity both between and within serotypes. A novel serotype should demonstrate a distinct capsular polysaccharide biosynthetic (cps) locus, capsular structure, and serological profile. In the past four years alone, nine new serotypes have been identified. Accurate and timely serotyping of pneumococcal isolates is key to understanding the pathogen's global distribution, its evolution, and the response of the bacterial population to vaccination. However, current bioinformatics serotyping tools are infrequently updated and struggle to keep pace with the rapid discovery of new serotypes. To address these limitations, we built a comprehensive and curated library (SeroBAnk) encompassing all known pneumococcal serotypes; this resource is presented as an atlas on a dedicated, publicly accessible webpage (https://www.pneumogen.net/gps/#/serobank). Building upon this resource, we developed SeroBA(v2.0), a tool with an easy-to-update database that can accurately identify 102 of 107 known pneumococcal serotypes (all except serotypes 24B, 24C, 24F, 7D and 6H) and 18 genetic subtypes within serotypes 6A, 6B, 11A, 19A, 19F and 33F. We validated SeroBA(v2.0) on 26,306 genomes from the Global Pneumococcal Sequencing project, on reference isolates, and on simulated reads derived from the reference genetic sequences of the cps locus, and showed that SeroBA(v2.0) can reliably detect the nine recently discovered serotypes.
Additionally, we show that in silico serotypes inferred by SeroBA(v2.0) had high concordance with phenotypic serotypes determined by either Quellung or latex agglutination, both at the serotype level (88.9%; 15,945/17,933) and at the serogroup level (91.9%; 16,480/17,933). Finally, we propose a community-contribution-based approach to ensure that SeroBA(v2.0) is maintained and updated as novel serotypes continue to be discovered. The global community can submit putative novel serotypes through our public repository on GitHub (https://github.com/GlobalPneumoSeq/seroba/issues). Submitted putative novel serotypes will be curated by experts in the field on the basis of the genetic sequence of the cps region, capsular structure and serological profile. SeroBA(v2.0) can be accessed at https://github.com/GlobalPneumoSeq/seroba.
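The two concordance figures above differ only in how strictly a match is defined. The following is a minimal illustrative sketch, not the authors' evaluation code: the helper names `serogroup` and `concordance` and the toy isolate pairs are hypothetical, and the sketch assumes that a serogroup can be read off as the leading numeric part of a serotype label (for example, 6A and 6B both belong to serogroup 6).

```python
import re


def serogroup(serotype: str) -> str:
    """Return the leading numeric part of a serotype label, e.g. '19F' -> '19'.

    Assumption for illustration: labels follow the usual number(+letter) form.
    Labels without leading digits are returned unchanged.
    """
    match = re.match(r"\d+", serotype)
    return match.group(0) if match else serotype


def concordance(pairs):
    """Compute (serotype-level, serogroup-level) concordance as fractions.

    pairs: iterable of (in_silico, phenotypic) serotype labels.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one isolate is required")
    total = len(pairs)
    # Exact label match counts towards serotype-level concordance.
    same_type = sum(1 for genomic, pheno in pairs if genomic == pheno)
    # Matching numeric prefix counts towards serogroup-level concordance.
    same_group = sum(
        1 for genomic, pheno in pairs if serogroup(genomic) == serogroup(pheno)
    )
    return same_type / total, same_group / total


# Hypothetical toy data: 6A vs 6B agrees at serogroup level only.
toy_pairs = [("19F", "19F"), ("6A", "6B"), ("23F", "23F"), ("14", "3")]
type_rate, group_rate = concordance(toy_pairs)
print(f"toy serotype-level: {type_rate:.1%}, serogroup-level: {group_rate:.1%}")

# The percentages reported in the abstract follow from the stated counts.
print(f"{15945 / 17933:.1%}")  # 88.9%
print(f"{16480 / 17933:.1%}")  # 91.9%
```

Serogroup-level concordance is always at least as high as serotype-level concordance under this definition, because every exact serotype match is also a serogroup match.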

Data summary

Genome sequences are available in the European Nucleotide Archive (ENA) and, alongside metadata, in the Monocle Database at https://data.monocle.sanger.ac.uk/. The authors confirm all supporting data, code and protocols have been provided within the article or through supplementary data files.

Impact Statement

The polysaccharide capsule has been an effective vaccine antigen against diseases caused by Streptococcus pneumoniae (the pneumococcus). The pneumococcal conjugate vaccine has been estimated to have halved pneumococcal-related childhood mortality over 15 years (2000-2015). We collated the genetic locus and capsular structure of each known capsule type (serotype), together with pneumococcal vaccine formulation and licensure history, into a single webpage (SeroBAnk), providing a valuable resource for basic research and vaccine development. With the increasing use of whole-genome sequencing in clinical and public health laboratories, we also provide a fast and accurate bioinformatics tool, SeroBA(v2.0), to identify 102 pneumococcal serotypes, alongside a proposed system for expanding SeroBA(v2.0) to include new serotypes as they are discovered, ensuring that the tool remains valuable to the global research community in the long term.
