pmparser and PMDB: resources for large-scale, open studies of the biomedical literature

Joshua L. Schoenbachler
Jacob J. Hughey

Published on Oct 5, 2020


Abstract

PubMed is an invaluable resource for the biomedical community. Although PubMed is freely available, the existing API is not designed for large-scale analyses, and the XML structure of the underlying data is inconvenient for complex queries. We developed an R package called pmparser to convert the data in PubMed to a relational database. Our implementation of the database, called PMDB, currently contains data on over 31 million PubMed Identifiers (PMIDs) and is updated regularly. Together, pmparser and PMDB can enable large-scale, reproducible, and transparent analyses of the biomedical literature. pmparser is licensed under GPL-2 and available at https://pmparser.hugheylab.org. PMDB is stored in PostgreSQL, and compressed dumps are available on Zenodo (https://doi.org/10.5281/zenodo.4008109).
