A regulatory medical device dataset with risk labels and an image-linked subset from the NMPA registry

Abstract

We present NMPA-MedDevice, a regulatory dataset derived from China's National Medical Products Administration (NMPA) Unique Device Identification (UDI) registry. The release comprises four components: (1) a frozen raw snapshot of the NMPA UDI registry (66,472 records, July 2024); (2) a reproducibly cleaned text-and-metadata corpus of approximately 52,000 unique device records with risk class labels deterministically derived from the ninth character of the NMPA registration number; (3) a curated image-linked subset of 1,005 devices (Class I/II/III, 39/462/504) with precomputed text and image feature embeddings; and (4) an external temporal validation set of 300 devices from a later registry update (October–November 2025). All textual data, derived labels, the cleaned corpus, preprocessing scripts, dataset splits, and precomputed features are publicly deposited. Raw product images are not redistributed due to copyright restrictions; precomputed embeddings and image retrieval scripts are provided instead.
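
As an illustration of the label derivation described above, the following is a minimal sketch, not the released preprocessing script. It assumes that the ninth character of a registration number is a digit 1, 2, or 3 denoting Class I, II, or III respectively; the abstract states only that the label comes from the ninth character, so the mapping, the function name `risk_class_from_registration`, and the example registration numbers are all hypothetical.

```python
from typing import Optional

# Assumed mapping from the ninth character to a risk class.
# The abstract does not spell this mapping out; it is an assumption.
CLASS_BY_DIGIT = {"1": "I", "2": "II", "3": "III"}


def risk_class_from_registration(reg_no: str) -> Optional[str]:
    """Return 'I', 'II', or 'III' from the ninth character, or None if unusable."""
    cleaned = reg_no.strip()
    if len(cleaned) < 9:
        return None  # too short to contain a ninth character
    # Index 8 is the ninth character (Python strings are zero-indexed).
    return CLASS_BY_DIGIT.get(cleaned[8])


# Hypothetical registration numbers, for illustration only.
examples = ["国械注准20153012345", "苏械注准20162140001", "short"]
labels = [risk_class_from_registration(r) for r in examples]
```

Records whose ninth character falls outside the assumed mapping return `None` here rather than a guessed class, which keeps the derivation deterministic and makes unlabeled records easy to count or exclude.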
