Extracting Carotid Stenosis Severity from Clinical Notes Using Natural Language Processing: Development, Validation, and Application in a Nationwide Veteran Cohort
Abstract
Importance
Carotid stenosis, the atherosclerotic narrowing of the extracranial carotid arteries, is an important risk factor for ischemic stroke. The prevalence of asymptomatic carotid stenosis is generally low, with moderate and severe stenosis present in up to ~6% and ~2% of the population, respectively. Prior studies of carotid stenosis have been small, and risk factors for carotid stenosis severity have been incompletely described. We sought to leverage the rich electronic health record data within the Veterans Health Administration to assess the prevalence of, and risk factors for, carotid stenosis at the population level.
Objective
Develop and validate a natural language processing (NLP) tool to extract the ratio of peak systolic velocity of the internal carotid artery to that of the common carotid artery (ICA/CCA ratio) from carotid duplex ultrasound reports. Determine the presence and severity of carotid stenosis and identify the risk factors significantly associated with it.
Design
Retrospective cross-sectional analysis
Setting
Veterans Health Administration (VHA) from 2001 to 2020
Participants
Veterans who underwent carotid artery duplex scans in the VHA from 2001 to 2020 and who had at least one valid ICA/CCA ratio. We excluded patients who had undergone carotid endarterectomy or stenting or who had a stroke or transient ischemic attack prior to the index date.
Exposure(s)
Carotid artery duplex scan and cardiovascular disease risk factors including age, sex, self-identified race and ethnicity, healthcare utilization, smoking status, body mass index (BMI), blood pressure, hypertension, coronary heart disease, type 2 diabetes, and selected laboratory measures (i.e., hemoglobin A1c, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, triglyceride, and creatinine).
Main outcome
A categorical variable indicating carotid stenosis severity (<50%, 50–69%, ≥70%) based on the ICA/CCA ratio (<2, ≥2 to <4, ≥4, respectively).
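The mapping from ICA/CCA ratio to severity category can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names are hypothetical, and taking the larger of the right and left ratios as the patient-level value is an assumption made here for illustration.

```python
def ica_cca_ratio(ica_psv: float, cca_psv: float) -> float:
    """Ratio of ICA to CCA peak systolic velocity (same units for both)."""
    if cca_psv <= 0:
        raise ValueError("CCA peak systolic velocity must be positive")
    return ica_psv / cca_psv


def stenosis_category(ratio: float) -> str:
    """Map an ICA/CCA ratio to the severity categories stated above."""
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    if ratio < 2:
        return "<50%"
    if ratio < 4:
        return "50-69%"
    return ">=70%"


def patient_category(right_ratio: float, left_ratio: float) -> str:
    """Assumption: classify a patient by the larger of the two side ratios."""
    return stenosis_category(max(right_ratio, left_ratio))
```

For example, a ratio of exactly 2 falls in the 50–69% category and a ratio of exactly 4 falls in the ≥70% category, matching the inclusive lower bounds in the definition.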
Results
The F1 score of the NLP tool was 0.907 for the right-side value, 0.882 for the left-side value, and 0.920 for the maximum value. Among the 290,517 Veterans in the cohort, the median age was 68.2 years (IQR 61.9–75.0); 277,934 (95.7%) were male, and 28,348 (10%) self-reported Black race. Black patients had 16% lower odds of more severe carotid stenosis (OR 0.84, 95% CI 0.81–0.87, p<0.001). However, a sensitivity analysis restricted to those with hemodynamically significant carotid stenosis (≥70% vs. 50–69%) showed that Black race was associated with 33% higher odds of severe carotid stenosis compared with White race (OR 1.33, 95% CI 1.23–1.43, p<0.001). All patient-level risk factors except high-density lipoprotein cholesterol were significantly associated with carotid stenosis severity.
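As a reading aid for the figures above, the sketch below shows the standard definition of the F1 score (the harmonic mean of precision and recall) and how an odds ratio translates into a percent change in odds. The function names are hypothetical, and the precision and recall inputs are left to the caller because the abstract does not report them.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in odds implied by an odds ratio (negative = lower odds)."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return (odds_ratio - 1.0) * 100.0


# Odds ratios reported in the Results
print(round(percent_change_in_odds(0.84)))  # -16
print(round(percent_change_in_odds(1.33)))  # 33
```

Note that these percentages describe odds rather than risk; the two diverge when the outcome is common.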
Conclusion and relevance
The NLP tool performed well, and the analysis of the NLP-derived cohort largely confirmed the risk factors identified by previous smaller studies, supporting the tool's validity for creating usable cohorts for future large-scale research.