Characterizing Dementia Phenotypes from Unstructured EHR Notes with Generative AI and Interpretable Machine Learning
Abstract
Dementia encompasses diverse clinical syndromes in which diseases of the brain manifest as impaired cognitive abilities, as in Alzheimer's disease (AD) and behavioral-variant frontotemporal dementia (bvFTD). The diversity of symptom presentations often makes diagnosis challenging. Crucial clinical information remains in unstructured narrative notes within electronic health records (EHRs). We leverage large language models (LLMs) for symptom phenotyping from notes in the UCSF Information Commons, focusing on patients with expert dementia syndrome diagnoses made by a multidisciplinary team of specialists at the UCSF Memory and Aging Center. We developed a pipeline that extracts findings into a validated structured output, clusters them into symptom groups, and then classifies patients into syndromes with traditional machine learning paradigms. From over 9,000 cross-referenced patients and over 350,000 specialty-related notes, matched cohorts of bvFTD (122 patients) and AD (170 patients) syndromes were identified. From these notes, 12,637 distinct symptom phrases were extracted, and clustering analysis revealed 51 symptom groups. A logistic regression model separated AD and bvFTD with an AUC of 0.83. Disinhibition and obsessive-compulsive behaviors favored bvFTD, while anxiety and visuospatial abnormalities favored AD. By combining LLM-based structured information extraction with traditional interpretable prediction paradigms, this novel approach shows promise for enhanced symptom characterization in dementia. Our findings suggest potential future applications in improving diagnostic accuracy, developing prediction models, and optimizing treatment strategies in dementia care.
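As a rough illustration of the final, interpretable classification step, the sketch below fits a logistic regression to a patient-by-symptom-group indicator matrix and reads the sign of each coefficient as the direction in which that symptom group shifts the prediction. It is a minimal sketch under stated assumptions, not the authors' implementation: the data are synthetic, the effect sizes are invented, only four of the symptom groups named above are used, and the function names, training settings, and train/test split are all hypothetical.

```python
import math
import random

# Symptom groups named in the abstract. The effect sizes are hypothetical:
# positive values make a symptom more likely in the bvFTD class (label 1),
# negative values make it more likely in the AD class (label 0).
GROUPS = ["disinhibition", "obsessive_compulsive", "anxiety", "visuospatial"]
ASSUMED_EFFECTS = [1.5, 1.0, -1.0, -1.5]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic_cohort(n_patients=300, seed=0):
    """Build a synthetic binary patient-by-symptom-group matrix (not study data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_patients):
        label = rng.randint(0, 1)
        sign = 1.0 if label == 1 else -1.0
        row = [1 if rng.random() < sigmoid(sign * e) else 0 for e in ASSUMED_EFFECTS]
        X.append(row)
        y.append(label)
    return X, y


def fit_logistic(X, y, lr=0.5, epochs=300, l2=0.01):
    """Fit L2-regularized logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


X, y = make_synthetic_cohort()
X_train, y_train = X[:200], y[:200]   # simple hold-out split (an assumption)
X_test, y_test = X[200:], y[200:]

weights, bias = fit_logistic(X_train, y_train)
scores = [sigmoid(bias + sum(w * x for w, x in zip(weights, row))) for row in X_test]
auc = roc_auc(scores, y_test)

print(f"held-out AUC on synthetic data: {auc:.2f}")
for name, w in sorted(zip(GROUPS, weights), key=lambda t: t[1]):
    direction = "bvFTD" if w > 0 else "AD"
    print(f"{name:>22}: coefficient {w:+.2f} (favors {direction})")
```

Because each feature is a simple presence indicator, a positive coefficient can be read directly as "this symptom group raises the odds of bvFTD", which is the kind of interpretability the abstract attributes to the traditional model. The AUC printed here reflects only the invented effect sizes and says nothing about the reported 0.83.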