Towards Evaluating the Diagnostic Ability of LLMs


Abstract

On average, one in ten patients dies because of a diagnostic error, and medical errors are the third-largest cause of death in the US. Although LLMs have been proposed to help doctors with diagnoses, no published research compares the diagnostic ability of many popular LLMs on an openly accessible real-patient cohort. In this study, we compare LLMs from Google, OpenAI, Meta, Mistral, Cohere, and Anthropic using our previously published evaluation methodology and explore improving their accuracy with retrieval-augmented generation (RAG).
