ChatGPT is a new artificial intelligence (AI) system, known as a large language model (LLM), that is designed to produce human-like writing by predicting upcoming word sequences. Unlike most chatbots, ChatGPT cannot search the internet. Instead, it generates text from word relationships predicted by its internal processes.
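To give a rough feel for what "predicting upcoming word sequences" means, the sketch below uses a toy bigram counter. It records which word tends to follow which in a tiny, made-up corpus and then repeatedly picks the most frequent follower. This is a deliberate simplification and not how ChatGPT works. ChatGPT relies on a large transformer neural network trained on vast amounts of text, whereas the function names (`train_bigrams`, `predict_next`, `generate`) and the corpus here are hypothetical and serve only as illustration.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Toy model (hypothetical): count how often each word follows each other word."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def predict_next(follows: dict[str, Counter], word: str) -> str | None:
    """Return the most frequent follower of `word`, or None if the word was never seen.

    On ties, Counter.most_common keeps the follower that was encountered first.
    """
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


def generate(follows: dict[str, Counter], start: str, length: int) -> list[str]:
    """Extend `start` by up to `length` words, one predicted word at a time."""
    out = [start.lower()]
    for _ in range(length):
        nxt = predict_next(follows, out[-1])
        if nxt is None:  # no known continuation, so stop early
            break
        out.append(nxt)
    return out


# A tiny, made-up corpus used only for illustration.
corpus = "the patient has a fever . the patient has a cough . the doctor has a plan ."
model = train_bigrams(corpus)
print(predict_next(model, "patient"))      # has
print(" ".join(generate(model, "the", 4)))  # the patient has a fever
```

The model here only ever looks one word back, which is why it cannot produce anything it has not literally seen. Real LLMs condition on much longer contexts and learn their word relationships during training rather than from a lookup table of counts.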
Kung and colleagues tested ChatGPT's performance on the USMLE, a highly standardized and regulated series of three exams (Steps 1, 2CK, and 3) required for medical licensure in the United States. Taken by medical students and physicians-in-training, the USMLE assesses knowledge spanning most medical disciplines, ranging from biochemistry, to diagnostic reasoning, to bioethics.
After screening to remove image-based questions, the authors tested the software on 350 of the 376 public questions available from the June 2022 USMLE release.
After indeterminate responses were removed, ChatGPT scored between 52.4 per cent and 75.0 per cent across the three USMLE exams. The passing threshold each year is approximately 60 per cent. ChatGPT also demonstrated 94.6 per cent concordance across all its responses and produced at least one significant insight (something that was new, non-obvious, and clinically valid) for 88.9 per cent of its responses. Notably, ChatGPT exceeded the performance of
While the relatively small input size restricted the depth and range of the analyses, the authors note that their findings offer a glimpse of ChatGPT's potential to enhance medical education and, eventually, clinical practice. For example, they add, clinicians at AnsibleHealth already use ChatGPT to rewrite jargon-heavy reports so that patients can understand them more easily.
"Reaching the passing score for this notoriously difficult expert exam, and doing so without any human reinforcement, marks a notable milestone in clinical AI maturation," say the authors.