- ChatGPT has already passed the US Medical Licensing Exam, the test all US doctors must pass.
- Now radiologists say it can also pass their specialist board exam.
ChatGPT is becoming a great medical test taker.
Its latest and most advanced version, ChatGPT-4, can already pass the US Medical Licensing Exam with flying colors. And now, it's moved one step closer to becoming a specialized physician. Well, sort of.
On Tuesday, scientists announced that the newest chatbot software from OpenAI can handily pass a Canadian or US-style radiology board exam – scoring more than ten points above the 70% passing threshold.
But there's a catch: Because ChatGPT is only designed to process language (so far), the AI's radiology exam didn't include any images. That feels like a major oversight for a branch of medicine built on diagnosing from X-rays, MRIs, and other images of the body.
ChatGPT did well on 'challenging' questions, but got some of the basics of radiology wrong
ChatGPT has shown itself to be a formidable test taker — it's passed exams including the SAT, the bar exam, and even the challenging master sommelier tests.
On the radiology exam, ChatGPT-4 delivered a passing score of 81%. The bot scored particularly well on higher-order thinking questions that require skills beyond memory recall, like analysis, synthesis, and evaluation: it did well at describing imaging findings (85%) and applying concepts (90%). But it fared worse on some of the more straightforward questions, and actually got 12 questions wrong that its predecessor, GPT-3.5, had answered correctly.
"We were initially surprised by ChatGPT's accurate and confident answers to some challenging radiology questions," study author Dr. Rajesh Bhayana, an abdominal radiologist at Toronto General Hospital, said in a press release. "But then equally surprised by some very illogical and inaccurate assertions."
Why ChatGPT is increasingly good at acing hard tests
At its core, ChatGPT is simply designed to predict the best next word to spit out – in other words, it's built to chat you up. That chattiness can easily be harnessed to pontificate on exams, producing a very confident (but sometimes wrong) test taker.
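To make "predict the best next word" concrete, here's a minimal toy sketch of the idea. This is not how OpenAI's model actually works — real large language models use neural networks trained on vast text corpora — but a simple word-frequency table captures the basic principle: given the words so far, emit the word most likely to come next. The tiny corpus and function names here are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus (invented for illustration).
corpus = "the patient has a fracture the patient has a tumor the patient is stable".split()

# Count which word follows each word in the corpus.
followers = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    followers[word][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    return followers[word].most_common(1)[0][0]

print(predict_next("the"))  # "patient" — it follows "the" three times
print(predict_next("has"))  # "a" — it follows "has" twice
```

A model like this always produces *something* fluent-sounding, whether or not the underlying fact is right — which is exactly why a chatbot can read as a confident test taker even when it's wrong.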
Brown computer science professor Ellie Pavlick, a natural language processing expert, says this issue isn't unique to chatbots. She's always struggled to tell whether students really grasp the concepts she's taught based on their written exam answers.
"If you give some well-constructed language, it seems like you know, but maybe you do, maybe you don't," Pavlick said during a recent ChatGPT roundtable at Brown University. "This is a nice indication of why we kind of want to attribute much more knowledge and awareness to ChatGPT than it actually really has, because there's just something about well-constructed language that can really mask [poor] understanding."
Doctors say that ChatGPT shouldn't be used to diagnose or treat patients, and that any medical information it produces should always be checked by a person. But medical experts are also discovering that ChatGPT can be a useful tool for improving a doctor's communication with their patients. Precisely because it excels at banter, ChatGPT is sometimes rated as more compassionate than hurried doctors.