Conducted in Qatar, the study sparked interest among medical journal readers
Published: 15 Dec 2024 - 07:52 am | Last Updated: 15 Dec 2024 - 08:01 am
Doha, Qatar: A study conducted in Qatar has found that ChatGPT, a generative artificial intelligence chatbot, demonstrates strong proficiency in the theoretical aspects of emergency medicine (EM), outperforming resident physicians in examination settings.
The study, titled “Performance of ChatGPT in Emergency Medicine Residency Exams in Qatar: A Comparative Analysis with Resident Physicians,” has sparked interest among readers and is one of the most-read articles on the QScience website this month.
Published in a recent issue of the Qatar Medical Journal, the study underscores the growing potential of artificial intelligence (AI) as a supplementary tool in medical education. The findings suggest that AI could play a significant role in enhancing learning and assessment methods in medical fields, particularly in emergency medicine.
Conducted by a team of experts in Qatar, the study evaluates ChatGPT’s performance in EM residency examinations and, by comparing the AI’s results with those of resident physicians, highlights the potential of AI to contribute to medical training and assessment.
The study, a retrospective descriptive analysis with a mixed-methods design, was carried out in August 2023. It assessed the performance of 238 emergency department residents across postgraduate years PGY1 to PGY4 on examinations consisting of multiple-choice questions (MCQs) written by the same faculty responsible for the Qatari Board EM exams. The residents’ scores were then compared with those of ChatGPT, which completed the same exams.
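At its core, the comparison amounts to contrasting each cohort’s exam scores with ChatGPT’s score on the same paper. The short Python sketch below illustrates that calculation with entirely hypothetical numbers; the study does not publish its analysis code, so this is an illustrative toy example, not the authors’ method.

```python
# Illustrative sketch only: all scores below are hypothetical,
# not data from the study.
from statistics import mean

# Hypothetical raw scores out of 40 for a few residents per cohort.
resident_scores = {
    "PGY1": [22, 25, 19, 28],
    "PGY2": [24, 27, 21, 30],
    "PGY3": [26, 23, 29, 25],
    "PGY4": [28, 26, 31, 24],
}
chatgpt_score = 34  # hypothetical: ChatGPT's score on the same exam

for cohort, scores in resident_scores.items():
    gap = chatgpt_score - mean(scores)
    print(f"{cohort}: mean {mean(scores):.1f}/40 "
          f"vs ChatGPT {chatgpt_score}/40 (gap {gap:+.1f})")
```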
The results revealed that ChatGPT consistently outperformed residents across all examination categories. However, passing rates declined notably among senior residents (PGY3 and PGY4), raising concerns about how well theoretical exam performance aligns with practical competencies.
One possible explanation for this trend is the impact of the COVID-19 pandemic on senior residents’ learning experiences and knowledge consolidation.
The study sample included 238 residents spread across PGY1 to PGY4: 58 PGY1 residents (23.8%), 61 PGY2 residents (25.1%), 66 PGY3 residents (27.2%), and 53 PGY4 residents (21.8%). The gender distribution was approximately two male residents for every female. Each examination comprised 40 questions, with a maximum score of 40 points, and the passing score rose with seniority: PGY1 (45%), PGY2 (50%), PGY3 (55%), and PGY4 (60%).
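To make the scoring scheme concrete, a minimal Python sketch of the pass/fail rule follows, using the thresholds and 40-point scale reported above; the example scores are hypothetical.

```python
# Passing thresholds per postgraduate year, as reported in the article.
PASS_THRESHOLDS = {"PGY1": 0.45, "PGY2": 0.50, "PGY3": 0.55, "PGY4": 0.60}
MAX_SCORE = 40  # each exam: 40 one-point multiple-choice questions

def passes(cohort: str, raw_score: int) -> bool:
    """Return True if a raw score meets the cohort's passing percentage."""
    return raw_score / MAX_SCORE >= PASS_THRESHOLDS[cohort]

# A score of 22/40 (55%) passes for a PGY3 resident but not for a PGY4,
# who needs at least 24/40 (60%).
print(passes("PGY3", 22))  # True
print(passes("PGY4", 22))  # False
```

The rising thresholds mean the same raw score can pass a junior resident and fail a senior one, which is worth keeping in mind when reading the cohorts’ passing rates.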
The findings not only suggest that AI models like ChatGPT can excel in theoretical examinations but also highlight the evolving role of AI in medical education.
The inclusion of AI tools in healthcare has already begun transforming medical practice, from diagnosis to treatment strategies, and, as this study demonstrates, now extends to educational methods. “ChatGPT demonstrated significant proficiency in the theoretical knowledge of EM, outperforming resident physicians in examination settings. This finding suggests the potential of AI as a supplementary tool in medical education,” the study concludes.