Miao, Jing; Thongprayoon, Charat; Cheungpasitporn, Wisit; Cornell, Lynn D

doi:10.1093/ajcp/aqae030

Title: Performance of GPT-4 Vision on kidney pathology exam questions.
Authors: Miao, Jing; Thongprayoon, Charat; Cheungpasitporn, Wisit; Cornell, Lynn D
Abstract: Objectives: ChatGPT (OpenAI, San Francisco, CA) has shown impressive results across various medical examinations, but its performance in kidney pathology is not yet established. This study evaluated proficiencies of GPT-4 Vision (GPT-4V), an updated version of the platform with the ability to analyze images, on kidney pathology questions and compared its responses with those of nephrology trainees. Methods: Thirty-nine questions (19 text-based questions and 20 with various kidney biopsy images) designed specifically for the training of nephrology fellows were employed. Results: GPT-4V displayed comparable accuracy rates in the first and second runs (67% and 72%, respectively, P = .50). However, the aggregated accuracy of GPT-4V, and particularly its consistent accuracy, was lower than that of trainees (72% and 67% vs 79%). Both GPT-4V and trainees displayed comparable accuracy in responding to image-based and text-only questions (55% vs 79% and 81% vs 78%, P = .11 and P = .67, respectively). The consistent accuracy in image-based, directly asked questions for GPT-4V was 29%, much lower than its 88% consistency on text-only, directly asked questions (P = .02). In contrast, trainees maintained similar accuracy in directly asked image-based and text-based questions (80% vs 77%, P = .65). Although the aggregated accuracy for correctly interpreting images was 69%, the consistent accuracy across both runs was only 39%. The accuracy of GPT-4V in answering questions with correct image interpretation was significantly higher than for questions with incorrect image interpretation (100% vs 0% and 100% vs 33% for the first and second runs, P = .001 and P = .02, respectively). Conclusions: The performance of GPT-4V in handling kidney pathology questions, especially those including images, is limited. There is a notable need for enhancement in GPT-4V proficiency in interpreting images.
Subjects: IMAGE analysis; GENERATIVE pre-trained transformers; CHATGPT; RENAL biopsy; PERIODIC health examinations
Publication: American Journal of Clinical Pathology, 2024, Vol 162, Issue 3, p220
ISSN: 0002-9173
Publication type: Article
DOI: 10.1093/ajcp/aqae030
