Benary, Manuela; Wang, Xing David; Schmidt, Max; Soll, Dominik; Hilfenhaus, Georg; Nassir, Mani; Sigler, Christian; Knödler, Maren; Keller, Ulrich; Beule, Dieter; Keilholz, Ulrich; Leser, Ulf; Rieke, Damian T.

doi:10.1001/jamanetworkopen.2023.43689

Title: Leveraging Large Language Models for Decision Support in Personalized Oncology.
Authors: Benary, Manuela; Wang, Xing David; Schmidt, Max; Soll, Dominik; Hilfenhaus, Georg; Nassir, Mani; Sigler, Christian; Knödler, Maren; Keller, Ulrich; Beule, Dieter; Keilholz, Ulrich; Leser, Ulf; Rieke, Damian T.
Abstract: Key Points: Question: Can current conversational large language models (LLMs) be used as a tool for personalized decision-making in precision oncology? Findings: In this diagnostic study, treatment options identified by 4 LLMs for 10 fictional patients deviated substantially from expert recommendations. Nevertheless, LLMs correctly identified several important treatment strategies and partly provided reasonable suggestions that were not easily found by experts. Meaning: These results suggest that LLMs are not yet applicable as a routine tool for aiding personalized clinical decision-making in oncology.
This diagnostic study evaluates treatment recommendations for 10 fictional patients with advanced cancer made by 4 large language models, as evaluated by members of a molecular tumor board.
Importance: Clinical interpretation of complex biomarkers for precision oncology currently requires manual investigations of previous studies and databases. Conversational large language models (LLMs) might be beneficial as automated tools for assisting clinical decision-making.
Objective: To assess the performance of 4 recent LLMs as support tools for precision oncology and to define their role.
Design, Setting, and Participants: This diagnostic study examined 10 fictional cases of patients with advanced cancer with genetic alterations. In 2023, each case was submitted to 4 different LLMs (ChatGPT, Galactica, Perplexity, and BioMedLM) and 1 expert physician to identify personalized treatment options. Treatment options were masked and presented to a molecular tumor board (MTB), whose members rated the likelihood of a treatment option coming from an LLM on a scale from 0 to 10 (0, extremely unlikely; 10, extremely likely) and decided whether the treatment option was clinically useful.
Main Outcomes and Measures: Number of treatment options, precision, recall, F1 score of LLMs compared with human experts, recognizability, and usefulness of recommendations.
Results: For 10 fictional patients with cancer (4 with lung cancer, 6 with other; median [IQR] 3.5 [3.0-4.8] molecular alterations per patient), the human expert identified a median (IQR) of 4.0 (4.0-4.0) treatment options per patient, compared with 3.0 (3.0-5.0), 7.5 (4.3-9.8), 11.5 (7.8-13.0), and 13.0 (11.3-21.5) for the 4 LLMs, respectively. When considering the expert as a criterion standard, LLM-proposed treatment options reached F1 scores of 0.04, 0.17, 0.14, and 0.19 across all patients combined. Combining treatment options from different LLMs allowed a precision of 0.29 and a recall of 0.29 for an F1 score of 0.29. LLM-generated treatment options were recognized as AI-generated with a median (IQR) of 7.5 (5.3-9.0) points, in contrast to 2.0 (1.0-3.0) points for manually annotated cases. A crucial reason for identifying AI-generated treatment options was insufficient accompanying evidence. For each patient, at least 1 LLM generated a treatment option that was considered helpful by MTB members. Two unique useful treatment options (including 1 unique treatment strategy) were identified only by LLMs.
Conclusions and Relevance: In this diagnostic study, treatment options of LLMs in precision oncology did not reach the quality and credibility of human experts; however, they generated helpful ideas that might have complemented established procedures. Considering technological progress, LLMs could play an increasingly important role in assisting with screening and selecting relevant biomedical literature to support evidence-based, personalized treatment decisions.
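Note on the reported metrics: the abstract scores LLM-proposed treatment options against the expert's list as the criterion standard, using precision, recall, and F1. The sketch below illustrates how these three quantities relate. It is not the authors' evaluation code: the function names, the placeholder option labels, and the exact-string matching of options are all assumptions made for illustration, and the record does not say how the study decided that two options match.

```python
def precision_recall_f1(expert_options, llm_options):
    """Score LLM options against an expert list treated as the criterion standard.

    Assumption: two options match only if their labels are identical strings.
    """
    expert = set(expert_options)
    llm = set(llm_options)
    true_positives = len(expert & llm)
    precision = true_positives / len(llm) if llm else 0.0
    recall = true_positives / len(expert) if expert else 0.0
    return precision, recall, f1_from(precision, recall)


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical placeholder labels, not data from the study.
expert = ["option_a", "option_b", "option_c", "option_d"]
llm = ["option_a", "option_e", "option_f", "option_g", "option_h"]

precision, recall, f1 = precision_recall_f1(expert, llm)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")

# When precision equals recall, F1 equals both, which is consistent with
# the combined-LLM figures reported in the abstract (0.29, 0.29, 0.29).
print(f"F1 for precision=0.29, recall=0.29: {f1_from(0.29, 0.29):.2f}")
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, so a model that proposes many options, few of which overlap with the expert list, scores poorly even if it recovers some of the expert's options.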
Subjects: BIOMARKERS; CLINICAL decision support systems; NATURAL language processing; RESEARCH methodology; QUANTITATIVE research; RETROSPECTIVE studies; ARTIFICIAL intelligence; QUALITATIVE research; DECISION making; DESCRIPTIVE statistics; STATISTICAL models; ONCOLOGY
Publication: JAMA Network Open, 2023, Vol 6, Issue 11, pe2343689
ISSN: 2574-3805
Publication type: Article
DOI: 10.1001/jamanetworkopen.2023.43689
