Evaluation of the Performance of ChatGPT 4.5 in LI-RADS Categorization and Management Suggestion: Zero-shot versus Few-shot Prompting Method



Authors

Çamur, E., & Güneş, Y. C.

DOI:

https://doi.org/10.58600/eurjther2699

Keywords:

ChatGPT, LI-RADS, artificial intelligence, liver, few-shot

Abstract

Objective: To evaluate whether example-based conditioning through “few-shot” prompting improves the accuracy and clinical utility of ChatGPT 4.5 in classifying hepatic lesions and generating management recommendations according to the Liver Imaging Reporting and Data System (LI-RADS).
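For readers unfamiliar with the two prompting conditions, the sketch below illustrates the difference: a zero-shot prompt contains only the report to be classified, whereas a few-shot prompt prepends worked exemplars as prior conversation turns. The system instruction, exemplar, and message format shown here are illustrative assumptions and are not the authors’ actual prompts.

```python
# Minimal sketch of the two prompting strategies compared in this study.
# All wording, exemplars, and the chat-message format are assumptions for
# illustration; the study's exact prompts are not reproduced in this abstract.

SYSTEM = (
    "You are a radiology assistant. Given a liver imaging report, return the "
    "LI-RADS category (e.g., LR-1 to LR-5, LR-M, LR-TIV, LR-NC) and the "
    "recommended management."
)

# Hypothetical worked example used only in the few-shot condition.
FEW_SHOT_EXAMPLES = [
    {
        "report": "Cirrhotic liver. 22 mm observation with nonrim arterial phase "
                  "hyperenhancement, nonperipheral washout, and enhancing capsule.",
        "answer": "Category: LR-5. Management: multidisciplinary discussion; "
                  "manage as hepatocellular carcinoma.",
    },
    # ...additional exemplars covering the remaining categories
]

def build_messages(report: str, few_shot: bool) -> list[dict]:
    """Assemble a chat-style message list for zero-shot or few-shot prompting."""
    messages = [{"role": "system", "content": SYSTEM}]
    if few_shot:
        for ex in FEW_SHOT_EXAMPLES:
            messages.append({"role": "user", "content": ex["report"]})
            messages.append({"role": "assistant", "content": ex["answer"]})
    messages.append({"role": "user", "content": report})
    return messages
```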

Methods: This cross-sectional observational study assessed ChatGPT 4.5 on fifty fictional radiology reports covering eight LI-RADS categories. Each report was evaluated under both zero-shot and few-shot prompting conditions. Two board-certified radiologists independently scored the model’s LI-RADS categories and management suggestions as correct or incorrect. The model’s performance was also compared with that of a radiologist, and the paired prompting conditions were compared using McNemar’s test, with p < 0.05 considered significant.
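Because the same fifty reports are scored under both prompting conditions, the comparison is paired, and McNemar’s test operates on the discordant cells of a 2×2 correct/incorrect table. The sketch below shows one way to build that table from per-report binary scores; it assumes two aligned score lists and is not the authors’ analysis code.

```python
# Paired comparison of zero-shot vs few-shot correctness with McNemar's test.
# `zero_shot` and `few_shot` are assumed to be aligned 0/1 lists, one entry per
# fictional report (1 = scored correct by the radiologist raters).
from statsmodels.stats.contingency_tables import mcnemar

def mcnemar_paired(zero_shot: list, few_shot: list):
    a = sum(z == 1 and f == 1 for z, f in zip(zero_shot, few_shot))  # both correct
    b = sum(z == 1 and f == 0 for z, f in zip(zero_shot, few_shot))  # only zero-shot correct
    c = sum(z == 0 and f == 1 for z, f in zip(zero_shot, few_shot))  # only few-shot correct
    d = sum(z == 0 and f == 0 for z, f in zip(zero_shot, few_shot))  # both incorrect
    table = [[a, b], [c, d]]
    # Exact (binomial) McNemar test is preferable when discordant counts are small.
    return mcnemar(table, exact=True)
```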

Results: With zero-shot prompting, ChatGPT 4.5 assigned the correct LI-RADS category in 84% of reports and an appropriate management suggestion in 70%. Few-shot prompting improved performance to 92% for LI-RADS categorization and 84% for management recommendations. The improvement in categorization was not statistically significant (p=0.125), whereas the improvement in management suggestions was significant (p=0.016). The radiologist comparator achieved 82% accuracy for LI-RADS categorization and 60% for management suggestions; thus, with few-shot prompting, ChatGPT 4.5 outperformed the radiologist in recommending appropriate management.
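The reported p-values are consistent with an exact McNemar test in which every discordant report favored few-shot prompting: 50 × (0.84 − 0.70) = 7 discordant pairs for management (2 × 0.5^7 ≈ 0.016) and 50 × (0.92 − 0.84) = 4 for categorization (2 × 0.5^4 = 0.125). The per-condition discordant counts are inferred here, not stated in the source; the snippet below only verifies that arithmetic.

```python
# Consistency check of the reported p-values, assuming all discordant pairs
# favored few-shot prompting (an inference, not a figure stated in the paper).
from math import comb

def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact (binomial) McNemar p-value for discordant counts b and c."""
    n, k = b + c, min(b, c)
    p_one_sided = sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(1.0, 2 * p_one_sided)

print(round(exact_mcnemar_p(0, 7), 3))  # ~0.016: management suggestions
print(round(exact_mcnemar_p(0, 4), 3))  # 0.125: LI-RADS categorization
```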

Conclusion: Few-shot prompting markedly improved ChatGPT 4.5’s ability to generate appropriate, patient-centered management recommendations, extending its role from a diagnostic aid toward a clinical decision-support tool. This study is among the first to benchmark ChatGPT 4.5 against a radiologist on LI-RADS categorization and management tasks, underscoring the potential of such models both to streamline reporting and to support the quality of patient care. As large language models continue to evolve, they may become supportive tools in radiology, bridging image interpretation and clinical decision-making.


Figure 3. “Zero-shot” vs “Few-shot” Confusion Matrices for Management Suggestion and LI-RADS Category Classification


Published

2025-09-05

How to Cite

Çamur, E., & Güneş, Y. C. (2025). Evaluation of the Performance of ChatGPT 4.5 in LI-RADS Categorization and Management Suggestion: Zero-shot versus Few-shot Prompting Method. European Journal of Therapeutics. https://doi.org/10.58600/eurjther2699

Section

Original Articles
