Comparative Analysis of Large Language Models in Simplifying Turkish Ultrasound Reports to Enhance Patient Understanding

Yasin Celal Güneş; Turay Cesur; Eren Çamur

doi:10.58600/eurjther2225

Authors

Yasin Celal Güneş Department of Radiology, Kırıkkale Yuksek Ihtisas Hospital, Kırıkkale, Türkiye https://orcid.org/0000-0001-7631-854X
Turay Cesur Department of Radiology, Mamak State Hospital, Ankara, Türkiye https://orcid.org/0000-0002-2726-8045
Eren Çamur Department of Radiology, Ankara 29 Mayis State Hospital, Ankara, Türkiye https://orcid.org/0000-0002-8774-5800


Keywords:

large language models, ChatGPT, Claude 3 Opus, ultrasound, simplify

Abstract

Objective: To evaluate and compare the ability of large language models (LLMs) to simplify Turkish ultrasound (US) findings for patients.

Methods: We assessed the simplification performance of four LLMs (ChatGPT-4, Gemini 1.5 Pro, Claude 3 Opus, and Perplexity) using fifty fictional Turkish US findings. The outputs were compared using Ateşman's Readability Index and word count. Three radiologists rated medical accuracy, consistency, and comprehensibility on a 5-point Likert scale (1 to 5). Differences in performance between the LLMs were examined with the Friedman and Wilcoxon tests, and relationships between variables were assessed with Spearman correlation.
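Ateşman's Readability Index adapts the Flesch reading-ease formula to Turkish: score = 198.825 - 40.175 × x1 - 2.610 × x2, where x1 is the mean number of syllables per word and x2 is the mean number of words per sentence; higher scores indicate easier text. The Python sketch below illustrates the calculation. The tokenization rules (letters-only words, sentences ending at '.', '!' or '?') and the function names are illustrative assumptions, not the procedure used in the study. Syllables are approximated by counting vowels, which relies on every Turkish syllable containing exactly one vowel.

```python
import re

# Turkish vowels (including circumflexed forms). Every Turkish syllable
# contains exactly one vowel, so the vowel count approximates syllables.
TURKISH_VOWELS = set("aeıioöuüâîû")
LETTERS = r"[^\W\d_]"  # any Unicode letter (word chars minus digits and _)


def count_syllables(word: str) -> int:
    """Approximate the Turkish syllable count of a word by counting vowels."""
    return sum(1 for ch in word.lower() if ch in TURKISH_VOWELS)


def atesman_readability(text: str) -> float:
    """Ateşman (1997): 198.825 - 40.175 * syllables/word - 2.610 * words/sentence."""
    words = re.findall(LETTERS + "+", text)
    # Hypothetical, simplistic splitter: a sentence ends at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if re.search(LETTERS, s)]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    words_per_sentence = len(words) / len(sentences)
    return 198.825 - 40.175 * syllables_per_word - 2.610 * words_per_sentence


print(round(atesman_readability("Karaciğer normal boyuttadır."), 2))  # 57.08
```

The example sentence has 10 syllables, 3 words, and 1 sentence. Abbreviations or decimals written with a period would be miscounted as sentence ends, so a real analysis would need a proper Turkish sentence splitter.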

Results: Gemini 1.5 Pro, ChatGPT-4, and Claude 3 Opus received high Likert scores for medical accuracy, consistency, and comprehensibility (mean: 4.7–4.8), whereas Perplexity scored significantly lower (mean: 4.1, p<0.001). Gemini 1.5 Pro achieved the highest readability score (mean: 61.16), followed by ChatGPT-4 (mean: 58.94) and Claude 3 Opus (mean: 51.16); Perplexity had the lowest readability score (mean: 47.01). Gemini 1.5 Pro and ChatGPT-4 used significantly more words than Claude 3 Opus and Perplexity (p<0.001). Correlation analysis revealed a positive correlation between the word count of the fictional US findings and that of the responses generated by Gemini 1.5 Pro (correlation coefficient = 0.38, p<0.05) and ChatGPT-4 (correlation coefficient = 0.43, p<0.001).
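The Methods name Spearman correlation for the association analysis. As an illustration, Spearman's rho can be computed by ranking each variable (giving tied values the average of their ranks) and taking the Pearson correlation of those ranks. The minimal pure-Python sketch below assumes paired lists of word counts; the function names and the example numbers are hypothetical and are not data from the study. It computes only the coefficient, not the p-value.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists with at least two items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical word counts (not study data): input findings vs. LLM responses.
report_words = [40, 55, 62, 80, 95]
response_words = [120, 118, 150, 170, 160]
print(round(spearman_rho(report_words, response_words), 3))  # 0.8
```

Ranking first makes the coefficient sensitive only to the ordering of the values, so it captures monotonic rather than strictly linear relationships.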

Conclusion: This study highlights the strong potential of LLMs to simplify Turkish US findings, thereby improving their accessibility and clarity for patients. Gemini 1.5 Pro, ChatGPT-4, and Claude 3 Opus performed well, indicating their effectiveness in healthcare communication. Further research is required to determine how LLMs can be integrated into clinical practice and how they influence patient comprehension and decision-making.

References

Aydin Ö, Karaarslan E (2023) Is ChatGPT Leading Generative AI? What is Beyond Expectations? Academic Platform Journal of Engineering and Smart Systems 11:118-134. https://doi.org/10.21541/apjess.1293702

Lee H (2023) The rise of ChatGPT: Exploring its potential in medical education. Anatomical Sciences Education. https://doi.org/10.1002/ase.2270

Kuang Y-R, Zou M-X, Niu H-Q, Zheng B-Y, Zhang T-L, Zheng B-W (2023) ChatGPT encounters multiple opportunities and challenges in neurosurgery. International Journal of Surgery 109:2886-2891. https://doi.org/10.1097/JS9.0000000000000571

Griewing S, Gremke N, Wagner U, Lingenfelder M, Kuhn S, Boekhoff J (2023) Challenging ChatGPT 3.5 in senology—an assessment of concordance with breast cancer tumor board decision making. Journal of Personalized Medicine 13:1502. https://doi.org/10.3390/jpm13101502

Suthar PP, Kounsal A, Chhetri L, Saini D, Dua SG (2023) Artificial intelligence (AI) in radiology: a deep dive into ChatGPT 4.0's accuracy with the American Journal of Neuroradiology's (AJNR) "Case of the Month". Cureus 15. https://doi.org/10.7759/cureus.43958

Jeblick K, Schachtner B, Dexl J, Mittermeier A, Stüber AT, Topalis J, Weber T, Wesp P, Sabel BO, Ricke J (2023) ChatGPT makes medicine easy to swallow: an exploratory case study on simplified radiology reports. Eur Radiol:1-9. https://doi.org/10.1007/s00330-023

Scheschenja M, Viniol S, Bastian MB, Wessendorf J, König AM, Mahnken AH (2024) Feasibility of GPT-3 and GPT-4 for in-depth patient education prior to interventional radiological procedures: a comparative analysis. Cardiovasc Intervent Radiol 47:245-250. https://doi.org/10.1007/s00270-023-03563-2

Elkassem AA, Smith AD (2023) Potential use cases for ChatGPT in radiology reporting. American Journal of Roentgenology 221:373-376. https://doi.org/10.2214/AJR.23.29198

Chan V, Perlas A (2011) Basics of ultrasound imaging. In: Atlas of Ultrasound-Guided Procedures in Interventional Pain Management, pp 13-19. https://doi.org/10.1007/978-1-4419-1681-5_2

Barratt A, Copp T, McCaffery K, Moynihan R, Nickel B (2017) Words do matter: a systematic review on how different terminology for the same condition influences management preferences. BMJ Open 7:e014129. https://doi.org/10.1136/bmjopen-2016-014129

Johnson AJ, Frankel RM, Williams LS, Glover S, Easterling D (2010) Patient access to radiology reports: what do physicians think? Journal of the American College of Radiology 7:281-289. https://doi.org/10.1016/j.jacr.2009.10.011

Amin K, Khosla P, Doshi R, Chheang S, Forman HP (2023) Focus: Big Data: Artificial Intelligence to Improve Patient Understanding of Radiology Reports. The Yale Journal of Biology and Medicine 96:407. https://doi.org/10.59249/NKOY5498

Ateşman E (1997) Türkçede okunabilirliğin ölçülmesi [Measuring readability in Turkish]. Dil Dergisi 58

Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig L, Lijmer JG, Moher D, Rennie D, De Vet HC (2015) STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ 351:h5527. https://doi.org/10.1136/bmj.h5527

Khan R, Gupta N, Sinhababu A, Chakravarty R (2023) Impact of Conversational and Generative AI Systems on Libraries: A Use Case Large Language Model (LLM). Science & Technology Libraries:1-15. https://doi.org/10.1080/0194262x.2023.2254814

Doshi R, Amin KS, Khosla P, Bajaj SS, Chheang S, Forman HP (2024) Quantitative evaluation of large language models to streamline radiology report impressions: a multimodal retrospective analysis. Radiology 310:e231593. https://doi.org/10.1148/radiol.231593

Haver HL, Gupta AK, Ambinder EB, Bahl M, Oluyemi ET, Jeudy J, Yi PH (2024) Evaluating the Use of ChatGPT to Accurately Simplify Patient-centered Information about Breast Cancer Prevention and Screening. Radiology: Imaging Cancer 6:e230086. https://doi.org/10.1148/rycan.230086

Chung EM, Zhang SC, Nguyen AT, Atkins KM, Sandler HM, Kamrava M (2023) Feasibility and acceptability of ChatGPT generated radiology report summaries for cancer patients. Digital Health 9:20552076231221620. https://doi.org/10.1177/20552076231221620

Li H, Moon JT, Iyer D, Balthazar P, Krupinski EA, Bercu ZL, Newsome JM, Banerjee I, Gichoya JW, Trivedi HM (2023) Decoding radiology reports: potential application of OpenAI ChatGPT to enhance patient understanding of diagnostic reports. Clin Imaging 101:137-141. https://doi.org/10.1016/j.clinimag.2023.06.008

Lyu Q, Tan J, Zapadka ME, Ponnatapura J, Niu C, Myers KJ, Wang G, Whitlow CT (2023) Translating radiology reports into plain language using ChatGPT and GPT-4 with prompt learning: results, limitations, and potential. Visual Computing for Industry, Biomedicine, and Art 6:9. https://doi.org/10.1186/s42492-023-00136-5

Tepe M, Emekli E (2024) Decoding medical jargon: The use of AI language models (ChatGPT-4, BARD, Microsoft Copilot) in radiology reports. Patient Educ Couns:108307. https://doi.org/10.1016/j.pec.2024.108307