Evaluation of the Readability, Understandability, and Accuracy of Artificial Intelligence Chatbots in Terms of Biostatistics Literacy

İlkay Doğan; Pınar Günel; İhsan Berk; Buket İpek Berk

doi:10.58600/eurjther2569

Authors

İlkay Doğan Department of Biostatistics, Faculty of Medicine, Gaziantep University, Gaziantep, Türkiye https://orcid.org/0000-0001-7552-6478
Pınar Günel Department of Biostatistics, Faculty of Medicine, SANKO University, Gaziantep, Türkiye https://orcid.org/0000-0003-3768-2351
İhsan Berk Department of Biostatistics, Faculty of Medicine, SANKO University, Gaziantep, Türkiye https://orcid.org/0000-0002-4008-2480
Buket İpek Berk Department of Biostatistics, Graduate Education Institute, SANKO University, Gaziantep, Türkiye https://orcid.org/0000-0003-4250-7427

Keywords:

Artificial intelligence, Chatbots, Biostatistics Literacy, Readability, Understandability, Accuracy

Abstract

Objective: In recent years, chatbots have been used frequently in many areas, such as diagnosis and imaging, treatment, patient follow-up and support, health promotion, customer service, sales, marketing, and information and technical support. The aim of this study is to evaluate the readability, understandability, and accuracy of the answers given by artificial intelligence chatbots to biostatistics queries made by researchers in the field of health.

Methods: Four experts selected a total of 10 questions from the basic biostatistics topics most frequently asked about by researchers in the field of health. One of the experts put these questions to the artificial intelligence chatbots and recorded the answers. The free versions of the most widely preferred chatbots, ChatGPT4, Gemini, and Copilot, were used. Three experts, who were blinded to which chatbot each answer belonged to, independently rated the recorded answers as “Correct”, “Partially correct”, or “Wrong”. These experts then met, examined the answers together, and made the final evaluation by reaching a consensus on the level of accuracy. The readability and understandability of the answers were evaluated with the Ateşman readability formula, the Sönmez formula, the Çetinkaya-Uzun readability formula, and the Bezirci-Yılmaz readability formula.
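To illustrate how one of these readability measures can be computed, the following is a minimal Python sketch of the Ateşman readability score. It is not the authors' software. It assumes the commonly cited form of the formula, 198.825 − 40.175 × (syllables per word) − 2.610 × (words per sentence), and the commonly cited level bands (90–100 very easy, 70–89 easy, 50–69 medium, 30–49 difficult, below 30 very difficult). The function names, the punctuation-based sentence splitting, and the vowel-counting approximation of Turkish syllables are simplifications introduced here for illustration.

```python
import re

# Assumption: in Turkish each syllable contains exactly one vowel,
# so counting vowels approximates the syllable count of a word.
TURKISH_VOWELS = set("aeıioöuüâîûAEIİOÖUÜÂÎÛ")


def count_syllables(word: str) -> int:
    """Approximate the syllable count of a Turkish word by counting vowels."""
    return sum(1 for ch in word if ch in TURKISH_VOWELS)


def atesman_score(text: str) -> float:
    """Ateşman readability score (commonly cited form of the formula)."""
    # Simplification: sentences end at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of Unicode letters (digits and punctuation are ignored).
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    syllables_per_word = syllables / len(words)
    words_per_sentence = len(words) / len(sentences)
    return 198.825 - 40.175 * syllables_per_word - 2.610 * words_per_sentence


def atesman_level(score: float) -> str:
    """Map a score to the commonly cited Ateşman level bands (assumed here)."""
    if score >= 90:
        return "very easy"
    if score >= 70:
        return "easy"
    if score >= 50:
        return "medium"
    if score >= 30:
        return "difficult"
    return "very difficult"


if __name__ == "__main__":
    sample = "Ali eve geldi. Kedi uyudu."
    score = atesman_score(sample)
    print(round(score, 3), atesman_level(score))
```

Under these assumptions, long words and long sentences both lower the score, which is consistent with academic-style chatbot answers landing at the “difficult” level.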

Results: The answers given by the artificial intelligence chatbots were at the “difficult” level according to the Ateşman readability formula, the “insufficient reading level” according to the Çetinkaya-Uzun readability formula, and the “academic level” according to the Bezirci-Yılmaz readability formula. In contrast, the Sönmez formula gave the result “the text is understandable” for all chatbots. There was no statistically significant difference between the chatbots in the accuracy rates of their answers (p=0.819).

Conclusion: Although the chatbots tended to provide accurate information, their answers were not readable or understandable, and their accuracy levels were not high.
