Evaluation of the Readability, Understandability, and Accuracy of Artificial Intelligence Chatbots in Terms of Biostatistics Literacy
DOI:
https://doi.org/10.58600/eurjther2569
Keywords:
Artificial intelligence, Chatbots, Biostatistics Literacy, Readability, Understandability, Accuracy
Abstract
Objective: In recent years, chatbots have been widely used in many areas, including diagnosis and imaging, treatment, patient follow-up and support, health promotion, customer service, sales, marketing, and information and technical support. This study aimed to evaluate the readability, understandability, and accuracy of the answers that artificial intelligence chatbots give to biostatistics questions posed by health researchers.
Methods: Four experts selected 10 questions from the basic biostatistics topics that health researchers ask about most frequently. One of the experts posed the questions to the artificial intelligence chatbots and recorded the answers. The free versions of the most widely preferred chatbots, ChatGPT4, Gemini, and Copilot, were used. Three experts, blinded to which chatbot had produced each answer, independently rated the recorded answers as “Correct”, “Partially correct”, or “Wrong”. The experts then met, reviewed the answers together, and reached a consensus on the final accuracy ratings. The readability and understandability of the answers were evaluated with the Ateşman readability formula, the Sönmez formula, the Çetinkaya-Uzun readability formula, and the Bezirci-Yılmaz readability formula.
Results: The chatbots' answers were rated “difficult” by the Ateşman readability formula, “insufficient reading level” by the Çetinkaya-Uzun readability formula, and “academic level” by the Bezirci-Yılmaz readability formula. In contrast, the Sönmez formula gave the result “the text is understandable” for all chatbots. There was no statistically significant difference among the chatbots in the accuracy rates of their answers (p=0.819).
Conclusion: Although the chatbots tended to provide accurate information, their answers were neither readable nor understandable, and their accuracy levels were not high.
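To make the readability assessment concrete, the following is a minimal sketch of how an Ateşman readability score could be computed for a Turkish text. It is not the authors' implementation. It assumes the commonly cited form of the formula, 198.825 − 40.175 × (syllables per word) − 2.610 × (words per sentence), and the commonly cited score bands, in which 30 to 49 corresponds to “difficult”. The function names, the vowel-counting syllable heuristic, and the simple punctuation-based sentence splitter are illustrative assumptions.

```python
import re

# Assumption: every Turkish syllable contains exactly one vowel,
# so counting vowels is used here as a syllable count.
TURKISH_VOWELS = set("aeıioöuüâîû")

# Assumption: a word is a run of letters, optionally joined by apostrophes
# (as in "Ankara'da"). Digits and symbols are ignored.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def count_syllables(word: str) -> int:
    """Count syllables in a Turkish word by counting its vowels."""
    return sum(1 for ch in word.lower() if ch in TURKISH_VOWELS)


def atesman_score(text: str) -> float:
    """Return the Ateşman readability score (commonly cited form)."""
    words = WORD_RE.findall(text)
    # Simplification: sentences end at runs of '.', '!', '?' or '…'.
    sentences = [s for s in re.split(r"[.!?…]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    words_per_sentence = len(words) / len(sentences)
    return 198.825 - 40.175 * syllables_per_word - 2.610 * words_per_sentence


def atesman_level(score: float) -> str:
    """Map a score to the commonly cited Ateşman difficulty bands."""
    if score >= 90:
        return "very easy"
    if score >= 70:
        return "easy"
    if score >= 50:
        return "medium difficulty"
    if score >= 30:
        return "difficult"
    return "very difficult"


if __name__ == "__main__":
    sample = "Ali okula gitti. Ayşe evde kaldı."
    score = atesman_score(sample)
    print(round(score, 2), atesman_level(score))
```

Longer words and longer sentences both lower the score, which is consistent with dense, terminology-heavy answers landing in the “difficult” band. A production tool would need more careful sentence segmentation (abbreviations, decimal numbers) than this sketch provides.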
License
Copyright (c) 2024 European Journal of Therapeutics
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.