Evaluating the Use of Large Language Models in Radiology and Histopathology Reporting: Expert-Based Assessment of Diagnostic Support and Patient-Oriented Simplification

ÇELİK, SÜMEYYE; KURAN, ALİCAN; BAYSAL, OĞUZ; SEKİ, Umut; SOLUK TEKKEŞİN, Merva; SİNANOĞLU, ENVER

doi:10.7126/cumudj.1588132

ÇELİK S., KURAN A., BAYSAL O., SEKİ U., SOLUK TEKKEŞİN M., SİNANOĞLU E. A.

Cumhuriyet Dental Journal, vol.28, no.2, pp.141-156, 2025 (Scopus, TRDizin)

Publication Type: Article / Full Article
Volume: 28 Issue: 2
Publication Date: 2025
DOI: 10.7126/cumudj.1588132
Journal Name: Cumhuriyet Dental Journal
Journal Indexes: Scopus, Directory of Open Access Journals, TR DİZİN (ULAKBİM)
Page Numbers: pp.141-156
Keywords: ChatGPT, Cone-Beam Computed Tomography, Large language models
Kocaeli University Affiliated: Yes

Abstract

Objectives: The aim of this study was to evaluate the effectiveness of two versions of Chat-GPT, a large language model (LLM), in the diagnosis and interpretation of cone beam computed tomography (CBCT) and histopathology reports. Materials and Methods: Chat-GPT 3.5 and Chat-GPT 4 were tasked with generating preliminary and differential diagnoses based on the findings from ten CBCT reports and ten histopathology reports. Both versions were also asked to simplify these reports to a level understandable by patients. Dentomaxillofacial radiologists and pathologists with varying levels of expertise evaluated the responses, and the performance of Chat-GPT 3.5 and Chat-GPT 4 in these tasks was then compared on the basis of these expert assessments. Results: For radiology reports, the diagnostic performance of Chat-GPT 4 was significantly better than that of Chat-GPT 3.5 (p < 0.001), while no significant difference was observed between the two models in report simplification scores (p > 0.05). For histopathology reports, Chat-GPT 4 performed significantly better than Chat-GPT 3.5 in both diagnostic accuracy and report simplification (p < 0.05). Conclusions: Chat-GPT 4 achieved superior performance in the interpretation and evaluation of CBCT reports. The strong performance of this latest version highlights the potential for LLMs to become valuable tools in the reporting processes of radiology and histopathology, as well as in numerous other fields, as technological advances continue to improve their capabilities.