Cumhuriyet Dental Journal, cilt.28, sa.2, ss.141-156, 2025 (Scopus)
Objectives: The aim of this study was to evaluate the effectiveness of two versions of Chat-GPT, one of the large language models (LLMs), in the diagnosis and interpretation of cone beam computed tomography (CBCT) and histopathology reports.

Materials and Methods: In this study, Chat-GPT 3.5 and Chat-GPT 4 were tasked with generating preliminary diagnoses and differential diagnoses based on the findings of ten CBCT reports and ten histopathology reports. Additionally, both versions were asked to simplify these reports to a level understandable by patients. Dentomaxillofacial radiologists and pathologists with varying levels of expertise evaluated the responses of the LLMs, and the performance of Chat-GPT 3.5 and Chat-GPT 4 on these tasks was then compared on the basis of these expert assessments.

Results: For the radiology reports, Chat-GPT 4 was statistically superior to Chat-GPT 3.5 in diagnostic performance (p < 0.001), while no significant difference was observed between the two models in report simplification scores (p > 0.05). For the histopathology reports, in contrast, Chat-GPT 4 performed significantly better than Chat-GPT 3.5 in both diagnostic accuracy and report simplification (p < 0.05).

Conclusions: The results demonstrated that Chat-GPT 4 achieved superior performance in the interpretation and evaluation of CBCT reports. The strong performance of this latest version highlights the potential of LLMs to become valuable tools in the reporting processes of radiology and histopathology, as well as in numerous other fields, as continuing technological advances improve their capabilities.