Better together: what can state-of-the-art ML models add to thyroid nodule diagnostics?


TATAR O. C., ÇUBUKCU A., CANTÜRK N. Z.

Updates in Surgery, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Publication Date: 2026
  • DOI Number: 10.1007/s13304-026-02600-2
  • Journal Name: Updates in Surgery
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, MEDLINE
  • Keywords: Bethesda system, Machine learning, Malignancy prediction, Thyroid nodules, TI-RADS
  • Kocaeli University Affiliated: Yes

Abstract

Thyroid nodules are common, but most are benign. Preoperative stratification is critical, especially for cytologically indeterminate cases (Bethesda III–IV), which often lead to unnecessary surgery. While ACR TI-RADS ultrasound scoring and Bethesda cytology independently aid risk assessment, each has limited diagnostic power on its own. Machine learning (ML) offers a potential solution by integrating multi-dimensional inputs. This study aimed to develop and validate ML models that integrate ACR TI-RADS and Bethesda cytology categories for improved prediction of thyroid nodule malignancy. We retrospectively analyzed 384 adult patients undergoing thyroid surgery with complete preoperative data on ACR TI-RADS category, Bethesda cytology, nodule size, age, and sex. The final histopathological diagnosis served as the gold standard. Five ML models were trained using nested cross-validation; LightGBM demonstrated the highest performance. Feature-set ablations and a prespecified subgroup analysis for Bethesda III/IV were performed. SHAP analysis was used to assess model interpretability. The full LightGBM model achieved excellent discrimination (AUC = 0.96), outperforming models using only Bethesda (AUC = 0.91), only TI-RADS (AUC = 0.78), or both combined (AUC = 0.93). In Bethesda III/IV cases (n = 164), the full model achieved AUC = 0.81, versus 0.53 (Bethesda), 0.68 (TI-RADS), and 0.66 (Bethesda + TI-RADS). Integrating TI-RADS, Bethesda, and clinical covariates via machine learning substantially improves malignancy prediction, especially in indeterminate cases. The model is interpretable, accurate, and holds promise for reducing unnecessary surgeries. Prospective multicenter validation is warranted before clinical implementation.
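
The AUC values in the abstract measure discrimination: the probability that a randomly chosen malignant nodule receives a higher predicted risk than a randomly chosen benign one. The following is a minimal sketch of that calculation, not the authors' code. The helper name `auc_score` and the toy labels and scores are hypothetical and introduced only for illustration; none of the numbers are study data.

```python
def auc_score(labels, scores):
    """ROC AUC by pairwise comparison (Mann-Whitney U divided by n_pos * n_neg).

    labels: 1 for malignant, 0 for benign (assumed coding, for illustration).
    scores: predicted malignancy risk, higher means more suspicious.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # malignant case ranked above benign case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_auc = auc_score(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking, which is why the Bethesda-only value of 0.53 in the Bethesda III/IV subgroup indicates almost no discrimination, while 0.81 for the full model indicates a substantial gain.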