TB5I: A Multimodal Turkish Interview Dataset and AI-Based Big Five Personality Trait Prediction


Sarikaya B., KÜÇÜKMANİSA A.

IEEE Access, vol. 14, pp. 38038-38056, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 14
  • Publication Date: 2026
  • DOI: 10.1109/access.2026.3672037
  • Journal Name: IEEE Access
  • Indexes Covering the Journal: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Pages: pp. 38038-38056
  • Keywords: automatic personality detection, big five, dataset, Language-based personality detection
  • Kocaeli University Affiliated: Yes

Abstract

Personality detection has gained increasing attention in recent years because of its potential impact on fields such as psychology, human–computer interaction, recruitment, and personalized services. However, research progress has been limited by the lack of high-quality datasets. To address this gap, we introduce TB5I, the first multimodal Turkish Big Five interview dataset, comprising synchronized video, audio, and text recordings annotated with Big Five personality scores and enriched with socio-demographic attributes. In this study, we focus on text-based personality prediction and evaluate the predictive capability of TB5I using five traditional regression models (Random Forest, Lasso, Ridge, ElasticNet, and Decision Tree) and a BERT-based regression architecture trained in both multi-trait and single-trait configurations. The results demonstrate that BERT-based models consistently outperform the traditional approaches by effectively capturing contextual information, achieving an RMSE as low as 2.67 when minimal preprocessing is applied. Minimal preprocessing is preferable because aggressive morphological normalization may remove semantic and contextual cues that are critical for transformer-based language models such as BERT. Moreover, the findings show that text-based regression models can achieve low prediction error for several personality traits from Turkish interview transcripts, highlighting TB5I as a valuable resource for advancing personality computing research.
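The RMSE reported in the abstract can be illustrated with a minimal sketch of how a per-trait root mean squared error might be computed for Big Five predictions. Everything below is an assumption made for illustration: the helper names `rmse` and `per_trait_rmse`, the trait ordering in `TRAITS`, and the made-up scores, which are not taken from TB5I. The score scale in particular is hypothetical, since the abstract does not state the scale used.

```python
import math

# Assumed trait order for this sketch; the paper's own ordering may differ.
TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def per_trait_rmse(true_rows, pred_rows):
    """RMSE for each trait; each row holds one participant's scores in TRAITS order."""
    return {
        trait: rmse([row[i] for row in true_rows],
                    [row[i] for row in pred_rows])
        for i, trait in enumerate(TRAITS)
    }


# Made-up scores for three hypothetical participants (not TB5I data).
true_scores = [[30, 25, 20, 35, 15], [28, 32, 22, 30, 18], [35, 27, 25, 33, 20]]
pred_scores = [[32, 25, 23, 35, 14], [28, 30, 22, 27, 18], [33, 27, 25, 33, 22]]

for trait, value in per_trait_rmse(true_scores, pred_scores).items():
    print(f"{trait}: {value:.2f}")
```

Reporting the error per trait, rather than only as one pooled number, makes it possible to compare a multi-trait model against five single-trait models on the same footing.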