Multilingual and Multi-Class Sentiment Classification Using Machine Learning, BERT, and GPT-4o-mini


Tosun Pataci T., GÖZ F.

7th International Congress on Human-Computer Interaction, Optimization and Robotic Applications, ICHORA 2025, Ankara, Türkiye, 23-24 May 2025 (Full Text Paper)

  • Publication Type: Conference Paper / Full Text Paper
  • DOI Number: 10.1109/ichora65333.2025.11017018
  • City of Publication: Ankara
  • Country of Publication: Türkiye
  • Keywords: GPT-4o-mini, multilingual BERT, sentence embeddings, sentiment classification
  • Affiliated with Kocaeli University: Yes

Abstract

In this study, we investigate multilingual, multi-class sentiment classification on datasets in Turkish, English, and Italian. The proposed approach consists of three main stages: sentence representation extraction, classification, and performance evaluation. First, sentence representations were extracted from the datasets using the distiluse-base-multilingual-cased-v1, sentence-transformers/LaBSE, and Alibaba-NLP/gte-multilingual-base models. These representations were then used as input to four classifiers: Logistic Regression (LR), Support Vector Machines (SVM), Random Forest (RF), and Naive Bayes (NB). In addition, fine-tuned BERT-base-multilingual-cased and GPT-4o-mini were employed directly as end-to-end models. Classification performance was evaluated using accuracy, F1-score, precision, and recall, and a confusion matrix analysis was conducted for each dataset to examine the results in detail. The experimental results show how the choice of embedding model and learning algorithm influences multilingual multi-class sentiment classification.
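
As a rough illustration of the evaluation stage, the sketch below builds a confusion matrix for a three-class sentiment task and derives accuracy, precision, recall, and F1-score from it. It is a minimal example and not the authors' code: the label names, the toy predictions, the helper names (confusion_matrix, macro_scores), and the choice of macro averaging are all assumptions, since the abstract does not state the class set or the averaging scheme.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions; rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def macro_scores(matrix):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(n))  # column sum
        actual = sum(matrix[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels and predictions, for illustration only.
labels = ["negative", "neutral", "positive"]
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]

matrix = confusion_matrix(y_true, y_pred, labels)
scores = macro_scores(matrix)

for row_label, row in zip(labels, matrix):
    print(row_label, row)
print(scores)
```

Macro averaging weights every class equally, which is one common choice when classes are imbalanced. A weighted average would give different numbers on the same confusion matrix.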