Extended Feature Spaces Based Classifier Ensembles for Sentiment Analysis of Short Texts



Kilimci Z. H., Omurca S.

INFORMATION TECHNOLOGY AND CONTROL, vol.47, no.3, pp.457-470, 2018 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 47, Issue: 3
  • Publication Date: 2018
  • DOI: 10.5755/j01.itc.47.3.20935
  • Journal Name: INFORMATION TECHNOLOGY AND CONTROL
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.457-470
  • Keywords: Word embedding, ant colony optimization, information gain, sentiment analysis, classifier ensembles, extended spaces, RANDOM SUBSPACE METHOD
  • Kocaeli University Affiliated: Yes

Abstract

Sentiment classification has become a popular way to analyze opinions about events, products, and similar topics, especially on social networks such as Twitter. Because social networks limit the length of posts, classification performance on such short texts needs to be boosted with additional techniques. In this work, the feature space is extended with word embedding based features to address the length limitation, and the classification success of sentiment analysis is improved by employing classifier ensembles. The contributions of this paper are fivefold. First, the representative capability of the features is enriched by using a semantic word embedding model, and conventional feature selection techniques are then compared. Second, traditional machine learning algorithms, namely naive Bayes, support vector machine, and random forest, are evaluated to select a baseline classifier for the proposed ensemble system. Third, three ensemble strategies, namely bagging, boosting, and random subspace, are introduced to ensure the diversity of ensemble learning. Fourth, experiments are conducted to compare the performance of the models with the word embedding baseline. Finally, a wide range of comparative experiments on Twitter datasets demonstrates that the classification performance of the proposed model significantly outperforms that of state-of-the-art studies.
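
As a rough illustration of how two of the ensemble strategies named in the abstract differ, the following minimal sketch contrasts bagging (each member sees a bootstrap sample of the training rows) with the random subspace method (each member sees a random subset of the feature columns), combining members by majority vote. It is not the authors' implementation: the nearest-centroid base learner is a stand-in chosen to keep the example dependency-free (the paper evaluates naive Bayes, support vector machine, and random forest), boosting is omitted, and the function names, parameters, and toy data are all hypothetical.

```python
import random
from collections import Counter

def train_centroids(X, y, features):
    """Stand-in base learner: per-class mean over the chosen feature columns."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict_centroids(centroids, row, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(center):
        return sum((row[f] - center[j]) ** 2 for j, f in enumerate(features))
    return min(centroids, key=lambda label: dist(centroids[label]))

def fit_ensemble(X, y, n_members=11, strategy="subspace", subspace_ratio=0.5, seed=0):
    """Train n_members base learners, each on a different view of the data."""
    rng = random.Random(seed)
    n_rows, n_feats = len(X), len(X[0])
    members = []
    for _ in range(n_members):
        if strategy == "bagging":
            # Bagging: resample rows with replacement, keep every feature.
            rows = [rng.randrange(n_rows) for _ in range(n_rows)]
            feats = list(range(n_feats))
        else:
            # Random subspace: keep every row, sample a subset of features.
            rows = list(range(n_rows))
            k = max(1, int(subspace_ratio * n_feats))
            feats = sorted(rng.sample(range(n_feats), k))
        X_view = [X[i] for i in rows]
        y_view = [y[i] for i in rows]
        members.append((train_centroids(X_view, y_view, feats), feats))
    return members

def predict_ensemble(members, row):
    """Majority vote over the predictions of all ensemble members."""
    votes = Counter(predict_centroids(c, row, feats) for c, feats in members)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): four feature columns, label 1 = positive, 0 = negative.
X = [[0.9, 0.8, 0.7, 0.9], [0.8, 0.9, 0.9, 0.7], [0.7, 0.7, 0.8, 0.8],
     [0.1, 0.2, 0.1, 0.3], [0.2, 0.1, 0.3, 0.1], [0.3, 0.2, 0.2, 0.2]]
y = [1, 1, 1, 0, 0, 0]

subspace_members = fit_ensemble(X, y, strategy="subspace")
bagging_members = fit_ensemble(X, y, strategy="bagging")
```

Calling `predict_ensemble(subspace_members, row)` on a new feature vector returns the majority label. In a setting like the one the abstract describes, the columns would presumably come from the extended feature space rather than from hand-written numbers.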