Extended Feature Spaces Based Classifier Ensembles for Sentiment Analysis of Short Texts

Kilimci, ZEYNEP; Omurca, SEVİNÇ

doi:10.5755/j01.itc.47.3.20935

Kilimci Z. H., Omurca S.

INFORMATION TECHNOLOGY AND CONTROL, vol.47, no.3, pp.457-470, 2018 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 47 Issue: 3
Publication Date: 2018
DOI Number: 10.5755/j01.itc.47.3.20935
Journal Name: INFORMATION TECHNOLOGY AND CONTROL
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp.457-470
Keywords: Word embedding, ant colony optimization, information gain, sentiment analysis, classifier ensembles, extended spaces, random subspace method
Open Archive Collection: AVESİS Open Access Collection
Kocaeli University Affiliated: Yes

Abstract

Sentiment classification has become a popular way to analyze opinions about events, products, and similar topics, especially on social networks such as Twitter. Because posts on social networks are limited in length, additional techniques are needed to boost classification performance. In this work, the feature space is extended with word embedding based features to address the length limitation, and the classification success of sentiment analysis is further improved by employing classifier ensembles. The contributions of this paper are fivefold. First, the representative capability of the features is enriched using a semantic word embedding model, and conventional feature selection techniques are then compared. Second, traditional machine learning algorithms, namely naive Bayes, support vector machine, and random forest, are evaluated to select the baseline classifier for the proposed ensemble system. Third, three ensemble strategies, namely bagging, boosting, and random subspace, are introduced to ensure the diversity of ensemble learning. Fourth, experiments are conducted to compare the performance of the models with the word embedding baseline. Finally, a wide range of comparative experiments on Twitter datasets demonstrates that the proposed model significantly outperforms state-of-the-art studies.
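
The abstract mentions conventional feature selection, and the keywords name information gain as one such technique. As an illustration only, the following minimal sketch shows how information gain can rank terms by how much their presence reduces uncertainty about the sentiment label. It is not the authors' implementation: the function names (`entropy`, `information_gain`, `top_terms`) and the toy tweets are hypothetical, and the paper's preprocessing and parameters are not given in this record.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """Drop in label entropy from knowing whether `term` occurs in a document."""
    with_term = [y for doc, y in zip(docs, labels) if term in doc]
    without_term = [y for doc, y in zip(docs, labels) if term not in doc]
    total = len(labels)
    conditional = (
        (len(with_term) / total) * entropy(with_term)
        + (len(without_term) / total) * entropy(without_term)
    )
    return entropy(labels) - conditional


def top_terms(docs, labels, k):
    """Return the k terms with the highest information gain."""
    vocab = sorted({term for doc in docs for term in doc})
    ranked = sorted(
        vocab,
        key=lambda term: information_gain(docs, labels, term),
        reverse=True,
    )
    return ranked[:k]


# Made-up toy tweets, tokenised into sets of words (assumption for illustration).
docs = [
    {"good", "phone"},
    {"good", "camera"},
    {"bad", "phone"},
    {"bad", "camera"},
]
labels = ["pos", "pos", "neg", "neg"]

selected = top_terms(docs, labels, k=2)
```

In this toy data, "good" and "bad" separate the classes perfectly while "phone" and "camera" carry no label information, so a selector of this kind would keep the first two. In a pipeline like the one the abstract describes, the retained features would then be passed to the base classifiers and ensembles.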