The Effectiveness of Homogenous Ensemble Classifiers for Turkish and English Texts

International Symposium on Innovations in Intelligent Systems and Applications (INISTA), Sinaia, Romania, 2-5 August 2016

Publication Type: Conference Paper / Full-Text Paper
Volume Number:
DOI Number: 10.1109/inista.2016.7571854
City of Publication: Sinaia
Country of Publication: Romania
Affiliated with Kocaeli Üniversitesi: Yes

Abstract

Text categorization has become an increasingly popular and important problem because of the rapid proliferation of documents in many fields. To cope with this problem, several machine learning techniques are used for categorization, such as naive Bayes, support vector machines, and artificial neural networks. In this study, we concentrate on ensembles of multiple classifiers instead of a single classifier. We perform a comparative analysis of the impact of ensemble techniques in the text categorization domain. To this end, we use base classifiers of the same type trained on diversified training sets, an approach referred to as homogenous ensembles. To diversify the training data, several ensemble algorithms are used: Bagging, Boosting, Random Subspace, and Random Forest. Multivariate Bernoulli Naive Bayes is chosen as the base classifier because of its superior classification performance compared with the other single classifiers. Extensive comparative experiments are conducted on four widely used text categorization datasets in Turkish and English. Finally, the effectiveness of the ensemble algorithms is discussed.
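As an illustration of the homogenous ensemble idea described in the abstract, the following is a minimal sketch of Bagging with a multivariate Bernoulli naive Bayes base classifier. It is not the authors' implementation: the class `BernoulliNB`, the helpers `bagging_fit` and `bagging_predict`, the toy documents, the Laplace smoothing, and the choice of 15 ensemble members are all assumptions made for this example. Boosting, Random Subspace, and Random Forest are not shown.

```python
import math
import random
from collections import Counter


class BernoulliNB:
    """Tiny multivariate Bernoulli naive Bayes (hypothetical helper).

    Each document is a set of tokens; only term presence or absence is used.
    """

    def fit(self, docs, labels, vocab):
        self.vocab = vocab
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.p_term = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            # Laplace-smoothed probability that a term occurs in a class-c document.
            self.p_term[c] = {
                t: (sum(t in d for d in class_docs) + 1) / (len(class_docs) + 2)
                for t in vocab
            }
        return self

    def predict(self, doc):
        def score(c):
            s = self.log_prior[c]
            for t in self.vocab:
                p = self.p_term[c][t]
                # Absent terms contribute too, which is what makes the model Bernoulli.
                s += math.log(p if t in doc else 1.0 - p)
            return s
        return max(self.classes, key=score)


def bagging_fit(docs, labels, vocab, n_estimators=15, seed=0):
    """Train the same base classifier on bootstrap resamples of the training set."""
    rng = random.Random(seed)
    n = len(docs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(
            BernoulliNB().fit([docs[i] for i in idx], [labels[i] for i in idx], vocab)
        )
    return models


def bagging_predict(models, doc):
    """Combine the members' predictions by majority vote."""
    votes = Counter(m.predict(doc) for m in models)
    return votes.most_common(1)[0][0]


# Toy data, invented for this sketch only.
docs = [
    {"goal", "match", "team"}, {"team", "coach", "league"}, {"match", "score", "goal"},
    {"stock", "market", "bank"}, {"bank", "loan", "rate"}, {"market", "rate", "trade"},
]
labels = ["sports"] * 3 + ["economy"] * 3
vocab = sorted(set().union(*docs))

single = BernoulliNB().fit(docs, labels, vocab)
ensemble = bagging_fit(docs, labels, vocab)
print(single.predict({"goal", "team"}), bagging_predict(ensemble, {"goal", "team"}))
```

Every ensemble member here is the same kind of classifier; diversity comes only from the resampled training sets, which is the property the abstract calls homogenous.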