Improve offensive language detection with ensemble classifiers

Ekinci, Ekin; Omurca, Sevinç; Sevim, Semih

doi:10.18201/ijisae.2020261592

Ekinci E., Omurca S., Sevim S.

International Journal of Intelligent Systems and Applications in Engineering, vol. 8, no. 2, pp. 109-115, 2020 (Scopus)

Publication Type: Article / Full Article
Volume: 8 Issue: 2
Publication Date: 2020
DOI: 10.18201/ijisae.2020261592
Journal Name: International Journal of Intelligent Systems and Applications in Engineering
Indexed in: Scopus, TR DİZİN (ULAKBİM)
Pages: pp. 109-115
Kocaeli University Affiliated: Yes

Abstract

© 2020, Ismail Saritas. All rights reserved. Sharing content easily on social media has become an important means of communication in the world we live in. However, alongside the conveniences it provides, some problems have emerged because content sharing is not bounded by predefined rules. Consequently, offensive language has become a major problem for both social media platforms and their users. This article aims to detect offensive language in short text messages on Twitter. Short texts are difficult to classify because they do not contain sufficient statistical information. To cope with this drawback, concept-based semantic word expansion and word-embedding vectors are proposed. For the classification task, a decision tree and decision-tree-based ensemble classifiers are used: Adaptive Boosting, Bootstrap Aggregating, Random Forest, Extremely Randomized Trees, and Extreme Gradient Boosting. The class imbalance in the dataset is addressed by oversampling. Experiments on the datasets show that extremely randomized trees taking word-embedding vectors as input are the most successful, with an F-score of 85.66%.
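The abstract does not say which oversampling variant was used. As a rough illustration only, the sketch below assumes plain random oversampling, in which minority-class examples are duplicated at random until every class matches the size of the largest one. The function name `random_oversample`, the toy messages, and the label encoding (1 = offensive, 0 = not offensive) are all hypothetical and do not come from the article.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class (assumed variant; the
    article only says "oversampling")."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy, made-up data: 1 = offensive, 0 = not offensive.
tweets = ["msg a", "msg b", "msg c", "msg d", "msg e", "msg f"]
labels = [0, 0, 0, 0, 1, 1]

balanced_tweets, balanced_labels = random_oversample(tweets, labels)
print(Counter(balanced_labels))
```

With this toy input both classes end up with four samples. In practice, oversampling is usually applied to the training split only, so that duplicated examples do not leak into the evaluation data. The rebalanced set would then be vectorized and passed to a tree ensemble of the kind the abstract lists.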