N-Gram Pattern Recognition using Multivariate-Bernoulli Model with Smoothing Methods for Text Classification

24th Signal Processing and Communication Application Conference (SIU), Zonguldak, Türkiye, 16-19 May 2016, pp. 597-600

Publication Type: Conference Paper / Full-Text Paper
Volume Number:
DOI Number: 10.1109/siu.2016.7495811
City of Publication: Zonguldak
Country of Publication: Türkiye
Pages: pp. 597-600
Keywords: Naive Bayes, text pattern recognition, smoothing methods, classification, n-gram, LANGUAGE, PROBABILITIES
Affiliated with Kocaeli University: No

Abstract

In this paper, we study n-gram models in the text classification domain. To measure the impact of n-gram models on classification performance, we apply the Naive Bayes classifier with various smoothing methods. The Naive Bayes classifier generally uses one of two event models for text classification: the Bernoulli model or the multinomial model. Researchers usually adopt the multinomial model with Laplace smoothing in text classification and similar domains. The objective of this study is to demonstrate the classification performance of the Naive Bayes event models by analyzing both of them with different smoothing methods and by using n-gram models from a different perspective. To find patterns between the two event models, we carry out experiments on a large Turkish dataset. The experimental results indicate that the Bernoulli event model with an appropriate smoothing method can outperform the multinomial model on most of the n-gram models.
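
To make the Bernoulli event model described above concrete, the following is a minimal sketch of a multivariate-Bernoulli Naive Bayes classifier over word n-grams. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the smoothing methods that were compared, so additive (Laplace-style) smoothing with a hypothetical `alpha` parameter stands in for them, and the English toy corpus, the class name `BernoulliNGramNB`, and the helper `ngrams` are invented for the example (the study itself uses a large Turkish dataset).

```python
import math
from collections import defaultdict


def ngrams(tokens, n):
    """Return the set of word n-grams that occur in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


class BernoulliNGramNB:
    """Multivariate-Bernoulli Naive Bayes over binary n-gram features.

    Assumption: additive smoothing, p = (count + alpha) / (N_c + 2 * alpha).
    """

    def __init__(self, n=1, alpha=1.0):
        self.n = n          # n-gram order
        self.alpha = alpha  # smoothing strength (must be > 0)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_count = {c: 0 for c in self.classes}
        self.feat_count = {c: defaultdict(int) for c in self.classes}
        self.vocab = set()
        for tokens, c in zip(docs, labels):
            self.doc_count[c] += 1
            feats = ngrams(tokens, self.n)
            self.vocab |= feats
            for f in feats:
                # Count documents containing f, not occurrences of f.
                self.feat_count[c][f] += 1
        self.total = len(docs)
        return self

    def feature_prob(self, f, c):
        """Smoothed probability that n-gram f appears in a document of class c."""
        count = self.feat_count[c].get(f, 0)
        return (count + self.alpha) / (self.doc_count[c] + 2 * self.alpha)

    def log_scores(self, tokens):
        present = ngrams(tokens, self.n) & self.vocab
        scores = {}
        for c in self.classes:
            s = math.log(self.doc_count[c] / self.total)  # class prior
            # The Bernoulli model also scores the absence of each feature.
            for f in self.vocab:
                p = self.feature_prob(f, c)
                s += math.log(p) if f in present else math.log(1.0 - p)
            scores[c] = s
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical toy corpus, for illustration only.
docs = [
    "the match ended with a late goal".split(),
    "the team won the match".split(),
    "the bank raised interest rates".split(),
    "the market fell as rates rose".split(),
]
labels = ["sports", "sports", "economy", "economy"]

unigram_model = BernoulliNGramNB(n=1).fit(docs, labels)
bigram_model = BernoulliNGramNB(n=2).fit(docs, labels)
```

Calling `unigram_model.predict("the team scored a goal".split())` or `bigram_model.predict("interest rates rose".split())` classifies a new token list. Unlike the multinomial model, this sketch ignores how often an n-gram repeats within a document and explicitly penalizes vocabulary n-grams that are absent, which is the main difference between the two event models.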